Huffman Compress Implementation in C#
Huffman coding is a popular algorithm used for lossless data compression. In this blog post, we will delve into the details of Huffman coding and implement a basic version in C# without relying on third-party libraries. This step-by-step guide will help you comprehend the inner workings of Huffman coding and its application in data compression.
Huffman Coding Overview
Huffman coding works by assigning variable-length codes to input characters based on their frequencies in the input data. Characters with higher frequencies get shorter codes, resulting in efficient data compression.
Huffman Coding Algorithm Explained
-
Build Frequency Map:
Create a dictionary to store the frequency of each character in the input data. -
Build Priority Queue:
Use a priority queue to organize the characters based on their frequencies, with the lowest frequency having the highest priority. -
Build Huffman Tree:
Construct a Huffman tree by repeatedly combining the two nodes with the lowest frequencies until only one node remains. -
Build Encoding Map:
Traverse the Huffman tree to assign variable-length codes to each character, creating an encoding map. -
Compress Input:
Encode the input data using the generated Huffman encoding map, resulting in a compressed output. -
Decompress Output:
Decode the compressed output using the Huffman encoding map, recovering the original input data.
C# Implementation of huffman encoding
Below is a simplified implementation in C#:
using System;
using System.Collections.Generic;
using System.Linq;
class Node : IComparable<Node>
{
public char? Symbol { get; set; }
public int Frequency { get; set; }
public Node Left { get; set; }
public Node Right { get; set; }
public Node(char symbol, int frequency)
{
Symbol = symbol;
Frequency = frequency;
}
public Node(int frequency, Node left, Node right)
{
Frequency = frequency;
Left = left;
Right = right;
}
public int CompareTo(Node other)
{
return Frequency.CompareTo(other.Frequency);
}
}
class HuffmanCoding
{
public static Dictionary<char, string> Encode(string input)
{
Dictionary<char, int> frequencyMap = BuildFrequencyMap(input);
PriorityQueue<Node> priorityQueue = BuildPriorityQueue(frequencyMap);
while (priorityQueue.Count > 1)
{
Node left = priorityQueue.Dequeue();
Node right = priorityQueue.Dequeue();
Node parent = new Node(left.Frequency + right.Frequency, left, right);
priorityQueue.Enqueue(parent);
}
Node root = priorityQueue.Dequeue();
Dictionary<char, string> encodingMap = BuildEncodingMap(root);
return encodingMap;
}
public static string Compress(string input, Dictionary<char, string> encodingMap)
{
return string.Concat(input.Select(c => encodingMap[c]));
}
public static string Decompress(string input, Dictionary<char, string> encodingMap)
{
Dictionary<string, char> decodingMap = encodingMap.ToDictionary(pair => pair.Value, pair => pair.Key);
string decoded = "";
string currentCode = "";
foreach (char bit in input)
{
currentCode += bit;
if (decodingMap.ContainsKey(currentCode))
{
decoded += decodingMap[currentCode];
currentCode = "";
}
}
return decoded;
}
private static Dictionary<char, int> BuildFrequencyMap(string input)
{
Dictionary<char, int> frequencyMap = new Dictionary<char, int>();
foreach (char c in input)
{
if (frequencyMap.ContainsKey(c))
frequencyMap[c]++;
else
frequencyMap[c] = 1;
}
return frequencyMap;
}
private static PriorityQueue<Node> BuildPriorityQueue(Dictionary<char, int> frequencyMap)
{
PriorityQueue<Node> priorityQueue = new PriorityQueue<Node>();
foreach (var entry in frequencyMap)
{
priorityQueue.Enqueue(new Node(entry.Key, entry.Value));
}
return priorityQueue;
}
private static Dictionary<char, string> BuildEncodingMap(Node root)
{
Dictionary<char, string> encodingMap = new Dictionary<char, string>();
BuildEncodingMapRecursive(root, "", encodingMap);
return encodingMap;
}
private static void BuildEncodingMapRecursive(Node node, string code, Dictionary<char, string> encodingMap)
{
if (node.Symbol.HasValue)
{
encodingMap[node.Symbol.Value] = code;
}
else
{
BuildEncodingMapRecursive(node.Left, code + "0", encodingMap);
BuildEncodingMapRecursive(node.Right, code + "1", encodingMap);
}
}
}
class PriorityQueue<T> where T : IComparable<T>
{
private List<T> heap;
public int Count { get { return heap.Count; } }
public PriorityQueue()
{
heap = new List<T>();
}
public void Enqueue(T item)
{
heap.Add(item);
int i = Count - 1;
while (i > 0)
{
int parent = (i - 1) / 2;
if (heap[parent].CompareTo(item) <= 0)
break;
heap[i] = heap[parent];
i = parent;
}
heap[i] = item;
}
public T Dequeue()
{
if (Count == 0)
throw new InvalidOperationException("Queue is empty");
T item = heap[0];
int i = Count - 1;
T last = heap[i];
heap.RemoveAt(i);
if (i > 0)
{
int parent = 0;
while (true)
{
int child = parent * 2 + 1;
if (child >= i)
break;
if (child + 1 < i && heap[child + 1].CompareTo(heap[child]) < 0)
child++;
if (last.CompareTo(heap[child]) <= 0)
break;
heap[parent] = heap[child];
parent = child;
}
heap[parent] = last;
}
return item;
}
}
class Program
{
static void Main()
{
string input = "huffman coding example";
Console.WriteLine($"Original input: {input}");
var encodingMap = HuffmanCoding.Encode(input);
Console.WriteLine("Huffman Encoding Map:");
foreach (var entry in encodingMap)
{
Console.WriteLine($"{entry.Key}: {entry.Value}");
}
string compressed = HuffmanCoding.Compress(input, encodingMap);
Console.WriteLine($"Compressed: {compressed}");
string decompressed = HuffmanCoding.Decompress(compressed, encodingMap);
Console.WriteLine($"Decompressed: {decompressed}");
}
}
This implementation includes a Node
class for building the Huffman tree, a PriorityQueue
class for efficient node management, and the HuffmanCoding
class containing methods for encoding, compressing, and decompressing. The example demonstrates encoding, compression, and decompression of a sample input string.
_ This implementation is a basic example and may not be optimized for large datasets. It is intended for educational purposes to understand the principles of Huffman coding._
Example
Let’s delve deeper into how the Huffman coding algorithm works step by step
Step 1: Build Frequency Map
The first step in Huffman coding is to analyze the input data and create a frequency map. This map records the occurrences of each character in the data.
For example, given the input ABRACADABRA, the frequency map would look like this:
A: 5
B: 2
R: 2
C: 1
D: 1
Step 2: Build Priority Queue
Once we have the frequency map, we create a priority queue, where each node represents a character and its frequency. Nodes with lower frequencies have higher priority in the queue.
For our example, the priority queue would look like this:
(C:1) (D:1) (B:2) (R:2) (A:5)
Step 3: Build Huffman Tree
We start building the Huffman tree by repeatedly dequeuing the two nodes with the lowest frequencies from the priority queue and combining them into a new node. This process continues until only one node remains in the queue, forming the root of the Huffman tree.
In our example:
- Combine (C:1) and (D:1) to create a new node (CD:2).
- Combine (B:2) and (R:2) to create a new node (BR:4).
- Combine (CD:2) and (BR:4) to create a new node (CD-BR:6).
- Combine (A:5) and (CD-BR:6) to create the root of the Huffman tree (A-CD-BR:11).
Step 4: Build Encoding Map
Traverse the Huffman tree to assign binary codes to each character. Moving left in the tree corresponds to adding a ‘0’ to the code, and moving right corresponds to adding a ‘1’.
The resulting encoding map for our example:
A: 0
B: 10
R: 11
C: 100
D: 101
Step 5: Compress Input
Encode the original input data using the generated Huffman encoding map. Replace each character with its corresponding binary code.
For our example, “ABRACADABRA” would be compressed to:
01001100110010011001001110
Step 6: Decompress Output
To decompress, use the Huffman tree to traverse the binary codes and reconstruct the original input.
The compressed data 01001100110010011001001110 would be decompressed back to “ABRACADABRA.”
In summary, Huffman coding optimally represents more frequent characters with shorter codes, resulting in efficient data compression. The steps outlined above illustrate the process of building the Huffman tree and generating codes for compression and decompression.
Conclusion
Huffman coding is a powerful algorithm for data compression, enabling efficient storage and transmission of information. This guide has provided an in-depth look into the Huffman coding algorithm and demonstrated its implementation in C#. Understanding Huffman coding contributes to a foundational knowledge of compression techniques, which are fundamental in various computer science applications.