Huffman Coding


About Huffman codes

Huffman codes belongs into a family of codes with a variable codeword length. That means that individual symbols which makes a message are represented (encoded) with bit sequences that have distinct length.This characteristic of the code words helps to decrease the amount of redundancy in message data i.e. it makes data compression possible. For example,  symbols  A, B, C  and D are represented with following code words:
 

Symbol

Code word

A 0
B 10
C 110
D 111

 

Symbols A and B have distinct lengths of their code words ( `0` and `10` ). At the first look it seems that the code words are not uniquely decodable. But, the power of Huffman codes is in the fact that all code words are uniquely decodable. So, the sequence of bits:

                                 01101110100

is uniquely decodable as  `ACDABA`.

 

Decreasing of redundancy in data by Huffman codes is based on the fact that distinct symbols have distinct probabilities of incidence. This fact helps to create such code words, which really contribute to decreasing of redundancy  i.e. to data compression. Symbols with higher probabilities of incidence are coded with shorter code words, while symbols with higher probabilities are coded with longer code words. Though, we cant prevent of longer code words to show up, because of theirs less probabilities they will contribute to the optimality of code.
 
 

 

Building Huffman codes

As it was said above, building of distinguished code words is based on the distinct probabilities of incidence for distinct symbols. The algorithm for building Huffman code is based on the so-called "coding tree". The algorithm steps are:

  1. Line up the symbols by falling probabilities
  2. Link two symbols with least probabilities into one new symbol which probability is a sum of  probabilities of two symbols
  3. Go to step 2. until you generate a single symbol which probability is 1
  4. Trace the coding tree from a root (the generated symbol with probability 1)  to origin symbols, and assign to each lower branch 1, and to each upper branch 0
     
      
     

Example

Assume that we have the following text:

                   "EXAMPLE OF HUFFMAN CODE"

First, we calculate the probabilities of each symbol in the text as follows:
 
 

Symbol

Probability

E

2/25

X 1/25
A 2/25
M 2/25
P 1/25
L 1/25
O 2/25
F 2/25

H

1/25
U 1/25
C 1/25
D 1/25
I 1/25
N 2/25
G 1/25
space 3/25

Now we create the coding tree:
hufftree.gif (3309 bytes)

 

Expansions of Huffman code theory

Adaptive Huffman code enables dinamically change of the code words accordingly to the change of probabilities of the symbols. In this way, the produced code is more effective then the primary Huffman code.

Extended Huffman codes have the charachteristic that the coding sheme is coding group of symbols rather than a single symbol.
 
 

               Books

               References

               Related Links

               

HomePage - Basic Facts -Commercial - Algorithms - Hardware - FAQ - Glossary

arcsepd.gif (196 bytes)

Maintained and Copyrighted © 1997-2000 by DataCompression Reference Center 

(compresswww@rasip.fer.hr)

arcsepd.gif (196 bytes)