About Huffman codes
Huffman codes belongs into a family of codes with a variable codeword length. That means that individual symbols which makes a message are represented (encoded) with bit sequences that have distinct length.This characteristic of the code words helps to decrease the amount of redundancy in message data i.e. it makes data compression possible. For example, symbols A, B, C and D are represented with following code words:
Symbol
Code word
A 0 B 10 C 110 D 111
Symbols A and B have distinct lengths of their code words ( `0` and `10` ). At the first look it seems that the code words are not uniquely decodable. But, the power of Huffman codes is in the fact that all code words are uniquely decodable. So, the sequence of bits:
01101110100
is uniquely decodable as `ACDABA`.
Decreasing of redundancy in data by Huffman codes is based on the fact that distinct symbols have distinct probabilities of incidence. This fact helps to create such code words, which really contribute to decreasing of redundancy i.e. to data compression. Symbols with higher probabilities of incidence are coded with shorter code words, while symbols with higher probabilities are coded with longer code words. Though, we cant prevent of longer code words to show up, because of theirs less probabilities they will contribute to the optimality of code.
Building Huffman codes
As it was said above, building of distinguished code words is based on the distinct probabilities of incidence for distinct symbols. The algorithm for building Huffman code is based on the so-called "coding tree". The algorithm steps are:
- Line up the symbols by falling probabilities
- Link two symbols with least probabilities into one new symbol which probability is a sum of probabilities of two symbols
- Go to step 2. until you generate a single symbol which probability is 1
- Trace the coding tree from a root (the generated symbol with probability 1) to origin symbols, and assign to each lower branch 1, and to each upper branch 0
Example
Assume that we have the following text:
"EXAMPLE OF HUFFMAN CODE"
First, we calculate the probabilities of each symbol in the text as follows:
Symbol
Probability
E
2/25
X 1/25 A 2/25 M 2/25 P 1/25 L 1/25 O 2/25 F 2/25 H
1/25 U 1/25 C 1/25 D 1/25 I 1/25 N 2/25 G 1/25 space 3/25 Now we create the coding tree:
Expansions of Huffman code theory
Adaptive Huffman code enables dinamically change of the code words accordingly to the change of probabilities of the symbols. In this way, the produced code is more effective then the primary Huffman code.
Extended Huffman codes have the charachteristic that the coding sheme is coding group of symbols rather than a single symbol.
HomePage - Basic
Facts -Commercial - Algorithms
- Hardware - FAQ - Glossary
![]()
Maintained and Copyrighted © 1997-2000 by DataCompression Reference Center
![]()