RLE - Run Length Encoding
Run-length encoding (RLE) is a very simple form of data
compression encoding. It is based on simple principle of encoding data. This
principle is to every stream which is formed of the same data values (repeating
values is called a run) i.e sequence of repeated data values is replaced
with count number and a single value. This intuitive principle works best on
certain data types in which sequences of repeated data values can be noticed;
RLE is usually applied to the files that a contain large number of consecutive
occurrences of the same byte pattern. RLE may be used on any kind of data regardless of its content, but data which
is being compressed by RLE determines how good compression ratio will be achieved.
So RLE is used on text files which contains multiple spaces for indention and
formatting paragraphs, tables and charts. Digitized signals also consist of
unchanged streams so such signals can also be compressed by RLE. A good example
of such signal are monochrome images, and questionable compression would be probably
achieved if such compression was used on continous-tone (photographic) images.
RLE is a lossless type of compression and cannot achieve great compression ratios,
but a good point of that compression is that it can be easily implemented and quickly
executed. As it was said above basic RLE principle is that the run of characters is replaced
with the number of the same characters and a single character. In the following paragraph the process of encoding using flow chart will be shown. Fig 1. Format of three byte code word Fig 2. RLE - flow chart Process of RLE starts with initialization of character counter, repetition
counter and a variable which represents the current character (Ch), then if all characters
in file have been processed encoding ends. If there are more characters then
Ch variable is being stored in temporary variable (if ChCount equals 1), else
actual character is being compared to the previous character and then result of
that comparison leads to repetition counter increment or to another comparison
in which it is being tested if the number of consecutive characters is greater than
four in other words does the stream is just copied or coded according
to code word shown in Fig 1. RLE algorithms are parts of various image compression techniques like BMP, PCX,
TIFF, and is also used in PDF file format, but RLE also exists as separate compression
technique and file format. CompuServe RLE file format standard was formed in the 80's and defines the compression
for 1-bit images. 1B 47 48 - is header <ESC><G><H> and represents high resolution,
first data sequence pair 20hex, 7Ehex means that first 94dec
pixels are all turned on, the second data sequence is the same so second 94d pixels
are also turned on (the first 188d pixels are turned on so far), and so on. Then
somewhere in the file pairs 41h 36h occurs which means that next 33d pixels are
turned off and after that 22d pixels are turned off, etc. Last four character
are the ending sequence which was described above. MS Windows standard for RLE have the same file format as well-known BMP file format,
but it's RLE format is defined only for 4-bit and 8-bit colour images. 4bit RLE Table 1. Definition of escape codes(the first byte of compression sequence is 0)
Examples for 4bit RLE: 8bit RLE Examples for 8bit RLE Detailed description about
MS Windows BMP file format (RLE is special case of BMP) RLE scheme which will be described in this chapter is being used in PDF and
TIFF file format. RLE encoded data consists of compression sequences, one compression
sequence starts with number n (byte), this byte may be followed by 1 to 128
bytes, so this 2 to 129 bytes form one compression sequence. Examples:What is RLE ?
Fair compression ratio may be achieved if RLE is applied on computer generated
color images.Principle of RLE
Example:
before: R
T A A A A S D E E E E E
after RLE compression: R T *4A S D *5E
In the above example each stream of the same characters is being replaced with the number of
characters, single character and character *, at the first sight usage of * may seem redundant
but characters * which represents control character (which may variously implemented) are necessary
because some streams consist of the same characters or number of character is too small, so this
special character represents that next characters represents the number of repetition and character
after the number is the character which will be repeated. If control character or a
run of control characters (in our example *) are in a stream which is being
encoded then one extra character is coded. It is important to realize that the encoding
process is effective only if there are sequences of 4 or more repeating characters
because three characters are used to conduct RLE so for instance coding two
repeating characters would lead to expansion and coding three repeating characters
wouldn't cause compression or expansion. The decoding process is also very simple,
if there aren't control characters the stream is just copied and if control character
occurred then it must be removed and appropriate character is copied in a defined
number of times. It can be noticed that the process of decoding control characters
don't lead to any special procedures.
It is important to point out that there are many different run-length encoding
schemes. The above example has just been used to demonstrate the basic principle of
RLE encoding. In a particular case the implementation of RLE depends on what type
of data is being compressed.
CTRL - control character which is used to indicate compression
COUNT- number of counted characters in stream of the same characters
CHAR - repeating characters
Examples of RLE implementations
CompuServe standard for RLE file format
Header sequence in RLE file represents Graphic Mode Control, control is initiated
when program runs onto a sequence of three characters, those characters are
ASCII ESC (HEX 1B), ASCII G(HEX 47) and the third character is ASCII H (HEX 48)
or M (HEX 4D). Third character represents resolution, there are two possible
graphics modes, and those are high resolution graphic mode (256 x 192
pixels) represented by sequence <ESC><G><H> and medium
resolution graphic mode (128 x 96 pixels)which is represented by <ESC><G><M>
sequence.
After header sequence ,data sequence starts, basic data sequence consists of
a pair of run length encoded ASCII characters. The first number represents number
of the background (turned off) pixels and the second character is the number of foreground
(turned on) pixels. Each number of a pair represents the count number of pixels
plus 32 decimal, i.e. from each number 32 is substracted and that number represents
how many next pixels will be turned on or turned off depending on what number of
pair we observe. Usually it is used ASCII ~(HEX 7E, DEC 126) as highest possible
value, because some terminals interpret ASCII character 7F HEX as <RUBOUT>,
because RLE file format was used as file which was to show graphic on terminals.
Previous facts lead to conlussion that in each byte we can denote repetition
of 94 pixels (126 - 32). For example pair <D><'> (HEX: 44 27, DECIMAL
68 39) means next 68 (decimal) pixels are turned off and then 39 (decimal) pixels
are turned on.
Data in file is written in such a way that if the last pixel set was on position 254
then the next pixel will be on the first position in next line i.e. pictures are being
drawn from up to down. Let's illustrate this with an example; if the last pixel set
on line was on position 252 and data sequence consists of pair 21 hex, 27hex
i.e one background pixel and seven foreground pixels then following pixel is
turned off, then the following two pixels of current line are turned on, and then
the rest of five pixels turned on, on the beginning of the next line.
The ending sequence for RLE standard consists of three characters <ESC><G><N>,
<ESC> is a control character which ends the graphic display. Basic convention
is that control character shouldn't effect the display. All control characters
should be ignored besides <ESC> and <BEL> characters, <BEL>
can be optionally used, so in some cases RLE file ending sequence consists of
<BEL><ESC><G><N>. In other words end of RLE file according
to standard is <ESC><G><H> or <BEL><ESC><G><N>.
Example - RLE file (all numbers are in ASCII HEX format):
1B 47 48 7E 20 7E 20 7E 20 7E 20 ....
. . . . . .
. . . .
.
. 41 36 . . . . .
. . .
. . . . .
. . . . .
.
. . . 07
1B 47 4E
![]()
Click to download samples of CompuServe RLE
images MS Windows standard for RLE file format
Two types of RLE compression is used 4bit RLE and 8bit RLE as expected the first
type is used for 4-bit images, second for 8-bit images.
Compression sequence consists of two bytes, first byte (if not zero) determines
number of pixels which will be drawn. The second byte specifies two colors, high-order
4 bits (upper 4 bits) specifies the first color, low-order 4bits specifies the second
color this means that after expansion 1st, 3rd and other odd pixels will be
in color specified by high-order bits, while even 2nd, 4th and other even pixels
will be in color specified by low-order bits. If first byte is zero then the second
byte specifies escape code. (See table below)
Second byte
Definition
0
End-of-line
1
End-of-Rle(Bitmap)
2
Following two bytes defines offset in x and y direction (x
is right,y is up). The skipped pixels get color zero.
>=3
when expanding following >=3 nibbles (4bits) are just copied
from compressed file, file/memory pointer must be on 16bit boundary so adequate
number of zeros follows
Compressed data
Expanded data
06 52
5 2 5 2 5 2
08 1B
1 B 1 B 1 B 1 B
00 06 83 14 34
8 3 1 4 3 4
00 02 09 06
Move 9 positions right and 6 up
00 00
End-of -line
04 22
2 2 2 2
00 01
End-of-RLE(Bitmap)
Sequence when compressing is also formed from 2 bytes, the first byte (if not zero)
is a number of consecutive pixels which are in color specified by the second byte.
Same as 4bit RLE if the first byte is zero the second byte defines escape code, escape
codes 0, 1, 2, have same meaning as described in Table 1. while if escape code
is >=3 then when expanding the following >=3 bytes will be just copied from
the compressed file, if escape code is 3 or other greater odd number then zero follows
to ensure 16bit boundary.
Compressed data
Expanded data
06 52
52 52 52 52 52 52
08 1B
1B 1B 1B 1B 1B 1B 1B 1B
00 03 83 14 34
83 14 34
00 02 09 06
Move 9 positions right and 6 up
00 00
End-of -line
04 2A
2A 2A 2A 2A
00 01
End-of-RLE(Bitmap)
Example of RLE usage in other file formats
If n is between 0 and 127 inclusive then following n+1 (1 to 128) bytes are
just copied during decompression. If n is between 129 and 255 inclusive then
byte which follows n is being copied 256-(n-1) i.e. 2 to 128 times in decompressed
file. If 128 occurs then we reached the end of compressed data.
This scheme is similar to PackBits encoding scheme known to Macintosh
users.
Compressed data-hex format
Decompressed data-hex format
07 A4 56 C9 90 E5 F1 DB 32
A4 56 C9 E5 F1 DB 32
02 23 A1 56
23 A1 56
FE 12
12 12 12
FC 6C
6C 6C 6C 6C 6C
HomePage - Basic
Facts -Commercial - Algorithms
- Hardware - FAQ - Glossary
![]()
Maintained and Copyrighted © 1997-2000 by DataCompression Reference Center
![]()