The LZSS Algorithm
Explanation of terms: see LZ77.
Difference from LZ77
The LZ77 algorithm handles the case of no match in the window by outputting an explicit character after each pointer. This solution contains redundancy: either the null pointer is redundant, or the extra character is, since it could have been included in the next match. The LZSS algorithm solves this problem more efficiently: a pointer is output only if it points to a match longer than the pointer itself; otherwise, explicit characters are sent. Since the output stream now contains a mix of pointers and characters, each of them needs an extra ID bit that distinguishes between the two.
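The "longer than the pointer itself" rule can be made concrete with a little arithmetic. The sketch below is only an illustration: the bit widths (12-bit offset, 4-bit length, 8-bit character, 1 ID bit) and the function name are assumptions, not values given in this text.

```python
def min_profitable_length(offset_bits=12, length_bits=4, char_bits=8, flag_bits=1):
    """Smallest match length for which a pointer is cheaper than literals.

    All bit widths are hypothetical defaults chosen for illustration.
    """
    pointer_cost = flag_bits + offset_bits + length_bits  # bits for one pointer
    literal_cost = flag_bits + char_bits                  # bits for one explicit character
    # Smallest integer L such that L * literal_cost > pointer_cost.
    return pointer_cost // literal_cost + 1


if __name__ == "__main__":
    print(min_profitable_length())
```

Under these assumed widths a pointer costs 17 bits and an explicit character 9 bits, so a match of two characters already pays off; wider pointers push the threshold up.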
The encoding algorithm
1. Place the coding position at the beginning of the input stream;
2. find the longest match in the window for the lookahead buffer:
   - P := pointer to this match;
   - L := length of the match;
3. is L >= MIN_LENGTH?
   - YES: output P and move the coding position L characters forward;
   - NO: output the first character of the lookahead buffer and move the coding position one character forward;
4. if there are more characters in the input stream, go back to step 2.
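The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the brute-force search, the `WINDOW_SIZE` and `MAX_LENGTH` values, the token representation (a plain character or a `(B, L)` tuple instead of bit-packed output with ID bits), and the sample input are all assumptions made for the example.

```python
MIN_LENGTH = 2      # shortest match that is output as a pointer
WINDOW_SIZE = 4096  # assumed window size (not specified in the text)
MAX_LENGTH = 18     # assumed upper bound on match length


def lzss_encode(data, window_size=WINDOW_SIZE,
                min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Return a list of tokens: explicit characters or (back, length) pointers."""
    tokens = []
    pos = 0  # coding position (0-based here)
    while pos < len(data):
        best_len, best_back = 0, 0
        # Brute-force search of the window for the longest match.
        for cand in range(max(0, pos - window_size), pos):
            length = 0
            while (length < max_length
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_back = length, pos - cand
        if best_len >= min_length:
            tokens.append((best_back, best_len))  # pointer (B, L)
            pos += best_len
        else:
            tokens.append(data[pos])              # explicit character
            pos += 1
    return tokens


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    print(lzss_encode("AABBCBBAABC"))
```

A real encoder would replace the quadratic search with a hash table or tree and would pack each token together with its ID bit; the sketch keeps only the control flow of steps 1 to 4.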
The encoding process is presented in Table 1.
Input stream for encoding:
- The column Step indicates the number of the encoding step. A step completes each time the encoding algorithm produces an output; with LZSS this happens on every pass through step 3.
- The column Pos indicates the coding position. The first character in the input stream has coding position 1.
- The column Match shows the longest match found in the window.
- The column Output presents the output, which can be one of the following:
  - A pointer to the Match, in the format (B,L). This instructs the decoder: "Go back B characters in the window and copy L characters to the output";
  - The Match itself in explicit form (if it is only one character long, since MIN_LENGTH is set to 2).
|Char ||A ||A
Table 1: The encoding process (MIN_LENGTH = 2)
|Step ||Pos ||Match ||Output |
|7||8||A A B||(7,3)|
During decoding, the window is slid over the output stream in the same way the encoding algorithm slides it over the input stream. Explicit characters are output directly; when a pointer is encountered, the string it points to in the window is copied to the output.
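A decoder along these lines might look as follows. It is a sketch under assumptions: tokens are taken to be either a single explicit character or a `(B, L)` tuple, standing in for the bit-level stream with ID bits that a real archiver would read, and the function name and the sample token list are hypothetical.

```python
def lzss_decode(tokens):
    """Rebuild the text from explicit characters and (back, length) pointers."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            back, length = tok
            start = len(out) - back  # go back B characters in the window
            # Copy one character at a time so that a pointer whose source
            # overlaps the text being written is still handled correctly.
            for i in range(length):
                out.append(out[start + i])
        else:
            out.append(tok)          # explicit character: output directly
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical token stream, chosen only for illustration.
    print(lzss_decode(['A', 'A', 'B', 'B', 'C', (3, 2), (7, 3), 'C']))
```

The decoder needs no searching at all, which is why decoding is so much cheaper than encoding.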
Performance comparison to LZ77
This algorithm generally yields a better compression ratio than LZ77 with practically the same processor and memory requirements. Decoding remains extremely simple and fast. This is why it has become the basis for practically all later algorithms of this type.
It is implemented in almost all of the popular archivers: PKZip, ARJ, LHArc, ZOO, etc. Of course, every archiver implements it a bit differently, depending on the pointer length (which can also be variable), the window size, the way the window is moved (some implementations move it in N-character steps), and so on.
LZSS can also be combined with entropy coding methods: for example, ARJ applies Huffman encoding, and PKZip 1.x applies Shannon-Fano encoding (later versions of PKZip also apply Huffman encoding).