
Why does there seem to be a preference for using Huffman (or Arithmetic) coding instead of a Lempel-Ziv type compression algorithm for Burrows-Wheeler Transformed data?

I noticed that data compressors such as Bzip2 or ZZip mainly use a combination of BWT, RLE, and Huffman/arithmetic coding. I would like to know the reasons against using an LZ-type algorithm instead of entropy coding.
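For reference, here is a minimal sketch of the front end of such a pipeline, assuming a naive rotation-sorting BWT with a `"\0"` sentinel, a plain move-to-front (MTF) step, and a generic run-length encoder. The function names and the sample string are hypothetical, and real compressors such as Bzip2 use faster suffix sorting and a more specialised zero-run encoding. The point is only to show what the entropy coder ends up receiving: a stream of small integers with long runs.

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    O(n^2 log n); real compressors use suffix sorting instead.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def mtf(s: str) -> list[int]:
    """Move-to-front: emit each symbol's current rank, then move it to rank 0."""
    table = sorted(set(s))
    out = []
    for ch in s:
        rank = table.index(ch)
        out.append(rank)
        table.insert(0, table.pop(rank))
    return out


def rle(values: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


if __name__ == "__main__":
    text = "banana_bandana_banana_bandana"  # hypothetical sample input
    transformed = bwt(text)
    ranks = mtf(transformed)
    print(repr(transformed))
    print(ranks)
    print(rle(ranks))
```

For example, `bwt("banana")` gives `"annb\0aa"`, and repeated symbols in the BWT output become runs of `0` after MTF.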

Log
  • Have you looked at the obvious characteristics, i.e. runtime and compression rate (on different kinds of data)? – Raphael Feb 16 '15 at 16:13
  • @Raphael No, I have not. This is a theoretical question, because I expect the reason for favoring entropy coding to be theoretical. In other words, my intuition tells me that LZ should be just as good, yet I am waiting to see if someone can show that my intuition is wrong. – Log Feb 16 '15 at 16:45
  • You reference artifacts of practice, and sometimes developers of such do not follow what is "best" in theory. That's fair, because practice adds additional layers of concern. But looking at the theory can certainly provide a first view on the matter, and I'm certain you can find resource analyses and compression rates in the literature; hence my query. – Raphael Feb 16 '15 at 21:20
  • @Raphael I was hoping that I could get an answer with a short proof or link to a study, which shows that BWT "goes well" with LZ algorithms (or doesn't). The "artifacts of practice", as you call them, point to "doesn't" but I'm looking for a (preferably simple) explanation rather than benchmarks. – Log Feb 16 '15 at 22:04
  • Well, there are some articles that contain both terms. Sorry I can't be of more help, but I don't know much in this area. – Raphael Feb 17 '15 at 07:12
  • Thanks. I have also found this, which looks promising. Bijective BWT is also something which I should look at. – Log Feb 17 '15 at 16:08

1 Answer


LZ-type algorithms are useful for data in which context is predictive, such as text made up of lexemes and following some grammar. That is not the case for the output of the BWT (nor for that of the move-to-front (MTF) transform, which is performed after the BWT).
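What BWT+MTF output does have is a heavily skewed distribution concentrated on small ranks, which a zero-order entropy coder can exploit directly. The following is a minimal sketch, assuming a single static Huffman code (Bzip2's actual coder is more elaborate). The input sequence is a hypothetical MTF-like stream, not measured data. The sketch computes Huffman code lengths and compares the average code length with the empirical entropy.

```python
import heapq
from collections import Counter
from math import log2


def huffman_code_lengths(symbols) -> dict:
    """Return {symbol: code length in bits} for a static Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still gets a 1-bit code.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def entropy_bits(symbols) -> float:
    """Zero-order empirical entropy in bits per symbol."""
    freq = Counter(symbols)
    n = sum(freq.values())
    return -sum(c / n * log2(c / n) for c in freq.values())


def average_code_length(symbols) -> float:
    """Average Huffman code length in bits per symbol."""
    lengths = huffman_code_lengths(symbols)
    freq = Counter(symbols)
    return sum(freq[s] * lengths[s] for s in freq) / sum(freq.values())


if __name__ == "__main__":
    # Hypothetical MTF-like output: mostly zeros, a few small ranks.
    skewed = [0] * 12 + [1] * 4 + [2] * 2 + [3]
    print(huffman_code_lengths(skewed))  # {3: 3, 2: 3, 1: 2, 0: 1}
    print(round(average_code_length(skewed), 3))  # 1.526
    print(round(entropy_bits(skewed), 3))
```

On this toy input the Huffman code averages 29/19 ≈ 1.526 bits per symbol, against 2 bits for a fixed-width code over four symbols. An arithmetic coder can get closer to the entropy, which is why both appear in BWT-based compressors.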

Leo B.
  • Specifically, the BWT plays the same role as prediction in LZ-type algorithms: Exploiting the fact that the same symbols tend to follow the same contexts. It's also the same story with PPM or DMC. – Pseudonym Dec 16 '23 at 02:45
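The comment's point can be illustrated with a minimal sketch. It assumes the same naive rotation-sorting BWT with a `"\0"` sentinel, and the context length `k` is a hypothetical parameter chosen for illustration. Each BWT output character is paired with the first `k` characters of its sorted rotation, which are exactly the characters that follow it in the original (cyclic) text. Grouping by that context shows that the BWT places characters sharing a context next to each other, the same information a context model would use for prediction, only with the context on the right.

```python
from itertools import groupby


def bwt_with_contexts(s: str, k: int = 2, sentinel: str = "\0") -> list[tuple[str, str]]:
    """Return (context, bwt_char) pairs in BWT order.

    Each context is the first k characters of a sorted rotation, i.e. the k
    characters that follow bwt_char in the (cyclic, sentinel-terminated) text.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return [(r[:k], r[-1]) for r in rotations]


def symbols_by_context(s: str, k: int = 2) -> dict[str, str]:
    """Group BWT output characters by the k-character context that follows them.

    Sorting puts equal contexts next to each other, so groupby sees each once.
    """
    pairs = bwt_with_contexts(s, k)
    return {
        ctx: "".join(ch for _, ch in group)
        for ctx, group in groupby(pairs, key=lambda p: p[0])
    }


if __name__ == "__main__":
    for ctx, chars in symbols_by_context("banana").items():
        print(repr(ctx), "<-", repr(chars))
```

For `"banana"` with `k = 2`, both characters that precede `"na"` land in one group as `"aa"`, and concatenating the groups in order reproduces the BWT string `"annb\0aa"`.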