It's strange that they intend for the decompressor to be fast on embedded devices and handhelds, yet give no benchmarks for ARM processors. Instead, they only talk about i7 performance, a chip that you won't find in the target environments.
For that matter, does it run well on a modern x64 processor? I'm confused by people who optimize at this level but don't distinguish between processor generations. "Core i7" is a marketing term describing Intel's positioning of the CPU within the market, but it says little about the performance characteristics of the actual part.
He mentions later that he's using a "Core i7 970", which is from the Westmere generation that came out in 2010. Decompression speed on the more modern Sandy Bridge, Ivy Bridge, Haswell, or Broadwell chips could be anywhere from the same to 8x better, depending on how amenable the algorithm is to SIMD improvements.
Without testing it's hard to know, but my guess is that since it wasn't consciously designed to take advantage of the newer processor features, it will probably land at the low end of that range, and could be beaten handily by an approach that targets only the newer instruction sets.
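At a minimum, a benchmark report could print the exact CPU model next to the per-byte timings so readers can place the numbers in the right generation. A minimal sketch (Python, standard library only, with zlib standing in for LZHAM):

    # Sketch of a per-byte timing harness that records the CPU it ran on, so
    # numbers from different processor generations can be compared honestly.
    # Standard library only; zlib is a stand-in codec here, not LZHAM.
    import platform, time, zlib

    def cpu_model():
        # platform.processor() is often empty on Linux, so fall back to /proc/cpuinfo.
        try:
            with open("/proc/cpuinfo") as f:
                for line in f:
                    if line.startswith("model name"):
                        return line.split(":", 1)[1].strip()
        except OSError:
            pass
        return platform.processor() or platform.machine()

    payload = b"example payload " * 100_000        # ~1.6 MB of compressible data
    blob = zlib.compress(payload, 9)

    start = time.perf_counter()
    out = zlib.decompress(blob)
    elapsed = time.perf_counter() - start

    print("CPU:", cpu_model())
    print(f"decompression: {elapsed / len(out) * 1e9:.1f} ns per output byte")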
There's a link to a table of benchmarks in the description (http://mattmahoney.net/dc/text.html). LZ4 still looks a little more compelling than LZHAM in terms of compress/decompress time. It doesn't mention the specific application though (CPU/Disk/etc...).
Are you sure it doesn't compare favorably to LZ4? In exchange for roughly 1.5x slower decompression, the compression ratio appears to be substantially better: LZHAM compressed enwik8 to 23.80 MiB and enwik9 to 196.83 MiB.
LZ4, by comparison, produced compressed sizes of 40.88 MiB and 362.40 MiB. A database, or a Linux distribution using zswap/zram/etc., would be able to store about 84% more data in the same amount of memory. For caches, that's huge.
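A quick back-of-the-envelope check of that figure, using the enwik9 sizes above:

    # Back-of-the-envelope check of the "84% more data" figure, using the
    # enwik9 compressed sizes quoted above (MiB).
    lzham_mib = 196.83
    lz4_mib = 362.40

    # With a fixed budget of compressed memory, the amount of original data you
    # can hold scales with the compression ratio, so the gain is the ratio of
    # the two compressed sizes.
    gain = lz4_mib / lzham_mib
    print(f"{(gain - 1) * 100:.0f}% more original data per byte of cache")   # ~84%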
One trade-off appears to be that LZHAM often requires a larger dictionary in memory, but even with smaller dictionary sizes it appears to be appreciably more compact than LZ4.
That was also one of the things I noticed at a quick glance. It would probably be a fairly big win for anything I/O constrained. I didn't take a look at the data being compressed or the algorithm in detail though.
One of the LZMA variants should be added to Chrome/Firefox and others as an optional compression format. Gzip's compression ratio is brutally bad compared to the LZMA variants for text and binary data. It would make a big difference to the web, especially for static but commonly downloaded data.
"It would make a big difference to the web, especially for static but commonly downloaded data."
Static but commonly downloaded data is already cached. For downloading software packages, or for differential software updates (e.g. initializing the dictionary with previously downloaded elements), it would make a lot of sense; basically, the things that are already being done with LZMA would just get extra speed.
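The dictionary-priming idea is easy to sketch with zlib's zdict parameter, simply because the standard library exposes that knob; whether a particular LZMA or LZHAM build offers an equivalent preset-dictionary option is codec-specific, and the file contents below are made up:

    # Illustrating "initialize the dictionary with previously downloaded
    # elements" using zlib's zdict. Both sides must hold the same old build.
    import zlib

    previous_build = b"function render(scene){/* old build of app.js */}" * 50
    new_build      = b"function render(scene){/* new build of app.js */}" * 50

    # Server: compress the new file against the old one as a preset dictionary.
    comp = zlib.compressobj(level=9, zdict=previous_build)
    delta = comp.compress(new_build) + comp.flush()

    # Client: supplies the same previous_build to decompress.
    decomp = zlib.decompressobj(zdict=previous_build)
    assert decomp.decompress(delta) == new_build

    plain = zlib.compress(new_build, 9)
    print(f"with dictionary: {len(delta)} bytes, without: {len(plain)} bytes")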
It will take time to get LZHAM as stable as LZMA. It looks promising, anyway.
> Static but commonly downloaded data is already cached.
I was trying to refer to things that are downloaded by a lot of people (such as our 3D scenes here: https://clara.io/library), rather than something that is repeatedly downloaded by a single person.
Downloading the data for these 3D scenes is by far the slowest part, and we know that LZMA variants give nearly a 2x reduction in data transfer.
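That kind of ratio is easy to sanity-check on your own assets; a rough sketch using the standard-library zlib and lzma modules as proxies for gzip and the LZMA variants ("scene.json" is just a placeholder path, and results depend heavily on the data):

    # Compare gzip-class (zlib) and LZMA output sizes on one asset.
    import lzma, zlib

    with open("scene.json", "rb") as f:   # placeholder asset
        data = f.read()

    gz = zlib.compress(data, 9)
    xz = lzma.compress(data, preset=9)

    print(f"original: {len(data)} bytes")
    print(f"zlib -9:  {len(gz)} bytes")
    print(f"lzma -9:  {len(xz)} bytes ({len(gz) / len(xz):.2f}x smaller than zlib)")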
Given that LZMA is the algorithm with the highest compression ratio and lowest speed of all commonly used algorithms, this isn't particularly impressive.
Why is it better than bzip2 or gzip? Both of those are 2-3x faster than LZMA (much more so for gzip --fast) but have lower compression ratios.
LZ4 would be the extreme example of an algorithm even faster than gzip (but with still lower compression ratio).
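For a concrete feel of that trade-off, here's a minimal sketch using only the standard-library codecs as stand-ins (no LZ4 or LZHAM bindings assumed); point it at any sizeable file, the path below is a placeholder:

    # Ratio vs compression/decompression speed for three codec families.
    import bz2, lzma, time, zlib

    with open("/usr/share/dict/words", "rb") as f:   # any sizeable test file
        data = f.read()

    codecs = {
        "zlib -9 (gzip-class)": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bzip2 -9":             (lambda d: bz2.compress(d, 9), bz2.decompress),
        "xz/lzma preset 6":     (lambda d: lzma.compress(d, preset=6), lzma.decompress),
    }

    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter(); blob = compress(data); t1 = time.perf_counter()
        out = decompress(blob);   t2 = time.perf_counter()
        assert out == data
        print(f"{name:22s} ratio {len(data) / len(blob):5.2f}   "
              f"comp {(t1 - t0) / len(data) * 1e9:7.1f} ns/B   "
              f"decomp {(t2 - t1) / len(data) * 1e9:6.1f} ns/B")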
FWIW, from the "Large Text Compression Benchmark" linked from that page:
Program            Options   enwik8      enwik9       Decompresser  enwik9+prog  Comp  Decomp  Mem   Alg
lzham alpha 3 x64  -m4 -d29  24,954,329  206,393,809  155,282       206,549,091  595   9       4800  LZ77
gzip 1.3.5         -9        36,445,248  322,591,995  38,801        322,630,796  101   17      1.6   LZ77
bzip2 1.0.2        -9        29,008,736  253,977,839  30,036        254,007,875  379   129     8     BWT
(sizes in bytes; Decompresser = size of the zipped decompression program; Comp/Decomp times in ns/byte)
The salient point is the decompression time (ns/byte), which is 129 for bzip2, 17 for gzip, and... 9 (!) for lzham. So on that measure it blows them out of the water, while still achieving a higher compression ratio on this type of input. As the original webpage implies, this is perfect for stuff like video games.
That changes things. It's a shame the website doesn't explicitly say that it's faster than gzip to decompress; it just says it's faster than LZMA, which isn't saying much.
Much better than lz4hc, too (7 ns/byte, but it only compresses down to 44 MB).
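For intuition, converting those published ns/byte figures into throughput (this is just arithmetic on the table values, not a re-measurement):

    # Decompression throughput implied by the table (1 MB = 1e6 bytes).
    for name, ns_per_byte in [("lzham", 9), ("gzip", 17), ("bzip2", 129)]:
        print(f"{name:6s} {ns_per_byte:4d} ns/byte  ~= {1e9 / ns_per_byte / 1e6:6.1f} MB/s")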
Is the "Mem" column saying that lzham used 4800 megabytes vs gzip using 1.6 megabytes? If that's so, it makes lzham much less attractive for console and mobile games. If that could be reduced to 8 while still being faster than gzip, then it would be very interesting.
For anyone publishing compression benchmarks: please, please include memory usage in the report! Very few reports bother to do so at the moment. Meanwhile, many compressors assume that since 16 GB of RAM is cheap these days, using a few extra gigs to get a speedup is totally reasonable. It may be reasonable for you, but it makes the algo useless for me.
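One low-effort way to include it, at least on Unix, is to report peak RSS around the decompression call; a sketch with zlib standing in for the codec under test:

    # Report memory alongside speed: peak RSS before/after decompression.
    # Unix-only; on Linux ru_maxrss is in KiB (bytes on macOS).
    import resource, zlib

    blob = zlib.compress(b"x" * 10_000_000, 9)   # stand-in compressed payload

    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    data = zlib.decompress(blob)
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    print(f"decompressed {len(data)} bytes, peak RSS grew by ~{(after - before) // 1024} MiB")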
Disclosure: I'm probably the only person to release a game that uses LZHAM other than Rich… I chose it because I needed an LZMA-like compression ratio with a faster decompressor. It performed up to expectations.
Does the algorithm run well on ARM?