restwed.blogg.se - Snappy compression sles


lz4 beats lzo and google snappy by all metrics, by a fair margin.

Better yet, zstd and lz4 both come with a wide range of compression levels that can adjust speed/ratio almost linearly.
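To get a feel for what a compression level does, here is a minimal sketch. It uses zlib from the Python standard library as a stand-in, because zstd and lz4 bindings are third-party packages; the sample data is synthetic and the level values (1, 6, 9) are zlib's, not zstd's or lz4's.

```python
import zlib

# Synthetic, mildly repetitive sample (an assumption; real data behaves differently).
data = "".join(
    f"user={i % 97} action=view page=/item/{i % 1013}\n" for i in range(20000)
).encode()

results = {}
for level in (1, 6, 9):
    blob = zlib.compress(data, level)
    assert zlib.decompress(blob) == data  # lossless at every level
    results[level] = len(blob)
    print(f"level {level}: {len(blob)} bytes ({len(data) / len(blob):.1f}x)")
```

Higher levels spend more CPU time looking for matches in exchange for a smaller output. How large the gap is depends entirely on the data.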

zstd blows deflate out of the water, achieving a better compression ratio than gzip while being multiple times faster to compress.

The medium algorithms are ideal to save storage space and/or network transfer at the expense of CPU time. For example, backups or logs are often gzip'ed on archival. Static web assets (html, css, js) can be compressed on the fly by some web servers to save bandwidth.

The fastest algorithms are ideal to reduce storage/disk/network usage and make applications more efficient. Their compression and decompression speed is actually faster than most I/O, so using compression can reduce I/O and will make the application faster if I/O was the bottleneck. For example, ElasticSearch (Lucene) compresses indexes with lz4 by default.

We're in the 3rd millennium and there has been surprisingly little progress in general compression in the past decades. Deflate, lzma and lzo are from the 90's, and the origin of lz compression traces back to at least the 70's.

Actually, it's not true that nothing happened. Google and Facebook have people working on compression; they have a lot of data and a ton to gain by shaving off a few percent here and there. Facebook in particular has hired the top compression research scientist and rolled out 2 compressors based on a novel compression approach that is doing wonders. That could very well be the biggest advance in computing in the last decade.
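As an illustration of the "gzip on archival" use case, here is a minimal sketch using Python's standard gzip module. The helper name `archive_log`, the file name `app.log` and the log contents are all made up for the example.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def archive_log(path: Path) -> Path:
    """Write a gzip copy of `path` next to it and return the new path.

    Hypothetical helper; the original file is left in place.
    """
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb", compresslevel=6) as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, no full read into memory
    return target

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"
    log.write_text("GET /index.html 200\n" * 5000)  # synthetic log content
    packed = archive_log(log)
    original_size = log.stat().st_size
    packed_size = packed.stat().st_size
    with gzip.open(packed, "rb") as f:
        restored = f.read()
    print(f"{original_size} bytes -> {packed_size} bytes")
```

Such a repetitive sample compresses far better than a real log would, so the printed ratio is not representative.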

Let's split the compressors in categories: the slow, the medium and the fast:

- The slow are in the 0 - 10 MB/s range at compression. It's mostly LZMA derivatives (LZMA, LZMA2, XZ, 7-zip default), bzip2 and brotli (from google).
- The medium are in the 10 - 500 MB/s range at compression. It's mostly deflate (used by gzip) and zstd (facebook). Note that deflate is on the lower end while zstd is on the higher end.
- The fast are around 1 GB/s and above (a whole gigabyte, that is correct) at both compression and decompression. It's mostly lzo, lz4 (facebook) and snappy (google).

The strongest and slowest algorithms are ideal to compress a single time and decompress many times. For example, linux packages have been distributed with lzma compression for the last few years. It used to be tar.gz historically; the switch to stronger compression must have saved a lot of bandwidth on the Linux mirrors.
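To see roughly where a codec lands in the slow/medium/fast split on your own machine, a small timing harness is enough. This is a rough sketch and not the benchmark from this article: it only covers the codecs shipped with Python (zlib, bz2, lzma), it goes through Python bindings rather than the C libraries directly, and the sample data is synthetic. The `measure` and `make_sample` helpers are hypothetical names.

```python
import bz2
import lzma
import time
import zlib

def measure(name, compress, decompress, data):
    """Time one compression pass and check that the roundtrip is lossless."""
    start = time.perf_counter()
    blob = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(blob) == data
    return {
        "name": name,
        "ratio": len(data) / len(blob),
        "mb_per_s": len(data) / 1e6 / max(elapsed, 1e-9),
    }

def make_sample(lines=5000):
    # Synthetic log-like text (an assumption; use your real data instead).
    return "".join(
        f"2024-01-01 12:00:{i % 60:02d} INFO request id={i} status=200\n"
        for i in range(lines)
    ).encode()

CODECS = [
    ("zlib (deflate)", lambda d: zlib.compress(d, 6), zlib.decompress),
    ("bz2", lambda d: bz2.compress(d, 9), bz2.decompress),
    ("lzma (xz)", lambda d: lzma.compress(d), lzma.decompress),
]

def run(data):
    return [measure(name, c, d, data) for name, c, d in CODECS]

for row in run(make_sample()):
    print(f"{row['name']:15s} ratio {row['ratio']:6.1f}x  {row['mb_per_s']:8.1f} MB/s")
```

A single pass over a small buffer is noisy. For numbers worth comparing, use a larger real sample and repeat the measurement several times.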

gzip has similar results to everything else that is based on deflate (particularly the zlib library).

There are some bugs or edge cases to account for, so you should always test your implementation against your use case. For instance, Kafka has offered snappy compression for a few years (off by default) but the buffers are misconfigured and it cannot achieve any meaningful compression.


The following benchmark covers the most common compression methods. It's tested against the low level C libraries with the available flags.

The large number of compressors and the similarity between them can cause confusion. It's easier to understand the comparison once you realize that a compressor is just the combination of the following 3 things:

- An algorithm, with adjustable settings.
- An archive format.
- A tool or a library, also known as the implementation (gzip, tar, 7-zip, zlib, liblzma, libdeflate, etc.).

The algorithm family is the most defining characteristic by far, then comes the implementation. A C library well-optimized over a decade should do a bit better than a random java lib from github.

For example, gzip designates both the tool and its archive format (specific to that tool) but it's based on deflate.
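The split between algorithm and format can be seen directly with Python's zlib module, where the `wbits` argument selects which container is written around the same deflate stream. This is a minimal sketch; the `deflate` and `inflate` helpers are hypothetical names and the sample data is synthetic.

```python
import zlib

def deflate(data: bytes, wbits: int, level: int = 6) -> bytes:
    """Compress with deflate; `wbits` picks the container around the stream."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

def inflate(blob: bytes, wbits: int) -> bytes:
    d = zlib.decompressobj(wbits)
    return d.decompress(blob) + d.flush()

# Same algorithm and settings, three different formats.
CONTAINERS = {
    "raw deflate": -15,  # no header, no checksum
    "zlib": 15,          # zlib header and checksum
    "gzip": 31,          # gzip header and trailer
}

sample = b"the quick brown fox jumps over the lazy dog\n" * 1000
sizes = {}
for name, wbits in CONTAINERS.items():
    blob = deflate(sample, wbits)
    assert inflate(blob, wbits) == sample
    sizes[name] = len(blob)
    print(f"{name:12s} {len(blob)} bytes")
```

The three outputs differ only by a few bytes of header and checksum, which is why tools based on deflate end up with similar compression ratios.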