Compression Tools

This is a simple test I put together to measure how long each compression tool takes to compress a 13 MB text file, and how large the result is.
Keep in mind that compression is a CPU-intensive task: the longer it takes to complete, the longer the CPU is kept busy and unavailable to other tasks.

Prepare the file:
root@localhost:~# for i in `find /etc/ -type f`; do cat $i >> etc.log; done
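
Note that the backtick loop splits on whitespace, so it would break on any filename containing spaces; an equivalent command that avoids this (and overwrites rather than appends to an existing etc.log) would be something like:
root@localhost:~# find /etc/ -type f -exec cat {} + > etc.log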

Make 5 copies of the file:
root@localhost:~# cp etc.log etc.log.1
root@localhost:~# cp etc.log etc.log.2
root@localhost:~# cp etc.log etc.log.3
root@localhost:~# cp etc.log etc.log.4
root@localhost:~# cp etc.log etc.log.5
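
The same copies could also be made in a single loop:
root@localhost:~# for i in 1 2 3 4 5; do cp etc.log etc.log.$i; done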

Compress using lzo
root@localhost:~# time lzop etc.log.1
real 0m0.192s
user 0m0.168s
sys 0m0.020s
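
Note that lzop defaults to one of its fast levels; it also accepts -1 to -9 if you are willing to trade some of that speed for a smaller file, e.g.:
root@localhost:~# time lzop -9 etc.log.1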

Compress using gzip
root@localhost:~# time gzip etc.log.2
real 0m2.091s
user 0m2.060s
sys 0m0.024s

Compress using zip
root@localhost:~# time zip etc.log.3.zip etc.log.3
adding: etc.log.3 (deflated 75%)
real 0m2.359s
user 0m2.288s
sys 0m0.056s

Compress using bzip2
root@localhost:~# time bzip2 etc.log.4
real 0m6.641s
user 0m6.548s
sys 0m0.056s

Compress using lzma
root@localhost:~# time lzma etc.log.5
real 0m23.801s
user 0m23.389s
sys 0m0.292s
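
On recent distributions the lzma command is usually a compatibility wrapper provided by xz-utils; the equivalent direct invocation would be something like:
root@localhost:~# time xz --format=lzma etc.log.5
Plain xz (without --format=lzma) would produce the newer .xz container instead of a .lzma file.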

Sizes of the final files

root@localhost:~# ls -l etc.log*
-rw-r--r-- 1 root root 13300136 Nov  2 22:00 etc.log
-rw-r--r-- 1 root root  5081296 Nov  2 22:12 etc.log.1.lzo
-rw-r--r-- 1 root root  3345139 Nov  2 22:12 etc.log.2.gz
-rw-r--r-- 1 root root  3345279 Nov  2 22:12 etc.log.3.zip
-rw-r--r-- 1 root root  2877770 Nov  2 22:12 etc.log.4.bz2
-rw-r--r-- 1 root root  2212810 Nov  2 22:12 etc.log.5.lzma

root@localhost:~# ls -lh etc.log*

-rw-r--r-- 1 root root  13M Nov  2 22:00 etc.log
-rw-r--r-- 1 root root 4.9M Nov  2 22:12 etc.log.1.lzo
-rw-r--r-- 1 root root 3.2M Nov  2 22:12 etc.log.2.gz
-rw-r--r-- 1 root root 3.2M Nov  2 22:12 etc.log.3.zip
-rw-r--r-- 1 root root 2.8M Nov  2 22:12 etc.log.4.bz2
-rw-r--r-- 1 root root 2.2M Nov  2 22:12 etc.log.5.lzma
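
To put these numbers in perspective, the compression ratios can be computed directly from the sizes, e.g. for the gzip and lzma results:
root@localhost:~# echo "scale=3; 3345139/13300136" | bc
.251
root@localhost:~# echo "scale=3; 2212810/13300136" | bc
.166
So gzip shrinks the file to roughly 25% of its original size, and lzma to roughly 17%.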

To me, gzip offers good compression at a reasonable CPU cost, but in specific environments other formats may be a better fit.

Some highlights

(Stolen from here, but they match my own conclusions.)
As we have already seen, lzop is the fastest tool, but if you're looking for pure speed you may also want to take a look at gzip and its lowest compression levels. It is still pretty fast, and achieves a much better compression ratio than lzop.
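
As a quick sketch, the fastest gzip level can be selected explicitly (here on a hypothetical extra copy of the file):
root@localhost:~# cp etc.log etc.log.6
root@localhost:~# time gzip -1 etc.log.6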

The highest level of gzip (9; the default is 6) and the lower levels of bzip2 (1, 2, 3) are outperformed by the lower levels of xz (0, 1, 2).
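
The low xz levels can be tried the same way (again on a hypothetical extra copy):
root@localhost:~# cp etc.log etc.log.7
root@localhost:~# time xz -2 etc.log.7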

Level 0 of xz is best avoided; its use is somewhat discouraged in the man page because its meaning may change in a future version and could select a non-LZMA2 algorithm to try to achieve a higher compression speed.

The higher levels of xz (3 and above) should only be used if you want the best compression ratio and really don't care about the enormous compression time and the huge amount of RAM used. Levels 7 to 9 are particularly extreme in this regard, while offering only a marginally better compression ratio than the mid-levels.
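
If you still want to experiment with the high levels, newer xz versions can cap the memory used for compression and scale the settings down to fit; for example, on a hypothetical extra copy:
root@localhost:~# time xz -9 --memlimit-compress=512MiB etc.log.8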

The bzip2 decompression time is particularly bad, whatever level is used. If you care about decompression time, it is better to avoid bzip2 entirely and use gzip if you prefer speed, or xz if you prefer compression ratio.
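
This is easy to verify with the same method, e.g.:
root@localhost:~# time bzip2 -d etc.log.4.bz2
root@localhost:~# time gzip -d etc.log.2.gz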
