Overview of Compression Algorithms
Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.
This planet has a problem, which was this: files are too big. Many solutions were suggested for solving this problem via lossless compression, such as Lempel-Ziv and Huffman coding, but most of these were implemented into common compression utilities and promptly forgotten. Today, much of the relevant work to compression is in an obscure corner of the internet between lengthy PhD thesis papers and hard-to-find gems.
Lossless file compression, and file compression in general has become a lost art. The modern developer community has moved on from working on compression algorithms to bigger and better problems, such as creating the next major NodeJS framework. However, compression as it stands in the computer science aspect is still as interesting as it was in 1980s, possibly even more so today with an estimated 463 Exabytes of data to be created everyday in 2025.
It’s no secret that the internet is growing rapidly, and with it more people are becoming connected. From urban dwellers to rural farmers, fast internet speeds are not a given. To counter this, there are numerous projects focused on improving internet speeds for rural users, but there are far fewer projects focused on the other half of improving internet access: compressing data.
These claims about the “lost of art of compression” may seem a bit unsubstantiated, as there are new and actively developed compression projects out there today, such as, but not limited to:
There are also some other notable projects which I’ve included at the end, but either they aren’t active or universal enough to be included in this short list.
However this argument still holds true, compression isn’t really mainstream, and I don’t know why it isn’t. Internet speeds are a real problem for much of the world and better compression stands as a promising solution. The possibilities of better compression are truly endless:
- Faster 4k video streaming
- Faster download speeds
- Faster upload speeds
- Faster website and content loading speeds
- Better accessibility to services in regions with poor internet connections
- Higher quality video and voice calls
- Less expensive deep data storage costs
- and more
Compression ratios are generally used to represent how good a compression algorithm is at compressing. Generally, this is represented as the uncompressed size divided by the compressed size, yielding a number (hopefully) greater than 1. The higher the compression ratio, the better the compression algorithm is.
It should also be noted that a better compression ratio does not always indicate a better compression algorithm. Some algorithms are designed to give a moderate compression ratio with very good speed, while others are focused on good compression ratios and moderate speed. The use case of a compression algorithm are what determines what factors of a compression algorithm are favorable. For example, when streaming video you must be able to decode each frame relatively quickly, but when downloading a large game it may be preferable to download a smaller file and take time to decode the compressed files.
The goal of this project, and by extension, the goal of all resources here is to help people learn about compression algorithms and encourage people to tinker, build, and experiment with their own algorithms and implementations. After all, the best way to innovate in tech is to get a bunch of developers interested in something and let them lead the way.
Additionally, this project itself is intended to be a community-sourced resource for people interested in compression algorithms. The idea is that anyone can contribute to this website through GitHub so that this can be a constantly improving and expanding resource for others.
Notable Compression Project Mentions
- LZHAM (not super active)
- Google’s WebP (only for images)
- Dropbox’s DivANS
- Dropbox’s avrecode
- Dropbox’s Lepton
- H.266/VCC Codec (domain-specific for video)
When jumping into this guide, it’s important to identify what you want to gain from it. For example, if you’d simply like to familiarize yourself with different compression algorithms, you could just skim over the different algorithms and play with the interactive algorithms. However, if you’re interested in understanding each algorithm at a deep level and creating your implementations, you’d likely want to take more time to read each page section-by-section.
- Compression Ratio Equation is from Wikipedia