And I thought my implementation of Deflate was bad
When I added PNG support to 1.7.0, I wrote my own routines to handle the Deflate compression algorithm. The Deflate algorithm is also the underlying compression algorithm behind the well-known 'zip' format, and is surprisingly complex, with a sliding window matching algorithm combined with three layers of Huffman trees. It was quite an educational experience to write a fast sliding window compressor.
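The sliding-window half of the algorithm can be illustrated with a toy LZ77-style matcher. This is a hedged sketch in Python, not VirtualDub's implementation: a real Deflate compressor uses hash chains instead of a brute-force window scan, and emits Huffman-coded length/distance symbols rather than raw tuples.

```python
def lz77_compress(data, window=32768, min_match=3, max_match=258):
    """Greedy LZ77: emit (distance, length) tokens for matches, else literal bytes."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force scan of the sliding window (real compressors use hash chains).
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(data[i])   # no match worth encoding: emit a literal
            i += 1
    return out

def lz77_decompress(tokens):
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):      # byte-by-byte copy handles
                out.append(out[-dist])   # overlapping matches correctly
        else:
            out.append(t)
    return bytes(out)
```

The byte-by-byte copy in the decompressor is what allows a match to overlap its own output, e.g. encoding a run of repeated bytes as a single (distance=1, length=n) token.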
Someone posted an interesting bug in the Visual Studio public bug database about the new zip file support enlarging files, so I decided to try .NET Framework's Deflate compressor:
using System;
using System.IO;
using System.IO.Compression;

namespace VCZipTest {
    class Program {
        static void Main(string[] args) {
            using (FileStream src = new FileStream(args[0], FileMode.Open)) {
                using (FileStream dst = new FileStream(args[1], FileMode.Create)) {
                    using (GZipStream def = new GZipStream(dst, CompressionMode.Compress, true)) {
                        byte[] buf = new byte[4096];
                        for (;;) {
                            int act = src.Read(buf, 0, 4096);
                            if (act <= 0)
                                break;
                            def.Write(buf, 0, act);
                        }
                    }

                    System.Console.WriteLine("{0} ({1} bytes) -> {2} ({3} bytes)",
                        args[0], src.Length, args[1], dst.Length);
                }
            }
        }
    }
}

D:\projwin\VCZipTest\bin\Debug>vcziptest f:\shuffle.mp3 f:\test.bin.gz
f:\shuffle.mp3 (1439439 bytes) -> f:\test.bin.gz (2114778 bytes)
Yes, the .NET Framework implementation actually enlarged the file by 47%, and yes, gunzip was able to "decompress" the 2.1 MB file correctly. I ran the file through VirtualDub's zip decompressor and found that the literal/length Huffman tree in the stream was badly imbalanced, with most of the literal codes allocated 14 bits. Deflate should never enlarge a stream by more than a tiny fraction, since it allows individual blocks to be stored uncompressed.
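The stored-block guarantee is easy to check with zlib: a stored block costs only a 5-byte header per block of up to 65,535 bytes, so even completely incompressible input grows by well under 0.1%. A quick sketch in Python (using the standard zlib module, not the .NET compressor):

```python
import os
import zlib

# Incompressible input: random bytes cannot be compressed further.
data = os.urandom(1_000_000)
compressed = zlib.compress(data, 9)

# A sane Deflate encoder falls back to stored blocks on data like this:
# ~5 bytes of block header per 64 KiB, plus the stream header and checksum.
overhead = len(compressed) - len(data)
print(f"{len(data)} -> {len(compressed)} bytes ({overhead} bytes overhead)")
```

On random input the overhead comes out to a few hundred bytes at most, which is what makes a 47% enlargement so remarkable.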
In contrast, even "gzip -1" was able to compress the file from 1,439,439 bytes to 1,416,488 bytes, and gzip -9 got it down to 1,406,973 bytes.
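The same comparison can be reproduced with zlib's compression levels, which correspond to gzip's -1 through -9 switches. A sketch with Python's zlib on synthetic data (an MP3 would behave similarly: already-compressed audio leaves only a little slack, but it should never grow):

```python
import zlib

# Moderately repetitive test data stands in for the MP3 here.
data = b"example payload with some repetition " * 10000

fast = zlib.compress(data, 1)   # like gzip -1: fastest, least compression
best = zlib.compress(data, 9)   # like gzip -9: slowest, best compression
print(len(data), len(fast), len(best))
```

Level 9 spends more time searching the sliding window for longer matches, so it never does meaningfully worse than level 1, just as gzip -9 edged out gzip -1 above.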