Wednesday, January 26, 2011

Executable compression

Packing an executable to a smaller size is a common task, and the compression algorithms share a lot in common with good text compression systems. Most good executable compression algorithms have some method of interpreting, filtering or transforming the program instructions to a more compressible format. The most well-known compression algorithms are:
  • Run Length Encoding - where multiples of the same in-order data are represented in a compressed form. For example, the word "beekeeper", may be represented as "b(2x e)k(2x e)per", with an appropriate coding results in a reduction in size. This kind of compression works well on 8bit images and is found in image formats such as BMP.
  • Huffman compression - each code is represented with a number of bits proportional to the frequency of occurrence. For example, we could represent vowels with shorter bits, and consonants with longer bit sequences, or "e" could be represented with "1" and every other character with 0 plus the usual 8 bits then "beekeeper" would come to "b11k11p1r" in a total of 4x 9 bits and 5x 1 bit totalling to 41 bits rather than 72 bits.
  • Lempel-Ziv (LZ) - a dictionary of all previous data is kept and references are made back to the history of the data. For example, "beekeeper" could be represented as "be(-1,1)k(-3,2)p(-2,1),r" which translates to "be" plus the same character as before (-1) and one (1) byte of it, resulting in "bee", then "k", followed by the occurrence three characters before (-3) and (2) bytes of it, resulting in "beekee", etc.
  • Arithmetic coding is similar to huffman coding (and explained very concisely by Charles Bloom), in essence it also assigns bits according to the probability of occurrence (as a "floating point" number), and then re-scales/encodes the probabilities to fixed-point. Arturo ("Dario Phong") Campos has a concise explanation.
  • PAQ - this is generally considered the state-of-the-art compression algorithm. It uses conditional probabilistic models to achieve optimal compression.
There is plenty of source code around the web for these algorithms, but the Basic Compression Library provides routines for RLE (Run Length Encoding), Huffman, Rice, Lempel-Ziv (LZ77) and Shannon-Fano compression algorithms. And Matt Mahoney maintains the PAQ algorithms. There are also quite a few general executable compressors available including UPX and Mpress, both zip and 7zip can generate self-extracting executables. Microsoft CAB also performs very well (See trimmit for Mac). But the state-of-the-art in self-extracting executable compressors for small programs are Crinkler and kkrunchy for < 4kb and < 128kb programs respectively. (A good resource for various tools is available at in4k)

These compressors use a standard compression algorithm but also transform the input to a more compressible format. The most common transform algorithm is the Burros Wheeler Transform (BWT), this attempts to re-order data so that similar codes are close to each other - increasing the ability for the data to be compressed.

For executable compression patterns in the instructions stream can be leveraged to increase the compression ratio. For example, instructions often address similar areas of memory, simple delta-encoding of the address values can typically greatly increase the compression ratio.

Recently, both kkrunchy and Crinkler were described in detail. The author of kkrunchy Fabian "ryg" Giesen wrote a great blog post describing how kkrunchy works. Since kkrunchy works with larger file sizes some more intelligent tricks can be applied. Alexandre Mutel decomposed the Crinkler compressor in his blog post and posted some fabulous pictures of before-vs-after entropy of a compressed executable. Unfortunately, both of these programs are targeted at Windows PE files, and so don't work for other platforms (See here, here, here and here for tips on PE).

The usual approach to exe compression on linux/osx is just to do the modern-day version of com/cab dropping and compress your program with gzip and have a script extract and execute the program. This lacks a bit of flare, but works. Articles describing elf and mach-o are available by Brian Raiter and Amit Singh respectively, but unfortunately they are a little out of date.

A few tools have been written for linux and osx including the byte-optimized linker (x86/64) and obj stripping for linux, and iPakk and muncho for OSX (again, a bit dated). An older elf stripper is also around, and Timo Wiren gives a good overview too. Unfortunately, a similar page on OSX is no longer available online (maybe one day). All in all, I've still found your best bet on a platform other than Windows is just to follow Timo Wiren's first step of advice and do a simple dropper:
  1. Write your program, and compile it: g++ helloworld.cpp
    #include 
    
    int main() {
    printf("hello world!\n");
    return 0;
    }
    
  2. Optional: Strip it
    strip a.out
  3. Compress it:
    gzip a.out
  4. Create the unpack script (unpack.header):
    a=/tmp/I;tail -n+2 $0|zcat>$a;chmod +x $a;$a;rm $a; exit
    
  5. Combine the script and zip into a final executable:
    cat unpack.header a.out.gz > program
    chmod +x program
    
  6. Enjoy!

My a.out went from 8768 to 1153 bytes. Nice. (Using strip saves me 160 bytes on the original and 136 bytes on the final, but I'm sure you can save more by being clever about use of the gcc crt, gcc flags and throwing a few tools at this process).

1 comment:

Anonymous said...

Nice overview with a lot of very interesting links, awesome!