WIM Compression

For the last couple of days I have been playing with WIM compression. I wanted to be able to read a given file directly from a WIM, even if that WIM is embedded in a DLL resource (specifically the activity.bmp file in the WIM in bootres.dll). I predicted this would greatly speed up the startup and running of my Windows 7 Boot Updater program, since I use that file whenever the animation is set to "Default", and it did.

WIM files can be stored uncompressed or use one of two compression types: LZX and XPRESS. However, the official WIM document states:

This document does not define these compression formats. The purpose of this document is not to enable developers to encode or decode WIM-compressed data.

So it won't tell you that the formats are not what they seem! In implementing them I came across many curiosities which I have documented below. I hope this helps someone so they won't have to go through the same frustrations.

LZX

This is based on the LZX format used by Microsoft CAB files. That compression is documented on MSDN but there is a nicer PDF version with the same text/figures.

There are some differences from that LZX format, however:

The window is fixed to 32KB (NUM_POSITION_SLOTS = 8)
There is no translation mode header: the translation mode is always on and has a size of 12000000
The uncompressed size entry for verbatim or aligned offset blocks is not 24 bits. Instead it is:
- 1 bit, if that bit has a value of 1, indicating the block is 32 KB (32768 bytes) when uncompressed
- 17 bits, if the first bit is 0, giving the size of the block when uncompressed

For aligned offset blocks, the aligned offset tree precedes the main pre-tree

Each block is treated independently of every other block (there is no delta compression of the trees, and the translation offset resets in each block)
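
To make two of these differences concrete, here is a minimal C++ sketch of reading the shortened block size entry and of undoing the translation with the fixed size of 12000000. It is an illustration under assumptions, not my actual implementation. The names BitReader, readWimLzxBlockSize, undoE8Translation and kWimTranslationSize are made up for this example. The bit order (16-bit little-endian words, most significant bit first) and the E8 translation rule are taken from the CAB LZX format, and I am reading the 17-bit entry as one 0 flag bit followed by 16 size bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: MSB-first bit reader over 16-bit little-endian words
// (the CAB LZX bit order). Reading past the end yields zero bits.
class BitReader {
public:
    BitReader(const uint8_t* data, size_t size) : p_(data), end_(data + size) {}

    // Read n bits (0 <= n <= 16), most significant bit first.
    uint32_t read(unsigned n) {
        assert(n <= 16);
        while (count_ < n) {
            uint32_t word = 0;
            if (end_ - p_ >= 2) { word = p_[0] | (p_[1] << 8); p_ += 2; }
            else p_ = end_;
            buf_ = (buf_ << 16) | word;
            count_ += 16;
        }
        count_ -= n;
        return (buf_ >> count_) & ((1u << n) - 1);
    }

private:
    const uint8_t* p_;
    const uint8_t* end_;
    uint32_t buf_ = 0;   // low count_ bits are the unread bits
    unsigned count_ = 0;
};

// Block size entry: a single 1 bit means 32768 bytes; otherwise the 0 bit is
// followed by 16 bits holding the size (17 bits in total). Assumed reading.
uint32_t readWimLzxBlockSize(BitReader& bits) {
    if (bits.read(1) == 1) return 32768;
    return bits.read(16);
}

// Translation size that WIM uses in place of a header value.
constexpr int32_t kWimTranslationSize = 12000000;

// Undo the x86 E8 (CALL) translation on one decompressed chunk, following the
// CAB LZX rule. Positions are relative to the start of this chunk, since the
// translation offset resets for every block.
void undoE8Translation(std::vector<uint8_t>& chunk) {
    if (chunk.size() <= 10) return;
    const size_t limit = chunk.size() - 10;  // the last 10 bytes are never translated
    for (size_t i = 0; i < limit; ) {
        if (chunk[i] != 0xE8) { ++i; continue; }
        const int32_t pos = static_cast<int32_t>(i);
        const uint32_t raw = chunk[i + 1] | (chunk[i + 2] << 8) |
                             (chunk[i + 3] << 16) |
                             (static_cast<uint32_t>(chunk[i + 4]) << 24);
        const int32_t absValue = static_cast<int32_t>(raw);
        if (absValue >= -pos && absValue < kWimTranslationSize) {
            const int32_t rel = (absValue >= 0) ? absValue - pos
                                                : absValue + kWimTranslationSize;
            const uint32_t out = static_cast<uint32_t>(rel);
            chunk[i + 1] = static_cast<uint8_t>(out);
            chunk[i + 2] = static_cast<uint8_t>(out >> 8);
            chunk[i + 3] = static_cast<uint8_t>(out >> 16);
            chunk[i + 4] = static_cast<uint8_t>(out >> 24);
        }
        i += 5;  // skip the E8 byte and its 4-byte operand
    }
}
```

With this sketch, a stream starting with the bytes 0x00 0x80 (a leading 1 bit) gives a block size of 32768, and undoEtranslation-style fix-ups are applied once per chunk after the chunk has been fully decompressed, by calling undoE8Translation on it.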

XPRESS

This one made me quite frustrated. There are numerous references to an XPRESS compression format that can be decompressed using DecompressWin2k3. They say that it is used for RPC and the hiberfil.sys file. Some even claim it is the method used in WIM files. It is not!

However, after lots of searching, I finally found the compression format documented on MSDN. It is not called XPRESS; in fact, it has no name. It is definitely distinct from the 'standard' XPRESS format. It is used for WIM files, the 'Distributed File System Replication Protocol', and SuperFetch files (http://blog.rewolf.pl/blog/?p=214).

MSDN provides a complete pseudo-code implementation for decompression. It worked as is once I translated it to C++, watching out for variables marked as 'inout'. The code they give is slow though; in fact, it is slower than my LZX implementation.