For the last couple of days I have been playing with WIM compression. I wanted to be able to read a given file directly from a WIM, even if that WIM is embedded in a DLL resource (specifically, the activity.bmp file in the WIM in bootres.dll). I predicted this would greatly speed up the startup and running of my Windows 7 Boot Updater program, since I use that file whenever the animation is set to "Default". See the separate entry on that for more information about how well it worked.
WIM files support two types of compression and can also be left uncompressed. The supported types of compression are LZX and XPRESS; however, the official WIM document states:
This document does not define these compression formats. The purpose of this document is not to enable developers to encode or decode WIM-compressed data.
So it won't tell you that the formats are not what they seem! While implementing them I came across many curiosities, which I have documented below. I hope this saves someone from going through the same frustrations.
LZX
This is based on the LZX format used by Microsoft CAB files. That compression is documented on MSDN, but there is a nicer PDF version with the same text and figures.
There are some differences from that LZX format, however:
- The window is fixed to 32 KB (NUM_POSITION_SLOTS = 30 in the CAB LZX tables, with 8 main tree entries per slot)
- There is no translation mode header: the translation mode is always on and has a size of 12000000
- The uncompressed size entry for verbatim or aligned offset blocks is not 24 bits. Instead it is:
  - 1 bit, if that bit has a value of 1, indicating the block is 32 KB (32768 bytes) when uncompressed
  - 17 bits, if the first bit is 0, giving the size of the block when uncompressed
- For aligned offset blocks, the aligned offset tree precedes the main pre-tree
- Each block is treated independently of every other block (there is no delta compression of the trees, and the translation offset resets in each block)
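Two of the differences above can be sketched in C++. This is only a rough illustration, not code from my program: BitReader, readWimLzxBlockSize and undoE8Translation are made-up names. It assumes the usual LZX bit order (16-bit little-endian words, most significant bit first), that the 17-bit case means the 0 flag bit followed by a 16-bit size, and that the translation step otherwise follows the CAB LZX description with the size fixed at 12000000 and the position counted from the start of each block.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader over 16-bit little-endian words.
// Reads past the end of the data yield zero bits (sketch only).
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    std::uint32_t read(unsigned n) {  // n must be at most 17 here
        while (count_ < n) {
            std::uint64_t lo = pos_ < data_.size() ? data_[pos_] : 0;
            std::uint64_t hi = pos_ + 1 < data_.size() ? data_[pos_ + 1] : 0;
            pos_ += 2;
            buf_ = (buf_ << 16) | (hi << 8) | lo;
            count_ += 16;
        }
        count_ -= n;
        return static_cast<std::uint32_t>((buf_ >> count_) & ((1ull << n) - 1));
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
    std::uint64_t buf_ = 0;
    unsigned count_ = 0;
};

// Uncompressed size field: a 1 bit means 32768, otherwise 16 more bits follow.
std::uint32_t readWimLzxBlockSize(BitReader& bits) {
    if (bits.read(1) == 1) return 32768;
    return bits.read(16);
}

// Assumed fixed translation size for WIM (no header is read for it).
constexpr std::int32_t kWimTranslationSize = 12000000;

// Undo the x86 CALL (0xE8) translation in place over one decompressed block.
// The last 10 bytes are left alone, as in the CAB LZX description.
void undoE8Translation(std::vector<std::uint8_t>& block) {
    if (block.size() <= 10) return;
    const std::size_t end = block.size() - 10;
    for (std::size_t i = 0; i < end; ++i) {
        if (block[i] != 0xE8) continue;
        const std::int32_t pos = static_cast<std::int32_t>(i);
        const std::uint32_t raw =
            static_cast<std::uint32_t>(block[i + 1]) |
            (static_cast<std::uint32_t>(block[i + 2]) << 8) |
            (static_cast<std::uint32_t>(block[i + 3]) << 16) |
            (static_cast<std::uint32_t>(block[i + 4]) << 24);
        const std::int32_t absolute = static_cast<std::int32_t>(raw);
        if (absolute >= -pos && absolute < kWimTranslationSize) {
            const std::int32_t relative =
                absolute >= 0 ? absolute - pos : absolute + kWimTranslationSize;
            const std::uint32_t out = static_cast<std::uint32_t>(relative);
            block[i + 1] = static_cast<std::uint8_t>(out & 0xFF);
            block[i + 2] = static_cast<std::uint8_t>((out >> 8) & 0xFF);
            block[i + 3] = static_cast<std::uint8_t>((out >> 16) & 0xFF);
            block[i + 4] = static_cast<std::uint8_t>((out >> 24) & 0xFF);
        }
        i += 4;  // skip the 4 operand bytes; the loop increment skips the opcode
    }
}
```

The size reader expects to be positioned right after the block type bits. Because every block is independent, undoE8Translation would be called once per decompressed block with positions starting from zero.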
XPRESS
This one frustrated me quite a bit. There are numerous references to an XPRESS compression format that can be decompressed using DecompressWin2k3. They say that it is used for RPC and the hiber.sys file. Some even claim it is the method used in WIM files. It is not!
However, after lots of searching, I finally found the compression format documented on MSDN. It is not called XPRESS; in fact, it has no name. It is definitely distinct from the 'standard' XPRESS format. It is used for WIM files, the 'Distributed File System Replication Protocol', and SuperFetch files (http://blog.rewolf.pl/blog/?p=214).
MSDN provides a complete pseudo-code implementation for decompression. It works as is (once translated to C++, watching out for variables marked as 'inout'). The code they give is slow though; in fact, it is slower than my LZX implementation.
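To illustrate the 'inout' point: a pseudo-code parameter marked 'inout' is both read and updated by the routine, so in C++ it has to be passed by reference (or pointer), not by value. The sketch below is not the MSDN routine itself. readXpressCodeLengths is a hypothetical helper, and the table layout it assumes (256 bytes packing 512 four-bit code lengths, even symbol in the low nibble) is my reading of the MSDN description, so check it against the documentation before relying on it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed symbol count of the Huffman table (two 4-bit lengths per byte).
constexpr std::size_t kXpressNumSymbols = 512;

// 'pos' plays the role of an 'inout' variable: it is read on entry and
// advanced past the table on return, so it must be a reference.
std::array<std::uint8_t, kXpressNumSymbols> readXpressCodeLengths(
        const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::size_t tableBytes = kXpressNumSymbols / 2;
    if (pos > in.size() || in.size() - pos < tableBytes)
        throw std::runtime_error("truncated code length table");

    std::array<std::uint8_t, kXpressNumSymbols> lengths{};
    for (std::size_t i = 0; i < tableBytes; ++i) {
        const std::uint8_t packed = in[pos + i];
        lengths[2 * i] = static_cast<std::uint8_t>(packed & 0x0F);  // even symbol
        lengths[2 * i + 1] = static_cast<std::uint8_t>(packed >> 4); // odd symbol
    }
    pos += tableBytes;  // the caller sees the updated position
    return lengths;
}
```

Passing pos by value here would compile fine but leave the caller stuck at the start of the table, which is exactly the kind of mistake that is easy to make when translating the pseudo-code.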



Great information! I tried to put together some code a couple of years back to do this and finally broke down and just used the wimgapi library. Thank you for sharing your success!
@Larry
Thanks! I still use the wimgapi library for most things, such as compression. However, for reading a single file from a single embedded WIM file, a custom solution is dramatically faster (~98%).
Which one is faster in decompression?
@eadmaster
With my (old) implementation, LZX decompression was faster. However, I bet that was just my code and that WIMGAPI does XPRESS decompression faster. I am currently working on a revised set of code that I will publish. In the meantime, I am publishing an entry on Microsoft compression formats.