Coder for Life

Microsoft Compression Formats

While writing an Xpress Huffman compressor (for the Windows 8 bootmgr), I did a lot of research on the different Microsoft compression formats. My findings are summarized in the table below. If you have anything to add or correct, please tell me!

I plan to release source code for LZNT1, LZX, Xpress (LZ), and Xpress Huffman compression and decompression. My goal is to have compressors that are faster and/or better than the ones Microsoft provides.


| Algorithm | Appearance | Basis | Usage | MSDN Availability |
|---|---|---|---|---|
| MSZIP | 1994 | deflate[1] | CAB files[1]; COMPRESS.EXE; Dir. Repl. Serv. (LDAP/RPC/AD)[MS-DRSR] | |
| Quantum | 1994 | LZ | CAB files | Not mentioned at all |
| LZNT1 | 1995, NT 3.51[2] | LZSS | NTFS file compression[2][MS-FSCC]; Win 2000 hibernation file[3]; Active Directory[MS-ADTS]; File Replication Serv.[MS-FRS1]; Vista SuperFetch[4]; Vista and Win 7 bootmgr | |
| LZX | 1995[5] (1997) | LZ77[5] | CAB files[1]; COMPRESS.EXE; CHM and LIT files; Xbox Live Avatars; WIM*[6] (good) | |
| Delta (LZXD) | 2000, Win 2000 | LZX | Windows Updates; MSP files | |
| Xpress (LZ) (Decompress2k3) | 2001, XP[3] | LZ77 & DIRECT2 | XP and newer hibernation file[3]; Dir. Repl. Serv. (LDAP/RPC/AD)[MS-DRSR]; Windows Update Services[MS-WUSP]; Win CE | |
| Xpress Huffman | ? | LZ77 & Huffman | Dist. File System Replication[MS-FRS2]; WIM[6] (fast); Win 7 SuperFetch[4]; Win 8 bootmgr | |
| RDC | 2005, Ser 2003 R2 | RSYNC | | |
| TLZ | 2012? Win 8?[7] | ? | Win 8 hibernation file?[7] | Not mentioned at all |

  1. Microsoft Cabinet Format
  2. Optimizing NTFS
  3. Windows Hibernation File for Fun 'n' Profit
  4. Windows SuperFetch file format – partial specification
  5. Compression Resources - Glossary
  6. Windows Imaging File Format (WIM)
  7. R.I.P. Xpress - Welcome TLZ

WIM Compression

For the last couple of days I have been playing with WIM compression. I wanted to be able to read a given file directly from a WIM, even if that WIM is embedded in a DLL resource (specifically, the activity.bmp file in the WIM in bootres.dll). I predicted this would greatly speed up the startup and running of my Windows 7 Boot Updater program, since I use that file whenever the animation is set to "Default", and it did.

WIM files can be stored uncompressed or use one of two types of compression: LZX or XPRESS. However, the official WIM document states:

This document does not define these compression formats. The purpose of this document is not to enable developers to encode or decode WIM-compressed data.

So it won't tell you that the formats are not what they seem! In implementing them I came across many curiosities which I have documented below. I hope this helps someone so they won't have to go through the same frustrations.
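Before decoding anything, you need to know which of the three cases a WIM uses. Here is a minimal sketch of that check, assuming the header flag values as I recall them from the WIM format document; the constant, enum, and function names are my own, not part of any official API.

```cpp
#include <cstdint>

// Header flag values assumed from the WIM format document; verify them
// against the document before relying on this.
constexpr uint32_t kFlagHeaderCompression    = 0x00000002;
constexpr uint32_t kFlagHeaderCompressXpress = 0x00020000;
constexpr uint32_t kFlagHeaderCompressLzx    = 0x00040000;

// Hypothetical enum for this sketch.
enum class WimCompression { None, Xpress, Lzx, Unknown };

// Classify the compression type from the 32-bit flags field of the header.
WimCompression wimCompressionFromFlags(uint32_t flags) {
    if (!(flags & kFlagHeaderCompression)) return WimCompression::None;
    if (flags & kFlagHeaderCompressLzx) return WimCompression::Lzx;
    if (flags & kFlagHeaderCompressXpress) return WimCompression::Xpress;
    return WimCompression::Unknown;  // compressed, but with an unrecognized type
}
```

Note that "Xpress" here only tells you which flag is set; as described below, the actual format behind that flag is not the one most references call XPRESS.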

LZX

This is based on the LZX format used by Microsoft CAB files. That compression is documented on MSDN, but there is a nicer PDF version with the same text and figures.

There are some differences from that LZX format, however:

  • The window is fixed to 32 KB (NUM_POSITION_SLOTS = 8)
  • There is no translation mode header: the translation mode is always on and has a size of 12000000
  • The uncompressed size entry for verbatim or aligned offset blocks is not 24 bits, it is:
    • 1 bit if that bit has a value of 1, indicating the block is 32 KB (32768 bytes) when uncompressed
    • 17 bits if the first bit is 0, giving the size of the block when uncompressed
  • For aligned offset blocks, the aligned offset tree precedes the main pre-tree
  • Each block is treated independently of every other block (there is no delta compression of the trees, and the translation offset resets in each block)
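
Two of the differences above can be sketched in C++: the shortened block size field and the always-on translation. This is a rough sketch, not my actual implementation. It assumes the usual LZX bit order (16-bit little-endian words, most significant bit first), reads "17 bits" as the flag bit plus 16 size bits, and undoes the translation the way the CAB LZX format describes it, with the fixed size of 12000000. All names are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bit reader for the assumed LZX bit order: 16-bit little-endian words,
// most significant bit first. Reads past the end yield zero bits.
class BitReader {
public:
    explicit BitReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

    uint32_t read(unsigned n) {  // n must be in [0, 16]
        while (count_ < n) {
            uint32_t word = 0;
            if (pos_ + 1 < data_.size())
                word = data_[pos_] | (uint32_t(data_[pos_ + 1]) << 8);
            pos_ += 2;
            bits_ = (bits_ << 16) | word;
            count_ += 16;
        }
        count_ -= n;
        return uint32_t(bits_ >> count_) & ((1u << n) - 1);
    }

private:
    std::vector<uint8_t> data_;
    size_t pos_ = 0;
    uint64_t bits_ = 0;
    unsigned count_ = 0;
};

// Uncompressed size of a verbatim or aligned offset block in a WIM.
// Assumption: a set flag bit means 32768, otherwise 16 size bits follow.
uint32_t readWimBlockSize(BitReader& br) {
    if (br.read(1) == 1) return 32768;
    return br.read(16);
}

// Fixed translation size used by WIM (there is no header to read it from).
constexpr int32_t kWimTranslationSize = 12000000;

// Undo the 0xE8 (x86 CALL) translation on one decompressed block, in place.
// Positions are relative to the start of the block because the translation
// offset resets in each block. The last 10 bytes are never translated.
void undoE8Translation(std::vector<uint8_t>& block) {
    if (block.size() <= 10) return;
    const size_t end = block.size() - 10;
    size_t i = 0;
    while (i < end) {
        if (block[i] != 0xE8) { ++i; continue; }
        const int32_t pos = int32_t(i);
        const uint32_t raw = block[i + 1] | (uint32_t(block[i + 2]) << 8) |
                             (uint32_t(block[i + 3]) << 16) |
                             (uint32_t(block[i + 4]) << 24);
        const int32_t absOff = int32_t(raw);
        if (absOff >= -pos && absOff < kWimTranslationSize) {
            const int32_t relOff =
                absOff >= 0 ? absOff - pos : absOff + kWimTranslationSize;
            const uint32_t out = uint32_t(relOff);
            block[i + 1] = uint8_t(out);
            block[i + 2] = uint8_t(out >> 8);
            block[i + 3] = uint8_t(out >> 16);
            block[i + 4] = uint8_t(out >> 24);
        }
        i += 5;  // skip the opcode and its 4-byte operand
    }
}
```

Since each block is independent, a decoder would call readWimBlockSize at the start of every block and run undoE8Translation on every decoded block with no state carried over.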

XPRESS

This one frustrated me quite a bit. There are numerous references to an XPRESS compression format that can be decompressed using DecompressWin2k3. They say it is used for RPC and the hiber.sys file. Some even claim it is the method used in WIM files. It is not!

However, after lots of searching, I finally found the compression format documented on MSDN. It is not called XPRESS; in fact, it has no name. It is definitely distinct from the 'standard' XPRESS format. It is used for WIM files, the 'Distributed File System Replication Protocol', and SuperFetch files (http://blog.rewolf.pl/blog/?p=214).

MSDN provides a complete pseudo-code implementation of decompression. It works as is (once I translated it to C++, watching out for variables marked as 'inout'). The code they give is slow, though; in fact, it is slower than my LZX implementation.