Last edited by 不点 on 2015-1-25 10:38.
The .lzma File Format
1. File Format
    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
    |         Header          |   LZMA Compressed Data   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
An lzma file consists of a header followed by the compressed data. The header is 13 bytes long.
A .lzma file consists of a 13-byte Header followed by
the LZMA Compressed Data.
Unlike the .gz, .bz2, and .xz formats, it is not possible to
concatenate multiple .lzma files as is and expect the
decompression tool to decode the resulting file as if it were
a single .lzma file.
For example, the command line tools from LZMA Utils and
LZMA SDK silently ignore all the data after the first .lzma
stream. In contrast, the command line tool from XZ Utils
considers the .lzma file to be corrupt if there is data after
the first .lzma stream.
1.1. Header
    +------------+----+----+----+----+--+--+--+--+--+--+--+--+
    | Properties |  Dictionary Size  |   Uncompressed Size   |
    +------------+----+----+----+----+--+--+--+--+--+--+--+--+
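Header layout: the first byte is the Properties byte. Its maximum value is (4*5+4)*9+8 = 224 = 0xE0; if the byte is larger than this, the file is not in lzma format.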
1.1.1. Properties
The Properties field contains three properties. An abbreviation
is given in parentheses, followed by the value range of the
property. The field consists of
1) the number of literal context bits (lc, [0, 8]);
2) the number of literal position bits (lp, [0, 4]); and
3) the number of position bits (pb, [0, 4]).
The properties are encoded using the following formula:
Properties = (pb * 5 + lp) * 9 + lc
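Applied in the encoding direction, the formula packs the three values into one byte. A minimal sketch (the function name `encode_lzma_properties` is hypothetical, not from any SDK) that also rejects values outside the ranges listed above; for the common 3/0/2 combination it gives (2*5+0)*9+3 = 93 = 0x5D, and the largest legal combination 8/4/4 gives 0xE0.

```c
#include <stdint.h>

/* Hypothetical helper: pack lc/lp/pb into the Properties byte using
 * Properties = (pb * 5 + lp) * 9 + lc.
 * Returns -1 if any value lies outside its documented range
 * (lc in [0, 8], lp in [0, 4], pb in [0, 4]). */
static int encode_lzma_properties(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;
    return (int)((pb * 5 + lp) * 9 + lc);
}
```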
The following C code illustrates a straightforward way to
decode the Properties field:
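    #include <stdint.h>
    /* Decode the Properties byte into lc, lp and pb.
     * Returns 0 on success, or -1 (LZMA_PROPERTIES_ERROR in the
     * original snippet) if the byte exceeds (4*5+4)*9+8 = 0xE0. */
    static int decode_lzma_properties(uint8_t prop, uint8_t *lc,
                                      uint8_t *lp, uint8_t *pb)
    {
        if (prop > (4 * 5 + 4) * 9 + 8)
            return -1;
        *pb = prop / (9 * 5);        /* pb = Properties / 45 */
        prop -= *pb * 9 * 5;         /* remainder = lp * 9 + lc */
        *lp = prop / 9;
        *lc = prop - *lp * 9;
        return 0;
    }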
XZ Utils has an additional requirement: lc + lp <= 4. Files
which don't follow this requirement cannot be decompressed
with XZ Utils. Usually this isn't a problem since the most
common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
combination that the files created by LZMA Utils can have,
but LZMA Utils can decompress files with any lc/lp/pb.
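The extra XZ Utils rule is a one-line test once lc, lp and pb are decoded. A minimal sketch; the name `xz_utils_can_decode` is hypothetical and this is not the XZ Utils source, only the rule stated above combined with the general ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if already-decoded lc/lp/pb values are
 * within the general .lzma ranges AND satisfy the additional
 * XZ Utils requirement lc + lp <= 4. */
static bool xz_utils_can_decode(uint8_t lc, uint8_t lp, uint8_t pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return false;      /* not a valid .lzma Properties value at all */
    return lc + lp <= 4;   /* the extra XZ Utils restriction */
}
```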
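Immediately after the Properties byte comes a 4-byte integer giving the size of the dictionary used for this compression. In a general lzma file the dictionary size can be any value, but grub4dos does not accept such arbitrary sizes: it only recognizes an lzma header whose dictionary size is a power of two (2^n). If the size is not a power of two, grub4dos does not treat the file as lzma.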
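The power-of-two rule can be checked with the usual bit trick. This is a sketch of the rule as described in this post, not code taken from grub4dos; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if dict_size is exactly 2^n for some n.
 * A power of two has a single set bit, so clearing the lowest set
 * bit with x & (x - 1) leaves zero. Zero itself is rejected. */
static bool dict_size_is_power_of_two(uint32_t dict_size)
{
    return dict_size != 0 && (dict_size & (dict_size - 1)) == 0;
}
```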
1.1.2. Dictionary Size
Dictionary Size is stored as an unsigned 32-bit little endian
integer. Any 32-bit value is possible, but for maximum
portability, only sizes of 2^n and 2^n + 2^(n-1) should be
used.
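Both recommended forms are easy to recognize: after shifting out the trailing zero bits, 2^n reduces to 1 and 2^n + 2^(n-1) (which equals 3 * 2^(n-1)) reduces to 3. A minimal sketch; `dict_size_is_portable` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is 2^n or 2^n + 2^(n-1),
 * the two forms recommended above for maximum portability. */
static bool dict_size_is_portable(uint32_t size)
{
    if (size == 0)
        return false;
    while ((size & 1u) == 0)   /* strip trailing zero bits */
        size >>= 1;
    return size == 1 || size == 3;
}
```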
LZMA Utils creates only files with dictionary size 2^n,
16 <= n <= 25. LZMA Utils can decompress files with any
dictionary size.
XZ Utils creates and decompresses .lzma files only with
dictionary sizes 2^n and 2^n + 2^(n-1). If some other
dictionary size is specified when compressing, the value
stored in the Dictionary Size field is rounded up, but the
specified value is still used in the actual compression code.
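The rounding rule can be illustrated by trying the candidates 1, 2, 3, 4, 6, 8, 12, 16, ... in ascending order and returning the first one that is at least the requested size. This is a hypothetical sketch of the rule, not the XZ Utils implementation; how XZ Utils handles requests above 3 GiB is not covered here, and this sketch simply returns 2^32 for them (hence the 64-bit return type).

```c
#include <stdint.h>

/* Hypothetical helper: smallest value of the form 2^n or
 * 2^n + 2^(n-1) that is >= size. Sizes above 3 GiB round up to
 * 2^32, which would not fit in the 32-bit Dictionary Size field. */
static uint64_t round_up_dict_size(uint32_t size)
{
    for (unsigned n = 0; n < 32; n++) {
        uint64_t p = (uint64_t)1 << n;   /* 2^n */
        if (p >= size)
            return p;
        if (p + p / 2 >= size)           /* 2^n + 2^(n-1) */
            return p + p / 2;
    }
    return (uint64_t)1 << 32;
}
```

For example, a requested size of 3,000,000 bytes would be stored as 3,145,728 (2^21 + 2^20).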
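After the dictionary size comes the uncompressed length of the file, in other words its length after decompression. This value occupies 8 bytes and is a long long integer.
In an ordinary lzma file the uncompressed-size field may be 0xFFFFFFFFFFFFFFFF (that is, -1), but grub4dos does not accept such files. In other words, for grub4dos to recognize a file as lzma, its uncompressed-size field must not be -1.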
1.1.3. Uncompressed Size
Uncompressed Size is stored as unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if and only if Uncompressed Size is unknown.
XZ Utils rejects files whose Uncompressed Size field specifies
a known size that is 256 GiB or more. This is to reject false
positives when trying to guess if the input file is in the
.lzma format. When Uncompressed Size is unknown, there is no
limit for the uncompressed size of the file.
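The XZ Utils acceptance rule for this field reduces to two cases, since 256 GiB is 2^38 bytes. A minimal sketch of the rule as stated; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LZMA_SIZE_UNKNOWN   UINT64_MAX            /* 0xFFFF_FFFF_FFFF_FFFF */
#define XZ_KNOWN_SIZE_LIMIT (UINT64_C(1) << 38)   /* 256 GiB = 2^38 bytes */

/* Hypothetical helper: an unknown size is always accepted (the stream
 * must then end with an End of Payload Marker); a known size must be
 * below 256 GiB. */
static bool xz_accepts_uncompressed_size(uint64_t size)
{
    if (size == LZMA_SIZE_UNKNOWN)
        return true;
    return size < XZ_KNOWN_SIZE_LIMIT;
}
```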
(*) Some tools use the term End of Stream (EOS) marker
instead of End of Payload Marker.
That covers the meaning of all 13 header bytes: one Properties byte, four bytes of dictionary size, and eight bytes of uncompressed size.
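Putting the three fields together, a header can be decoded from a 13-byte buffer. A minimal sketch; the struct and function names are hypothetical. The multi-byte fields are assembled byte by byte so the result is little endian regardless of the host's byte order.

```c
#include <stdint.h>

/* Hypothetical struct holding the three decoded header fields. */
struct lzma_header {
    uint8_t  properties;         /* byte 0 */
    uint32_t dict_size;          /* bytes 1..4, little endian */
    uint64_t uncompressed_size;  /* bytes 5..12, little endian */
};

/* Decode a 13-byte .lzma header into h. */
static void parse_lzma_header(const uint8_t buf[13], struct lzma_header *h)
{
    h->properties = buf[0];

    h->dict_size = 0;
    for (int i = 0; i < 4; i++)
        h->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    h->uncompressed_size = 0;
    for (int i = 0; i < 8; i++)
        h->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);
}
```

For example, the bytes 5D 00 00 80 00 10 32 54 76 00 00 00 00 decode to Properties 0x5D (lc/lp/pb = 3/0/2), an 8 MiB (0x00800000) dictionary and an uncompressed size of 0x76543210.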
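What follows is the compressed data itself, which needs no explanation here. When deciding whether a file is valid lzma, grub4dos does not check the structure of the compressed data; it only extracts the decompressed data from it when the file is actually read.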
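The three grub4dos header checks described in this post can be combined into one function that looks only at the 13 header bytes. This is a hypothetical sketch of those rules, not the actual grub4dos code; the function name is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: apply the three header checks this post
 * attributes to grub4dos. Only the header is examined; the
 * compressed data is not inspected. */
static bool looks_like_grub4dos_lzma(const uint8_t buf[13])
{
    /* 1. Properties byte must not exceed (4*5+4)*9+8 = 0xE0. */
    if (buf[0] > 0xE0)
        return false;

    /* 2. Dictionary size (little endian) must be a power of two. */
    uint32_t dict = 0;
    for (int i = 0; i < 4; i++)
        dict |= (uint32_t)buf[1 + i] << (8 * i);
    if (dict == 0 || (dict & (dict - 1)) != 0)
        return false;

    /* 3. Uncompressed size must not be 0xFFFFFFFFFFFFFFFF (-1):
     *    accept as soon as any of its 8 bytes is not 0xFF. */
    for (int i = 5; i < 13; i++)
        if (buf[i] != 0xFF)
            return true;
    return false;
}
```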
1.2. LZMA Compressed Data
Detailed description of the format of this field is out of
scope of this document.
2. References
LZMA SDK - The original LZMA implementation
http://7-zip.org/sdk.html
7-Zip
http://7-zip.org/
LZMA Utils - LZMA adapted to POSIX-like systems
http://tukaani.org/lzma/
XZ Utils - The next generation of LZMA Utils
http://tukaani.org/xz/
The .xz file format - The successor of the .lzma format
http://tukaani.org/xz/xz-file-format.txt