无忧启动论坛 (Wuyou Boot Forum)

[Share] grub4dos tail-appended batch files: a history

Posted 2015-1-25 10:29:39
Last edited by 不点 on 2015-1-25 10:38

lzma file format

Description of the lzma file format

1. File Format
        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
        |         Header          |   LZMA Compressed Data   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+


An lzma file consists of a header followed by the compressed data. The header is 13 bytes long.

        The .lzma format file consists of a 13-byte Header followed by
        the LZMA Compressed Data.

        Unlike the .gz, .bz2, and .xz formats, it is not possible to
        concatenate multiple .lzma files as is and expect the
        decompression tool to decode the resulting file as if it were
        a single .lzma file.

        For example, the command line tools from LZMA Utils and
        LZMA SDK silently ignore all the data after the first .lzma
        stream. In contrast, the command line tool from XZ Utils
        considers the .lzma file to be corrupt if there is data after
        the first .lzma stream.


1.1. Header
        +------------+----+----+----+----+--+--+--+--+--+--+--+--+
        | Properties |  Dictionary Size  |   Uncompressed Size   |
        +------------+----+----+----+----+--+--+--+--+--+--+--+--+


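In the header, the first byte is the properties byte. Its maximum value is (4*5+4)*9+8 = 224 = 0xE0; if the byte is larger than that, the file is not in lzma format.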

1.1.1. Properties

        The Properties field contains three properties. An abbreviation
        is given in parentheses, followed by the value range of the
        property. The field consists of

            1) the number of literal context bits (lc, [0, 8]);
            2) the number of literal position bits (lp, [0, 4]); and
            3) the number of position bits (pb, [0, 4]).

        The properties are encoded using the following formula:

            Properties = (pb * 5 + lp) * 9 + lc

        The following C code illustrates a straightforward way to
        decode the Properties field:

            uint8_t lc, lp, pb;
            uint8_t prop = get_lzma_properties();
            if (prop > (4 * 5 + 4) * 9 + 8)
                return LZMA_PROPERTIES_ERROR;

            pb = prop / (9 * 5);
            prop -= pb * 9 * 5;
            lp = prop / 9;
            lc = prop - lp * 9;

        XZ Utils has an additional requirement: lc + lp <= 4. Files
        which don't follow this requirement cannot be decompressed
        with XZ Utils. Usually this isn't a problem since the most
        common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
        combination that the files created by LZMA Utils can have,
        but LZMA Utils can decompress files with any lc/lp/pb.
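
The decoding fragment above can be wrapped into self-contained functions. This is only a sketch: the struct and the function names (props_decode, props_encode, props_ok_for_xz) are hypothetical and do not come from LZMA SDK, LZMA Utils, or XZ Utils. It shows the range check, the decode, the inverse formula, and the extra lc + lp <= 4 restriction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three values packed into the Properties byte. */
struct lzma_props {
    uint8_t lc; /* literal context bits, 0..8 */
    uint8_t lp; /* literal position bits, 0..4 */
    uint8_t pb; /* position bits, 0..4 */
};

/* Decode a Properties byte. Returns false if it exceeds (4*5+4)*9+8 = 224. */
bool props_decode(uint8_t prop, struct lzma_props *out)
{
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    out->pb = (uint8_t)(prop / (9 * 5));
    prop = (uint8_t)(prop - out->pb * 9 * 5);
    out->lp = (uint8_t)(prop / 9);
    out->lc = (uint8_t)(prop - out->lp * 9);
    return true;
}

/* Inverse of props_decode: (pb * 5 + lp) * 9 + lc, or -1 if out of range. */
int props_encode(const struct lzma_props *p)
{
    if (p->lc > 8 || p->lp > 4 || p->pb > 4)
        return -1;
    return (p->pb * 5 + p->lp) * 9 + p->lc;
}

/* The additional XZ Utils requirement quoted above: lc + lp <= 4. */
bool props_ok_for_xz(const struct lzma_props *p)
{
    return p->lc + p->lp <= 4;
}
```

For the common 3/0/2 combination the formula gives (2*5+0)*9+3 = 93 = 0x5D, so props_decode(0x5D, &p) should yield lc=3, lp=0, pb=2.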

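Immediately after the properties byte comes a 4-byte integer giving the dictionary size used for this compression. In the general lzma format the dictionary size can be any value, but grub4dos does not accept sizes that are too arbitrary. grub4dos recognizes an lzma header only when the dictionary size is a power of two (2^n); if it is not, grub4dos does not treat the file as lzma.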
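
The power-of-two rule just described can be sketched as follows. This is an illustration written from the description in this post, not code taken from the grub4dos source; the helper names (read_le32, is_power_of_two, dict_size_is_pow2) are made up for the example, and the field is read as little endian, which is how the lzma format stores it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read an unsigned 32-bit little-endian integer from 4 bytes. */
uint32_t read_le32(const uint8_t *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* True if v == 2^n for some n >= 0. A power of two has exactly one bit set,
 * so clearing the lowest set bit with v & (v - 1) must leave zero. */
bool is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* The dictionary size occupies header bytes 1..4 (byte 0 is Properties). */
bool dict_size_is_pow2(const uint8_t header[13])
{
    return is_power_of_two(read_le32(header + 1));
}
```

With this check, a header whose dictionary size bytes are 00 00 80 00 (8 MiB) passes, while 00 00 C0 00 (12 MiB, which is 2^23 + 2^22) is rejected.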

1.1.2. Dictionary Size

        Dictionary Size is stored as an unsigned 32-bit little endian
        integer. Any 32-bit value is possible, but for maximum
        portability, only sizes of 2^n and 2^n + 2^(n-1) should be
        used.

        LZMA Utils creates only files with dictionary size 2^n,
        16 <= n <= 25. LZMA Utils can decompress files with any
        dictionary size.

        XZ Utils creates and decompresses .lzma files only with
        dictionary sizes 2^n and 2^n + 2^(n-1). If some other
        dictionary size is specified when compressing, the value
        stored in the Dictionary Size field is rounded up, but the
        specified value is still used in the actual compression code.
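
To make the "2^n and 2^n + 2^(n-1)" rule concrete, here is a small sketch that tests whether a size has one of those two forms and rounds an arbitrary size up to the nearest such value. It is not XZ Utils' actual code; the function names are hypothetical, and returning UINT32_MAX when no candidate fits in 32 bits is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if size is 2^n or 2^n + 2^(n-1). After stripping trailing zero bits,
 * a power of two leaves 1, and 2^n + 2^(n-1) = 3 * 2^(n-1) leaves 3. */
bool dict_size_is_portable(uint32_t size)
{
    if (size == 0)
        return false;
    while ((size & 1) == 0)
        size >>= 1;
    return size == 1 || size == 3;
}

/* Smallest value of the form 2^n or 2^n + 2^(n-1) that is >= size.
 * Candidates are visited in increasing order: 1, 2, 3, 4, 6, 8, 12, ... */
uint32_t dict_size_round_up(uint32_t size)
{
    for (uint64_t p = 1; p <= UINT64_C(0x80000000); p <<= 1) {
        if (p >= size)
            return (uint32_t)p;
        uint64_t mid = p + (p >> 1); /* 2^n + 2^(n-1), meaningful for n >= 1 */
        if (p >= 2 && mid >= size)
            return (uint32_t)mid;
    }
    return UINT32_MAX; /* assumption: nothing of either form fits in 32 bits */
}
```

For example, a requested size of 5 rounds up to 6 (4 + 2), and 65537 rounds up to 98304 (65536 + 32768).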

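Right after the dictionary size comes the uncompressed length of the file, in other words the length after decompression. This value occupies 8 bytes and is a long long integer.
In an ordinary lzma file, the uncompressed size field may be 0xFFFFFFFFFFFFFFFF (that is, -1), but grub4dos does not accept such a file as lzma. In other words, for grub4dos to recognize a file as lzma, its uncompressed size field must not be -1.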


1.1.3. Uncompressed Size

        Uncompressed Size is stored as unsigned 64-bit little endian
        integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
        that Uncompressed Size is unknown. End of Payload Marker (*)
        is used if and only if Uncompressed Size is unknown.

        XZ Utils rejects files whose Uncompressed Size field specifies
        a known size that is 256 GiB or more. This is to reject false
        positives when trying to guess if the input file is in the
        .lzma format. When Uncompressed Size is unknown, there is no
        limit for the uncompressed size of the file.

        (*) Some tools use the term End of Stream (EOS) marker
            instead of End of Payload Marker.
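
The three cases described in this section (unknown size, plausible known size, and a known size of 256 GiB or more) can be told apart with a few lines of C. This is a sketch only: the enum and function names are invented for the example, and the 256 GiB threshold is written as 1 << 38 because 256 GiB = 2^8 * 2^30 bytes.

```c
#include <stdint.h>

/* Hypothetical classification of the Uncompressed Size field. */
enum size_kind {
    SIZE_KNOWN,       /* a concrete size below 256 GiB */
    SIZE_UNKNOWN,     /* 0xFFFF_FFFF_FFFF_FFFF: End of Payload Marker is used */
    SIZE_IMPLAUSIBLE  /* known size >= 256 GiB, rejected by XZ Utils */
};

/* Read an unsigned 64-bit little-endian integer from 8 bytes. */
uint64_t read_le64(const uint8_t *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | b[i];
    return v;
}

/* The Uncompressed Size field occupies header bytes 5..12. */
enum size_kind classify_uncompressed_size(const uint8_t header[13])
{
    uint64_t size = read_le64(header + 5);

    if (size == UINT64_MAX)
        return SIZE_UNKNOWN;
    if (size >= (UINT64_C(1) << 38))
        return SIZE_IMPLAUSIBLE;
    return SIZE_KNOWN;
}
```

Under the grub4dos rule described in this post, SIZE_UNKNOWN would also count as "not lzma", whereas XZ Utils accepts it and relies on the End of Payload Marker.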

That covers the meaning of all 13 bytes of the header: one properties byte, four bytes of dictionary size, and eight bytes of uncompressed size.

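Next comes the compressed data, which needs no further explanation. As far as judging the validity of the lzma format is concerned, grub4dos does not check the structure of the compressed data. Only when it actually starts reading the file does it extract the decompressed data from the compressed data.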
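
Putting the three header rules from this post together, a header-only probe might look like the sketch below. It is reconstructed from the description here and is not the grub4dos source; looks_like_lzma and LZMA_HEADER_SIZE are names chosen for the example. As described above, the compressed data itself is not inspected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_HEADER_SIZE 13

/* Header-only check following the three rules described in this post:
 *   1. Properties byte <= (4*5+4)*9+8 = 224
 *   2. Dictionary Size (bytes 1..4, little endian) is a power of two
 *   3. Uncompressed Size (bytes 5..12) is not 0xFFFFFFFFFFFFFFFF */
bool looks_like_lzma(const uint8_t *buf, size_t len)
{
    if (len < LZMA_HEADER_SIZE)
        return false;

    /* Rule 1: properties byte in range. */
    if (buf[0] > (4 * 5 + 4) * 9 + 8)
        return false;

    /* Rule 2: dictionary size must be 2^n. */
    uint32_t dict = (uint32_t)buf[1]
                  | ((uint32_t)buf[2] << 8)
                  | ((uint32_t)buf[3] << 16)
                  | ((uint32_t)buf[4] << 24);
    if (dict == 0 || (dict & (dict - 1)) != 0)
        return false;

    /* Rule 3: uncompressed size must be known, i.e. not all 0xFF bytes. */
    bool all_ff = true;
    for (size_t i = 5; i < LZMA_HEADER_SIZE; ++i) {
        if (buf[i] != 0xFF) {
            all_ff = false;
            break;
        }
    }
    return !all_ff;
}
```

A check like this is cheap because it only touches the first 13 bytes, but it is a heuristic: a non-lzma file can still pass it by coincidence.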

1.2. LZMA Compressed Data

        Detailed description of the format of this field is out of
        scope of this document.


2. References

        LZMA SDK - The original LZMA implementation
        http://7-zip.org/sdk.html

        7-Zip
        http://7-zip.org/

        LZMA Utils - LZMA adapted to POSIX-like systems
        http://tukaani.org/lzma/

        XZ Utils - The next generation of LZMA Utils
        http://tukaani.org/xz/

        The .xz file format - The successor of the .lzma format
        http://tukaani.org/xz/xz-file-format.txt


