std/deflate/README.md - external/github.com/google/wuffs - Git at Google

 # Deflate

 Deflate is a general purpose compression format, using Huffman codes and
 Lempel-Ziv style length-distance back-references. It is specified in [RFC
 1951](https://www.ietf.org/rfc/rfc1951.txt).

 Gzip ([RFC 1952](https://www.ietf.org/rfc/rfc1952.txt)) and zlib ([RFC
 1950](https://www.ietf.org/rfc/rfc1950.txt)) are two thin wrappers (with
 similar purpose, but different wire formats) around the raw deflate encoding.
 Gzip (the file format) is used by the `gzip` and `tar` command line tools and
 by the HTTP network protocol. Zlib is used by the ELF executable and PNG image
 file formats.

 [Zip](https://support.pkware.com/display/PKZIP/APPNOTE) (also known as PKZIP)
 is another wrapper, similar to combining gzip with tar, that can compress
 multiple files into a single archive. Zip is widely used by the ECMA Office
 Open XML format, the OASIS Open Document Format for Office Applications and the
 Java JAR format.

 Wrangling those formats that build on deflate (gzip, zip and zlib) is not
 provided by this package. For zlib, look at the `std/zlib` package instead. The
 other formats are TODO.

 For example, look at `test/data/romeo.txt*`. First, the uncompressed text:

     $ xxd test/data/romeo.txt
     00000000: 526f 6d65 6f20 616e 6420 4a75 6c69 6574  Romeo and Juliet
     00000010: 0a45 7863 6572 7074 2066 726f 6d20 4163  .Excerpt from Ac
     etc
     00000390: 740a 536f 2073 7475 6d62 6c65 7374 206f  t.So stumblest o
     000003a0: 6e20 6d79 2063 6f75 6e73 656c 3f0a       n my counsel?.

 The raw deflate encoding:

     $ xxd test/data/romeo.txt.deflate
     00000000: 4d53 c16e db30 0cbd f32b d853 2e46 0e3d  MS.n.0...+.S.F.=
     00000010: 2e87 20ed 0234 c5ba 0049 861e 861d 149b  .. ..4...I......
     etc
     00000200: 7d13 8ffd b9a3 24bb 68f4 eb30 7a59 610d  }.....$.h..0zYa.
     00000210: 7f01                                     ..

 The gzip format wraps a variable length header (in this case, 20 bytes) and 8
 byte footer around the raw deflate data. The header contains the NUL-terminated
 C string name of the original, uncompressed file, "romeo.txt", amongst other
 data. The footer contains a 4 byte CRC32 checksum and a 4 byte length of the
 uncompressed file (0x3ae = 942 bytes).

     $ xxd test/data/romeo.txt.gz
     00000000: 1f8b 0808 26d8 5d59 0003 726f 6d65 6f2e  ....&.]Y..romeo.
     00000010: 7478 7400 4d53 c16e db30 0cbd f32b d853  txt.MS.n.0...+.S
     etc
     00000210: de5d 2c67 7d13 8ffd b9a3 24bb 68f4 eb30  .],g}.....$.h..0
     00000220: 7a59 610d 7f01 ef07 e5ab ae03 0000       zYa...........

 The zlib format wraps a 2 byte header and 4 byte footer around the raw deflate
 data. The footer contains a 4 byte Adler32 checksum. TODO: move this to
 std/zlib/README.md.

     $ xxd test/data/romeo.txt.zlib
     00000000: 789c 4d53 c16e db30 0cbd f32b d853 2e46  x.MS.n.0...+.S.F
     00000010: 0e3d 2e87 20ed 0234 c5ba 0049 861e 861d  .=.. ..4...I....
     etc
     00000200: 2c67 7d13 8ffd b9a3 24bb 68f4 eb30 7a59  ,g}.....$.h..0zY
     00000210: 610d 7f01 57bb 3ede                      a...W.>.


 # Wire Format Worked Example

 Consider `test/data/romeo.txt.deflate`. The relevant spec is RFC 1951.

     offset  xoffset ASCII   hex     binary
     000000  0x0000  M       0x4D    0b_0100_1101
     000001  0x0001  S       0x53    0b_0101_0011
     000002  0x0002  .       0xC1    0b_1100_0001
     000003  0x0003  n       0x6E    0b_0110_1110
     000004  0x0004  .       0xDB    0b_1101_1011
     000005  0x0005  0       0x30    0b_0011_0000
     000006  0x0006  .       0x0C    0b_0000_1100
     000007  0x0007  .       0xBD    0b_1011_1101
     000008  0x0008  .       0xF3    0b_1111_0011
     000009  0x0009  +       0x2B    0b_0010_1011
     000010  0x000A  .       0xD8    0b_1101_1000
     000011  0x000B  S       0x53    0b_0101_0011
     000012  0x000C  .       0x2E    0b_0010_1110
     000013  0x000D  F       0x46    0b_0100_0110
     000014  0x000E  .       0x0E    0b_0000_1110
     000015  0x000F  =       0x3D    0b_0011_1101
     000016  0x0010  .       0x2E    0b_0010_1110
     000017  0x0011  .       0x87    0b_1000_0111
     000018  0x0012          0x20    0b_0010_0000
     000019  0x0013  .       0xED    0b_1110_1101
     etc
     000522  0x020A  .       0xEB    0b_1110_1011
     000523  0x020B  0       0x30    0b_0011_0000
     000524  0x020C  z       0x7A    0b_0111_1010
     000525  0x020D  Y       0x59    0b_0101_1001
     000526  0x020E  a       0x61    0b_0110_0001
     000527  0x020F  .       0x0D    0b_0000_1101
     000528  0x0210  .       0x7F    0b_0111_1111
     000529  0x0211  .       0x01    0b_0000_0001

 Starting at the top:

     offset  xoffset ASCII   hex     binary
     000000  0x0000  M       0x4D    0b_0100_1101

 As per the RFC section 3.2.3. Details of block format:

   - BFINAL is the first (LSB) bit, 0b1. This block is the final block.
   - BTYPE is the next two bits in natural (MSB to LSB) order, 0b10, which means
     a block that's compressed with dynamic Huffman codes.

 There are 3 block types: uncompressed, compressed with fixed Huffman codes and
 compressed with dynamic Huffman codes. The first type is straightforward:
 containing a uint16 length `N` (and its complement, for error detection) and
 then `N` literal bytes. The second type is the same as the third type but with
 built-in `lcode` and `dcode` tables (see below). The third type is the
 interesting part of the deflate format, and its deconstruction continues below.


 ## Dynamic Huffman Blocks

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000000  0x0000  M       0x4D    0b_0100_1...
     000001  0x0001  S       0x53    0b_0101_0011
     000002  0x0002  .       0xC1    0b_1100_0001

 As per the RFC section 3.2.7. Compression with dynamic Huffman codes, three
 variables (5, 5 and 4 bits) are read in the same natural (MSB to LSB) order:

   - HLIT  = 0b01001 =  9
   - HDIST = 0b10011 = 19
   - HCLEN =  0b1010 = 10

 Adding the implicit biases give:

     nlit  =  9 + 257 = 266
     ndist = 19 +   1 =  20
     nclen = 10 +   4 =  14


 ## Code Length Code Lengths

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000002  0x0002  .       0xC1    0b_1100_000.
     000003  0x0003  n       0x6E    0b_0110_1110
     000004  0x0004  .       0xDB    0b_1101_1011
     000005  0x0005  0       0x30    0b_0011_0000
     000006  0x0006  .       0x0C    0b_0000_1100
     000007  0x0007  .       0xBD    0b_1011_1101

 There are 19 possible code length code lengths (this is not a typo), but the
 wire format order is shuffled (in the idiosyncratic order: 16, 17, 18, 0, 8,
 ..., 15) so that later elements are more likely to be zero (and hence
 compressible). There are `nclen` (in this case, 14) explicit code length code
 lengths, 3 bits each.

     clcode_lengths[16] = 0b000 = 0
     clcode_lengths[17] = 0b100 = 4
     clcode_lengths[18] = 0b101 = 5
     clcode_lengths[ 0] = 0b011 = 3
     clcode_lengths[ 8] = 0b011 = 3
     clcode_lengths[ 7] = 0b011 = 3
     clcode_lengths[ 9] = 0b011 = 3
     clcode_lengths[ 6] = 0b011 = 3
     clcode_lengths[10] = 0b000 = 0
     clcode_lengths[ 5] = 0b011 = 3
     clcode_lengths[11] = 0b000 = 0
     clcode_lengths[ 4] = 0b011 = 3
     clcode_lengths[12] = 0b000 = 0
     clcode_lengths[ 3] = 0b101 = 5

 The remaining (19 - `nclen`) = 5 entries are implicitly zero:

     clcode_lengths[13] = 0
     clcode_lengths[ 2] = 0
     clcode_lengths[14] = 0
     clcode_lengths[ 1] = 0
     clcode_lengths[15] = 0

 Undoing the shuffle gives:

     clcode_lengths[ 0] = 3
     clcode_lengths[ 1] = 0
     clcode_lengths[ 2] = 0
     clcode_lengths[ 3] = 5
     clcode_lengths[ 4] = 3
     clcode_lengths[ 5] = 3
     clcode_lengths[ 6] = 3
     clcode_lengths[ 7] = 3
     clcode_lengths[ 8] = 3
     clcode_lengths[ 9] = 3
     clcode_lengths[10] = 0
     clcode_lengths[11] = 0
     clcode_lengths[12] = 0
     clcode_lengths[13] = 0
     clcode_lengths[14] = 0
     clcode_lengths[15] = 0
     clcode_lengths[16] = 0
     clcode_lengths[17] = 4
     clcode_lengths[18] = 5

 For `clcode`s, there are 7 + 1 + 2 = 10 non-zero entries: 7 3-bit codes, 1
 4-bit code and 2 5-bit codes. The canonical Huffman encoding's map from bits to
 `clcode`s is:

     bits        clcode
     0b000        0
     0b001        4
     0b010        5
     0b011        6
     0b100        7
     0b101        8
     0b110        9
     0b1110      17
     0b11110      3
     0b11111     18

 Call that Huffman table `H-CL`.


 ## Lcodes and Dcodes

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000007  0x0007  .       0xBD    0b_1011_1...
     000008  0x0008  .       0xF3    0b_1111_0011
     000009  0x0009  +       0x2B    0b_0010_1011
     000010  0x000A  .       0xD8    0b_1101_1000
     000011  0x000B  S       0x53    0b_0101_0011
     000012  0x000C  .       0x2E    0b_0010_1110
     000013  0x000D  F       0x46    0b_0100_0110

 When decoding via a Huffman table, bits are read in LSB to MSB order,
 right-to-left in this debugging output. Extra bits, if any, are then read in
 the other, natural order (MSB to LSB). Decoding via `H-CL` gives:

     bits        clcode
     0b1110      17, 3 extra bits = 0b111 = 7
     0b001        4
     0b11111     18, 7 extra bits = 0b0001010 = 10
     0b001        4
     0b101        8
     0b1110      17, 3 extra bits = 0b010 = 2
     0b100        7
     0b1110      17, 3 extra bits = 0b001 = 1
     0b011        6
     0b000        0

 Still in the RFC section 3.2.7., this means:

     lcode_lengths has 3 + 7 = 10 consecutive zeroes
     lcode_lengths[ 10] = 4
     lcode_lengths has 11 + 10 = 21 consecutive zeroes
     lcode_lengths[ 32] = 4
     lcode_lengths[ 33] = 8
     lcode_lengths has 3 + 2 = 5 consecutive zeroes
     lcode_lengths[ 39] = 7
     lcode_lengths has 3 + 1 = 4 consecutive zeroes
     lcode_lengths[ 44] = 6
     lcode_lengths[ 45] = 0

 After decoding the first 10 code lengths, the bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000013  0x000D  F       0x46    0b_01.._....

 Continuing to read all `nlit` (266) `lcode` lengths and `ndist` (20) `dcode`
 lengths (zeroes are not shown here):

     lcode_lengths[ 10] = 4
     lcode_lengths[ 32] = 4
     lcode_lengths[ 33] = 8
     lcode_lengths[ 39] = 7
     lcode_lengths[ 44] = 6
     lcode_lengths[ 46] = 7
     lcode_lengths[ 50] = 8
     lcode_lengths[ 58] = 9
     lcode_lengths[ 59] = 7
     lcode_lengths[ 63] = 7
     etc
     lcode_lengths[264] = 9
     lcode_lengths[265] = 9

 and

     dcode_lengths[ 5] = 6
     dcode_lengths[ 6] = 5
     dcode_lengths[ 8] = 4
     dcode_lengths[ 9] = 5
     dcode_lengths[10] = 3
     etc
     dcode_lengths[18] = 3
     dcode_lengths[19] = 6

 The `H-CL` table is no longer used from here onwards.

 For `lcode`s, there are 6 + 9 + 10 + 16 + 10 + 12 = 63 non-zero entries: 6
 4-bit codes, 9 5-bit codes, 10 6-bit codes, 16 7-bit codes, 10 8-bit codes and
 12 9-bit codes. The canonical Huffman encoding's map from bits to `lcode`
 values is:

     bits        lcode
     0b0000       10
     0b0001       32
     0b0010      101
     0b0011      116
     0b0100      257
     0b0101      259
     0b01100      97
     0b01101     104
     0b01110     105
     0b01111     110
     0b10000     111
     0b10001     114
     0b10010     115
     0b10011     258
     0b10100     260
     0b101010     44
     0b101011     99
     0b101100    100
     0b101101    102
     0b101110    108
     0b101111    109
     0b110000    112
     0b110001    117
     0b110010    119
     0b110011    121
     0b1101000    39
     0b1101001    46
     0b1101010    59
     0b1101011    63
     0b1101100    65
     0b1101101    69
     0b1101110    73
     0b1101111    79
     0b1110000    82
     0b1110001    83
     0b1110010    84
     0b1110011    98
     0b1110100   103
     0b1110101   261
     0b1110110   262
     0b1110111   263
     0b11110000   33
     0b11110001   50
     0b11110010   66
     0b11110011   67
     0b11110100   72
     0b11110101   74
     0b11110110   77
     0b11110111   87
     0b11111000  107
     0b11111001  118
     0b111110100  58
     0b111110101  68
     0b111110110  76
     0b111110111  78
     0b111111000  85
     0b111111001  91
     0b111111010  93
     0b111111011 120
     0b111111100 122
     0b111111101 256
     0b111111110 264
     0b111111111 265

 Call that Huffman table `H-L`.

 For `dcode`s, there are 5 + 4 + 3 + 2 = 14 non-zero entries: 5 3-bit codes, 4
 4-bit codes, 3 5-bit codes and 2 6-bit codes. The canonical Huffman encoding's
 map from bits to `dcode` values is:

     bits        dcode
     0b000       10
     0b001       14
     0b010       16
     0b011       17
     0b100       18
     0b1010       8
     0b1011      12
     0b1100      13
     0b1101      15
     0b11100      6
     0b11101      9
     0b11110     11
     0b111110     5
     0b111111    19

 Call that Huffman table `H-D`.


 ## Decoding Literals

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000052  0x0034  .       0xE7    0b_1..._....
     000053  0x0035  C       0x43    0b_0100_0011
     000054  0x0036  .       0xE8    0b_1110_1000
     000055  0x0037  )       0x29    0b_0010_1001
     000056  0x0038  .       0xA0    0b_1010_0000
     000057  0x0039  .       0xF1    0b_1111_0001

 Decoding from that bit stream via `H-L` gives some literal bytes (where the
 `lcode` value is < 256):

     bits        lcode
     0b1110000    82 (ASCII 'R')
     0b10000     111 (ASCII 'o')
     0b101111    109 (ASCII 'm')
     0b0010      101 (ASCII 'e')
     0b10000     111 (ASCII 'o')
     0b0001       32 (ASCII ' ')
     0b01100      97 (ASCII 'a')
     0b01111     110 (ASCII 'n')


 ## Decoding Back-References

 This continues, up until we get to the "O Romeo, Romeo! wherefore art thou
 Romeo?" line.

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000089  0x0059  .       0xC1    0b_11.._....
     000090  0x005A  .       0x1E    0b_0001_1110
     000091  0x005B  .       0x0F    0b_0000_1111
     000092  0x005C  .       0xF9    0b_1111_1001

 Decoding from that bit stream via `H-L` gives:

     bits        lcode
     0b1101111    79 (ASCII 'O')
     0b0001       32 (ASCII ' ')
     0b1110000    82 (ASCII 'R')
     0b10011     258

 This 258 is our first non-literal `lcode`, the start of a length-distance
 back-reference. We first need to decode the length. The RFC section 3.2.5.
 Compressed blocks (length and distance codes) says that an `lcode` of 258 means
 that length=4 with no extra bits.

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000092  0x005C  .       0xF9    0b_111._....
     000093  0x005D  Y       0x59    0b_0101_1001

 We next decode via `H-D` instead of `H-L`.

     bits        dcode
     0b11110     11

 Again, from section 3.2.5., a `dcode` of 11 means a distance in the range [49,
 64], 49 plus 4 extra bits. The next 4 bits are 0b0110=6, so the distance is 55.
 We therefore copy length=4 bytes from distance=55 bytes ago: "omeo".

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000093  0x005D  Y       0x59    0b_01.._....
     000094  0x005E  U       0x55    0b_0101_0101
     000095  0x005F  >       0x3E    0b_0011_1110

 We go back to decoding via `H-L`, which gives

     bits        lcode
     0b101010     44 (ASCII ',')
     0b10100     260

 This is another non-literal. Section 3.2.5. says that an `lcode` of 260 means
 length=6 with no extra bits.

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000095  0x005F  >       0x3E    0b_0011_111.

 Decode with `H-D`.

     bits        dcode
     0b111110     5

 Again, from section 3.2.5., a `dcode` of 5 means a distance in the range [7,
 8], 7 plus 1 extra bit. The next 1 bits are 0b0=0, so the distance is 7. We
 therefore copy length=6 bytes from distance=7 bytes ago: " Romeo".

 The bit stream is now at:

     offset  xoffset ASCII   hex     binary
     000096  0x0060  .       0x0F    0b_0000_1111

 We go back to decoding via `H-L`, which gives

     bits        lcode
     0b11110000   33 (ASCII '!')

 And on we go.


 ## End of Block

 The block finishes with the bit stream at:

     offset  xoffset ASCII   hex     binary
     000522  0x020A  .       0xEB    0b_1110_101.
     000523  0x020B  0       0x30    0b_0011_0000
     000524  0x020C  z       0x7A    0b_0111_1010
     000525  0x020D  Y       0x59    0b_0101_1001
     000526  0x020E  a       0x61    0b_0110_0001
     000527  0x020F  .       0x0D    0b_0000_1101
     000528  0x0210  .       0x7F    0b_0111_1111
     000529  0x0211  .       0x01    0b_0000_0001

 The decoding is:

     bits        lcode
     0b101011     99 (ASCII 'c')
     0b10000     111 (ASCII 'o')
     0b110001    117 (ASCII 'u')
     0b01111     110 (ASCII 'n')
     0b0100      257 (length=3 + 0_extra_bits...)

     bits        dcode
     0b1101      15  (distance=193 + 6_extra_bits, 0b000010 in MSB to LSB order;
                      length=3, distance=195: copy "sel")

     bits        lcode
     0b1101011    63 (ASCII '?')
     0b0000       10 (ASCII '\n')
     0b111111101 256 (end of block)

 In other words, the block ends with "counsel?\n".


 # More Wire Format Examples

 See `test/data/artificial/deflate-*.commentary.txt`
	# Deflate

	Deflate is a general purpose compression format, using Huffman codes and
	Lempel-Ziv style length-distance back-references. It is specified in [RFC
	1951](https://www.ietf.org/rfc/rfc1951.txt).

	Gzip ([RFC 1952](https://www.ietf.org/rfc/rfc1952.txt)) and zlib ([RFC
	1950](https://www.ietf.org/rfc/rfc1950.txt)) are two thin wrappers (with
	similar purpose, but different wire formats) around the raw deflate encoding.
	Gzip (the file format) is used by the `gzip` and `tar` command line tools and
	by the HTTP network protocol. Zlib is used by the ELF executable and PNG image
	file formats.

	[Zip](https://support.pkware.com/display/PKZIP/APPNOTE) (also known as PKZIP)
	is another wrapper, similar to combining gzip with tar, that can compress
	multiple files into a single archive. Zip is widely used by the ECMA Office
	Open XML format, the OASIS Open Document Format for Office Applications and the
	Java JAR format.

	Wrangling those formats that build on deflate (gzip, zip and zlib) is not
	provided by this package. For zlib, look at the `std/zlib` package instead. The
	other formats are TODO.

	For example, look at `test/data/romeo.txt*`. First, the uncompressed text:

	$ xxd test/data/romeo.txt
	00000000: 526f 6d65 6f20 616e 6420 4a75 6c69 6574 Romeo and Juliet
	00000010: 0a45 7863 6572 7074 2066 726f 6d20 4163 .Excerpt from Ac
	etc
	00000390: 740a 536f 2073 7475 6d62 6c65 7374 206f t.So stumblest o
	000003a0: 6e20 6d79 2063 6f75 6e73 656c 3f0a n my counsel?.

	The raw deflate encoding:

	$ xxd test/data/romeo.txt.deflate
	00000000: 4d53 c16e db30 0cbd f32b d853 2e46 0e3d MS.n.0...+.S.F.=
	00000010: 2e87 20ed 0234 c5ba 0049 861e 861d 149b .. ..4...I......
	etc
	00000200: 7d13 8ffd b9a3 24bb 68f4 eb30 7a59 610d }.....$.h..0zYa.
	00000210: 7f01 ..

	The gzip format wraps a variable length header (in this case, 20 bytes) and 8
	byte footer around the raw deflate data. The header contains the NUL-terminated
	C string name of the original, uncompressed file, "romeo.txt", amongst other
	data. The footer contains a 4 byte CRC32 checksum and a 4 byte length of the
	uncompressed file (0x3ae = 942 bytes).

	$ xxd test/data/romeo.txt.gz
	00000000: 1f8b 0808 26d8 5d59 0003 726f 6d65 6f2e ....&.]Y..romeo.
	00000010: 7478 7400 4d53 c16e db30 0cbd f32b d853 txt.MS.n.0...+.S
	etc
	00000210: de5d 2c67 7d13 8ffd b9a3 24bb 68f4 eb30 .],g}.....$.h..0
	00000220: 7a59 610d 7f01 ef07 e5ab ae03 0000 zYa...........

	The zlib format wraps a 2 byte header and 4 byte footer around the raw deflate
	data. The footer contains a 4 byte Adler32 checksum. TODO: move this to
	std/zlib/README.md.

	$ xxd test/data/romeo.txt.zlib
	00000000: 789c 4d53 c16e db30 0cbd f32b d853 2e46 x.MS.n.0...+.S.F
	00000010: 0e3d 2e87 20ed 0234 c5ba 0049 861e 861d .=.. ..4...I....
	etc
	00000200: 2c67 7d13 8ffd b9a3 24bb 68f4 eb30 7a59 ,g}.....$.h..0zY
	00000210: 610d 7f01 57bb 3ede a...W.>.


	# Wire Format Worked Example

	Consider `test/data/romeo.txt.deflate`. The relevant spec is RFC 1951.

	offset xoffset ASCII hex binary
	000000 0x0000 M 0x4D 0b_0100_1101
	000001 0x0001 S 0x53 0b_0101_0011
	000002 0x0002 . 0xC1 0b_1100_0001
	000003 0x0003 n 0x6E 0b_0110_1110
	000004 0x0004 . 0xDB 0b_1101_1011
	000005 0x0005 0 0x30 0b_0011_0000
	000006 0x0006 . 0x0C 0b_0000_1100
	000007 0x0007 . 0xBD 0b_1011_1101
	000008 0x0008 . 0xF3 0b_1111_0011
	000009 0x0009 + 0x2B 0b_0010_1011
	000010 0x000A . 0xD8 0b_1101_1000
	000011 0x000B S 0x53 0b_0101_0011
	000012 0x000C . 0x2E 0b_0010_1110
	000013 0x000D F 0x46 0b_0100_0110
	000014 0x000E . 0x0E 0b_0000_1110
	000015 0x000F = 0x3D 0b_0011_1101
	000016 0x0010 . 0x2E 0b_0010_1110
	000017 0x0011 . 0x87 0b_1000_0111
	000018 0x0012 0x20 0b_0010_0000
	000019 0x0013 . 0xED 0b_1110_1101
	etc
	000522 0x020A . 0xEB 0b_1110_1011
	000523 0x020B 0 0x30 0b_0011_0000
	000524 0x020C z 0x7A 0b_0111_1010
	000525 0x020D Y 0x59 0b_0101_1001
	000526 0x020E a 0x61 0b_0110_0001
	000527 0x020F . 0x0D 0b_0000_1101
	000528 0x0210 . 0x7F 0b_0111_1111
	000529 0x0211 . 0x01 0b_0000_0001

	Starting at the top:

	offset xoffset ASCII hex binary
	000000 0x0000 M 0x4D 0b_0100_1101

	As per the RFC section 3.2.3. Details of block format:

	- BFINAL is the first (LSB) bit, 0b1. This block is the final block.
	- BTYPE is the next two bits in natural (MSB to LSB) order, 0b10, which means
	a block that's compressed with dynamic Huffman codes.

	There are 3 block types: uncompressed, compressed with fixed Huffman codes and
	compressed with dynamic Huffman codes. The first type is straightforward:
	containing a uint16 length `N` (and its complement, for error detection) and
	then `N` literal bytes. The second type is the same as the third type but with
	built-in `lcode` and `dcode` tables (see below). The third type is the
	interesting part of the deflate format, and its deconstruction continues below.


	## Dynamic Huffman Blocks

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000000 0x0000 M 0x4D 0b_0100_1...
	000001 0x0001 S 0x53 0b_0101_0011
	000002 0x0002 . 0xC1 0b_1100_0001

	As per the RFC section 3.2.7. Compression with dynamic Huffman codes, three
	variables (5, 5 and 4 bits) are read in the same natural (MSB to LSB) order:

	- HLIT = 0b01001 = 9
	- HDIST = 0b10011 = 19
	- HCLEN = 0b1010 = 10

	Adding the implicit biases give:

	nlit = 9 + 257 = 266
	ndist = 19 + 1 = 20
	nclen = 10 + 4 = 14


	## Code Length Code Lengths

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000002 0x0002 . 0xC1 0b_1100_000.
	000003 0x0003 n 0x6E 0b_0110_1110
	000004 0x0004 . 0xDB 0b_1101_1011
	000005 0x0005 0 0x30 0b_0011_0000
	000006 0x0006 . 0x0C 0b_0000_1100
	000007 0x0007 . 0xBD 0b_1011_1101

	There are 19 possible code length code lengths (this is not a typo), but the
	wire format order is shuffled (in the idiosyncratic order: 16, 17, 18, 0, 8,
	..., 15) so that later elements are more likely to be zero (and hence
	compressible). There are `nclen` (in this case, 14) explicit code length code
	lengths, 3 bits each.

	clcode_lengths[16] = 0b000 = 0
	clcode_lengths[17] = 0b100 = 4
	clcode_lengths[18] = 0b101 = 5
	clcode_lengths[ 0] = 0b011 = 3
	clcode_lengths[ 8] = 0b011 = 3
	clcode_lengths[ 7] = 0b011 = 3
	clcode_lengths[ 9] = 0b011 = 3
	clcode_lengths[ 6] = 0b011 = 3
	clcode_lengths[10] = 0b000 = 0
	clcode_lengths[ 5] = 0b011 = 3
	clcode_lengths[11] = 0b000 = 0
	clcode_lengths[ 4] = 0b011 = 3
	clcode_lengths[12] = 0b000 = 0
	clcode_lengths[ 3] = 0b101 = 5

	The remaining (19 - `nclen`) = 5 entries are implicitly zero:

	clcode_lengths[13] = 0
	clcode_lengths[ 2] = 0
	clcode_lengths[14] = 0
	clcode_lengths[ 1] = 0
	clcode_lengths[15] = 0

	Undoing the shuffle gives:

	clcode_lengths[ 0] = 3
	clcode_lengths[ 1] = 0
	clcode_lengths[ 2] = 0
	clcode_lengths[ 3] = 5
	clcode_lengths[ 4] = 3
	clcode_lengths[ 5] = 3
	clcode_lengths[ 6] = 3
	clcode_lengths[ 7] = 3
	clcode_lengths[ 8] = 3
	clcode_lengths[ 9] = 3
	clcode_lengths[10] = 0
	clcode_lengths[11] = 0
	clcode_lengths[12] = 0
	clcode_lengths[13] = 0
	clcode_lengths[14] = 0
	clcode_lengths[15] = 0
	clcode_lengths[16] = 0
	clcode_lengths[17] = 4
	clcode_lengths[18] = 5

	For `clcode`s, there are 7 + 1 + 2 = 10 non-zero entries: 7 3-bit codes, 1
	4-bit code and 2 5-bit codes. The canonical Huffman encoding's map from bits to
	`clcode`s is:

	bits clcode
	0b000 0
	0b001 4
	0b010 5
	0b011 6
	0b100 7
	0b101 8
	0b110 9
	0b1110 17
	0b11110 3
	0b11111 18

	Call that Huffman table `H-CL`.


	## Lcodes and Dcodes

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000007 0x0007 . 0xBD 0b_1011_1...
	000008 0x0008 . 0xF3 0b_1111_0011
	000009 0x0009 + 0x2B 0b_0010_1011
	000010 0x000A . 0xD8 0b_1101_1000
	000011 0x000B S 0x53 0b_0101_0011
	000012 0x000C . 0x2E 0b_0010_1110
	000013 0x000D F 0x46 0b_0100_0110

	When decoding via a Huffman table, bits are read in LSB to MSB order,
	right-to-left in this debugging output. Extra bits, if any, are then read in
	the other, natural order (MSB to LSB). Decoding via `H-CL` gives:

	bits clcode
	0b1110 17, 3 extra bits = 0b111 = 7
	0b001 4
	0b11111 18, 7 extra bits = 0b0001010 = 10
	0b001 4
	0b101 8
	0b1110 17, 3 extra bits = 0b010 = 2
	0b100 7
	0b1110 17, 3 extra bits = 0b001 = 1
	0b011 6
	0b000 0

	Still in the RFC section 3.2.7., this means:

	lcode_lengths has 3 + 7 = 10 consecutive zeroes
	lcode_lengths[ 10] = 4
	lcode_lengths has 11 + 10 = 21 consecutive zeroes
	lcode_lengths[ 32] = 4
	lcode_lengths[ 33] = 8
	lcode_lengths has 3 + 2 = 5 consecutive zeroes
	lcode_lengths[ 39] = 7
	lcode_lengths has 3 + 1 = 4 consecutive zeroes
	lcode_lengths[ 44] = 6
	lcode_lengths[ 45] = 0

	After decoding the first 10 code lengths, the bit stream is now at:

	offset xoffset ASCII hex binary
	000013 0x000D F 0x46 0b_01.._....

	Continuing to read all `nlit` (266) `lcode` lengths and `ndist` (20) `dcode`
	lengths (zeroes are not shown here):

	lcode_lengths[ 10] = 4
	lcode_lengths[ 32] = 4
	lcode_lengths[ 33] = 8
	lcode_lengths[ 39] = 7
	lcode_lengths[ 44] = 6
	lcode_lengths[ 46] = 7
	lcode_lengths[ 50] = 8
	lcode_lengths[ 58] = 9
	lcode_lengths[ 59] = 7
	lcode_lengths[ 63] = 7
	etc
	lcode_lengths[264] = 9
	lcode_lengths[265] = 9

	and

	dcode_lengths[ 5] = 6
	dcode_lengths[ 6] = 5
	dcode_lengths[ 8] = 4
	dcode_lengths[ 9] = 5
	dcode_lengths[10] = 3
	etc
	dcode_lengths[18] = 3
	dcode_lengths[19] = 6

	The `H-CL` table is no longer used from here onwards.

	For `lcode`s, there are 6 + 9 + 10 + 16 + 10 + 12 = 63 non-zero entries: 6
	4-bit codes, 9 5-bit codes, 10 6-bit codes, 16 7-bit codes, 10 8-bit codes and
	12 9-bit codes. The canonical Huffman encoding's map from bits to `lcode`
	values is:

	bits lcode
	0b0000 10
	0b0001 32
	0b0010 101
	0b0011 116
	0b0100 257
	0b0101 259
	0b01100 97
	0b01101 104
	0b01110 105
	0b01111 110
	0b10000 111
	0b10001 114
	0b10010 115
	0b10011 258
	0b10100 260
	0b101010 44
	0b101011 99
	0b101100 100
	0b101101 102
	0b101110 108
	0b101111 109
	0b110000 112
	0b110001 117
	0b110010 119
	0b110011 121
	0b1101000 39
	0b1101001 46
	0b1101010 59
	0b1101011 63
	0b1101100 65
	0b1101101 69
	0b1101110 73
	0b1101111 79
	0b1110000 82
	0b1110001 83
	0b1110010 84
	0b1110011 98
	0b1110100 103
	0b1110101 261
	0b1110110 262
	0b1110111 263
	0b11110000 33
	0b11110001 50
	0b11110010 66
	0b11110011 67
	0b11110100 72
	0b11110101 74
	0b11110110 77
	0b11110111 87
	0b11111000 107
	0b11111001 118
	0b111110100 58
	0b111110101 68
	0b111110110 76
	0b111110111 78
	0b111111000 85
	0b111111001 91
	0b111111010 93
	0b111111011 120
	0b111111100 122
	0b111111101 256
	0b111111110 264
	0b111111111 265

	Call that Huffman table `H-L`.

	For `dcode`s, there are 5 + 4 + 3 + 2 = 14 non-zero entries: 5 3-bit codes, 4
	4-bit codes, 3 5-bit codes and 2 6-bit codes. The canonical Huffman encoding's
	map from bits to `dcode` values is:

	bits dcode
	0b000 10
	0b001 14
	0b010 16
	0b011 17
	0b100 18
	0b1010 8
	0b1011 12
	0b1100 13
	0b1101 15
	0b11100 6
	0b11101 9
	0b11110 11
	0b111110 5
	0b111111 19

	Call that Huffman table `H-D`.


	## Decoding Literals

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000052 0x0034 . 0xE7 0b_1..._....
	000053 0x0035 C 0x43 0b_0100_0011
	000054 0x0036 . 0xE8 0b_1110_1000
	000055 0x0037 ) 0x29 0b_0010_1001
	000056 0x0038 . 0xA0 0b_1010_0000
	000057 0x0039 . 0xF1 0b_1111_0001

	Decoding from that bit stream via `H-L` gives some literal bytes (where the
	`lcode` value is < 256):

	bits lcode
	0b1110000 82 (ASCII 'R')
	0b10000 111 (ASCII 'o')
	0b101111 109 (ASCII 'm')
	0b0010 101 (ASCII 'e')
	0b10000 111 (ASCII 'o')
	0b0001 32 (ASCII ' ')
	0b01100 97 (ASCII 'a')
	0b01111 110 (ASCII 'n')


	## Decoding Back-References

	This continues, up until we get to the "O Romeo, Romeo! wherefore art thou
	Romeo?" line.

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000089 0x0059 . 0xC1 0b_11.._....
	000090 0x005A . 0x1E 0b_0001_1110
	000091 0x005B . 0x0F 0b_0000_1111
	000092 0x005C . 0xF9 0b_1111_1001

	Decoding from that bit stream via `H-L` gives:

	bits lcode
	0b1101111 79 (ASCII 'O')
	0b0001 32 (ASCII ' ')
	0b1110000 82 (ASCII 'R')
	0b10011 258

	This 258 is our first non-literal `lcode`, the start of a length-distance
	back-reference. We first need to decode the length. The RFC section 3.2.5.
	Compressed blocks (length and distance codes) says that an `lcode` of 258 means
	that length=4 with no extra bits.

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000092 0x005C . 0xF9 0b_111._....
	000093 0x005D Y 0x59 0b_0101_1001

	We next decode via `H-D` instead of `H-L`.

	bits dcode
	0b11110 11

	Again, from section 3.2.5., a `dcode` of 11 means a distance in the range [49,
	64], 49 plus 4 extra bits. The next 4 bits are 0b0110=6, so the distance is 55.
	We therefore copy length=4 bytes from distance=55 bytes ago: "omeo".

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000093 0x005D Y 0x59 0b_01.._....
	000094 0x005E U 0x55 0b_0101_0101
	000095 0x005F > 0x3E 0b_0011_1110

	We go back to decoding via `H-L`, which gives

	bits lcode
	0b101010 44 (ASCII ',')
	0b10100 260

	This is another non-literal. Section 3.2.5. says that an `lcode` of 260 means
	length=6 with no extra bits.

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000095 0x005F > 0x3E 0b_0011_111.

	Decode with `H-D`.

	bits dcode
	0b111110 5

	Again, from section 3.2.5., a `dcode` of 5 means a distance in the range [7,
	8], 7 plus 1 extra bit. The next 1 bits are 0b0=0, so the distance is 7. We
	therefore copy length=6 bytes from distance=7 bytes ago: " Romeo".

	The bit stream is now at:

	offset xoffset ASCII hex binary
	000096 0x0060 . 0x0F 0b_0000_1111

	We go back to decoding via `H-L`, which gives

	bits lcode
	0b11110000 33 (ASCII '!')

	And on we go.


	## End of Block

	The block finishes with the bit stream at:

	offset xoffset ASCII hex binary
	000522 0x020A . 0xEB 0b_1110_101.
	000523 0x020B 0 0x30 0b_0011_0000
	000524 0x020C z 0x7A 0b_0111_1010
	000525 0x020D Y 0x59 0b_0101_1001
	000526 0x020E a 0x61 0b_0110_0001
	000527 0x020F . 0x0D 0b_0000_1101
	000528 0x0210 . 0x7F 0b_0111_1111
	000529 0x0211 . 0x01 0b_0000_0001

	The decoding is:

	bits lcode
	0b101011 99 (ASCII 'c')
	0b10000 111 (ASCII 'o')
	0b110001 117 (ASCII 'u')
	0b01111 110 (ASCII 'n')
	0b0100 257 (length=3 + 0_extra_bits...)

	bits dcode
	0b1101 15 (distance=193 + 6_extra_bits, 0b000010 in MSB to LSB order;
	length=3, distance=195: copy "sel")

	bits lcode
	0b1101011 63 (ASCII '?')
	0b0000 10 (ASCII '\n')
	0b111111101 256 (end of block)

	In other words, the block ends with "counsel?\n".


	# More Wire Format Examples

	See `test/data/artificial/deflate-*.commentary.txt`