doc/spec/rac-spec.md - external/github.com/google/wuffs - Git at Google

 # Random Access Compression: RAC

 Status: Draft (as of September 2019). There is no compatibility guarantee yet.


 ## Overview

 RAC is a *compressed* file format that allows *random access* (not just
 sequential access) to the decompressed contents. In comparison to some other
 popular compression formats, all four of the
 [Zlib](https://www.ietf.org/rfc/rfc1950.txt),
 [Brotli](https://www.ietf.org/rfc/rfc7932.txt),
 [LZ4](https://github.com/lz4/lz4/blob/master/doc/lz4_Frame_format.md) and
 [Zstandard](https://www.ietf.org/rfc/rfc8478.txt) specifications explicitly
 contain the identical phrase: "the data format defined by this specification
 does not attempt to allow random access to compressed data".

 Compression means that the derived file is typically smaller than the original
 file. Random access means that, starting from the compressed file, it is
 possible to reconstruct the half-open byte range `[di .. dj)` of the
 decompressed file without always having to first decompress all of `[0 .. di)`.

 Conceptually, the decompressed file is partitioned into non-overlapping chunks.
 Each compressed chunk can be decompressed independently (although possibly
 sharing additional context, such as a LZ77 prefix dictionary). A RAC file also
 contains a hierarchical index of those chunks.

 RAC is a container format, and while it supports common compression codecs like
 Zlib, Brotli, LZ4 and Zstandard, it is not tied to any particular compression
 codec.

 Non-goals for version 1 include:

   - Filesystem metadata like file names, file sizes and modification times.
   - Multiple source files.

 There is the capability (see attributes and reserved `TTag`s, below) but no
 promise to address these in a future RAC version. There might not be a need to,
 as other designs such as [SquashFS](http://squashfs.sourceforge.net/) and EROFS
 ([Extendable Read-Only File System](https://lkml.org/lkml/2018/5/31/306))
 already exist.

 Non-goals in general include:

   - Encryption.
   - Streaming decompression.

 A companion document has further discussion of [RAC related
 work](/doc/spec/rac-related-work.md).


 ### Glossary

   - `CBias` is the delta added to a `CPointer` to produce a `COffset`.
   - `DBias` is the delta added to a `DPointer` to produce a `DOffset`.
   - `CFile` is the compressed file.
   - `DFile` is the decompressed file.
   - `CFileSize` is the size of the `CFile`.
   - `DFileSize` is the size of the `DFile`.
   - `COffset` is a byte offset in `CSpace`.
   - `DOffset` is a byte offset in `DSpace`.
   - `CPointer` is a relative `COffset`, prior to bias-correction.
   - `DPointer` is a relative `DOffset`, prior to bias-correction.
   - `CRange` is a `Range` in `CSpace`.
   - `DRange` is a `Range` in `DSpace`.
   - `CSpace` means that byte offsets refer to the `CFile`.
   - `DSpace` means that byte offsets refer to the `DFile`.

 `Range` is a pair of byte offsets `[i .. j)`, in either `CSpace` or `DSpace`.
 It is half-open, containing every byte offset `x` such that `(i <= x)` and `(x
 < j)`. It is invalid to have `(i > j)`. The size of a `Range` equals `(j - i)`.

 All bytes are 8 bits and unless explicitly specified otherwise, all fixed-size
 unsigned integers (e.g. `uint32_t`, `uint64_t`) are encoded little-endian.
 Within those unsigned integers, bit 0 is the least significant bit and e.g. bit
 31 is the most significant bit of a `uint32_t`.

 The maximum supported `CFileSize` and the maximum supported `DFileSize` are the
 same number: `0x0000_FFFF_FFFF_FFFF`, which is `((1 << 48) - 1)`.


 ## File Structure

 A RAC file (the `CFile`) must be at least 32 bytes long, and start with the 3
 byte `Magic` (see below), so that no valid RAC file can also be e.g. a valid
 JPEG file.

 The `CFile` contains a tree of `Node`s. Each `Node` is either a `Branch Node`
 (pointing to between 1 and 255 child `Node`s) or a `Leaf Node`. There must be
 at least one `Branch Node`, called the `Root Node`. Parsing a `CFile` requires
 knowing the `CFileSize` in order to identify the `Root Node`, which is either
 at the start or the end of the `CFile`.

 Each `Node` has a `DRange`, which can be empty. `Branch Node`s can also have
 attributes: elements that aren't child `Node`s, which must have an empty
 `DRange`. Empty elements contain metadata or other decompression context such
 as a shared dictionary.

 Each `Leaf Node` also has 3 `CRange`s (`Primary`, `Secondary` and `Tertiary`),
 any or all of which may be empty. The contents of the `CFile`, within those
 `CRange`s, are decompressed according to the `Codec` (see below) to reconstruct
 that part of the `DFile` within the `Leaf Node`'s `DRange`.


 ## Branch Nodes

 A `Branch Node`'s encoding in the `CFile` has a variable byte size, between 32
 and 4096 inclusive, depending on its number of children. Specifically, it
 occupies `((Arity * 16) + 16)` bytes, grouped into 8 byte segments (but not
 necessarily 8 byte aligned), starting at a `COffset` called its `Branch
 COffset`:

     +-+-+-+-+-+-+-+-+
     |0|1|2|3|4|5|6|7|
     +-+-+-+-+-+-+-+-+
     |Magic|A|Che|0|T|  Magic, Arity, Checksum,     Reserved (0),  TTag[0]
     | DPtr[1]   |0|T|  DPtr[1],                    Reserved (0),  TTag[1]
     | DPtr[2]   |0|T|  DPtr[2],                    Reserved (0),  TTag[2]
     | ...       |0|.|  ...,                        Reserved (0),  ...
     | DPtr[A-2] |0|T|  DPtr[Arity-2],              Reserved (0),  TTag[Arity-2]
     | DPtr[A-1] |0|T|  DPtr[Arity-1],              Reserved (0),  TTag[Arity-1]
     | DPtr[A]   |0|C|  DPtr[Arity] a.k.a. DPtrMax, Reserved (0),  CodecByte
     | CPtr[0]   |L|S|  CPtr[0],                    CLen[0],       STag[0]
     | CPtr[1]   |L|S|  CPtr[1],                    CLen[1],       STag[1]
     | CPtr[2]   |L|S|  CPtr[2],                    CLen[2],       STag[2]
     | ...       |L|.|  ...,                        ...,           ...
     | CPtr[A-2] |L|S|  CPtr[Arity-2],              CLen[Arity-2], STag[Arity-2]
     | CPtr[A-1] |L|S|  CPtr[Arity-1],              CLen[Arity-1], STag[Arity-1]
     | CPtr[A]   |V|A|  CPtr[Arity] a.k.a. CPtrMax, Version,       Arity
     +-+-+-+-+-+-+-+-+

 See the "Examples" section below for example RAC files and, in particular,
 example `Branch Node`s.

 For the `(XPtr | Other6 | Other7)` 8 byte fields, the `XPtr` occupies the low
 48 bits (as a little-endian `uint64_t`) and the `Other` fields occupy the high
 16 bits.

 The `CPtr` and `DPtr` values are what is explicitly written in the `CFile`'s
 bytes. These are added to a `Branch Node`'s implicit `Branch CBias` and `Branch
 DBias` values to give the implicit `COff` and `DOff` values: `COff[i]` and
 `DOff[i]` are defined to be `(Branch_CBias + CPtr[i])` and `(Branch_DBias +
 DPtr[i])`.

 `CPtrMax` is another name for `CPtr[Arity]`, and `COffMax` is defined to be
 `(Branch_CBias + CPtrMax)`. Likewise for `DPtrMax` and `DOffMax`.

 The `DPtr[0]` value is implicit, and always equals zero, so that `DOff[0]`
 always equals the `Branch DBias`.

   - For the `Root Node`, the `DPtrMax` also sets the `DFileSize`. The `Branch
     CBias` and `Branch DBias` are both zero. The `Branch COffset` is determined
     by the "Root Node" section below.
   - For a child `Branch Node`, the `Branch COffset`, `Branch CBias` and `Branch
     DBias` are given by the parent `Branch Node`. See the "Search Within a
     Branch Node" section below.


 ### Magic

 `Magic` is the three bytes `"\x72\xC3\x63"`, which is invalid UTF-8 but is
 `"rÃc"` in ISO 8859-1. The tilde isn't particularly meaningful, other than
 `"rÃc"` being a nonsensical word (with nonsensical capitalization) that is
 unlikely to appear in other files.

 Every `Branch Node` must start with these `Magic` bytes, not just a `Branch
 Node` positioned at the start of the `CFile`.


 ### Arity

 `Arity` is the `Branch Node`'s number of elements: the number of child `Node`s
 plus the number of attributes. Having zero children is invalid.

 The `Arity` byte is given twice: the fourth byte and the final byte of the
 `Branch Node`. The two values must match.

 The repetition lets a RAC reader determine the size of the `Branch Node` data
 (as the size depends on the `Arity`), given either its start or its end offset
 in `CSpace`. For almost all `Branch Node`s, we will know its start offset (its
 `Branch COffset`), but for a `Root Node` at the end of a `CFile`, we will only
 know its end offset.


 ### Checksum

 `Checksum` is a checksum of the `Branch Node`'s bytes. It is not a checksum of
 the `CFile` or `DFile` contents pointed to by a `Branch Node`. Content
 checksums are a `Codec`-specific consideration.

 The little-endian `uint16_t` `Checksum` value is the low 16 bits XOR'ed with
 the high 16 bits of the `uint32_t` CRC-32 IEEE checksum of the `((Arity * 16) +
 10)` bytes immediately after the `Checksum`. The 4 bytes immediately before the
 `Checksum` are not considered: the `Magic` bytes have only one valid value and
 the `Arity` byte near the start is replicated by the `Arity` byte at the end.


 ### Reserved (0)

 The `Reserved (0)` bytes must have the value `0x00`.


 ### COffs and DOffs, STags and TTags

 For every `a` in the half-open range `[0 .. Arity)`, the `a`'th element has two
 tags, `STag[a]` and `TTag[a]`, and a `DRange` of `[DOff[a] .. DOff[a+1])`. The
 `DOff` values must be non-decreasing: see the "Branch Node Validation" section
 below.

 A `TTag[a]` of `0xFE` means that that element is a `Branch Node` child.

 A `TTag[a]` of `0xFD` means that there is no child, but is instead a `Codec
 Element` attribute, whose `DRange` must be empty, and the rest of this section
 does not apply: the `STag` is ignored.

 A `TTag[a]` in the half-open range `[0xC0 .. 0xFD)` is reserved. Otherwise, the
 element is a `Leaf Node` child.

 A child `Branch Node`'s `SubBranch COffset` is defined to be `COff[a]`. Its
 `SubBranch DBias` and `SubBranch DOffMax` are defined to be `DOff[a]` and
 `DOff[a+1]`.

   - When `(STag[a] < Arity)`, it is a `CBiasing Branch Node`. The `SubBranch
     CBias` is defined to be `(Branch_CBias + CPtr[STag[a]])`. This expression
     is equivalent to `COff[STag[a]]`.
   - When `(STag[a] >= Arity)`, it is a `CNeutral Branch Node`. The `SubBranch
     CBias` is defined to be `(Branch_CBias)`.

 A child `Leaf Node`'s `STag[a]` and `TTag[a]` values are also called its `Leaf
 STag` and `Leaf TTag`. It also has:

   - A `Primary CRange`, equal to `MakeCRange(a)`.
   - A `Secondary CRange`, equal to `MakeCRange(STag[a])`.
   - A `Tertiary CRange`, equal to `MakeCRange(TTag[a])`.

 The `MakeCRange(i)` function defines a `CRange`. If `(i >= Arity)` then that
 `CRange` is the empty range `[COffMax .. COffMax)`. Otherwise, the lower bound
 is `COff[i]` and the upper bound is:

   - `COffMax` when `CLen[i]` is zero.
   - The minimum of `COffMax` and `(COff[i] + (CLen[i] * 1024))` when `CLen[i]`
     is non-zero.

 In other words, the `COffMax` value clamps the `CRange` upper bound. The `CLen`
 value, if non-zero, combines with the `COff` value to apply another clamp. The
 `CLen` is given in units of 1024 bytes, but the `(COff[i] + (CLen[i] * 1024))`
 value is not necessarily quantized to 1024 byte boundaries.

 Note that, since `Arity` is at most 255, an `STag[a]` of `0xFF` always results
 in a `CNeutral Branch Node` or an empty `Secondary CRange`. Likewise, a
 `TTag[a]` of `0xFF` always results in an empty `Tertiary CRange`.


 ### COffMax

 `COffMax` is an inclusive upper bound on every `COff` in a `Branch Node` and in
 its descendent `Branch Node`s. A child `Branch Node` must not have a larger
 `COffMax` than the parent `Branch Node`'s `COffMax`, and the `Root Node`'s
 `COffMax` must equal the `CFileSize`. See the "Branch Node Validation" section
 below.

 A RAC file can therefore be incrementally modified, if the RAC writer only
 appends new `CFile` bytes and does not re-write existing `CFile` bytes, so that
 the `CFileSize` increases. Even if the old (smaller) RAC file's `Root Node` was
 at the `CFile` start, the new (larger) `CFileSize` means that those starting
 bytes are an obsolete `Root Node` (but still a valid `Branch Node`). The new
 `Root Node` is therefore located at the end of the new RAC file.

 Concatenating RAC files (concatenating in `DSpace`) involves concatenating the
 RAC files in `CSpace` and then appending a new `Root Node` with `CBiasing
 Branch Node`s pointing to each source RAC file's `Root Node`.


 ### Version

 `Version` must have the value `0x01`, indicating version 1 of the RAC format.
 The `0x00` value is reserved, although future editions may use other positive
 values.


 ### Codec

 `Codec`s define specializations of RAC, such as "RAC + Zlib" or "RAC + Brotli".
 It is valid for a "RAC + Zstandard" only decoder to reject a "RAC + Brotli"
 file, even if it is a valid RAC file. Recall that RAC is just a container, and
 not tied to any particular compression codec.

 There are two categories of `Codec`s: the high `0x80` bit of the `Codec Byte`
 being `0` or `1` denotes a `Short` or `Long` codec respectively. `Short Codec`s
 are represented by a single byte (the `Codec Byte`). `Long Codec`s use that
 `Codec Byte` to locate 7 additional bytes: a `Codec Element`.

 For both `Short Codec`s and `Long Codec`s, the second-highest `0x40` bit of the
 `Codec Byte` is the `Mix Bit`. A `Mix Bit` of `0` means that all of the
 `Node`'s descendents have exactly the same `Codec` (not just the same `Codec
 Byte`; `Short Codec`s and `Long Codec`s are not considered exactly the same
 even if they represent the same compression algorithm). A `Mix Bit` of `1`
 means that descendents may have a different `Codec`.

 For `Short Codec`s, the remaining low 6 bits correspond to a specific
 compression algorithm:

   - `0x00` means "RAC + Zeroes".
   - `0x01` means "RAC + Zlib".
   - `0x02` means "RAC + LZ4".
   - `0x03` means "RAC + Zstandard".
   - All other values are reserved.

 For `Long Codec`s, the remaining low 6 bits of the `Codec Byte` define a number
 `c64`. The lowest index `i` out of `(c64 + (64 * 0))`, `(c64 + (64 * 1))`,
 `(c64 + (64 * 2))` and `(c64 + (64 * 3))` such that `TTag[i]` is `0xFD` locates
 the 7 bytes that identifies the `Codec`. The location is what would otherwise
 occupy the `i`th element's `CPtr | CLen` space. It is invalid for no such `i`
 to exist.

 `Long Codec` values are not reserved by this specification, other than 7 NUL
 bytes also means "RAC + Zeroes". Users may define their own values for
 arbitrary compression algorithms. Maintaining a centralized registry mapping
 values to algorithms is out of scope of this specification, although we suggest
 a convention that the 7 bytes form a human-readable (ASCII) string, padded with
 trailing NUL bytes. For example, a hypothetical "Middle Out version 2"
 compression algorithm that typically used the ".mdo2" file extension might be
 represented by the 7 bytes `"mdo2\x00\x00\x00"`.


 ### Branch Node Validation

 The first time that a RAC reader visits any particular `Branch Node`, it must
 check that the `Magic` matches, the two `Arity` values match and are non-zero,
 there is at least one child `Node` (not just non-`Node` attributes), the
 computed checksum matches the listed `Checksum` and that the RAC reader accepts
 the `Version`. For `Long Codec`s, there must exist an `0xFD` `TTag` as per the
 previous section.

 It must also check that all of its `DOff` values are sorted: `(DOff[a] <=
 DOff[a+1])` for every `a` in the half-open range `[0 .. Arity)`. By induction,
 this means that all of its `DOff` values do not exceed `DOffMax`, and again by
 induction, therefore do not exceed `DFileSize`.

 It must also check that, other than `Codec Element` attributes, all of its
 `COff` values do not exceed `COffMax` (and again by induction, therefore do not
 exceed `CFileSize`). Other than that, `COff` values do not have to be sorted:
 successive `Node`s (in `DSpace`) can be out of order (in `CSpace`), allowing
 for incrementally modified RAC files.

 For the `Root Node`, its `COffMax` must equal the `CFileSize`. Recall that
 parsing a `CFile` requires knowing the `CFileSize`, and also that a `Root
 Node`'s `Branch CBias` is zero, so its `COffMax` equals its `CPtrMax`.

 For a child `Branch Node`, if the parent's `Codec` does not have the `Mix Bit`
 set then the child's `Codec` must equal its parent's. Furthermore, its
 `Version` must be less than or equal to its parent's `Version`, its `COffMax`
 must be less than or equal to its parent's `COffMax`, and its `DOffMax` must
 equal its parent's `SubBranch DOffMax`. The `DOffMax` condition is equivalent
 to checking that the parent and child agree on the child's size in `DSpace`.
 The parent states that it is its `(DPtr[a+1] - DPtr[a])` and the child states
 that it is its `DPtrMax`.

 One conservative way to check `Branch Node`s' validity on first visit is to
 check them on every visit, as validating any particular `Branch Node` is
 idempotent, but other ways are acceptable.


 ## Root Node

 The `Root Node` might be at the start of the `CFile`, as this might optimize
 alignment of `Branch Node`s and of `CRange`s. All `Branch Node`s' sizes are
 multiples of 16 bytes, and a maximal `Branch Node` is exactly 4096 bytes.

 The `Root Node` might be at the end of the `CFile`, as this allows one-pass
 (streaming) encoding of a RAC file. It also allows appending to, concatenating
 or incrementally modifying existing RAC files relatively cheaply.

 To find the `Root Node`, first look for a valid `Root Node` at the `CFile`
 start. If and only if that fails, look at the `CFile` end. If that also fails,
 it is not a valid RAC file.


 ### Root Node at the CFile Start

 The fourth byte of the `CFile` gives the `Arity`, assuming the `Root Node` is
 at the `CFile` start. If it is zero, fail over to the `CFile` end. A RAC writer
 that does not want to place the `Root Node` at the `CFile` start should
 intentionally write a zero to the fourth byte.

 Otherwise, that `Arity` defines the size in bytes of any starting `Root Node`:
 `((Arity * 16) + 16)`. If the `CFileSize` is less than this size, fail over to
 the `CFile` end.

 If those starting bytes form a valid `Root Node` (as per the "Branch Node
 Validation" section), including having a `CPtrMax` equal to the `CFileSize`,
 then we have indeed found our `Root Node`: its `Branch COffset` is zero.
 Otherwise, fail over to the `CFile` end.


 ### Root Node at the CFile End

 If there is no valid `Root Node` at the `CFile` start then the last byte of the
 `CFile` gives the `Root Node`'s `Arity`. This does not necessarily need to
 match the fourth byte of the `CFile`.

 As before, that `Arity` defines the size in bytes of any ending `Root Node`:
 `((Arity * 16) + 16)`. If the `CFileSize` is less than this size, it is not a
 valid RAC file.

 If those ending bytes form a valid `Root Node` (as per the "Branch Node
 Validation" section), including having a `CPtrMax` equal to the `CFileSize`,
 then we have indeed found our `Root Node`: its `Branch COffset` is the
 `CFileSize` minus the size implied by the `Arity`. Otherwise, it is not a valid
 RAC file.


 ## DRange Reconstruction

 To reconstruct the `DRange` `[di .. dj)`, if that `DRange` is empty then the
 request is trivially satisfied.

 Otherwise, if `(dj > DFileSize)` then reject the request.

 Otherwise, start at the `Root Node` and continue to the next section to find
 the `Leaf Node` containing the `DOffset` `di`.


 ### Search Within a Branch Node

 Load (and validate) the `Branch Node` given its `Branch COffset`, `Branch
 CBias` and `Branch DBias`.

 Find the largest child index `a` such that `(a < Arity)` and `(DOff[a] <= di)`
 and `(DOff[a+1] > di)`, then examine `TTag[a]` to see if the child is a `Leaf
 Node`. If so, skip to the next section.

 For a `Branch Node` child, let `CRemaining` equal this `Branch Node`'s (the
 parents') `COffMax` minus the `SubBranch COffset`. It invalid for `CRemaining`
 to be less than 4, or to be less than the child's size implied by the child's
 `Arity` byte at a `COffset` equal to `(SubBranch_COffset + 3)`.

 The `SubBranch COffset`, `SubBranch CBias` and `SubBranch DBias` from the
 parent `Branch Node` become the `Branch COffset`, `Branch CBias` and `Branch
 DBias` of the child `Branch Node`. In order to rule out infinite loops, at
 least one of these two conditions must hold:

   - The child's `Branch COffset` is less than the parent's `Branch COffset`.
   - The child's `DPtrMax` is less than the parent's `DPtrMax`.

 It is valid for one of those conditions to hold between a parent-child pair and
 the other condition to hold between the next parent-child pair.

 Repeat this "Search Within a Branch Node" section with the child `Branch Node`.


 ### Decompressing a Leaf Node

 If a `Leaf Node`'s `DRange` is empty, decompression is a no-op and skip the
 rest of this section. Specifically, a low-level library that iterates over a
 RAC file's chunks, without actually performing decompression, should skip over
 empty chunks instead of yielding them to its caller.

 Otherwise, decompression combines the `Primary CRange`, `Secondary CRange` and
 `Tertiary CRange` slices of the `CFile`, and the `Leaf STag` and `Leaf TTag`
 values, in a `Codec`-specific way to produce decompressed data.

 There are two general principles, although specific `Codec`s can overrule:

   - The `Codec` may produce fewer bytes than the `DRange` size. In that case,
     the remaining bytes (in `DSpace`) are set to NUL (`memset` to zero).
   - The `Codec` may consume fewer bytes than each `CRange` size, and the
     compressed data typically contains an explicit "end of data" marker. In
     that case, the remaining bytes (in `CSpace`) are ignored. Padding allows
     `COff` values to optionally be aligned.

 It is invalid to produce more bytes than the `DRange` size or to consume more
 bytes than each `CRange` size.


 ### Continuing the Reconstruction

 If decompressing that `Leaf Node` did not fully reconstruct `[di .. dj)`,
 advance through the `Node` tree in depth first search order, decompressing
 every `Leaf Node` along the way, until we have gone up to or beyond `dj`.

 During that traversal, `Node`s with an empty `DRange` should be skipped, even
 if they are `Branch Node`s.


 # Specific Codecs


 ## Common Dictionary Format

 Unless otherwise noted, codecs use this common dictionary format.

 If a `Leaf Node`'s `Secondary CRange` is empty then there is no dictionary.
 Otherwise, the `Secondary CRange` must be at least 8 bytes long:

   - 4 byte little-endian `uint32_t` `Dictionary Length`. The high 2 bits are
     reserved and must be zero. The maximum (inclusive) `Dictionary Length` is
     therefore `((1 << 30) - 1)`, or `1_073_741_823` bytes.
   - `Dictionary Length` bytes `Dictionary`.
   - 4 byte little-endian `uint32_t` `Dictionary Checksum`, the Checksum` is
     CRC-32 IEEE checksum over the `Dictionary`'s bytes (excluding both the 4
     byte prefix and the 4 byte suffix).
   - Padding (ignored).

 The `Leaf TTag` must be `0xFF`. All other `Leaf TTag` values (below `0xC0`) are
 reserved. The empty `Tertiary CRange` is ignored. The `Leaf STag` value is also
 ignored, other than deriving the `Secondary CRange`.


 ## RAC + Zeroes

 The `CRange`s are ignored. The `DRange` is filled with NUL bytes (`memset` to
 zero).


 ## RAC + Zlib

 The `CFile` data in the `Leaf Node`'s `Primary CRange` is decompressed as Zlib
 (RFC 1950), possibly referencing a LZ77 prefix dictionary (what the RFC calls a
 "preset dictionary") wrapped in RAC's common dictionary format, described
 above.


 ## RAC + LZ4

 TODO.


 ## RAC + Zstandard

 The `CFile` data in the `Leaf Node`'s `Primary CRange` is decompressed as
 Zstandard (RFC 8478), possibly referencing a dictionary wrapped in RAC's common
 dictionary format, described above. After unwrapping, the dictionary's bytes
 can be either a "raw" or "trained" dictionary, as per RFC 8478 section 5.


 # Examples

 These examples display RAC files in the format of the `hexdump -C` command line
 tool. They are deliberately very short, for ease of understanding. Realistic
 RAC files, with larger chunk sizes, would typically exhibit much better
 compression ratios.


 ## Zlib Example (Root Node at End)

 The first example is relatively simple. The root node (located at the CFile
 end) only has one child: a leaf node whose compressed contents starts at
 position `0x04`. Decompressing that chunk produces the 6 bytes
 ["More!\n"](https://play.golang.org/p/ZSYmQLgv1RR).

     00000000  72 c3 63 00 78 9c 01 06  00 f9 ff 4d 6f 72 65 21  |r.c.x......More!|
     00000010  0a 07 42 01 bf 72 c3 63  01 65 a9 00 ff 06 00 00  |..B..r.c.e......|
     00000020  00 00 00 00 01 04 00 00  00 00 00 01 ff 35 00 00  |.............5..|
     00000030  00 00 00 01 01                                    |.....|


 ## Zlib Example (Root Node at Start)

 The second example consists of a root node with four children: one metadata
 node (a shared dictionary) and three data nodes. The shared dictionary,
 "\x20sheep.\n" is `0x00000008` bytes long and its CRC-32 IEEE checksum is
 `0x477A8DD0`. The third child's (the second data node)'s compressed contents
 starts at position `0x75`. Decompressing that chunk, together with that shared
 dictionary, produces the 11 bytes ["Two
 sheep.\n"](https://play.golang.org/p/Jh9Wyp6PLID). The complete decoding of all
 three data chunks is "One sheep.\nTwo sheep.\nThree sheep.\n".

     00000000  72 c3 63 04 37 39 00 ff  00 00 00 00 00 00 00 ff  |r.c.79..........|
     00000010  0b 00 00 00 00 00 00 ff  16 00 00 00 00 00 00 ff  |................|
     00000020  23 00 00 00 00 00 00 01  50 00 00 00 00 00 01 ff  |#.......P.......|
     00000030  60 00 00 00 00 00 01 00  75 00 00 00 00 00 01 00  |`.......u.......|
     00000040  8a 00 00 00 00 00 01 00  a1 00 00 00 00 00 01 04  |................|
     00000050  08 00 00 00 20 73 68 65  65 70 2e 0a d0 8d 7a 47  |.... sheep....zG|
     00000060  78 f9 0b e0 02 6e f2 cf  4b 85 31 01 01 00 00 ff  |x....n..K.1.....|
     00000070  ff 17 21 03 90 78 f9 0b  e0 02 6e 0a 29 cf 87 31  |..!..x....n.)..1|
     00000080  01 01 00 00 ff ff 18 0c  03 a8 78 f9 0b e0 02 6e  |..........x....n|
     00000090  0a c9 28 4a 4d 85 71 00  01 00 00 ff ff 21 6e 04  |..(JM.q......!n.|
     000000a0  66                                                |f|


 ## Zlib Example (Concatenation)

 The third example demonstrates concatenating two RAC files: the two examples
 above. The decompressed form of the resultant RAC file is the concatenation of
 the two decompressed forms: "One sheep.\nTwo sheep.\nThree sheep.\nMore!\n".
 Its 41 decompressed bytes consists of the "sheep" RAC file's 35 bytes and then
 the "more" RAC file's 6 bytes. In hexadecimal, `0x29 = 0x23 + 0x06`, and the
 `0x29` and `0x23` numbers can be seen in the compressed form's bytes.

 The compressed form (what's shown in `hexdump -C` format below) is the
 concatenation of the two compressed forms, plus a new root node (64 bytes
 starting at position `0xD6`). Even though the overall RAC file starts with the
 "sheep" RAC file, whose root node was at its start, those opening bytes are no
 longer a valid root node for the larger file.

 That new root node has 3 children: 1 metadata node and 2 branch nodes. The
 metadata node (one whose `DRange` is empty) is required because one of the
 original RAC files' root node is not located at its start. Walking to that
 child branch node needs two `COffset` values: one for the embedded RAC file's
 start and one for the embedded RAC file's root node.

     00000000  72 c3 63 04 37 39 00 ff  00 00 00 00 00 00 00 ff  |r.c.79..........|
     00000010  0b 00 00 00 00 00 00 ff  16 00 00 00 00 00 00 ff  |................|
     00000020  23 00 00 00 00 00 00 01  50 00 00 00 00 00 01 ff  |#.......P.......|
     00000030  60 00 00 00 00 00 01 00  75 00 00 00 00 00 01 00  |`.......u.......|
     00000040  8a 00 00 00 00 00 01 00  a1 00 00 00 00 00 01 04  |................|
     00000050  08 00 00 00 20 73 68 65  65 70 2e 0a d0 8d 7a 47  |.... sheep....zG|
     00000060  78 f9 0b e0 02 6e f2 cf  4b 85 31 01 01 00 00 ff  |x....n..K.1.....|
     00000070  ff 17 21 03 90 78 f9 0b  e0 02 6e 0a 29 cf 87 31  |..!..x....n.)..1|
     00000080  01 01 00 00 ff ff 18 0c  03 a8 78 f9 0b e0 02 6e  |..........x....n|
     00000090  0a c9 28 4a 4d 85 71 00  01 00 00 ff ff 21 6e 04  |..(JM.q......!n.|
     000000a0  66 72 c3 63 00 78 9c 01  06 00 f9 ff 4d 6f 72 65  |fr.c.x......More|
     000000b0  21 0a 07 42 01 bf 72 c3  63 01 65 a9 00 ff 06 00  |!..B..r.c.e.....|
     000000c0  00 00 00 00 00 01 04 00  00 00 00 00 01 ff 35 00  |..............5.|
     000000d0  00 00 00 00 01 01 72 c3  63 03 83 16 00 ff 00 00  |......r.c.......|
     000000e0  00 00 00 00 00 fe 23 00  00 00 00 00 00 fe 29 00  |......#.......).|
     000000f0  00 00 00 00 00 01 a1 00  00 00 00 00 00 ff 00 00  |................|
     00000100  00 00 00 00 04 01 b6 00  00 00 00 00 04 00 16 01  |................|
     00000110  00 00 00 00 01 03                                 |......|


 Focusing on that new root node:

     000000d0                    72 c3  63 03 83 16 00 ff 00 00
     000000e0  00 00 00 00 00 fe 23 00  00 00 00 00 00 fe 29 00
     000000f0  00 00 00 00 00 01 a1 00  00 00 00 00 00 ff 00 00
     00000100  00 00 00 00 04 01 b6 00  00 00 00 00 04 00 16 01
     00000110  00 00 00 00 01 03

 Re-formatting to highlight the groups-of-8-bytes structure and reprise the
 "Branch Nodes" section's diagram:

     000000d6    72 c3 63 03 83 16 00 ff    |Magic|A|Che|0|T|
     000000de    00 00 00 00 00 00 00 fe    | DPtr[1]   |0|T|
     000000e6    23 00 00 00 00 00 00 fe    | DPtr[2]   |0|T|
     000000ee    29 00 00 00 00 00 00 01    | DPtr[A]   |0|C|
     000000f6    a1 00 00 00 00 00 00 ff    | CPtr[0]   |L|S|
     000000fe    00 00 00 00 00 00 04 01    | CPtr[1]   |L|S|
     00000106    b6 00 00 00 00 00 04 00    | CPtr[2]   |L|S|
     0000010e    16 01 00 00 00 00 01 03    | CPtr[A]   |V|A|

 Its `CodecByte` (and therefore its `Short Codec`) is `0x01`, "RAC + Zlib", its
 Version is `0x01` and its `Arity` is `0x03`. The `DPtr` values are `0x0000`
 (implicit), `0x0000`, `0x0023` and `0x0029`. The `CPtr` values are `0x00A1`,
 `0x0000`, `0x00B6` and `0x0116` (the size of the whole RAC file). Note that the
 `CPtr` values are not sorted. The last two children's `TTag`s are `0xFE` and
 `0xFE` and their `STag`s are `0x01` and `0x00`, which means that they are both
 `CBiasing Branch Node`s.


 # Reference Implementation

 C programming language libraries:

   - TODO. Follow [this GitHub issue](https://github.com/google/wuffs/issues/22)
     for updates.

 Go programming language libraries:

   - [RAC](https://godoc.org/github.com/google/wuffs/lib/rac)
   - [RAC + Zlib](https://godoc.org/github.com/google/wuffs/lib/raczlib)

 Command line tool, installable via `go get
 github.com/google/wuffs/cmd/ractool`:

   - [ractool](https://godoc.org/github.com/google/wuffs/cmd/ractool)


 # Acknowledgements

 I (Nigel Tao) thank Robert Obryk, Sanjay Ghemawat and Sean Klein for their
 review.


 ---

 Updated on September 2019.