Re-do RAC codec values, from 8 bits to 56 bits
diff --git a/doc/spec/rac-spec.md b/doc/spec/rac-spec.md
index 3aff1b9..a14fef2 100644
--- a/doc/spec/rac-spec.md
+++ b/doc/spec/rac-spec.md
@@ -1,6 +1,6 @@
# Random Access Compression: RAC
-Status: Draft (as of August 2019). There is no compatibility guarantee yet.
+Status: Draft (as of September 2019). There is no compatibility guarantee yet.
## Overview
@@ -32,10 +32,11 @@
- Filesystem metadata like file names, file sizes and modification times.
- Multiple source files.
-There is the capability (see reserved `TTag`s, below) but no promise to address
-these in a future RAC version. There might not be a need to, as other designs
-such as [SquashFS](http://squashfs.sourceforge.net/) and EROFS ([Extendable
-Read-Only File System](https://lkml.org/lkml/2018/5/31/306)) already exist.
+There is the capability (see attributes and reserved `TTag`s, below) but no
+promise to address these in a future RAC version. There might not be a need to,
+as other designs such as [SquashFS](http://squashfs.sourceforge.net/) and EROFS
+([Extendable Read-Only File System](https://lkml.org/lkml/2018/5/31/306))
+already exist.
Non-goals in general include:
@@ -88,8 +89,10 @@
knowing the `CFileSize` in order to identify the `Root Node`, which is either
at the start or the end of the `CFile`.
-Each `Node` has a `DRange`. An empty `DRange` means that the `Node` contains
-metadata or other decompression context such as a shared dictionary.
+Each `Node` has a `DRange`, which can be empty. `Branch Node`s can also have
+attributes: elements that aren't child `Node`s, which must have an empty
+`DRange`. Empty elements contain metadata or other decompression context such
+as a shared dictionary.
Each `Leaf Node` also has 3 `CRange`s (`Primary`, `Secondary` and `Tertiary`),
any or all of which may be empty. The contents of the `CFile`, within those
@@ -114,7 +117,7 @@
| ... |0|.| ..., Reserved (0), ...
| DPtr[A-2] |0|T| DPtr[Arity-2], Reserved (0), TTag[Arity-2]
| DPtr[A-1] |0|T| DPtr[Arity-1], Reserved (0), TTag[Arity-1]
- | DPtr[A] |0|C| DPtr[Arity] a.k.a. DPtrMax, Reserved (0), Codec
+ | DPtr[A] |0|C| DPtr[Arity] a.k.a. DPtrMax, Reserved (0), CodecByte
| CPtr[0] |L|S| CPtr[0], CLen[0], STag[0]
| CPtr[1] |L|S| CPtr[1], CLen[1], STag[1]
| CPtr[2] |L|S| CPtr[2], CLen[2], STag[2]
@@ -164,7 +167,8 @@
### Arity
-`Arity` is the `Branch Node`'s number of children. Zero is invalid.
+`Arity` is the `Branch Node`'s number of elements: the number of child `Node`s
+plus the number of attributes. Having zero children is invalid.
The `Arity` byte is given twice: the fourth byte and the final byte of the
`Branch Node`. The two values must match.
@@ -196,14 +200,19 @@
### COffs and DOffs, STags and TTags
-For every `a` in the half-open range `[0 .. Arity)`, the `a`'th child `Node`
-has two tags, `STag[a]` and `TTag[a]`, and a `DRange` of `[DOff[a] ..
-DOff[a+1])`. The `DOff` values must be non-decreasing: see the "Branch Node
-Validation" section below.
+For every `a` in the half-open range `[0 .. Arity)`, the `a`'th element has two
+tags, `STag[a]` and `TTag[a]`, and a `DRange` of `[DOff[a] .. DOff[a+1])`. The
+`DOff` values must be non-decreasing: see the "Branch Node Validation" section
+below.
-A `TTag[a]` of `0xFE` means that child is a `Branch Node`. A `TTag[a]` in the
-half-open range `[0xC0 .. 0xFE)` is reserved. Otherwise, the child is a `Leaf
-Node`.
+A `TTag[a]` of `0xFE` means that that element is a `Branch Node` child.
+
+A `TTag[a]` of `0xFD` means that there is no child, but is instead a `Codec
+Element` attribute, whose `DRange` must be empty, and the rest of this section
+does not apply: the `STag` is ignored.
+
+A `TTag[a]` in the half-open range `[0xC0 .. 0xFD)` is reserved. Otherwise, the
+element is a `Leaf Node` child.
A child `Branch Node`'s `SubBranch COffset` is defined to be `COff[a]`. Its
`SubBranch DBias` and `SubBranch DOffMax` are defined to be `DOff[a]` and
@@ -272,47 +281,78 @@
`Codec`s define specializations of RAC, such as "RAC + Zlib" or "RAC + Brotli".
It is valid for a "RAC + Zstandard" only decoder to reject a "RAC + Brotli"
file, even if it is a valid RAC file. Recall that RAC is just a container, and
-not tied to any particular compression codec. For the `Codec` byte in a `Branch
-Node`:
+not tied to any particular compression codec.
+
+There are two categories of `Codec`s: the high `0x80` bit of the `Codec Byte`
+being `0` or `1` denotes a `Short` or `Long` codec respectively. `Short Codec`s
+are represented by a single byte (the `Codec Byte`). `Long Codec`s use that
+`Codec Byte` to locate 7 additional bytes: a `Codec Element`.
+
+For both `Short Codec`s and `Long Codec`s, the second-highest `0x40` bit of the
+`Codec Byte` is the `Mix Bit`. A `Mix Bit` of `0` means that all of the
+`Node`'s descendents have exactly the same `Codec` (not just the same `Codec
+Byte`; `Short Codec`s and `Long Codec`s are not considered exactly the same
+even if they represent the same compression algorithm). A `Mix Bit` of `1`
+means that descendents may have a different `Codec`.
+
+For `Short Codec`s, the remaining low 6 bits correspond to a specific
+compression algorithm:
- `0x00` means "RAC + Zeroes".
- `0x01` means "RAC + Zlib".
- `0x02` means "RAC + Brotli".
- `0x04` means "RAC + ZStandard".
- - Any other value less than 0x08 means that all of this `Branch Node`'s
- children must be `Branch Node`s and not `Leaf Node`s and that no child's
- `Codec` byte can have a bit set that is not set in this `Codec` byte.
- - All other values, 0x08 or greater, are reserved.
+ - All other values are reserved.
+
+For `Long Codec`s, the remaining low 6 bits of the `Codec Byte` define a number
+`c64`. The lowest index `i` out of `(c64 + (64 * 0))`, `(c64 + (64 * 1))`,
+`(c64 + (64 * 2))` and `(c64 + (64 * 3))` such that `TTag[i]` is `0xFD` locates
+the 7 bytes that identifies the `Codec`. The location is what would otherwise
+occupy the `i`th element's `CPtr | CLen` space. It is invalid for no such `i`
+to exist.
+
+`Long Codec` values are not reserved by this specification, other than 7 NUL
+bytes also means "RAC + Zeroes". Users may define their own values for
+arbitrary compression algorithms. Maintaining a centralized registry mapping
+values to algorithms is out of scope of this specification, although we suggest
+a convention that the 7 bytes form a human-readable (ASCII) string, padded with
+trailing NUL bytes. For example, a hypothetical "Middle Out version 2"
+compression algorithm that typically used the ".mdo2" file extension might be
+represented by the 7 bytes `"mdo2\x00\x00\x00"`.
### Branch Node Validation
The first time that a RAC reader visits any particular `Branch Node`, it must
check that the `Magic` matches, the two `Arity` values match and are non-zero,
-the computed checksum matches the listed `Checksum` and that the RAC reader
-accepts the `Version` and the `Codec`.
+there is at least one child `Node` (not just non-`Node` attributes), the
+computed checksum matches the listed `Checksum` and that the RAC reader accepts
+the `Version`. For `Long Codec`s, there must exist an `0xFD` `TTag` as per the
+previous section.
It must also check that all of its `DOff` values are sorted: `(DOff[a] <=
DOff[a+1])` for every `a` in the half-open range `[0 .. Arity)`. By induction,
this means that all of its `DOff` values do not exceed `DOffMax`, and again by
induction, therefore do not exceed `DFileSize`.
-It must also check that all of its `COff` values do not exceed `COffMax` (and
-again by induction, therefore do not exceed `CFileSize`). Other than that,
-`COff` values do not have to be sorted: successive `Node`s (in `DSpace`) can be
-out of order (in `CSpace`), allowing for incrementally modified RAC files.
+It must also check that, other than `Codec Element` attributes, all of its
+`COff` values do not exceed `COffMax` (and again by induction, therefore do not
+exceed `CFileSize`). Other than that, `COff` values do not have to be sorted:
+successive `Node`s (in `DSpace`) can be out of order (in `CSpace`), allowing
+for incrementally modified RAC files.
For the `Root Node`, its `COffMax` must equal the `CFileSize`. Recall that
parsing a `CFile` requires knowing the `CFileSize`, and also that a `Root
Node`'s `Branch CBias` is zero, so its `COffMax` equals its `CPtrMax`.
-For a child `Branch Node`, its `Codec` bits must be a subset of its parent's
-`Codec` bits, its `Version` must be less than or equal to its parent's
-`Version`, its `COffMax` must be less than or equal to its parent's `COffMax`,
-and its `DOffMax` must equal its parent's `SubBranch DOffMax`. The `DOffMax`
-condition is equivalent to checking that the parent and child agree on the
-child's size in `DSpace`. The parent states that it is its `(DPtr[a+1] -
-DPtr[a])` and the child states that it is its `DPtrMax`.
+For a child `Branch Node`, if the parent's `Codec` does not have the `Mix Bit`
+set then the child's `Codec` must equal its parent's. Furthermore, its
+`Version` must be less than or equal to its parent's `Version`, its `COffMax`
+must be less than or equal to its parent's `COffMax`, and its `DOffMax` must
+equal its parent's `SubBranch DOffMax`. The `DOffMax` condition is equivalent
+to checking that the parent and child agree on the child's size in `DSpace`.
+The parent states that it is its `(DPtr[a+1] - DPtr[a])` and the child states
+that it is its `DPtrMax`.
One conservative way to check `Branch Node`s' validity on first visit is to
check them on every visit, as validating any particular `Branch Node` is
@@ -618,12 +658,13 @@
00000104 b4 00 00 00 00 00 04 00 | CPtr[2] |L|S|
0000010c 14 01 00 00 00 00 01 03 | CPtr[A] |V|A|
-Its `Codec` is `0x01`, "RAC + Zlib", its Version is `0x01` and its `Arity` is
-`0x03`. The `DPtr` values are `0x0000` (implicit), `0x0000`, `0x0023` and
-`0x0029`. The `CPtr` values are `0x009F`, `0x0000`, `0x00B4` and `0x0114` (the
-size of the whole RAC file). Note that the `CPtr` values are not sorted. The
-last two children's `TTag`s are `0xFE` and `0xFE` and their `STag`s are `0x01`
-and `0x00`, which means that they are both `CBiasing Branch Node`s.
+Its `CodecByte` (and therefore its `Codec`) is `0x01`, "RAC + Zlib", its
+Version is `0x01` and its `Arity` is `0x03`. The `DPtr` values are `0x0000`
+(implicit), `0x0000`, `0x0023` and `0x0029`. The `CPtr` values are `0x009F`,
+`0x0000`, `0x00B4` and `0x0114` (the size of the whole RAC file). Note that the
+`CPtr` values are not sorted. The last two children's `TTag`s are `0xFE` and
+`0xFE` and their `STag`s are `0x01` and `0x00`, which means that they are both
+`CBiasing Branch Node`s.
# Reference Implementation
@@ -652,4 +693,4 @@
---
-Updated on August 2019.
+Updated on September 2019.
diff --git a/lib/rac/chunk_reader.go b/lib/rac/chunk_reader.go
index 28dc0ed..f0b1fd1 100644
--- a/lib/rac/chunk_reader.go
+++ b/lib/rac/chunk_reader.go
@@ -25,12 +25,14 @@
_ = b[7] // Early bounds check to guarantee safety of reads below.
u := uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
- return int64(u & 0xFFFFFFFFFFFF)
+ const mask = (1 << 48) - 1
+ return int64(u & mask)
}
-// bitwiseSubset returns whether every 1-bit in inner is also set in outer.
-func bitwiseSubset(outer Codec, inner Codec) bool {
- return outer == (outer | inner)
+func u64LE(b []byte) uint64 {
+ _ = b[7] // Early bounds check to guarantee safety of reads below.
+ return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
+ uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}
// Range is the half-open range [low, high). It is invalid for low to be
@@ -78,11 +80,40 @@
// true.
type rNode [4096]byte
-func (b *rNode) arity() int { return int(b[3]) }
-func (b *rNode) codec() Codec { return Codec(b[(8*int(b[3]))+7]) }
-func (b *rNode) cPtrMax() int64 { return u48LE(b[(16*int(b[3]))+8:]) }
-func (b *rNode) dPtrMax() int64 { return u48LE(b[8*int(b[3]):]) }
-func (b *rNode) version() uint8 { return b[(16*int(b[3]))+14] }
+func (b *rNode) arity() int { return int(b[3]) }
+func (b *rNode) codecHasMixBit() bool { return b[(8*int(b[3]))+7]&0x40 != 0 }
+func (b *rNode) cPtrMax() int64 { return u48LE(b[(16*int(b[3]))+8:]) }
+func (b *rNode) dPtrMax() int64 { return u48LE(b[8*int(b[3]):]) }
+func (b *rNode) version() uint8 { return b[(16*int(b[3]))+14] }
+
+// codec returns the 1-byte (short) or 7-byte (long) codec as a valid Codec, or
+// CodecInvalid.
+//
+// The (uint64) Codec value returned does not have the Mix Bit (the 62nd bit)
+// set, regardless of whether it's set in the rNode's bytes.
+func (b *rNode) codec() Codec {
+ arity := int(b[3])
+ cByte := b[(8*arity)+7]
+
+ // Look for a short codec.
+ if (cByte & 0x80) == 0 {
+ return Codec(cByte&0x3F) << 56
+ }
+ cByte &= 0x3F
+
+ // Look for a long codec.
+ const highBit, low56Bits = 0x8000000000000000, 0x00FFFFFFFFFFFFFF
+ for j := uint8(0); j < 4; j++ {
+ i := int(cByte | (j << 6))
+ if (i < arity) && (b.tTag(i) == 0xFD) {
+ base := (8 * arity) + 8
+ codec := Codec(u64LE(b[(8*i)+base:]))
+ return (codec & low56Bits) | highBit
+ }
+ }
+
+ return CodecInvalid
+}
func (b *rNode) cLen(i int) uint8 {
base := (8 * int(b[3])) + 14
@@ -178,37 +209,49 @@
}
// Check that the "Reserved (0)" bytes are zero and that the TTag values
- // aren't in the reserved range [0xC0, 0xFE).
+ // aren't in the reserved range [0xC0, 0xFD).
+ hasChildren := false
for i := 0; i < arity; i++ {
if b[(8*i)+6] != 0 {
return false
}
- if tTag := b[(8*i)+7]; (0xC0 <= tTag) && (tTag < 0xFE) {
+ if tTag := b[(8*i)+7]; (0xC0 <= tTag) && (tTag < 0xFD) {
return false
+ } else if tTag != 0xFD {
+ hasChildren = true
}
}
- if b[(8*arity)+6] != 0 {
+ if !hasChildren || (b[(8*arity)+6] != 0) {
return false
}
// Check that the DPtr values are non-decreasing. The first DPtr value is
// implicitly zero.
- prev := u48LE(b[8*1:])
- for i := 2; i <= arity; i++ {
+ prev := int64(0)
+ for i := 1; i <= arity; i++ {
curr := u48LE(b[8*i:])
if curr < prev {
return false
+ } else if curr != prev {
+ if tTag := b[(8*i)+7]; tTag == 0xFD {
+ return false
+ }
}
prev = curr
}
- // Check that no CPtr value exceeds CPtrMax (the final CPtr value).
+ // Check that no CPtr value exceeds CPtrMax (the final CPtr value), other
+ // than 0xFD Codec Entries.
base := (8 * arity) + 8
cPtrMax := u48LE(b[size-8:])
for i := 0; i < arity; i++ {
- if cPtr := u48LE(b[(8*i)+base:]); cPtr > cPtrMax {
- return false
+ if cPtr := u48LE(b[(8*i)+base:]); cPtr <= cPtrMax {
+ continue
}
+ if tTag := b[(8*i)+7]; tTag == 0xFD {
+ continue
+ }
+ return false
}
// Check the the version is non-zero.
@@ -225,7 +268,8 @@
// Further checking of the codec, version, COffMax and DOffMax requires
// more context, and is done in loadAndValidate.
- return true
+
+ return b.codec().Valid()
}
// ChunkReader parses a RAC file.
@@ -431,7 +475,7 @@
}
func (r *ChunkReader) loadAndValidate(cOffset int64,
- parentCodec Codec, parentVersion uint8, parentCOffMax int64,
+ parentCodec Codec, parentCodecHasMixBit bool, parentVersion uint8, parentCOffMax int64,
childCBias int64, childDSize int64) error {
if (cOffset < 0) || ((r.CompressedSize - 4) < cOffset) {
@@ -467,7 +511,7 @@
// Validate the parent and child codec, version, COffMax and DOffMax.
childVersion := r.currNode.version()
- if !bitwiseSubset(parentCodec, r.currNode.codec()) ||
+ if !parentChildCodecsValid(parentCodec, r.currNode.codec(), parentCodecHasMixBit) ||
(parentVersion < childVersion) ||
(parentCOffMax < (childCBias + r.currNode.cPtrMax())) ||
(childDSize != r.currNode.dPtrMax()) {
@@ -553,6 +597,7 @@
}
parentCodec := r.currNode.codec()
+ parentCodecHasMixBit := r.currNode.codecHasMixBit()
parentVersion := r.currNode.version()
parentCOffMax := cBias + r.currNode.cPtrMax()
childCOffset := r.currNode.cOff(i, cBias)
@@ -564,7 +609,7 @@
childDSize := r.currNode.dSize(i)
if err := r.loadAndValidate(childCOffset,
- parentCodec, parentVersion, parentCOffMax,
+ parentCodec, parentCodecHasMixBit, parentVersion, parentCOffMax,
childCBias, childDSize); err != nil {
return err
}
diff --git a/lib/rac/chunk_writer.go b/lib/rac/chunk_writer.go
index f665b23..f07442e 100644
--- a/lib/rac/chunk_writer.go
+++ b/lib/rac/chunk_writer.go
@@ -15,6 +15,7 @@
package rac
import (
+ "errors"
"hash/crc32"
"io"
"sort"
@@ -346,6 +347,9 @@
dRangeSize uint64, codec Codec, primary []byte,
secondary OptResource, tertiary OptResource) error {
+ if w.err != nil {
+ return w.err
+ }
if dRangeSize == 0 {
return nil
}
@@ -361,6 +365,15 @@
if err := w.initialize(); err != nil {
return err
}
+ if len(w.leafNodes) == 0 {
+ if !codec.Valid() {
+ return errInvalidCodec
+ }
+ w.codec = codec
+ } else if w.codec != codec {
+ w.err = errors.New("rac: TODO: support writing multiple Codecs")
+ return w.err
+ }
if err := w.write(primary); err != nil {
return err
}
@@ -413,7 +426,7 @@
_, err := w.Writer.Write(emptyRACFile[:])
return err
}
- rootNode := gather(w.leafNodes)
+ rootNode := gather(w.leafNodes, w.codec.isLong())
indexSize := rootNode.calcEncodedSize(0, w.IndexLocation == IndexLocationAtEnd)
nw := &nodeWriter{
@@ -505,6 +518,9 @@
if arity == 0 {
return accumulator
}
+ if n.codec.isLong() {
+ arity++
+ }
size := (arity * 16) + 16
cLength := calcCLength(size)
@@ -552,12 +568,25 @@
}
}
+ if !n.codec.Valid() {
+ return errors.New("rac: TODO: support writing multiple Codecs")
+ }
+
buf, dPtr := w.buffer[:], uint64(0)
arity := uint64(len(n.children) + len(n.resources))
+ if n.codec.isLong() {
+ arity++
+ }
if arity > 0xFF {
return errInternalArityIsTooLarge
}
+ // Codec Element 'DPtr|Reserved0|TTag', if present.
+ if n.codec.isLong() {
+ putU64LE(buf, 0xFD00000000000000)
+ buf = buf[8:]
+ }
+
// DPtr's. We write resources before regular children (non-resources), so
// that any TTag that refers to a resource always avoids the [0xC0, 0xFD]
// reserved zone. The ratio of resources to regulars is at most 2:1, as
@@ -577,10 +606,18 @@
}
buf = buf[8*len(n.children):]
- // DPtrMax.
- putU64LE(buf, dPtr|(uint64(n.codec)<<56))
+ // DPtrMax and the CodecByte. By construction, a Long Codec's 'c64' value
+ // is always 0, as the Codec Element is always in position 0.
+ codecHighByte := uint64(n.codec) & 0xFF00000000000000
+ putU64LE(buf, dPtr|codecHighByte)
buf = buf[8:]
+ // Codec Element 'CPtr|CLen|STag', if present.
+ if n.codec.isLong() {
+ putU64LE(buf, uint64(n.codec)&0x00FFFFFFFFFFFFFF)
+ buf = buf[8:]
+ }
+
// CPtr's. While the RAC format allows otherwise, this nodeWriter always
// keeps the CBias at zero, so that a CPtr equals a COffset.
for i, res := range n.resources {
@@ -670,13 +707,22 @@
// that the root node always directly contains the first and last leaf nodes as
// children, as those leaves presumably contain the most commonly accessed
// parts of the decompressed file.
-func gather(nodes []wNode) wNode {
+//
+// If doing this TODO, we'd also have to change the "codec = codecMixBit |
+// CodecZeroes" line below, as it assumes that no branch nodes have both branch
+// node children and leaf node children.
+func gather(nodes []wNode, codecIsLong bool) wNode {
if len(nodes) == 0 {
panic("gather: no nodes")
}
resources := map[OptResource]bool{}
+ arityBudget := 0xFF
+ if codecIsLong {
+ arityBudget = 0xFE
+ }
+
for {
i, j, arity, newNodes := 0, 0, 0, []wNode(nil)
for ; j < len(nodes); j++ {
@@ -685,7 +731,7 @@
new2 := (o.secondary != 0) && !resources[o.secondary]
new3 := (o.tertiary != 0) && !resources[o.tertiary]
arity += 1 + btoi(new2) + btoi(new3)
- if arity <= 0xFF {
+ if arity <= arityBudget {
if new2 {
resources[o.secondary] = true
}
@@ -727,9 +773,19 @@
func makeBranch(children []wNode, resMap map[OptResource]bool) wNode {
dRangeSize, codec := uint64(0), Codec(0)
- for _, c := range children {
+ for i, c := range children {
dRangeSize += c.dRangeSize
- codec |= c.codec
+ if i == 0 {
+ codec = c.codec
+ } else if codec != c.codec {
+ // We construct the node tree so that a branch node's children are
+ // either all branch nodes or all leaf nodes. If they are all leaf
+ // nodes (with the same Codec), or all branch nodes with the same
+ // Codec, the parent uses the same Codec. If they are all branch
+ // nodes, with different Codecs, we set the Mix bit and since it
+ // has no leaf nodes, we might as well use CodecZeroes.
+ codec = codecMixBit | CodecZeroes
+ }
}
resList := []int(nil)
diff --git a/lib/rac/rac.go b/lib/rac/rac.go
index 8933ced..093fa9e 100644
--- a/lib/rac/rac.go
+++ b/lib/rac/rac.go
@@ -39,14 +39,71 @@
// Codec is the compression codec for the RAC file.
//
+// For leaf nodes, there are two categories of valid Codecs: Short and Long. A
+// Short Codec's uint64 value's high 2 bits and low 56 bits must be zero. A
+// Long Codec's uint64 value's high 8 bits must be one and then 7 zeroes.
+// Symbolically, Short and Long match:
+// - 0b00??????_00000000_00000000_00000000_00000000_00000000_00000000_00000000
+// - 0b10000000_????????_????????_????????_????????_????????_????????_????????
+//
+// In terms of the RAC file format, a Short Codec fits in a single byte: the
+// Codec Byte in the middle of a branch node. A Long Codec uses that Codec Byte
+// to locate 7 other bytes, a location which would otherwise form a "CPtr|CLen"
+// value. When converting from the 7 bytes in the wire format to this Go type
+// (a uint64 value), they are read little-endian: the "CPtr" bytes are the low
+// 6 bytes, the "CLen" is the second-highest byte and the highest byte is
+// hard-coded to 0x80.
+//
+// For example, the 7 bytes "mdo2\x00\x00\x00" would correspond to a Codec
+// value of 0x8000_0000_326F_646D.
+//
+// The Mix Bit is not part of the uint64 representation. Neither is a Long
+// Codec's 'c64' index. This package's exported API deals with leaf nodes.
+// Branch nodes' wire formats are internal implementation details.
+//
// See the RAC specification for further discussion.
-type Codec uint8
+type Codec uint64
+
+func (c Codec) isLong() bool { return int64(c) < 0 }
+func (c Codec) isShort() bool { return int64(c) >= 0 }
+
+// Valid returns whether c matches the Short or Long pattern.
+func (c Codec) Valid() bool {
+ if (c >> 63) == 0 {
+ return ((c << 8) == 0) && ((c >> 62) == 0)
+ }
+ return (c >> 56) == 0x80
+}
+
+func (c Codec) name() string {
+ if c.isShort() && c.Valid() {
+ switch c >> 56 {
+ case 0:
+ return "Zeroes"
+ case 1:
+ return "Zlib"
+ case 2:
+ return "Brotli"
+ case 4:
+ return "Zstandard"
+ }
+ }
+ return ""
+}
+
+func parentChildCodecsValid(parent Codec, child Codec, parentHasMixBit bool) bool {
+ return (parent == child) || parentHasMixBit
+}
const (
- CodecZeroes = Codec(0x00)
- CodecZlib = Codec(0x01)
- CodecBrotli = Codec(0x02)
- CodecZstandard = Codec(0x04)
+ CodecZeroes = Codec(0x00 << 56)
+ CodecZlib = Codec(0x01 << 56)
+ CodecBrotli = Codec(0x02 << 56)
+ CodecZstandard = Codec(0x04 << 56)
+
+ codecMixBit = Codec(1 << 62)
+ codecLongZeroes = Codec(1 << 63)
+ CodecInvalid = Codec((1 << 64) - 1)
)
// IndexLocation is whether the index is at the start or end of the RAC file.
diff --git a/lib/rac/rac_test.go b/lib/rac/rac_test.go
index a351d81..7daa197 100644
--- a/lib/rac/rac_test.go
+++ b/lib/rac/rac_test.go
@@ -594,3 +594,53 @@
t.Fatalf("got %q, want %q", got, want)
}
}
+
+func TestLongCodec(t *testing.T) {
+ const codec = Codec(0x80000000326F646D) // "mdo2" backwards, with a high bit.
+ buf := &bytes.Buffer{}
+ w := &ChunkWriter{
+ Writer: buf,
+ IndexLocation: IndexLocationAtStart,
+ TempFile: &bytes.Buffer{},
+ }
+ if err := w.AddChunk(0x66, codec, []byte{0xAA, 0xBB}, 0, 0); err != nil {
+ t.Fatalf("AddChunk: %v", err)
+ }
+ if err := w.AddChunk(0x77, codec, []byte{0xCC}, 0, 0); err != nil {
+ t.Fatalf("AddChunk: %v", err)
+ }
+ if err := w.Close(); err != nil {
+ t.Fatalf("Close: %v", err)
+ }
+
+ encoded := buf.Bytes()
+ gotHexDump := hex.Dump(encoded)
+
+ const wantHexDump = "" +
+ "00000000 72 c3 63 03 3d c9 00 fd 00 00 00 00 00 00 00 ff |r.c.=...........|\n" +
+ "00000010 66 00 00 00 00 00 00 ff dd 00 00 00 00 00 00 80 |f...............|\n" +
+ "00000020 6d 64 6f 32 00 00 00 00 40 00 00 00 00 00 01 ff |mdo2....@.......|\n" +
+ "00000030 42 00 00 00 00 00 01 ff 43 00 00 00 00 00 01 03 |B.......C.......|\n" +
+ "00000040 aa bb cc |...|\n"
+
+ if gotHexDump != wantHexDump {
+ t.Fatalf("\ngot:\n%s\nwant:\n%s", gotHexDump, wantHexDump)
+ }
+
+ r := &ChunkReader{
+ ReadSeeker: bytes.NewReader(encoded),
+ CompressedSize: int64(len(encoded)),
+ }
+ for i := 0; i < 2; i++ {
+ c, err := r.NextChunk()
+ if err != nil {
+ t.Fatalf("i=%d: %v", i, err)
+ }
+ if got, want := c.Codec, codec; got != want {
+ t.Fatalf("i=%d: Codec: got 0x%X, want 0x%X", i, got, want)
+ }
+ }
+ if _, err := r.NextChunk(); err != io.EOF {
+ t.Fatalf("got %v, want %v", err, io.EOF)
+ }
+}
diff --git a/lib/rac/reader.go b/lib/rac/reader.go
index d64bb92..dc50e17 100644
--- a/lib/rac/reader.go
+++ b/lib/rac/reader.go
@@ -368,7 +368,7 @@
return r.err
}
- if chunk.Codec == 0 {
+ if (chunk.Codec == CodecZeroes) || (chunk.Codec == codecLongZeroes) {
r.currChunk.N = 0
r.dRange = chunk.DRange
r.zeroes = zeroesReader(r.dRange.Size())
@@ -384,16 +384,12 @@
}
}
if codecReader == nil {
- name := ""
- switch chunk.Codec {
- case CodecZlib:
- name = " (Zlib)"
- case CodecBrotli:
- name = " (Brotli)"
- case CodecZstandard:
- name = " (Zstandard)"
+ name0, name1, name2 := "", chunk.Codec.name(), ""
+ if name1 != "" {
+ name0, name2 = " (", ")"
}
- r.err = fmt.Errorf("rac: no matching CodecReader for Codec 0x%02X%s", chunk.Codec, name)
+ r.err = fmt.Errorf("rac: no matching CodecReader for Codec 0x%X%s%s%s",
+ chunk.Codec, name0, name1, name2)
return r.err
}
diff --git a/lib/raczlib/raczlib_test.go b/lib/raczlib/raczlib_test.go
index aba447d..66bfd54 100644
--- a/lib/raczlib/raczlib_test.go
+++ b/lib/raczlib/raczlib_test.go
@@ -161,7 +161,7 @@
buf[0x37] = 0x00 // CBiasing with COff[0], which is encLen0.
// Codec and Version.
- buf[0x1F] = byte(rac.CodecZlib)
+ buf[0x1F] = byte(rac.CodecZlib >> 56)
buf[0x3E] = 0x01
// Checksum.