blob: b1c1302aacebf5909b9f1cae7755727ee0fe5e9f [file] [log] [blame] [view]
# Quirks
Quirks are configuration options that let Wuffs more closely match other codec
implementations.
Some codecs and file formats are more tightly specified than others. When a
spec doesn't fully dictate how to treat malformed files, different codec
implementations have disagreed on whether to accept it (e.g. whether they
follow Postel's Principle of "be liberal in what you accept") and if so, how to
interpret it. Some implementations also raise those decisions to the
application level, not the library level: "provide mechanism, not policy".
For example, in an animated GIF image, the first frame might not fill out the
entire image, and different implementations choose a different default color
for the outside pixels: opaque black, transparent black, or something else.
Other file formats have unofficial extensions. For example, a strict reading of
the [JSON specification](https://json.org/) rejects an initial `"\x1E"` ASCII
Record Separator, `"// comments"` or commas after the final array element or
object key-value pair (e.g. `"[1,2,3,]"`), but
[some](https://www.ietf.org/rfc/rfc7464.txt) [extensions](https://json5.org) to
the format allow these and other changes.
Wuffs, out of the box, makes particular choices (typically mimicking the de
facto canonical implementation), but enabling various quirks results in
different choices. In particular, quirks are useful in regression testing that
Wuffs and another implementation produce the same output for the same input,
even for malformed input.
## Wuffs API
Each quirk is assigned a `uint32_t` value, packed using the [base38 namespace
convention](/doc/note/base38-and-fourcc.md). Decoders and encoders can have a
`set_quirk_enabled!(quirk base.u32, enabled base.bool)` method whose first
argument is this `uint32_t` value.
For example, the base38 encoding of `"gif "` and `"json"` is `0x0F8586` and
`0x124265` respectively, so that the GIF-specific quirks have a `uint32_t`
value of `((0x0F8586 << 10) | g)`, and the JSON-specific quirks have a
`uint32_t` value of `((0x124265 << 10) | j)`, for some small integers `g` and
`j`. The high bits are a namespace. The overall quirk values are different even
if `g` and `j` re-use the same 10-bit integer.
The generated `C` language file defines human-readable names for those constant
values, such as `WUFFS_JSON__QUIRK_ALLOW_LEADING_UNICODE_BYTE_ORDER_MARK`.
## Listing
Common quirks:
- `WUFFS_BASE__QUIRK_IGNORE_CHECKSUM` configures decoders (but not encoders) to
skip checksum verification. This can result in [noticably faster
decodes](https://github.com/google/wuffs/commit/170a8104867fa818d329d85921012c922577c955),
at a cost of being less able to detect data corruption and to deviate from a
strict reading of the relevant file format specifications, accepting some
inputs that are technically invalid (but otherwise decode fine).
Package-specific quirks:
- [GIF image decoder quirks](/std/gif/decode_quirks.wuffs)
- [JSON decoder quirks](/std/json/decode_quirks.wuffs)