blob: ffce34bcd13190f19cd2a58abe203708cbcc986a [file] [log] [blame] [view]
# Naïve Image Formats: NIE, NII, NIA
Status: Draft (as of November 2021). There is no compatibility guarantee yet.
A companion document has further discussion of [NIE related
work](/doc/spec/nie-related-work.md).
## NIE: Still Images
NIE is an easily parsed, uncompressed, lossless format for still (single frame)
images. The 16 byte header:
- 4 bytes of 'magic': \[0x6E, 0xC3, 0xAF, 0x45\], the UTF-8 encoding of "nïE".
- 4 bytes of version-and-configuration: \[0xFF, 0x62, 0x6E or 0x70, 0x34 or
0x38\].
- The first byte denotes the overall NIE/NII/NIA format version. 0xFF (which
is not valid UTF-8) denotes version 1. There are no other valid versions at
this time.
- The second byte must be an ASCII 'b'. This denotes that the payload is in
BGRA order (not RGBA), in terms of the wire format, independent of CPU
endianness.
- The third byte, either an ASCII 'n' or an ASCII 'p', denotes whether the
payload contains non-premultiplied or premultiplied alpha.
- The fourth byte, either an ASCII '4' or an ASCII '8', denotes whether there
are 4 or 8 bytes per pixel.
- Future format versions may allow other byte values, but in version 1, it
must be '\xFF', then 'b', then 'n' or 'p', then '4' or '8'.
- 4 bytes little-endian `uint32` width. The high bit must not be set.
- 4 bytes little-endian `uint32` height. The high bit must not be set.
The payload:
- 4 or 8 bytes per pixel. W×H pixels in row-major order (horizontally adjacent
pixels are adjacent in memory). For example, with 8 bytes per pixel, those
bytes are \[B₀, B₁, G₀, G₁, R₀, R₁, A₀, A₁\]. The ₀ and ₁ subscripts denote
the low and high bytes of each little-endian `uint16`.
That's it.
### Example NIE File
This still image is 3 pixels wide and 2 pixels high. It is a crude
approximation to the French flag, being three columns: blue, white and red.
00000000 6e c3 af 45 ff 62 6e 34 03 00 00 00 02 00 00 00 |n..E.bn4........|
00000010 ff 00 00 ff ff ff ff ff 00 00 ff ff ff 00 00 ff |................|
00000020 ff ff ff ff 00 00 ff ff |........|
## NII: Animated Images, Timing Index Only, Out-of-Band Frames
NII is an index for animated (multiple frame) images. In video compression
terminology, every NII frame is an I-frame, also known as a keyframe. A NII
file doesn't contain the images per se, only the duration that each frame
should be shown.
The per-frame images, not part of a NII file, may be NIE images, but they may
also be in other formats, such as PNG or WebP or a heterogenous mixture. They
may be static files (possibly with systematic filenames such as
`frame000000.png`, `frame000001.png`, etc.) or dynamically generated. That is
for each application to decide, and out of scope of this specification.
A NII file consists of a 16 byte header, a variable sized payload and an 8 byte
footer.
### NII Header
The 16 byte NII header:
- 4 bytes of 'magic': \[0x6E, 0xC3, 0xAF, 0x49\]. The final byte differs from
NIE: an ASCII 'I' instead of an ASCII 'E'.
- 4 bytes of version-and-padding, all 0xFF.
- 4 bytes little-endian `uint32` width. The high bit must not be set.
- 4 bytes little-endian `uint32` height. The high bit must not be set.
### NII Payload
The payload is a sequence of 0 or more frames, exactly 8 bytes (a little-endian
`uint64`) per frame:
- The most significant bit is 0.
- The low 63 bits are the cumulative display duration (CDD). This is the amount
of time, relative to the start of the animation, at which display should
proceed to the next frame.
Every frame's CDD must be greater than or equal to the previous frame's CDD (or
for the first frame, greater than or equal to zero, which will always be true).
For example, if an animation has four frames, to be displayed for 1 second, 2
seconds, 0 seconds and finally 4.5 seconds, then the CDD's are 1s, 3s, 3s and
7.5s. NII's unit of time is [flicks](https://github.com/OculusVR/Flicks): one
flick (frame-tick) is 1 / 705\_600\_000 of a second. Continuing our example,
the CDDs (in decimal and then hexadecimal) are:
- 705\_600\_000 × 1.0 = 0\_705\_600\_000 = 0x0000\_0000\_2A0E\_9A00.
- 705\_600\_000 × 3.0 = 2\_116\_800\_000 = 0x0000\_0000\_7E2B\_CE00.
- 705\_600\_000 × 3.0 = 2\_116\_800\_000 = 0x0000\_0000\_7E2B\_CE00.
- 705\_600\_000 × 7.5 = 5\_292\_000\_000 = 0x0000\_0001\_3B6D\_8300.
Animations lasting `(1<<63)` or more flicks, more than 400 years, are not
representable in the NII format.
### NII Footer
The 8 byte NII footer:
- 4 bytes little-endian `uint32` LoopCount.
- 4 bytes: \[0x00, 0x00, 0x00, 0x80\].
A zero LoopCount means that the animation loops forever. Non-zero means that
the animation is played LoopCount times and then stops. This is the
[APNG](https://wiki.mozilla.org/APNG_Specification) meaning, not the
[GIF](https://www.w3.org/Graphics/GIF/spec-gif89a.txt) meaning (the number of
times to repeat the loop _after_ the first play). The two meanings differ by 1.
### Example NII File
This animated image is 3 pixels wide and 2 pixels high. It consists of 20
frames, being 10 loops of 2 frames. The total animation time of a single loop
is 3 seconds, so the 10 loops will take 30 seconds. The first frame is shown
for 1 second. The next frame is shown for (3 - 1) seconds (i.e., 2 seconds).
The actual pixel data per frame is stored elsewhere.
00000000 6e c3 af 49 ff ff ff ff 03 00 00 00 02 00 00 00 |n..I............|
00000010 00 9a 0e 2a 00 00 00 00 00 ce 2b 7e 00 00 00 00 |...*......+~....|
00000020 0a 00 00 00 00 00 00 80 |........|
## NIA: Animated Images, In-band Frames
NIA is like a NII file where the per-frame still images are NIE files
interleaved between the NII payload values.
The NIA header is the same as the 16 byte NII header, except that the 4 byte
'magic' ends in an ASCII 'A' instead of an ASCII 'I', and the 5th to 8th bytes
are version-and-configuration (the same as for NIE), instead of NII's
version-and-padding. The range of valid version-and-configuration bytes is the
same for NIA as it is for NIE.
The NIA footer is the same as the 8 byte NII footer.
The payload is a sequence of 0 or more frames. Each frame is:
- 8 bytes little-endian `uint64` value, the same meaning and constraints as a
NII payload value.
- A complete NIE image: header and payload. The outer NIA and inner NIE
must have the same 12 bytes of version-and-configuration, width and height.
- Either 0 or 4 bytes of padding. If present, it must be all zeroes. The
padding ensures that the size of the padded NIE image a multiple of 8 bytes,
so that every CDD field is 8 byte aligned. The padding size is 4 if and only
if there are 4 (not 8) bytes per pixel and both the width and height are odd.
A C programming language expression for its presence is `((bytes_per_pixel ==
4) && (width & height & 1))`.
### Example NIA File
This animated image is 3 pixels wide and 2 pixels high. It consists of 20
frames, being 10 loops of 2 frames. The total animation time of a single loop
is 3 seconds, so the 10 loops will take 30 seconds. The first frame is a crude
approximation to the French flag (blue, white and red) and is shown for 1
second. The next frame is a crude approximation to the Italian flag (green,
white and red) and is shown for (3 - 1) seconds (i.e., 2 seconds).
00000000 6e c3 af 41 ff 62 6e 34 03 00 00 00 02 00 00 00 |n..A.bn4........|
00000010 00 9a 0e 2a 00 00 00 00 6e c3 af 45 ff 62 6e 34 |...*....n..E.bn4|
00000020 03 00 00 00 02 00 00 00 ff 00 00 ff ff ff ff ff |................|
00000030 00 00 ff ff ff 00 00 ff ff ff ff ff 00 00 ff ff |................|
00000040 00 ce 2b 7e 00 00 00 00 6e c3 af 45 ff 62 6e 34 |..+~....n..E.bn4|
00000050 03 00 00 00 02 00 00 00 00 ff 00 ff ff ff ff ff |................|
00000060 00 00 ff ff 00 ff 00 ff ff ff ff ff 00 00 ff ff |................|
00000070 0a 00 00 00 00 00 00 80 |........|
# Commentary
## Motivation
One motivating example is securely decoding untrusted images, perhaps uploaded
from potentially malicious actors. Codec libraries have been a rich source of
software security vulnerabilities in the past. One response is to split off
such code into a separate, sandboxed process that reads the compressed image
and writes the equivalent NIE/NIA image, perhaps through pipes or shared
memory. The untrusted codec library processes the untrusted data within the
sandboxed worker process. The unsandboxed manager process only needs to handle
the much simpler NIE/NIA format. That format can be further simplified by the
manager mandating a fixed version-and-configuration, such as v1-"bn4".
Another example is connecting a series of independent image manipulation
programs, each component reading, transforming and then writing a NIE/NIA
image. Such filters can be written in simple programming languages and
connected with Unix-style pipes.
Another example is storing 'golden images' (or their hashes) for codec
development. Given a corpus of test images in a compressed format (e.g. a
corpus of PNG files), it is useful to store their expected decodings for
comparison, but those golden test files should be encoded in an alternative
format, such as NIE/NIA.
## Magic
The 4 bytes of 'magic' are the UTF-8 encoding of the non-ASCII strings "nïE",
"nïI" and "nïA". The unusual capitalization lessens the chance of plain text
data accidentally matching these magic bytes.
## Alpha Premultiplication
For premultiplied alpha, it is valid for a pixel's blue, green or red values to
be greater than its alpha value. Interpretation of such super-saturated colors
is out of scope of this specification.
A program that simply extracts a subset of a NIA's frames as a new NIA
animation is not required to examine or re-encode every payload byte in order
to always output valid NIA data.
## Random Access
Given a NIA animation's bytes per pixel, width and height _B_, _W_ and _H_, the
offset and length of the _i_'th frame's NIE data within that NIA is a simple
computation (but remember to check for overflow):
- length = roundup8(_B_ × _W_ × _H_) + 16
- offset = ((length + 8) × _i_) + 24
The roundup8 function rounds its argument up to the nearest multiple of 8.
This is random access by frame index (the "_i_" in "the _i_'th frame"), not by
time, as different frames can have different display durations.
## Degenerate Animations
A still image is, in some sense, an animated image with a single frame, albeit
without an explicit display duration or looping behavior. Some animated image
formats also support zero frames, just like the empty string being a valid
string. For these degenerate (0 or 1 frame) cases, when converting to NII or
NIA, the convention (but not requirement) is a zero CDD and a zero LoopCount.
## Cumulative Display Duration
Other animation formats, like
[APNG](https://wiki.mozilla.org/APNG_Specification) and
[GIF](https://www.w3.org/Graphics/GIF/spec-gif89a.txt), provide display
durations relative to the previous frame, not relative to the initial frame.
The two schemes are equivalent, in that from a complete stream, either one can
be derived from the other. NII / NIA frames report the cumulative number so
that random access by time can be implemented as a binary search, given random
access by frame index.
Suppose we are given a time _t_ ≥ 0 and want to find the frame to show at that
time. First, there may be no such frame, if the animation contains no frames.
Otherwise, let _o_ be the final frame's CDD, so that _o_ ≥ 0. If _o_ is zero,
the frame to show is the final frame of the animation, and no further
computation is necessary.
Otherwise, calculate the number of loops that would complete by time t: _n_ =
_t_ / _o_, rounding down to the nearest integer. If the LoopCount is non-zero
(as zero means loop forever) and _n_ ≥ LoopCount then the frame to show is the
final frame.
Otherwise, calculate _t′_ = _t_ - (_n_ × _o_), the time 'modulo' _o_. Binary
search to find the smallest _i_ ≥ 0 such that both CDD(_i_) > _t′_ and the
_i_th frame is non-instantaneous. CDD(_i_) is the cumulative display duration
for frame _i_. The first frame is instantaneous if its CDD is zero. Any other
frame is instantaneous if its CDD equals its previous frame's CDD.
## Integer Arithmetic Overflow
Parsing NIE data is almost trivial, but care should be taken to avoid integer
arithmetic overflow when calculating the pixel buffer size from fields in the
NIE header. For example, a C programming language statement like `size_t
row_size = bytes_per_pixel * width;` is incorrect without additional prior
checks. A careful C implementation is:
```c
#include <stdbool.h>
#include <stdint.h>
// nie_payload_size calculates the size in bytes of a NIE payload, given the
// metadata from the NIE header: bytes_per_pixel, width and height. The max
// argument, not defined in the metadata, is the caller's maximum acceptable
// payload size. For example, pass SIZE_MAX for max, or pass a smaller value if
// you wish to limit the memory required to decode an arbitrary NIE file (and
// reject otherwise valid NIE files that would require more memory).
//
// That size is essentially (bytes_per_pixel * width * height), but this
// function checks for integer arithmetic overflow. It also checks that the
// result pointer is non-NULL, the result (the calculated payload size) is less
// than or equal to max, and that the bytes_per_pixel is either 4 or 8.
//
// The bool return value is whether all checks pass. On success, it sets
// *result to the payload size.
bool nie_payload_size(size_t* result,
size_t max,
uint32_t bytes_per_pixel,
uint32_t width,
uint32_t height) {
if ((result == NULL) || ((bytes_per_pixel != 4) && (bytes_per_pixel != 8))) {
return false;
}
uint64_t n = ((uint64_t)width) * ((uint64_t)height);
// bpp_shift is 2 or 3, depending on bytes_per_pixel being 4 or 8.
uint32_t bpp_shift = 2 + (bytes_per_pixel >> 3);
if (n > (max >> bpp_shift)) {
return false;
}
n <<= bpp_shift;
*result = (size_t)n;
return true;
}
```
## Color Management and Other Metadata
There is no facility for describing color spaces, gamma, palettes or other
metadata such as EXIF information. For example, when using a sandboxed worker
process to convert from a PNG image (with an embedded color profile) to a NIE
image, the target color space should be provided to the worker out-of-band.
## YUV Color
There is no facility for explicitly describing YUV or Y'CbCr color. Converting
between NIE/NIA and formats such as
[JPEG](http://www.w3.org/Graphics/JPEG/itu-t81.pdf) or [WebP
Lossy](https://developers.google.com/speed/webp/docs/riff_container) is a lossy
process, although JPEG and WebP Lossy are lossy formats to begin with.
## Bytes versus Octets
It was not always the case, historically, but in this specification, `byte` is
synonymous with `octet` and `uint8`.
# Filename Extensions and MIME Types
The recommended filename extensions are `.nie`, `.nii` and `.nia`.
The recommended MIME types are `image/nie`, `image/nii` and `image/nia`.
## Why NIE and not NIF (for Naïve Image Format)?
The `.nif` filename extension is already used by the NetImmerse / Gamebryo game
engine. Instead, you can think of `.nie` as derived from the word "naïve".
## Pronunciation
I pronounce "NIE", "NII" and "NIA" as /naɪˈiː/, /naɪˈaɪ/ and /naɪˈeɪ/, ending
in a long "E", "I" or "A" sound. It's definitely a hard "N", not a soft one.
---
Updated on November 2021.