
 Note: The latest specification is in the Basis Universal wiki, here:
 https://github.com/BinomialLLC/basis_universal/wiki/.basis-File-Format-and-ETC1S-Texture-Video-Specification

 File: basis_spec.txt
 Version 1.01

 1.0 Introduction
 ----------------

 The Basis Universal GPU texture codec supports reading and writing ".basis" files.
 The .basis file format supports ETC1S or UASTC 4x4 texture data.

 * ETC1S is a simplified subset of ETC1.

 The mode is always differential (diff bit=1), the Rd, Gd, and Bd color deltas
 are always (0,0,0), and the flip bit is always set. ETC1S texture data is fully
 compliant with all existing software and hardware ETC1 decoders. Existing encoders
 can be easily modified to limit their output to ETC1S.
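
 The constraints above can be checked on an existing ETC1 block. The following is
 a minimal sketch, not part of the codec: the function name is hypothetical, and
 the bit positions assume the standard 8-byte ETC1 block layout (bytes 0-2 hold a
 5-bit base color plus a 3-bit delta per channel, byte 3 holds the diff bit in
 bit 1 and the flip bit in bit 0), which comes from the ETC1 format itself rather
 than from this document.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if an 8-byte ETC1 block obeys the ETC1S
// constraints listed above (differential mode, zero color deltas, flip bit set).
// Assumed ETC1 layout: the low 3 bits of bytes 0..2 are the R, G, B deltas;
// byte 3 bit 1 is the diff bit and byte 3 bit 0 is the flip bit.
inline bool is_etc1s_block(const uint8_t block[8])
{
    const bool diff = (block[3] & 2) != 0;
    const bool flip = (block[3] & 1) != 0;
    const bool zero_deltas =
        ((block[0] & 7) == 0) && ((block[1] & 7) == 0) && ((block[2] & 7) == 0);
    return diff && flip && zero_deltas;
}
```

 Only the three properties named in this section are tested; any other ETC1S
 restriction is outside the scope of this sketch.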

 * UASTC 4x4 is a 19-mode subset of the ASTC texture format. Its specification is
 [here](https://github.com/BinomialLLC/basis_universal/wiki/UASTC-Texture-Specification). UASTC texture data can always be losslessly transcoded to ASTC.

 2.0 High-Level File Structure
 -----------------------------

 A .basis file consists of multiple sections. Apart from the header, which must always
 be at the start of the file, the other sections may appear in any order.

 Here's the high level organization of a typical .basis file:

 * The file header
 * Optional ETC1S compressed endpoint/selector codebooks
 * Optional ETC1S Huffman table information
 * A required "slice" description array describing the resolutions and file offset/compressed sizes of each texture slice present in the file
 * 1 or more slices containing ETC1S or UASTC compressed texture data.
 * For future expansion, the format supports an "extended" header which may be located anywhere in the file. This section contains .PNG-like chunked data.

 3.0 File Enums
 --------------

 // basis_file_header::m_tex_type
 enum basis_texture_type
 {
   cBASISTexType2D = 0,
   cBASISTexType2DArray = 1,
   cBASISTexTypeCubemapArray = 2,
   cBASISTexTypeVideoFrames = 3,
   cBASISTexTypeVolume = 4,
   cBASISTexTypeTotal
 };

 // basis_slice_desc::flags
 enum basis_slice_desc_flags
 {
   cSliceDescFlagsHasAlpha = 1,
   cSliceDescFlagsFrameIsIFrame = 2
 };

 // basis_file_header::m_tex_format
 enum basis_tex_format
 {
   cETC1S = 0,
   cUASTC4x4 = 1
 };

 // basis_file_header::m_flags
 enum basis_header_flags
 {
   cBASISHeaderFlagETC1S = 1,
   cBASISHeaderFlagYFlipped = 2,
   cBASISHeaderFlagHasAlphaSlices = 4
 };

 4.0 File Structures
 -------------------

 All individual members in all file structures are byte aligned and little endian. The structs
 have no padding (i.e. they are declared with #pragma pack(1)).

 4.1 "basis_file_header" structure
 ---------------------------------

 The file header must always be at the beginning of the file.

 struct basis_file_header
 {
   uint16      m_sig;              // 2 byte file signature
   uint16      m_ver;              // File version
   uint16      m_header_size;      // Header size in bytes: sizeof(basis_file_header), always 0x4D
   uint16      m_header_crc16;     // CRC16/genibus of the remaining header data

   uint32      m_data_size;        // The total size of all data after the header
   uint16      m_data_crc16;       // The CRC16 of all data after the header

   uint24      m_total_slices;     // The number of compressed slices
   uint24      m_total_images;     // The total # of images

   byte        m_tex_format;       // enum basis_tex_format
   uint16      m_flags;            // enum basis_header_flags
   byte        m_tex_type;         // enum basis_texture_type
   uint24      m_us_per_frame;     // Video: microseconds per frame

   uint32      m_reserved;         // For future use
   uint32      m_userdata0;        // For client use
   uint32      m_userdata1;        // For client use

   uint16      m_total_endpoints;          // ETC1S: The number of endpoints in the endpoint codebook
   uint32      m_endpoint_cb_file_ofs;     // ETC1S: The compressed endpoint codebook's file offset relative to the start of the file
   uint24      m_endpoint_cb_file_size;    // ETC1S: The compressed endpoint codebook's size in bytes

   uint16      m_total_selectors;          // ETC1S: The number of selectors in the selector codebook
   uint32      m_selector_cb_file_ofs;     // ETC1S: The compressed selector codebook's file offset relative to the start of the file
   uint24      m_selector_cb_file_size;    // ETC1S: The compressed selector codebook's size in bytes

   uint32      m_tables_file_ofs;          // ETC1S: The file offset of the compressed Huffman codelength tables.
   uint32      m_tables_file_size;         // ETC1S: The file size in bytes of the compressed Huffman codelength tables.

   uint32      m_slice_desc_file_ofs;      // The file offset to the slice description array, usually follows the header
   uint32      m_extended_file_ofs;        // The file offset of the "extended" header and compressed data, for future use
   uint32      m_extended_file_size;       // The file size in bytes of the "extended" header and compressed data, for future use
 };

 4.1.1 Details:

 * m_sig is always 'B' * 256 + 's', or 0x4273.
 * m_ver is currently always 0x10.
 * m_header_size is sizeof(basis_file_header). It's always 0x4D.
 * m_header_crc16 is the CRC-16 of the remaining header data. See the "CRC-16" section 5.0 below for more information.
 * m_data_size, m_data_crc16: The size of all data following the header, and its CRC-16.
 * m_total_slices: The total number of slices, in the range [1, 2^24-1].
 * m_total_images: The total number of images (where one image can contain multiple mipmap levels, and each mipmap level is a different slice).
 * m_tex_format: basis_tex_format. Either cETC1S (0), or cUASTC4x4 (1).
 * m_flags: A combination of flags from the basis_header_flags enum.
 * m_tex_type: The texture type, from enum basis_texture_type
 * m_us_per_frame: Microseconds per frame, only valid for cBASISTexTypeVideoFrames texture types.
 * m_total_endpoints, m_endpoint_cb_file_ofs, m_endpoint_cb_file_size: Information about the compressed ETC1S endpoint codebook: The total # of entries, the offset to the compressed data, and the compressed data's size.
 * m_total_selectors, m_selector_cb_file_ofs, m_selector_cb_file_size: Information about the compressed ETC1S selector codebook: The total # of entries, the offset to the compressed data, and the compressed data's size.
 * m_tables_file_ofs, m_tables_file_size: The file offset and size of the compressed Huffman tables for ETC1S format files.
 * m_slice_desc_file_ofs: The file offset to the array of slice description structures. There will be m_total_slices structures at this file offset.
 * m_extended_file_ofs, m_extended_file_size: The "extended" header, for future expansion. Currently unused.
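
 Because every member is little endian and the structure is unpacked, fields can
 be read straight from the file bytes at fixed offsets. Here's a minimal sketch
 of a first sanity check. The helper names are hypothetical, and only m_sig
 (offset 0) and m_header_size (offset 4) are examined; CRC and version checks are
 left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads an unsigned little-endian integer that is num_bytes (1..4) bytes wide.
inline uint32_t read_le(const uint8_t* p, uint32_t num_bytes)
{
    assert(num_bytes >= 1 && num_bytes <= 4);
    uint32_t v = 0;
    for (uint32_t i = 0; i < num_bytes; i++)
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Hypothetical helper: checks the signature and header size fields only.
inline bool basis_header_looks_valid(const uint8_t* data, size_t size)
{
    const uint32_t HEADER_SIZE = 0x4D;  // sizeof(basis_file_header)
    if (size < HEADER_SIZE)
        return false;

    const uint32_t sig = read_le(data + 0, 2);          // m_sig
    const uint32_t header_size = read_le(data + 4, 2);  // m_header_size
    return (sig == 0x4273) && (header_size == HEADER_SIZE);
}
```

 Note that with little endian storage the signature 0x4273 appears in the file
 as the byte 0x73 ('s') followed by 0x42 ('B'). The same read_le() helper covers
 the uint24 fields when called with num_bytes = 3.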

 4.2 "basis_slice_desc" structure
 --------------------------------

 struct basis_slice_desc
 {
     uint24 m_image_index;
     uint8 m_level_index;
     uint8 m_flags;

     uint16 m_orig_width;
     uint16 m_orig_height;

     uint16 m_num_blocks_x;
     uint16 m_num_blocks_y;

     uint32 m_file_ofs;
     uint32 m_file_size;

     uint16 m_slice_data_crc16;
 };

 4.2.1 Details:

 * m_image_index: The index of the source image provided to the encoder (will always appear in order from first to last, first image index is 0, no skipping allowed)
 * m_level_index: The mipmap level index (mipmaps will always appear from largest to smallest)
 * m_flags: enum basis_slice_desc_flags
 * m_orig_width: The original image width (may not be a multiple of 4 pixels)
 * m_orig_height: The original image height (may not be a multiple of 4 pixels)
 * m_num_blocks_x: The slice's block X dimensions. Each block is 4x4 pixels. The slice's pixel resolution may or may not be a power of 2.
 * m_num_blocks_y: The slice's block Y dimensions.
 * m_file_ofs: Offset from the start of the file to the start of the slice's data
 * m_file_size: The size of the compressed slice data in bytes
 * m_slice_data_crc16: The CRC16 of the compressed slice data, for extra-paranoid use cases
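
 Two small calculations follow from the fields above. This is a sketch under
 stated assumptions: the helper names are hypothetical, the block count is
 assumed to be the pixel dimension rounded up to a whole 4x4 block (the text
 implies this but does not state it), and the 23-byte record size is simply the
 sum of the packed member sizes.

```cpp
#include <cstdint>

// Number of 4x4 blocks needed to cover `pixels` texels along one axis.
// Assumption: partial blocks at the right/bottom edge are rounded up.
inline uint32_t blocks_for_pixels(uint32_t pixels)
{
    return (pixels + 3) / 4;
}

// Packed size of basis_slice_desc: 3 + 1 + 1 + 2 + 2 + 2 + 2 + 4 + 4 + 2 bytes.
const uint32_t SLICE_DESC_SIZE = 23;

// File offset of the slice_index'th description, given m_slice_desc_file_ofs.
inline uint32_t slice_desc_ofs(uint32_t slice_desc_file_ofs, uint32_t slice_index)
{
    return slice_desc_file_ofs + slice_index * SLICE_DESC_SIZE;
}
```

 For example, a 250 pixel wide slice would need 63 blocks horizontally under the
 rounding assumption.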

 5.0 CRC-16 Function
 -------------------

 .basis files use the CRC-16/genibus variant of CRC-16 (aka CRC-16 EPC, CRC-16 I-CODE, CRC-16 DARC).

 Here's an example function in C++:

 uint16_t crc16(const void* r, size_t size, uint16_t crc)
 {
   crc = ~crc;
   const uint8_t* p = static_cast<const uint8_t*>(r);
   for ( ; size; --size)
   {
     const uint16_t q = *p++ ^ (crc >> 8);
     uint16_t k = (q >> 4) ^ q;
     crc = (((crc << 8) ^ k) ^ (k << 5)) ^ (k << 12);
   }

   return static_cast<uint16_t>(~crc);
 }

 This function is called with 0 in the final "crc" parameter when computing CRC-16's of file data.
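
 For cross-checking an implementation, here's a slower bit-by-bit sketch that
 spells out the CRC-16/genibus parameters as they are commonly published:
 polynomial 0x1021, initial value 0xFFFF, no bit reflection, and a final XOR of
 0xFFFF. The function name is hypothetical, and the parameters are taken from
 the usual definition of this CRC variant rather than from this document.

```cpp
#include <cstddef>
#include <cstdint>

// Bit-by-bit CRC-16/genibus: polynomial 0x1021, initial value 0xFFFF,
// MSB-first processing (no reflection), final XOR 0xFFFF.
inline uint16_t crc16_genibus_bitwise(const void* data, size_t size)
{
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < size; i++)
    {
        // Fold the next byte into the top 8 bits of the register.
        crc = static_cast<uint16_t>(crc ^ (static_cast<uint16_t>(p[i]) << 8));
        for (int bit = 0; bit < 8; bit++)
        {
            if (crc & 0x8000)
                crc = static_cast<uint16_t>((crc << 1) ^ 0x1021);
            else
                crc = static_cast<uint16_t>(crc << 1);
        }
    }
    return static_cast<uint16_t>(~crc);
}
```

 The published check value for this variant over the nine ASCII bytes
 "123456789" is 0xD64E, which makes a convenient unit test for either version.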

 6.0 Compressed Huffman Tables
 -----------------------------

 ETC1S format .basis files rely heavily on static [canonical Huffman
 prefix coding](https://en.wikipedia.org/wiki/Canonical_Huffman_code).  Multiple
 Huffman tables are used by each compressed section. Huffman codes are stored in
 each output byte in LSB to MSB order. (This is opposite of the JPEG format,
 which stores the codes in MSB to LSB order.)
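
 The LSB to MSB order can be sketched with a small bit reader. This is an
 illustration only: the class is hypothetical, it reads one bit at a time for
 clarity rather than speed, and it assumes that raw bit fields fetched with
 get_bits() follow the same LSB-first order as the Huffman codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical LSB-first bit reader: bit 0 of each byte is consumed first.
class bit_reader
{
public:
    bit_reader(const uint8_t* data, size_t size)
        : m_data(data), m_size(size), m_bit_pos(0) {}

    // Reads num_bits (0..32) bits. The first bit read becomes bit 0 of the
    // result. Bits past the end of the buffer read as 0.
    uint32_t get_bits(uint32_t num_bits)
    {
        assert(num_bits <= 32);
        uint32_t v = 0;
        for (uint32_t i = 0; i < num_bits; i++, m_bit_pos++)
        {
            const size_t byte_index = m_bit_pos >> 3;
            if (byte_index < m_size)
            {
                const uint32_t bit = (m_data[byte_index] >> (m_bit_pos & 7)) & 1;
                v |= bit << i;
            }
        }
        return v;
    }

private:
    const uint8_t* m_data;
    size_t m_size;
    size_t m_bit_pos;
};
```

 With the bytes { 0xB5, 0x01 }, reading 3 bits, then 5 bits, then 1 bit yields
 5, 22 and 1.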

 Huffman coding in .basis is compatible with the canonical Huffman methods used
 by Deflate encoders/decoders. Section 3.2.2 of [Deflate - RFC
 1951](https://tools.ietf.org/html/rfc1951) describes how to compute the
 value of each Huffman code given an array of symbol codelengths. This document
 assumes familiarity with how Huffman coding works in Deflate.

 First, some enums:

 enum
 {
     // Max supported Huffman code size is 16-bits
     cHuffmanMaxSupportedCodeSize = 16,

     // The maximum number of symbols is 2^14
     cHuffmanMaxSymsLog2 = 14,
     cHuffmanMaxSyms = 1 << cHuffmanMaxSymsLog2,

     // Small zero runs may range from 3-10 entries
     cHuffmanSmallZeroRunSizeMin = 3,
     cHuffmanSmallZeroRunSizeMax = 10,
     cHuffmanSmallZeroRunExtraBits = 3,

     // Big zero runs may range from 11-138 entries
     cHuffmanBigZeroRunSizeMin = 11,
     cHuffmanBigZeroRunSizeMax = 138,
     cHuffmanBigZeroRunExtraBits = 7,

     // Small non-zero runs may range from 3-6 entries
     cHuffmanSmallRepeatSizeMin = 3,
     cHuffmanSmallRepeatSizeMax = 6,
     cHuffmanSmallRepeatExtraBits = 2,

     // Big non-zero run may range from 7-134 entries
     cHuffmanBigRepeatSizeMin = 7,
     cHuffmanBigRepeatSizeMax = 134,
     cHuffmanBigRepeatExtraBits = 7,

     // There are a maximum of 21 symbols in a compressed Huffman code length table.
     cHuffmanTotalCodelengthCodes = 21,

     // Symbols [0,16] indicate code sizes. Other symbols indicate zero runs or repeats:
     cHuffmanSmallZeroRunCode = 17,
     cHuffmanBigZeroRunCode = 18,
     cHuffmanSmallRepeatCode = 19,
     cHuffmanBigRepeatCode = 20
 };

 A .basis Huffman table consists of 1 to cHuffmanMaxSyms symbols. Each compressed
 Huffman table is described by an array of symbol code lengths in bits.

 The table's symbol code lengths are themselves RLE+Huffman coded, just like
 Deflate. (Note this can be confusing to developers unfamiliar with Deflate.)
 Each table begins with a small fixed header:

     14 bits: total_used_syms [1, cHuffmanMaxSyms]
     5 bits: num_codelength_codes [1, cHuffmanTotalCodelengthCodes]

 Next come the code lengths for the small Huffman table that is used to send the compressed codelengths (and RLE/repeat codes). They are sent uncompressed, but in a reordered manner:

     3*num_codelength_codes bits: Code size of each Huffman symbol for the compressed Huffman codelength table.

     These code lengths are sent in this order (to help reduce the number that must be sent):

     {
         cHuffmanSmallZeroRunCode, cHuffmanBigZeroRunCode, cHuffmanSmallRepeatCode, cHuffmanBigRepeatCode,
         0, 8, 7, 9, 6, 0xA, 5, 0xB, 4, 0xC, 3, 0xD, 2, 0xE, 1, 0xF, 0x10
     };

 A canonical Huffman decoding table (of up to 21 symbols) should be built from
 these code lengths. Immediately following this data are the Huffman symbols
 (sometimes intermixed with raw bits) which describe how to unpack the
 codelengths of each symbol in the Huffman table:

     - Symbols [0,16] indicate a specific symbol code length in bits.

     - Symbol cHuffmanSmallZeroRunCode (17) indicates a short run of symbols with 0 bit code lengths.
       cHuffmanSmallZeroRunExtraBits (3) bits are sent after this symbol, which indicates the run's size after adding the minimum size (cHuffmanSmallZeroRunSizeMin).

     - Symbol cHuffmanBigZeroRunCode (18) indicates a long run of symbols with 0 bit code lengths.
       cHuffmanBigZeroRunExtraBits (7) bits are sent after this symbol, which indicates the run's size after adding the minimum size (cHuffmanBigZeroRunSizeMin).

     - Symbol cHuffmanSmallRepeatCode (19) indicates a short run of symbols that repeat the previous symbol's code length.
       cHuffmanSmallRepeatExtraBits (2) bits are sent after this symbol, which indicates the number of times to repeat the previous symbol's code length,
       after adding the minimum size (cHuffmanSmallRepeatSizeMin).
       Cannot be the first symbol, and the previous symbol cannot have a code length of 0.

     - Symbol cHuffmanBigRepeatCode (20) indicates a long run of symbols that repeat the previous symbol's code length.
       cHuffmanBigRepeatExtraBits (7) bits are sent after this symbol, which indicates the number of times to repeat the previous symbol's code length,
       after adding the minimum size (cHuffmanBigRepeatSizeMin).
       Cannot be the first symbol, and the previous symbol cannot have a code length of 0.

 There should be exactly total_used_syms code lengths stored in the compressed Huffman table. If not, the stream is either corrupted or invalid.
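
 The expansion rules above can be sketched as follows. To keep the example
 self-contained it works on a list of already decoded codelength symbols, each
 paired with the value of its extra bits, instead of reading a bitstream. The
 struct and function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// One decoded codelength-table symbol plus the raw extra bits that followed it.
struct codelength_token
{
    uint32_t sym;    // 0..20
    uint32_t extra;  // value of the extra bits for symbols 17..20, else unused
};

// Expands tokens into per-symbol code lengths.
// Returns false if the token stream is invalid.
inline bool expand_code_lengths(const std::vector<codelength_token>& tokens,
                                uint32_t total_used_syms,
                                std::vector<uint8_t>& lengths)
{
    lengths.clear();
    for (const codelength_token& t : tokens)
    {
        uint32_t run = 1;
        uint8_t len = 0;

        if (t.sym <= 16)
            len = static_cast<uint8_t>(t.sym);  // literal code length
        else if (t.sym == 17)
            run = t.extra + 3;                  // cHuffmanSmallZeroRunSizeMin
        else if (t.sym == 18)
            run = t.extra + 11;                 // cHuffmanBigZeroRunSizeMin
        else if (t.sym == 19 || t.sym == 20)
        {
            // A repeat needs a previous, non-zero code length.
            if (lengths.empty() || lengths.back() == 0)
                return false;
            len = lengths.back();
            run = t.extra + (t.sym == 19 ? 3 : 7);  // small / big repeat minimum
        }
        else
            return false;

        if (lengths.size() + run > total_used_syms)
            return false;
        lengths.insert(lengths.end(), run, len);
    }
    // Exactly total_used_syms code lengths must have been produced.
    return lengths.size() == total_used_syms;
}
```

 For example, the tokens (5), (19, extra 1), (17, extra 0), (2) expand to the
 nine code lengths 5,5,5,5,5,0,0,0,2.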

 After all the symbol codelengths are uncompressed, the symbol codes can be computed and the canonical Huffman decoding tables can be built.
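
 Here's a minimal sketch of the code assignment step, following the three steps
 given in section 3.2.2 of RFC 1951. The function name is hypothetical. The
 returned values are the canonical codes as defined by the RFC; how a decoder
 matches them against the LSB-first bitstream is an implementation detail not
 covered here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assigns canonical Huffman codes from code lengths (RFC 1951, section 3.2.2).
// Symbols with length 0 are unused and are given code 0.
inline std::vector<uint32_t> assign_canonical_codes(const std::vector<uint8_t>& lengths)
{
    const uint32_t MAX_CODE_SIZE = 16;  // cHuffmanMaxSupportedCodeSize

    // Step 1: count the number of codes of each length.
    uint32_t bl_count[MAX_CODE_SIZE + 1] = {};
    for (uint8_t len : lengths)
        if (len > 0 && len <= MAX_CODE_SIZE)
            bl_count[len]++;

    // Step 2: find the numerically smallest code for each length.
    uint32_t next_code[MAX_CODE_SIZE + 1] = {};
    uint32_t code = 0;
    for (uint32_t bits = 1; bits <= MAX_CODE_SIZE; bits++)
    {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: hand out consecutive codes in symbol order.
    std::vector<uint32_t> codes(lengths.size(), 0);
    for (size_t i = 0; i < lengths.size(); i++)
        if (lengths[i] > 0 && lengths[i] <= MAX_CODE_SIZE)
            codes[i] = next_code[lengths[i]]++;
    return codes;
}
```

 Using the example from the RFC, the code lengths (3, 3, 3, 3, 3, 2, 4, 4)
 produce the codes 010, 011, 100, 101, 110, 00, 1110, 1111.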

 7.0 ETC1S Endpoint Codebooks
 ----------------------------

 The endpoint codebook section starts at file offset
 basis_file_header::m_endpoint_cb_file_ofs and is m_endpoint_cb_file_size bytes
 long. The endpoint codebook will have basis_file_header::m_total_endpoints total
 entries.

 At the beginning of the compressed endpoint codebook section are four compressed
 Huffman tables, stored using the procedure outlined in section 6.0. The Huffman tables
 appear in this order:

     1. color5_delta_model0
     2. color5_delta_model1
     3. color5_delta_model2
     4. inten_delta_model

 Following the data for these Huffman tables is a single 1-bit code which
 indicates if the color endpoint codebook is grayscale or not.

 Immediately following this code is the compressed color endpoint codebook data.
 A simple form of DPCM (Differential Pulse Code Modulation) coding is used to send the
 ETC1S intensity table indices and color values. Here is the procedure to decode
 the endpoint codebook:

     const int COLOR5_PAL0_PREV_HI = 9, COLOR5_PAL0_DELTA_LO = -9, COLOR5_PAL0_DELTA_HI = 31;
     const int COLOR5_PAL1_PREV_HI = 21, COLOR5_PAL1_DELTA_LO = -21, COLOR5_PAL1_DELTA_HI = 21;
     const int COLOR5_PAL2_PREV_HI = 31, COLOR5_PAL2_DELTA_LO = -31, COLOR5_PAL2_DELTA_HI = 9;

     // Assume previous endpoint color is (16, 16, 16), and the previous intensity is 0.
     color32 prev_color5(16, 16, 16, 0);
     uint32_t prev_inten = 0;

     // For each endpoint codebook entry
     for (uint32_t i = 0; i < num_endpoints; i++)
     {
         // Decode the intensity delta Huffman code
         uint32_t inten_delta = decode_huffman(inten_delta_model);
         endpoints[i].m_inten5 = static_cast<uint8_t>((inten_delta + prev_inten) & 7);
         prev_inten = endpoints[i].m_inten5;

         // Now decode the endpoint entry's color or intensity value
         for (uint32_t c = 0; c < (endpoints_are_grayscale ? 1U : 3U); c++)
         {
             // The Huffman table used to decode the delta depends on the previous color's value
             int delta;
             if (prev_color5[c] <= COLOR5_PAL0_PREV_HI)
                 delta = decode_huffman(color5_delta_model0);
             else if (prev_color5[c] <= COLOR5_PAL1_PREV_HI)
                 delta = decode_huffman(color5_delta_model1);
             else
                 delta = decode_huffman(color5_delta_model2);

             // Apply the delta
             int v = (prev_color5[c] + delta) & 31;

             endpoints[i].m_color5[c] = static_cast<uint8_t>(v);

             prev_color5[c] = static_cast<uint8_t>(v);
         }

         // If the endpoints are grayscale, set G and B to match R.
         if (endpoints_are_grayscale)
         {
             endpoints[i].m_color5[1] = endpoints[i].m_color5[0];
             endpoints[i].m_color5[2] = endpoints[i].m_color5[0];
         }
     }

 The rest of the section's data (if any) can be ignored.

 8.0 ETC1S Selector Codebooks
 ----------------------------

 The selector codebook section starts at file offset
 basis_file_header::m_selector_cb_file_ofs and is m_selector_cb_file_size bytes
 long. The selector codebook will have basis_file_header::m_total_selectors total
 entries.

 The first bit of this section indicates if "global" selector codebooks are used.
 Basis Universal doesn't currently utilize global selector codebooks, so this bit
 should always be 0.

 The second bit of this section indicates if "hybrid" global/local selector
 codebooks are used. Hybrid codebooks are not supported either, so this bit
 should always be 0.

 The third bit indicates if the selector codebook has been sent in raw form
 (uncompressed). If it's set, each selector is sent as four 8-bit bytes. Each
 byte corresponds to four 2-bit ETC1S selectors. The first selector of each group
 of 4 selectors starts at the LSB (least significant bit) of each byte, and is
 2-bits wide.
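
 The packing of one raw selector byte can be sketched like this. The function
 name is hypothetical; the shift and mask match the set_selector() calls used in
 the decoding procedure in this section.

```cpp
#include <cstdint>

// Unpacks one raw selector byte into four 2-bit selectors.
// out[0] comes from the two least significant bits of the byte.
inline void unpack_selector_byte(uint8_t packed, uint8_t out[4])
{
    for (uint32_t k = 0; k < 4; k++)
        out[k] = static_cast<uint8_t>((packed >> (k * 2)) & 3);
}
```

 For example, the byte 0xE4 (binary 11 10 01 00) unpacks to the selectors
 0, 1, 2, 3.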

 If the third bit is 0, the selectors have been DPCM coded with Huffman coding.
 The "delta_selector_pal_model" Huffman table will immediately follow the third
 bit, and is stored using the procedure outlined in section 6.0.

 Immediately following the Huffman table is the compressed selector codebook.
 Here is the DPCM decoding procedure:

         uint8_t prev_bytes[4] = { 0, 0, 0, 0 };

         for (uint32_t i = 0; i < num_selectors; i++)
         {
             if (!i)
             {
                 // First selector is sent raw
                 for (uint32_t j = 0; j < 4; j++)
                 {
                     uint32_t cur_byte = get_bits(8);
                     prev_bytes[j] = static_cast<uint8_t>(cur_byte);

                     for (uint32_t k = 0; k < 4; k++)
                         selectors[i].set_selector(k, j, (cur_byte >> (k * 2)) & 3);
                 }
                 selectors[i].init_flags();
                 continue;
             }

             // Subsequent selectors are sent with a simple form of byte-wise DPCM coding.
             for (uint32_t j = 0; j < 4; j++)
             {
                 int delta_byte = decode_huffman(delta_selector_pal_model);

                 uint32_t cur_byte = delta_byte ^ prev_bytes[j];
                 prev_bytes[j] = static_cast<uint8_t>(cur_byte);

                 for (uint32_t k = 0; k < 4; k++)
                     selectors[i].set_selector(k, j, (cur_byte >> (k * 2)) & 3);
             }
         }

 Any bytes in this section following the selector codebook bits can be safely ignored.

 9.0 ETC1S Compressed Slice Decoding Huffman Tables
 --------------------------------------------------

 Each ETC1S slice is compressed with four Huffman tables stored using the
 procedure outlined in section 6.0. These Huffman tables are stored at file
 offset basis_file_header::m_tables_file_ofs. This section will be
 basis_file_header::m_tables_file_size bytes long.

 The following four Huffman tables are sent, in this order:

     1. endpoint_pred_model
     2. delta_endpoint_model
     3. selector_model
     4. selector_history_buf_rle_model

 Following the last Huffman table are 13 bits indicating the size of the selector
 history buffer. Any remaining bits may be safely ignored.

 10.0 ETC1S Slice Decoding
 -------------------------

 ETC1S slices consist of a compressed 2D array of ETC1S blocks, always compressed
 in top-down/left-right raster order. For texture video, the previous slice's
 already decoded contents may be referred to when blocks are encoded using
 Conditional Replenishment (also known as "skip blocks").

 Each ETC1S block is encoded by using references to the color endpoint codebook
 and the selector codebook. Sections 10.1 and 10.2 describe the helper procedures
 used by the decoder, and section 10.3 describes how the array of ETC1S blocks
 is actually decoded.

 10.1 ETC1S Approximate Move to Front Routines
 ---------------------------------------------

 An approximate Move to Front (MTF) approach is used to efficiently encode the
 selector codebook references. Here is an example C++ class for approximate MTF
 decoding:

     class approx_move_to_front
     {
     public:
         approx_move_to_front(uint32_t n)
         {
             init(n);
         }

         void init(uint32_t n)
         {
             m_values.resize(n);
             m_rover = n / 2;
         }

         size_t size() const { return m_values.size(); }

         const int& operator[] (uint32_t index) const { return m_values[index]; }
               int operator[] (uint32_t index)        { return m_values[index]; }

         void add(int new_value)
         {
             m_values[m_rover++] = new_value;
             if (m_rover == m_values.size())
                 m_rover = (uint32_t)m_values.size() / 2;
         }

         void use(uint32_t index)
         {
             if (index)
             {
                 int x = m_values[index / 2];
                 int y = m_values[index];
                 m_values[index / 2] = y;
                 m_values[index] = x;
             }
         }

     private:
         std::vector<int> m_values;
         uint32_t m_rover;
     };

 10.2 ETC1S VLC Decoding Procedure
 ---------------------------------

 ETC1S slice decoding utilizes a simple Variable Length Coding (VLC) scheme that
 sends raw bits using variable-size chunks. Here is the VLC decoding procedure:

     uint32_t decode_vlc(uint32_t chunk_bits)
     {
         assert(chunk_bits);

         const uint32_t chunk_size = 1 << chunk_bits;
         const uint32_t chunk_mask = chunk_size - 1;

         uint32_t v = 0;
         uint32_t ofs = 0;

         for ( ; ; )
         {
             uint32_t s = get_bits(chunk_bits + 1);
             v |= ((s & chunk_mask) << ofs);
             ofs += chunk_bits;

             if ((s & chunk_size) == 0)
                 break;

             if (ofs >= 32)
             {
                 assert(0);
                 break;
             }
         }

         return v;
     }
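
 To make the chunk layout concrete, here's a sketch of an encoder counterpart
 together with a decoder that works on a list of symbols rather than a
 bitstream. Both function names are hypothetical, and the encoder is inferred
 from the decoding procedure above; it is not taken from the reference encoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder: splits v into chunk_bits-wide pieces, lowest piece first.
// Each emitted symbol is (chunk_bits + 1) bits wide. Its top bit is set when
// more chunks follow.
inline std::vector<uint32_t> encode_vlc_symbols(uint32_t v, uint32_t chunk_bits)
{
    assert(chunk_bits >= 1 && chunk_bits <= 16);
    const uint32_t chunk_size = 1u << chunk_bits;
    const uint32_t chunk_mask = chunk_size - 1;

    std::vector<uint32_t> symbols;
    do
    {
        uint32_t s = v & chunk_mask;
        v >>= chunk_bits;
        if (v)
            s |= chunk_size;  // continuation flag
        symbols.push_back(s);
    } while (v);
    return symbols;
}

// Decoder counterpart that consumes the symbol list instead of calling get_bits().
inline uint32_t decode_vlc_symbols(const std::vector<uint32_t>& symbols, uint32_t chunk_bits)
{
    assert(chunk_bits >= 1 && chunk_bits <= 16);
    const uint32_t chunk_size = 1u << chunk_bits;
    const uint32_t chunk_mask = chunk_size - 1;

    uint32_t v = 0, ofs = 0;
    for (uint32_t s : symbols)
    {
        v |= (s & chunk_mask) << ofs;
        ofs += chunk_bits;
        if ((s & chunk_size) == 0 || ofs >= 32)
            break;
    }
    return v;
}
```

 With chunk_bits = 4, the value 300 (0x12C) becomes the three 5-bit symbols
 0x1C, 0x12, 0x01, and decoding those symbols returns 300 again.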

 10.3 ETC1S Slice Block Decoding
 -------------------------------

 Each slice has a corresponding "basis_slice_desc" structure, described in section
 4.2. The slice's dimensions in ETC1S blocks are stored in
 basis_slice_desc::m_num_blocks_x and basis_slice_desc::m_num_blocks_y. Each
 slice is located at file offset basis_slice_desc::m_file_ofs, and is
 basis_slice_desc::m_file_size bytes long.

 The decoder iterates through all the slice blocks in top-down, left-right raster
 order. Each block is represented by an index into the color endpoint codebook
 and another index into the selector codebook. The endpoint codebook
 contains each ETC1S block's base RGB color and intensity table information, and
 the selector codebook contains each block's 4x4 array of texel selectors (2 bits
 each). This is all the information needed to fully represent the
 texels within each block.

 The decoding procedure loops over all the blocks in raster order, and decodes
 the endpoint and selector indices used to represent each block. The decoding
 procedure is complex enough that commented code is best used to describe it.

 Here's the slice decoding procedure. This block of code shows the block loop,
 and how endpoint codebook indices are decoded. The next block of code shows how
 selector codebook indices are decoded.

     // Constants used by the decoder
     const uint32_t ENDPOINT_PRED_TOTAL_SYMBOLS = (4 * 4 * 4 * 4) + 1;
     const uint32_t ENDPOINT_PRED_REPEAT_LAST_SYMBOL = ENDPOINT_PRED_TOTAL_SYMBOLS - 1;
     const uint32_t ENDPOINT_PRED_MIN_REPEAT_COUNT = 3;
     const uint32_t ENDPOINT_PRED_COUNT_VLC_BITS = 4;

     const uint32_t NUM_ENDPOINT_PREDS = 3;
     const uint32_t CR_ENDPOINT_PRED_INDEX = NUM_ENDPOINT_PREDS - 1;
     const uint32_t NO_ENDPOINT_PRED_INDEX = 3;

     // Endpoint/selector codebooks - decoded previously. See sections 7.0 and 8.0.
     endpoint endpoints[endpoint_codebook_size];
     selector selectors[selector_codebook_size];

     // Array of per-block values used for endpoint index prediction (enough for 2 rows).
     struct block_preds
     {
         uint16_t m_endpoint_index;
         uint8_t m_pred_bits;
     };
     block_preds block_endpoint_preds[2][num_blocks_x];

     // Some constants and state used during block decoding
     const uint32_t SELECTOR_HISTORY_BUF_FIRST_SYMBOL_INDEX = selector_codebook_size;
     const uint32_t SELECTOR_HISTORY_BUF_RLE_SYMBOL_INDEX = selector_history_buf_size + SELECTOR_HISTORY_BUF_FIRST_SYMBOL_INDEX;
     uint32_t cur_selector_rle_count = 0;

     uint32_t cur_pred_bits = 0;
     int prev_endpoint_pred_sym = 0;
     int endpoint_pred_repeat_count = 0;
     uint32_t prev_endpoint_index = 0;

     // This array is only used for texture video. It holds the previous frame's endpoint and selector indices (each 16-bits, for 32-bits total).
     uint32_t prev_frame_indices[num_blocks_x][num_blocks_y];

     // Selector history buffer - See section 10.1.
     // For the selector history buffer's size, see section 9.0.
     approx_move_to_front selector_history_buf(selector_history_buf_size);

     // Loop over all slice blocks in raster order
     for (uint32_t block_y = 0; block_y < num_blocks_y; block_y++)
     {
         // The index into the block_endpoint_preds array
         const uint32_t cur_block_endpoint_pred_array = block_y & 1;

         for (uint32_t block_x = 0; block_x < num_blocks_x; block_x++)
         {
             // Check if we're at the start of a 2x2 block group.
             if ((block_x & 1) == 0)
             {
                 // Are we on an even or odd row of blocks?
                 if ((block_y & 1) == 0)
                 {
                     // We're on an even row and column of blocks. Decode the combined endpoint index predictor symbols for 2x2 blocks.
                     // This symbol tells the decoder how the endpoints are decoded for each block in a 2x2 group of blocks.

                     // Are we in an RLE run?
                     if (endpoint_pred_repeat_count)
                     {
                         // Inside a run of endpoint predictor symbols.
                         endpoint_pred_repeat_count--;
                         cur_pred_bits = prev_endpoint_pred_sym;
                     }
                     else
                     {
                         // Decode the endpoint prediction symbol, using the "endpoint pred" Huffman table (see section 9.0).
                         cur_pred_bits = decode_huffman(endpoint_pred_model);
                         if (cur_pred_bits == ENDPOINT_PRED_REPEAT_LAST_SYMBOL)
                         {
                             // It's a run of symbols, so decode the count using VLC decoding (see section 10.2)
                             endpoint_pred_repeat_count = decode_vlc(ENDPOINT_PRED_COUNT_VLC_BITS) + ENDPOINT_PRED_MIN_REPEAT_COUNT - 1;

                             cur_pred_bits = prev_endpoint_pred_sym;
                         }
                         else
                         {
                             // It's not a run of symbols
                             prev_endpoint_pred_sym = cur_pred_bits;
                         }
                     }

                     // The symbol has enough endpoint prediction information for 4 blocks (2 bits per block), so 8 bits total.
                     // Remember the prediction information we should use for the next row of 2 blocks beneath the current block.
                     block_endpoint_preds[cur_block_endpoint_pred_array ^ 1][block_x].m_pred_bits = (uint8_t)(cur_pred_bits >> 4);
                 }
                 else
                 {
                     // We're on an odd row of blocks, so use the endpoint prediction information we previously stored on the previous even row.
                     cur_pred_bits = block_endpoint_preds[cur_block_endpoint_pred_array][block_x].m_pred_bits;
                 }
             }

             // Decode the current block's endpoint and selector indices.
             uint32_t endpoint_index, selector_index = 0;

             // Get the 2-bit endpoint prediction index for this block.
             const uint32_t pred = cur_pred_bits & 3;

             // Get the next block's endpoint prediction bits ready.
             cur_pred_bits >>= 2;

             // Now check to see if we should reuse a previously encoded block's endpoints.
             if (pred == 0)
             {
                 // Reuse the left block's endpoint index
                 assert(block_x > 0);
                 endpoint_index = prev_endpoint_index;
             }
             else if (pred == 1)
             {
                 // Reuse the upper block's endpoint index
                 assert(block_y > 0);
                 endpoint_index = block_endpoint_preds[cur_block_endpoint_pred_array ^ 1][block_x].m_endpoint_index;
             }
             else if (pred == 2)
             {
                 if (is_video)
                 {
                     // If it's texture video, reuse the previous frame's endpoint index, at this block.
                     assert(pred == CR_ENDPOINT_PRED_INDEX);
                     endpoint_index = prev_frame_indices[block_x][block_y];
                     selector_index = endpoint_index >> 16;
                     endpoint_index &= 0xFFFFU;
                 }
                 else
                 {
                     // Reuse the upper left block's endpoint index.
                     assert((block_x > 0) && (block_y > 0));
                     endpoint_index = block_endpoint_preds[cur_block_endpoint_pred_array ^ 1][block_x - 1].m_endpoint_index;
                 }
             }
             else
             {
                 // We need to decode and apply a DPCM encoded delta to the previously used endpoint index.
                 // This uses the delta endpoint Huffman table (see section 9.0).
                 const uint32_t delta_sym = decode_huffman(delta_endpoint_model);

                 endpoint_index = delta_sym + prev_endpoint_index;

                 // Wrap around if the index goes beyond the end of the endpoint codebook
                 if (endpoint_index >= endpoints.size())
                     endpoint_index -= (int)endpoints.size();
             }

             // Remember the endpoint index we used on this block, so the next row can potentially reuse the index.
             block_endpoint_preds[cur_block_endpoint_pred_array][block_x].m_endpoint_index = (uint16_t)endpoint_index;

             // Remember the endpoint index used
             prev_endpoint_index = endpoint_index;

             // Now we have fully decoded the ETC1S endpoint codebook index, in endpoint_index.

             // Now decode the selector index (see the next block of code, below).
             < selector decoding - see below >

         } // block_x
     } // block_y

 The compressed format allows the encoder to reuse the endpoint index used by
 the previous block, the block immediately above the current block, or the
 block to the upper left (if the file is not texture video). Alternatively, the
 encoder can send a Huffman coded, DPCM encoded delta relative to the
 previously used endpoint index.
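
 The DPCM path in the listing above can be isolated into a small helper, sketched
 below. The function names are hypothetical and are not part of the reference
 decoder. The sketch assumes the delta symbol is always smaller than the codebook
 size, so a single subtraction is enough to wrap. The encoder-side inverse is an
 inference from the decoder logic, not something this specification states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: applies a decoded DPCM delta symbol to the previously
// used endpoint index, wrapping once past the end of the endpoint codebook.
// Assumes delta_sym < num_endpoints.
inline uint32_t apply_endpoint_delta(uint32_t prev_endpoint_index,
                                     uint32_t delta_sym,
                                     uint32_t num_endpoints)
{
    assert(num_endpoints > 0);
    assert(prev_endpoint_index < num_endpoints);
    assert(delta_sym < num_endpoints);

    uint32_t endpoint_index = prev_endpoint_index + delta_sym;
    if (endpoint_index >= num_endpoints)
        endpoint_index -= num_endpoints;
    return endpoint_index;
}

// Hypothetical encoder-side inverse: the delta symbol that makes
// apply_endpoint_delta() return cur_endpoint_index.
inline uint32_t compute_endpoint_delta(uint32_t prev_endpoint_index,
                                       uint32_t cur_endpoint_index,
                                       uint32_t num_endpoints)
{
    assert(prev_endpoint_index < num_endpoints);
    assert(cur_endpoint_index < num_endpoints);

    if (cur_endpoint_index >= prev_endpoint_index)
        return cur_endpoint_index - prev_endpoint_index;
    return cur_endpoint_index + num_endpoints - prev_endpoint_index;
}
```

 With a 16 entry codebook, a previous index of 14 and a delta symbol of 5 wrap
 around to endpoint index 3.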

 The type of prediction used by the encoder is selected by the "endpoint
 pred" (endpoint prediction) indices, which are sent with Huffman coding (using
 the "endpoint_pred_model" table described in Section 9.0) once per 2x2 group of blocks.

 For texture video, the endpoint prediction symbol normally used to refer to the
 upper left block (endpoint pred index 2) instead indicates that both the
 endpoint and selector indices from the previous frame's block should be reused
 on the current frame's block. The endpoint pred indices are RLE coded, so this
 allows the encoder to efficiently skip over a large number of unchanged blocks
 in a video sequence.
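
 The per-block state kept between video frames is a single 32-bit value, as the
 decoder listings show: the endpoint index occupies the low 16 bits and the
 selector index the high 16 bits. A minimal sketch of that packing follows. The
 struct and function names are hypothetical, and the sketch assumes both indices
 fit in 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for one block's codebook indices.
struct BlockIndices
{
    uint32_t endpoint_index;
    uint32_t selector_index;
};

// Packs the indices the same way the decoder stores them in prev_frame_indices:
// endpoint index in bits 0..15, selector index in bits 16..31.
inline uint32_t pack_prev_frame_indices(uint32_t endpoint_index, uint32_t selector_index)
{
    assert(endpoint_index <= 0xFFFFU);
    assert(selector_index <= 0xFFFFU);
    return endpoint_index | (selector_index << 16);
}

// Recovers both indices for a skip block (endpoint pred index 2 in texture video).
inline BlockIndices unpack_prev_frame_indices(uint32_t packed)
{
    BlockIndices result;
    result.endpoint_index = packed & 0xFFFFU;
    result.selector_index = packed >> 16;
    return result;
}
```

 A skip block therefore costs no selector symbol at all: both indices come from
 the stored value for the same block position in the previous frame.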

 The code to decode the selector codebook index immediately follows the code above for decoding the endpoint indices:

     const uint32_t MAX_SELECTOR_HISTORY_BUF_SIZE = 64;
     const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH = 3;
     const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_BITS = 6;
     const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_TOTAL = (1 << SELECTOR_HISTORY_BUF_RLE_COUNT_BITS);

     // Decode the selector index, unless it's texture video and the endpoint predictor indicated that
     // both the endpoint and selector indices are reused from the previous frame's block.
     if ((!is_video) || (pred != CR_ENDPOINT_PRED_INDEX))
     {
         int selector_sym;

         // Are we in a selector RLE run?
         if (cur_selector_rle_count > 0)
         {
             // Handle selector RLE run.
             cur_selector_rle_count--;

             selector_sym = (int)selectors.size();
         }
         else
         {
             // Decode the selector symbol, using the selector Huffman table (see section 9.0).
             selector_sym = decode_huffman(m_selector_model);

             // Is it a run?
             if (selector_sym == static_cast<int>(SELECTOR_HISTORY_BUF_RLE_SYMBOL_INDEX))
             {
                 // Decode the selector run's size, using the selector history buf RLE Huffman table (see section 9.0).
                 int run_sym = decode_huffman(selector_history_buf_rle_model);

                 // Is it a very long run?
                 if (run_sym == (SELECTOR_HISTORY_BUF_RLE_COUNT_TOTAL - 1))
                     cur_selector_rle_count = decode_vlc(7) + SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH;
                 else
                     cur_selector_rle_count = run_sym + SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH;

                 selector_sym = (int)selectors.size();

                 cur_selector_rle_count--;
             }
         }

         // Is it a reference into the selector history buffer?
         if (selector_sym >= (int)selectors.size())
         {
             assert(m_selector_history_buf_size > 0);

             // Compute the history buffer index
             int history_buf_index = selector_sym - (int)selectors.size();

             assert(history_buf_index < selector_history_buf.size());

             // Access the history buffer
             selector_index = selector_history_buf[history_buf_index];

             // Update the history buffer
             if (history_buf_index != 0)
                 selector_history_buf.use(history_buf_index);
         }
         else
         {
             // It's an index into the selector codebook
             selector_index = selector_sym;

             // Add it to the selector history buffer
             if (m_selector_history_buf_size)
                 selector_history_buf.add(selector_index);
         }
     }

     // For texture video, remember the endpoint and selector indices used by the block on this frame, for later reuse on the next frame.
     if (is_video)
         prev_frame_indices[block_x][block_y] = endpoint_index | (selector_index << 16);

     // The block is fully decoded here. The codebook indices are endpoint_index and selector_index.
     // Make sure they are valid
     assert((endpoint_index < endpoints.size()) && (selector_index < selectors.size()));

 At this point, the decoder has decoded each block's endpoint and selector codebook indices.
 It can now fetch the actual ETC1S endpoints/selectors from the codebooks and write out ETC1S
 texture data, or it can immediately transcode the ETC1S data to another GPU texture format.
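
 The selector run-length rule in the listing above can be summarized by a small
 helper, sketched below. The function name and the callback parameter are
 hypothetical; the callback stands in for decode_vlc(7), whose bit-level
 behavior is defined elsewhere in this specification. The returned value is the
 total number of blocks in the run, including the block on which the run symbol
 was decoded.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH = 3;
const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_BITS = 6;
const uint32_t SELECTOR_HISTORY_BUF_RLE_COUNT_TOTAL = (1 << SELECTOR_HISTORY_BUF_RLE_COUNT_BITS);

// Hypothetical helper: converts a decoded run symbol into a run length.
// decode_vlc7 is only invoked for the escape symbol (the largest run symbol),
// which signals that the real count follows as a variable length code.
inline uint32_t selector_rle_run_length(uint32_t run_sym,
                                        const std::function<uint32_t()>& decode_vlc7)
{
    assert(run_sym < SELECTOR_HISTORY_BUF_RLE_COUNT_TOTAL);

    if (run_sym == SELECTOR_HISTORY_BUF_RLE_COUNT_TOTAL - 1)
        return decode_vlc7() + SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH;

    return run_sym + SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH;
}
```

 Every block inside a run sets selector_sym to selectors.size(), which resolves
 to history buffer index 0, so a run repeats the selector at the front of the
 history buffer. Runs shorter than SELECTOR_HISTORY_BUF_RLE_COUNT_THRESH blocks
 cannot be expressed with the run symbol.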

 11.0 Alpha Channels in ETC1S Format Files
 -----------------------------------------

 ETC1S .basis files can have optional alpha channels, stored in odd slices. If any slice needs an alpha channel,
 all slices must have alpha channels. In that case, the cBASISHeaderFlagHasAlphaSlices flag is set in
 basis_file_header::m_flags, and the file contains two slices for each mipmap level (or face, or video frame, etc.).
 The cSliceDescFlagsHasAlpha flag is set in basis_slice_desc::m_flags for every odd (alpha) slice.

 The even slices will contain the RGB data, and the odd slices will contain the alpha data, both stored in ETC1S
 format. Alpha channel ETC1S files must always have an even total number of slices. A decoder can first decode
 the RGB data slice, then the next alpha channel slice, or it can decode them in parallel using multithreading.
 The ETC1S green channel (on the odd slices) contains the alpha values.
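
 The even/odd slice layout can be checked with a few lines of code, sketched
 below. All names are hypothetical. The value of the cSliceDescFlagsHasAlpha bit
 is passed in as a parameter rather than hard-coded, since the authoritative
 value is given by the enums in section 3.0. The sketch also assumes that even
 (RGB) slices leave the flag clear, which the text above implies but does not
 state outright.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pair of slice indices for one image (mipmap level, face, or frame).
struct SlicePair
{
    size_t rgb_slice;
    size_t alpha_slice;
};

// In an alpha file, image N uses slice 2N for RGB and slice 2N+1 for alpha.
inline SlicePair slices_for_image(size_t image_index)
{
    SlicePair p;
    p.rgb_slice = 2 * image_index;
    p.alpha_slice = 2 * image_index + 1;
    return p;
}

// Returns true if the per-slice m_flags values are consistent with an ETC1S
// alpha file: a non-empty, even number of slices, with the alpha flag set on
// every odd slice and (by assumption) clear on every even slice.
inline bool alpha_slice_layout_is_valid(const std::vector<uint32_t>& slice_flags,
                                        uint32_t has_alpha_flag)
{
    if (slice_flags.empty() || (slice_flags.size() % 2) != 0)
        return false;

    for (size_t i = 0; i < slice_flags.size(); ++i)
    {
        const bool is_odd = (i & 1) != 0;
        const bool has_alpha = (slice_flags[i] & has_alpha_flag) != 0;
        if (is_odd != has_alpha)
            return false;
    }
    return true;
}
```

 A decoder would typically run a check like this once, after reading the slice
 description array and before decoding any slice data.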

 12.0 Texture Video
 ------------------

 Both ETC1S and UASTC format files support texture video. Texture video files can be optionally mipmapped, and can
 contain optional alpha channels (stored as separate slices in ETC1S format files). Currently, the first frame is
 always an i-frame, and all subsequent frames are p-frames, but the file format and transcoder support any
 frame being an i-frame (and the encoder will be enhanced to support this feature). Decoders must track the previously
 decoded frame's endpoints/selectors for all mipmap levels (if any), not just the top level's.

 Skip blocks always refer to the previous frame. i-frames cannot use skip blocks (encoded as endpoint predictor index 2).

 13.0 Example Bitstreams
 -----------------------

 This section will include several example .basis file bitstreams, along with their decoded equivalents, which should be helpful for new decoder verification.