| Name |
| |
| KHR_texture_compression_astc_hdr |
| |
| Name Strings |
| |
| GL_KHR_texture_compression_astc_hdr |
| GL_KHR_texture_compression_astc_ldr |
| |
| Contact |
| |
| Sean Ellis (sean.ellis 'at' arm.com) |
| Jon Leech (oddhack 'at' sonic.net) |
| |
| Contributors |
| |
| Sean Ellis, ARM |
| Jorn Nystad, ARM |
| Tom Olson, ARM |
| Andy Pomianowski, AMD |
| Cass Everitt, NVIDIA |
| Walter Donovan, NVIDIA |
| Robert Simpson, Qualcomm |
| Maurice Ribble, Qualcomm |
| Larry Seiler, Intel |
| Daniel Koch, NVIDIA |
| Anthony Wood, Imagination Technologies |
| Jon Leech |
| Andrew Garrard, Samsung |
| |
| IP Status |
| |
| No known issues. |
| |
| Notice |
| |
| Copyright (c) 2012-2016 The Khronos Group Inc. Copyright terms at |
| http://www.khronos.org/registry/speccopyright.html |
| |
| Specification Update Policy |
| |
| Khronos-approved extension specifications are updated in response to |
| issues and bugs prioritized by the Khronos OpenGL and OpenGL ES Working Groups. For |
| extensions which have been promoted to a core Specification, fixes will |
| first appear in the latest version of that core Specification, and will |
| eventually be backported to the extension document. This policy is |
| described in more detail at |
| https://www.khronos.org/registry/OpenGL/docs/update_policy.php |
| |
| Status |
| |
| Complete. |
| Approved by the ARB on 2012/06/18. |
| Approved by the OpenGL ES WG on 2012/06/15. |
| Ratified by the Khronos Board of Promoters on 2012/07/27 (LDR profile). |
| Ratified by the Khronos Board of Promoters on 2013/09/27 (HDR profile). |
| |
| Version |
| |
| Version 8, June 8, 2017 |
| |
| Number |
| |
| ARB Extension #118 |
| OpenGL ES Extension #117 |
| |
| Dependencies |
| |
| Written based on the wording of the OpenGL ES 3.1 (April 29, 2015) |
| Specification. |
| |
| May be implemented against any version of OpenGL or OpenGL ES supporting |
| compressed textures. |
| |
| Some of the functionality of these extensions is not supported if the |
| underlying implementation does not support cube map array textures. |
| |
| |
| Overview |
| |
| Adaptive Scalable Texture Compression (ASTC) is a new texture |
| compression technology that offers unprecedented flexibility, while |
| producing results better than or comparable to existing texture |
| compression schemes at all bit rates. It includes support for 2D and |
| slice-based 3D textures, with low and high dynamic range, at bitrates |
| from below 1 bit/pixel up to 8 bits/pixel in fine steps. |
| |
| The goal of these extensions is to support the full 2D profile of the |
| ASTC texture compression specification, and allow construction of 3D |
| textures from multiple compressed 2D slices. |
| |
| ASTC-compressed textures are handled in OpenGL ES and OpenGL by adding |
| new supported formats to the existing commands for defining and updating |
| compressed textures, and defining the interaction of the ASTC formats |
| with each texture target. |
| |
| New Procedures and Functions |
| |
| None |
| |
| New Tokens |
| |
| Accepted by the <format> parameter of CompressedTexSubImage2D and |
| CompressedTexSubImage3D, and by the <internalformat> parameter of |
| CompressedTexImage2D, CompressedTexImage3D, TexStorage2D, |
| TextureStorage2D, TexStorage3D, and TextureStorage3D: |
| |
| COMPRESSED_RGBA_ASTC_4x4_KHR 0x93B0 |
| COMPRESSED_RGBA_ASTC_5x4_KHR 0x93B1 |
| COMPRESSED_RGBA_ASTC_5x5_KHR 0x93B2 |
| COMPRESSED_RGBA_ASTC_6x5_KHR 0x93B3 |
| COMPRESSED_RGBA_ASTC_6x6_KHR 0x93B4 |
| COMPRESSED_RGBA_ASTC_8x5_KHR 0x93B5 |
| COMPRESSED_RGBA_ASTC_8x6_KHR 0x93B6 |
| COMPRESSED_RGBA_ASTC_8x8_KHR 0x93B7 |
| COMPRESSED_RGBA_ASTC_10x5_KHR 0x93B8 |
| COMPRESSED_RGBA_ASTC_10x6_KHR 0x93B9 |
| COMPRESSED_RGBA_ASTC_10x8_KHR 0x93BA |
| COMPRESSED_RGBA_ASTC_10x10_KHR 0x93BB |
| COMPRESSED_RGBA_ASTC_12x10_KHR 0x93BC |
| COMPRESSED_RGBA_ASTC_12x12_KHR 0x93BD |
| |
| COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR 0x93D0 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR 0x93D1 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR 0x93D2 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR 0x93D3 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR 0x93D4 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR 0x93D5 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR 0x93D6 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR 0x93D7 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR 0x93D8 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR 0x93D9 |
| COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR 0x93DA |
| COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR 0x93DB |
| COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR 0x93DC |
| COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR 0x93DD |
| |
| If extension "EXT_texture_storage" is supported, these tokens are also |
| accepted by TexStorage2DEXT, TextureStorage2DEXT, TexStorage3DEXT and |
| TextureStorage3DEXT. |
| |
| Additions to Chapter 8 of the OpenGL ES 3.1 Specification (Textures and Samplers) |
| |
| Add to Section 8.7 Compressed Texture Images: |
| |
| Modify table 8.19 (Compressed internal formats) to add all the ASTC |
| format tokens in the New Tokens section. The "Base Internal Format" |
| column is RGBA for all ASTC formats. |
| |
| Add a new column "Block Width x Height", which is 4x4 for all non-ASTC |
| formats in the table, and matches the size in the token name for ASTC |
| formats (e.g. COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR has a block size of |
| 10 x 8). |
| |
| Add a second new column "3D Tex." which is empty for all non-ASTC |
| formats. If only the LDR profile is supported by the implementation, |
| this column is also empty for all ASTC formats. If both the LDR and HDR |
| profiles are supported, this column is checked for all ASTC formats. |
| |
| Add a third new column "Cube Map Array Tex." which is empty for all |
| non-ASTC formats, and checked for all ASTC formats. |
| |
| Append to the table caption: |
| |
| "The "Block Size" column specifies the compressed block size of the |
| format. Modifying compressed images along aligned block boundaries is |
| possible, as described in this section. The "3D Tex." and "Cube Map |
| Array Tex." columns determine if 3D images composed of compressed 2D |
| slices, and cube map array textures respectively can be specified using |
| CompressedTexImage3D." |
| |
| Append to the paragraph at the bottom of p. 168: |
| |
| "If <internalformat> is one of the specific ... supports only |
| two-dimensional images. However, if the "3D Tex." column of table 8.19 |
| is checked, CompressedTexImage3D will accept a three-dimensional image |
| specified as an array of compressed data consisting of multiple rows of |
| compressed blocks laid out as described in section 8.5." |
| |
| Modify the second and third errors in the Errors section for |
| CompressedTexImage[23]D on p. 169, and add a new error: |
| |
| "An INVALID_VALUE error is generated by |
| |
| * CompressedTexImage2D if <target> is |
| one of the cube map face targets from table 8.21, and |
| * CompressedTexImage3D if <target> is TEXTURE_CUBE_MAP_ARRAY, |
| |
| and <width> and <height> are not equal. |
| |
| An INVALID_OPERATION error is generated by CompressedTexImage3D if |
| <internalformat> is one of the formats in table 8.19 and <target> is |
| not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, or TEXTURE_3D. |
| |
| An INVALID_OPERATION error is generated by CompressedTexImage3D if |
| <target> is TEXTURE_CUBE_MAP_ARRAY and the "Cube Map Array" |
| column of table 8.19 is *not* checked, or if <target> is |
| TEXTURE_3D and the "3D Tex." column of table 8.19 is *not* checked" |
| |
| Modify the fifth and sixth paragraphs on p. 170: |
| |
| "Since these specific compressed formats are easily edited along texel |
| block boundaries, the limitations on subimage location and size are |
| relaxed for CompressedTexSubImage2D and CompressedTexSubImage3D. |
| |
| The block width and height vary for different formats, as described in |
| table 8.19. The contents of any block of texels of a compressed texture |
| image in these specific compressed formats that does not intersect the |
| area being modified are preserved during CompressedTexSubImage* calls." |
| |
| Modify the second error in the Errors section for |
| CompressedTexSubImage[23]D on p. 170, and add a new error: |
| |
| "An INVALID_OPERATION error is generated by CompressedTexSubImage3D if |
| <format> is one of the formats in table 8.19 and <target> is not |
| TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, or TEXTURE_3D. |
| |
| An INVALID_OPERATION error is generated by CompressedTexSubImage3D if |
| <target> is TEXTURE_CUBE_MAP_ARRAY and the "Cube Map Array" column of |
| table 8.19 is *not* checked, or if <target> is TEXTURE_3D and the "3D |
| Tex." column of table 8.19 is *not* checked" |
| |
| Modify the final error in the same section, on p. 171: |
| |
| "An INVALID_OPERATION error is generated if <format> is one of the formats |
| in table 8.19 and any of the following conditions occurs. The block |
| width and height refer to the values in the corresponding column of the |
| table. |
| |
| * <width> is not a multiple of the format's block width, and <width> + |
| <xoffset> is not equal to the value of TEXTURE_WIDTH. |
| * <height> is not a multiple of the format's block height, and <height> |
| + <yoffset> is not equal to the value of TEXTURE_HEIGHT. |
| * <xoffset> or <yoffset> is not a multiple of the block width or |
| height, respectively." |
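| |
The three conditions above can be read as a single predicate. The following is a minimal sketch in C, not part of the specification: the function name astc_subimage_region_ok and its parameter names are hypothetical, tex_width and tex_height stand for the values of TEXTURE_WIDTH and TEXTURE_HEIGHT, and block_w and block_h are the block dimensions from table 8.19.

```c
#include <stdbool.h>

/* Returns true when the update region satisfies the alignment rules listed
 * above, and false when INVALID_OPERATION would be generated.
 * Hypothetical helper for illustration only. */
bool astc_subimage_region_ok(int xoffset, int yoffset, int width, int height,
                             int tex_width, int tex_height,
                             int block_w, int block_h)
{
    /* Offsets must always sit on a block boundary. */
    if (xoffset % block_w != 0 || yoffset % block_h != 0)
        return false;
    /* A partial block is only allowed at the right edge of the image. */
    if (width % block_w != 0 && xoffset + width != tex_width)
        return false;
    /* A partial block is only allowed at the bottom edge of the image. */
    if (height % block_h != 0 && yoffset + height != tex_height)
        return false;
    return true;
}
```

For a 25 x 20 texture with 10x8 blocks, a 5 x 4 region at offset (20, 16) passes because it ends exactly at the image edge, while a 5 x 8 region at offset (0, 0) does not.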
| |
| Modify table 8.24 (sRGB texture internal formats) to add all of the |
| COMPRESSED_SRGB8_ALPHA8_ASTC_*_KHR formats defined above. |
| |
| Additions to Appendix C of the OpenGL ES 3.1 Specification (Compressed |
| Texture Image Formats) |
| |
| Add a new sub-section on ASTC image formats, as follows: |
| |
| "C.2 ASTC Compressed Texture Image Formats |
| ========================================= |
| |
| C.2.1 What is ASTC? |
| --------------------- |
| |
| ASTC stands for Adaptive Scalable Texture Compression. |
| The ASTC formats form a family of related compressed texture image |
| formats. They are all derived from a common set of definitions. |
| |
| ASTC textures may be encoded using either high or low dynamic range, |
| corresponding to the "HDR profile" and "LDR profile". Support for the |
| HDR profile is indicated by the "GL_KHR_texture_compression_astc_hdr" |
| extension string, and support for the LDR profile is indicated by the |
| "GL_KHR_texture_compression_astc_ldr" extension string. |
| |
| The LDR profile supports two-dimensional images for texture targets |
| TEXTURE_2D, TEXTURE_2D_ARRAY, the six texture cube map face targets, and |
| TEXTURE_CUBE_MAP_ARRAY. These images may optionally be specified using |
| the sRGB color space for the RGB channels. |
| |
| The HDR profile is a superset of the LDR profile, and also supports |
| texture target TEXTURE_3D for images made up of multiple two-dimensional |
| slices of compressed data. HDR images may be a mix of low and high |
| dynamic range data. If the HDR profile is supported, the LDR profile and |
| its extension string must also be supported. |
| |
| ASTC textures may be encoded as 1, 2, 3 or 4 components, but they are |
| all decoded into RGBA. |
| |
| Different ASTC formats have different block sizes, specified as part of |
| the name of the format token passed to CompressedTexImage2D and its related |
| functions, and in table 8.19. |
| |
| Additional ASTC formats (the "Full profile") exist which support 3D data |
| specified as compressed 3D blocks. However, such formats are not defined |
| by either the LDR or HDR profiles, and are not described in this |
| specification. |
| |
| C.2.2 Design Goals |
| -------------------- |
| |
| The design goals for the format are as follows: |
| |
| * Random access. This is a must for any texture compression format. |
| * Bit exact decode. This is a must for conformance testing and |
| reproducibility. |
| * Suitable for mobile use. The format should be suitable for both |
| desktop and mobile GPU environments. It should be low bandwidth |
| and low in area. |
| * Flexible choice of bit rate. Current formats only offer a few bit |
| rates, leaving content developers with only coarse control over |
| the size/quality tradeoff. |
| * Scalable and long-lived. The format should support existing R, RG, |
| RGB and RGBA image types, and also have high "headroom", allowing |
| continuing use for several years and the ability to innovate in |
| encoders. Part of this is the choice to include HDR and 3D. |
| * Feature orthogonality. The choices for the various features of the |
| format are all orthogonal to each other. This has three effects: |
| first, it allows a large, flexible configuration space; second, |
| it makes that space easier to understand; and third, it makes |
| verification easier. |
| * Best in class at given bit rate. It should beat or match the current |
| best in class for peak signal-to-noise ratio (PSNR) at all bit rates. |
| * Fast decode. Texel throughput for a cached texture should be one |
| texel decode per clock cycle per decoder. Parallel decoding of several |
| texels from the same block should be possible at incremental cost. |
| * Low bandwidth. The encoding scheme should ensure that memory access |
| is kept to a minimum, cache reuse is high and memory bandwidth for |
| the format is low. |
| * Low area. It must occupy comparable die size to competing formats. |
| |
| C.2.3 Basic Concepts |
| ---------------------- |
| |
| ASTC is a block-based lossy compression format. The compressed image |
| is divided into a number of blocks of uniform size, which makes it |
| possible to quickly determine which block a given texel resides in. |
| |
| Each block has a fixed memory footprint of 128 bits, but these bits |
| can represent varying numbers of texels (the block "footprint"). |
| |
| Block footprint sizes are not confined to powers-of-two, and are |
| also not confined to be square. They may be 2D, in which case the |
| block dimensions range from 4 to 12 texels, or 3D, in which case |
| the block dimensions range from 3 to 6 texels. |
| |
| Decoding one texel requires only the data from a single block. This |
| simplifies cache design, reduces bandwidth and improves encoder throughput. |
| |
| C.2.4 Block Encoding |
| ---------------------- |
| |
| To understand how the blocks are stored and decoded, it is useful to start |
| with a simple example, and then introduce additional features. |
| |
| The simplest block encoding starts by defining two color "endpoints". The |
| endpoints define two colors, and a number of additional colors are generated |
| by interpolating between them. We can define these colors using 1, 2, 3, |
| or 4 components (usually corresponding to R, RG, RGB and RGBA textures), |
| and using low or high dynamic range. |
| |
| We then store a color interpolant weight for each texel in the image, which |
| specifies how to calculate the color to use. From this, a weighted average |
| of the two endpoint colors is used to generate the intermediate color, |
| which is the returned color for this texel. |
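| |
As a rough illustration of the weighted average described above, the sketch below blends two endpoint colors with a normalized weight. It is an assumption-laden floating-point simplification: the bit-exact integer procedure required for conformance is defined by the decode sections of the specification, and the type Rgba and the function astc_sketch_blend are hypothetical names.

```c
/* Hypothetical RGBA color type, used for illustration only. */
typedef struct { float r, g, b, a; } Rgba;

/* Blend two endpoint colors with a weight t in [0, 1].
 * t = 0 returns e0 and t = 1 returns e1. This is NOT the bit-exact
 * ASTC interpolation; it only shows the idea of a weighted average. */
Rgba astc_sketch_blend(Rgba e0, Rgba e1, float t)
{
    Rgba out;
    out.r = (1.0f - t) * e0.r + t * e1.r;
    out.g = (1.0f - t) * e0.g + t * e1.g;
    out.b = (1.0f - t) * e0.b + t * e1.b;
    out.a = (1.0f - t) * e0.a + t * e1.a;
    return out;
}
```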
| |
| There are several different ways of specifying the endpoint colors, and the |
| weights, but once they have been defined, calculation of the texel colors |
| proceeds identically for all of them. Each block is free to choose whichever |
| encoding scheme best represents its color endpoints, within the constraint |
| that all the data fits within the 128 bit block. |
| |
| For blocks which have a large number of texels (e.g. a 12x12 block), there is |
| not enough space to explicitly store a weight for every texel. In this case, |
| a sparser grid with fewer weights is stored, and interpolation is used to |
| determine the effective weight to be used for each texel position. This allows |
| very low bit rates to be used with acceptable quality. This can also be used |
| to more efficiently encode blocks with low detail, or with strong vertical |
| or horizontal features. |
| |
| For blocks which have a mixture of disparate colors, a single line in the |
| color space is not a good fit to the colors of the pixels in the original |
| image. It is therefore possible to partition the texels into multiple sets, |
| the pixels within each set having similar colors. For each of these |
| "partitions", we specify separate endpoint pairs, and choose which pair of |
| endpoints to use for a particular texel by looking up the partition index |
| from a partitioning pattern table. In ASTC, this partition table is actually |
| implemented as a function. |
| |
| The endpoint encoding for each partition is independent. |
| |
| For blocks which have uncorrelated channels - for example an image with a |
| transparency mask, or an image used as a normal map - it may be necessary |
| to specify two weights for each texel. Interpolation between the components |
| of the endpoint colors can then proceed independently for each "plane" of |
| the image. The assignment of channels to planes is selectable. |
| |
| Since each of the above options is independent, it is possible to specify any |
| combination of channels, endpoint color encoding, weight encoding, |
| interpolation, multiple partitions and single or dual planes. |
| |
| Since these values are specified per block, it is important that they are |
| represented with the minimum possible number of bits. As a result, these |
| values are packed together in ways which can be difficult to read, but |
| which are nevertheless highly amenable to hardware decode. |
| |
| All of the values used as weights and color endpoint values can be specified |
| with a variable number of bits. The encoding scheme used allows a fine- |
| grained tradeoff between weight bits and color endpoint bits using "integer |
| sequence encoding". This can pack adjacent values together, allowing us to |
| use fractional numbers of bits per value. |
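| |
The effect of fractional bits per value can be shown with a small calculation. The sketch below assumes the packing defined in the Integer Sequence Encoding section, which is not reproduced in this part of the text: five trits share 8 bits, three quints share 7 bits, and any plain bits are stored once per value. The function name astc_sketch_ise_bits is hypothetical.

```c
/* Number of bits needed to store n values, each made of an optional trit,
 * an optional quint, and `bits` plain bits.
 * Assumption: 5 trits pack into 8 bits and 3 quints pack into 7 bits. */
unsigned astc_sketch_ise_bits(unsigned n, unsigned trits, unsigned quints,
                              unsigned bits)
{
    unsigned total = n * bits;
    if (trits)
        total += (8u * n + 4u) / 5u;   /* ceiling(8n / 5) */
    if (quints)
        total += (7u * n + 2u) / 3u;   /* ceiling(7n / 3) */
    return total;
}
```

Under these assumptions, sixteen values in the range 0..5 (one trit plus one bit each) occupy 42 bits, or about 2.6 bits per value.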
| |
| Finally, a block may be just a single color. This is a so-called "void |
| extent block" and has a special coding which also allows it to identify |
| nearby regions of single color. This may be used to short-circuit fetching of |
| what would be identical blocks, and further reduce memory bandwidth. |
| |
| C.2.5 LDR and HDR Modes |
| ------------------------- |
| |
| The decoding process for LDR content can be simplified if it is known in |
| advance that sRGB output is required. This selection is therefore included |
| as part of the global configuration. |
| |
| The two modes differ in various ways. |
| |
| ----------------------------------------------------------------------------- |
| Operation            LDR Mode                      HDR Mode |
| ----------------------------------------------------------------------------- |
| Returned value       Vector of FP16 values,        Vector of FP16 values |
|                      or Vector of UNORM8 values. |
| |
| sRGB compatible      Yes                           No |
| |
| LDR endpoint         16 bits, or                   16 bits |
| decoding precision   8 bits for sRGB |
| |
| HDR endpoint mode    Error color                   As decoded |
| results |
| |
| Error results        Error color                   Vector of NaNs (0xFFFF) |
| ----------------------------------------------------------------------------- |
| Table C.2.1 - Differences Between LDR and HDR Modes |
| |
| The error color is opaque fully-saturated magenta |
| (R,G,B,A = 0xFF, 0x00, 0xFF, 0xFF). This has been chosen as it is much more |
| noticeable than black or white, and occurs far less often in valid images. |
| |
| For linear RGB decode, the error color may be either opaque fully-saturated |
| magenta (R,G,B,A = 1.0, 0.0, 1.0, 1.0) or a vector of four NaNs |
| (R,G,B,A = NaN, NaN, NaN, NaN). In the latter case, the recommended NaN |
| value returned is 0xFFFF. |
| |
| The error color is returned as an informative response to invalid |
| conditions, including invalid block encodings or use of reserved endpoint |
| modes. |
| |
| Future, forward-compatible extensions to KHR_texture_compression_astc |
| may define valid interpretations of these conditions, which will decode to |
| some other color. Therefore, encoders and applications must not rely on |
| invalid encodings as a way of generating the error color. |
| |
| C.2.6 Configuration Summary |
| ----------------------------- |
| |
| The global configuration data for the format is as follows: |
| |
| * Block dimension (always 2D for both LDR and HDR profiles) |
| * Block footprint size |
| * sRGB output enabled or not |
| |
| The data specified per block is as follows: |
| |
| * Texel weight grid size |
| * Texel weight range |
| * Texel weight values |
| * Number of partitions |
| * Partition pattern index |
| * Color endpoint modes (includes LDR or HDR selection) |
| * Color endpoint data |
| * Number of planes |
| * Plane-to-channel assignment |
| |
| C.2.7 Decode Procedure |
| ------------------------ |
| |
| To decode one texel: |
| |
|     Find block containing texel |
|     Read block mode |
|     If void-extent block, store void extent and immediately return single |
|       color (optimization) |
| |
|     For each plane in image |
|       If block mode requires infill |
|         Find and decode stored weights adjacent to texel, unquantize and |
|           interpolate |
|       Else |
|         Find and decode weight for texel, and unquantize |
| |
|     Read number of partitions |
|     If number of partitions > 1 |
|       Read partition table pattern index |
|       Look up partition number from pattern |
| |
|     Read color endpoint mode and endpoint data for selected partition |
|     Unquantize color endpoints |
|     Interpolate color endpoints using weight (or weights in dual-plane mode) |
|     Return interpolated color |
| |
| C.2.8 Block Determination and Bit Rates |
| ----------------------------------------- |
| |
| The block footprint is a global setting for any given texture, and is |
| therefore not encoded in the individual blocks. |
| |
| For 2D textures, the block footprint's width and height are selectable |
| from a number of predefined sizes, namely 4, 5, 6, 8, 10 and 12 pixels. |
| |
| For square and nearly-square blocks, this gives the following bit rates: |
| |
| ------------------------------------- |
| Footprint |
| Width Height Bit Rate Increment |
| ------------------------------------- |
| 4 4 8.00 125% |
| 5 4 6.40 125% |
| 5 5 5.12 120% |
| 6 5 4.27 120% |
| 6 6 3.56 114% |
| 8 5 3.20 120% |
| 8 6 2.67 105% |
| 10 5 2.56 120% |
| 10 6 2.13 107% |
| 8 8 2.00 125% |
| 10 8 1.60 125% |
| 10 10 1.28 120% |
| 12 10 1.07 120% |
| 12 12 0.89 |
| ------------------------------------- |
| Table C.2.2 - 2D Footprint and Bit Rates |
| |
| The block footprint is shown as <width>x<height> in the format name. For |
| example, the format COMPRESSED_RGBA_ASTC_8x6_KHR specifies an image with |
| a block width of 8 texels, and a block height of 6 texels. |
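| |
Because the token values in the New Tokens section are consecutive within each group, the footprint can be recovered with a table lookup. The following C sketch is illustrative only: the type AstcFootprint and the function astc_footprint_from_format are hypothetical names, while the numeric ranges come from the token list.

```c
#include <stdbool.h>

/* Hypothetical footprint type, used for illustration only. */
typedef struct { int width, height; } AstcFootprint;

/* First and last token of each group, from the New Tokens section. */
#define COMPRESSED_RGBA_ASTC_4x4_KHR            0x93B0
#define COMPRESSED_RGBA_ASTC_12x12_KHR          0x93BD
#define COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR    0x93D0
#define COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR  0x93DD

/* Looks up the block footprint for an ASTC format token.
 * Returns false if the token is not one of the 28 ASTC formats. */
bool astc_footprint_from_format(unsigned format, AstcFootprint *out,
                                bool *is_srgb)
{
    /* Same order as the token list: 4x4, 5x4, ..., 12x12. */
    static const AstcFootprint table[14] = {
        {4, 4}, {5, 4}, {5, 5}, {6, 5}, {6, 6}, {8, 5}, {8, 6},
        {8, 8}, {10, 5}, {10, 6}, {10, 8}, {10, 10}, {12, 10}, {12, 12}
    };
    if (format >= COMPRESSED_RGBA_ASTC_4x4_KHR &&
        format <= COMPRESSED_RGBA_ASTC_12x12_KHR) {
        *out = table[format - COMPRESSED_RGBA_ASTC_4x4_KHR];
        *is_srgb = false;
        return true;
    }
    if (format >= COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR &&
        format <= COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR) {
        *out = table[format - COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR];
        *is_srgb = true;
        return true;
    }
    return false;
}
```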
| |
| The "Increment" column indicates the ratio of bit rate against the next |
| lower available rate. A consistent value in this column indicates an even |
| spread of bit rates. |
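| |
The "Bit Rate" column follows directly from the fixed 128-bit block size divided by the number of texels in the footprint. A minimal sketch, with the hypothetical function name astc_bits_per_texel:

```c
/* Bit rate in bits per texel for a 2D footprint: every block occupies
 * 128 bits regardless of how many texels it covers. */
double astc_bits_per_texel(int block_w, int block_h)
{
    return 128.0 / (double)(block_w * block_h);
}
```

For example, a 4x4 footprint gives 128 / 16 = 8 bits per texel, and a 12x12 footprint gives 128 / 144, which is just under 0.89.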
| |
| The HDR profile supports only those block footprints listed in Table |
| C.2.2. Other block sizes are not supported. |
| |
| For images which are not an integer multiple of the block size, additional |
| texels are added to the edges with maximum X and Y. These texels may be |
| any color, as they will not be accessed. |
| |
| Although these are not all powers of two, it is possible to calculate block |
| addresses and pixel addresses within the block, for legal image sizes, |
| without undue complexity. |
| |
| Given a 2D image which is W x H pixels in size, with block size |
| w x h, the size of the image in blocks is: |
| |
| Bw = ceiling(W/w) |
| Bh = ceiling(H/h) |
| |
| For a 3D image, each 2D slice is a single texel thick, so that for an |
| image which is W x H x D pixels in size, with block size w x h, the size |
| of the image in blocks is: |
| |
| Bw = ceiling(W/w) |
| Bh = ceiling(H/h) |
| Bd = D |
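| |
The block counts above, combined with the fixed 128-bit (16-byte) block size, give the storage needed for one image level. The sketch below is illustrative; the function names astc_div_round_up and astc_image_bytes are hypothetical, and a 2D image is treated as the case D = 1.

```c
#include <stddef.h>

/* ceiling(a / b) for positive integers. */
size_t astc_div_round_up(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Bytes occupied by a W x H x D image with a w x h block footprint.
 * Each 2D slice is one texel thick, so Bd = D. Pass D = 1 for 2D images. */
size_t astc_image_bytes(size_t W, size_t H, size_t D, size_t w, size_t h)
{
    size_t Bw = astc_div_round_up(W, w);
    size_t Bh = astc_div_round_up(H, h);
    size_t Bd = D;
    return Bw * Bh * Bd * 16;   /* 128 bits = 16 bytes per block */
}
```

A 25 x 20 image with 10x8 blocks needs 3 x 3 = 9 blocks, or 144 bytes.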
| |
| C.2.9 Block Layout |
| -------------------- |
| |
| Each block in the image is stored as a single 128-bit block in memory. These |
| blocks are laid out in raster order, starting with the block at (0,0,0), then |
| ordered sequentially by X, Y and finally Z (if present). They are aligned to |
| 128-bit boundaries in memory. |
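| |
Given the raster ordering above, the byte offset of the block containing a texel can be computed as follows. This is a sketch under the assumption that the data starts at offset 0 and contains no padding between rows or slices; the function name astc_block_byte_offset is hypothetical.

```c
#include <stddef.h>

/* Byte offset of the block containing texel (x, y, z) in a W x H image
 * with a w x h footprint. Blocks are ordered by X, then Y, then Z, and
 * each block is 16 bytes. Use z = 0 for 2D images. */
size_t astc_block_byte_offset(size_t x, size_t y, size_t z,
                              size_t W, size_t H, size_t w, size_t h)
{
    size_t Bw = (W + w - 1) / w;   /* blocks per row */
    size_t Bh = (H + h - 1) / h;   /* block rows per slice */
    size_t bx = x / w;
    size_t by = y / h;
    size_t bz = z;                 /* slices are one texel thick */
    return ((bz * Bh + by) * Bw + bx) * 16;
}
```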
| |
| The bits in the block are labeled in little-endian order - the byte at the |
| lowest address contains bits 0..7. Bit 0 is the least significant bit in the |
| byte. |
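| |
With this labeling, bit n of the block lives in byte n / 8 at bit position n % 8. A minimal bit-field reader, with the hypothetical name astc_get_bits, could look like this:

```c
#include <stdint.h>

/* Extracts `count` bits (1..32) starting at block bit `lsb`.
 * Bit 0 is the least significant bit of the byte at the lowest address. */
uint32_t astc_get_bits(const uint8_t block[16], unsigned lsb, unsigned count)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned bit = lsb + i;
        uint32_t b = (uint32_t)(block[bit >> 3] >> (bit & 7u)) & 1u;
        value |= b << i;
    }
    return value;
}
```

The loop reads one bit at a time for clarity; a production decoder would typically load whole words instead.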
| |
| Each block has the same basic layout, as shown in figure C.1. |
| |
| 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 |
| -------------------------------------------------------------- |
| | Texel Weight Data (variable width) Fill direction -> |
| -------------------------------------------------------------- |
| |
| 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 |
| -------------------------------------------------------------- |
| Texel Weight Data |
| -------------------------------------------------------------- |
| |
| 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 |
| -------------------------------------------------------------- |
| Texel Weight Data |
| -------------------------------------------------------------- |
| |
| 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 |
| -------------------------------------------------------------- |
| Texel Weight Data |
| -------------------------------------------------------------- |
| |
| 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 |
| -------------------------------------------------------------- |
| : More config data : |
| -------------------------------------------------------------- |
| |
| 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 |
| -------------------------------------------------------------- |
| <-Fill direction Color Endpoint Data |
| -------------------------------------------------------------- |
| |
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 |
| -------------------------------------------------------------- |
| : Extra configuration data |
| -------------------------------------------------------------- |
| |
| 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| -------------------------------------------------------------- |
| Extra | Part | Block mode | |
| -------------------------------------------------------------- |
| |
| Figure C.1 - Block Layout Overview |
| |
| Dotted partition lines indicate that the split position is not fixed. |
| |
| The "Block mode" field specifies how the Texel Weight Data is encoded. |
| |
| The "Part" field specifies the number of partitions, minus one. If dual |
| plane mode is enabled, the number of partitions must be 3 or fewer. |
| If 4 partitions are specified, the error value is returned for all |
| texels in the block. |
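| |
Reading the figures in this section together with the 11-bit Block Mode field, the "Part" field occupies block bits [12:11]. The sketch below decodes the partition count under that reading and applies the dual-plane restriction. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of partitions (1..4). Bits 11 and 12 of the block are bits 3
 * and 4 of the second byte, and the field stores the count minus one. */
unsigned astc_partition_count(const uint8_t block[16])
{
    return ((unsigned)(block[1] >> 3) & 3u) + 1u;
}

/* Dual-plane mode allows at most 3 partitions; otherwise all texels in
 * the block decode to the error value. */
bool astc_partition_config_valid(unsigned partitions, bool dual_plane)
{
    return !(dual_plane && partitions == 4);
}
```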
| |
| The size and layout of the extra configuration data depends on the |
| number of partitions, and the number of planes in the image, as shown in |
| figures C.2 and C.3 (only the bottom 32 bits are shown): |
| |
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 |
| -------------------------------------------------------------- |
| <- Color endpoint data |CEM |
| -------------------------------------------------------------- |
| |
| 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| -------------------------------------------------------------- |
| CEM | 0 0 | Block Mode | |
| -------------------------------------------------------------- |
| |
| Figure C.2 - Single-partition Block Layout |
| |
| CEM is the color endpoint mode field, which determines how the Color |
| Endpoint Data is encoded. |
| |
| If dual-plane mode is active, the color component selector bits appear |
| directly below the weight bits. |
| |
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 |
| -------------------------------------------------------------- |
| | CEM | Partition Index |
| -------------------------------------------------------------- |
| |
| 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| -------------------------------------------------------------- |
| Partition Index | Block Mode | |
| -------------------------------------------------------------- |
| |
| Figure C.3 - Multi-partition Block Layout |
| |
| The Partition Index field specifies which partition layout to use. CEM is |
| the first 6 bits of color endpoint mode information for the various |
| partitions. For modes which require more than 6 bits of CEM data, the |
| additional bits appear at a variable position directly beneath the texel |
| weight data. |
| |
| If dual-plane mode is active, the color component selector bits then appear |
| directly below the additional CEM bits. |
| |
| The final special case is that if bits [8:0] of the block are "111111100", |
| then the block is a void-extent block, which has a separate encoding |
| described in section C.2.22. |
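| |
The bit pattern "111111100" is 0x1FC, so the void-extent test reduces to comparing the low nine bits of the block. A minimal sketch, with the hypothetical function name astc_is_void_extent:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits [8:0] of the block equal 111111100 (0x1FC). */
bool astc_is_void_extent(const uint8_t block[16])
{
    unsigned low9 = (unsigned)block[0] | ((unsigned)(block[1] & 1u) << 8);
    return low9 == 0x1FCu;
}
```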
| |
| C.2.10 Block Mode |
| ------------------ |
| |
| The Block Mode field specifies the width, height and depth of the grid of |
| weights, what range of values they use, and whether dual weight planes are |
| present. Since some of these are not represented using powers of two (there |
| are 12 possible weight widths, for example), and not all combinations are |
| allowed, this is not a simple bit packing. However, it can be unpacked |
| quickly in hardware. |
| |
| The weight ranges are encoded using a 3 bit value R, which is interpreted |
| together with a precision bit H, as follows: |
| |
|       Low Precision Range (H=0)            High Precision Range (H=1) |
| R     Weight Range  Trits  Quints  Bits    Weight Range  Trits  Quints  Bits |
| ----------------------------------------------------------------------------- |
| 000   Invalid                              Invalid |
| 001   Invalid                              Invalid |
| 010   0..1          -      -       1       0..9          -      1       1 |
| 011   0..2          1      -       -       0..11         1      -       2 |
| 100   0..3          -      -       2       0..15         -      -       4 |
| 101   0..4          -      1       -       0..19         -      1       2 |
| 110   0..5          1      -       1       0..23         1      -       3 |
| 111   0..7          -      -       3       0..31         -      -       5 |
| ----------------------------------------------------------------------------- |
| Table C.2.7 - Weight Range Encodings |
| |
| Each weight value is encoded using the specified number of Trits, Quints |
| and Bits. The details of this encoding can be found in Section C.2.12 - |
| Integer Sequence Encoding. |
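| |
Table C.2.7 can be expressed as a lookup indexed by H and R. The sketch below is illustrative; the type AstcWeightRange and the function astc_weight_range are hypothetical names. Each range holds exactly 3^trits * 5^quints * 2^bits distinct values, which serves as a consistency check on the table.

```c
#include <stdbool.h>

/* One row half of Table C.2.7: the largest weight value and the number
 * of trits, quints and bits used to encode each weight. */
typedef struct { int max_value, trits, quints, bits; } AstcWeightRange;

/* Looks up the weight range for selector R (3 bits) and precision bit H.
 * Returns false for the invalid encodings R = 000 and R = 001. */
bool astc_weight_range(unsigned R, unsigned H, AstcWeightRange *out)
{
    static const AstcWeightRange table[2][8] = {
        /* H = 0 */
        { {0, 0, 0, 0}, {0, 0, 0, 0}, {1, 0, 0, 1}, {2, 1, 0, 0},
          {3, 0, 0, 2}, {4, 0, 1, 0}, {5, 1, 0, 1}, {7, 0, 0, 3} },
        /* H = 1 */
        { {0, 0, 0, 0}, {0, 0, 0, 0}, {9, 0, 1, 1}, {11, 1, 0, 2},
          {15, 0, 0, 4}, {19, 0, 1, 2}, {23, 1, 0, 3}, {31, 0, 0, 5} }
    };
    if (R < 2 || R > 7 || H > 1)
        return false;
    *out = table[H][R];
    return true;
}
```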
| |
| For 2D blocks, the Block Mode field is laid out as follows: |
| |
| ------------------------------------------------------------------------- |
| 10  9   8   7   6   5   4   3   2   1   0   Width Height Notes |
| ------------------------------------------------------------------------- |
| D   H     B       A     R0  0   0   R2  R1  B+4   A+2 |
| D   H     B       A     R0  0   1   R2  R1  B+8   A+2 |
| D   H     B       A     R0  1   0   R2  R1  A+2   B+8 |
| D   H   0   B     A     R0  1   1   R2  R1  A+2   B+6 |
| D   H   1   B     A     R0  1   1   R2  R1  B+2   A+2 |
| D   H   0   0     A     R0  R2  R1  0   0   12    A+2 |
| D   H   0   1     A     R0  R2  R1  0   0   A+2   12 |
| D   H   1   1   0   0   R0  R2  R1  0   0   6     10 |
| D   H   1   1   0   1   R0  R2  R1  0   0   10    6 |
|   B     1   0     A     R0  R2  R1  0   0   A+6   B+6    D=0, H=0 |
| x   x   1   1   1   1   1   1   1   0   0   -     -      Void-extent |
| x   x   1   1   1   x   x   x   x   0   0   -     -      Reserved* |
| x   x   x   x   x   x   x   0   0   0   0   -     -      Reserved |
| ------------------------------------------------------------------------- |
| Table C.2.8 - 2D Block Mode Layout |
| |
| Note that, due to the encoding of the R field, as described above, |
| bits R2 and R1 cannot both be zero, which disambiguates |
| the first five rows from the rest of the table. |
| |
| Bit positions with a value of x are ignored for purposes of determining |
| if a block is a void-extent block or reserved, but may have defined |
| encodings for specific void-extent blocks. |
| |
| The penultimate row of the table is reserved only if bits [5:2] are not |
| all 1. If bits [5:2] are all 1, the block is instead a void-extent block |
| (as shown in the previous row). |
| |
| The D bit is set to indicate dual-plane mode. In this mode, the maximum |
| allowed number of partitions is 3. |
| |
| The size of the grid in each dimension must be less than or equal to |
| the corresponding dimension of the block footprint. If the grid size |
| is greater than the footprint dimension in any axis, then this is an |
| illegal block encoding and all texels will decode to the error color. |
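| |
The rows of Table C.2.8 can be decoded with a short cascade of bit tests. The sketch below is one possible reading of the table, assuming A occupies bits [6:5] and B occupies bits [8:7] (bit 7 only in the fourth and fifth rows, and bits [10:9] in the A+6 x B+6 row). All type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { ASTC_MODE_NORMAL, ASTC_MODE_VOID_EXTENT, ASTC_MODE_RESERVED } AstcModeKind;

typedef struct {
    int width, height;  /* weight grid size */
    unsigned R;         /* 3-bit weight range selector (R2 R1 R0) */
    bool H;             /* high-precision range bit */
    bool D;             /* dual-plane bit */
} AstcBlockMode2D;

/* Decodes the 11-bit 2D Block Mode field following Table C.2.8. */
AstcModeKind astc_decode_block_mode_2d(unsigned mode, AstcBlockMode2D *out)
{
    unsigned A = (mode >> 5) & 3u;
    unsigned B = (mode >> 7) & 3u;
    unsigned R0 = (mode >> 4) & 1u;
    unsigned R21;                          /* R2 R1 as a 2-bit value */
    bool D = ((mode >> 10) & 1u) != 0;
    bool H = ((mode >> 9) & 1u) != 0;
    int w, h;

    if ((mode & 3u) != 0) {
        /* First five rows: R2 R1 in bits [1:0], selector in bits [3:2]. */
        R21 = mode & 3u;
        switch ((mode >> 2) & 3u) {
        case 0:  w = (int)B + 4; h = (int)A + 2; break;
        case 1:  w = (int)B + 8; h = (int)A + 2; break;
        case 2:  w = (int)A + 2; h = (int)B + 8; break;
        default: /* bit 8 selects the row, B is bit 7 only */
            if (mode & 0x100u) { w = (int)(B & 1u) + 2; h = (int)A + 2; }
            else               { w = (int)A + 2; h = (int)(B & 1u) + 6; }
            break;
        }
    } else {
        if ((mode & 0x1FFu) == 0x1FCu) return ASTC_MODE_VOID_EXTENT;
        if ((mode & 0x1C0u) == 0x1C0u) return ASTC_MODE_RESERVED;  /* Reserved* */
        if ((mode & 0xFu) == 0)        return ASTC_MODE_RESERVED;  /* last row */
        R21 = (mode >> 2) & 3u;            /* R2 R1 in bits [3:2] */
        switch (B) {
        case 0:  w = 12; h = (int)A + 2; break;
        case 1:  w = (int)A + 2; h = 12; break;
        case 3:  /* bit 6 is 0 here, so A is just bit 5 */
            if (A & 1u) { w = 10; h = 6; } else { w = 6; h = 10; }
            break;
        default: /* bits [8:7] = 10: B comes from bits [10:9], D = H = 0 */
            w = (int)A + 6; h = (int)((mode >> 9) & 3u) + 6;
            D = false; H = false;
            break;
        }
    }
    out->width = w; out->height = h;
    out->R = (R21 << 1) | R0;
    out->H = H; out->D = D;
    return ASTC_MODE_NORMAL;
}

/* The weight grid must not exceed the block footprint in either axis. */
bool astc_grid_fits_footprint(const AstcBlockMode2D *m, int fw, int fh)
{
    return m->width <= fw && m->height <= fh;
}
```

A block whose mode decodes normally but fails astc_grid_fits_footprint is an illegal encoding, and all of its texels decode to the error color.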
| |
| C.2.11 Color Endpoint Mode |
| --------------------------- |
| |
| In single-partition mode, the Color Endpoint Mode (CEM) field stores one |
| of 16 possible values. Each of these specifies how many raw data values |
| are encoded, and how to convert these raw values into two RGBA color |
| endpoints. They can be summarized as follows: |
| |
| --------------------------------------------- |
| CEM Description Class |
| --------------------------------------------- |
| 0 LDR Luminance, direct 0 |
| 1 LDR Luminance, base+offset 0 |
| 2 HDR Luminance, large range 0 |
| 3 HDR Luminance, small range 0 |
| 4 LDR Luminance+Alpha, direct 1 |
| 5 LDR Luminance+Alpha, base+offset 1 |
| 6 LDR RGB, base+scale 1 |
| 7 HDR RGB, base+scale 1 |
| 8 LDR RGB, direct 2 |
| 9 LDR RGB, base+offset 2 |
| 10 LDR RGB, base+scale plus two A 2 |
| 11 HDR RGB, direct 2 |
| 12 LDR RGBA, direct 3 |
| 13 LDR RGBA, base+offset 3 |
| 14 HDR RGB, direct + LDR Alpha 3 |
| 15 HDR RGB, direct + HDR Alpha 3 |
| --------------------------------------------- |
| Table C.2.10 - Color Endpoint Modes. |
| [[ If the HDR profile is not implemented, remove from table C.2.10 |
| all rows whose description starts with "HDR", and add to the |
| caption: ]] |
| Modes not described in the CEM column are reserved for HDR modes, and |
| will generate errors in an unextended OpenGL ES implementation. |
| |
| In multi-partition mode, the CEM field is of variable width, from 6 to 14 |
| bits. The lowest 2 bits of the CEM field specify how the endpoint mode |
| for each partition is calculated: |
| |
| ---------------------------------------------------- |
| Value Meaning |
| ---------------------------------------------------- |
| 00 All color endpoint pairs are of the same type. |
| A full 4-bit CEM is stored in block bits [28:25] |
| and is used for all partitions. |
| 01 All endpoint pairs are of class 0 or 1. |
| 10 All endpoint pairs are of class 1 or 2. |
| 11 All endpoint pairs are of class 2 or 3. |
| ---------------------------------------------------- |
| Table C.2.11 - Multi-Partition Color Endpoint Modes |
| |
| If the CEM selector value in bits [24:23] is not 00, |
| then data layout is as follows: |
| |
| --------------------------------------------------- |
| Part                n    m    l    k    j    i    h    g |
|       -------------------------------------------------------- |
| 2     ... Weight :   M1    :                               ... |
|       -------------------------------------------------------- |
| 3     ... Weight :   M2    :   M1    : M0 :                ... |
|       -------------------------------------------------------- |
| 4     ... Weight :   M3    :   M2    :   M1    :   M0    : ... |
|       -------------------------------------------------------- |
| |
| Part    28   27   26   25   24   23 |
|       ------------------------------- |
| 2     |   M0    | C1 | C0 |   CEM   | |
|       ------------------------------- |
| 3     | M0 | C2 | C1 | C0 |   CEM   | |
|       ------------------------------- |
| 4     | C3 | C2 | C1 | C0 |   CEM   | |
|       ------------------------------- |
| --------------------------------------------------- |
| Figure C.4 - Multi-Partition Color Endpoint Modes |
| |
| In this view, each partition i has two fields. C<i> is the class |
| selector bit, choosing between the two possible CEM classes (0 indicates |
| the lower of the two classes), and M<i> is a two-bit field specifying |
| the low bits of the color endpoint mode within that class. The |
| additional bits appear at a variable bit position, immediately below the |
| texel weight data. |
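| |
| The class selector and low-bit fields combine into a full mode number in |
| the same way that Table C.2.10 groups modes into classes of four. The |
| following C sketch illustrates this; it is not part of the |
| specification. It assumes (hypothetically) that the caller has already |
| concatenated the additional M bits above the six bits [28:23], so that |
| the selector sits in bits [1:0] of cem_field, C<i> in bit 2+i, and M<i> |
| starts at bit 2+P+2*i for P partitions. The function name and parameter |
| layout are illustrative only. |
| |
```c
/* Returns the 4-bit color endpoint mode for partition i (0-based).
 * cem_field: selector in bits [1:0], then C0..C(P-1), then M0..M(P-1),
 * i.e. block bits [28:23] with the extra M bits appended above them. */
int partition_cem(unsigned cem_field, int partitions, int i)
{
    unsigned selector = cem_field & 3u;

    if (selector == 0)
        return (int)((cem_field >> 2) & 0xFu);  /* shared CEM, bits [28:25] */

    unsigned c   = (cem_field >> (2 + i)) & 1u;                   /* C<i> */
    unsigned m   = (cem_field >> (2 + partitions + 2 * i)) & 3u;  /* M<i> */
    unsigned cls = (selector - 1u) + c;   /* lower class, or the next one up */

    return (int)((cls << 2) | m);         /* class * 4 + low bits */
}
```
| |
| For example, with selector 01, C0 = 1 and M0 = 2, partition 0 uses class |
| 1 and mode 6 (LDR RGB, base+scale). |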
| |
| The ranges used for the data values are not explicitly specified. |
| Instead, they are derived from the number of available bits remaining |
| after the configuration data and weight data have been specified. |
| |
| Details of the decoding procedure for Color Endpoints can be found in |
| section C.2.13. |
| |
| C.2.12 Integer Sequence Encoding |
| --------------------------------- |
| |
| Both the weight data and the endpoint color data are variable width, and |
| are specified using a sequence of integer values. The range of each |
| value in a sequence (e.g. a color weight) is constrained. |
| |
| Since it is often the case that the most efficient range for these |
| values is not a power of two, each value sequence is encoded using a |
| technique known as "integer sequence encoding". This allows efficient, |
| hardware-friendly packing and unpacking of values with non-power-of-two |
| ranges. |
| |
| In a sequence, each value has an identical range. The range is specified |
| in one of the following forms: |
| |
| Value range       MSB encoding    LSB encoding  Value        Block  Packed |
|                                                                     block size |
| ----------------  --------------  ------------  -----------  -----  ---------- |
| 0 .. 2^n-1        -               n bit value   m            1      n |
|                                   m (n <= 8) |
| 0 .. (3 * 2^n)-1  Base-3 "trit"   n bit value   t * 2^n + m  5      8 + 5*n |
|                   value t         m (n <= 6) |
| 0 .. (5 * 2^n)-1  Base-5 "quint"  n bit value   q * 2^n + m  3      7 + 3*n |
|                   value q         m (n <= 5) |
| ------------------------------------------- |
| Table C.2.13 - Encoding for Different Ranges |
| |
| Since 3^5 is 243, it is possible to pack five trits into 8 bits (which has |
| 256 possible values), so a trit can effectively be encoded as 1.6 bits. |
| Similarly, since 5^3 is 125, it is possible to pack three quints into |
| 7 bits (which has 128 possible values), so a quint can be encoded as |
| 2.33 bits. |
| |
| The encoding scheme packs the trits or quints, and then interleaves the n |
| additional bits in positions that satisfy the requirements of an |
| arbitrary length stream. This makes it possible to correctly specify |
| lists of values whose length is not an integer multiple of 3 or 5 values. |
| It also makes it possible to easily select a value at random within the stream. |
| |
| If there are insufficient bits in the stream to fill the final block, then |
| unused (higher order) bits are assumed to be 0 when decoding. |
| |
| To decode the bits for value number i in a sequence of bits b, both |
| indexed from 0, perform the following: |
| |
| If the range is encoded as n bits per value, then the value is bits |
| b[i*n+n-1:i*n] - a simple multiplexing operation. |
| |
| If the range is encoded using a trit, then each block contains 5 values |
| (v0 to v4), each of which contains a trit (t0 to t4) and a corresponding |
| LSB value (m0 to m4). The first bit of the packed block is bit |
| floor(i/5)*(8+5*n). The bits in the block are packed as follows |
| (in this example, n is 4): |
| |
|  27  26  25  24  23  22  21  20  19  18  17  16 |
| ------------------------------------------------- |
| |T7 |      m4       |T6  T5 |      m3       |T4 | |
| ------------------------------------------------- |
| |
|  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0 |
| ----------------------------------------------------------------- |
| |      m2       |T3  T2 |      m1       |T1  T0 |      m0       | |
| ----------------------------------------------------------------- |
| |
| Figure C.5 - Trit-based Packing |
| |
| The five trits t0 to t4 are obtained by bit manipulations of the 8 bits |
| T[7:0] as follows: |
| |
| if T[4:2] = 111 |
|     C = { T[7:5], T[1:0] }; t4 = t3 = 2 |
| else |
|     C = T[4:0] |
|     if T[6:5] = 11 |
|         t4 = 2; t3 = T[7] |
|     else |
|         t4 = T[7]; t3 = T[6:5] |
| |
| if C[1:0] = 11 |
|     t2 = 2; t1 = C[4]; t0 = { C[3], C[2]&~C[3] } |
| else if C[3:2] = 11 |
|     t2 = 2; t1 = 2; t0 = C[1:0] |
| else |
|     t2 = C[4]; t1 = C[3:2]; t0 = { C[1], C[0]&~C[1] } |
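| |
| The trit extraction above maps directly onto shifts and masks. The |
| following C sketch is one possible software rendering, assuming that the |
| eight bits T7..T0 have already been gathered into a single integer T |
| (the function name and the output array layout, t[0] = t0 .. t[4] = t4, |
| are choices made for this example). |
| |
```c
/* Unpacks the 8-bit packed value T[7:0] into five trits t[0]..t[4]. */
void unpack_trits(unsigned T, int t[5])
{
    unsigned C;

    if (((T >> 2) & 7u) == 7u) {                 /* T[4:2] = 111 */
        C = (((T >> 5) & 7u) << 2) | (T & 3u);   /* { T[7:5], T[1:0] } */
        t[4] = 2;
        t[3] = 2;
    } else {
        C = T & 0x1Fu;                           /* T[4:0] */
        if (((T >> 5) & 3u) == 3u) {             /* T[6:5] = 11 */
            t[4] = 2;
            t[3] = (int)((T >> 7) & 1u);
        } else {
            t[4] = (int)((T >> 7) & 1u);
            t[3] = (int)((T >> 5) & 3u);
        }
    }

    unsigned c0 = C & 1u, c1 = (C >> 1) & 1u, c2 = (C >> 2) & 1u;
    unsigned c3 = (C >> 3) & 1u, c4 = (C >> 4) & 1u;

    if ((C & 3u) == 3u) {                        /* C[1:0] = 11 */
        t[2] = 2;
        t[1] = (int)c4;
        t[0] = (int)((c3 << 1) | (c2 & ~c3 & 1u));
    } else if (((C >> 2) & 3u) == 3u) {          /* C[3:2] = 11 */
        t[2] = 2;
        t[1] = 2;
        t[0] = (int)(C & 3u);
    } else {
        t[2] = (int)c4;
        t[1] = (int)((C >> 2) & 3u);
        t[0] = (int)((c1 << 1) | (c0 & ~c1 & 1u));
    }
}
```
| |
| A table-driven decoder could call such a routine once for each of the |
| 256 values of T at start-up and store the results. |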
| |
| If the range is encoded using a quint, then each block contains 3 values |
| (v0 to v2), each of which contains a quint (q0 to q2) and a corresponding |
| LSB value (m0 to m2). The first bit of the packed block is bit |
| floor(i/3)*(7+3*n). |
| |
| The bits in the block are packed as follows (in this example, n is 4): |
| |
|  18  17  16 |
| ------------- |
| |Q6  Q5 | m2 |
| ------------- |
|  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0 |
| ----------------------------------------------------------------- |
|      m2     |Q4  Q3 |      m1       |Q2  Q1  Q0 |      m0       | |
| ----------------------------------------------------------------- |
| |
| Figure C.6 - Quint-based Packing |
| |
| The three quints q0 to q2 are obtained by bit manipulations of the 7 bits |
| Q[6:0] as follows: |
| |
| if Q[2:1] = 11 and Q[6:5] = 00 |
|     q2 = { Q[0], Q[4]&~Q[0], Q[3]&~Q[0] }; q1 = q0 = 4 |
| else |
|     if Q[2:1] = 11 |
|         q2 = 4; C = { Q[4:3], ~Q[6:5], Q[0] } |
|     else |
|         q2 = Q[6:5]; C = Q[4:0] |
| |
|     if C[2:0] = 101 |
|         q1 = 4; q0 = C[4:3] |
|     else |
|         q1 = C[4:3]; q0 = C[2:0] |
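| |
| The quint extraction can be sketched in C in the same manner. This |
| example assumes that the seven bits Q6..Q0 have been gathered into one |
| integer Q, and it reads the source of q2 in the final "else" branch as |
| Q[6:5], since no T bits exist in a quint block. The function name and |
| array layout (q[0] = q0 .. q[2] = q2) are illustrative. |
| |
```c
/* Unpacks the 7-bit packed value Q[6:0] into three quints q[0]..q[2]. */
void unpack_quints(unsigned Q, int q[3])
{
    unsigned b0 = Q & 1u;
    unsigned b3 = (Q >> 3) & 1u;
    unsigned b4 = (Q >> 4) & 1u;
    unsigned q21 = (Q >> 1) & 3u;    /* Q[2:1] */
    unsigned q65 = (Q >> 5) & 3u;    /* Q[6:5] */
    unsigned C;

    if (q21 == 3u && q65 == 0u) {
        /* { Q[0], Q[4]&~Q[0], Q[3]&~Q[0] } */
        q[2] = (int)((b0 << 2) | ((b4 & ~b0 & 1u) << 1) | (b3 & ~b0 & 1u));
        q[1] = 4;
        q[0] = 4;
        return;
    }

    if (q21 == 3u) {
        q[2] = 4;
        /* { Q[4:3], ~Q[6:5], Q[0] } */
        C = (((Q >> 3) & 3u) << 3) | ((~q65 & 3u) << 1) | b0;
    } else {
        q[2] = (int)q65;
        C = Q & 0x1Fu;               /* Q[4:0] */
    }

    if ((C & 7u) == 5u) {            /* C[2:0] = 101 */
        q[1] = 4;
        q[0] = (int)((C >> 3) & 3u);
    } else {
        q[1] = (int)((C >> 3) & 3u);
        q[0] = (int)(C & 7u);
    }
}
```
| |
| Every output produced by this routine lies in the range 0..4. |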
| |
| Both these procedures ensure a valid decoding for every possible packed |
| value (256 for T[7:0], 128 for Q[6:0]), even though a few are duplicates. |
| They can also be implemented efficiently in software using small tables. |
| |
| Encoding methods are not specified here, although table-based mechanisms |
| work well. |
| |
| C.2.13 Endpoint Unquantization |
| ------------------------------- |
| |
| Each color endpoint is specified as a sequence of integers in a given |
| range. These values are packed using integer sequence encoding, as a |
| stream of bits stored from just above the configuration data, and |
| growing upwards. |
| |
| Once unpacked, the values must be unquantized from their storage range, |
| returning them to a standard range of 0..255. |
| |
| For bit-only representations, this is simple bit replication from the |
| most significant bit of the value. |
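| |
| Bit replication can be sketched in C as follows. The helper name is |
| hypothetical; it left-aligns the n stored bits in an 8-bit result and |
| then repeats the pattern into the vacated low bits. |
| |
```c
/* Expands an n-bit value (1 <= n <= 8) to 8 bits by repeating its bit
 * pattern, starting again from the most significant bit. */
unsigned replicate_to_8_bits(unsigned value, int n)
{
    unsigned r = (value << (8 - n)) & 0xFFu;   /* left-align the n bits */

    for (int filled = n; filled < 8; filled *= 2)
        r |= r >> filled;                      /* copy pattern into the gap */

    return r;
}
```
| |
| For instance, the 3-bit value 101 expands to 10110110, so the minimum |
| and maximum stored values always map to 0 and 255. |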
| |
| For trit or quint-based representations, this involves a set of bit |
| manipulations and adjustments to avoid the expense of full-width |
| multipliers. This procedure ensures correct scaling, but scrambles |
| the order of the decoded values relative to the encoded values. |
| This must be compensated for using a table in the encoder. |
| |
| The initial inputs to the procedure are denoted A (9 bits), B (9 bits), |
| C (9 bits) and D (3 bits) and are decoded using the range as follows: |
| |
| --------------------------------------------------------------- |
| Range    T  Q  B  Bits    A          B          C    D |
| --------------------------------------------------------------- |
| 0..5     1  -  1  a       aaaaaaaaa  000000000  204  Trit value |
| 0..9     -  1  1  a       aaaaaaaaa  000000000  113  Quint value |
| 0..11    1  -  2  ba      aaaaaaaaa  b000b0bb0  93   Trit value |
| 0..19    -  1  2  ba      aaaaaaaaa  b0000bb00  54   Quint value |
| 0..23    1  -  3  cba     aaaaaaaaa  cb000cbcb  44   Trit value |
| 0..39    -  1  3  cba     aaaaaaaaa  cb0000cbc  26   Quint value |
| 0..47    1  -  4  dcba    aaaaaaaaa  dcb000dcb  22   Trit value |
| 0..79    -  1  4  dcba    aaaaaaaaa  dcb0000dc  13   Quint value |
| 0..95    1  -  5  edcba   aaaaaaaaa  edcb000ed  11   Trit value |
| 0..159   -  1  5  edcba   aaaaaaaaa  edcb0000e  6    Quint value |
| 0..191   1  -  6  fedcba  aaaaaaaaa  fedcb000f  5    Trit value |
| --------------------------------------------------------------- |
| Table C.2.16 - Color Unquantization Parameters |
| |
| These are then processed as follows: |
| |
| T = D * C + B; |
| T = T ^ A; |
| T = (A & 0x80) | (T >> 2); |
| |
| Note that the multiply in the first line is nearly trivial as it only |
| needs to multiply by 0, 1, 2, 3 or 4. |
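| |
| To make the procedure concrete, the C sketch below applies it to the two |
| smallest trit ranges of Table C.2.16. The function names are |
| hypothetical, and the argument "bits" is assumed to hold the stored LSB |
| bits with a in bit 0 and b in bit 1. |
| |
```c
/* Shared steps: T = D*C + B; T = T ^ A; T = (A & 0x80) | (T >> 2). */
unsigned unquant_color_core(unsigned A, unsigned B, unsigned C, unsigned D)
{
    unsigned T = D * C + B;
    T ^= A;
    return (A & 0x80u) | (T >> 2);
}

/* Range 0..5: one trit plus one bit (a). */
unsigned unquant_color_range6(unsigned trit, unsigned bits)
{
    unsigned A = (bits & 1u) ? 0x1FFu : 0u;   /* a replicated to 9 bits */
    return unquant_color_core(A, 0u, 204u, trit);
}

/* Range 0..11: one trit plus two bits (ba). */
unsigned unquant_color_range12(unsigned trit, unsigned bits)
{
    unsigned A = (bits & 1u) ? 0x1FFu : 0u;   /* a replicated to 9 bits */
    unsigned B = (bits & 2u) ? 0x116u : 0u;   /* pattern b000b0bb0 */
    return unquant_color_core(A, B, 93u, trit);
}
```
| |
| For the range 0..5, the six (trit, a) combinations (0,0), (1,0), (2,0), |
| (0,1), (1,1), (2,1) yield 0, 51, 102, 255, 204, 153, which shows the |
| reordering that an encoder has to compensate for. |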
| |
| C.2.14 LDR Endpoint Decoding |
| ----------------------------- |
| The decoding method used depends on the Color Endpoint Mode (CEM) field, |
| which specifies how many values are used to represent the endpoint. |
| |
| The CEM field also specifies how to take the n unquantized color endpoint |
| values v0 to v[n-1] and convert them into two RGBA color endpoints e0 |
| and e1. |
| |
| The HDR Modes are more complex and do not fit neatly into this section. |
| They are documented in the following section. |
| |
| The methods can be summarized as follows. |
| |
| ------------------------------------------------- |
| CEM Range Description n |
| ------------------------------------------------- |
| 0 LDR Luminance, direct 2 |
| 1 LDR Luminance, base+offset 2 |
| 2 HDR Luminance, large range 2 |
| 3 HDR Luminance, small range 2 |
| 4 LDR Luminance+Alpha, direct 4 |
| 5 LDR Luminance+Alpha, base+offset 4 |
| 6 LDR RGB, base+scale 4 |
| 7 HDR RGB, base+scale 4 |
| 8 LDR RGB, direct 6 |
| 9 LDR RGB, base+offset 6 |
| 10 LDR RGB, base+scale plus two A 6 |
| 11 HDR RGB 6 |
| 12 LDR RGBA, direct 8 |
| 13 LDR RGBA, base+offset 8 |
| 14 HDR RGB + LDR Alpha 8 |
| 15 HDR RGB + HDR Alpha 8 |
| ------------------------------------------------- |
| Table C.2.17 - Color Endpoint Modes |
| [[ If the HDR profile is not implemented, remove from table C.2.17 |
| all rows whose description starts with "HDR", and add to the |
| caption: ]] |
| Modes not described are reserved, as described in table C.2.10. |
| |
| [[ HDR profile only ]] |
| Mode 14 is special in that the alpha values are interpolated linearly, |
| but the color components are interpolated logarithmically. This is the |
| only endpoint format with mixed-mode operation, and will return the |
| error value if encountered in LDR mode. |
| |
| Decode the different LDR endpoint modes as follows: |
| |
| Mode 0 LDR Luminance, direct |
| |
| e0=(v0,v0,v0,0xFF); e1=(v1,v1,v1,0xFF); |
| |
| Mode 1 LDR Luminance, base+offset |
| |
| L0 = (v0>>2)|(v1&0xC0); L1=L0+(v1&0x3F); |
| if (L1>0xFF) { L1=0xFF; } |
| e0=(L0,L0,L0,0xFF); e1=(L1,L1,L1,0xFF); |
| |
| Mode 4 LDR Luminance+Alpha,direct |
| |
| e0=(v0,v0,v0,v2); |
| e1=(v1,v1,v1,v3); |
| |
| Mode 5 LDR Luminance+Alpha, base+offset |
| |
| bit_transfer_signed(v1,v0); bit_transfer_signed(v3,v2); |
| e0=(v0,v0,v0,v2); e1=(v0+v1,v0+v1,v0+v1,v2+v3); |
| clamp_unorm8(e0); clamp_unorm8(e1); |
| |
| Mode 6 LDR RGB, base+scale |
| |
| e0=(v0*v3>>8,v1*v3>>8,v2*v3>>8, 0xFF); |
| e1=(v0,v1,v2,0xFF); |
| |
| Mode 8 LDR RGB, Direct |
| |
| s0= v0+v2+v4; s1= v1+v3+v5; |
| if (s1>=s0){e0=(v0,v2,v4,0xFF); |
| e1=(v1,v3,v5,0xFF); } |
| else { e0=blue_contract(v1,v3,v5,0xFF); |
| e1=blue_contract(v0,v2,v4,0xFF); } |
| |
| Mode 9 LDR RGB, base+offset |
| |
| bit_transfer_signed(v1,v0); |
| bit_transfer_signed(v3,v2); |
| bit_transfer_signed(v5,v4); |
| if(v1+v3+v5 >= 0) |
| { e0=(v0,v2,v4,0xFF); e1=(v0+v1,v2+v3,v4+v5,0xFF); } |
| else |
| { e0=blue_contract(v0+v1,v2+v3,v4+v5,0xFF); |
| e1=blue_contract(v0,v2,v4,0xFF); } |
| clamp_unorm8(e0); clamp_unorm8(e1); |
| |
| Mode 10 LDR RGB, base+scale plus two A |
| |
| e0=(v0*v3>>8,v1*v3>>8,v2*v3>>8, v4); |
| e1=(v0,v1,v2, v5); |
| |
| Mode 12 LDR RGBA, direct |
| |
| s0= v0+v2+v4; s1= v1+v3+v5; |
| if (s1>=s0){e0=(v0,v2,v4,v6); |
| e1=(v1,v3,v5,v7); } |
| else { e0=blue_contract(v1,v3,v5,v7); |
| e1=blue_contract(v0,v2,v4,v6); } |
| |
| Mode 13 LDR RGBA, base+offset |
| |
| bit_transfer_signed(v1,v0); |
| bit_transfer_signed(v3,v2); |
| bit_transfer_signed(v5,v4); |
| bit_transfer_signed(v7,v6); |
| if(v1+v3+v5>=0) { e0=(v0,v2,v4,v6); |
| e1=(v0+v1,v2+v3,v4+v5,v6+v7); } |
| else { e0=blue_contract(v0+v1,v2+v3,v4+v5,v6+v7); |
| e1=blue_contract(v0,v2,v4,v6); } |
| clamp_unorm8(e0); clamp_unorm8(e1); |
| |
| The bit_transfer_signed procedure transfers a bit from one value (a) |
| to another (b). Initially, both a and b are in the range 0..255. |
| After calling this procedure, a's range becomes -32..31, and b remains |
| in the range 0..255. Note that, as is often the case, this is easier to |
| express in hardware than in C: |
| |
| bit_transfer_signed(int& a, int& b) |
| { |
| b >>= 1; |
| b |= a & 0x80; |
| a >>= 1; |
| a &= 0x3F; |
| if( (a&0x20)!=0 ) a-=0x40; |
| } |
| |
| The blue_contract procedure is used to give additional precision to |
| RGB colors near grey: |
| |
| color blue_contract( int r, int g, int b, int a ) |
| { |
| color c; |
| c.r = (r+b) >> 1; |
| c.g = (g+b) >> 1; |
| c.b = b; |
| c.a = a; |
| return c; |
| } |
| |
| The clamp_unorm8 procedure is used to clamp a color into the UNORM8 range: |
| |
| void clamp_unorm8(color& c) |
| { |
| if(c.r < 0) {c.r=0;} else if(c.r > 255) {c.r=255;} |
| if(c.g < 0) {c.g=0;} else if(c.g > 255) {c.g=255;} |
| if(c.b < 0) {c.b=0;} else if(c.b > 255) {c.b=255;} |
| if(c.a < 0) {c.a=0;} else if(c.a > 255) {c.a=255;} |
| } |
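| |
| The helper procedures are written with reference parameters, which |
| plain C lacks. As an illustration only, the following sketch restates |
| them with pointers and uses them to decode Mode 9 (LDR RGB, |
| base+offset). The function decode_ldr_mode9, the helper clamp255 and the |
| color struct layout are assumptions of this example. Right-shifting a |
| negative sum in blue_contract is implementation-defined in C; an |
| arithmetic shift is assumed here. |
| |
```c
typedef struct { int r, g, b, a; } color;

/* Pointer-based equivalent of the reference-parameter pseudocode. */
void bit_transfer_signed(int *a, int *b)
{
    *b >>= 1;
    *b |= *a & 0x80;
    *a >>= 1;
    *a &= 0x3F;
    if ((*a & 0x20) != 0) *a -= 0x40;
}

color blue_contract(int r, int g, int b, int a)
{
    color c = { (r + b) >> 1, (g + b) >> 1, b, a };
    return c;
}

int clamp255(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

void clamp_unorm8(color *c)
{
    c->r = clamp255(c->r);
    c->g = clamp255(c->g);
    c->b = clamp255(c->b);
    c->a = clamp255(c->a);
}

/* Mode 9: v_in holds six unquantized values, each in the range 0..255. */
void decode_ldr_mode9(const int v_in[6], color *e0, color *e1)
{
    int v[6];
    for (int k = 0; k < 6; k++) v[k] = v_in[k];

    bit_transfer_signed(&v[1], &v[0]);
    bit_transfer_signed(&v[3], &v[2]);
    bit_transfer_signed(&v[5], &v[4]);

    if (v[1] + v[3] + v[5] >= 0) {
        *e0 = (color){ v[0], v[2], v[4], 0xFF };
        *e1 = (color){ v[0] + v[1], v[2] + v[3], v[4] + v[5], 0xFF };
    } else {
        *e0 = blue_contract(v[0] + v[1], v[2] + v[3], v[4] + v[5], 0xFF);
        *e1 = blue_contract(v[0], v[2], v[4], 0xFF);
    }
    clamp_unorm8(e0);
    clamp_unorm8(e1);
}
```
| |
| With all base values equal to 0x80 and all offsets equal to 0x10, this |
| gives e0 = (64,64,64,255) and e1 = (72,72,72,255). |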
| |
| [[ If the HDR profile is not implemented, do not include section |
| C.2.15 ]] |
| |
| C.2.15 HDR Endpoint Decoding |
| ------------------------- |
| |
| For HDR endpoint modes, color values are represented in a 12-bit |
| pseudo-logarithmic representation. |
| |
| HDR Endpoint Mode 2 |
| |
| Mode 2 represents luminance-only data with a large range. It encodes |
| using two values (v0, v1). The complete decoding procedure is as follows: |
| |
| if(v1 >= v0) |
| { |
| y0 = (v0 << 4); |
| y1 = (v1 << 4); |
| } |
| else |
| { |
| y0 = (v1 << 4) + 8; |
| y1 = (v0 << 4) - 8; |
| } |
| // Construct RGBA result (0x780 is 1.0f) |
| e0 = (y0, y0, y0, 0x780); |
| e1 = (y1, y1, y1, 0x780); |
| |
| HDR Endpoint Mode 3 |
| |
| Mode 3 represents luminance-only data with a small range. It packs the |
| bits for a base luminance value, together with an offset, into two values |
| (v0, v1): |
| |
| Value 7 6 5 4 3 2 1 0 |
| ----- ------------------------------ |
| v0 |M | L[6:0] | |
| ------------------------------ |
| v1 | X[3:0] | d[3:0] | |
| ------------------------------ |
| |
| Table C.2.18 - HDR Mode 3 Value Layout |
| |
| The bit field marked as X allocates different bits to L or d depending |
| on the value of the mode bit M. |
| |
| The complete decoding procedure is as follows: |
| |
| // Check mode bit and extract. |
| if((v0&0x80) !=0) |
| { |
| y0 = ((v1 & 0xE0) << 4) | ((v0 & 0x7F) << 2); |
| d = (v1 & 0x1F) << 2; |
| } |
| else |
| { |
| y0 = ((v1 & 0xF0) << 4) | ((v0 & 0x7F) << 1); |
| d = (v1 & 0x0F) << 1; |
| } |
| |
| // Add delta and clamp |
| y1 = y0 + d; |
| if(y1 > 0xFFF) { y1 = 0xFFF; } |
| |
| // Construct RGBA result (0x780 is 1.0f) |
| e0 = (y0, y0, y0, 0x780); |
| e1 = (y1, y1, y1, 0x780); |
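| |
| The Mode 3 procedure can be wrapped in a small C function for |
| experimentation. The name decode_hdr_mode3 and the use of output |
| pointers are choices of this sketch; only the two luminance values are |
| returned, since the RGBA construction is a plain copy. |
| |
```c
/* Decodes HDR Mode 3 luminance endpoints from two values in 0..255.
 * *y0 and *y1 receive the 12-bit endpoint luminances. */
void decode_hdr_mode3(int v0, int v1, int *y0, int *y1)
{
    int d;

    if ((v0 & 0x80) != 0) {
        /* M = 1: 10 bits of base luminance, 5 bits of offset. */
        *y0 = ((v1 & 0xE0) << 4) | ((v0 & 0x7F) << 2);
        d   =  (v1 & 0x1F) << 2;
    } else {
        /* M = 0: 11 bits of base luminance, 4 bits of offset. */
        *y0 = ((v1 & 0xF0) << 4) | ((v0 & 0x7F) << 1);
        d   =  (v1 & 0x0F) << 1;
    }

    *y1 = *y0 + d;
    if (*y1 > 0xFFF) *y1 = 0xFFF;   /* clamp to 12 bits */
}
```
| |
| For v0 = 0x90 and v1 = 0x25 the mode bit is set, giving y0 = 0x240 and |
| y1 = 0x254. |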
| |
| HDR Endpoint Mode 7 |
| |
| Mode 7 packs the bits for a base RGB value, a scale factor, and some |
| mode bits into the four values (v0, v1, v2, v3): |
| |
| Value    7   6   5   4   3   2   1   0 |
| -----  --------------------------------- |
| v0     |M[1:0] |        R[5:0]         | |
| -----  --------------------------------- |
| v1     |M2 |X0 |X1 |      G[4:0]       | |
| -----  --------------------------------- |
| v2     |M3 |X2 |X3 |      B[4:0]       | |
| -----  --------------------------------- |
| v3     |X4 |X5 |X6 |      S[4:0]       | |
| -----  --------------------------------- |
| Table C.2.19 - HDR Mode 7 Value Layout |
| |
| The mode bits M0 to M3 are a packed representation of an endpoint bit |
| mode, together with the major component index. For modes 0 to 4, the |
| component (red, green, or blue) with the largest magnitude is identified, |
| and the values swizzled to ensure that it is decoded from the red channel. |
| |
| The endpoint bit mode is used to determine the number of bits assigned |
| to each component of the endpoint, and the destination of each of the |
| extra bits X0 to X6, as follows: |
| |
| ------------------------------------------------------ |
| Number of bits Destination of extra bits |
| Mode R G B S X0 X1 X2 X3 X4 X5 X6 |
| ------------------------------------------------------ |
| 0 11 5 5 7 R9 R8 R7 R10 R6 S6 S5 |
| 1 11 6 6 5 R8 G5 R7 B5 R6 R10 R9 |
| 2 10 5 5 8 R9 R8 R7 R6 S7 S6 S5 |
| 3 9 6 6 7 R8 G5 R7 B5 R6 S6 S5 |
| 4 8 7 7 6 G6 G5 B6 B5 R6 R7 S5 |
| 5 7 7 7 7 G6 G5 B6 B5 R6 S6 S5 |
| ------------------------------------------------------ |
| Table C.2.20 - Endpoint Bit Mode |
| |
| As noted before, this appears complex when expressed in C, but is much |
| easier to achieve in hardware - bit masking, extraction, shifting |
| and assignment usually end up as a single wire or multiplexer. |
| |
| The complete decoding procedure is as follows: |
| |
| // Extract mode bits and unpack to major component and mode. |
| int modeval = ((v0&0xC0)>>6) | ((v1&0x80)>>5) | ((v2&0x80)>>4); |
| |
| int majcomp; |
| int mode; |
| |
| if( (modeval & 0xC ) != 0xC ) |
| { |
| majcomp = modeval >> 2; mode = modeval & 3; |
| } |
| else if( modeval != 0xF ) |
| { |
| majcomp = modeval & 3; mode = 4; |
| } |
| else |
| { |
| majcomp = 0; mode = 5; |
| } |
| |
| // Extract low-order bits of r, g, b, and s. |
| int red = v0 & 0x3f; |
| int green = v1 & 0x1f; |
| int blue = v2 & 0x1f; |
| int scale = v3 & 0x1f; |
| |
| // Extract high-order bits, which may be assigned depending on mode |
| int x0 = (v1 >> 6) & 1; int x1 = (v1 >> 5) & 1; |
| int x2 = (v2 >> 6) & 1; int x3 = (v2 >> 5) & 1; |
| int x4 = (v3 >> 7) & 1; int x5 = (v3 >> 6) & 1; |
| int x6 = (v3 >> 5) & 1; |
| |
| // Now move the high-order xs into the right place. |
| int ohm = 1 << mode; |
| if( ohm & 0x30 ) green |= x0 << 6; |
| if( ohm & 0x3A ) green |= x1 << 5; |
| if( ohm & 0x30 ) blue |= x2 << 6; |
| if( ohm & 0x3A ) blue |= x3 << 5; |
| if( ohm & 0x3D ) scale |= x6 << 5; |
| if( ohm & 0x2D ) scale |= x5 << 6; |
| if( ohm & 0x04 ) scale |= x4 << 7; |
| if( ohm & 0x3B ) red |= x4 << 6; |
| if( ohm & 0x04 ) red |= x3 << 6; |
| if( ohm & 0x10 ) red |= x5 << 7; |
| if( ohm & 0x0F ) red |= x2 << 7; |
| if( ohm & 0x05 ) red |= x1 << 8; |
| if( ohm & 0x0A ) red |= x0 << 8; |
| if( ohm & 0x05 ) red |= x0 << 9; |
| if( ohm & 0x02 ) red |= x6 << 9; |
| if( ohm & 0x01 ) red |= x3 << 10; |
| if( ohm & 0x02 ) red |= x5 << 10; |
| |
| // Shift the bits to the top of the 12-bit result. |
| static const int shamts[6] = { 1,1,2,3,4,5 }; |
| int shamt = shamts[mode]; |
| red <<= shamt; green <<= shamt; blue <<= shamt; scale <<= shamt; |
| |
| // Minor components are stored as differences |
| if( mode != 5 ) { green = red - green; blue = red - blue; } |
| |
| // Swizzle major component into place |
| if( majcomp == 1 ) swap( red, green ); |
| if( majcomp == 2 ) swap( red, blue ); |
| |
| // Clamp output values, set alpha to 1.0 |
| e1.r = clamp( red, 0, 0xFFF ); |
| e1.g = clamp( green, 0, 0xFFF ); |
| e1.b = clamp( blue, 0, 0xFFF ); |
| e1.alpha = 0x780; |
| |
| e0.r = clamp( red - scale, 0, 0xFFF ); |
| e0.g = clamp( green - scale, 0, 0xFFF ); |
| e0.b = clamp( blue - scale, 0, 0xFFF ); |
| e0.alpha = 0x780; |
| |
| HDR Endpoint Mode 11 |
| |
| Mode 11 specifies two RGB values, which it calculates from a number of |
| bitfields (a, b0, b1, c, d0 and d1) which are packed together with some |
| mode bits into the six values (v0, v1, v2, v3, v4, v5): |
| |
| Value 7 6 5 4 3 2 1 0 |
| ----- ------------------------------ |
| v0 | a[7:0] | |
| ----- ------------------------------ |
| v1 |m0 |a8 | c[5:0] | |
| ----- ------------------------------ |
| v2 |m1 |X0 | b0[5:0] | |
| ----- ------------------------------ |
| v3 |m2 |X1 | b1[5:0] | |
| ----- ------------------------------ |
| v4 |mj0|X2 |X4 | d0[4:0] | |
| ----- ------------------------------ |
| v5 |mj1|X3 |X5 | d1[4:0] | |
| ----- ------------------------------ |
| Table C.2.21 - HDR Mode 11 Value Layout |
| |
| If the major component bits mj[1:0] are both 1, then the RGB values |
| are specified directly: |
| |
| Value 7 6 5 4 3 2 1 0 |
| ----- ------------------------------ |
| v0 | R0[11:4] | |
| ----- ------------------------------ |
| v1 | R1[11:4] | |
| ----- ------------------------------ |
| v2 | G0[11:4] | |
| ----- ------------------------------ |
| v3 | G1[11:4] | |
| ----- ------------------------------ |
| v4 | 1 | B0[11:5] | |
| ----- ------------------------------ |
| v5 | 1 | B1[11:5] | |
| ----- ------------------------------ |
| Table C.2.22 - HDR Mode 11 Direct Value Layout |
| |
| The mode bits m[2:0] specify the bit allocation for the different |
| values, and the destinations of the extra bits X0 to X5: |
| |
| ------------------------------------------------------------------------- |
| Number of bits Destination of extra bits |
| Mode a b c d X0 X1 X2 X3 X4 X5 |
| ------------------------------------------------------------------------- |
| 0 9 7 6 7 b0[6] b1[6] d0[6] d1[6] d0[5] d1[5] |
| 1 9 8 6 6 b0[6] b1[6] b0[7] b1[7] d0[5] d1[5] |
| 2 10 6 7 7 a[9] c[6] d0[6] d1[6] d0[5] d1[5] |
| 3 10 7 7 6 b0[6] b1[6] a[9] c[6] d0[5] d1[5] |
| 4 11 8 6 5 b0[6] b1[6] b0[7] b1[7] a[9] a[10] |
| 5 11 6 8 6 a[9] a[10] c[7] c[6] d0[5] d1[5] |
| 6 12 7 7 5 b0[6] b1[6] a[11] c[6] a[9] a[10] |
| 7 12 6 7 6 a[9] a[10] a[11] c[6] d0[5] d1[5] |
| ------------------------------------------------------------------------- |
| Table C.2.23 - Endpoint Bit Mode |
| |
| The complete decoding procedure is as follows: |
| |
| // Find major component |
| int majcomp = ((v4 & 0x80) >> 7) | ((v5 & 0x80) >> 6); |
| |
| // Deal with simple case first |
| if( majcomp == 3 ) |
| { |
| e0 = (v0 << 4, v2 << 4, (v4 & 0x7f) << 5, 0x780); |
| e1 = (v1 << 4, v3 << 4, (v5 & 0x7f) << 5, 0x780); |
| return; |
| } |
| |
| // Decode mode, parameters. |
| int mode = ((v1&0x80)>>7) | ((v2&0x80)>>6) | ((v3&0x80)>>5); |
| int va = v0 | ((v1 & 0x40) << 2); |
| int vb0 = v2 & 0x3f; |
| int vb1 = v3 & 0x3f; |
| int vc = v1 & 0x3f; |
| int vd0 = v4 & 0x7f; |
| int vd1 = v5 & 0x7f; |
| |
| // Assign top bits of vd0, vd1. |
| static const int dbitstab[8] = {7,6,7,6,5,6,5,6}; |
| vd0 = signextend( vd0, dbitstab[mode] ); |
| vd1 = signextend( vd1, dbitstab[mode] ); |
| |
| // Extract and place extra bits |
| int x0 = (v2 >> 6) & 1; |
| int x1 = (v3 >> 6) & 1; |
| int x2 = (v4 >> 6) & 1; |
| int x3 = (v5 >> 6) & 1; |
| int x4 = (v4 >> 5) & 1; |
| int x5 = (v5 >> 5) & 1; |
| |
| int ohm = 1 << mode; |
| if( ohm & 0xA4 ) va |= x0 << 9; |
| if( ohm & 0x08 ) va |= x2 << 9; |
| if( ohm & 0x50 ) va |= x4 << 9; |
| if( ohm & 0x50 ) va |= x5 << 10; |
| if( ohm & 0xA0 ) va |= x1 << 10; |
| if( ohm & 0xC0 ) va |= x2 << 11; |
| if( ohm & 0x04 ) vc |= x1 << 6; |
| if( ohm & 0xE8 ) vc |= x3 << 6; |
| if( ohm & 0x20 ) vc |= x2 << 7; |
| if( ohm & 0x5B ) vb0 |= x0 << 6; |
| if( ohm & 0x5B ) vb1 |= x1 << 6; |
| if( ohm & 0x12 ) vb0 |= x2 << 7; |
| if( ohm & 0x12 ) vb1 |= x3 << 7; |
| |
| // Now shift up so that major component is at top of 12-bit value |
| int shamt = (mode >> 1) ^ 3; |
| va <<= shamt; vb0 <<= shamt; vb1 <<= shamt; |
| vc <<= shamt; vd0 <<= shamt; vd1 <<= shamt; |
| |
| e1.r = clamp( va, 0, 0xFFF ); |
| e1.g = clamp( va - vb0, 0, 0xFFF ); |
| e1.b = clamp( va - vb1, 0, 0xFFF ); |
| e1.alpha = 0x780; |
| |
| e0.r = clamp( va - vc, 0, 0xFFF ); |
| e0.g = clamp( va - vb0 - vc - vd0, 0, 0xFFF ); |
| e0.b = clamp( va - vb1 - vc - vd1, 0, 0xFFF ); |
| e0.alpha = 0x780; |
| |
| if( majcomp == 1 ) { swap( e0.r, e0.g ); swap( e1.r, e1.g ); } |
| else if( majcomp == 2 ) { swap( e0.r, e0.b ); swap( e1.r, e1.b ); } |
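| |
| The Mode 11 procedure calls signextend() without defining it. One |
| plausible reading, given here as an assumption rather than as the |
| specified behaviour, is that the low "bits" bits of the value are |
| interpreted as a two's complement number and any higher bits (which may |
| belong to other fields, such as X2 or X4) are discarded: |
| |
```c
/* Interprets the low 'bits' bits of 'value' as a two's complement number.
 * Assumed semantics for the signextend() used in the Mode 11 pseudocode. */
int signextend(int value, int bits)
{
    int sign = 1 << (bits - 1);

    value &= (1 << bits) - 1;        /* drop bits that belong to other fields */
    return (value ^ sign) - sign;    /* flip-and-subtract sign extension */
}
```
| |
| The same flip-and-subtract pattern appears in the Mode 15 alpha |
| decoding, where v7 is sign extended with "v7 ^= 0x20 >> mode" followed |
| by "v7 -= 0x20 >> mode". |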
| |
| HDR Endpoint Mode 14 |
| |
| Mode 14 specifies two RGBA values, using the eight values (v0, v1, v2, |
| v3, v4, v5, v6, v7). First, the RGB values are decoded from (v0..v5) |
| using the method from Mode 11, then the alpha values are filled in |
| from v6 and v7: |
| |
| // Decode RGB as for mode 11 |
| (e0,e1) = decode_mode_11(v0,v1,v2,v3,v4,v5) |
| |
| // Now fill in the alphas |
| e0.alpha = v6; |
| e1.alpha = v7; |
| |
| Note that in this mode, the alpha values are interpreted (and |
| interpolated) as 8-bit unsigned normalized values, as in the LDR modes. |
| This is the only mode that exhibits this behaviour. |
| |
| HDR Endpoint Mode 15 |
| |
| Mode 15 specifies two RGBA values, using the eight values (v0, v1, v2, |
| v3, v4, v5, v6, v7). First, the RGB values are decoded from (v0..v5) |
| using the method from Mode 11. The alpha values are stored in values |
| v6 and v7 as a mode and two values which are interpreted according |
| to the mode: |
| |
| Value 7 6 5 4 3 2 1 0 |
| ----- ------------------------------ |
| v6 |M0 | A[6:0] | |
| ----- ------------------------------ |
| v7 |M1 | B[6:0] | |
| ----- ------------------------------ |
| Table C.2.24 - HDR Mode 15 Alpha Value Layout |
| |
| The alpha values are decoded from v6 and v7 as follows: |
| |
| // Decode RGB as for mode 11 |
| (e0,e1) = decode_mode_11(v0,v1,v2,v3,v4,v5) |
| |
| // Extract mode bits |
| mode = ((v6 >> 7) & 1) | ((v7 >> 6) & 2); |
| v6 &= 0x7F; |
| v7 &= 0x7F; |
| |
| if(mode==3) |
| { |
| // Directly specify alphas |
| e0.alpha = v6 << 5; |
| e1.alpha = v7 << 5; |
| } |
| else |
| { |
| // Transfer bits from v7 to v6 and sign extend v7. |
| v6 |= (v7 << (mode+1)) & 0x780; |
| v7 &= (0x3F >> mode); |
| v7 ^= 0x20 >> mode; |
| v7 -= 0x20 >> mode; |
| v6 <<= (4-mode); |
| v7 <<= (4-mode); |
| |
| // Add delta and clamp |
| v7 += v6; |
| v7 = clamp(v7, 0, 0xFFF); |
| e0.alpha = v6; |
| e1.alpha = v7; |
| } |
| |
| Note that in this mode, the alpha values are interpreted as 12-bit HDR |
| values, and are interpolated as for any other HDR component. |
| |
| C.2.16 Weight Decoding |
| ----------------------- |
| The weight information is stored as a stream of bits, growing downwards |
| from the most significant bit in the block. Bit n in the stream is thus |
| bit 127-n in the block. |
| |
| For each location in the weight grid, a value (in the specified range) |
| is packed into the stream. These are ordered in a raster pattern |
| starting from location (0,0,0), with the X dimension increasing fastest, |
| and the Z dimension increasing slowest. If dual-plane mode is selected, |
| both weights are emitted together for each location, plane 0 first, |
| then plane 1. |
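| |
| The two ordering rules of this section can be sketched in C as shown |
| below. Both function names are hypothetical. The first function assumes |
| that the 128-bit block is held as 16 bytes with block bit k stored in |
| byte k/8 at bit position k%8; that byte layout is an assumption of this |
| example and is not stated in this section. |
| |
```c
#include <stdint.h>

/* Bit n of the weight stream is bit 127-n of the block. */
int weight_stream_bit(const uint8_t block[16], int n)
{
    int k = 127 - n;
    return (block[k >> 3] >> (k & 7)) & 1;
}

/* Position, in the packed value sequence, of the weight for grid location
 * (x, y, z) in a grid that is N wide and M high; plane is 0 or 1. */
int weight_sequence_index(int x, int y, int z, int N, int M,
                          int dual_plane, int plane)
{
    int location = (z * M + y) * N + x;   /* X fastest, Z slowest */
    return dual_plane ? 2 * location + plane : location;
}
```
| |
| In a 4x4 grid, location (1,2,0) is value 9 of a single-plane sequence, |
| and its plane 1 weight is value 19 of a dual-plane sequence. |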
| |
| C.2.17 Weight Unquantization |
| ----------------------------- |
| |
| Each weight plane is specified as a sequence of integers in a given |
| range. These values are packed using integer sequence encoding. |
| |
| Once unpacked, the values must be unquantized from their storage |
| range, returning them to a standard range of 0..64. The procedure |
| for doing so is similar to the color endpoint unquantization. |
| |
| First, we unquantize the actual stored weight values to the range 0..63. |
| |
| For bit-only representations, this is simple bit replication from the |
| most significant bit of the value. |
| |
| For trit or quint-based representations, this involves a set of bit |
| manipulations and adjustments to avoid the expense of full-width |
| multipliers. |
| |
| For representations with no additional bits, the results are as follows: |
| |
| Range 0 1 2 3 4 |
| -------------------------- |
| 0..2 0 32 63 - - |
| 0..4 0 16 32 47 63 |
| -------------------------- |
| Table C.2.25 - Weight Unquantization Values |
| |
| For other values, we calculate the initial inputs to a bit manipulation |
| procedure. These are denoted A (7 bits), B (7 bits), C (7 bits), and |
| D (3 bits) and are decoded using the range as follows: |
| |
| Range    T  Q  B  Bits  A        B        C   D |
| ------------------------------------------------------- |
| 0..5     1  -  1  a     aaaaaaa  0000000  50  Trit value |
| 0..9     -  1  1  a     aaaaaaa  0000000  28  Quint value |
| 0..11    1  -  2  ba    aaaaaaa  b000b0b  23  Trit value |
| 0..19    -  1  2  ba    aaaaaaa  b0000b0  13  Quint value |
| 0..23    1  -  3  cba   aaaaaaa  cb000cb  11  Trit value |
| ------------------------------------------------------- |
| Table C.2.26 - Weight Unquantization Parameters |
| |
| These are then processed as follows: |
| |
| T = D * C + B; |
| T = T ^ A; |
| T = (A & 0x20) | (T >> 2); |
| |
| Note that the multiply in the first line is nearly trivial as it only |
| needs to multiply by 0, 1, 2, 3 or 4. |
| |
| As a final step, for all types of value, the range is expanded from |
| 0..63 up to 0..64 as follows: |
| |
| if (T > 32) { T += 1; } |
| |
| This allows the implementation to use 64 as a divisor during |
| interpolation, which is much easier than using 63. |
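| |
| As a worked example, the C sketch below unquantizes a weight stored with |
| range 0..5 (one trit plus one bit) using the first row of Table C.2.26, |
| including the final expansion to 0..64. The function name and argument |
| order are assumptions of this example. |
| |
```c
/* Unquantizes a weight stored with range 0..5 (trit value plus bit a)
 * to the final range 0..64. */
unsigned unquant_weight_range6(unsigned trit, unsigned a_bit)
{
    unsigned A = (a_bit & 1u) ? 0x7Fu : 0u;   /* a replicated to 7 bits */
    unsigned B = 0u;
    unsigned C = 50u;
    unsigned D = trit;

    unsigned T = D * C + B;
    T ^= A;
    T = (A & 0x20u) | (T >> 2);

    if (T > 32u) T += 1u;                     /* expand 0..63 to 0..64 */
    return T;
}
```
| |
| The six possible inputs produce the weights 0, 12, 25 (a = 0) and |
| 64, 52, 39 (a = 1), which are symmetric about 32. |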
| |
| C.2.18 Weight Infill |
| --------------------- |
| |
| After unquantization, the weights are subject to weight selection and |
| infill. The infill method is used to calculate the weight for a texel |
| position, based on the weights in the stored weight grid array (which |
| may be a different size). |
| |
| The procedure below must be followed exactly, to ensure bit exact |
| results. |
| |
| The block size is specified as two dimensions along the s and t |
| axes (Bs, Bt). Texel coordinates within the block (s,t) can have values |
| from 0 to one less than the block dimension in that axis. |
| |
| For each block dimension, we compute scale factors (Ds, Dt): |
| |
| Ds = floor( (1024 + floor(Bs/2)) / (Bs-1) ); |
| Dt = floor( (1024 + floor(Bt/2)) / (Bt-1) ); |
| |
| Since the block dimensions are constrained, these are easily looked up |
| in a table. These scale factors are then used to scale the (s,t) |
| coordinates to a homogeneous coordinate (cs, ct): |
| |
| cs = Ds * s; |
| ct = Dt * t; |
| |
| This homogeneous coordinate (cs, ct) is then scaled again to give |
| a coordinate (gs, gt) in the weight-grid space. The weight-grid is |
| of size (N, M), as specified in the block mode field: |
| |
| gs = (cs*(N-1)+32) >> 6; |
| gt = (ct*(M-1)+32) >> 6; |
| |
| The resulting coordinates may be in the range 0..176. These are |
| interpreted as 4:4 unsigned fixed point numbers in the range 0.0 .. 11.0. |
| |
| If we label the integral parts of these (js, jt) and the fractional |
| parts (fs, ft), then: |
| |
| js = gs >> 4; fs = gs & 0x0F; |
| jt = gt >> 4; ft = gt & 0x0F; |
| |
| These values are then used to bilinearly interpolate between the stored |
| weights. |
| |
| v0 = js + jt*N; |
| p00 = decode_weight(v0); |
| p01 = decode_weight(v0 + 1); |
| p10 = decode_weight(v0 + N); |
| p11 = decode_weight(v0 + N + 1); |
| |
| The function decode_weight(n) decodes the nth weight in the stored weight |
| stream. The values p00 to p11 are the weights at the corner of the square |
| in which the texel position resides. These are then weighted using the |
| fractional position to produce the effective weight i as follows: |
| |
| w11 = (fs*ft+8) >> 4; |
| w10 = ft - w11; |
| w01 = fs - w11; |
| w00 = 16 - fs - ft + w11; |
| i = (p00*w00 + p01*w01 + p10*w10 + p11*w11 + 8) >> 4; |
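| |
| The infill steps above can be put together as in the following C |
| sketch for a 2D block. The function names and the row-major weights |
| array (standing in for decode_weight) are assumptions made for |
| illustration. The sketch also assumes that any index past the end of |
| the grid is only ever paired with an interpolation factor of zero, so |
| it substitutes 0 there instead of reading out of bounds. |
| |
```c
#include <assert.h>

/* Fetch an unquantized grid weight. Indices past the end of the grid
 * are assumed to carry an interpolation factor of zero, so this sketch
 * returns 0 for them rather than reading out of bounds. */
static int grid_weight(const int *weights, int count, int n)
{
    return (n < count) ? weights[n] : 0;
}

/* Hypothetical helper: effective weight i for texel (s,t) of a 2D
 * block of size Bs x Bt, given an N x M grid of unquantized weights
 * (0..64) stored in row-major order. */
int infill_weight(const int *weights, int N, int M,
                  int Bs, int Bt, int s, int t)
{
    assert(Bs > 1 && Bt > 1);
    assert(s >= 0 && s < Bs && t >= 0 && t < Bt);

    /* Scale factors; integer division implements the floor() calls. */
    int Ds = (1024 + Bs / 2) / (Bs - 1);
    int Dt = (1024 + Bt / 2) / (Bt - 1);

    int cs = Ds * s;
    int ct = Dt * t;

    /* Position in weight-grid space as 4:4 fixed point. */
    int gs = (cs * (N - 1) + 32) >> 6;
    int gt = (ct * (M - 1) + 32) >> 6;

    int js = gs >> 4, fs = gs & 0x0F;
    int jt = gt >> 4, ft = gt & 0x0F;

    int v0  = js + jt * N;
    int p00 = grid_weight(weights, N * M, v0);
    int p01 = grid_weight(weights, N * M, v0 + 1);
    int p10 = grid_weight(weights, N * M, v0 + N);
    int p11 = grid_weight(weights, N * M, v0 + N + 1);

    int w11 = (fs * ft + 8) >> 4;
    int w10 = ft - w11;
    int w01 = fs - w11;
    int w00 = 16 - fs - ft + w11;

    return (p00 * w00 + p01 * w01 + p10 * w10 + p11 * w11 + 8) >> 4;
}
```
| |
| With a 2x2 grid holding the weights {0, 64, 0, 64} on a 4x4 block, the |
| top row of texels receives the effective weights 0, 20, 44 and 64. |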
| |
| C.2.19 Weight Application |
| -------------------------- |
| Once the effective weight i for the texel has been calculated, the color |
| endpoints are interpolated and expanded. |
| |
| For LDR endpoint modes, each color component C is calculated from the |
| corresponding 8-bit endpoint components C0 and C1 as follows: |
| |
| If sRGB conversion is not enabled, or for the alpha channel in any case, |
| C0 and C1 are first expanded to 16 bits by bit replication: |
| |
| C0 = (C0 << 8) | C0; C1 = (C1 << 8) | C1; |
| |
| If sRGB conversion is enabled, C0 and C1 for the R, G, and B channels |
| are expanded to 16 bits differently, as follows: |
| |
| C0 = (C0 << 8) | 0x80; C1 = (C1 << 8) | 0x80; |
| |
| C0 and C1 are then interpolated to produce a UNORM16 result C: |
| |
| C = floor( (C0*(64-i) + C1*i + 32)/64 ) |
| |
| If sRGB conversion is enabled, the top 8 bits of the interpolation |
| result for the R, G and B channels are passed to the external sRGB |
| conversion block. Otherwise, if C = 65535, the final result is 1.0 |
| (0x3C00); for any other value, C is divided by 65536 and the |
| infinite-precision result of the division is converted to FP16 with |
| round-to-zero semantics. |
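| |
| The LDR expansion and interpolation can be sketched in C as shown |
| here. The function name interp_ldr and the srgb_rgb flag are |
| hypothetical. The sketch stops at the UNORM16 result and does not |
| perform the sRGB conversion or the FP16 conversion. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpolate one LDR color component.
 * c0 and c1 are the 8-bit endpoint components and i is the effective
 * weight (0..64). srgb_rgb is nonzero for an R, G or B channel when
 * sRGB conversion is enabled, and zero otherwise (always zero for the
 * alpha channel). */
uint16_t interp_ldr(uint8_t c0, uint8_t c1, int i, int srgb_rgb)
{
    assert(i >= 0 && i <= 64);

    uint32_t C0, C1;
    if (srgb_rgb) {
        /* sRGB path: low byte is fixed at 0x80. */
        C0 = ((uint32_t)c0 << 8) | 0x80;
        C1 = ((uint32_t)c1 << 8) | 0x80;
    } else {
        /* Linear path: expand to 16 bits by bit replication. */
        C0 = ((uint32_t)c0 << 8) | c0;
        C1 = ((uint32_t)c1 << 8) | c1;
    }

    /* Unsigned integer division implements the floor(). */
    return (uint16_t)((C0 * (uint32_t)(64 - i) + C1 * (uint32_t)i + 32) / 64);
}
```
| |
| In the sRGB case the caller would pass the top 8 bits of the result |
| (the return value shifted right by 8) to the sRGB conversion block. |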
| |
| For HDR endpoint modes, color values are represented in a 12-bit |
| pseudo-logarithmic representation, and interpolation occurs in a |
| piecewise-approximate logarithmic manner as follows: |
| |
| In LDR mode, the error result is returned. |
| |
| In HDR mode, the color components from each endpoint, C0 and C1, are |
| initially shifted left 4 bits to become 16-bit integer values and these |
| are interpolated in the same way as LDR. The 16-bit value C is then |
| decomposed into the top five bits, E, and the bottom 11 bits M, which |
| are then processed and recombined with E to form the final value Cf: |
| |
| C = floor( (C0*(64-i) + C1*i + 32)/64 ) |
| E = (C&0xF800) >> 11; M = C&0x7FF; |
| if (M < 512) { Mt = 3*M; } |
| else if (M >= 1536) { Mt = 5*M - 2048; } |
| else { Mt = 4*M - 512; } |
| Cf = (E<<10) + (Mt>>3) |
| |
| This interpolation is a considerably closer approximation to a |
| logarithmic space than simple 16-bit interpolation. |
| |
| This final value Cf is interpreted as an IEEE FP16 value. If the result |
| is +Inf or NaN, it is converted to the bit pattern 0x7BFF, which is the |
| largest representable finite value. |
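| |
| A minimal C sketch of the HDR path follows, assuming 12-bit endpoint |
| components and an effective weight in 0..64. The function name |
| interp_hdr is hypothetical, and the return value is the FP16 bit |
| pattern rather than a floating-point number. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpolate one HDR color component and return
 * the resulting FP16 bit pattern. */
uint16_t interp_hdr(uint16_t c0, uint16_t c1, int i)
{
    assert(c0 <= 0xFFF && c1 <= 0xFFF);
    assert(i >= 0 && i <= 64);

    /* Shift the 12-bit endpoints up to 16 bits and interpolate. */
    uint32_t C0 = (uint32_t)c0 << 4;
    uint32_t C1 = (uint32_t)c1 << 4;
    uint32_t C  = (C0 * (uint32_t)(64 - i) + C1 * (uint32_t)i + 32) / 64;

    /* Split into the top 5 bits (E) and the bottom 11 bits (M). */
    uint32_t E = (C & 0xF800) >> 11;
    uint32_t M = C & 0x7FF;

    uint32_t Mt;
    if (M < 512)        { Mt = 3 * M; }
    else if (M >= 1536) { Mt = 5 * M - 2048; }
    else                { Mt = 4 * M - 512; }

    uint32_t Cf = (E << 10) + (Mt >> 3);

    /* An all-ones exponent field means +Inf or NaN: substitute the
     * largest representable finite value. */
    if ((Cf & 0x7C00) == 0x7C00) { Cf = 0x7BFF; }
    return (uint16_t)Cf;
}
```
| |
| For instance, endpoints that are both 0x800 yield 0x4000, which is the |
| FP16 encoding of 2.0, while endpoints that are both 0xFFF produce a |
| NaN bit pattern and are therefore replaced by 0x7BFF. |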
| |
| C.2.20 Dual-Plane Decoding |
| --------------------------- |
| If dual-plane mode is disabled, all of the endpoint components are inter- |
| polated using the same weight value. |
| |
| If dual-plane mode is enabled, two weights are stored with each texel. |
| One component is then selected to use the second weight for interpolation, |
| instead of the first weight. The first weight is then used for all other |
| components. |
| |
| The component to treat specially is indicated using the 2-bit Color |
| Component Selector (CCS) field as follows: |
| |
| Value Weight 0 Weight 1 |
| -------------------------- |
| 0 GBA R |
| 1 RBA G |
| 2 RGA B |
| 3 RGB A |
| -------------------------- |
| Table C.2.28 - Dual Plane Color Component Selector Values |
| |
| The CCS bits are stored at a variable position directly below the weight |
| bits and any additional CEM bits. |
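| |
| The selection in Table C.2.28 amounts to a single comparison, as in |
| this C sketch. The function name and the channel numbering (R = 0, |
| G = 1, B = 2, A = 3) are assumptions chosen so that the channel index |
| matches the CCS value. |
| |
```c
#include <assert.h>

/* Assumed channel numbering, chosen to match the CCS values. */
enum { CHANNEL_R = 0, CHANNEL_G = 1, CHANNEL_B = 2, CHANNEL_A = 3 };

/* Hypothetical helper: pick the weight used to interpolate a channel.
 * w0 is the first weight and w1 the second weight of the texel. */
int dual_plane_weight(int dual_plane, int ccs, int channel, int w0, int w1)
{
    assert(ccs >= 0 && ccs <= 3);
    assert(channel >= CHANNEL_R && channel <= CHANNEL_A);

    /* Only the channel named by CCS uses the second weight. */
    return (dual_plane && channel == ccs) ? w1 : w0;
}
```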
| |
| C.2.21 Partition Pattern Generation |
| ------------------------------------ |
| |
| When multiple partitions are active, each texel position is assigned a |
| partition index. This partition index is calculated using a seed (the |
| partition pattern index), the texel's x,y,z position within the block, |
| and the number of partitions. An additional argument, small_block, is |
| set to 1 if the number of texels in the block is less than 31, |
| otherwise it is set to 0. |
| |
| This function is specified in terms of x, y and z in order to support |
| 3D textures. For 2D textures and texture slices, z will always be 0. |
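| |
| As a small illustration, the small_block argument can be derived from |
| the block dimensions as follows. The function name is hypothetical; |
| for 2D blocks the depth is passed as 1. |
| |
```c
#include <assert.h>

/* Hypothetical helper: compute the small_block argument from the
 * block dimensions (bz is 1 for 2D blocks). */
int is_small_block(int bx, int by, int bz)
{
    assert(bx > 0 && by > 0 && bz > 0);
    return (bx * by * bz < 31) ? 1 : 0;
}
```
| |
| Under this rule a 6x5 block (30 texels) is a small block, while a 6x6 |
| block (36 texels) is not. |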
| |
| The full partition selection algorithm is as follows: |
| |
| int select_partition(int seed, int x, int y, int z, |
| int partitioncount, int small_block) |
| { |
| if( small_block ){ x <<= 1; y <<= 1; z <<= 1; } |
| seed += (partitioncount-1) * 1024; |
| uint32_t rnum = hash52(seed); |
| uint8_t seed1 = rnum & 0xF; |
| uint8_t seed2 = (rnum >> 4) & 0xF; |
| uint8_t seed3 = (rnum >> 8) & 0xF; |
| uint8_t seed4 = (rnum >> 12) & 0xF; |
| uint8_t seed5 = (rnum >> 16) & 0xF; |
| uint8_t seed6 = (rnum >> 20) & 0xF; |
| uint8_t seed7 = (rnum >> 24) & 0xF; |
| uint8_t seed8 = (rnum >> 28) & 0xF; |
| uint8_t seed9 = (rnum >> 18) & 0xF; |
| uint8_t seed10 = (rnum >> 22) & 0xF; |
| uint8_t seed11 = (rnum >> 26) & 0xF; |
| uint8_t seed12 = ((rnum >> 30) | (rnum << 2)) & 0xF; |
| |
| seed1 *= seed1; seed2 *= seed2; |
| seed3 *= seed3; seed4 *= seed4; |
| seed5 *= seed5; seed6 *= seed6; |
| seed7 *= seed7; seed8 *= seed8; |
| seed9 *= seed9; seed10 *= seed10; |
| seed11 *= seed11; seed12 *= seed12; |
| |
| int sh1, sh2, sh3; |
| if( seed & 1 ) |
| { sh1 = (seed&2 ? 4:5); sh2 = (partitioncount==3 ? 6:5); } |
| else |
| { sh1 = (partitioncount==3 ? 6:5); sh2 = (seed&2 ? 4:5); } |
| sh3 = (seed & 0x10) ? sh1 : sh2; |
| |
| seed1 >>= sh1; seed2 >>= sh2; seed3 >>= sh1; seed4 >>= sh2; |
| seed5 >>= sh1; seed6 >>= sh2; seed7 >>= sh1; seed8 >>= sh2; |
| seed9 >>= sh3; seed10 >>= sh3; seed11 >>= sh3; seed12 >>= sh3; |
| |
| int a = seed1*x + seed2*y + seed11*z + (rnum >> 14); |
| int b = seed3*x + seed4*y + seed12*z + (rnum >> 10); |
| int c = seed5*x + seed6*y + seed9 *z + (rnum >> 6); |
| int d = seed7*x + seed8*y + seed10*z + (rnum >> 2); |
| |
| a &= 0x3F; b &= 0x3F; c &= 0x3F; d &= 0x3F; |
| |
| if( partitioncount < 4 ) d = 0; |
| if( partitioncount < 3 ) c = 0; |
| |
| if( a >= b && a >= c && a >= d ) return 0; |
| else if( b >= c && b >= d ) return 1; |
| else if( c >= d ) return 2; |
| else return 3; |
| } |
| |
| As has been observed before, the bit selections are much easier to |
| express in hardware than in C. |
| |
| The seed is expanded using a hash function hash52, which is defined as |
| follows: |
| |
| uint32_t hash52( uint32_t p ) |
| { |
| p ^= p >> 15; p -= p << 17; p += p << 7; p += p << 4; |
| p ^= p >> 5; p += p << 16; p ^= p >> 7; p ^= p >> 3; |
| p ^= p << 6; p ^= p >> 17; |
| return p; |
| } |
| |
| This assumes that all operations act on 32-bit values. |
| |
| C.2.22 Data Size Determination |
| ------------------------------- |
| |
| The size of the data used to represent color endpoints is not |
| explicitly specified. Instead, it is determined from the block mode and |
| number of partitions as follows: |
| |
| config_bits = 17; |
| if(num_partitions>1) { |
| if(single_CEM) |
| config_bits = 29; |
| else |
| config_bits = 25 + 3*num_partitions; |
| } |
| |
| num_weights = M * N * Q; // size of weight grid |
| |
| if(dual_plane) { |
| config_bits += 2; |
| num_weights *= 2; |
| } |
| |
| weight_bits = ceil(num_weights*8*trits_in_weight_range/5) + |
| ceil(num_weights*7*quints_in_weight_range/3) + |
| num_weights*bits_in_weight_range; |
| |
| remaining_bits = 128 - config_bits - weight_bits; |
| |
| num_CEM_pairs = base_CEM_class+1 + count_bits(extra_CEM_bits); |
| |
| The CEM value range is then looked up from a table indexed by remaining |
| bits and num_CEM_pairs. This table is initialized such that the range |
| is as large as possible, consistent with the constraint that the number |
| of bits required to encode num_CEM_pairs pairs of values is not more |
| than the number of remaining bits. |
| |
| An equivalent iterative algorithm would be: |
| |
| num_CEM_values = num_CEM_pairs*2; |
| |
| for(range = each possible CEM range in descending order of size) |
| { |
| CEM_bits = ceil(num_CEM_values*8*trits_in_CEM_range/5) + |
| ceil(num_CEM_values*7*quints_in_CEM_range/3) + |
| num_CEM_values*bits_in_CEM_range; |
| |
| if(CEM_bits <= remaining_bits) |
| break; |
| } |
| return range; |
| |
| In cases where this procedure results in unallocated bits, these bits |
| are not read by the decoding process and can have any value. |
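| |
| The iterative search above can be written as compilable C along these |
| lines. This is a sketch under stated assumptions: the names ise_range, |
| cem_ranges, ise_bits and choose_cem_range are hypothetical, and the |
| table lists the ranges assumed to be available for color endpoints, |
| from 0..255 down to 0..5. The function returns the top of the chosen |
| range, or -1 if not even the smallest range fits. |
| |
```c
#include <assert.h>

/* One integer sequence encoding range: values 0..max_value, built from
 * an optional trit or quint plus a number of plain bits per value. */
struct ise_range { int max_value, trits, quints, bits; };

/* Ranges assumed to be usable for color endpoints, largest first. */
static const struct ise_range cem_ranges[] = {
    {255,0,0,8}, {191,1,0,6}, {159,0,1,5}, {127,0,0,7}, { 95,1,0,5},
    { 79,0,1,4}, { 63,0,0,6}, { 47,1,0,4}, { 39,0,1,3}, { 31,0,0,5},
    { 23,1,0,3}, { 19,0,1,2}, { 15,0,0,4}, { 11,1,0,2}, {  9,0,1,1},
    {  7,0,0,3}, {  5,1,0,1},
};

/* Bits needed to encode n values: ceil(8n/5) for trits, ceil(7n/3)
 * for quints, plus n times the number of plain bits. */
int ise_bits(int n, int trits, int quints, int bits)
{
    return (n * 8 * trits + 4) / 5 + (n * 7 * quints + 2) / 3 + n * bits;
}

/* Return the largest range whose encoding fits in remaining_bits. */
int choose_cem_range(int num_cem_values, int remaining_bits)
{
    assert(num_cem_values > 0);

    int count = (int)(sizeof cem_ranges / sizeof cem_ranges[0]);
    for (int k = 0; k < count; k++) {
        const struct ise_range *r = &cem_ranges[k];
        if (ise_bits(num_cem_values, r->trits, r->quints, r->bits)
                <= remaining_bits)
            return r->max_value;
    }
    return -1;   /* nothing fits: the encoding is not usable */
}
```
| |
| As a worked case, a single-partition block with a 4x4 grid of 2-bit |
| weights has 128 - 17 - 32 = 79 remaining bits. Eight endpoint values |
| need 64 bits at the 0..255 range, so that range is selected. |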
| |
| C.2.23 Void-Extent Blocks |
| -------------------------- |
| |
| A void-extent block is a block encoded with a single color. It also |
| specifies some additional information about the extent of the single- |
| color area beyond this block, which can optionally be used by a |
| decoder to reduce or prevent redundant block fetches. |
| |
| The layout of a 2D Void-Extent block is as follows: |
| |
| 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 |
| --------------------------------------------------------------- |
| | Block color A component | |
| --------------------------------------------------------------- |
| |
| 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 |
| ---------------------------------------------------------------- |
| | Block color B component | |
| ---------------------------------------------------------------- |
| |
| 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 |
| ---------------------------------------------------------------- |
| | Block color G component | |
| ---------------------------------------------------------------- |
| 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 |
| ---------------------------------------------------------------- |
| | Block color R component | |
| ---------------------------------------------------------------- |
| |
| 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 |
| ---------------------------------------------------------------- |
| | Void-extent maximum T coordinate | Min T | |
| ---------------------------------------------------------------- |
| |
| 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 |
| ---------------------------------------------------------------- |
| Void-extent minimum T coordinate | Void-extent max S | |
| ---------------------------------------------------------------- |
| |
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 |
| ---------------------------------------------------------------- |
| Void-extent max S coord | Void-extent minimum S coordinate | |
| ---------------------------------------------------------------- |
| 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 |
| ---------------------------------------------------------------- |
| Min S coord | 1 | 1 | D | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | |
| ---------------------------------------------------------------- |
| ------------------------------------------------- |
| Figure C.7 - 2D Void-Extent Block Layout Overview |
| |
| Bit 9 is the Dynamic Range flag, which indicates the format in which |
| colors are stored. A 0 value indicates LDR, in which case the color |
| components are stored as UNORM16 values. A 1 indicates HDR, in which |
| case the color components are stored as FP16 values. |
| |
| UNORM16 values are stored in the LDR case because the value may need |
| to be passed on to sRGB conversion. By storing the color value in the |
| format which comes out of the interpolator, before the conversion to |
| FP16, we avoid having to have separate versions for sRGB and linear |
| modes. |
| |
| If a void-extent block with HDR values is decoded in LDR mode, then |
| the result will be the error color, opaque magenta, for all texels |
| within the block. |
| |
| In the HDR case, if the color component values are infinity or NaN, this |
| will result in undefined behavior. As usual, this must not lead to GL |
| interruption or termination. |
| |
| Bits 10 and 11 are reserved and must be 1. |
| |
| The minimum and maximum coordinate values are treated as unsigned |
| integers and then normalized into the range 0..1 (by dividing by 2^13-1 |
| or 2^9-1, for 2D and 3D respectively). The maximum values for each |
| dimension must be greater than the corresponding minimum values, |
| unless they are all all-1s. |
| |
| If all the coordinates are all-1s, then the void extent is ignored, |
| and the block is simply a constant-color block. |
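| |
| The field layout in Figure C.7 can be unpacked as in the following C |
| sketch. The struct and function names are hypothetical, and the block |
| is assumed to have been loaded into two 64-bit integers already, with |
| lo holding bits 0..63 and hi holding bits 64..127. The function |
| returns 1 for a valid 2D void-extent block and 0 otherwise. |
| |
```c
#include <assert.h>
#include <stdint.h>

struct void_extent_2d {
    int hdr;                               /* Dynamic Range flag (bit 9) */
    int has_extent;                        /* 0 if all coords are all-1s */
    uint16_t min_s, max_s, min_t, max_t;   /* 13-bit coordinates */
    uint16_t r, g, b, a;                   /* UNORM16 or FP16 bit patterns */
};

/* Hypothetical helper: unpack a 2D void-extent block.
 * lo = block bits 0..63, hi = block bits 64..127 (assumed layout). */
int parse_void_extent_2d(uint64_t lo, uint64_t hi,
                         struct void_extent_2d *out)
{
    assert(out != 0);

    if ((lo & 0x1FF) != 0x1FC) return 0;       /* bits 8..0 signature */
    if (((lo >> 10) & 0x3) != 0x3) return 0;   /* reserved bits must be 1 */

    out->hdr   = (int)((lo >> 9) & 1);
    out->min_s = (uint16_t)((lo >> 12) & 0x1FFF);
    out->max_s = (uint16_t)((lo >> 25) & 0x1FFF);
    out->min_t = (uint16_t)((lo >> 38) & 0x1FFF);
    out->max_t = (uint16_t)((lo >> 51) & 0x1FFF);

    out->r = (uint16_t)(hi & 0xFFFF);
    out->g = (uint16_t)((hi >> 16) & 0xFFFF);
    out->b = (uint16_t)((hi >> 32) & 0xFFFF);
    out->a = (uint16_t)((hi >> 48) & 0xFFFF);

    /* All coordinates all-1s: plain constant-color block, no extent. */
    int all_ones = out->min_s == 0x1FFF && out->max_s == 0x1FFF &&
                   out->min_t == 0x1FFF && out->max_t == 0x1FFF;
    out->has_extent = !all_ones;

    /* Otherwise each maximum must exceed the matching minimum. */
    if (!all_ones &&
        (out->min_s >= out->max_s || out->min_t >= out->max_t))
        return 0;

    return 1;
}
```
| |
| When has_extent is set, a decoder that uses the extent would normalize |
| each coordinate by dividing it by 2^13-1 as described above. |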
| |
| The existence of single-color blocks with void extents must not produce |
| results different from those obtained if these single-color blocks are |
| defined without void-extents. Any situation in which the results would |
| differ is invalid. Results from invalid void extents are undefined. |
| |
| If a void-extent appears in a MIPmap level other than the most detailed |
| one, then the extent will apply to all of the more detailed levels too. |
| This allows decoders to avoid sampling more detailed MIPmaps. |
| |
| If the more detailed MIPmap level is not a constant color in this region, |
| then the block may be marked as constant color, but without a void extent, |
| as detailed above. |
| |
| If a void-extent extends to the edge of a texture, then filtered texture |
| colors may not be the same color as that specified in the block, due to |
| texture border colors, wrapping, or cube face wrapping. |
| |
| Care must be taken when updating or extracting partial image data that |
| void-extents in the image do not become invalid. |
| |
| C.2.24 Illegal Encodings |
| ------------------------- |
| |
| In ASTC, there is a variety of ways to encode an illegal block. Decoders |
| are required to recognize all illegal blocks and emit the standard error |
| color value upon encountering an illegal block. |
| |
| Here is a comprehensive list of situations that represent illegal block |
| encodings: |
| |
| * The block mode specified is one of the modes explicitly listed |
| as Reserved. |
| * A 2D void-extent block that has any of the reserved bits not |
| set to 1. |
| * A block mode has been specified that would require more than |
| 64 weights total. |
| * A block mode has been specified that would require more than |
| 96 bits for integer sequence encoding of the weight grid. |
| * A block mode has been specified that would require fewer than |
| 24 bits for integer sequence encoding of the weight grid. |
| * The size of the weight grid exceeds the size of the block footprint |
| in any dimension. |
| * Color endpoint modes have been specified such that the color |
| integer sequence encoding would require more than 18 integers. |
| * The number of bits available for color endpoint encoding after all |
| the other fields have been counted is less than ceil(13C/5) where C |
| is the number of color endpoint integers (this would restrict color |
| integers to a range smaller than 0..5, which is not supported). |
| * Dual weight mode is enabled for a block with 4 partitions. |
| * Void-Extent blocks where the low coordinate for some texture axis |
| is greater than or equal to the high coordinate. |
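| |
| Several of the numeric conditions in this list can be checked with a |
| few comparisons, as in the C sketch below for a 2D block. The function |
| name and parameter list are hypothetical. The sketch does not cover |
| reserved block modes or the void-extent conditions, and it assumes |
| that weight_bits and color_bits have been computed elsewhere. |
| |
```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the configuration of a 2D block
 * violates one of the numeric limits listed above, else 0.
 * weight_bits is the size of the encoded weight grid in bits and
 * color_bits is the number of bits left for color endpoint data. */
int is_illegal_block_config(int grid_w, int grid_h,
                            int block_w, int block_h,
                            int dual_plane, int num_partitions,
                            int weight_bits,
                            int num_color_ints, int color_bits)
{
    assert(grid_w > 0 && grid_h > 0 && block_w > 0 && block_h > 0);
    assert(num_partitions >= 1 && num_partitions <= 4);

    /* Dual-plane mode stores two weights per grid position. */
    int num_weights = grid_w * grid_h * (dual_plane ? 2 : 1);

    if (num_weights > 64) return 1;
    if (weight_bits > 96 || weight_bits < 24) return 1;
    if (grid_w > block_w || grid_h > block_h) return 1;
    if (num_color_ints > 18) return 1;

    /* ceil(13C/5): bits needed for C integers in the range 0..5. */
    if (color_bits < (13 * num_color_ints + 4) / 5) return 1;

    if (dual_plane && num_partitions == 4) return 1;
    return 0;
}
```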
| |
| Note also that, in LDR mode, a block which has both HDR and LDR endpoint |
| modes assigned to different partitions is not an error block. Only those |
| texels which belong to the HDR partition will result in the error color. |
| Texels belonging to a LDR partition will be decoded as normal. |
| |
| C.2.25 LDR PROFILE SUPPORT |
| --------------------------- |
| |
| Implementations of the LDR Profile must satisfy the following requirements: |
| |
| * All textures with valid encodings for LDR Profile must decode |
| identically using either a LDR Profile, HDR Profile, or Full Profile |
| decoder. |
| * All features included only in the HDR Profile or Full Profile must be |
| treated as reserved in the LDR Profile, and return the error color on |
| decoding. |
| * Any sequence of API calls valid for the LDR Profile must also be valid |
| for the HDR Profile or Full Profile and return identical results when |
| given a texture encoded for the LDR Profile. |
| |
| The feature subset for the LDR profile is: |
| |
| * 2D textures only, including 2D, 2D array, cube map face, |
| and cube map array texture targets. |
| * Only those block sizes listed in Table C.2.2 are supported. |
| * LDR operation mode only. |
| * Only LDR endpoint formats must be supported, namely formats |
| 0, 1, 4, 5, 6, 8, 9, 10, 12, 13. |
| * Decoding from a HDR endpoint results in the error color. |
| * Interpolation returns UNORM8 results when used in conjunction |
| with sRGB. |
| * LDR void extent blocks must be supported, but void extents |
| may not be checked." |
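| |
| An LDR Profile decoder could test the endpoint format against the |
| list above with a bit mask, as in this C sketch. The function name is |
| hypothetical, and the format is assumed to be a value in 0..15. |
| |
```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the color endpoint format is one
 * of the LDR formats 0, 1, 4, 5, 6, 8, 9, 10, 12, 13, else 0. */
int is_ldr_endpoint_format(int format)
{
    assert(format >= 0 && format <= 15);

    /* Bit n of the mask is set if format n is an LDR format. */
    const unsigned ldr_mask = 0x3773u;
    return (int)((ldr_mask >> format) & 1u);
}
```
| |
| A decoder limited to the LDR profile would return the error color for |
| texels whose endpoint format fails this test. |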
| |
| If only the LDR profile is supported, read this extension by striking |
| all descriptions of HDR modes and decoding algorithms. The extension |
| documents how to modify the document for some particularly tricky cases, |
| but the general rule is as described in this paragraph. |
| |
| Interactions with immutable-format texture images |
| |
| ASTC texture formats are supported by immutable-format textures only if |
| such textures are supported by the underlying implementation (e.g. |
| OpenGL 4.1 or later, OpenGL ES 3.0 or later, or earlier versions |
| supporting the GL_EXT_texture_storage extension). Otherwise, remove all |
| references to the Tex*Storage* commands from this specification. |
| |
| Interactions with texture cube map arrays |
| |
| ASTC textures are supported for the TEXTURE_CUBE_MAP_ARRAY target only |
| when cube map arrays are supported by the underlying implementation |
| (e.g. OpenGL 4.0 or later, or an OpenGL or OpenGL ES version supporting |
| an extension defining cube map arrays). Otherwise, remove all references |
| to texture cube map arrays from this specification. |
| |
| Interactions with OpenGL (all versions) |
| |
| ASTC is not supported for 1D textures and texture rectangles, and does |
| not support non-zero borders. |
| |
| Add the following error conditions to CompressedTexImage*D: |
| |
| "An INVALID_ENUM error is generated by CompressedTexImage1D if |
| <internalformat> is one of the ASTC formats. |
| |
| An INVALID_OPERATION error is generated by CompressedTexImage2D |
| and CompressedTexImage3D if <internalformat> is one of the ASTC |
| formats and <border> is non-zero." |
| |
| Add the following error conditions to CompressedTexSubImage*D: |
| |
| "An INVALID_ENUM error is generated by CompressedTex*SubImage1D |
| if the internal format of the texture is one of the ASTC formats. |
| |
| An INVALID_OPERATION error is generated by CompressedTex*SubImage2D |
| if the internal format of the texture is one of the ASTC formats |
| and <border> is non-zero." |
| |
| Add the following error conditions to TexStorage1D and TextureStorage1D: |
| |
| "An INVALID_ENUM error is generated by TexStorage1D and TextureStorage1D |
| if <format> is one of the ASTC formats." |
| |
| Add the following error conditions to TexStorage2D and TextureStorage2D |
| for versions of OpenGL that support texture rectangles: |
| |
| "An INVALID_OPERATION error is generated by TexStorage2D and |
| TextureStorage2D if <format> is one of the ASTC formats and <target> |
| is TEXTURE_RECTANGLE." |
| |
| Interactions with OpenGL 4.2 |
| |
| OpenGL 4.2 supports the feature that compressed textures can be |
| compressed online, by passing the compressed texture format enum as |
| the internal format when uploading a texture using TexImage1D, |
| TexImage2D or TexImage3D (see Section 3.9.3, Texture Image |
| Specification, subsection Encoding of Special Internal Formats). |
| |
| Due to the complexity of the ASTC compression algorithm, it is not |
| usually suitable for online use, and therefore ASTC support will be |
| limited to pre-compressed textures only. Where on-device compression |
| is required, a domain-specific limited compressor will typically |
| be used, and this is therefore not suitable for implementation in |
| the driver. |
| |
| In particular, the ASTC format specifiers will not be added to |
| Table 3.14, and thus will not be accepted by the TexImage*D |
| functions, and will not be returned by the (already deprecated) |
| COMPRESSED_TEXTURE_FORMATS query. |
| |
| Issues |
| |
| 1) Three-dimensional block ASTC formats (i.e. formats whose block depth |
| is greater than one) are not supported by these extensions. |
| |
| 2) The first release of the extension was not clear about the |
| restrictions of the LDR profile and did not document interactions |
| with cube map array textures. |
| |
| RESOLVED. This extension has been rewritten to be based on OpenGL ES |
| 3.1, to clearly document LDR restrictions, and to add cube map array |
| texture interactions. |
| |
| Revision History |
| |
| Revision 8, June 8, 2017 - Added missing interactions with OpenGL. |
| |
| Revision 7, July 14, 2016 - Clarified definition of 2D void-extent |
| blocks. |
| |
| Revision 6, March 8, 2016 - Clarified that sRGB transform is not |
| applied to Alpha channel. |
| |
| Revision 5, September 15, 2015 - fix typo in third paragraph of section |
| 8.7. |
| |
| Revision 4, June 24, 2015 - minor cleanup from feedback. Move Issues and |
| Interactions sections to the end of the document. Merge some language |
| from OpenGL ES specification edits and rename some tables to figures, |
| due to how they're generated in the core specifications. Include a |
| description of the "Cube Map Array Texture" column added to table 3.19 |
| and expand the description of how to read this document when supporting |
| only the LDR profile (Bug 13921). |
| |
| Revision 3, May 28, 2015 - rebase extension on OpenGL ES 3.1. Clarify |
| texture formats and targets supported by LDR and HDR profiles. Add cube |
| map array targets and an Interactions section defining when they are |
| supported. Add an Interactions section for immutable-format textures |
| (Bug 13921). |
| |
| Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to |
| commands accepting ASTC format tokens in the New Tokens section (Bug |
| 10183). |