Name | |

KHR_texture_compression_astc_hdr | |

Name Strings | |

GL_KHR_texture_compression_astc_hdr | |

GL_KHR_texture_compression_astc_ldr | |

Contact | |

Sean Ellis (sean.ellis 'at' arm.com) | |

Jon Leech (oddhack 'at' sonic.net) | |

Contributors | |

Sean Ellis, ARM | |

Jorn Nystad, ARM | |

Tom Olson, ARM | |

Andy Pomianowski, AMD | |

Cass Everitt, NVIDIA | |

Walter Donovan, NVIDIA | |

Robert Simpson, Qualcomm | |

Maurice Ribble, Qualcomm | |

Larry Seiler, Intel | |

Daniel Koch, NVIDIA | |

Anthony Wood, Imagination Technologies | |

Jon Leech | |

Andrew Garrard, Samsung | |

IP Status | |

No known issues. | |

Notice | |

Copyright (c) 2012-2016 The Khronos Group Inc. Copyright terms at | |

http://www.khronos.org/registry/speccopyright.html | |

Specification Update Policy | |

Khronos-approved extension specifications are updated in response to | |

issues and bugs prioritized by the Khronos OpenGL and OpenGL ES Working Groups. For | |

extensions which have been promoted to a core Specification, fixes will | |

first appear in the latest version of that core Specification, and will | |

eventually be backported to the extension document. This policy is | |

described in more detail at | |

https://www.khronos.org/registry/OpenGL/docs/update_policy.php | |

Status | |

Complete. | |

Approved by the ARB on 2012/06/18. | |

Approved by the OpenGL ES WG on 2012/06/15. | |

Ratified by the Khronos Board of Promoters on 2012/07/27 (LDR profile). | |

Ratified by the Khronos Board of Promoters on 2013/09/27 (HDR profile). | |

Version | |

Version 8, June 8, 2017 | |

Number | |

ARB Extension #118 | |

OpenGL ES Extension #117 | |

Dependencies | |

Written based on the wording of the OpenGL ES 3.1 (April 29, 2015) | |

Specification | |

May be implemented against any version of OpenGL or OpenGL ES supporting | |

compressed textures. | |

Some of the functionality of these extensions is not supported if the | |

underlying implementation does not support cube map array textures. | |

Overview | |

Adaptive Scalable Texture Compression (ASTC) is a new texture | |

compression technology that offers unprecedented flexibility, while

producing results better than or comparable to existing texture

compression schemes at all bit rates. It includes support for 2D and

slice-based 3D textures, with low and high dynamic range, at bitrates | |

from below 1 bit/pixel up to 8 bits/pixel in fine steps. | |

The goal of these extensions is to support the full 2D profile of the | |

ASTC texture compression specification, and allow construction of 3D | |

textures from multiple compressed 2D slices. | |

ASTC-compressed textures are handled in OpenGL ES and OpenGL by adding | |

new supported formats to the existing commands for defining and updating | |

compressed textures, and defining the interaction of the ASTC formats | |

with each texture target. | |

New Procedures and Functions | |

None | |

New Tokens | |

Accepted by the <format> parameter of CompressedTexSubImage2D and | |

CompressedTexSubImage3D, and by the <internalformat> parameter of | |

CompressedTexImage2D, CompressedTexImage3D, TexStorage2D, | |

TextureStorage2D, TexStorage3D, and TextureStorage3D: | |

COMPRESSED_RGBA_ASTC_4x4_KHR 0x93B0 | |

COMPRESSED_RGBA_ASTC_5x4_KHR 0x93B1 | |

COMPRESSED_RGBA_ASTC_5x5_KHR 0x93B2 | |

COMPRESSED_RGBA_ASTC_6x5_KHR 0x93B3 | |

COMPRESSED_RGBA_ASTC_6x6_KHR 0x93B4 | |

COMPRESSED_RGBA_ASTC_8x5_KHR 0x93B5 | |

COMPRESSED_RGBA_ASTC_8x6_KHR 0x93B6 | |

COMPRESSED_RGBA_ASTC_8x8_KHR 0x93B7 | |

COMPRESSED_RGBA_ASTC_10x5_KHR 0x93B8 | |

COMPRESSED_RGBA_ASTC_10x6_KHR 0x93B9 | |

COMPRESSED_RGBA_ASTC_10x8_KHR 0x93BA | |

COMPRESSED_RGBA_ASTC_10x10_KHR 0x93BB | |

COMPRESSED_RGBA_ASTC_12x10_KHR 0x93BC | |

COMPRESSED_RGBA_ASTC_12x12_KHR 0x93BD | |

COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR 0x93D0 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR 0x93D1 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR 0x93D2 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR 0x93D3 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR 0x93D4 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR 0x93D5 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR 0x93D6 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR 0x93D7 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR 0x93D8 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR 0x93D9 | |

COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR 0x93DA | |

COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR 0x93DB | |

COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR 0x93DC | |

COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR 0x93DD | |

If extension "EXT_texture_storage" is supported, these tokens are also | |

accepted by TexStorage2DEXT, TextureStorage2DEXT, TexStorage3DEXT and | |

TextureStorage3DEXT. | |
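
The token values listed above follow a regular pattern: the linear RGBA formats occupy 0x93B0 through 0x93BD in the order shown, and each sRGB format is 0x20 above its linear counterpart. A minimal Python sketch of that mapping follows; the names FOOTPRINTS and astc_token are illustrative only and are not part of any OpenGL binding.

```python
# Block footprints in the order the tokens are listed above.
FOOTPRINTS = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
              (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]


def astc_token(width, height, srgb=False):
    """Return (name, value) of the ASTC format token for a block footprint.

    Hypothetical helper: raises ValueError for a footprint not in the list.
    """
    index = FOOTPRINTS.index((width, height))
    base = 0x93D0 if srgb else 0x93B0
    prefix = "COMPRESSED_SRGB8_ALPHA8_ASTC" if srgb else "COMPRESSED_RGBA_ASTC"
    return f"{prefix}_{width}x{height}_KHR", base + index
```

For example, astc_token(10, 8) yields the name COMPRESSED_RGBA_ASTC_10x8_KHR with value 0x93BA.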

Additions to Chapter 8 of the OpenGL ES 3.1 Specification (Textures and Samplers) | |

Add to Section 8.7 Compressed Texture Images: | |

Modify table 8.19 (Compressed internal formats) to add all the ASTC | |

format tokens in the New Tokens section. The "Base Internal Format" | |

column is RGBA for all ASTC formats. | |

Add a new column "Block Width x Height", which is 4x4 for all non-ASTC | |

formats in the table, and matches the size in the token name for ASTC | |

formats (e.g. COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR has a block size of | |

10 x 8). | |

Add a second new column "3D Tex." which is empty for all non-ASTC | |

formats. If only the LDR profile is supported by the implementation, | |

this column is also empty for all ASTC formats. If both the LDR and HDR | |

profiles are supported, this column is checked for all ASTC formats. | |

Add a third new column "Cube Map Array Tex." which is empty for all | |

non-ASTC formats, and checked for all ASTC formats. | |

Append to the table caption: | |

"The "Block Size" column specifies the compressed block size of the | |

format. Modifying compressed images along aligned block boundaries is | |

possible, as described in this section. The "3D Tex." and "Cube Map | |

Array Tex." columns determine if 3D images composed of compressed 2D | |

slices, and cube map array textures respectively can be specified using | |

CompressedTexImage3D." | |

Append to the paragraph at the bottom of p. 168: | |

"If <internalformat> is one of the specific ... supports only | |

two-dimensional images. However, if the "3D Tex." column of table 8.19 | |

is checked, CompressedTexImage3D will accept a three-dimensional image | |

specified as an array of compressed data consisting of multiple rows of | |

compressed blocks laid out as described in section 8.5." | |

Modify the second and third errors in the Errors section for | |

CompressedTexImage[23]D on p. 169, and add a new error:

"An INVALID_VALUE error is generated by | |

* CompressedTexImage2D if <target> is | |

one of the cube map face targets from table 8.21, and | |

* CompressedTexImage3D if <target> is TEXTURE_CUBE_MAP_ARRAY, | |

and <width> and <height> are not equal. | |

An INVALID_OPERATION error is generated by CompressedTexImage3D if

<internalformat> is one of the formats in table 8.19 and <target> is

not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, or TEXTURE_3D. | |

An INVALID_OPERATION error is generated by CompressedTexImage3D if | |

<target> is TEXTURE_CUBE_MAP_ARRAY and the "Cube Map Array" | |

column of table 8.19 is *not* checked, or if <target> is | |

TEXTURE_3D and the "3D Tex." column of table 8.19 is *not* checked" | |

Modify the fifth and sixth paragraphs on p. 170: | |

"Since these specific compressed formats are easily edited along texel | |

block boundaries, the limitations on subimage location and size are | |

relaxed for CompressedTexSubImage2D and CompressedTexSubImage3D. | |

The block width and height vary for different formats, as described in

table 8.19. The contents of any block of texels of a compressed texture | |

image in these specific compressed formats that does not intersect the | |

area being modified are preserved during CompressedTexSubImage* calls." | |

Modify the second error in the Errors section for | |

CompressedTexSubImage[23]D on p. 170, and add a new error: | |

"An INVALID_OPERATION error is generated by CompressedTexSubImage3D if | |

<format> is one of the formats in table 8.19 and <target> is not | |

TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, or TEXTURE_3D. | |

An INVALID_OPERATION error is generated by CompressedTexSubImage3D if | |

<target> is TEXTURE_CUBE_MAP_ARRAY and the "Cube Map Array" column of | |

table 8.19 is *not* checked, or if <target> is TEXTURE_3D and the "3D | |

Tex." column of table 8.19 is *not* checked" | |

Modify the final error in the same section, on p. 171: | |

"An INVALID_OPERATION error is generated if format is one of the formats | |

in table 8.19 and any of the following conditions occurs. The block | |

width and height refer to the values in the corresponding column of the | |

table. | |

* <width> is not a multiple of the format's block width, and <width> + | |

<xoffset> is not equal to the value of TEXTURE_WIDTH. | |

* <height> is not a multiple of the format's block height, and <height>

+ <yoffset> is not equal to the value of TEXTURE_HEIGHT. | |

* <xoffset> or <yoffset> is not a multiple of the block width or | |

height, respectively." | |
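
The three conditions above can be sketched as a small validity check. This is an illustration of the rule as worded here, not driver code; the function name and its parameters are assumptions, with tex_width and tex_height standing in for the values of TEXTURE_WIDTH and TEXTURE_HEIGHT.

```python
def check_subimage_region(xoffset, yoffset, width, height,
                          tex_width, tex_height, block_w, block_h):
    """Return None if the region is acceptable, else "INVALID_OPERATION".

    Hypothetical helper mirroring the three bullet conditions above.
    """
    # Width may be a partial block only when the region reaches the right edge.
    bad_width = width % block_w != 0 and width + xoffset != tex_width
    # Height may be a partial block only when the region reaches the last row.
    bad_height = height % block_h != 0 and height + yoffset != tex_height
    # Offsets must always sit on a block boundary.
    bad_offset = xoffset % block_w != 0 or yoffset % block_h != 0
    if bad_width or bad_height or bad_offset:
        return "INVALID_OPERATION"
    return None
```

With 8x6 blocks on a 20x15 texture, a 12x9 update at offset (8, 6) is accepted because it ends exactly at the texture edge, while an 8x6 update at offset (4, 0) is rejected because the x offset is not block-aligned.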

Modify table 8.24 (sRGB texture internal formats) to add all of the | |

COMPRESSED_SRGB8_ALPHA8_ASTC_*_KHR formats defined above. | |

Additions to Appendix C of the OpenGL ES 3.1 Specification (Compressed | |

Texture Image Formats) | |

Add a new sub-section on ASTC image formats, as follows: | |

"C.2 ASTC Compressed Texture Image Formats | |

========================================= | |

C.2.1 What is ASTC? | |

--------------------- | |

ASTC stands for Adaptive Scalable Texture Compression. | |

The ASTC formats form a family of related compressed texture image | |

formats. They are all derived from a common set of definitions. | |

ASTC textures may be encoded using either high or low dynamic range, | |

corresponding to the "HDR profile" and "LDR profile". Support for the | |

HDR profile is indicated by the "GL_KHR_texture_compression_astc_hdr" | |

extension string, and support for the LDR profile is indicated by the | |

"GL_KHR_texture_compression_astc_ldr" extension string. | |

The LDR profile supports two-dimensional images for texture targets | |

TEXTURE_2D, TEXTURE_2D_ARRAY, the six texture cube map face targets, and

TEXTURE_CUBE_MAP_ARRAY. These images may optionally be specified using | |

the sRGB color space for the RGB channels. | |

The HDR profile is a superset of the LDR profile, and also supports | |

texture target TEXTURE_3D for images made up of multiple two-dimensional | |

slices of compressed data. HDR images may be a mix of low and high | |

dynamic range data. If the HDR profile is supported, the LDR profile and | |

its extension string must also be supported. | |

ASTC textures may be encoded as 1, 2, 3 or 4 components, but they are | |

all decoded into RGBA. | |

Different ASTC formats have different block sizes, specified as part of | |

the name of the format token passed to CompressedTexImage2D and its related

functions, and in table 8.19. | |

Additional ASTC formats (the "Full profile") exist which support 3D data | |

specified as compressed 3D blocks. However, such formats are not defined | |

by either the LDR or HDR profiles, and are not described in this | |

specification. | |

C.2.2 Design Goals | |

-------------------- | |

The design goals for the format are as follows: | |

* Random access. This is a must for any texture compression format. | |

* Bit exact decode. This is a must for conformance testing and | |

reproducibility. | |

* Suitable for mobile use. The format should be suitable for both | |

desktop and mobile GPU environments. It should be low bandwidth | |

and low in area. | |

* Flexible choice of bit rate. Current formats only offer a few bit | |

rates, leaving content developers with only coarse control over | |

the size/quality tradeoff. | |

* Scalable and long-lived. The format should support existing R, RG, | |

RGB and RGBA image types, and also have high "headroom", allowing | |

continuing use for several years and the ability to innovate in | |

encoders. Part of this is the choice to include HDR and 3D. | |

* Feature orthogonality. The choices for the various features of the | |

format are all orthogonal to each other. This has three effects: | |

first, it allows a large, flexible configuration space; second, | |

it makes that space easier to understand; and third, it makes | |

verification easier. | |

* Best in class at given bit rate. It should beat or match the current | |

best in class for peak signal-to-noise ratio (PSNR) at all bit rates. | |

* Fast decode. Texel throughput for a cached texture should be one | |

texel decode per clock cycle per decoder. Parallel decoding of several | |

texels from the same block should be possible at incremental cost. | |

* Low bandwidth. The encoding scheme should ensure that memory access | |

is kept to a minimum, cache reuse is high and memory bandwidth for | |

the format is low. | |

* Low area. It must occupy comparable die size to competing formats. | |

C.2.3 Basic Concepts | |

---------------------- | |

ASTC is a block-based lossy compression format. The compressed image | |

is divided into a number of blocks of uniform size, which makes it | |

possible to quickly determine which block a given texel resides in. | |

Each block has a fixed memory footprint of 128 bits, but these bits | |

can represent varying numbers of texels (the block "footprint"). | |

Block footprint sizes are not confined to powers-of-two, and are | |

also not confined to be square. They may be 2D, in which case the | |

block dimensions range from 4 to 12 texels, or 3D, in which case | |

the block dimensions range from 3 to 6 texels. | |

Decoding one texel requires only the data from a single block. This | |

simplifies cache design, reduces bandwidth and improves encoder throughput. | |

C.2.4 Block Encoding | |

---------------------- | |

To understand how the blocks are stored and decoded, it is useful to start | |

with a simple example, and then introduce additional features. | |

The simplest block encoding starts by defining two color "endpoints". The | |

endpoints define two colors, and a number of additional colors are generated | |

by interpolating between them. We can define these colors using 1, 2, 3, | |

or 4 components (usually corresponding to R, RG, RGB and RGBA textures), | |

and using low or high dynamic range. | |

We then store a color interpolant weight for each texel in the image, which | |

specifies how to calculate the color to use. From this, a weighted average | |

of the two endpoint colors is used to generate the intermediate color, | |

which is the returned color for this texel. | |
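
As a conceptual illustration of the weighted average described above, the following Python sketch blends two endpoint colors with a weight in the range 0.0 to 1.0. It is a simplification under stated assumptions: real ASTC decoding uses quantized integer weights and bit-exact integer arithmetic, which this excerpt does not define and this sketch does not reproduce. The function name is hypothetical.

```python
def interpolate_endpoints(endpoint0, endpoint1, weight):
    """Blend two RGBA endpoint colors.

    weight 0.0 returns endpoint0, weight 1.0 returns endpoint1.
    Floating-point illustration only, not the bit-exact ASTC procedure.
    """
    return tuple((1.0 - weight) * c0 + weight * c1
                 for c0, c1 in zip(endpoint0, endpoint1))
```

A texel with weight 0.25 between opaque black and opaque white decodes, in this simplified model, to a dark gray with full alpha.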

There are several different ways of specifying the endpoint colors, and the | |

weights, but once they have been defined, calculation of the texel colors | |

proceeds identically for all of them. Each block is free to choose whichever | |

encoding scheme best represents its color endpoints, within the constraint | |

that all the data fits within the 128 bit block. | |

For blocks which have a large number of texels (e.g. a 12x12 block), there is | |

not enough space to explicitly store a weight for every texel. In this case, | |

a sparser grid with fewer weights is stored, and interpolation is used to | |

determine the effective weight to be used for each texel position. This allows | |

very low bit rates to be used with acceptable quality. This can also be used | |

to more efficiently encode blocks with low detail, or with strong vertical | |

or horizontal features. | |

For blocks which have a mixture of disparate colors, a single line in the | |

color space is not a good fit to the colors of the pixels in the original | |

image. It is therefore possible to partition the texels into multiple sets, | |

the pixels within each set having similar colors. For each of these | |

"partitions", we specify separate endpoint pairs, and choose which pair of | |

endpoints to use for a particular texel by looking up the partition index | |

from a partitioning pattern table. In ASTC, this partition table is actually | |

implemented as a function. | |

The endpoint encoding for each partition is independent. | |

For blocks which have uncorrelated channels - for example an image with a | |

transparency mask, or an image used as a normal map - it may be necessary | |

to specify two weights for each texel. Interpolation between the components | |

of the endpoint colors can then proceed independently for each "plane" of | |

the image. The assignment of channels to planes is selectable. | |

Since each of the above options is independent, it is possible to specify any | |

combination of channels, endpoint color encoding, weight encoding, | |

interpolation, multiple partitions and single or dual planes. | |

Since these values are specified per block, it is important that they are | |

represented with the minimum possible number of bits. As a result, these | |

values are packed together in ways which can be difficult to read, but | |

which are nevertheless highly amenable to hardware decode. | |

All of the values used as weights and color endpoint values can be specified | |

with a variable number of bits. The encoding scheme used allows a fine- | |

grained tradeoff between weight bits and color endpoint bits using "integer | |

sequence encoding". This can pack adjacent values together, allowing us to | |

use fractional numbers of bits per value. | |
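
To make "fractional numbers of bits per value" concrete, the sketch below counts the storage needed for a sequence of values built from trits (base-3 digits), quints (base-5 digits) and plain bits. The packing figures used (5 trits share 8 bits, 3 quints share 7 bits) come from the integer sequence encoding section of the full ASTC specification, which is outside this excerpt, so treat them as an assumption here. The function name is hypothetical.

```python
def ise_bits(count, trits=0, quints=0, bits=0):
    """Total bits to store `count` values, each made of the given number of
    trits (0 or 1), quints (0 or 1) and plain bits.

    Assumes the ASTC packing of 5 trits per 8 bits and 3 quints per 7 bits.
    """
    total = count * bits
    if trits:
        total += (8 * count + 4) // 5   # ceil(8 * count / 5)
    if quints:
        total += (7 * count + 2) // 3   # ceil(7 * count / 3)
    return total
```

Five values in the range 0..2 take 8 bits in total, or 1.6 bits per value, compared with 2 bits per value in a plain binary encoding.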

Finally, a block may be just a single color. This is a so-called "void | |

extent block" and has a special coding which also allows it to identify | |

nearby regions of single color. This may be used to short-circuit fetching of | |

what would be identical blocks, and further reduce memory bandwidth. | |

C.2.5 LDR and HDR Modes | |

------------------------- | |

The decoding process for LDR content can be simplified if it is known in | |

advance that sRGB output is required. This selection is therefore included | |

as part of the global configuration. | |

The two modes differ in various ways. | |

-----------------------------------------------------------------------------
Operation            LDR Mode                       HDR Mode
-----------------------------------------------------------------------------
Returned value       Vector of FP16 values,         Vector of FP16 values
                     or Vector of UNORM8 values.
sRGB compatible      Yes                            No
LDR endpoint         16 bits, or                    16 bits
decoding precision   8 bits for sRGB
HDR endpoint mode    Error color                    As decoded
results
Error results        Error color                    Vector of NaNs (0xFFFF)
-----------------------------------------------------------------------------

Table C.2.1 - Differences Between LDR and HDR Modes

The error color is opaque fully-saturated magenta | |

(R,G,B,A = 0xFF, 0x00, 0xFF, 0xFF). This has been chosen as it is much more | |

noticeable than black or white, and occurs far less often in valid images. | |

For linear RGB decode, the error color may be either opaque fully-saturated | |

magenta (R,G,B,A = 1.0, 0.0, 1.0, 1.0) or a vector of four NaNs | |

(R,G,B,A = NaN, NaN, NaN, NaN). In the latter case, the recommended NaN | |

value returned is 0xFFFF. | |

The error color is returned as an informative response to invalid | |

conditions, including invalid block encodings or use of reserved endpoint | |

modes. | |

Future, forward-compatible extensions to KHR_texture_compression_astc | |

may define valid interpretations of these conditions, which will decode to | |

some other color. Therefore, encoders and applications must not rely on | |

invalid encodings as a way of generating the error color. | |

C.2.6 Configuration Summary | |

----------------------------- | |

The global configuration data for the format is as follows: | |

* Block dimension (always 2D for both LDR and HDR profiles) | |

* Block footprint size | |

* sRGB output enabled or not | |

The data specified per block is as follows: | |

* Texel weight grid size | |

* Texel weight range | |

* Texel weight values | |

* Number of partitions | |

* Partition pattern index | |

* Color endpoint modes (includes LDR or HDR selection) | |

* Color endpoint data | |

* Number of planes | |

* Plane-to-channel assignment | |

C.2.7 Decode Procedure | |

------------------------ | |

To decode one texel: | |

Find block containing texel
Read block mode
If void-extent block, store void extent and immediately return single
    color (optimization)

For each plane in image
  If block mode requires infill
    Find and decode stored weights adjacent to texel, unquantize and
        interpolate
  Else
    Find and decode weight for texel, and unquantize

Read number of partitions
If number of partitions > 1
  Read partition table pattern index
  Look up partition number from pattern

Read color endpoint mode and endpoint data for selected partition
Unquantize color endpoints
Interpolate color endpoints using weight (or weights in dual-plane mode)
Return interpolated color

C.2.8 Block Determination and Bit Rates

-----------------------------------------

The block footprint is a global setting for any given texture, and is | |

therefore not encoded in the individual blocks. | |

For 2D textures, the block footprint's width and height are selectable | |

from a number of predefined sizes, namely 4, 5, 6, 8, 10 and 12 pixels. | |

For square and nearly-square blocks, this gives the following bit rates: | |

-------------------------------------
   Footprint
Width   Height   Bit Rate   Increment
-------------------------------------
  4       4        8.00       125%
  5       4        6.40       125%
  5       5        5.12       120%
  6       5        4.27       120%
  6       6        3.56       114%
  8       5        3.20       120%
  8       6        2.67       105%
 10       5        2.56       120%
 10       6        2.13       107%
  8       8        2.00       125%
 10       8        1.60       125%
 10      10        1.28       120%
 12      10        1.07       120%
 12      12        0.89
-------------------------------------

Table C.2.2 - 2D Footprint and Bit Rates

The block footprint is shown as <width>x<height> in the format name. For | |

example, the format COMPRESSED_RGBA_ASTC_8x6_KHR specifies an image with | |

a block width of 8 texels, and a block height of 6 texels. | |

The "Increment" column indicates the ratio of bit rate against the next | |

lower available rate. A consistent value in this column indicates an even | |

spread of bit rates. | |
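
The bit rates in Table C.2.2 follow directly from the fixed 128-bit block size divided by the number of texels in the footprint. A minimal Python sketch, with illustrative names, that reproduces the "Bit Rate" column and the row ordering:

```python
# The 14 footprints defined by the format tokens.
FOOTPRINTS = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
              (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]


def bit_rate(block_w, block_h):
    """Bits per texel: every block occupies 128 bits."""
    return 128 / (block_w * block_h)


def footprints_by_bit_rate():
    """Footprints sorted from highest to lowest bit rate, as in Table C.2.2."""
    return sorted(FOOTPRINTS, key=lambda wh: bit_rate(*wh), reverse=True)
```

For instance, an 8x6 footprint gives 128 / 48, which rounds to 2.67 bits per texel.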

The HDR profile supports only those block footprints listed in Table | |

C.2.2. Other block sizes are not supported. | |

For images which are not an integer multiple of the block size, additional | |

texels are added to the edges with maximum X and Y. These texels may be | |

any color, as they will not be accessed. | |

Although these are not all powers of two, it is possible to calculate block | |

addresses and pixel addresses within the block, for legal image sizes, | |

without undue complexity. | |

Given a 2D image which is W x H pixels in size, with block size | |

w x h, the size of the image in blocks is: | |

Bw = ceiling(W/w) | |

Bh = ceiling(H/h) | |

For a 3D image, each 2D slice is a single texel thick, so that for an | |

image which is W x H x D pixels in size, with block size w x h, the size | |

of the image in blocks is: | |

Bw = ceiling(W/w) | |

Bh = ceiling(H/h) | |

Bd = D | |
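
The block-count formulas above, combined with the fixed 128-bit (16-byte) block size, give the size of the compressed data for one mipmap level. A short Python sketch with hypothetical function names:

```python
def blocks_for_image(width, height, depth, block_w, block_h):
    """Return (Bw, Bh, Bd) for a W x H x D image with w x h blocks."""
    bw = -(-width // block_w)    # ceiling(W / w) using integer arithmetic
    bh = -(-height // block_h)   # ceiling(H / h)
    bd = depth                   # each 2D slice is one texel thick
    return bw, bh, bd


def compressed_size_bytes(width, height, depth, block_w, block_h):
    """Total bytes of block data: 16 bytes per 128-bit block."""
    bw, bh, bd = blocks_for_image(width, height, depth, block_w, block_h)
    return bw * bh * bd * 16
```

A 100 x 50 image with 8x6 blocks needs 13 x 9 blocks, or 1872 bytes.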

C.2.9 Block Layout | |

-------------------- | |

Each block in the image is stored as a single 128-bit block in memory. These | |

blocks are laid out in raster order, starting with the block at (0,0,0), then | |

ordered sequentially by X, Y and finally Z (if present). They are aligned to | |

128-bit boundaries in memory. | |

The bits in the block are labeled in little-endian order - the byte at the | |

lowest address contains bits 0..7. Bit 0 is the least significant bit in the | |

byte. | |
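
The raster ordering and little-endian bit labeling described above can be sketched as follows. The helper names are hypothetical; block coordinates are counted in blocks, not texels.

```python
BLOCK_BYTES = 16  # one 128-bit block


def block_offset(bx, by, bz, blocks_w, blocks_h):
    """Byte offset of block (bx, by, bz): ordered by X, then Y, then Z."""
    return ((bz * blocks_h + by) * blocks_w + bx) * BLOCK_BYTES


def block_bits(block, low, count):
    """Extract `count` bits starting at bit `low` from a 16-byte block.

    Bit 0 is the least significant bit of the byte at the lowest address.
    """
    value = int.from_bytes(block, "little")
    return (value >> low) & ((1 << count) - 1)
```

In a 13 x 9 block image, the first block of the second row starts at byte offset 208.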

Each block has the same basic layout, as shown in figure C.1. | |

 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112
 ---------------------------------------------------------------
| Texel Weight Data (variable width)          Fill direction ->
 ---------------------------------------------------------------
 111 110 109 108 107 106 105 104 103 102 101 100  99  98  97  96
 ---------------------------------------------------------------
                       Texel Weight Data
 ---------------------------------------------------------------
  95  94  93  92  91  90  89  88  87  86  85  84  83  82  81  80
 ---------------------------------------------------------------
                       Texel Weight Data
 ---------------------------------------------------------------
  79  78  77  76  75  74  73  72  71  70  69  68  67  66  65  64
 ---------------------------------------------------------------
                       Texel Weight Data
 ---------------------------------------------------------------
  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48
 ---------------------------------------------------------------
 :                     More config data                        :
 ---------------------------------------------------------------
  47  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32
 ---------------------------------------------------------------
 <-Fill direction                         Color Endpoint Data
 ---------------------------------------------------------------
  31  30  29  28  27  26  25  24  23  22  21  20  19  18  17  16
 ---------------------------------------------------------------
 :                     Extra configuration data
 ---------------------------------------------------------------
  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 ---------------------------------------------------------------
    Extra   |  Part |                Block mode
 ---------------------------------------------------------------

Figure C.1 - Block Layout Overview

Dotted partition lines indicate that the split position is not fixed. | |

The "Block mode" field specifies how the Texel Weight Data is encoded. | |

The "Part" field specifies the number of partitions, minus one. If dual | |

plane mode is enabled, the number of partitions must be 3 or fewer. | |

If 4 partitions are specified, the error value is returned for all | |

texels in the block. | |

The size and layout of the extra configuration data depends on the | |

number of partitions, and the number of planes in the image, as shown in | |

figures C.2 and C.3 (only the bottom 32 bits are shown): | |

  31  30  29  28  27  26  25  24  23  22  21  20  19  18  17  16
 ---------------------------------------------------------------
 <- Color endpoint data                                     |CEM
 ---------------------------------------------------------------
  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 ---------------------------------------------------------------
     CEM    | 0   0 |                Block Mode
 ---------------------------------------------------------------

Figure C.2 - Single-partition Block Layout

CEM is the color endpoint mode field, which determines how the Color | |

Endpoint Data is encoded. | |

If dual-plane mode is active, the color component selector bits appear | |

directly below the weight bits. | |

  31  30  29  28  27  26  25  24  23  22  21  20  19  18  17  16
 ---------------------------------------------------------------
            |          CEM          |      Partition Index
 ---------------------------------------------------------------
  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 ---------------------------------------------------------------
 Partition Index    |                Block Mode
 ---------------------------------------------------------------

Figure C.3 - Multi-partition Block Layout

The Partition Index field specifies which partition layout to use. CEM is | |

the first 6 bits of color endpoint mode information for the various | |

partitions. For modes which require more than 6 bits of CEM data, the | |

additional bits appear at a variable position directly beneath the texel | |

weight data. | |

If dual-plane mode is active, the color component selector bits then appear | |

directly below the additional CEM bits. | |

The final special case is that if bits [8:0] of the block are "111111100", | |

then the block is a void-extent block, which has a separate encoding | |

described in section C.2.22. | |
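
The header fields described in this section can be pulled out of a block as sketched below. The exact bit positions used for the "Part" field (bits [12:11]), the single-partition CEM (bits [16:13]) and the partition index (bits [22:13]) are read off figures C.1 to C.3 together with the 11-bit block mode, so treat them as an interpretation of those figures rather than quoted text. The function and key names are hypothetical.

```python
def parse_block_header(block):
    """Split the low bits of a 16-byte ASTC block into header fields."""
    value = int.from_bytes(block, "little")
    # Bits [8:0] equal to 111111100 mark a void-extent block.
    if (value & 0x1FF) == 0x1FC:
        return {"void_extent": True}
    header = {
        "void_extent": False,
        "block_mode": value & 0x7FF,               # bits [10:0]
        "partitions": ((value >> 11) & 0x3) + 1,   # "Part" stores count - 1
    }
    if header["partitions"] == 1:
        header["cem"] = (value >> 13) & 0xF                # bits [16:13]
    else:
        header["partition_index"] = (value >> 13) & 0x3FF  # bits [22:13]
        header["cem_low6"] = (value >> 23) & 0x3F          # bits [28:23]
    return header
```

The dual-plane restriction (at most 3 partitions) and the decoding of the block mode itself are left to the caller in this sketch.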

C.2.10 Block Mode | |

------------------ | |

The Block Mode field specifies the width, height and depth of the grid of | |

weights, what range of values they use, and whether dual weight planes are | |

present. Since some of these are not represented using powers of two (there

are 12 possible weight widths, for example), and not all combinations are | |

allowed, this is not a simple bit packing. However, it can be unpacked | |

quickly in hardware. | |

The weight ranges are encoded using a 3 bit value R, which is interpreted | |

together with a precision bit H, as follows: | |

       Low Precision Range (H=0)           High Precision Range (H=1)
R      Weight Range  Trits  Quints  Bits   Weight Range  Trits  Quints  Bits
-------------------------------------------------------------------------
000    Invalid                             Invalid
001    Invalid                             Invalid
010    0..1                          1     0..9                  1       1
011    0..2           1                    0..11          1              2
100    0..3                          2     0..15                         4
101    0..4                  1             0..19                 1       2
110    0..5           1              1     0..23          1              3
111    0..7                          3     0..31                         5
-------------------------------------------------------------------------

Table C.2.7 - Weight Range Encodings

Each weight value is encoded using the specified number of Trits, Quints | |

and Bits. The details of this encoding can be found in Section C.2.12 -

Integer Sequence Encoding. | |

For 2D blocks, the Block Mode field is laid out as follows: | |

-------------------------------------------------------------------------
10  9   8   7   6   5   4   3   2   1   0   Width  Height  Notes
-------------------------------------------------------------------------
D   H     B       A     R0  0   0   R2  R1  B+4    A+2
D   H     B       A     R0  0   1   R2  R1  B+8    A+2
D   H     B       A     R0  1   0   R2  R1  A+2    B+8
D   H   0   B     A     R0  1   1   R2  R1  A+2    B+6
D   H   1   B     A     R0  1   1   R2  R1  B+2    A+2
D   H   0   0     A     R0  R2  R1  0   0   12     A+2
D   H   0   1     A     R0  R2  R1  0   0   A+2    12
D   H   1   1   0   0   R0  R2  R1  0   0   6      10
D   H   1   1   0   1   R0  R2  R1  0   0   10     6
  B     1   0     A     R0  R2  R1  0   0   A+6    B+6     D=0, H=0
x   x   1   1   1   1   1   1   1   0   0   -      -       Void-extent
x   x   1   1   1   x   x   x   x   0   0   -      -       Reserved*
x   x   x   x   x   x   x   0   0   0   0   -      -       Reserved
-------------------------------------------------------------------------

Table C.2.8 - 2D Block Mode Layout

Note that, due to the encoding of the R field, as described in the | |

previous page, bits R2 and R1 cannot both be zero, which disambiguates | |

the first five rows from the rest of the table. | |

Bit positions with a value of x are ignored for purposes of determining | |

if a block is a void-extent block or reserved, but may have defined | |

encodings for specific void-extent blocks. | |

The penultimate row of the table is reserved only if bits [5:2] are not

all 1. If they are all 1, the block is instead a void-extent block (as

shown in the previous row).

The D bit is set to indicate dual-plane mode. In this mode, the maximum | |

allowed number of partitions is 3. | |

The size of the grid in each dimension must be less than or equal to | |

the corresponding dimension of the block footprint. If the grid size | |

is greater than the footprint dimension in any axis, then this is an | |

illegal block encoding and all texels will decode to the error color. | |
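
The 2D block mode table and the weight range table can be combined into a small decoder. The sketch below follows the rows of Table C.2.8 from top to bottom and looks up the maximum weight value from Table C.2.7; the function name, return format and table names are assumptions made for illustration.

```python
# Maximum weight value indexed by R (Table C.2.7); None marks "Invalid".
WEIGHT_MAX_LOW = [None, None, 1, 2, 3, 4, 5, 7]        # H = 0
WEIGHT_MAX_HIGH = [None, None, 9, 11, 15, 19, 23, 31]  # H = 1


def decode_block_mode_2d(mode):
    """Decode an 11-bit 2D block mode.

    Returns "void-extent", "reserved", or a dict with the weight grid
    width and height, the maximum weight value and the dual-plane flag.
    """
    if (mode & 0x1FF) == 0x1FC:
        return "void-extent"
    if (mode & 0xF) == 0:
        return "reserved"
    dual = bool((mode >> 10) & 1)   # D bit
    high = (mode >> 9) & 1          # H bit
    r0 = (mode >> 4) & 1
    a = (mode >> 5) & 3
    if mode & 3:                    # first five rows: R2, R1 in bits [1:0]
        r = ((mode & 3) << 1) | r0
        b = (mode >> 7) & 3
        sel = (mode >> 2) & 3
        if sel == 0:
            width, height = b + 4, a + 2
        elif sel == 1:
            width, height = b + 8, a + 2
        elif sel == 2:
            width, height = a + 2, b + 8
        elif b & 2:                 # bit 8 set: B is bit 7 only
            width, height = (b & 1) + 2, a + 2
        else:                       # bit 8 clear: B is bit 7 only
            width, height = a + 2, (b & 1) + 6
    else:                           # remaining rows: R2, R1 in bits [3:2]
        r = (((mode >> 2) & 3) << 1) | r0
        top = (mode >> 7) & 3
        if top == 0:
            width, height = 12, a + 2
        elif top == 1:
            width, height = a + 2, 12
        elif top == 2:              # B occupies bits [10:9]; D = 0, H = 0
            width, height = a + 6, ((mode >> 9) & 3) + 6
            dual, high = False, 0
        elif a == 0:
            width, height = 6, 10
        elif a == 1:
            width, height = 10, 6
        else:
            return "reserved"       # bits [8:6] all 1, not void-extent
    table = WEIGHT_MAX_HIGH if high else WEIGHT_MAX_LOW
    return {"width": width, "height": height,
            "weight_max": table[r], "dual_plane": dual}
```

A caller would still need to compare the returned grid size against the block footprint and treat a larger grid as an illegal encoding that decodes to the error color.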

C.2.11 Color Endpoint Mode | |

--------------------------- | |

In single-partition mode, the Color Endpoint Mode (CEM) field stores one | |

of 16 possible values. Each of these specifies how many raw data values | |

are encoded, and how to convert these raw values into two RGBA color | |

endpoints. They can be summarized as follows: | |

---------------------------------------------
CEM  Description                        Class
---------------------------------------------
0    LDR Luminance, direct              0
1    LDR Luminance, base+offset         0
2    HDR Luminance, large range         0
3    HDR Luminance, small range         0
4    LDR Luminance+Alpha, direct        1
5    LDR Luminance+Alpha, base+offset   1
6    LDR RGB, base+scale                1
7    HDR RGB, base+scale                1
8    LDR RGB, direct                    2
9    LDR RGB, base+offset               2
10   LDR RGB, base+scale plus two A     2
11   HDR RGB, direct                    2
12   LDR RGBA, direct                   3
13   LDR RGBA, base+offset              3
14   HDR RGB, direct + LDR Alpha        3
15   HDR RGB, direct + HDR Alpha        3
---------------------------------------------

Table C.2.10 - Color Endpoint Modes. | |

[[ If the HDR profile is not implemented, remove from table C.2.10 | |

all rows whose description starts with "HDR", and add to the | |

caption: ]] | |

Modes not described in the CEM column are reserved for HDR modes, and | |

will generate errors in an unextended OpenGL ES implementation. | |

In multi-partition mode, the CEM field is of variable width, from 6 to 14 | |

bits. The lowest 2 bits of the CEM field specify how the endpoint mode | |

for each partition is calculated: | |

---------------------------------------------------- | |

Value Meaning | |

---------------------------------------------------- | |

00 All color endpoint pairs are of the same type. | |

A full 4-bit CEM is stored in block bits [28:25] | |

and is used for all partitions. | |

01 All endpoint pairs are of class 0 or 1. | |

10 All endpoint pairs are of class 1 or 2. | |

11 All endpoint pairs are of class 2 or 3. | |

---------------------------------------------------- | |

Table C.2.11 - Multi-Partition Color Endpoint Modes | |

If the CEM selector value in bits [24:23] is not 00, | |

then data layout is as follows: | |

--------------------------------------------------- | |

Part n m l k j i h g | |

------------------------------------------ | |

2 ... Weight : M1 : ... | |

------------------------------------------ | |

3 ... Weight : M2 : M1 :M0 : ... | |

------------------------------------------ | |

4 ... Weight : M3 : M2 : M1 : M0 : ... | |

------------------------------------------ | |

Part   28  27  26  25  24  23
      ------------------------
2     |  M0   |C1 |C0 |  CEM  |
      ------------------------
3     |M0 |C2 |C1 |C0 |  CEM  |
      ------------------------
4     |C3 |C2 |C1 |C0 |  CEM  |
      ------------------------

--------------------------------------------------- | |

Figure C.4 - Multi-Partition Color Endpoint Modes | |

In this view, each partition i has two fields. C<i> is the class | |

selector bit, choosing between the two possible CEM classes (0 indicates | |

the lower of the two classes), and M<i> is a two-bit field specifying | |

the low bits of the color endpoint mode within that class. The | |

additional bits appear at a variable bit position, immediately below the | |

texel weight data. | |
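One way to read the description above together with the Class column of Table C.2.10 (where the class is the CEM value divided by 4) is sketched below in C. The function name partition_cem is hypothetical, and the arithmetic is an interpretation of the two tables, not text taken from this specification.

```c
#include <assert.h>

/* selector: the 2-bit value from Table C.2.11 (1, 2 or 3 here).
 * c: class selector bit C<i>.  m: two-bit field M<i>. */
static int partition_cem(int selector, int c, int m)
{
    assert(selector >= 1 && selector <= 3);
    assert((c == 0 || c == 1) && m >= 0 && m <= 3);

    int cem_class = (selector - 1) + c;  /* C<i> = 0 picks the lower class */
    return (cem_class << 2) | m;         /* M<i> supplies the low two bits */
}
```

For example, selector 10 (classes 1 or 2) with C = 1 and M = 3 gives class 2 and CEM 11.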

The ranges used for the data values are not explicitly specified. | |

Instead, they are derived from the number of available bits remaining | |

after the configuration data and weight data have been specified. | |

Details of the decoding procedure for Color Endpoints can be found in | |

section C.2.13. | |

C.2.12 Integer Sequence Encoding | |

--------------------------------- | |

Both the weight data and the endpoint color data are variable width, and | |

are specified using a sequence of integer values. The range of each | |

value in a sequence (e.g. a color weight) is constrained. | |

Since it is often the case that the most efficient range for these | |

values is not a power of two, each value sequence is encoded using a | |

technique known as "integer sequence encoding". This allows efficient, | |

hardware-friendly packing and unpacking of values with non-power-of-two | |

ranges. | |

In a sequence, each value has an identical range. The range is specified | |

in one of the following forms: | |

Value range       MSB encoding    LSB encoding    Value        Block  Packed
                                                                      block size
----------------  --------------  --------------  -----------  -----  ----------
0 .. 2^n-1        -               n bit value m   m            1      n
                                  (n <= 8)
0 .. (3 * 2^n)-1  Base-3 "trit"   n bit value m   t * 2^n + m  5      8 + 5*n
                  value t         (n <= 6)
0 .. (5 * 2^n)-1  Base-5 "quint"  n bit value m   q * 2^n + m  3      7 + 3*n
                  value q         (n <= 5)
--------------------------------------------------------------------------------
Table C.2.13 - Encoding for Different Ranges

Since 3^5 is 243, it is possible to pack five trits into 8 bits (which has

256 possible values), so a trit can effectively be encoded as 1.6 bits. | |

Similarly, since 5^3 is 125, it is possible to pack three quints into | |

7 bits (which has 128 possible values), so a quint can be encoded as | |

2.33 bits. | |

The encoding scheme packs the trits or quints, and then interleaves the n | |

additional bits in positions that satisfy the requirements of an | |

arbitrary length stream. This makes it possible to correctly specify | |

lists of values whose length is not an integer multiple of 3 or 5 values. | |

It also makes it possible to easily select a value at random within the stream. | |

If there are insufficient bits in the stream to fill the final block, then | |

unused (higher order) bits are assumed to be 0 when decoding. | |

To decode the bits for value number i in a sequence of bits b, both | |

indexed from 0, perform the following: | |

If the range is encoded as n bits per value, then the value is bits | |

b[i*n+n-1:i*n] - a simple multiplexing operation. | |

If the range is encoded using a trit, then each block contains 5 values | |

(v0 to v4), each of which contains a trit (t0 to t4) and a corresponding | |

LSB value (m0 to m4). The first bit of the packed block is bit | |

floor(i/5)*(8+5*n). The bits in the block are packed as follows | |

(in this example, n is 4): | |

 27  26  25  24  23  22  21  20  19  18  17  16
 -----------------------------------------------
|T7 |      m4       |T6  T5 |      m3       |T4 |
 -----------------------------------------------

 15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
 ---------------------------------------------------------------
|      m2       |T3  T2 |      m1       |T1  T0 |      m0       |
 ---------------------------------------------------------------

Figure C.5 - Trit-based Packing | |

The five trits t0 to t4 are obtained by bit manipulations of the 8 bits | |

T[7:0] as follows: | |

if T[4:2] = 111
    C = { T[7:5], T[1:0] }; t4 = t3 = 2
else
    C = T[4:0]
    if T[6:5] = 11
        t4 = 2; t3 = T[7]
    else
        t4 = T[7]; t3 = T[6:5]

if C[1:0] = 11
    t2 = 2; t1 = C[4]; t0 = { C[3], C[2]&~C[3] }
else if C[3:2] = 11
    t2 = 2; t1 = 2; t0 = C[1:0]
else
    t2 = C[4]; t1 = C[3:2]; t0 = { C[1], C[0]&~C[1] }
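The trit extraction above can be transcribed into C as follows. This is a sketch for illustration only. The function name decode_trits is hypothetical, and the braces { x, y } in the pseudocode are read as bit concatenation with x in the higher-order position.

```c
#include <assert.h>

/* Decode the 8-bit packed field T[7:0] into five trits t[0]..t[4]. */
static void decode_trits(unsigned T, int t[5])
{
    unsigned C;

    assert(T < 256);
    if (((T >> 2) & 7) == 7) {                 /* T[4:2] = 111 */
        C = (((T >> 5) & 7) << 2) | (T & 3);   /* { T[7:5], T[1:0] } */
        t[4] = t[3] = 2;
    } else {
        C = T & 0x1F;
        if (((T >> 5) & 3) == 3) { t[4] = 2; t[3] = (int)((T >> 7) & 1); }
        else { t[4] = (int)((T >> 7) & 1); t[3] = (int)((T >> 5) & 3); }
    }

    unsigned c0 = C & 1, c1 = (C >> 1) & 1, c2 = (C >> 2) & 1;
    unsigned c3 = (C >> 3) & 1, c4 = (C >> 4) & 1;

    if ((C & 3) == 3) {                        /* C[1:0] = 11 */
        t[2] = 2; t[1] = (int)c4;
        t[0] = (int)((c3 << 1) | (c2 & !c3));
    } else if (((C >> 2) & 3) == 3) {          /* C[3:2] = 11 */
        t[2] = 2; t[1] = 2; t[0] = (int)(C & 3);
    } else {
        t[2] = (int)c4; t[1] = (int)((C >> 2) & 3);
        t[0] = (int)((c1 << 1) | (c0 & !c1));
    }
}
```

Each decoded value is then t * 2^n + m, as listed in Table C.2.13.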

If the range is encoded using a quint, then each block contains 3 values | |

(v0 to v2), each of which contains a quint (q0 to q2) and a corresponding | |

LSB value (m0 to m2). The first bit of the packed block is bit | |

floor(i/3)*(7+3*n). | |

The bits in the block are packed as follows (in this example, n is 4): | |

 18  17  16
 -----------
|Q6  Q5 | m2
 -----------

 15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
 ---------------------------------------------------------------
     m2     |Q4  Q3 |      m1       |Q2  Q1  Q0 |      m0       |
 ---------------------------------------------------------------

Figure C.6 - Quint-based Packing | |

The three quints q0 to q2 are obtained by bit manipulations of the 7 bits | |

Q[6:0] as follows: | |

if Q[2:1] = 11 and Q[6:5] = 00
    q2 = { Q[0], Q[4]&~Q[0], Q[3]&~Q[0] }; q1 = q0 = 4
else
    if Q[2:1] = 11
        q2 = 4; C = { Q[4:3], ~Q[6:5], Q[0] }
    else
        q2 = Q[6:5]; C = Q[4:0]

    if C[2:0] = 101
        q1 = 4; q0 = C[4:3]
    else
        q1 = C[4:3]; q0 = C[2:0]
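The quint extraction can be transcribed into C in the same way. Again this is only a sketch: decode_quints is a hypothetical name, and { x, y, z } is read as bit concatenation with x in the highest-order position.

```c
#include <assert.h>

/* Decode the 7-bit packed field Q[6:0] into three quints q[0]..q[2]. */
static void decode_quints(unsigned Q, int q[3])
{
    unsigned C;
    unsigned q0b = Q & 1, q3b = (Q >> 3) & 1, q4b = (Q >> 4) & 1;

    assert(Q < 128);
    if (((Q >> 1) & 3) == 3 && ((Q >> 5) & 3) == 0) {
        /* { Q[0], Q[4]&~Q[0], Q[3]&~Q[0] } */
        q[2] = (int)((q0b << 2) | ((q4b & !q0b) << 1) | (q3b & !q0b));
        q[1] = q[0] = 4;
        return;
    }

    if (((Q >> 1) & 3) == 3) {
        q[2] = 4;
        /* { Q[4:3], ~Q[6:5], Q[0] } */
        C = (((Q >> 3) & 3) << 3) | ((~(Q >> 5) & 3) << 1) | (Q & 1);
    } else {
        q[2] = (int)((Q >> 5) & 3);
        C = Q & 0x1F;
    }

    if ((C & 7) == 5) { q[1] = 4; q[0] = (int)((C >> 3) & 3); }
    else              { q[1] = (int)((C >> 3) & 3); q[0] = (int)(C & 7); }
}
```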

Both these procedures ensure a valid decoding for every possible value
of the packed field (all 256 values of T and all 128 values of Q), even
though a few are duplicates. They can also be implemented
efficiently in software using small tables.

Encoding methods are not specified here, although table-based mechanisms | |

work well. | |

C.2.13 Endpoint Unquantization | |

------------------------------- | |

Each color endpoint is specified as a sequence of integers in a given | |

range. These values are packed using integer sequence encoding, as a | |

stream of bits stored from just above the configuration data, and | |

growing upwards. | |

Once unpacked, the values must be unquantized from their storage range, | |

returning them to a standard range of 0..255. | |

For bit-only representations, this is simple bit replication from the | |

most significant bit of the value. | |

For trit or quint-based representations, this involves a set of bit | |

manipulations and adjustments to avoid the expense of full-width | |

multipliers. This procedure ensures correct scaling, but scrambles | |

the order of the decoded values relative to the encoded values. | |

This must be compensated for using a table in the encoder. | |

The initial inputs to the procedure are denoted A (9 bits), B (9 bits), | |

C (9 bits) and D (3 bits) and are decoded using the range as follows: | |

---------------------------------------------------------------
Range    T  Q  B  Bits    A          B          C    D
---------------------------------------------------------------
0..5     1     1  a       aaaaaaaaa  000000000  204  Trit value
0..9        1  1  a       aaaaaaaaa  000000000  113  Quint value
0..11    1     2  ba      aaaaaaaaa  b000b0bb0  93   Trit value
0..19       1  2  ba      aaaaaaaaa  b0000bb00  54   Quint value
0..23    1     3  cba     aaaaaaaaa  cb000cbcb  44   Trit value
0..39       1  3  cba     aaaaaaaaa  cb0000cbc  26   Quint value
0..47    1     4  dcba    aaaaaaaaa  dcb000dcb  22   Trit value
0..79       1  4  dcba    aaaaaaaaa  dcb0000dc  13   Quint value
0..95    1     5  edcba   aaaaaaaaa  edcb000ed  11   Trit value
0..159      1  5  edcba   aaaaaaaaa  edcb0000e  6    Quint value
0..191   1     6  fedcba  aaaaaaaaa  fedcb000f  5    Trit value
---------------------------------------------------------------

Table C.2.16 - Color Unquantization Parameters | |

These are then processed as follows: | |

T = D * C + B; | |

T = T ^ A; | |

T = (A & 0x80) | (T >> 2); | |

Note that the multiply in the first line is nearly trivial as it only | |

needs to multiply by 0, 1, 2, 3 or 4. | |
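The procedure can be exercised with a short C sketch that builds A and B directly from the pattern strings of Table C.2.16. The helper names expand_pattern and unquantize_color are hypothetical. Here m is the LSB value (bits a, b, c, ... with a as bit 0) and d is the trit or quint value.

```c
#include <assert.h>
#include <stddef.h>

/* Build a bit pattern from a table string: '0' is a zero bit and
 * 'a'..'f' select bit 0..5 of the LSB value m. */
static unsigned expand_pattern(const char *pat, unsigned m)
{
    unsigned out = 0;
    for (; *pat != '\0'; ++pat)
        out = (out << 1) | (*pat == '0' ? 0u : ((m >> (*pat - 'a')) & 1u));
    return out;
}

/* nbits is the number of LSB bits in the range (the "B" column). */
static int unquantize_color(int is_trit, int nbits, unsigned m, unsigned d)
{
    static const char *const b_trit[7] = {
        NULL, "000000000", "b000b0bb0", "cb000cbcb",
        "dcb000dcb", "edcb000ed", "fedcb000f" };
    static const char *const b_quint[6] = {
        NULL, "000000000", "b0000bb00", "cb0000cbc",
        "dcb0000dc", "edcb0000e" };
    static const unsigned c_trit[7]  = { 0, 204, 93, 44, 22, 11, 5 };
    static const unsigned c_quint[6] = { 0, 113, 54, 26, 13, 6 };

    assert(nbits >= 1 && nbits <= (is_trit ? 6 : 5));
    assert(d <= (is_trit ? 2u : 4u) && m < (1u << nbits));

    unsigned A = expand_pattern("aaaaaaaaa", m);
    unsigned B = expand_pattern(is_trit ? b_trit[nbits] : b_quint[nbits], m);
    unsigned C = is_trit ? c_trit[nbits] : c_quint[nbits];
    unsigned T = d * C + B;

    T ^= A;
    return (int)((A & 0x80) | (T >> 2));
}
```

For the range 0..5 this yields the six values 0, 51, 102, 153, 204 and 255, although not in the same order as the encoded values, which is the scrambling mentioned above.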

C.2.14 LDR Endpoint Decoding | |

----------------------------- | |

The decoding method used depends on the Color Endpoint Mode (CEM) field, | |

which specifies how many values are used to represent the endpoint. | |

The CEM field also specifies how to take the n unquantized color endpoint | |

values v0 to v[n-1] and convert them into two RGBA color endpoints e0 | |

and e1. | |

The HDR modes are more complex and do not fit neatly into this section.
They are documented in the following section.

The methods can be summarized as follows. | |

-------------------------------------------------
CEM  Range  Description                     n
-------------------------------------------------
0    LDR    Luminance, direct               2
1    LDR    Luminance, base+offset          2
2    HDR    Luminance, large range          2
3    HDR    Luminance, small range          2
4    LDR    Luminance+Alpha, direct         4
5    LDR    Luminance+Alpha, base+offset    4
6    LDR    RGB, base+scale                 4
7    HDR    RGB, base+scale                 4
8    LDR    RGB, direct                     6
9    LDR    RGB, base+offset                6
10   LDR    RGB, base+scale plus two A      6
11   HDR    RGB                             6
12   LDR    RGBA, direct                    8
13   LDR    RGBA, base+offset               8
14   HDR    RGB + LDR Alpha                 8
15   HDR    RGB + HDR Alpha                 8
-------------------------------------------------
Table C.2.17 - Color Endpoint Modes

[[ If the HDR profile is not implemented, remove from table C.2.17 | |

all rows whose description starts with "HDR", and add to the | |

caption: ]] | |

Modes not described are reserved, as described in table C.2.10. | |

[[ HDR profile only ]] | |

Mode 14 is special in that the alpha values are interpolated linearly, | |

but the color components are interpolated logarithmically. This is the | |

only endpoint format with mixed-mode operation, and will return the | |

error value if encountered in LDR mode. | |

Decode the different LDR endpoint modes as follows: | |

Mode 0 LDR Luminance, direct | |

e0=(v0,v0,v0,0xFF); e1=(v1,v1,v1,0xFF); | |

Mode 1 LDR Luminance, base+offset | |

L0 = (v0>>2)|(v1&0xC0); L1=L0+(v1&0x3F); | |

if (L1>0xFF) { L1=0xFF; } | |

e0=(L0,L0,L0,0xFF); e1=(L1,L1,L1,0xFF); | |

Mode 4 LDR Luminance+Alpha, direct

e0=(v0,v0,v0,v2); | |

e1=(v1,v1,v1,v3); | |

Mode 5 LDR Luminance+Alpha, base+offset | |

bit_transfer_signed(v1,v0); bit_transfer_signed(v3,v2); | |

e0=(v0,v0,v0,v2); e1=(v0+v1,v0+v1,v0+v1,v2+v3); | |

clamp_unorm8(e0); clamp_unorm8(e1); | |

Mode 6 LDR RGB, base+scale | |

e0=(v0*v3>>8,v1*v3>>8,v2*v3>>8, 0xFF); | |

e1=(v0,v1,v2,0xFF); | |

Mode 8 LDR RGB, direct

s0= v0+v2+v4; s1= v1+v3+v5; | |

if (s1>=s0){e0=(v0,v2,v4,0xFF); | |

e1=(v1,v3,v5,0xFF); } | |

else { e0=blue_contract(v1,v3,v5,0xFF); | |

e1=blue_contract(v0,v2,v4,0xFF); } | |

Mode 9 LDR RGB, base+offset | |

bit_transfer_signed(v1,v0); | |

bit_transfer_signed(v3,v2); | |

bit_transfer_signed(v5,v4); | |

if(v1+v3+v5 >= 0) | |

{ e0=(v0,v2,v4,0xFF); e1=(v0+v1,v2+v3,v4+v5,0xFF); } | |

else | |

{ e0=blue_contract(v0+v1,v2+v3,v4+v5,0xFF); | |

e1=blue_contract(v0,v2,v4,0xFF); } | |

clamp_unorm8(e0); clamp_unorm8(e1); | |

Mode 10 LDR RGB, base+scale plus two A | |

e0=(v0*v3>>8,v1*v3>>8,v2*v3>>8, v4); | |

e1=(v0,v1,v2, v5); | |

Mode 12 LDR RGBA, direct | |

s0= v0+v2+v4; s1= v1+v3+v5; | |

if (s1>=s0){e0=(v0,v2,v4,v6); | |

e1=(v1,v3,v5,v7); } | |

else { e0=blue_contract(v1,v3,v5,v7); | |

e1=blue_contract(v0,v2,v4,v6); } | |

Mode 13 LDR RGBA, base+offset | |

bit_transfer_signed(v1,v0); | |

bit_transfer_signed(v3,v2); | |

bit_transfer_signed(v5,v4); | |

bit_transfer_signed(v7,v6); | |

if(v1+v3+v5>=0) { e0=(v0,v2,v4,v6); | |

e1=(v0+v1,v2+v3,v4+v5,v6+v7); } | |

else { e0=blue_contract(v0+v1,v2+v3,v4+v5,v6+v7); | |

e1=blue_contract(v0,v2,v4,v6); } | |

clamp_unorm8(e0); clamp_unorm8(e1); | |

The bit_transfer_signed procedure transfers a bit from one value (a) | |

to another (b). Initially, both a and b are in the range 0..255. | |

After calling this procedure, a's range becomes -32..31, and b remains | |

in the range 0..255. Note that, as is often the case, this is easier to | |

express in hardware than in C: | |

void bit_transfer_signed(int& a, int& b)

{ | |

b >>= 1; | |

b |= a & 0x80; | |

a >>= 1; | |

a &= 0x3F; | |

if( (a&0x20)!=0 ) a-=0x40; | |

} | |

The blue_contract procedure is used to give additional precision to | |

RGB colors near grey: | |

color blue_contract( int r, int g, int b, int a ) | |

{ | |

color c; | |

c.r = (r+b) >> 1; | |

c.g = (g+b) >> 1; | |

c.b = b; | |

c.a = a; | |

return c; | |

} | |

The clamp_unorm8 procedure is used to clamp a color into the UNORM8 range: | |

void clamp_unorm8(color& c)

{ | |

if(c.r < 0) {c.r=0;} else if(c.r > 255) {c.r=255;} | |

if(c.g < 0) {c.g=0;} else if(c.g > 255) {c.g=255;} | |

if(c.b < 0) {c.b=0;} else if(c.b > 255) {c.b=255;} | |

if(c.a < 0) {c.a=0;} else if(c.a > 255) {c.a=255;} | |

} | |
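To show how the three helper procedures combine, the following C sketch decodes Mode 9 (LDR RGB, base+offset). It is an illustrative transcription rather than a reference implementation: pointers replace the reference parameters used above, and decode_ldr_mode9 and clamp255 are hypothetical names.

```c
#include <assert.h>

typedef struct { int r, g, b, a; } color;

static void bit_transfer_signed(int *a, int *b)
{
    *b >>= 1;
    *b |= *a & 0x80;
    *a >>= 1;
    *a &= 0x3F;
    if ((*a & 0x20) != 0) *a -= 0x40;
}

static color blue_contract(int r, int g, int b, int a)
{
    /* Assumes >> on a negative sum is an arithmetic shift. */
    color c = { (r + b) >> 1, (g + b) >> 1, b, a };
    return c;
}

static int clamp255(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

static void clamp_unorm8(color *c)
{
    c->r = clamp255(c->r); c->g = clamp255(c->g);
    c->b = clamp255(c->b); c->a = clamp255(c->a);
}

/* v_in holds the six unquantized values v0..v5, each in 0..255. */
static void decode_ldr_mode9(const int v_in[6], color *e0, color *e1)
{
    int v[6], i;
    for (i = 0; i < 6; ++i) {
        assert(v_in[i] >= 0 && v_in[i] <= 255);
        v[i] = v_in[i];
    }
    bit_transfer_signed(&v[1], &v[0]);
    bit_transfer_signed(&v[3], &v[2]);
    bit_transfer_signed(&v[5], &v[4]);

    if (v[1] + v[3] + v[5] >= 0) {
        color base = { v[0], v[2], v[4], 0xFF };
        color sum  = { v[0] + v[1], v[2] + v[3], v[4] + v[5], 0xFF };
        *e0 = base; *e1 = sum;
    } else {
        *e0 = blue_contract(v[0] + v[1], v[2] + v[3], v[4] + v[5], 0xFF);
        *e1 = blue_contract(v[0], v[2], v[4], 0xFF);
    }
    clamp_unorm8(e0); clamp_unorm8(e1);
}
```

With inputs (100, 20, 100, 20, 100, 20), each base becomes 50 and each offset 10, so the endpoints are (50, 50, 50) and (60, 60, 60) with alpha 0xFF.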

[[ If the HDR profile is not implemented, do not include section | |

C.2.15 ]] | |

C.2.15 HDR Endpoint Decoding | |

-----------------------------

For HDR endpoint modes, color values are represented in a 12-bit | |

pseudo-logarithmic representation. | |

HDR Endpoint Mode 2 | |

Mode 2 represents luminance-only data with a large range. It encodes | |

using two values (v0, v1). The complete decoding procedure is as follows: | |

if(v1 >= v0) | |

{ | |

y0 = (v0 << 4); | |

y1 = (v1 << 4); | |

} | |

else | |

{ | |

y0 = (v1 << 4) + 8; | |

y1 = (v0 << 4) - 8; | |

} | |

// Construct RGBA result (0x780 is 1.0f) | |

e0 = (y0, y0, y0, 0x780); | |

e1 = (y1, y1, y1, 0x780); | |

HDR Endpoint Mode 3 | |

Mode 3 represents luminance-only data with a small range. It packs the | |

bits for a base luminance value, together with an offset, into two values | |

(v0, v1): | |

Value 7 6 5 4 3 2 1 0 | |

----- ------------------------------ | |

v0 |M | L[6:0] | | |

------------------------------ | |

v1 | X[3:0] | d[3:0] | | |

------------------------------ | |

Table C.2.18 - HDR Mode 3 Value Layout | |

The bit field marked as X allocates different bits to L or d depending | |

on the value of the mode bit M. | |

The complete decoding procedure is as follows: | |

// Check mode bit and extract. | |

if((v0&0x80) !=0) | |

{ | |

y0 = ((v1 & 0xE0) << 4) | ((v0 & 0x7F) << 2); | |

d = (v1 & 0x1F) << 2; | |

} | |

else | |

{ | |

y0 = ((v1 & 0xF0) << 4) | ((v0 & 0x7F) << 1); | |

d = (v1 & 0x0F) << 1; | |

} | |

// Add delta and clamp | |

y1 = y0 + d; | |

if(y1 > 0xFFF) { y1 = 0xFFF; } | |

// Construct RGBA result (0x780 is 1.0f) | |

e0 = (y0, y0, y0, 0x780); | |

e1 = (y1, y1, y1, 0x780); | |
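The two luminance procedures above can be packaged as small C functions for experimentation. The names decode_hdr_mode2 and decode_hdr_mode3 are hypothetical. Each returns only the 12-bit luminance pair (y0, y1), and the caller builds the RGBA endpoints with alpha 0x780 as shown above.

```c
#include <assert.h>

static void decode_hdr_mode2(int v0, int v1, int *y0, int *y1)
{
    assert(v0 >= 0 && v0 <= 255 && v1 >= 0 && v1 <= 255);
    if (v1 >= v0) {
        *y0 = v0 << 4;
        *y1 = v1 << 4;
    } else {
        *y0 = (v1 << 4) + 8;
        *y1 = (v0 << 4) - 8;
    }
}

static void decode_hdr_mode3(int v0, int v1, int *y0, int *y1)
{
    int d;

    assert(v0 >= 0 && v0 <= 255 && v1 >= 0 && v1 <= 255);
    if ((v0 & 0x80) != 0) {           /* mode bit M set */
        *y0 = ((v1 & 0xE0) << 4) | ((v0 & 0x7F) << 2);
        d   = (v1 & 0x1F) << 2;
    } else {
        *y0 = ((v1 & 0xF0) << 4) | ((v0 & 0x7F) << 1);
        d   = (v1 & 0x0F) << 1;
    }
    *y1 = *y0 + d;                    /* add delta and clamp */
    if (*y1 > 0xFFF) *y1 = 0xFFF;
}
```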

HDR Endpoint Mode 7 | |

Mode 7 packs the bits for a base RGB value, a scale factor, and some | |

mode bits into the four values (v0, v1, v2, v3): | |

Value 7 6 5 4 3 2 1 0 | |

----- ------------------------------ | |

v0 |M[3:2] | R[5:0] | | |

----- ------------------------------ | |

v1 |M1 |X0 |X1 | G[4:0] | | |

----- ------------------------------ | |

v2 |M0 |X2 |X3 | B[4:0] | | |

----- ------------------------------ | |

v3 |X4 |X5 |X6 | S[4:0] | | |

----- ------------------------------ | |

Table C.2.19 - HDR Mode 7 Value Layout | |

The mode bits M0 to M3 are a packed representation of an endpoint bit | |

mode, together with the major component index. For modes 0 to 4, the | |

component (red, green, or blue) with the largest magnitude is identified, | |

and the values swizzled to ensure that it is decoded from the red channel. | |

The endpoint bit mode is used to determine the number of bits assigned | |

to each component of the endpoint, and the destination of each of the | |

extra bits X0 to X6, as follows: | |

----------------------------------------------------------
      Number of bits   Destination of extra bits
Mode  R   G   B   S    X0   X1   X2   X3   X4   X5   X6
----------------------------------------------------------
0     11  5   5   7    R9   R8   R7   R10  R6   S6   S5
1     11  6   6   5    R8   G5   R7   B5   R6   R10  R9
2     10  5   5   8    R9   R8   R7   R6   S7   S6   S5
3     9   6   6   7    R8   G5   R7   B5   R6   S6   S5
4     8   7   7   6    G6   G5   B6   B5   R6   R7   S5
5     7   7   7   7    G6   G5   B6   B5   R6   S6   S5
----------------------------------------------------------

Table C.2.20 - Endpoint Bit Mode | |

As noted before, this appears complex when expressed in C, but is much
easier to achieve in hardware: bit masking, extraction, shifting
and assignment usually end up as a single wire or multiplexer.

The complete decoding procedure is as follows: | |

// Extract mode bits and unpack to major component and mode. | |

int modeval = ((v0&0xC0)>>6) | ((v1&0x80)>>5) | ((v2&0x80)>>4); | |

int majcomp; | |

int mode; | |

if( (modeval & 0xC ) != 0xC ) | |

{ | |

majcomp = modeval >> 2; mode = modeval & 3; | |

} | |

else if( modeval != 0xF ) | |

{ | |

majcomp = modeval & 3; mode = 4; | |

} | |

else | |

{ | |

majcomp = 0; mode = 5; | |

} | |

// Extract low-order bits of r, g, b, and s. | |

int red = v0 & 0x3f; | |

int green = v1 & 0x1f; | |

int blue = v2 & 0x1f; | |

int scale = v3 & 0x1f; | |

// Extract high-order bits, which may be assigned depending on mode | |

int x0 = (v1 >> 6) & 1; int x1 = (v1 >> 5) & 1; | |

int x2 = (v2 >> 6) & 1; int x3 = (v2 >> 5) & 1; | |

int x4 = (v3 >> 7) & 1; int x5 = (v3 >> 6) & 1; | |

int x6 = (v3 >> 5) & 1; | |

// Now move the high-order xs into the right place. | |

int ohm = 1 << mode; | |

if( ohm & 0x30 ) green |= x0 << 6; | |

if( ohm & 0x3A ) green |= x1 << 5; | |

if( ohm & 0x30 ) blue |= x2 << 6; | |

if( ohm & 0x3A ) blue |= x3 << 5; | |

if( ohm & 0x3D ) scale |= x6 << 5; | |

if( ohm & 0x2D ) scale |= x5 << 6; | |

if( ohm & 0x04 ) scale |= x4 << 7; | |

if( ohm & 0x3B ) red |= x4 << 6; | |

if( ohm & 0x04 ) red |= x3 << 6; | |

if( ohm & 0x10 ) red |= x5 << 7; | |

if( ohm & 0x0F ) red |= x2 << 7; | |

if( ohm & 0x05 ) red |= x1 << 8; | |

if( ohm & 0x0A ) red |= x0 << 8; | |

if( ohm & 0x05 ) red |= x0 << 9; | |

if( ohm & 0x02 ) red |= x6 << 9; | |

if( ohm & 0x01 ) red |= x3 << 10; | |

if( ohm & 0x02 ) red |= x5 << 10; | |

// Shift the bits to the top of the 12-bit result. | |

static const int shamts[6] = { 1,1,2,3,4,5 }; | |

int shamt = shamts[mode]; | |

red <<= shamt; green <<= shamt; blue <<= shamt; scale <<= shamt; | |

// Minor components are stored as differences | |

if( mode != 5 ) { green = red - green; blue = red - blue; } | |

// Swizzle major component into place | |

if( majcomp == 1 ) swap( red, green ); | |

if( majcomp == 2 ) swap( red, blue ); | |

// Clamp output values, set alpha to 1.0 | |

e1.r = clamp( red, 0, 0xFFF ); | |

e1.g = clamp( green, 0, 0xFFF ); | |

e1.b = clamp( blue, 0, 0xFFF ); | |

e1.alpha = 0x780; | |

e0.r = clamp( red - scale, 0, 0xFFF ); | |

e0.g = clamp( green - scale, 0, 0xFFF ); | |

e0.b = clamp( blue - scale, 0, 0xFFF ); | |

e0.alpha = 0x780; | |

HDR Endpoint Mode 11 | |

Mode 11 specifies two RGB values, which it calculates from a number of | |

bitfields (a, b0, b1, c, d0 and d1) which are packed together with some | |

mode bits into the six values (v0, v1, v2, v3, v4, v5): | |

Value 7 6 5 4 3 2 1 0 | |

----- ------------------------------ | |

v0 | a[7:0] | | |

----- ------------------------------ | |

v1 |m0 |a8 | c[5:0] | | |

----- ------------------------------ | |

v2 |m1 |X0 | b0[5:0] | | |

----- ------------------------------ | |

v3 |m2 |X1 | b1[5:0] | | |

----- ------------------------------ | |

v4 |mj0|X2 |X4 | d0[4:0] | | |

----- ------------------------------ | |

v5 |mj1|X3 |X5 | d1[4:0] | | |

----- ------------------------------ | |

Table C.2.21 - HDR Mode 11 Value Layout | |

If the major component bits mj[1:0] are both 1, then the RGB values
are specified directly:

Value 7 6 5 4 3 2 1 0 | |

----- ------------------------------ | |

v0 | R0[11:4] | | |

----- ------------------------------ | |

v1 | R1[11:4] | | |

----- ------------------------------ | |

v2 | G0[11:4] | | |

----- ------------------------------ | |

v3 | G1[11:4] | | |

----- ------------------------------ | |

v4 | 1 | B0[11:5] | | |

----- ------------------------------ | |

v5 | 1 | B1[11:5] | | |

----- ------------------------------ | |

Table C.2.22 - HDR Mode 11 Direct Value Layout

The mode bits m[2:0] specify the bit allocation for the different | |

values, and the destinations of the extra bits X0 to X5: | |

-------------------------------------------------------------------------
      Number of bits    Destination of extra bits
Mode  a   b   c   d     X0     X1     X2     X3     X4     X5
-------------------------------------------------------------------------
0     9   7   6   7     b0[6]  b1[6]  d0[6]  d1[6]  d0[5]  d1[5]
1     9   8   6   6     b0[6]  b1[6]  b0[7]  b1[7]  d0[5]  d1[5]
2     10  6   7   7     a[9]   c[6]   d0[6]  d1[6]  d0[5]  d1[5]
3     10  7   7   6     b0[6]  b1[6]  a[9]   c[6]   d0[5]  d1[5]
4     11  8   6   5     b0[6]  b1[6]  b0[7]  b1[7]  a[9]   a[10]
5     11  6   7   6     a[9]   a[10]  c[7]   c[6]   d0[5]  d1[5]
6     12  7   7   5     b0[6]  b1[6]  a[11]  c[6]   a[9]   a[10]
7     12  6   7   6     a[9]   a[10]  a[11]  c[6]   d0[5]  d1[5]
-------------------------------------------------------------------------

Table C.2.23 - Endpoint Bit Mode | |

The complete decoding procedure is as follows: | |

// Find major component | |

int majcomp = ((v4 & 0x80) >> 7) | ((v5 & 0x80) >> 6); | |

// Deal with simple case first | |

if( majcomp == 3 ) | |

{ | |

e0 = (v0 << 4, v2 << 4, (v4 & 0x7f) << 5, 0x780); | |

e1 = (v1 << 4, v3 << 4, (v5 & 0x7f) << 5, 0x780); | |

return; | |

} | |

// Decode mode, parameters. | |

int mode = ((v1&0x80)>>7) | ((v2&0x80)>>6) | ((v3&0x80)>>5); | |

int va = v0 | ((v1 & 0x40) << 2); | |

int vb0 = v2 & 0x3f; | |

int vb1 = v3 & 0x3f; | |

int vc = v1 & 0x3f; | |

int vd0 = v4 & 0x7f; | |

int vd1 = v5 & 0x7f; | |

// Assign top bits of vd0, vd1. | |

static const int dbitstab[8] = {7,6,7,6,5,6,5,6}; | |

vd0 = signextend( vd0, dbitstab[mode] ); | |

vd1 = signextend( vd1, dbitstab[mode] ); | |

// Extract and place extra bits | |

int x0 = (v2 >> 6) & 1; | |

int x1 = (v3 >> 6) & 1; | |

int x2 = (v4 >> 6) & 1; | |

int x3 = (v5 >> 6) & 1; | |

int x4 = (v4 >> 5) & 1; | |

int x5 = (v5 >> 5) & 1; | |

int ohm = 1 << mode; | |

if( ohm & 0xA4 ) va |= x0 << 9; | |

if( ohm & 0x08 ) va |= x2 << 9; | |

if( ohm & 0x50 ) va |= x4 << 9; | |

if( ohm & 0x50 ) va |= x5 << 10; | |

if( ohm & 0xA0 ) va |= x1 << 10; | |

if( ohm & 0xC0 ) va |= x2 << 11; | |

if( ohm & 0x04 ) vc |= x1 << 6; | |

if( ohm & 0xE8 ) vc |= x3 << 6; | |

if( ohm & 0x20 ) vc |= x2 << 7; | |

if( ohm & 0x5B ) vb0 |= x0 << 6; | |

if( ohm & 0x5B ) vb1 |= x1 << 6; | |

if( ohm & 0x12 ) vb0 |= x2 << 7; | |

if( ohm & 0x12 ) vb1 |= x3 << 7; | |

// Now shift up so that major component is at top of 12-bit value | |

int shamt = (mode >> 1) ^ 3;

va <<= shamt; vb0 <<= shamt; vb1 <<= shamt; | |

vc <<= shamt; vd0 <<= shamt; vd1 <<= shamt; | |

e1.r = clamp( va, 0, 0xFFF ); | |

e1.g = clamp( va - vb0, 0, 0xFFF ); | |

e1.b = clamp( va - vb1, 0, 0xFFF ); | |

e1.alpha = 0x780; | |

e0.r = clamp( va - vc, 0, 0xFFF ); | |

e0.g = clamp( va - vb0 - vc - vd0, 0, 0xFFF ); | |

e0.b = clamp( va - vb1 - vc - vd1, 0, 0xFFF ); | |

e0.alpha = 0x780; | |

if( majcomp == 1 ) { swap( e0.r, e0.g ); swap( e1.r, e1.g ); } | |

else if( majcomp == 2 ) { swap( e0.r, e0.b ); swap( e1.r, e1.b ); } | |
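The procedure above relies on the helpers signextend, clamp and swap, which this section does not define. A minimal C sketch of plausible definitions follows. Treating signextend(v, n) as "interpret the low n bits of v as a two's-complement number, ignoring any higher bits" is an assumption inferred from how the helper is used (vd0 and vd1 are masked to 7 bits even when d has fewer bits), not text taken from this specification.

```c
#include <assert.h>

/* Assumed semantics: keep the low nbits of v and sign-extend them. */
static int signextend(int v, int nbits)
{
    assert(nbits >= 1 && nbits < 31);

    int sign = 1 << (nbits - 1);
    v &= (sign << 1) - 1;        /* discard bits above the field */
    return (v ^ sign) - sign;    /* XOR-and-subtract sign extension */
}

static int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Pointer-based stand-in for swap( a, b ) in the pseudocode. */
static void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

For example, signextend(0x1F, 5) evaluates to -1 and signextend(0x0F, 5) to 15.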

HDR Endpoint Mode 14 | |

Mode 14 specifies two RGBA values, using the eight values (v0, v1, v2, | |

v3, v4, v5, v6, v7). First, the RGB values are decoded from (v0..v5) | |

using the method from Mode 11, then the alpha values are filled in | |

from v6 and v7: | |

// Decode RGB as for mode 11 | |

(e0,e1) = decode_mode_11(v0,v1,v2,v3,v4,v5) | |

// Now fill in the alphas | |

e0.alpha = v6; | |

e1.alpha = v7; | |

Note that in this mode, the alpha values are interpreted (and | |

interpolated) as 8-bit unsigned normalized values, as in the LDR modes. | |

This is the only mode that exhibits this behaviour. | |

HDR Endpoint Mode 15 | |

Mode 15 specifies two RGBA values, using the eight values (v0, v1, v2, | |

v3, v4, v5, v6, v7). First, the RGB values are decoded from (v0..v5) | |

using the method from Mode 11. The alpha values are stored in values | |

v6 and v7 as a mode and two values which are interpreted according | |

to the mode: | |

Value 7 6 5 4 3 2 1 0 | |

----- ------------------------------ | |

v6 |M0 | A[6:0] | | |

----- ------------------------------ | |

v7 |M1 | B[6:0] | | |

----- ------------------------------ | |

Table C.2.24 - HDR Mode 15 Alpha Value Layout | |

The alpha values are decoded from v6 and v7 as follows: | |

// Decode RGB as for mode 11 | |

(e0,e1) = decode_mode_11(v0,v1,v2,v3,v4,v5) | |

// Extract mode bits | |

mode = ((v6 >> 7) & 1) | ((v7 >> 6) & 2); | |

v6 &= 0x7F; | |

v7 &= 0x7F; | |

if(mode==3) | |

{ | |

// Directly specify alphas | |

e0.alpha = v6 << 5; | |

e1.alpha = v7 << 5; | |

} | |

else | |

{ | |

// Transfer bits from v7 to v6 and sign extend v7. | |

v6 |= (v7 << (mode+1)) & 0x780;

v7 &= (0x3F >> mode); | |

v7 ^= 0x20 >> mode; | |

v7 -= 0x20 >> mode; | |

v6 <<= (4-mode); | |

v7 <<= (4-mode); | |

// Add delta and clamp | |

v7 += v6; | |

v7 = clamp(v7, 0, 0xFFF); | |

e0.alpha = v6; | |

e1.alpha = v7; | |

} | |

Note that in this mode, the alpha values are interpreted (and | |

interpolated) as 12-bit HDR values, and are interpolated as | |

for any other HDR component. | |
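The alpha portion of Mode 15 can be isolated into a small C function for testing. This is a sketch only. The name decode_hdr_alpha is hypothetical, and the signed delta is scaled by multiplication instead of a left shift because left-shifting a negative int is undefined behaviour in C.

```c
#include <assert.h>

/* v6 and v7 are the raw 8-bit values. Outputs are 12-bit HDR alphas. */
static void decode_hdr_alpha(int v6, int v7, int *a0, int *a1)
{
    assert(v6 >= 0 && v6 <= 255 && v7 >= 0 && v7 <= 255);

    int mode = ((v6 >> 7) & 1) | ((v7 >> 6) & 2);
    v6 &= 0x7F;
    v7 &= 0x7F;

    if (mode == 3) {                 /* directly specified alphas */
        *a0 = v6 << 5;
        *a1 = v7 << 5;
        return;
    }

    /* Transfer bits from v7 to v6 and sign extend v7. */
    v6 |= (v7 << (mode + 1)) & 0x780;
    v7 &= (0x3F >> mode);
    v7 ^= 0x20 >> mode;
    v7 -= 0x20 >> mode;              /* v7 is now a signed delta */
    v6 <<= (4 - mode);
    v7 *= 1 << (4 - mode);           /* same scaling as v7 <<= (4-mode) */

    /* Add delta and clamp. */
    v7 += v6;
    if (v7 < 0) v7 = 0; else if (v7 > 0xFFF) v7 = 0xFFF;
    *a0 = v6;
    *a1 = v7;
}
```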

C.2.16 Weight Decoding | |

----------------------- | |

The weight information is stored as a stream of bits, growing downwards | |

from the most significant bit in the block. Bit n in the stream is thus | |

bit 127-n in the block. | |

For each location in the weight grid, a value (in the specified range) | |

is packed into the stream. These are ordered in a raster pattern | |

starting from location (0,0,0), with the X dimension increasing fastest, | |

and the Z dimension increasing slowest. If dual-plane mode is selected, | |

both weights are emitted together for each location, plane 0 first, | |

then plane 1. | |

C.2.17 Weight Unquantization | |

----------------------------- | |

Each weight plane is specified as a sequence of integers in a given | |

range. These values are packed using integer sequence encoding. | |

Once unpacked, the values must be unquantized from their storage | |

range, returning them to a standard range of 0..64. The procedure | |

for doing so is similar to the color endpoint unquantization. | |

First, we unquantize the actual stored weight values to the range 0..63. | |

For bit-only representations, this is simple bit replication from the | |

most significant bit of the value. | |

For trit or quint-based representations, this involves a set of bit | |

manipulations and adjustments to avoid the expense of full-width | |

multipliers. | |

For representations with no additional bits, the results are as follows: | |

Range 0 1 2 3 4 | |

-------------------------- | |

0..2 0 32 63 - - | |

0..4 0 16 32 47 63 | |

-------------------------- | |

Table C.2.25 - Weight Unquantization Values | |

For other values, we calculate the initial inputs to a bit manipulation | |

procedure. These are denoted A (7 bits), B (7 bits), C (7 bits), and | |

D (3 bits) and are decoded using the range as follows: | |

Range    T  Q  B  Bits  A        B        C   D
-------------------------------------------------------
0..5     1     1  a     aaaaaaa  0000000  50  Trit value
0..9        1  1  a     aaaaaaa  0000000  28  Quint value
0..11    1     2  ba    aaaaaaa  b000b0b  23  Trit value
0..19       1  2  ba    aaaaaaa  b0000b0  13  Quint value
0..23    1     3  cba   aaaaaaa  cb000cb  11  Trit value
-------------------------------------------------------

Table C.2.26 - Weight Unquantization Parameters | |

These are then processed as follows: | |

T = D * C + B; | |

T = T ^ A; | |

T = (A & 0x20) | (T >> 2); | |

Note that the multiply in the first line is nearly trivial as it only | |

needs to multiply by 0, 1, 2, 3 or 4. | |

As a final step, for all types of value, the range is expanded from | |

0..63 up to 0..64 as follows: | |

if (T > 32) { T += 1; } | |

This allows the implementation to use 64 as a divisor during
interpolation, which is much easier than using 63.
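The three cases described in this section (bit replication, the fixed values of Table C.2.25, and the bit manipulation of Table C.2.26) can be combined in one C sketch, followed by the final 0..63 to 0..64 expansion. The names expand_pattern and unquantize_weight, and the kind parameter used to select the representation, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* '0' is a zero bit and 'a'..'c' select bit 0..2 of the LSB value m. */
static unsigned expand_pattern(const char *pat, unsigned m)
{
    unsigned out = 0;
    for (; *pat != '\0'; ++pat)
        out = (out << 1) | (*pat == '0' ? 0u : ((m >> (*pat - 'a')) & 1u));
    return out;
}

/* kind: 0 = bits only, 3 = trit, 5 = quint.  nbits: number of LSB bits.
 * m: LSB value.  d: trit or quint value (ignored when kind == 0). */
static int unquantize_weight(int kind, int nbits, unsigned m, unsigned d)
{
    static const unsigned trit_only[3]  = { 0, 32, 63 };
    static const unsigned quint_only[5] = { 0, 16, 32, 47, 63 };
    static const char *const b_trit[4]  = { NULL, "0000000", "b000b0b", "cb000cb" };
    static const char *const b_quint[3] = { NULL, "0000000", "b0000b0" };
    static const unsigned c_trit[4]  = { 0, 50, 23, 11 };
    static const unsigned c_quint[3] = { 0, 28, 13 };
    unsigned T = 0;

    assert(kind == 0 || kind == 3 || kind == 5);
    if (kind == 0) {                      /* replicate the bits to 6 bits */
        int have = 0;
        assert(nbits >= 1 && nbits <= 6 && m < (1u << nbits));
        while (have < 6) { T = (T << nbits) | m; have += nbits; }
        T >>= have - 6;
    } else if (nbits == 0) {              /* Table C.2.25 */
        assert(d < (unsigned)kind);
        T = (kind == 3) ? trit_only[d] : quint_only[d];
    } else {                              /* Table C.2.26 */
        assert(nbits <= (kind == 3 ? 3 : 2));
        assert(d < (unsigned)kind && m < (1u << nbits));
        unsigned A = expand_pattern("aaaaaaa", m);
        unsigned B = expand_pattern(kind == 3 ? b_trit[nbits] : b_quint[nbits], m);
        unsigned C = (kind == 3) ? c_trit[nbits] : c_quint[nbits];
        T = d * C + B;
        T ^= A;
        T = (A & 0x20) | (T >> 2);
    }
    if (T > 32) T += 1;                   /* expand 0..63 to 0..64 */
    return (int)T;
}
```

For the range 0..5 the six results are 0, 12, 25, 39, 52 and 64.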

C.2.18 Weight Infill | |

--------------------- | |

After unquantization, the weights are subject to weight selection and | |

infill. The infill method is used to calculate the weight for a texel | |

position, based on the weights in the stored weight grid array (which | |

may be a different size). | |

The procedure below must be followed exactly, to ensure bit exact | |

results. | |

The block size is specified as two dimensions along the s and t | |

axes (Bs, Bt). Texel coordinates within the block (s,t) can have values | |

from 0 to one less than the block dimension in that axis. | |

For each block dimension, we compute scale factors (Ds, Dt) | |

Ds = floor( (1024 + floor(Bs/2)) / (Bs-1) ); | |

Dt = floor( (1024 + floor(Bt/2)) / (Bt-1) ); | |

Since the block dimensions are constrained, these are easily looked up | |

in a table. These scale factors are then used to scale the (s,t) | |

coordinates to a homogeneous coordinate (cs, ct): | |

cs = Ds * s; | |

ct = Dt * t; | |

This homogeneous coordinate (cs, ct) is then scaled again to give
a coordinate (gs, gt) in the weight-grid space. The weight grid is
of size (N, M), as specified in the block mode field:

gs = (cs*(N-1)+32) >> 6; | |

gt = (ct*(M-1)+32) >> 6; | |

The resulting coordinates may be in the range 0..176. These are
interpreted as 4:4 unsigned fixed point numbers in the range 0.0 .. 11.0.

If we label the integral parts of these (js, jt) and the fractional | |

parts (fs, ft), then: | |

js = gs >> 4; fs = gs & 0x0F; | |

jt = gt >> 4; ft = gt & 0x0F; | |

These values are then used to bilinearly interpolate between the stored | |

weights. | |

v0 = js + jt*N; | |

p00 = decode_weight(v0); | |

p01 = decode_weight(v0 + 1); | |

p10 = decode_weight(v0 + N); | |

p11 = decode_weight(v0 + N + 1); | |

The function decode_weight(n) decodes the nth weight in the stored weight | |

stream. The values p00 to p11 are the weights at the corners of the square | |

in which the texel position resides. These are then weighted using the | |

fractional position to produce the effective weight i as follows: | |

w11 = (fs*ft+8) >> 4; | |

w10 = ft - w11; | |

w01 = fs - w11; | |

w00 = 16 - fs - ft + w11; | |

i = (p00*w00 + p01*w01 + p10*w10 + p11*w11 + 8) >> 4; | |
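The whole infill procedure for a 2D block can be sketched as a single C function. This is an illustrative sketch under stated assumptions: the names infill_weight and grid_weight are hypothetical, the array grid stands in for decode_weight(n) and is assumed to hold the N*M already-unquantized weights, and reads past the end of the grid are assumed to return 0 (they only occur with an interpolation factor of 0, so the assumption does not affect the result).

```c
/* Hypothetical stand-in for decode_weight(n): out-of-range reads return 0.
 * Such reads only happen when their interpolation factor is 0. */
int grid_weight(const int *grid, int count, int n)
{
    return (n < count) ? grid[n] : 0;
}

/* Effective weight for texel (s,t) in a Bs x Bt block with an N x M grid. */
int infill_weight(int Bs, int Bt, int N, int M, int s, int t, const int *grid)
{
    int Ds = (1024 + Bs / 2) / (Bs - 1);   /* integer division == floor */
    int Dt = (1024 + Bt / 2) / (Bt - 1);
    int cs = Ds * s;
    int ct = Dt * t;
    int gs = (cs * (N - 1) + 32) >> 6;     /* 4:4 fixed point */
    int gt = (ct * (M - 1) + 32) >> 6;
    int js = gs >> 4, fs = gs & 0x0F;
    int jt = gt >> 4, ft = gt & 0x0F;
    int v0 = js + jt * N;
    int p00 = grid_weight(grid, N * M, v0);
    int p01 = grid_weight(grid, N * M, v0 + 1);
    int p10 = grid_weight(grid, N * M, v0 + N);
    int p11 = grid_weight(grid, N * M, v0 + N + 1);
    int w11 = (fs * ft + 8) >> 4;
    int w10 = ft - w11;
    int w01 = fs - w11;
    int w00 = 16 - fs - ft + w11;
    return (p00 * w00 + p01 * w01 + p10 * w10 + p11 * w11 + 8) >> 4;
}
```

When the weight grid has the same size as the block (for example a 4x4 grid in a 4x4 block), every fractional part is 0 and the function returns the stored weight unchanged.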

C.2.19 Weight Application | |

-------------------------- | |

Once the effective weight i for the texel has been calculated, the color | |

endpoints are interpolated and expanded. | |

For LDR endpoint modes, each color component C is calculated from the | |

corresponding 8-bit endpoint components C0 and C1 as follows: | |

If sRGB conversion is not enabled, or for the alpha channel in any case, | |

C0 and C1 are first expanded to 16 bits by bit replication: | |

C0 = (C0 << 8) | C0; C1 = (C1 << 8) | C1; | |

If sRGB conversion is enabled, C0 and C1 for the R, G, and B channels | |

are expanded to 16 bits differently, as follows: | |

C0 = (C0 << 8) | 0x80; C1 = (C1 << 8) | 0x80; | |

C0 and C1 are then interpolated to produce a UNORM16 result C: | |

C = floor( (C0*(64-i) + C1*i + 32)/64 ) | |

If sRGB conversion is enabled, the top 8 bits of the interpolation | |

result for the R, G and B channels are passed to the external sRGB | |

conversion block. Otherwise, if C = 65535, then the final result is | |

1.0 (0x3C00); otherwise C is divided by 65536 and the infinite-precision | |

result of the division is converted to FP16 with round-to-zero | |

semantics. | |
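The LDR expansion and interpolation steps can be sketched in C as follows. This is a minimal illustration only: the function name interpolate_ldr_unorm16 and the flag srgb_rgb are hypothetical, with srgb_rgb assumed to be non-zero only for the R, G and B channels when sRGB conversion is enabled. The final conversion to FP16 or to the sRGB block is left out.

```c
#include <stdint.h>

/* Interpolate two 8-bit LDR endpoint components with effective weight i
 * (0..64) and return the UNORM16 result C. */
uint16_t interpolate_ldr_unorm16(uint8_t c0, uint8_t c1, int i, int srgb_rgb)
{
    uint32_t e0, e1;
    if (srgb_rgb) {
        /* sRGB color channels: low byte is 0x80 */
        e0 = ((uint32_t)c0 << 8) | 0x80;
        e1 = ((uint32_t)c1 << 8) | 0x80;
    } else {
        /* linear channels and alpha: bit replication */
        e0 = ((uint32_t)c0 << 8) | c0;
        e1 = ((uint32_t)c1 << 8) | c1;
    }
    return (uint16_t)((e0 * (uint32_t)(64 - i) + e1 * (uint32_t)i + 32) / 64);
}
```

With sRGB enabled, the caller would pass the top 8 bits of the result (result >> 8) to the sRGB conversion block.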

For HDR endpoint modes, color values are represented in a 12-bit | |

pseudo-logarithmic representation, and interpolation occurs in a | |

piecewise-approximate logarithmic manner as follows: | |

In LDR mode, the error result is returned. | |

In HDR mode, the color components from each endpoint, C0 and C1, are | |

initially shifted left 4 bits to become 16-bit integer values and these | |

are interpolated in the same way as LDR. The 16-bit value C is then | |

decomposed into the top five bits, E, and the bottom 11 bits M, which | |

are then processed and recombined with E to form the final value Cf: | |

C = floor( (C0*(64-i) + C1*i + 32)/64 ) | |

E = (C&0xF800) >> 11; M = C&0x7FF; | |

if (M < 512) { Mt = 3*M; } | |

else if (M >= 1536) { Mt = 5*M - 2048; } | |

else { Mt = 4*M - 512; } | |

Cf = (E<<10) + (Mt>>3) | |

This interpolation is a considerably closer approximation to a | |

logarithmic space than simple 16-bit interpolation. | |

This final value Cf is interpreted as an IEEE FP16 value. If the result | |

is +Inf or NaN, it is converted to the bit pattern 0x7BFF, which is the | |

largest representable finite value. | |
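The HDR path can be sketched in C as follows. This is an illustrative sketch rather than normative text: the function name interpolate_hdr_fp16 is hypothetical, and c0 and c1 are assumed to be the 12-bit pseudo-logarithmic endpoint components. The return value is the FP16 bit pattern.

```c
#include <stdint.h>

/* Interpolate two 12-bit HDR endpoint components with effective weight i
 * (0..64) and return the resulting FP16 bit pattern. */
uint16_t interpolate_hdr_fp16(uint16_t c0, uint16_t c1, int i)
{
    uint32_t e0 = (uint32_t)c0 << 4;   /* 12 bits -> 16 bits */
    uint32_t e1 = (uint32_t)c1 << 4;
    uint32_t C = (e0 * (uint32_t)(64 - i) + e1 * (uint32_t)i + 32) / 64;
    uint32_t E = (C & 0xF800) >> 11;   /* top five bits */
    uint32_t M = C & 0x7FF;            /* bottom 11 bits */
    uint32_t Mt;
    if (M < 512)        { Mt = 3 * M; }
    else if (M >= 1536) { Mt = 5 * M - 2048; }
    else                { Mt = 4 * M - 512; }
    uint32_t Cf = (E << 10) + (Mt >> 3);
    /* An all-ones exponent means +Inf or NaN: clamp to largest finite. */
    if ((Cf & 0x7C00) == 0x7C00) { Cf = 0x7BFF; }
    return (uint16_t)Cf;
}
```

For example, two endpoints of 0x800 give C = 0x8000, so E = 16 and M = 0, and the result is 0x4000, which is 2.0 in FP16.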

C.2.20 Dual-Plane Decoding | |

--------------------------- | |

If dual-plane mode is disabled, all of the endpoint components are | |

interpolated using the same weight value. | |

If dual-plane mode is enabled, two weights are stored with each texel. | |

One component is then selected to use the second weight for interpolation, | |

instead of the first weight. The first weight is then used for all other | |

components. | |

The component to treat specially is indicated using the 2-bit Color | |

Component Selector (CCS) field as follows: | |

Value Weight 0 Weight 1 | |

-------------------------- | |

0 GBA R | |

1 RBA G | |

2 RGA B | |

3 RGB A | |

-------------------------- | |

Table C.2.28 - Dual Plane Color Component Selector Values | |

The CCS bits are stored at a variable position directly below the weight | |

bits and any additional CEM bits. | |
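The selection rule in Table C.2.28 amounts to a single comparison. A minimal sketch, assuming components are numbered 0 = R, 1 = G, 2 = B, 3 = A (the function name weight_for_component and this numbering are illustrative only):

```c
/* Pick the weight used to interpolate one component.
 * ccs is the 2-bit Color Component Selector; w0 and w1 are the first and
 * second weights stored for the texel. */
int weight_for_component(int ccs, int component, int dual_plane, int w0, int w1)
{
    if (dual_plane && component == (ccs & 3)) {
        return w1;   /* the selected component uses the second weight */
    }
    return w0;       /* all other components use the first weight */
}
```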

C.2.21 Partition Pattern Generation | |

------------------------------------ | |

When multiple partitions are active, each texel position is assigned a | |

partition index. This partition index is calculated using a seed (the | |

partition pattern index), the texel's x,y,z position within the block, | |

and the number of partitions. An additional argument, small_block, is | |

set to 1 if the number of texels in the block is less than 31, | |

otherwise it is set to 0. | |

This function is specified in terms of x, y and z in order to support | |

3D textures. For 2D textures and texture slices, z will always be 0. | |
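The small_block argument can be derived from the block dimensions. A minimal sketch, assuming a hypothetical helper name is_small_block and a depth of 1 for 2D blocks:

```c
/* Returns 1 if the block has fewer than 31 texels, otherwise 0. */
int is_small_block(int width, int height, int depth)
{
    return (width * height * depth) < 31;
}
```

Under this rule a 6x5 block (30 texels) counts as small, while a 6x6 block (36 texels) does not.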

The full partition selection algorithm is as follows: | |

int select_partition(int seed, int x, int y, int z, | |

int partitioncount, int small_block) | |

{ | |

if( small_block ){ x <<= 1; y <<= 1; z <<= 1; } | |

seed += (partitioncount-1) * 1024; | |

uint32_t rnum = hash52(seed); | |

uint8_t seed1 = rnum & 0xF; | |

uint8_t seed2 = (rnum >> 4) & 0xF; | |

uint8_t seed3 = (rnum >> 8) & 0xF; | |

uint8_t seed4 = (rnum >> 12) & 0xF; | |

uint8_t seed5 = (rnum >> 16) & 0xF; | |

uint8_t seed6 = (rnum >> 20) & 0xF; | |

uint8_t seed7 = (rnum >> 24) & 0xF; | |

uint8_t seed8 = (rnum >> 28) & 0xF; | |

uint8_t seed9 = (rnum >> 18) & 0xF; | |

uint8_t seed10 = (rnum >> 22) & 0xF; | |

uint8_t seed11 = (rnum >> 26) & 0xF; | |

uint8_t seed12 = ((rnum >> 30) | (rnum << 2)) & 0xF; | |

seed1 *= seed1; seed2 *= seed2; | |

seed3 *= seed3; seed4 *= seed4; | |

seed5 *= seed5; seed6 *= seed6; | |

seed7 *= seed7; seed8 *= seed8; | |

seed9 *= seed9; seed10 *= seed10; | |

seed11 *= seed11; seed12 *= seed12; | |

int sh1, sh2, sh3; | |

if( seed & 1 ) | |

{ sh1 = (seed&2 ? 4:5); sh2 = (partitioncount==3 ? 6:5); } | |

else | |

{ sh1 = (partitioncount==3 ? 6:5); sh2 = (seed&2 ? 4:5); } | |

sh3 = (seed & 0x10) ? sh1 : sh2; | |

seed1 >>= sh1; seed2 >>= sh2; seed3 >>= sh1; seed4 >>= sh2; | |

seed5 >>= sh1; seed6 >>= sh2; seed7 >>= sh1; seed8 >>= sh2; | |

seed9 >>= sh3; seed10 >>= sh3; seed11 >>= sh3; seed12 >>= sh3; | |

int a = seed1*x + seed2*y + seed11*z + (rnum >> 14); | |

int b = seed3*x + seed4*y + seed12*z + (rnum >> 10); | |

int c = seed5*x + seed6*y + seed9 *z + (rnum >> 6); | |

int d = seed7*x + seed8*y + seed10*z + (rnum >> 2); | |

a &= 0x3F; b &= 0x3F; c &= 0x3F; d &= 0x3F; | |

if( partitioncount < 4 ) d = 0; | |

if( partitioncount < 3 ) c = 0; | |

if( a >= b && a >= c && a >= d ) return 0; | |

else if( b >= c && b >= d ) return 1; | |

else if( c >= d ) return 2; | |

else return 3; | |

} | |

As has been observed before, the bit selections are much easier to | |

express in hardware than in C. | |

The seed is expanded using a hash function hash52, which is defined as | |

follows: | |

uint32_t hash52( uint32_t p ) | |

{ | |

p ^= p >> 15; p -= p << 17; p += p << 7; p += p << 4; | |

p ^= p >> 5; p += p << 16; p ^= p >> 7; p ^= p >> 3; | |

p ^= p << 6; p ^= p >> 17; | |

return p; | |

} | |

This assumes that all operations act on 32-bit values. | |

C.2.22 Data Size Determination | |

------------------------------- | |

The size of the data used to represent color endpoints is not | |

explicitly specified. Instead, it is determined from the block mode and | |

number of partitions as follows: | |

config_bits = 17; | |

if(num_partitions>1) { | |

    if(single_CEM) | |

        config_bits = 29; | |

    else | |

        config_bits = 25 + 3*num_partitions; | |

} | |

num_weights = M * N * Q; // size of weight grid | |

if(dual_plane) { | |

    config_bits += 2; | |

    num_weights *= 2; | |

} | |

weight_bits = ceil(num_weights*8*trits_in_weight_range/5) + | |

ceil(num_weights*7*quints_in_weight_range/3) + | |

num_weights*bits_in_weight_range; | |

remaining_bits = 128 - config_bits - weight_bits; | |

num_CEM_pairs = base_CEM_class+1 + count_bits(extra_CEM_bits); | |

The CEM value range is then looked up from a table indexed by remaining | |

bits and num_CEM_pairs. This table is initialized such that the range | |

is as large as possible, consistent with the constraint that the number | |

of bits required to encode num_CEM_pairs pairs of values is not more | |

than the number of remaining bits. | |

An equivalent iterative algorithm would be: | |

num_CEM_values = num_CEM_pairs*2; | |

for(range = each possible CEM range in descending order of size) | |

{ | |

CEM_bits = ceil(num_CEM_values*8*trits_in_CEM_range/5) + | |

ceil(num_CEM_values*7*quints_in_CEM_range/3) + | |

num_CEM_values*bits_in_CEM_range; | |

if(CEM_bits <= remaining_bits) | |

break; | |

} | |

return range; | |

In cases where this procedure results in unallocated bits, these bits | |

are not read by the decoding process and can have any value. | |
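The iterative search can be sketched in runnable C as follows. This is a sketch under stated assumptions: the names IseRange, cem_ranges, ise_bits and largest_cem_range are hypothetical, and the table of ranges (maximum value, trits, quints, bits) is an assumed transcription of the range table defined elsewhere in this appendix, restricted to ranges of 0..5 and larger. Returning -1 when nothing fits is also an illustrative choice.

```c
#include <stddef.h>

typedef struct { int max_value; int trits; int quints; int bits; } IseRange;

/* Candidate CEM ranges, largest first (assumed transcription). */
static const IseRange cem_ranges[] = {
    {255, 0, 0, 8}, {191, 1, 0, 6}, {159, 0, 1, 5}, {127, 0, 0, 7},
    { 95, 1, 0, 5}, { 79, 0, 1, 4}, { 63, 0, 0, 6}, { 47, 1, 0, 4},
    { 39, 0, 1, 3}, { 31, 0, 0, 5}, { 23, 1, 0, 3}, { 19, 0, 1, 2},
    { 15, 0, 0, 4}, { 11, 1, 0, 2}, {  9, 0, 1, 1}, {  7, 0, 0, 3},
    {  5, 1, 0, 1},
};

/* Bits needed to store n values: ceil(8n*trits/5) + ceil(7n*quints/3) + n*bits. */
int ise_bits(int n, int trits, int quints, int bits)
{
    return (8 * n * trits + 4) / 5 + (7 * n * quints + 2) / 3 + n * bits;
}

/* Largest range (returned as its maximum value) whose encoding fits in
 * remaining_bits, or -1 if even 0..5 does not fit. */
int largest_cem_range(int num_CEM_values, int remaining_bits)
{
    for (size_t k = 0; k < sizeof cem_ranges / sizeof cem_ranges[0]; k++) {
        const IseRange *r = &cem_ranges[k];
        if (ise_bits(num_CEM_values, r->trits, r->quints, r->bits) <= remaining_bits) {
            return r->max_value;
        }
    }
    return -1;
}
```

For instance, two endpoint values with 15 remaining bits cannot use 0..255 (16 bits) or 0..191 (16 bits), but 0..159 needs exactly 15 bits and is selected.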

C.2.23 Void-Extent Blocks | |

-------------------------- | |

A void-extent block is a block encoded with a single color. It also | |

specifies some additional information about the extent of the single- | |

color area beyond this block, which can optionally be used by a | |

decoder to reduce or prevent redundant block fetches. | |

The layout of a 2D Void-Extent block is as follows: | |

127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 | |

--------------------------------------------------------------- | |

| Block color A component | | |

--------------------------------------------------------------- | |

111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 | |

---------------------------------------------------------------- | |

| Block color B component | | |

---------------------------------------------------------------- | |

95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 | |

---------------------------------------------------------------- | |

| Block color G component | | |

---------------------------------------------------------------- | |

79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 | |

---------------------------------------------------------------- | |

| Block color R component | | |

---------------------------------------------------------------- | |

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 | |

---------------------------------------------------------------- | |

| Void-extent maximum T coordinate | Min T | | |

---------------------------------------------------------------- | |

47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 | |

---------------------------------------------------------------- | |

Void-extent minimum T coordinate | Void-extent max S | | |

---------------------------------------------------------------- | |

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 | |

---------------------------------------------------------------- | |

Void-extent max S coord | Void-extent minimum S coordinate | | |

---------------------------------------------------------------- | |

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | |

---------------------------------------------------------------- | |

Min S coord | 1 | 1 | D | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | | |

---------------------------------------------------------------- | |

------------------------------------------------- | |

Figure C.7 - 2D Void-Extent Block Layout Overview | |

Bit 9 is the Dynamic Range flag, which indicates the format in which | |

colors are stored. A 0 value indicates LDR, in which case the color | |

components are stored as UNORM16 values. A 1 indicates HDR, in which | |

case the color components are stored as FP16 values. | |

UNORM16 values are stored in the LDR case because the value may need | |

to be passed on to sRGB conversion. By storing the color value in the | |

format which comes out of the interpolator, before the conversion to | |

FP16, we avoid having to have separate versions for sRGB and linear | |

modes. | |

If a void-extent block with HDR values is decoded in LDR mode, then | |

the result will be the error color, opaque magenta, for all texels | |

within the block. | |

In the HDR case, if the color component values are infinity or NaN, this | |

will result in undefined behavior. As usual, this must not lead to GL | |

interruption or termination. | |

Bits 10 and 11 are reserved and must be 1. | |

The minimum and maximum coordinate values are treated as unsigned | |

integers and then normalized into the range 0..1 (by dividing by 2^13-1 | |

or 2^9-1, for 2D and 3D respectively). The maximum values for each | |

dimension must be greater than the corresponding minimum values, | |

unless they are all all-1s. | |

If all the coordinates are all-1s, then the void extent is ignored, | |

and the block is simply a constant-color block. | |
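The field layout in Figure C.7 and the rules above can be sketched as a small C parser. This is an illustration under stated assumptions: the struct VoidExtent2D and the function parse_void_extent_2d are hypothetical, and the 128-bit block is assumed to be supplied as two 64-bit words, lo holding bits 0..63 and hi holding bits 64..127. The function returns 1 for a well-formed 2D void-extent block and 0 otherwise.

```c
#include <stdint.h>

typedef struct {
    int is_hdr;       /* Dynamic Range flag (bit 9) */
    int has_extent;   /* 0 if all coordinates are all-1s */
    uint16_t min_s, max_s, min_t, max_t;   /* 13-bit coordinates */
    uint16_t r, g, b, a;                   /* UNORM16 or FP16 bits */
} VoidExtent2D;

int parse_void_extent_2d(uint64_t lo, uint64_t hi, VoidExtent2D *out)
{
    if ((lo & 0x1FF) != 0x1FC) return 0;      /* bits 8..0 must be 111111100 */
    if (((lo >> 10) & 0x3) != 0x3) return 0;  /* reserved bits 10 and 11 */
    out->is_hdr = (int)((lo >> 9) & 1);
    out->min_s = (uint16_t)((lo >> 12) & 0x1FFF);
    out->max_s = (uint16_t)((lo >> 25) & 0x1FFF);
    out->min_t = (uint16_t)((lo >> 38) & 0x1FFF);
    out->max_t = (uint16_t)((lo >> 51) & 0x1FFF);
    out->r = (uint16_t)(hi & 0xFFFF);
    out->g = (uint16_t)((hi >> 16) & 0xFFFF);
    out->b = (uint16_t)((hi >> 32) & 0xFFFF);
    out->a = (uint16_t)((hi >> 48) & 0xFFFF);
    int all_ones = out->min_s == 0x1FFF && out->max_s == 0x1FFF &&
                   out->min_t == 0x1FFF && out->max_t == 0x1FFF;
    out->has_extent = !all_ones;
    /* Unless all coordinates are all-1s, each max must exceed its min. */
    if (!all_ones && (out->min_s >= out->max_s || out->min_t >= out->max_t)) {
        return 0;
    }
    return 1;
}
```

A decoder that uses the extent would then normalize each coordinate by dividing it by 2^13-1, as described above.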

The existence of single-color blocks with void extents must not produce | |

results different from those obtained if these single-color blocks are | |

defined without void-extents. Any situation in which the results would | |

differ is invalid. Results from invalid void extents are undefined. | |

If a void-extent appears in a MIPmap level other than the most detailed | |

one, then the extent will apply to all of the more detailed levels too. | |

This allows decoders to avoid sampling more detailed MIPmaps. | |

If the more detailed MIPmap level is not a constant color in this region, | |

then the block may be marked as constant color, but without a void extent, | |

as detailed above. | |

If a void-extent extends to the edge of a texture, then filtered texture | |

colors may not be the same color as that specified in the block, due to | |

texture border colors, wrapping, or cube face wrapping. | |

Care must be taken when updating or extracting partial image data that | |

void-extents in the image do not become invalid. | |

C.2.24 Illegal Encodings | |

------------------------- | |

In ASTC, there is a variety of ways to encode an illegal block. Decoders | |

are required to recognize all illegal blocks and emit the standard error | |

color value upon encountering an illegal block. | |

Here is a comprehensive list of situations that represent illegal block | |

encodings: | |

* The block mode specified is one of the modes explicitly listed | |

as Reserved. | |

* A 2D void-extent block that has any of the reserved bits not | |

set to 1. | |

* A block mode has been specified that would require more than | |

64 weights total. | |

* A block mode has been specified that would require more than | |

96 bits for integer sequence encoding of the weight grid. | |

* A block mode has been specified that would require fewer than | |

24 bits for integer sequence encoding of the weight grid. | |

* The size of the weight grid exceeds the size of the block footprint | |

in any dimension. | |

* Color endpoint modes have been specified such that the color | |

integer sequence encoding would require more than 18 integers. | |

* The number of bits available for color endpoint encoding after all | |

the other fields have been counted is less than ceil(13C/5) where C | |

is the number of color endpoint integers (this would restrict color | |

integers to a range smaller than 0..5, which is not supported). | |

* Dual weight mode is enabled for a block with 4 partitions. | |

* Void-Extent blocks where the low coordinate for some texture axis | |

is greater than or equal to the high coordinate. | |

Note also that, in LDR mode, a block which has both HDR and LDR endpoint | |

modes assigned to different partitions is not an error block. Only those | |

texels which belong to the HDR partition will result in the error color. | |

Texels belonging to an LDR partition will be decoded as normal. | |
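Several of the numeric conditions in the list above can be combined into a single predicate. The sketch below is illustrative and deliberately partial: the function name block_config_is_legal and its parameters are hypothetical, the caller is assumed to have already decoded the block mode and computed the bit counts, and the checks for reserved block modes and void-extent blocks are not included.

```c
/* Returns 1 if the configuration passes the numeric legality checks,
 * 0 if it must decode to the error color. */
int block_config_is_legal(int block_w, int block_h,
                          int grid_w, int grid_h,
                          int dual_plane, int num_partitions,
                          int weight_bits,
                          int num_color_ints, int color_bits)
{
    int num_weights = grid_w * grid_h * (dual_plane ? 2 : 1);
    if (num_weights > 64) return 0;                       /* too many weights */
    if (weight_bits > 96 || weight_bits < 24) return 0;   /* weight grid size */
    if (grid_w > block_w || grid_h > block_h) return 0;   /* grid exceeds block */
    if (num_color_ints > 18) return 0;                    /* too many integers */
    /* Fewer than ceil(13C/5) bits would force a range below 0..5. */
    if (color_bits < (13 * num_color_ints + 4) / 5) return 0;
    if (dual_plane && num_partitions == 4) return 0;      /* dual plane, 4 parts */
    return 1;
}
```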

C.2.25 LDR Profile Support | |

--------------------------- | |

Implementations of the LDR Profile must satisfy the following requirements: | |

* All textures with valid encodings for LDR Profile must decode | |

identically using either an LDR Profile, HDR Profile, or Full Profile | |

decoder. | |

* All features included only in the HDR Profile or Full Profile must be | |

treated as reserved in the LDR Profile, and return the error color on | |

decoding. | |

* Any sequence of API calls valid for the LDR Profile must also be valid | |

for the HDR Profile or Full Profile and return identical results when | |

given a texture encoded for the LDR Profile. | |

The feature subset for the LDR profile is: | |

* 2D textures only, including 2D, 2D array, cube map face, | |

and cube map array texture targets. | |

* Only those block sizes listed in Table C.2.2 are supported. | |

* LDR operation mode only. | |

* Only LDR endpoint formats must be supported, namely formats | |

0, 1, 4, 5, 6, 8, 9, 10, 12, 13. | |

* Decoding from an HDR endpoint results in the error color. | |

* Interpolation returns UNORM8 results when used in conjunction | |

with sRGB. | |

* LDR void extent blocks must be supported, but void extents | |

may not be checked." | |

If only the LDR profile is supported, read this extension by striking | |

all descriptions of HDR modes and decoding algorithms. The extension | |

documents how to modify the document for some particularly tricky cases, | |

but the general rule is as described in this paragraph. | |

Interactions with immutable-format texture images | |

ASTC texture formats are supported by immutable-format textures only if | |

such textures are supported by the underlying implementation (e.g. | |

OpenGL 4.1 or later, OpenGL ES 3.0 or later, or earlier versions | |

supporting the GL_EXT_texture_storage extension). Otherwise, remove all | |

references to the Tex*Storage* commands from this specification. | |

Interactions with texture cube map arrays | |

ASTC textures are supported for the TEXTURE_CUBE_MAP_ARRAY target only | |

when cube map arrays are supported by the underlying implementation | |

(e.g. OpenGL 4.0 or later, or an OpenGL or OpenGL ES version supporting | |

an extension defining cube map arrays). Otherwise, remove all references | |

to texture cube map arrays from this specification. | |

Interactions with OpenGL (all versions) | |

ASTC is not supported for 1D textures and texture rectangles, and does | |

not support non-zero borders. | |

Add the following error conditions to CompressedTexImage*D: | |

"An INVALID_ENUM error is generated by CompressedTexImage1D if | |

<internalformat> is one of the ASTC formats. | |

An INVALID_OPERATION error is generated by CompressedTexImage2D | |

and CompressedTexImage3D if <internalformat> is one of the ASTC | |

formats and <border> is non-zero." | |

Add the following error conditions to CompressedTexSubImage*D: | |

"An INVALID_ENUM error is generated by CompressedTex*SubImage1D | |

if the internal format of the texture is one of the ASTC formats. | |

An INVALID_OPERATION error is generated by CompressedTex*SubImage2D | |

if the internal format of the texture is one of the ASTC formats | |

and <border> is non-zero." | |

Add the following error conditions to TexStorage1D and TextureStorage1D: | |

"An INVALID_ENUM error is generated by TexStorage1D and TextureStorage1D | |

if <format> is one of the ASTC formats." | |

Add the following error conditions to TexStorage2D and TextureStorage2D | |

for versions of OpenGL that support texture rectangles: | |

"An INVALID_OPERATION error is generated by TexStorage2D and | |

TextureStorage2D if <format> is one of the ASTC formats and <target> | |

is TEXTURE_RECTANGLE." | |

Interactions with OpenGL 4.2 | |

OpenGL 4.2 supports the feature that compressed textures can be | |

compressed online, by passing the compressed texture format enum as | |

the internal format when uploading a texture using TexImage1D, | |

TexImage2D or TexImage3D (see Section 3.9.3, Texture Image | |

Specification, subsection Encoding of Special Internal Formats). | |

Due to the complexity of the ASTC compression algorithm, it is not | |

usually suitable for online use, and therefore ASTC support will be | |

limited to pre-compressed textures only. Where on-device compression | |

is required, a domain-specific limited compressor will typically | |

be used, and this is therefore not suitable for implementation in | |

the driver. | |

In particular, the ASTC format specifiers will not be added to | |

Table 3.14, and thus will not be accepted by the TexImage*D | |

functions, and will not be returned by the (already deprecated) | |

COMPRESSED_TEXTURE_FORMATS query. | |

Issues | |

1) Three-dimensional block ASTC formats (i.e. formats whose block depth | |

is greater than one) are not supported by these extensions. | |

2) The first release of the extension was not clear about the | |

restrictions of the LDR profile and did not document interactions | |

with cube map array textures. | |

RESOLVED. This extension has been rewritten to be based on OpenGL ES | |

3.1, to clearly document LDR restrictions, and to add cube map array | |

texture interactions. | |

Revision History | |

Revision 8, June 8, 2017 - Added missing interactions with OpenGL. | |

Revision 7, July 14, 2016 - Clarified definition of 2D void-extent | |

blocks. | |

Revision 6, March 8, 2016 - Clarified that sRGB transform is not | |

applied to Alpha channel. | |

Revision 5, September 15, 2015 - fix typo in third paragraph of section | |

8.7. | |

Revision 4, June 24, 2015 - minor cleanup from feedback. Move Issues and | |

Interactions sections to the end of the document. Merge some language | |

from OpenGL ES specification edits and rename some tables to figures, | |

due to how they're generated in the core specifications. Include a | |

description of the "Cube Map Array Texture" column added to table 3.19 | |

and expand the description of how to read this document when supporting | |

only the LDR profile (Bug 13921). | |

Revision 3, May 28, 2015 - rebase extension on OpenGL ES 3.1. Clarify | |

texture formats and targets supported by LDR and HDR profiles. Add cube | |

map array targets and an Interactions section defining when they are | |

supported. Add an Interactions section for immutable-format textures | |

(Bug 13921). | |

Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to | |

commands accepting ASTC format tokens in the New Tokens section (Bug | |

10183). |