
Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors

Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2 1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

        Returns the number of leading 0-bits, starting at the most significant
        bit, in the binary representation of value.  If value is zero, the
        size in bits of the type of value (or of the component type of value,
        if value is a vector) will be returned.

    genUType countTrailingZeros(genUType value)

        Returns the number of trailing 0-bits, starting at the least
        significant bit, in the binary representation of value.  If value is
        zero, the size in bits of the type of value (or of the component type
        of value, if value is a vector) will be returned.
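The two definitions above can be modeled on the host for a single 32-bit component. The following is a minimal C sketch, not part of the extension; the function names count_leading_zeros32 and count_trailing_zeros32 are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model of countLeadingZeros for one 32-bit
 * component: count 0-bits from the most significant bit downward. */
static uint32_t count_leading_zeros32(uint32_t value)
{
    uint32_t n = 0;
    for (uint32_t mask = UINT32_C(1) << 31;
         mask != 0 && (value & mask) == 0;
         mask >>= 1)
        n++;
    return n; /* all 32 bits are counted when value == 0 */
}

/* Hypothetical reference model of countTrailingZeros for one 32-bit
 * component: count 0-bits from the least significant bit upward. */
static uint32_t count_trailing_zeros32(uint32_t value)
{
    uint32_t n = 0;
    for (uint32_t mask = 1;
         mask != 0 && (value & mask) == 0;
         mask <<= 1)
        n++;
    return n; /* all 32 bits are counted when value == 0 */
}
```

Both loops stop when the mask shifts out of the word, which is what yields the bit size of the type for a zero input.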

    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

        Returns |x - y| clamped to the range of the return type (instead of
        modulo overflowing).  Note: the return type of each of these functions
        is an unsigned type of the same bit-size and vector element count.
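As an illustration of why the unsigned return type matters for signed inputs, here is a minimal C sketch of the 32-bit scalar cases. It is a host-side model under the assumption of 32-bit components, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of absoluteDifference for unsigned 32-bit inputs. */
static uint32_t absolute_difference_u32(uint32_t x, uint32_t y)
{
    return x > y ? x - y : y - x;
}

/* Hypothetical model of absoluteDifference for signed 32-bit inputs.
 * The subtraction is done in unsigned arithmetic, larger minus smaller,
 * so a distance such as INT32_MAX - INT32_MIN (which does not fit in
 * int32_t) is still produced exactly as a uint32_t. */
static uint32_t absolute_difference_i32(int32_t x, int32_t y)
{
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}
```

For example, absolute_difference_i32(INT32_MAX, INT32_MIN) evaluates to 0xFFFFFFFF, which is the exact distance between the two inputs.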

    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

        Returns x + y clamped to the range of the type of x (instead of modulo
        overflowing).
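A minimal C sketch of the saturating addition described above, for 32-bit scalars only. The function names are hypothetical, and widening to 64 bits for the signed case is just one convenient way to model the clamp on the host.

```c
#include <stdint.h>

/* Hypothetical model of addSaturate for unsigned 32-bit inputs.
 * Unsigned overflow wraps, so a wrapped sum is smaller than x. */
static uint32_t add_saturate_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;
    return sum < x ? UINT32_MAX : sum;
}

/* Hypothetical model of addSaturate for signed 32-bit inputs.
 * The sum is formed in 64 bits and then clamped to the int32_t range. */
static int32_t add_saturate_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```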

    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

        Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.
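The requirement that the intermediate sum not overflow can be modeled on the host by widening. The sketch below covers 32-bit scalars; the function names are hypothetical, and the signed case assumes that >> on a negative sum means an arithmetic shift (rounding toward negative infinity), which is written here as a floor division to stay portable in C.

```c
#include <stdint.h>

/* Hypothetical model of average for unsigned 32-bit inputs:
 * the sum is formed in 64 bits, so it cannot wrap. */
static uint32_t average_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y) >> 1);
}

/* Hypothetical model of average for signed 32-bit inputs.
 * C division truncates toward zero, so negative odd sums are
 * adjusted by one to obtain the floor of sum / 2. */
static int32_t average_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;
    int64_t half = sum / 2 - ((sum % 2) < 0 ? 1 : 0);
    return (int32_t)half;
}
```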

    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

        Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.
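The rounded variant differs from average only by the +1 bias before the shift. A minimal C sketch for 32-bit scalars follows; the function names are hypothetical, and as with the plain average the signed shift is assumed to be arithmetic and is expressed as a floor division.

```c
#include <stdint.h>

/* Hypothetical model of averageRounded for unsigned 32-bit inputs. */
static uint32_t average_rounded_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y + 1u) >> 1);
}

/* Hypothetical model of averageRounded for signed 32-bit inputs.
 * The biased sum is formed in 64 bits, then halved with floor rounding. */
static int32_t average_rounded_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y + 1;
    int64_t half = sum / 2 - ((sum % 2) < 0 ? 1 : 0);
    return (int32_t)half;
}
```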

    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

        Returns x - y clamped to the range of the type of x (instead of modulo
        overflowing).
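A minimal C sketch of the saturating subtraction for 32-bit scalars, mirroring the description above. The function names are hypothetical and the 64-bit widening in the signed case is only a modeling convenience.

```c
#include <stdint.h>

/* Hypothetical model of subtractSaturate for unsigned 32-bit inputs:
 * any result that would go below zero clamps to zero. */
static uint32_t subtract_saturate_u32(uint32_t x, uint32_t y)
{
    return x > y ? x - y : 0u;
}

/* Hypothetical model of subtractSaturate for signed 32-bit inputs.
 * The difference is formed in 64 bits and clamped to the int32_t range. */
static int32_t subtract_saturate_i32(int32_t x, int32_t y)
{
    int64_t diff = (int64_t)x - (int64_t)y;
    if (diff > INT32_MAX)
        return INT32_MAX;
    if (diff < INT32_MIN)
        return INT32_MIN;
    return (int32_t)diff;
}
```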

    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

        Returns x * y, where only the (possibly sign-extended) low 16-bits of
        y are used.  In cases where one of the signed operands is known to be
        in the range [-2^15, (2^15)-1], or one of the unsigned operands is
        known to be in the range [0, (2^16)-1], this may provide a higher
        performance multiply.
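To make the "low 16 bits of y" rule concrete, here is a minimal C sketch for the 32-bit scalar overloads. The function names are hypothetical. The sketch assumes the product keeps only its low 32 bits, as ordinary GLSL integer multiplication does, and the final conversion back to int32_t assumes two's-complement wraparound as provided by mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned multiply32x16: the upper 16 bits
 * of y are ignored (zero-extension of the low half). */
static uint32_t multiply32x16_u32(uint32_t x, uint32_t y)
{
    return x * (y & UINT32_C(0xFFFF));
}

/* Hypothetical model of the signed multiply32x16: the low 16 bits of y
 * are sign-extended before the multiply. */
static int32_t multiply32x16_i32(int32_t x, int32_t y)
{
    /* Portable sign extension of a 16-bit field: flip the sign bit,
     * then subtract its weight. */
    int32_t low = (int32_t)((uint32_t)y & UINT32_C(0xFFFF));
    int32_t y16 = (low ^ 0x8000) - 0x8000;

    /* Multiply in unsigned arithmetic so overflow wraps instead of
     * being undefined; assumes two's-complement conversion back. */
    return (int32_t)((uint32_t)x * (uint32_t)y16);
}
```

When y already lies in the 16-bit range named in the text, both functions return the same value as a full 32-bit multiply.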

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

       RESOLVED: There already exists a MESA_shader_integer_functions
       extension, so this is called INTEL_shader_integer_functions2 to
       prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

       RESOLVED: countLeadingZeros is only defined for unsigned types, and it
       is equivalent to 32-(findMSB(x)+1).  This corresponds to the clz()
       function in OpenCL and the LZD (leading zero detection) instruction on
       Intel GPUs.
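The stated equivalence with findMSB can be checked with a small host-side model. This is a sketch under the assumption that findMSB on an unsigned value returns the index of the highest set bit and -1 for zero, as in GLSL; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of GLSL findMSB for an unsigned 32-bit value:
 * index of the most significant 1-bit, or -1 if value is zero. */
static int32_t find_msb_u32(uint32_t value)
{
    for (int32_t i = 31; i >= 0; i--)
        if ((value >> i) & 1u)
            return i;
    return -1;
}

/* countLeadingZeros expressed as 32 - (findMSB(x) + 1). */
static uint32_t clz_via_find_msb(uint32_t value)
{
    return (uint32_t)(32 - (find_msb_u32(value) + 1));
}
```

For a zero input, findMSB yields -1, so the expression evaluates to 32, matching the zero-input rule of countLeadingZeros.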

    3) How does countTrailingZeros differ from findLSB?

       RESOLVED: countTrailingZeros is equivalent to
       min(genUType(findLSB(x)), 32).  This corresponds to the ctz() function
       in OpenCL.
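The min() formulation relies on findLSB returning -1 for zero, which becomes the largest unsigned value after conversion and is then clamped to 32. A minimal C sketch of that reasoning, with hypothetical function names and 32-bit scalars assumed:

```c
#include <stdint.h>

/* Hypothetical model of GLSL findLSB for a 32-bit value:
 * index of the least significant 1-bit, or -1 if value is zero. */
static int32_t find_lsb_u32(uint32_t value)
{
    for (int32_t i = 0; i < 32; i++)
        if ((value >> i) & 1u)
            return i;
    return -1;
}

/* countTrailingZeros expressed as min(uint(findLSB(x)), 32). */
static uint32_t ctz_via_find_lsb(uint32_t value)
{
    /* -1 converts to 0xFFFFFFFF, which the min() then clamps to 32. */
    uint32_t lsb = (uint32_t)find_lsb_u32(value);
    return lsb < 32u ? lsb : 32u;
}
```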

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
       provided?

       RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but
       OpenGL does not have 64-bit versions of findMSB() or findLSB() even
       when ARB_gpu_shader_int64 is supported.  The instructions used to
       implement countLeadingZeros and countTrailingZeros do not natively
       support 64-bit operands.

       The implementation of 64-bit countLeadingZeros() would be 5
       instructions, and the implementation of 64-bit countTrailingZeros()
       would be 7 instructions.  Neither of these is better than an
       application developer could achieve in GLSL:

           uint countLeadingZeros(uint64_t value)
           {
               // v.x holds the low 32 bits, v.y the high 32 bits.
               uvec2 v = unpackUint2x32(value);

               return v.y == 0u
                   ? 32u + countLeadingZeros(v.x) : countLeadingZeros(v.y);
           }

           uint countTrailingZeros(uint64_t value)
           {
               // v.x holds the low 32 bits, v.y the high 32 bits.
               uvec2 v = unpackUint2x32(value);

               return v.x == 0u
                   ? 32u + countTrailingZeros(v.y) : countTrailingZeros(v.x);
           }
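The split-into-halves approach in the GLSL above can be mirrored on the host to check the edge cases, in particular that a zero input yields 64. This is a C sketch with hypothetical function names; the 32-bit helpers stand in for the built-in countLeadingZeros and countTrailingZeros.

```c
#include <stdint.h>

/* Hypothetical 32-bit helper standing in for countLeadingZeros. */
static uint32_t clz_word(uint32_t v)
{
    uint32_t n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (v & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Hypothetical 32-bit helper standing in for countTrailingZeros. */
static uint32_t ctz_word(uint32_t v)
{
    uint32_t n = 0;
    for (uint32_t mask = 1; mask != 0 && (v & mask) == 0; mask <<= 1)
        n++;
    return n;
}

/* 64-bit count built from two 32-bit halves, as in the GLSL listing. */
static uint32_t count_leading_zeros64(uint64_t value)
{
    uint32_t lo = (uint32_t)value;         /* v.x in the GLSL version */
    uint32_t hi = (uint32_t)(value >> 32); /* v.y in the GLSL version */
    return hi == 0 ? 32u + clz_word(lo) : clz_word(hi);
}

static uint32_t count_trailing_zeros64(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    return lo == 0 ? 32u + ctz_word(hi) : ctz_word(lo);
}
```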

    5) Should 64-bit versions of the arithmetic functions be provided?

       RESOLVED: NO.  Since recent generations of Intel GPUs have removed
       hardware support for 64-bit integer arithmetic, there doesn't seem to
       be much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

       RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
       averageRounded() corresponds to rhadd() in OpenCL.

       averageRounded() corresponds to the AVG instruction on Intel GPUs.
       average(), on the other hand, does not correspond to a single
       instruction.  The signed and unsigned versions may have slightly
       different implementations depending on the specific GPU.  In the worst
       case, the implementation is 4 instructions (e.g., averageRounded(x, y)
       - ((x ^ y) & 1)), and in the best case it is 3 instructions.
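The four-instruction form quoted above works because the rounded average exceeds the truncated one by exactly one when x + y is odd, and x + y is odd exactly when the low bits of x and y differ. A minimal C sketch of that identity for unsigned 32-bit scalars, with hypothetical function names:

```c
#include <stdint.h>

/* Hypothetical reference for averageRounded, formed in 64 bits
 * so the intermediate sum cannot wrap. */
static uint32_t average_rounded_ref_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y + 1u) >> 1);
}

/* average() derived from averageRounded(): subtract the rounding bit,
 * which is set only when x and y differ in their lowest bit. */
static uint32_t average_from_rounded_u32(uint32_t x, uint32_t y)
{
    return average_rounded_ref_u32(x, y) - ((x ^ y) & 1u);
}
```

The derived function counts as one averaging operation plus an XOR, an AND, and a subtract, which matches the four-instruction worst case described in the text.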

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with
                                EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.