| Name |
| |
| INTEL_shader_integer_functions2 |
| |
| Name Strings |
| |
| GL_INTEL_shader_integer_functions2 |
| |
| Contact |
| |
| Ian Romanick <ian.d.romanick@intel.com> |
| |
| Contributors |
| |
| |
| Status |
| |
| In progress |
| |
| Version |
| |
| Last Modification Date: 11/25/2019 |
| Revision: 5 |
| |
| Number |
| |
| OpenGL Extension #547 |
| OpenGL ES Extension #323 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 4.6 (Core Profile) |
| Specification. |
| |
| This extension is written against Version 4.60 (Revision 03) of the OpenGL |
| Shading Language Specification. |
| |
| GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL) |
| is required. |
| |
| This extension interacts with ARB_gpu_shader_int64. |
| |
| This extension interacts with AMD_gpu_shader_int16. |
| |
| This extension interacts with OpenGL 4.6 and ARB_gl_spirv. |
| |
| This extension interacts with EXT_shader_explicit_arithmetic_types. |
| |
| Overview |
| |
| OpenCL and other GPU programming environments provides a number of useful |
| functions operating on integer data. Many of these functions are |
| supported by specialized instructions various GPUs. Correct GLSL |
| implementations for some of these functions are non-trivial. Recognizing |
| open-coded versions of these functions is often impractical. As a result, |
| potential performance improvements go unrealized. |
| |
| This extension makes available a number of functions that have specialized |
| instruction support on Intel GPUs. |
| |
| New Procedures and Functions |
| |
| None |
| |
| New Tokens |
| |
| None |
| |
| IP Status |
| |
| No known IP claims. |
| |
| Modifications to the OpenGL Shading Language Specification, Version 4.60 |
| |
| Including the following line in a shader can be used to control the |
| language features described in this extension: |
| |
| #extension GL_INTEL_shader_integer_functions2 : <behavior> |
| |
| where <behavior> is as specified in section 3.3. |
| |
| New preprocessor #defines are added to the OpenGL Shading Language: |
| |
| #define GL_INTEL_shader_integer_functions2 1 |
| |
| Additions to Chapter 8 of the OpenGL Shading Language Specification |
| (Built-in Functions) |
| |
| Modify Section 8.8, Integer Functions |
| |
| (add a new rows after the existing "findMSB" table row, p. 161) |
| |
| genUType countLeadingZeros(genUType value) |
| |
| Returns the number of leading 0-bits, stating at the most significant bit, |
| in the binary representation of value. If value is zero, the size in bits |
| of the type of value or component type of value, if value is a vector will |
| be returned. |
| |
| |
| genUType countTrailingZeros(genUType value) |
| |
| Returns the number of trailing 0-bits, stating at the least significant bit, |
| in the binary representation of value. If value is zero, the size in bits |
| of the type of value or component type of value (if value is a vector) will |
| be returned. |
| |
| |
| genUType absoluteDifference(genUType x, genUType y) |
| genUType absoluteDifference(genIType x, genIType y) |
| genU64Type absoluteDifference(genU64Type x, genU64Type y) |
| genU64Type absoluteDifference(genI64Type x, genI64Type y) |
| genU16Type absoluteDifference(genU16Type x, genU16Type y) |
| genU16Type absoluteDifference(genI16Type x, genI16Type y) |
| |
| Returns |x - y| clamped to the range of the return type (instead of modulo |
| overflowing). Note: the return type of each of these functions is an |
| unsigned type of the same bit-size and vector element count. |
| |
| |
| genUType addSaturate(genUType x, genUType y) |
| genIType addSaturate(genIType x, genIType y) |
| genU64Type addSaturate(genU64Type x, genU64Type y) |
| genI64Type addSaturate(genI64Type x, genI64Type y) |
| genU16Type addSaturate(genU16Type x, genU16Type y) |
| genI16Type addSaturate(genI16Type x, genI16Type y) |
| |
| Returns x + y clamped to the range of the type of x (instead of modulo |
| overflowing). |
| |
| |
| genUType average(genUType x, genUType y) |
| genIType average(genIType x, genIType y) |
| genU64Type average(genU64Type x, genU64Type y) |
| genI64Type average(genI64Type x, genI64Type y) |
| genU16Type average(genU16Type x, genU16Type y) |
| genI16Type average(genI16Type x, genI16Type y) |
| |
| Returns (x+y) >> 1. The intermediate sum does not modulo overflow. |
| |
| |
| genUType averageRounded(genUType x, genUType y) |
| genIType averageRounded(genIType x, genIType y) |
| genU64Type averageRounded(genU64Type x, genU64Type y) |
| genI64Type averageRounded(genI64Type x, genI64Type y) |
| genU16Type averageRounded(genU16Type x, genU16Type y) |
| genI16Type averageRounded(genI16Type x, genI16Type y) |
| |
| Returns (x+y+1) >> 1. The intermediate sum does not modulo overflow. |
| |
| |
| genUType subtractSaturate(genUType x, genUType y) |
| genIType subtractSaturate(genIType x, genIType y) |
| genU64Type subtractSaturate(genU64Type x, genU64Type y) |
| genI64Type subtractSaturate(genI64Type x, genI64Type y) |
| genU16Type subtractSaturate(genU16Type x, genU16Type y) |
| genI16Type subtractSaturate(genI16Type x, genI16Type y) |
| |
| Returns x - y clamped to the range of the type of x (instead of modulo |
| overflowing). |
| |
| |
| genUType multiply32x16(genUType x_32_bits, genUType y_16_bits) |
| genIType multiply32x16(genIType x_32_bits, genIType y_16_bits) |
| genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits) |
| genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits) |
| |
| Returns x * y, where only the (possibly sign-extended) low 16-bits of y |
| are used. In cases where one of the signed operands is known to be in the |
| range [-2^15, (2^15)-1] or unsigned operands is known to be in the range |
| [0, (2^16)-1], this may provide a higher performance multiply. |
| |
| Interactions with OpenGL 4.6 and ARB_gl_spirv |
| |
| If OpenGL 4.6 or ARB_gl_spirv is supported, then |
| SPV_INTEL_shader_integer_functions2 must also be supported. |
| |
| The IntegerFunctions2INTEL capability is available whenever the |
| implementation supports INTEL_shader_integer_functions2. |
| |
| Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64 |
| |
| If the shader enables only INTEL_shader_integer_functions2 but not |
| ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64, |
| remove all function overloads that have either genU64Type or genI64Type |
| parameters. |
| |
| Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16 |
| |
| If the shader enables only INTEL_shader_integer_functions2 but not |
| AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16, |
| remove all function overloads that have either genU16Type or genI16Type |
| parameters. |
| |
| Issues |
| |
| 1) What should this extension be called? |
| |
| RESOLVED. There already exists a MESA_shader_integer_functions extension, |
| so this is called INTEL_shader_integer_functions2 to prevent confusion. |
| |
| 2) How does countLeadingZeros differ from findMSB? |
| |
| RESOLVED: countLeadingZeros is only defined for unsigned types, and it is |
| equivalent to 32-(findMSB(x)+1). This corresponds the clz() function in |
| OpenCL and the LZD (leading zero detection) instruction on Intel GPUs. |
| |
| 3) How does countTrailingZeros differ from findLSB? |
| |
| RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)), |
| 32). This corresponds to the ctz() function in OpenCL. |
| |
| 4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be |
| provided? |
| |
| RESOLVED: NO. OpenCL has 64-bit versions of clz() and ctz(), but OpenGL |
| does not have 64-bit versions of findMSB() or findLSB() even when |
| ARB_gpu_shader_int64 is supported. The instructions used to implement |
| countLeadingZeros and countTrailingZeros do not natively support 64-bit |
| operands. |
| |
| The implementation of 64-bit countLeadingZeros() would be 5 instructions, |
| and the implementation of 64-bit countTrailingZeros() would be 7 |
| instructions. Neither of these is better than an application developer |
| could achieve in GLSL: |
| |
| uint countLeadingZeros(uint64_t value) |
| { |
| uvec2 v = unpackUint2x32(value); |
| |
| return v.y == 0 |
| ? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y); |
| } |
| |
| uint countTrailingZeros(uint64_t value) |
| { |
| uvec2 v = unpackUint2x32(value); |
| |
| return v.x == 0 |
| ? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x); |
| } |
| |
| 5) Should 64-bit versions of the arithmetic functions be provided? |
| |
| RESOLVED: NO. Since recent generations of Intel GPUs have removed |
| hardware support for 64-bit integer arithmetic, there doesn't seem to be |
| much value in providing 64-bit arithmetic functions. |
| |
| 6) Should this extension include average()? |
| |
| RESOLVED: YES. average() corresponds to hadd() in OpenCL, and |
| averageRounded() corresponds to rhadd() in OpenCL. |
| |
| averageRounded() corresponds to the AVG instruction on Intel GPUs. |
| average(), on the other hand, does not correspond to a single instruction. |
| The signed and unsigned versions may have slightly different |
| implementations depending on the specific GPU. In the worst case, the |
| implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) & |
| 1)), and in the best case it is 3 instructions. |
| |
| Revision History |
| |
| Rev Date Author Changes |
| --- ----------- -------- --------------------------------------------- |
| 1 04-Sep-2018 idr Initial version. |
| 2 19-Sep-2018 idr Add interactions with AMD_gpu_shader_int16. |
| 3 22-Jan-2019 idr Add interactions with EXT_shader_explicit_arithmetic_types. |
| 4 14-Nov-2019 idr Resolve issue #1 and issue #5. |
| 5 25-Nov-2019 idr Fix a bunch of typos noticed by @cmarcelo. |