extensions/INTEL/INTEL_shader_integer_functions2.txt - external/github.com/KhronosGroup/OpenGL-Registry - Git at Google

 Name

     INTEL_shader_integer_functions2

 Name Strings

     GL_INTEL_shader_integer_functions2

 Contact

     Ian Romanick <ian.d.romanick@intel.com>

 Contributors


 Status

     In progress

 Version

     Last Modification Date: 11/25/2019
     Revision: 5

 Number

     OpenGL Extension #547
     OpenGL ES Extension #323

 Dependencies

     This extension is written against the OpenGL 4.6 (Core Profile)
     Specification.

     This extension is written against Version 4.60 (Revision 03) of the OpenGL
     Shading Language Specification.

     GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
     is required.

     This extension interacts with ARB_gpu_shader_int64.

     This extension interacts with AMD_gpu_shader_int16.

     This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

     This extension interacts with EXT_shader_explicit_arithmetic_types.

 Overview

     OpenCL and other GPU programming environments provides a number of useful
     functions operating on integer data.  Many of these functions are
     supported by specialized instructions various GPUs.  Correct GLSL
     implementations for some of these functions are non-trivial.  Recognizing
     open-coded versions of these functions is often impractical.  As a result,
     potential performance improvements go unrealized.

     This extension makes available a number of functions that have specialized
     instruction support on Intel GPUs.

 New Procedures and Functions

     None

 New Tokens

     None

 IP Status

     No known IP claims.

 Modifications to the OpenGL Shading Language Specification, Version 4.60

     Including the following line in a shader can be used to control the
     language features described in this extension:

       #extension GL_INTEL_shader_integer_functions2 : <behavior>

     where <behavior> is as specified in section 3.3.

     New preprocessor #defines are added to the OpenGL Shading Language:

       #define GL_INTEL_shader_integer_functions2        1

 Additions to Chapter 8 of the OpenGL Shading Language Specification
 (Built-in Functions)

     Modify Section 8.8, Integer Functions

     (add a new rows after the existing "findMSB" table row, p. 161)

     genUType countLeadingZeros(genUType value)

     Returns the number of leading 0-bits, stating at the most significant bit,
     in the binary representation of value.  If value is zero, the size in bits
     of the type of value or component type of value, if value is a vector will
     be returned.


     genUType countTrailingZeros(genUType value)

     Returns the number of trailing 0-bits, stating at the least significant bit,
     in the binary representation of value.  If value is zero, the size in bits
     of the type of value or component type of value (if value is a vector) will
     be returned.


     genUType absoluteDifference(genUType x, genUType y)
     genUType absoluteDifference(genIType x, genIType y)
     genU64Type absoluteDifference(genU64Type x, genU64Type y)
     genU64Type absoluteDifference(genI64Type x, genI64Type y)
     genU16Type absoluteDifference(genU16Type x, genU16Type y)
     genU16Type absoluteDifference(genI16Type x, genI16Type y)

     Returns |x - y| clamped to the range of the return type (instead of modulo
     overflowing).  Note: the return type of each of these functions is an
     unsigned type of the same bit-size and vector element count.


     genUType addSaturate(genUType x, genUType y)
     genIType addSaturate(genIType x, genIType y)
     genU64Type addSaturate(genU64Type x, genU64Type y)
     genI64Type addSaturate(genI64Type x, genI64Type y)
     genU16Type addSaturate(genU16Type x, genU16Type y)
     genI16Type addSaturate(genI16Type x, genI16Type y)

     Returns x + y clamped to the range of the type of x (instead of modulo
     overflowing).


     genUType average(genUType x, genUType y)
     genIType average(genIType x, genIType y)
     genU64Type average(genU64Type x, genU64Type y)
     genI64Type average(genI64Type x, genI64Type y)
     genU16Type average(genU16Type x, genU16Type y)
     genI16Type average(genI16Type x, genI16Type y)

     Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.


     genUType averageRounded(genUType x, genUType y)
     genIType averageRounded(genIType x, genIType y)
     genU64Type averageRounded(genU64Type x, genU64Type y)
     genI64Type averageRounded(genI64Type x, genI64Type y)
     genU16Type averageRounded(genU16Type x, genU16Type y)
     genI16Type averageRounded(genI16Type x, genI16Type y)

     Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.


     genUType subtractSaturate(genUType x, genUType y)
     genIType subtractSaturate(genIType x, genIType y)
     genU64Type subtractSaturate(genU64Type x, genU64Type y)
     genI64Type subtractSaturate(genI64Type x, genI64Type y)
     genU16Type subtractSaturate(genU16Type x, genU16Type y)
     genI16Type subtractSaturate(genI16Type x, genI16Type y)

     Returns x - y clamped to the range of the type of x (instead of modulo
     overflowing).


     genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
     genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
     genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
     genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

     Returns x * y, where only the (possibly sign-extended) low 16-bits of y
     are used.  In cases where one of the signed operands is known to be in the
     range [-2^15, (2^15)-1] or unsigned operands is known to be in the range
     [0, (2^16)-1], this may provide a higher performance multiply.

 Interactions with OpenGL 4.6 and ARB_gl_spirv

     If OpenGL 4.6 or ARB_gl_spirv is supported, then
     SPV_INTEL_shader_integer_functions2 must also be supported.

     The IntegerFunctions2INTEL capability is available whenever the
     implementation supports INTEL_shader_integer_functions2.

 Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

     If the shader enables only INTEL_shader_integer_functions2 but not
     ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
     remove all function overloads that have either genU64Type or genI64Type
     parameters.

 Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

     If the shader enables only INTEL_shader_integer_functions2 but not
     AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
     remove all function overloads that have either genU16Type or genI16Type
     parameters.

 Issues

     1) What should this extension be called?

     RESOLVED.  There already exists a MESA_shader_integer_functions extension,
     so this is called INTEL_shader_integer_functions2 to prevent confusion.

     2) How does countLeadingZeros differ from findMSB?

     RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
     equivalent to 32-(findMSB(x)+1).  This corresponds the clz() function in
     OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.

     3) How does countTrailingZeros differ from findLSB?

     RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
     32).  This corresponds to the ctz() function in OpenCL.

     4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
     provided?

     RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
     does not have 64-bit versions of findMSB() or findLSB() even when
     ARB_gpu_shader_int64 is supported.  The instructions used to implement
     countLeadingZeros and countTrailingZeros do not natively support 64-bit
     operands.

     The implementation of 64-bit countLeadingZeros() would be 5 instructions,
     and the implementation of 64-bit countTrailingZeros() would be 7
     instructions.  Neither of these is better than an application developer
     could achieve in GLSL:

         uint countLeadingZeros(uint64_t value)
         {
             uvec2 v = unpackUint2x32(value);

             return v.y == 0
                 ? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y);
         }

         uint countTrailingZeros(uint64_t value)
         {
             uvec2 v = unpackUint2x32(value);

             return v.x == 0
                 ? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x);
         }

     5) Should 64-bit versions of the arithmetic functions be provided?

     RESOLVED: NO.  Since recent generations of Intel GPUs have removed
     hardware support for 64-bit integer arithmetic, there doesn't seem to be
     much value in providing 64-bit arithmetic functions.

     6) Should this extension include average()?

     RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
     averageRounded() corresponds to rhadd() in OpenCL.

     averageRounded() corresponds to the AVG instruction on Intel GPUs.
     average(), on the other hand, does not correspond to a single instruction.
     The signed and unsigned versions may have slightly different
     implementations depending on the specific GPU.  In the worst case, the
     implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
     1)), and in the best case it is 3 instructions.

 Revision History

     Rev  Date         Author    Changes
     ---  -----------  --------  ---------------------------------------------
       1  04-Sep-2018  idr       Initial version.
       2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
       3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
       4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
       5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.
	Name

	INTEL_shader_integer_functions2

	Name Strings

	GL_INTEL_shader_integer_functions2

	Contact

	Ian Romanick <ian.d.romanick@intel.com>

	Contributors


	Status

	In progress

	Version

	Last Modification Date: 11/25/2019
	Revision: 5

	Number

	OpenGL Extension #547
	OpenGL ES Extension #323

	Dependencies

	This extension is written against the OpenGL 4.6 (Core Profile)
	Specification.

	This extension is written against Version 4.60 (Revision 03) of the OpenGL
	Shading Language Specification.

	GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
	is required.

	This extension interacts with ARB_gpu_shader_int64.

	This extension interacts with AMD_gpu_shader_int16.

	This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

	This extension interacts with EXT_shader_explicit_arithmetic_types.

	Overview

	OpenCL and other GPU programming environments provides a number of useful
	functions operating on integer data. Many of these functions are
	supported by specialized instructions various GPUs. Correct GLSL
	implementations for some of these functions are non-trivial. Recognizing
	open-coded versions of these functions is often impractical. As a result,
	potential performance improvements go unrealized.

	This extension makes available a number of functions that have specialized
	instruction support on Intel GPUs.

	New Procedures and Functions

	None

	New Tokens

	None

	IP Status

	No known IP claims.

	Modifications to the OpenGL Shading Language Specification, Version 4.60

	Including the following line in a shader can be used to control the
	language features described in this extension:

	#extension GL_INTEL_shader_integer_functions2 : <behavior>

	where <behavior> is as specified in section 3.3.

	New preprocessor #defines are added to the OpenGL Shading Language:

	#define GL_INTEL_shader_integer_functions2 1

	Additions to Chapter 8 of the OpenGL Shading Language Specification
	(Built-in Functions)

	Modify Section 8.8, Integer Functions

	(add a new rows after the existing "findMSB" table row, p. 161)

	genUType countLeadingZeros(genUType value)

	Returns the number of leading 0-bits, stating at the most significant bit,
	in the binary representation of value. If value is zero, the size in bits
	of the type of value or component type of value, if value is a vector will
	be returned.


	genUType countTrailingZeros(genUType value)

	Returns the number of trailing 0-bits, stating at the least significant bit,
	in the binary representation of value. If value is zero, the size in bits
	of the type of value or component type of value (if value is a vector) will
	be returned.


	genUType absoluteDifference(genUType x, genUType y)
	genUType absoluteDifference(genIType x, genIType y)
	genU64Type absoluteDifference(genU64Type x, genU64Type y)
	genU64Type absoluteDifference(genI64Type x, genI64Type y)
	genU16Type absoluteDifference(genU16Type x, genU16Type y)
	genU16Type absoluteDifference(genI16Type x, genI16Type y)

	Returns \|x - y\| clamped to the range of the return type (instead of modulo
	overflowing). Note: the return type of each of these functions is an
	unsigned type of the same bit-size and vector element count.


	genUType addSaturate(genUType x, genUType y)
	genIType addSaturate(genIType x, genIType y)
	genU64Type addSaturate(genU64Type x, genU64Type y)
	genI64Type addSaturate(genI64Type x, genI64Type y)
	genU16Type addSaturate(genU16Type x, genU16Type y)
	genI16Type addSaturate(genI16Type x, genI16Type y)

	Returns x + y clamped to the range of the type of x (instead of modulo
	overflowing).


	genUType average(genUType x, genUType y)
	genIType average(genIType x, genIType y)
	genU64Type average(genU64Type x, genU64Type y)
	genI64Type average(genI64Type x, genI64Type y)
	genU16Type average(genU16Type x, genU16Type y)
	genI16Type average(genI16Type x, genI16Type y)

	Returns (x+y) >> 1. The intermediate sum does not modulo overflow.


	genUType averageRounded(genUType x, genUType y)
	genIType averageRounded(genIType x, genIType y)
	genU64Type averageRounded(genU64Type x, genU64Type y)
	genI64Type averageRounded(genI64Type x, genI64Type y)
	genU16Type averageRounded(genU16Type x, genU16Type y)
	genI16Type averageRounded(genI16Type x, genI16Type y)

	Returns (x+y+1) >> 1. The intermediate sum does not modulo overflow.


	genUType subtractSaturate(genUType x, genUType y)
	genIType subtractSaturate(genIType x, genIType y)
	genU64Type subtractSaturate(genU64Type x, genU64Type y)
	genI64Type subtractSaturate(genI64Type x, genI64Type y)
	genU16Type subtractSaturate(genU16Type x, genU16Type y)
	genI16Type subtractSaturate(genI16Type x, genI16Type y)

	Returns x - y clamped to the range of the type of x (instead of modulo
	overflowing).


	genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
	genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
	genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
	genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

	Returns x * y, where only the (possibly sign-extended) low 16-bits of y
	are used. In cases where one of the signed operands is known to be in the
	range [-2^15, (2^15)-1] or unsigned operands is known to be in the range
	[0, (2^16)-1], this may provide a higher performance multiply.

	Interactions with OpenGL 4.6 and ARB_gl_spirv

	If OpenGL 4.6 or ARB_gl_spirv is supported, then
	SPV_INTEL_shader_integer_functions2 must also be supported.

	The IntegerFunctions2INTEL capability is available whenever the
	implementation supports INTEL_shader_integer_functions2.

	Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

	If the shader enables only INTEL_shader_integer_functions2 but not
	ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
	remove all function overloads that have either genU64Type or genI64Type
	parameters.

	Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

	If the shader enables only INTEL_shader_integer_functions2 but not
	AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
	remove all function overloads that have either genU16Type or genI16Type
	parameters.

	Issues

	1) What should this extension be called?

	RESOLVED. There already exists a MESA_shader_integer_functions extension,
	so this is called INTEL_shader_integer_functions2 to prevent confusion.

	2) How does countLeadingZeros differ from findMSB?

	RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
	equivalent to 32-(findMSB(x)+1). This corresponds the clz() function in
	OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.

	3) How does countTrailingZeros differ from findLSB?

	RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
	32). This corresponds to the ctz() function in OpenCL.

	4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
	provided?

	RESOLVED: NO. OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
	does not have 64-bit versions of findMSB() or findLSB() even when
	ARB_gpu_shader_int64 is supported. The instructions used to implement
	countLeadingZeros and countTrailingZeros do not natively support 64-bit
	operands.

	The implementation of 64-bit countLeadingZeros() would be 5 instructions,
	and the implementation of 64-bit countTrailingZeros() would be 7
	instructions. Neither of these is better than an application developer
	could achieve in GLSL:

	uint countLeadingZeros(uint64_t value)
	{
	uvec2 v = unpackUint2x32(value);

	return v.y == 0
	? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y);
	}

	uint countTrailingZeros(uint64_t value)
	{
	uvec2 v = unpackUint2x32(value);

	return v.x == 0
	? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x);
	}

	5) Should 64-bit versions of the arithmetic functions be provided?

	RESOLVED: NO. Since recent generations of Intel GPUs have removed
	hardware support for 64-bit integer arithmetic, there doesn't seem to be
	much value in providing 64-bit arithmetic functions.

	6) Should this extension include average()?

	RESOLVED: YES. average() corresponds to hadd() in OpenCL, and
	averageRounded() corresponds to rhadd() in OpenCL.

	averageRounded() corresponds to the AVG instruction on Intel GPUs.
	average(), on the other hand, does not correspond to a single instruction.
	The signed and unsigned versions may have slightly different
	implementations depending on the specific GPU. In the worst case, the
	implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
	1)), and in the best case it is 3 instructions.

	Revision History

	Rev Date Author Changes
	--- ----------- -------- ---------------------------------------------
	1 04-Sep-2018 idr Initial version.
	2 19-Sep-2018 idr Add interactions with AMD_gpu_shader_int16.
	3 22-Jan-2019 idr Add interactions with EXT_shader_explicit_arithmetic_types.
	4 14-Nov-2019 idr Resolve issue #1 and issue #5.
	5 25-Nov-2019 idr Fix a bunch of typos noticed by @cmarcelo.