Intel shader integer functions2 (#331)

* Initial version of GL_INTEL_shader_integer_functions2

* INTEL_shader_integer_functions2: Add interactions with EXT_shader_explicit_arithmetic_types

* INTEL_shader_integer_functions2: Resolve last two issues

The name is pretty well set in stone now, and there's little to no value
in providing 64-bit arithmetic functions.

* INTEL_shader_integer_functions2: Add extension number

v2: Run 'make' as instructed by README.adoc.

* INTEL_shader_integer_functions2: Fix a bunch of typos noticed by @cmarcelo
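
Reviewer note: the block below is a minimal C sketch of the scalar 32-bit
semantics that the specification text in this change describes. It is not
taken from any driver or from the extension itself. All helper names
(count_leading_zeros, add_saturate_u, and so on) are hypothetical, and
widening to 64 bits is only one assumed way to keep the intermediate
results from wrapping.

```c
#include <stdint.h>

/* Hypothetical reference helpers modelling the 32-bit scalar overloads.
 * The names are illustrative only and are not part of any real API. */

/* Leading 0-bits counted from the most significant bit; 32 for zero. */
static uint32_t count_leading_zeros(uint32_t v)
{
    uint32_t n = 0;
    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Trailing 0-bits counted from the least significant bit; 32 for zero. */
static uint32_t count_trailing_zeros(uint32_t v)
{
    uint32_t n = 0;
    if (v == 0)
        return 32;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* |x - y| for signed inputs; the result always fits the unsigned type. */
static uint32_t absolute_difference_i(int32_t x, int32_t y)
{
    int64_t d = (int64_t)x - (int64_t)y;   /* cannot overflow in 64 bits */
    return (uint32_t)(d < 0 ? -d : d);
}

/* x + y clamped to [0, UINT32_MAX] instead of wrapping. */
static uint32_t add_saturate_u(uint32_t x, uint32_t y)
{
    uint32_t s = x + y;                    /* wraps on overflow */
    return s < x ? UINT32_MAX : s;
}

/* x + y clamped to [INT32_MIN, INT32_MAX] instead of wrapping. */
static int32_t add_saturate_i(int32_t x, int32_t y)
{
    int64_t s = (int64_t)x + (int64_t)y;
    if (s > INT32_MAX)
        return INT32_MAX;
    if (s < INT32_MIN)
        return INT32_MIN;
    return (int32_t)s;
}

/* x - y clamped at zero for unsigned inputs. */
static uint32_t subtract_saturate_u(uint32_t x, uint32_t y)
{
    return x > y ? x - y : 0;
}

/* (x + y) >> 1 with a sum that is wide enough not to wrap. */
static uint32_t average_u(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y) >> 1);
}

/* (x + y + 1) >> 1 with a sum that is wide enough not to wrap. */
static uint32_t average_rounded_u(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y + 1) >> 1);
}
```

Under these assumptions, average_u(x, y) equals
average_rounded_u(x, y) - ((x ^ y) & 1), which matches the worst-case
sequence mentioned in issue 6 of the specification.
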
diff --git a/extensions/INTEL/INTEL_shader_integer_functions2.txt b/extensions/INTEL/INTEL_shader_integer_functions2.txt
new file mode 100644
index 0000000..81538e1
--- /dev/null
+++ b/extensions/INTEL/INTEL_shader_integer_functions2.txt
@@ -0,0 +1,269 @@
+Name
+
+    INTEL_shader_integer_functions2
+
+Name Strings
+
+    GL_INTEL_shader_integer_functions2
+
+Contact
+
+    Ian Romanick <ian.d.romanick@intel.com>
+
+Contributors
+
+
+Status
+
+    In progress
+
+Version
+
+    Last Modification Date: 11/25/2019
+    Revision: 5
+
+Number
+
+    OpenGL Extension #547
+    OpenGL ES Extension #323
+
+Dependencies
+
+    This extension is written against the OpenGL 4.6 (Core Profile)
+    Specification.
+
+    This extension is written against Version 4.60 (Revision 03) of the OpenGL
+    Shading Language Specification.
+
+    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
+    is required.
+
+    This extension interacts with ARB_gpu_shader_int64.
+
+    This extension interacts with AMD_gpu_shader_int16.
+
+    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.
+
+    This extension interacts with EXT_shader_explicit_arithmetic_types.
+
+Overview
+
+    OpenCL and other GPU programming environments provide a number of useful
+    functions operating on integer data.  Many of these functions are
+    supported by specialized instructions on various GPUs.  Correct GLSL
+    implementations for some of these functions are non-trivial.  Recognizing
+    open-coded versions of these functions is often impractical.  As a result,
+    potential performance improvements go unrealized.
+
+    This extension makes available a number of functions that have specialized
+    instruction support on Intel GPUs.
+
+New Procedures and Functions
+
+    None
+
+New Tokens
+
+    None
+
+IP Status
+
+    No known IP claims.
+
+Modifications to the OpenGL Shading Language Specification, Version 4.60
+
+    Including the following line in a shader can be used to control the
+    language features described in this extension:
+
+      #extension GL_INTEL_shader_integer_functions2 : <behavior>
+
+    where <behavior> is as specified in section 3.3.
+
+    New preprocessor #defines are added to the OpenGL Shading Language:
+
+      #define GL_INTEL_shader_integer_functions2        1
+
+Additions to Chapter 8 of the OpenGL Shading Language Specification
+(Built-in Functions)
+
+    Modify Section 8.8, Integer Functions
+
+    (add new rows after the existing "findMSB" table row, p. 161)
+
+    genUType countLeadingZeros(genUType value)
+
+    Returns the number of leading 0-bits, starting at the most significant bit,
+    in the binary representation of value.  If value is zero, the size in bits
+    of the type of value or component type of value (if value is a vector) will
+    be returned.
+
+
+    genUType countTrailingZeros(genUType value)
+
+    Returns the number of trailing 0-bits, starting at the least significant bit,
+    in the binary representation of value.  If value is zero, the size in bits
+    of the type of value or component type of value (if value is a vector) will
+    be returned.
+
+
+    genUType absoluteDifference(genUType x, genUType y)
+    genUType absoluteDifference(genIType x, genIType y)
+    genU64Type absoluteDifference(genU64Type x, genU64Type y)
+    genU64Type absoluteDifference(genI64Type x, genI64Type y)
+    genU16Type absoluteDifference(genU16Type x, genU16Type y)
+    genU16Type absoluteDifference(genI16Type x, genI16Type y)
+
+    Returns |x - y| clamped to the range of the return type (instead of modulo
+    overflowing).  Note: the return type of each of these functions is an
+    unsigned type of the same bit-size and vector element count.
+
+
+    genUType addSaturate(genUType x, genUType y)
+    genIType addSaturate(genIType x, genIType y)
+    genU64Type addSaturate(genU64Type x, genU64Type y)
+    genI64Type addSaturate(genI64Type x, genI64Type y)
+    genU16Type addSaturate(genU16Type x, genU16Type y)
+    genI16Type addSaturate(genI16Type x, genI16Type y)
+
+    Returns x + y clamped to the range of the type of x (instead of modulo
+    overflowing).
+
+
+    genUType average(genUType x, genUType y)
+    genIType average(genIType x, genIType y)
+    genU64Type average(genU64Type x, genU64Type y)
+    genI64Type average(genI64Type x, genI64Type y)
+    genU16Type average(genU16Type x, genU16Type y)
+    genI16Type average(genI16Type x, genI16Type y)
+
+    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.
+
+
+    genUType averageRounded(genUType x, genUType y)
+    genIType averageRounded(genIType x, genIType y)
+    genU64Type averageRounded(genU64Type x, genU64Type y)
+    genI64Type averageRounded(genI64Type x, genI64Type y)
+    genU16Type averageRounded(genU16Type x, genU16Type y)
+    genI16Type averageRounded(genI16Type x, genI16Type y)
+
+    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.
+
+
+    genUType subtractSaturate(genUType x, genUType y)
+    genIType subtractSaturate(genIType x, genIType y)
+    genU64Type subtractSaturate(genU64Type x, genU64Type y)
+    genI64Type subtractSaturate(genI64Type x, genI64Type y)
+    genU16Type subtractSaturate(genU16Type x, genU16Type y)
+    genI16Type subtractSaturate(genI16Type x, genI16Type y)
+
+    Returns x - y clamped to the range of the type of x (instead of modulo
+    overflowing).
+
+
+    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
+    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
+    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
+    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)
+
+    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
+    are used.  In cases where one of the signed operands is known to be in the
+    range [-2^15, (2^15)-1], or one of the unsigned operands is known to be in
+    the range [0, (2^16)-1], this may provide a higher-performance multiply.
+
+Interactions with OpenGL 4.6 and ARB_gl_spirv
+
+    If OpenGL 4.6 or ARB_gl_spirv is supported, then
+    SPV_INTEL_shader_integer_functions2 must also be supported.
+
+    The IntegerFunctions2INTEL capability is available whenever the
+    implementation supports INTEL_shader_integer_functions2.
+
+Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64
+
+    If the shader enables INTEL_shader_integer_functions2 but neither
+    ARB_gpu_shader_int64 nor EXT_shader_explicit_arithmetic_types_int64,
+    remove all function overloads that have either genU64Type or genI64Type
+    parameters.
+
+Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16
+
+    If the shader enables INTEL_shader_integer_functions2 but neither
+    AMD_gpu_shader_int16 nor EXT_shader_explicit_arithmetic_types_int16,
+    remove all function overloads that have either genU16Type or genI16Type
+    parameters.
+
+Issues
+
+    1) What should this extension be called?
+
+    RESOLVED: There already exists a MESA_shader_integer_functions extension,
+    so this is called INTEL_shader_integer_functions2 to prevent confusion.
+
+    2) How does countLeadingZeros differ from findMSB?
+
+    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
+    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function
+    in OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.
+
+    3) How does countTrailingZeros differ from findLSB?
+
+    RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
+    32).  This corresponds to the ctz() function in OpenCL.
+
+    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
+    provided?
+
+    RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
+    does not have 64-bit versions of findMSB() or findLSB() even when
+    ARB_gpu_shader_int64 is supported.  The instructions used to implement
+    countLeadingZeros and countTrailingZeros do not natively support 64-bit
+    operands.
+
+    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
+    and the implementation of 64-bit countTrailingZeros() would be 7
+    instructions.  Neither of these is better than an application developer
+    could achieve in GLSL:
+
+        uint countLeadingZeros(uint64_t value)
+        {
+            uvec2 v = unpackUint2x32(value);
+
+            return v.y == 0
+                ? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y);
+        }
+
+        uint countTrailingZeros(uint64_t value)
+        {
+            uvec2 v = unpackUint2x32(value);
+
+            return v.x == 0
+                ? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x);
+        }
+
+    5) Should 64-bit versions of the arithmetic functions be provided?
+
+    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
+    hardware support for 64-bit integer arithmetic, there doesn't seem to be
+    much value in providing 64-bit arithmetic functions.
+
+    6) Should this extension include average()?
+
+    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
+    averageRounded() corresponds to rhadd() in OpenCL.
+
+    averageRounded() corresponds to the AVG instruction on Intel GPUs.
+    average(), on the other hand, does not correspond to a single instruction.
+    The signed and unsigned versions may have slightly different
+    implementations depending on the specific GPU.  In the worst case, the
+    implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
+    1)), and in the best case it is 3 instructions.
+
+Revision History
+
+    Rev  Date         Author    Changes
+    ---  -----------  --------  ---------------------------------------------
+      1  04-Sep-2018  idr       Initial version.
+      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
+      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
+      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
+      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.
diff --git a/extensions/esext.php b/extensions/esext.php
index 7230491..5a47607 100644
--- a/extensions/esext.php
+++ b/extensions/esext.php
@@ -669,4 +669,6 @@
 </li>
 <li value=322><a href="extensions/NV/NV_shader_subgroup_partitioned.txt">GL_NV_shader_subgroup_partitioned</a>
 </li>
+<li value=323><a href="extensions/INTEL/INTEL_shader_integer_functions2.txt">GL_INTEL_shader_integer_functions2</a>
+</li>
 </ol>
diff --git a/extensions/glext.php b/extensions/glext.php
index 8848c66..7110dba 100644
--- a/extensions/glext.php
+++ b/extensions/glext.php
@@ -1031,4 +1031,6 @@
 </li>
 <li value=546><a href="extensions/EXT/EXT_EGL_sync.txt">GL_EXT_EGL_sync</a>
 </li>
+<li value=547><a href="extensions/INTEL/INTEL_shader_integer_functions2.txt">GL_INTEL_shader_integer_functions2</a>
+</li>
 </ol>
diff --git a/extensions/registry.py b/extensions/registry.py
index 966b3a6..6ee4e26 100644
--- a/extensions/registry.py
+++ b/extensions/registry.py
@@ -2885,6 +2885,12 @@
         'flags' : { 'public' },
         'url' : 'extensions/INTEL/INTEL_performance_query.txt',
     },
+    'GL_INTEL_shader_integer_functions2' : {
+        'number' : 547,
+        'esnumber' : 323,
+        'flags' : { 'public' },
+        'url' : 'extensions/INTEL/INTEL_shader_integer_functions2.txt',
+    },
     'GLX_INTEL_swap_event' : {
         'number' : 384,
         'flags' : { 'public' },