 Name

     ARB_gpu_shader5

 Name Strings

     GL_ARB_gpu_shader5

 Contact

     Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

 Contributors

     Barthold Lichtenbelt, NVIDIA
     Bill Licea-Kane, AMD
     Bruce Merry, ARM
     Chris Dodd, NVIDIA
     Eric Werness, NVIDIA
     Graham Sellers, AMD
     Greg Roth, NVIDIA
     Jeff Bolz, NVIDIA
     Nick Haemel, AMD
     Pierre Boudier, AMD
     Piers Daniell, NVIDIA

 Notice

     Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
         http://www.khronos.org/registry/speccopyright.html

 Specification Update Policy

     Khronos-approved extension specifications are updated in response to
     issues and bugs prioritized by the Khronos OpenGL Working Group. For
     extensions which have been promoted to a core Specification, fixes will
     first appear in the latest version of that core Specification, and will
     eventually be backported to the extension document. This policy is
     described in more detail at
         https://www.khronos.org/registry/OpenGL/docs/update_policy.php

 Status

     Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
     Approved by the Khronos Board of Promoters on March 10, 2010.

 Version

     Version 16, March 30, 2012

 Number

     ARB Extension #88

 Dependencies

     This extension is written against the OpenGL 3.2 (Compatibility Profile)
     Specification.

     This extension is written against Version 1.50 (Revision 09) of the OpenGL
     Shading Language Specification.

     OpenGL 3.2 and GLSL 1.50 are required.

     This extension interacts with ARB_gpu_shader_fp64.

     This extension interacts with NV_gpu_shader5.

     This extension interacts with ARB_sample_shading.

     This extension interacts with ARB_texture_gather.

 Overview

     This extension provides a set of new features to the OpenGL Shading
     Language and related APIs to support capabilities of new GPUs, extending
     the capabilities of version 1.50 of the OpenGL Shading Language.  Shaders
     using the new functionality provided by this extension should enable this
     functionality via the construct

       #extension GL_ARB_gpu_shader5 : require     (or enable)

     This extension provides a variety of new features for all shader types,
     including:

       * support for indexing into arrays of samplers using non-constant
         indices, as long as the index doesn't diverge if multiple shader
         invocations are run in lockstep;

       * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow
         shaders to index into an array of uniform blocks;

       * support for implicitly converting signed integer types to unsigned
         types, as well as more general implicit conversion and function
         overloading infrastructure to support new data types introduced by
         other extensions;

       * a "precise" qualifier allowing computations to be carried out exactly
         as specified in the shader source to avoid optimization-induced
         invariance issues (which might cause cracking in tessellation);

       * new built-in functions supporting:

         * fused floating-point multiply-add operations;

         * splitting a floating-point number into a significand and exponent
           (frexp), or building a floating-point number from a significand and
           exponent (ldexp);

         * integer bitfield manipulation, including functions to find the
           position of the most or least significant set bit, count the number
           of one bits, and bitfield insertion, extraction, and reversal;

         * packing and unpacking vectors of small fixed-point data types into a
           larger scalar; and

         * converting floating-point values to or from their integer bit
           encodings;

       * extending the textureGather() built-in functions provided by
         ARB_texture_gather:

         * allowing shaders to select any single component of a multi-component
           texture to produce the gathered 2x2 footprint;

         * allowing shaders to perform a per-sample depth comparison when
           gathering the 2x2 footprint for shadow sampler types;

         * allowing shaders to use arbitrary offsets computed at run-time to
           select a 2x2 footprint to gather from; and

         * allowing shaders to use separate independent offsets for each of the
           four texels returned, instead of requiring a fixed 2x2 footprint.

     This extension also provides some new capabilities for individual
     shader types, including:

       * support for instanced geometry shaders, where a geometry shader may be
         run multiple times for each primitive, including a built-in
         gl_InvocationID to identify the invocation number;

       * support for emitting vertices in a geometry shader where each vertex
         emitted may be directed independently at a specified vertex stream (as
         provided by ARB_transform_feedback3), and where each shader output is
         associated with a stream;

       * support for reading a mask of covered samples in a fragment shader;
         and

       * support for interpolating a fragment shader input at a programmable
         offset relative to the pixel center, a programmable sample number, or
         at the centroid.

 IP Status

     No known IP claims.

 New Procedures and Functions

     None

 New Tokens

     Accepted by the <pname> parameter of GetProgramiv:

         GEOMETRY_SHADER_INVOCATIONS                     0x887F

     Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
     GetDoublev, and GetInteger64v:

         MAX_GEOMETRY_SHADER_INVOCATIONS                 0x8E5A
         MIN_FRAGMENT_INTERPOLATION_OFFSET               0x8E5B
         MAX_FRAGMENT_INTERPOLATION_OFFSET               0x8E5C
         FRAGMENT_INTERPOLATION_OFFSET_BITS              0x8E5D
         MAX_VERTEX_STREAMS                              0x8E71

     (note:  MAX_GEOMETRY_SHADER_INVOCATIONS,
      MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and
      FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding
      "NV" enums from NV_gpu_program5.  MAX_VERTEX_STREAMS is also defined in
      ARB_transform_feedback3.)


 Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
 (OpenGL Operation)

     Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121

     (add two unnumbered subsections after "Texture Access", p. 122)

     Instanced Geometry Shaders

     For each input primitive received by the geometry shader pipeline stage,
     the geometry shader may be run once or multiple times.  The number of
     times a geometry shader should be executed for each input primitive may be
     specified using a layout qualifier in a geometry shader of a linked
     program.  If the invocation count is not specified in any layout
     qualifier, the invocation count will be one.

     Each separate geometry shader invocation is assigned a unique invocation
     number.  For a geometry shader with <N> invocations, each input primitive
     spawns <N> invocations, numbered 0 through <N>-1.  The built-in input
     variable gl_InvocationID may be used by a geometry shader invocation to
     determine its invocation number.

     When executing instanced geometry shaders, the output primitives generated
     from each input primitive are passed to subsequent pipeline stages using
     the shader invocation number to order the output.  The first primitives
     received by the subsequent pipeline stages are those emitted by the shader
     invocation numbered zero, followed by those from the shader invocation
     numbered one, and so forth.  Additionally, all output primitives generated
     from a given input primitive are passed to subsequent pipeline stages
     before any output primitives generated from subsequent input primitives.
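
     The ordering rule above can be modeled on the CPU.  The following C
     sketch is illustrative only; GsBatch and gs_output_order are
     hypothetical names and are not part of this extension.  It lists the
     order in which later pipeline stages would receive the output of each
     (input primitive, invocation) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Identifies the output of one geometry shader invocation (hypothetical). */
typedef struct {
    int primitive;   /* index of the input primitive */
    int invocation;  /* value gl_InvocationID had for this invocation */
} GsBatch;

/* Writes the delivery order into `out` and returns the number of entries. */
static size_t gs_output_order(int num_primitives, int num_invocations,
                              GsBatch *out, size_t capacity)
{
    size_t n = 0;
    assert(out != 0 && num_primitives >= 0 && num_invocations >= 1);
    /* All output for one input primitive precedes output for the next. */
    for (int p = 0; p < num_primitives; ++p) {
        /* Within a primitive, invocation 0 comes first, then 1, and so on. */
        for (int i = 0; i < num_invocations; ++i) {
            if (n == capacity)
                return n;   /* caller's buffer is full; stop early */
            out[n].primitive = p;
            out[n].invocation = i;
            ++n;
        }
    }
    return n;
}
```

     With two input primitives and three invocations, the sketch yields
     (0,0), (0,1), (0,2), (1,0), (1,1), (1,2).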


     Geometry Shader Vertex Streams

     Geometry shaders may emit primitives to multiple independent vertex
     streams.  Each vertex emitted by the geometry shader is directed at one of
     the vertex streams.  As vertices are received on each stream, they are
     arranged into primitives of the type specified by the geometry shader
     output primitive type.  The shading language built-in functions
     EndPrimitive() and EndStreamPrimitive() may be used to end the primitive
     being assembled on a given vertex stream and start a new empty primitive
     of the same type.  If an implementation supports <N> vertex streams, the
     individual streams are numbered 0 through <N>-1.  There is no requirement
     on the order of the streams to which vertices are emitted, and the number
     of vertices emitted to each stream may be completely independent, subject
     only to implementation-dependent output limits.

     The primitives emitted to all vertex streams are passed to the transform
     feedback stage to be captured and written to buffer objects in the manner
     specified by the transform feedback state.  The primitives emitted to all
     streams but stream zero are discarded after transform feedback.
     Primitives emitted to stream zero are passed to subsequent pipeline stages
     for clipping, rasterization, and subsequent fragment processing.

     Geometry shaders that emit vertices to multiple vertex streams are
     currently limited to using only the "points" output primitive type.  A
     program will fail to link if it includes a geometry shader that calls the
     EmitStreamVertex() built-in function and has any other output primitive
     type parameter.
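
     As a rough illustration, the two rules in this subsection (only stream
     zero is rasterized, and multi-stream shaders must output points) can be
     written as predicates.  This is a minimal C sketch; the type and
     function names are hypothetical and do not correspond to any GL entry
     point.

```c
#include <assert.h>
#include <stdbool.h>

/* Geometry shader output primitive types (hypothetical enum). */
typedef enum { OUT_POINTS, OUT_LINE_STRIP, OUT_TRIANGLE_STRIP } GsOutputType;

/* Link rule: a shader calling EmitStreamVertex() must output points. */
static bool gs_streams_link_ok(GsOutputType type, bool calls_emit_stream_vertex)
{
    return !calls_emit_stream_vertex || type == OUT_POINTS;
}

/* Only stream zero continues to clipping and rasterization; the other
   streams are discarded after transform feedback. */
static bool stream_reaches_rasterizer(int stream)
{
    assert(stream >= 0);
    return stream == 0;
}
```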


 Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
 (Rasterization)

     Modify Section 3.3.1, Multisampling, p. 148

     (add new paragraph at the end of the section, p. 149)

     If MULTISAMPLE is enabled and the current program object includes a
     fragment shader with one or more input variables qualified with "sample
     in", the data associated with those variables will be assigned
     independently.  The values for each sample must be evaluated at the
     location of the sample.  The data associated with any other variables not
     qualified with "sample in" need not be evaluated independently for each
     sample.


     Modify ARB_texture_gather, "Changes to Section 3.8.8"

     (extend language describing the operation of textureGather, allowing the
      new <comp> argument to select any of the four components from a
      multi-component texel vector)

     The textureGather and textureGatherOffset built-in shader functions...  A
     four-component vector is then assembled by taking a single component from
     the swizzled texture source colors of the four texels, in the order
     T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0.  The selected component is
     identified by the optional <comp> argument, where the values zero, one,
     two, and three identify the Rs, Gs, Bs, or As component, respectively.  If
     <comp> is omitted, it is treated as identifying the Rs component.
     Incomplete textures (section 3.8.10) are considered to return a texture
     source color of (0,0,0,1) for all four source texels.
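
     The component selection and texel ordering described above can be
     sketched in C as follows.  Texel, Footprint, and gather_component are
     hypothetical names; the sketch assumes the 2x2 footprint has already
     been located and swizzled, and models only the assembly of the result
     vector.

```c
#include <assert.h>

/* Swizzled texture source color: c[0..3] = Rs, Gs, Bs, As. */
typedef struct { float c[4]; } Texel;

/* 2x2 footprint indexed as t[j][i], so t[0][0] is T_i0_j0 and t[1][1] is
   T_i1_j1. */
typedef struct { Texel t[2][2]; } Footprint;

/* Assembles the gathered vector for component `comp` (0..3). */
static void gather_component(const Footprint *fp, int comp, float result[4])
{
    assert(fp != 0 && comp >= 0 && comp <= 3);
    result[0] = fp->t[1][0].c[comp];   /* T_i0_j1 */
    result[1] = fp->t[1][1].c[comp];   /* T_i1_j1 */
    result[2] = fp->t[0][1].c[comp];   /* T_i1_j0 */
    result[3] = fp->t[0][0].c[comp];   /* T_i0_j0 */
}
```

     Passing zero for comp reproduces the behavior when the <comp> argument
     is omitted.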

     (add further language describing textureGatherOffsets)

     The textureGatherOffsets built-in functions from the OpenGL Shading
     Language return a vector derived from sampling four texels in the image
     array of level <level_base>.  For each of the four texel offsets specified
     by the <offsets> argument, the rules for the LINEAR minification filter
     are applied to identify a 2x2 texel footprint, from which the single texel
     T_i0_j0 is selected.  A four-component vector is then assembled by taking
     a single component from each of the four T_i0_j0 texels in the same manner
     as for the textureGather function.


     Modify Section 3.12.1, Shader Variables, p. 273

     (insert prior to the last paragraph of the section, p. 274)

     When interpolating built-in and user-defined varying variables, the default
     screen-space location at which these variables are sampled is defined in
     previous rasterization sections.  The default location may be overridden by
     interpolation qualifiers.  When interpolating variables declared using
     "centroid in", the variable is sampled at a location within the pixel
     covered by the primitive generating the fragment.  When interpolating
     variables declared using "sample in" when MULTISAMPLE is enabled, the
     fragment shader will be invoked separately for each covered sample and the
     variable will be sampled at the corresponding sample point.

     Additionally, built-in fragment shader functions provide further
     fine-grained control over interpolation.  The built-in functions
     interpolateAtCentroid() and interpolateAtSample() will sample variables as
     though they were declared with the "centroid" or "sample" qualifiers,
     respectively.  The built-in function interpolateAtOffset() will sample
     variables at a specified (x,y) offset relative to the center of the pixel.
     The range and granularity of offsets supported by this function are
     implementation-dependent.  If either component of the specified offset is
     less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than
     MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the
     variable is undefined.  Not all values of <offset> may be supported; x and
     y offsets may be rounded to fixed-point values with the number of fraction
     bits given by the implementation-dependent constant
     FRAGMENT_INTERPOLATION_OFFSET_BITS.
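
     One way an implementation might treat a single offset component is
     sketched below in C.  The function name is hypothetical, and rounding
     to the nearest fixed-point step is an assumption; the text above says
     only that offsets may be rounded.  The range and fraction-bit
     parameters stand in for the values of MIN_FRAGMENT_INTERPOLATION_OFFSET,
     MAX_FRAGMENT_INTERPOLATION_OFFSET, and
     FRAGMENT_INTERPOLATION_OFFSET_BITS.

```c
#include <assert.h>
#include <stdbool.h>

/* Snaps one offset component to a fixed-point grid with `fraction_bits`
   fraction bits.  Returns false when the offset is outside the supported
   range, in which case the interpolation position is undefined. */
static bool snap_interpolation_offset(float offset, float min_offset,
                                      float max_offset, int fraction_bits,
                                      float *snapped)
{
    assert(snapped != 0 && fraction_bits >= 0 && fraction_bits < 24);
    if (offset < min_offset || offset > max_offset)
        return false;

    float scale = (float)(1L << fraction_bits);
    float scaled = offset * scale;
    /* Assumption: round half away from zero. */
    long steps = (long)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    *snapped = (float)steps / scale;
    return true;
}
```

     For illustration, with a range of [-0.5, 0.5] and 4 fraction bits, an
     offset of 0.3 snaps to 5/16 = 0.3125, and an offset of 0.75 is
     rejected as out of range.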


     Modify Section 3.12.2, Shader Execution, p. 274

     (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)

     The built-in variable gl_SampleMaskIn[] is an integer array holding
     bitfields indicating the set of fragment samples covered by the primitive
     corresponding to the fragment shader invocation.  The number of elements
     in the array is ceil(<s>/32), where <s> is the maximum number of color
     samples supported by the implementation.  Bit <n> of element <w> in the
     array is set if and only if the sample numbered <w>*32+<n> is considered
     covered for this fragment shader invocation.  When rendering to a
     non-multisample buffer, or if multisample rasterization is disabled, all
     bits are zero except for bit zero of the first array element.  That bit
     will be one if the pixel is covered and zero otherwise.  Bits in the
     sample mask corresponding to covered samples that will be killed due to
     SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3).  When
     per-sample shading is active due to the use of a fragment input qualified
     by "sample", only the bit for the current sample is set in
     gl_SampleMaskIn.  When OpenGL API state specifies multiple fragment shader
     invocations for a given fragment, the sample mask for any single fragment
     shader invocation may specify a subset of the covered samples for the
     fragment.  In this case, the bit corresponding to each covered sample will
     be set in exactly one fragment shader invocation.
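
     The array sizing and bit layout of gl_SampleMaskIn[] can be expressed
     as a short C sketch.  The helper names are hypothetical; the array
     passed in stands in for the built-in variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of array elements: ceil(s / 32) for s supported color samples. */
static unsigned sample_mask_words(unsigned max_samples)
{
    return (max_samples + 31u) / 32u;
}

/* Tests bit n of element w, where sample = w * 32 + n. */
static bool sample_is_covered(const int *sample_mask_in, unsigned sample)
{
    assert(sample_mask_in != 0);
    unsigned w = sample / 32u;
    unsigned n = sample % 32u;
    return ((((unsigned)sample_mask_in[w]) >> n) & 1u) != 0u;
}
```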


 Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
 (Per-Fragment Operations and the Frame Buffer)

     None.

 Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
 (Special Functions)

     None.

 Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
 (State and State Requests)

     Modify Section 6.1.16, Shader and Program Queries, p. 384

     (add to long first paragraph, p. 386) ... If <pname> is
     GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per
     primitive will be returned.  If GEOMETRY_VERTICES_OUT,
     GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS
     are queried for a program which has not been linked successfully, or which
     does not contain objects to form a geometry shader, then an
     INVALID_OPERATION error is generated.


 Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
 Specification (Invariance)

     None.

 Additions to the AGL/GLX/WGL Specifications

     None.

 Modifications to The OpenGL Shading Language Specification, Version 1.50
 (Revision 09)

     Including the following line in a shader can be used to control the
     language features described in this extension:

       #extension GL_ARB_gpu_shader5 : <behavior>

     where <behavior> is as specified in section 3.3.

     New preprocessor #defines are added to the OpenGL Shading Language:

       #define GL_ARB_gpu_shader5        1


     Modify Section 3.6, Keywords, p. 14

     (add to the keyword list)

       sample


     Modify Section 4.1.7, Samplers, p. 23

     (modify 1st paragraph of the section, deleting the restriction requiring
     constant indexing of sampler arrays but still requiring uniform indexing
     across invocations) ... Samplers may be aggregated into arrays within a
     shader (using square brackets [ ]) and can be indexed with general integer
     expressions.  The results of accessing a sampler array with an
     out-of-bounds index are undefined. ...

     (add new paragraph restricting the use of general integer expression in
     sampler array indexing) When indexing an array of samplers, the integer
     expression used to index the array must be uniform across shader
     invocations.  If this restriction is not satisfied, the results of
     accessing the sampler array are undefined.  For the purposes of this
     uniformity test, the index used for texture lookups performed inside a
     loop is considered uniform for the <n>th loop iteration if all shader
     invocations that execute the loop at least <n> times compute the same
     index on that iteration.  For texture lookups inside a function other than
     main(), an index is considered uniform if the value is the same for all
     invocations calling the function from the same point in the caller.  For
     nested loops and function calls, the uniformity test requires that the
     index match only those other shader invocations with identical loop
     iteration counts and function call chains.
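
     The loop rule can be checked mechanically if the indices computed by
     every invocation are known.  The C sketch below is a hypothetical
     offline model, not something a shader or the GL can evaluate; it
     covers a single non-nested loop only.

```c
#include <assert.h>
#include <stdbool.h>

/* idx[inv * max_iter + n] is the sampler index that invocation `inv`
   computes on loop iteration n; trip[inv] is the number of iterations that
   invocation executes.  Returns true if every iteration's index is uniform
   across the invocations that reach it. */
static bool loop_indices_uniform(const int *idx, const int *trip,
                                 int num_inv, int max_iter)
{
    assert(idx != 0 && trip != 0 && num_inv >= 0 && max_iter >= 0);
    for (int n = 0; n < max_iter; ++n) {
        bool have_ref = false;
        int ref = 0;
        for (int inv = 0; inv < num_inv; ++inv) {
            if (trip[inv] <= n)
                continue;   /* this invocation never reaches iteration n */
            int v = idx[inv * max_iter + n];
            if (!have_ref) {
                ref = v;
                have_ref = true;
            } else if (v != ref) {
                return false;
            }
        }
    }
    return true;
}
```

     Entries for iterations an invocation does not execute are ignored, so
     invocations with different trip counts do not by themselves violate
     the restriction.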


     Modify Section 4.1.10, Implicit Conversions, p. 27

     (modify table of implicit conversions)

                                 Can be implicitly
         Type of expression        converted to
         ---------------------   -----------------
         int                     uint, float
         ivec2                   uvec2, vec2
         ivec3                   uvec3, vec3
         ivec4                   uvec4, vec4

         uint                    float
         uvec2                   vec2
         uvec3                   vec3
         uvec4                   vec4

     (modify second paragraph of the section) No implicit conversions are
     provided to convert from unsigned to signed integer types or from
     floating-point to integer types.  There are no implicit array or structure
     conversions.

     (insert before the final paragraph of the section) When performing
     implicit conversion for binary operators, there may be multiple data types
     to which the two operands can be converted.  For example, when adding an
     int value to a uint value, both values can be implicitly converted to uint
     and float.  In such cases, a floating-point type is chosen if either
     operand has a floating-point type.  Otherwise, an unsigned integer type is
     chosen if either operand has an unsigned integer type.  Otherwise, a
     signed integer type is chosen.
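
     The conversion table and the selection rule for binary operators can
     be summarized for scalar types with the following C sketch.
     ScalarKind and both function names are hypothetical; vector types
     follow the same pattern component-wise and are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { KIND_INT, KIND_UINT, KIND_FLOAT } ScalarKind;

/* Mirrors the table: int -> uint or float, uint -> float, nothing else. */
static bool can_implicitly_convert(ScalarKind from, ScalarKind to)
{
    if (from == to)
        return true;   /* no conversion needed */
    switch (from) {
    case KIND_INT:   return to == KIND_UINT || to == KIND_FLOAT;
    case KIND_UINT:  return to == KIND_FLOAT;
    case KIND_FLOAT: return false;
    }
    assert(!"unknown ScalarKind");
    return false;
}

/* Type both operands are converted to: float wins, then uint, then int. */
static ScalarKind binary_operand_kind(ScalarKind a, ScalarKind b)
{
    if (a == KIND_FLOAT || b == KIND_FLOAT)
        return KIND_FLOAT;
    if (a == KIND_UINT || b == KIND_UINT)
        return KIND_UINT;
    return KIND_INT;
}
```

     For the int plus uint case mentioned above, binary_operand_kind
     selects the unsigned integer type.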


     Modify Section 4.3, Storage Qualifiers, p. 29

     (add to first table on the page)

       Qualifier         Meaning
       --------------    ----------------------------------------
       sample in         linkage with per-sample interpolation
       sample out        linkage with per-sample interpolation

     (modify third paragraph, p. 29) These interpolation qualifiers may only
     precede the qualifiers in, centroid in, sample in, out, centroid out, or
     sample out in a declaration.  ...


     Modify Section 4.3.4, Inputs, p. 31

     (modify first paragraph of section) Shader input variables are declared
     with the in, centroid in, or sample in storage qualifiers. ... Variables
     declared as in, centroid in, or sample in may not be written to during
     shader execution. ...

     (modify third paragraph, p. 32) ...  Fragment shader inputs get
     per-fragment values, typically interpolated from a previous stage's
     outputs.  They are declared in fragment shaders with the in, centroid in,
     or sample in storage qualifiers or the deprecated varying and centroid
     varying storage qualifiers. ...

     (add to examples immediately below)

       sample in vec4 perSampleColor;


     Modify Section 4.3.6, Outputs, p. 33

     (modify first paragraph of section) Shader output variables are declared
     with the out, centroid out, or sample out storage qualifiers. ...

     (modify third paragraph of section) Vertex and geometry output variables
     output per-vertex data and are declared using the out, centroid out, or
     sample out storage qualifiers, or the deprecated varying storage
     qualifier.

     (add to examples immediately below)

       sample out vec4 perSampleColor;

     (modify last paragraph, p. 33) Fragment outputs output per-fragment data
     and are declared using the out storage qualifier. It is an error to use
     centroid out or sample out in a fragment shader. ...


     Modify Section 4.3.7, Interface Blocks, p. 34

     (modify last paragraph, p. 36, removing the requirement for indexing
     uniform blocks using constant expressions) For uniform blocks declared as
     arrays, each individual array element corresponds to a separate buffer
     object backing one instance of the block.  As the array size indicates the
     number of buffer objects needed, uniform block array declarations must
     specify an integral array size.  Arbitrary indices may be used to index a
     uniform block array; integral constant expressions are not required.  If
     the index used to access an array of uniform blocks is out-of-bounds, the
     results of the access are undefined.


     Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37

     (modify last paragraph, p. 37, and subsequent paragraphs on p. 38)

     Geometry shaders support input layout qualifiers.  There are two types of
     layout qualifiers used to specify an input primitive type and an
     invocation count.  The input primitive type and invocation count
     qualifiers are allowed only on the interface qualifier in, not on an input
     block, block member, or variable.

       layout-qualifier-id
         points
         lines
         lines_adjacency
         triangles
         triangles_adjacency
         invocations = integer-constant

     The identifiers "points", "lines", "lines_adjacency", "triangles", and
     "triangles_adjacency" are used to specify the type of input primitive
     accepted by the geometry shader, and only one of these is accepted.  At
     least one geometry shader (compilation unit) in a program must declare an
     input primitive type, and all geometry shader input primitive type
     declarations in a program must declare the same type.  It is not required
     that all geometry shaders in a program declare an input primitive type.

     The identifier "invocations" is used to specify the number of times the
     geometry shader is invoked for each input primitive received.  Invocation
     count declarations are optional.  If no invocation count is declared in
     any geometry shader in the program, the geometry shader will be run once
     for each input primitive.  If an invocation count is declared, all such
     declarations must specify the same count.  If a shader specifies an
     invocation count greater than the implementation-dependent maximum, it
     will fail to compile.

     For example,

       layout(triangles, invocations=6) in;

     will establish that all inputs to the geometry shader are triangles and
     that the geometry shader is run six times for each triangle processed.
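
     The rules for resolving the invocation count across compilation units
     can be sketched in C.  The function below is a hypothetical model; it
     reports every failure as -1 and does not distinguish the compile-time
     limit check from a mismatch between declarations.

```c
#include <assert.h>

/* counts[i] is the invocation count declared by compilation unit i, or 0
   if that unit declares none.  Returns the count used by the program, or
   -1 if declarations disagree or exceed the implementation maximum. */
static int resolve_invocations(const int *counts, int n, int max_invocations)
{
    int resolved = 0;
    assert(counts != 0 && n >= 0 && max_invocations >= 1);
    for (int i = 0; i < n; ++i) {
        assert(counts[i] >= 0);
        if (counts[i] == 0)
            continue;                    /* declaration is optional */
        if (counts[i] > max_invocations)
            return -1;                   /* exceeds the maximum */
        if (resolved != 0 && resolved != counts[i])
            return -1;                   /* declarations must agree */
        resolved = counts[i];
    }
    return resolved != 0 ? resolved : 1; /* default is one invocation */
}
```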

     All geometry shader input unsized array declarations ...


     Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40

     (modify second and subsequent paragraphs, p. 40)

     Geometry shaders can have output layout qualifiers.  There are three types
     of output layout qualifiers used to specify an output primitive type, a
     maximum output vertex count, and per-output stream numbers.  The output
     primitive type and output vertex count qualifiers are allowed only on the
     interface qualifier out, not on an output block, block member, or variable
     declaration.  The output stream number qualifier is allowed on the
     interface qualifier out, or on output blocks or variable declarations.

     The layout qualifier identifiers for geometry shader outputs are

       layout-qualifier-id
         points
         line_strip
         triangle_strip
         max_vertices = integer-constant
         stream = integer-constant

     The identifiers "points", "line_strip", and "triangle_strip" are used to
     specify the type of output primitive produced by the geometry shader, and
     only one of these is accepted.  At least one geometry shader (compilation
     unit) in a program must declare an output primitive type, and all geometry
     shader output primitive type declarations in a program must declare the
     same primitive type.  It is not required that all geometry shaders in a
     program declare an output primitive type.

     The identifier "max_vertices" is used to specify the maximum number of
     vertices the shader will ever emit in a single invocation.  At least one
     geometry shader (compilation unit) in a program must declare a maximum
     output vertex count, and all geometry shader output vertex count
     declarations in a program must declare the same count.  It is not required
     that all geometry shaders in a program declare a count.

     In the example,

       layout(triangle_strip, max_vertices = 60) out; // order does not matter
       layout(max_vertices = 60) out; // redeclaration okay
       layout(triangle_strip) out; // redeclaration okay
       layout(points) out; // error, contradicts triangle_strip
       layout(max_vertices = 30) out; // error, contradicts 60

     all outputs from the geometry shader are triangles and at most 60 vertices
     will be emitted by the shader.  It is an error for the maximum number of
     vertices to be greater than gl_MaxGeometryOutputVertices.
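
     The redeclaration rules shown in the example can be modeled as a merge
     of each declaration into the accumulated state.  This C sketch uses
     hypothetical names and does not model the gl_MaxGeometryOutputVertices
     limit.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PRIM_NONE,            /* declaration does not name a primitive type */
    PRIM_POINTS,
    PRIM_LINE_STRIP,
    PRIM_TRIANGLE_STRIP
} OutPrim;

typedef struct {
    OutPrim prim;
    int max_vertices;     /* 0 means "not declared" */
} OutLayout;

/* Merges one "layout(...) out;" declaration into `state`.  Returns false,
   leaving `state` untouched, if it contradicts an earlier declaration. */
static bool merge_out_layout(OutLayout *state, OutLayout decl)
{
    assert(state != 0 && decl.max_vertices >= 0);
    if (decl.prim != PRIM_NONE && state->prim != PRIM_NONE &&
        decl.prim != state->prim)
        return false;
    if (decl.max_vertices != 0 && state->max_vertices != 0 &&
        decl.max_vertices != state->max_vertices)
        return false;
    if (decl.prim != PRIM_NONE)
        state->prim = decl.prim;
    if (decl.max_vertices != 0)
        state->max_vertices = decl.max_vertices;
    return true;
}
```

     Feeding the five declarations from the example through
     merge_out_layout in order accepts the first three and rejects the last
     two.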

     The identifier "stream" is used to specify that a geometry shader output
     variable or block is associated with a particular vertex stream (numbered
     beginning with zero).  A default stream number may be declared at global
     scope by qualifying interface qualifier out as in this example:

       layout(stream = 1) out;

     The stream number specified in such a declaration replaces any previous
     default and applies to all subsequent block and variable declarations
     until a new default is established.  The initial default stream number is
     zero.

     Each output block or non-block output variable is associated with a vertex
     stream.  If the block or variable is declared with a stream qualifier, it
     is associated with the specified stream; otherwise, it is associated with
     the current default stream.  A block member may be declared with a stream
     qualifier, but the specified stream must match the stream associated with
     the containing block.  One example:

       layout(stream=1) out;             // default is now stream 1
       out vec4 var1;                    // var1 gets default stream (1)
       layout(stream=2) out Block1 {     // "Block1" belongs to stream 2
         layout(stream=2) vec4 var2;     // redundant block member stream decl
         layout(stream=3) vec2 var3;     // ILLEGAL (must match block stream)
         vec3 var4;                      // belongs to stream 2
       };
       layout(stream=0) out;             // default is now stream 0
       out vec4 var5;                    // var5 gets default stream (0)
       out Block2 {                      // "Block2" gets default stream (0)
         vec4 var6;
       };
       layout(stream=3) out vec4 var7;   // var7 belongs to stream 3
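
     The stream assignments in the example above follow from two small
     rules, sketched here in C with hypothetical helper names.  A negative
     stream number stands for "no stream qualifier present".

```c
#include <assert.h>
#include <stdbool.h>

/* Stream for a block or non-block variable: the explicit qualifier if one
   is present, otherwise the current default stream. */
static int resolve_stream(int default_stream, int explicit_stream)
{
    assert(default_stream >= 0);
    return explicit_stream >= 0 ? explicit_stream : default_stream;
}

/* A block member may omit the qualifier or repeat the block's stream; any
   other stream number is illegal. */
static bool member_stream_ok(int block_stream, int member_stream)
{
    assert(block_stream >= 0);
    return member_stream < 0 || member_stream == block_stream;
}
```

     With the default set to 1, resolve_stream gives stream 1 for var1 and
     stream 2 for Block1; member_stream_ok then accepts var2 and var4 and
     rejects var3.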

     If a geometry shader output block or variable is declared more than once,
     all such declarations must associate the variable with the same vertex
     stream.  If any stream declaration specifies a non-existent stream number,
     the shader will fail to compile.

     Built-in geometry shader outputs are always associated with vertex stream
     zero.

     Each vertex emitted by the geometry shader is assigned to a specific
     stream, and the attributes of the emitted vertex are taken from the set of
     output blocks and variables assigned to the targeted stream.  After each
     vertex is emitted, the values of all output variables become undefined.
     Additionally, the output variables associated with each vertex stream may
     share storage.  Writing to an output variable associated with one stream
     may overwrite output variables associated with any other stream.  When
     emitting each vertex, a geometry shader should write to all outputs
     associated with the stream to which the vertex will be emitted and to no
     outputs associated with any other stream.


     Modify Section 4.3.9, Interpolation, p. 42

     (modify first paragraph of section, add reference to sample in/out) The
     presence of and type of interpolation is controlled by the storage
     qualifiers centroid in, sample in, centroid out, and sample out, by the
     optional interpolation qualifiers smooth, flat, and noperspective, and by
     default behaviors established through the OpenGL API when no interpolation
     qualifier is present. ...

     (modify second paragraph) ... A variable may be qualified as flat centroid
     or flat sample, which will mean the same thing as qualifying it only as
     flat.

     (replace last paragraph, p. 42)

     When multisample rasterization is disabled, or for fragment shader input
     variables qualified with neither "centroid in" nor "sample in", the value
     of the assigned variable may be interpolated anywhere within the pixel and
     a single value may be assigned to each sample within the pixel, to the
     extent permitted by the OpenGL Specification.

     When multisample rasterization is enabled, "centroid" and "sample" may be
     used to control the location and frequency of the sampling of the
     qualified fragment shader input.  If a fragment shader input is qualified
     with "centroid", a single value may be assigned to that variable for all
     samples in the pixel, but that value must be interpolated at a location
     that lies in both the pixel and in the primitive being rendered, including
     any of the pixel's samples covered by the primitive.  Because the location
     at which the variable is sampled may be different in neighboring pixels,
     derivatives of centroid-sampled inputs may be less accurate than those for
     non-centroid interpolated variables.  If a fragment shader input is
     qualified with "sample", a separate value must be assigned to that
     variable for each covered sample in the pixel, and that value must be
     sampled at the location of the individual sample.


     (Insert before Section 4.7, Order of Qualification, p. 47)

     Section 4.Q, The Precise Qualifier

     Some algorithms may require that floating-point computations be carried
     out in exactly the manner specified in the source code, even if the
     implementation supports optimizations that could produce nearly equivalent
     results with higher performance.  For example, many GL implementations
     support a "multiply-add" that can compute values such as

       float result = (float(a) * float(b)) + float(c);

     in a single operation.  The result of a floating-point multiply-add may
     not always be identical to first doing a multiply yielding a
     floating-point result, and then doing a floating-point add.  By default,
     implementations are permitted to perform optimizations that effectively
     modify the order of the operations used to evaluate an expression, even if
     those optimizations may produce slightly different results relative to
     unoptimized code.
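
     The difference between the two evaluation strategies can be reproduced
     on a CPU.  The following is a minimal C sketch, not GLSL; the function
     names mul_then_add and mul_add_wide are hypothetical, and the fused
     operation is only emulated by keeping the product in a wider "double"
     intermediate.  It assumes IEEE 754 arithmetic with the default
     round-to-nearest-even mode.

```c
/* Multiply, round the product to float, then add (two roundings). */
float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be stored as a float, so the
       compiler cannot contract the expression into a fused operation. */
    volatile float product = a * b;
    return product + c;
}

/* Emulates a multiply-add whose intermediate product is kept at a
   higher precision; only the final result is rounded to float. */
float mul_add_wide(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

     With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
     1 + 2^-11 + 2^-24.  Rounding that product to "float" discards the 2^-24
     term, so mul_then_add() returns zero, while mul_add_wide() returns
     2^-24.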

     The qualifier "precise" will ensure that operations contributing to a
     variable's value are performed in the order and with the precision
     specified in the source code.  Order of evaluation is determined by
     operator precedence and parentheses, as described in Section 5.
     Expressions must be evaluated with a precision consistent with the
     operation; for example, multiplying two "float" values must produce a
     single value with "float" precision.  This effectively prohibits the
     arbitrary use of fused multiply-add operations if the intermediate
     multiply result is kept at a higher precision.  For example:

       precise out vec4 position;

     declares that computations used to produce the value of "position" must be
     performed precisely using the order and precision specified.  As with the
     invariant qualifier (section 4.6.1), the precise qualifier may be used to
     qualify a built-in or previously declared user-defined variable as being
     precise:

       out vec3 Color;
       precise Color;            // make existing Color be precise

     This qualifier will affect the evaluation of expressions used on the
     right-hand side of an assignment if and only if:

       * the variable assigned to is qualified as "precise"; or

       * the value assigned is used later in the same function, either directly
         or indirectly, on the right-hand side of an assignment to a variable
         declared as "precise".

     Expressions computed in a function are treated as precise only if assigned
     to a variable qualified as "precise" in that same function.  Any other
     expressions within a function are not automatically treated as precise,
     even if they are used to determine a value that is returned by the
     function and directly assigned to a variable qualified as "precise".

     Some examples of the use of "precise" include:

       in vec4 a, b, c, d;
       precise out vec4 v;

       float func(float e, float f, float g, float h)
       {
         return (e*f) + (g*h);            // no special precision
       }

       float func2(float e, float f, float g, float h)
       {
         precise float result = (e*f) + (g*h);  // ensures a precise return value
         return result;
       }

       void func3(float i, float j, precise out float k)
       {
         k = i * i + j;                   // precise, due to <k> declaration
       }

       void main(void)
       {
         vec3 r = vec3(a * b);           // precise, used to compute v.xyz
         vec3 s = vec3(c * d);           // precise, used to compute v.xyz
         v.xyz = r + s;                          // precise
         v.w = (a.w * b.w) + (c.w * d.w);        // precise
         v.x = func(a.x, b.x, c.x, d.x);         // values computed in func()
                                                 // are NOT precise
         v.x = func2(a.x, b.x, c.x, d.x);        // precise!
         func3(a.x * b.x, c.x * d.x, v.x);       // precise!
       }


     Modify Section 4.7, Order of Qualification, p. 47

     When multiple qualifications are present, they must follow a strict order.
     This order is as follows:

       precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier
          precision-qualifier


     Modify Section 5.9, Expressions, p. 57

     (modify bulleted list as follows, adding support for implicit conversion
     between signed and unsigned types)

     Expressions in the shading language are built from the following:

     * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
       types, and all matrix types.

     ...

     * The operator modulus (%) operates on signed or unsigned integer scalars
       or vectors.  If the fundamental types of the operands do not match, the
       conversions from Section 4.1.10 "Implicit Conversions" are applied to
       produce matching types.  ...


     Modify Section 6.1, Function Definitions, p. 63

     (modify description of overloading, beginning at the top of p. 64)

      Function names can be overloaded.  The same function name can be used for
      multiple functions, as long as the parameter types differ.  If a function
      name is declared twice with the same parameter types, then the return
      types and all qualifiers must also match, and it is the same function
      being declared.  For example,

        vec4 f(in vec4 x, out vec4  y);   // (A)
        vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
        vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type

        int  f(in vec4 x, out ivec4 y);  // error, only return type differs
        vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
        vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

      When function calls are resolved, an exact type match for all the
      arguments is sought.  If an exact match is found, all other functions are
      ignored, and the exact match is used.  If no exact match is found, then
      the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
      applied to find a match.  Mismatched types on input parameters (in or
      inout or default) must have a conversion from the calling argument type
      to the formal parameter type.  Mismatched types on output parameters (out
      or inout) must have a conversion from the formal parameter type to the
      calling argument type.

      If implicit conversions can be used to find more than one matching
      function, a single best-matching function is sought.  To determine a best
      match, the conversions between calling argument and formal parameter
      types are compared for each function argument and pair of matching
      functions.  After these comparisons are performed, each pair of matching
      functions is compared.  A function definition A is considered a better
      match than function definition B if:

        * for at least one function argument, the conversion for that argument
          in A is better than the corresponding conversion in B; and

        * there is no function argument for which the conversion in B is better
          than the corresponding conversion in A.

      If a single function definition is considered a better match than every
      other matching function definition, it will be used.  Otherwise, a
      semantic error occurs and the shader will fail to compile.

      To determine whether the conversion for a single argument in one match is
      better than that for another match, the following rules are applied, in
      order:

        1. An exact match is better than a match involving any implicit
           conversion.

        2. A match involving an implicit conversion from float to double is
           better than a match involving any other implicit conversion.

        3. A match involving an implicit conversion from either int or uint to
           float is better than a match involving an implicit conversion from
           either int or uint to double.

      If none of the rules above apply to a particular pair of conversions,
      neither conversion is considered better than the other.

      For the function prototypes (A), (B), and (C) above, the following
      examples show how the rules apply to different sets of calling argument
      types:

        f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
        f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
        f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                              //   (C) not relevant, can't convert vec4 to
                              //   ivec4.  (A) better than (B) for 2nd
                              //   argument (rule 2), same on first argument.
        f(ivec4, vec4);       // NOT matched.  All three match by implicit
                              //   conversion.  (C) is better than (A) and (B)
                              //   on the first argument.  (A) is better than
                              //   (B) and (C).
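
      The best-match procedure can be sketched as a small C model.  This is
      an illustration only, not part of the specification: it handles scalar
      types rather than vectors, all type and function names are
      hypothetical, and the table of implicit conversions in conv_exists()
      is an assumption taken from Section 4.1.10 (Implicit Conversions),
      which is not restated here.  resolve() returns the index of the chosen
      signature, or -1 if there is no match or no single best match.

```c
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } Type;
typedef struct { Type from, to; } Conv;

#define MAX_PARAMS 4
typedef struct {
    int  count;
    Type type[MAX_PARAMS];
    int  is_out[MAX_PARAMS];   /* 1 for "out" parameters, 0 for "in" */
} Signature;

/* Assumed implicit conversions (Section 4.1.10, not restated here). */
static int conv_exists(Conv c)
{
    if (c.from == c.to)    return 1;
    if (c.from == T_INT)   return c.to == T_UINT || c.to == T_FLOAT || c.to == T_DOUBLE;
    if (c.from == T_UINT)  return c.to == T_FLOAT || c.to == T_DOUBLE;
    if (c.from == T_FLOAT) return c.to == T_DOUBLE;
    return 0;
}

static int is_integer(Type t) { return t == T_INT || t == T_UINT; }

/* Nonzero if conversion a is better than conversion b (rules 1 to 3). */
static int conv_better(Conv a, Conv b)
{
    int a_exact = (a.from == a.to), b_exact = (b.from == b.to);
    if (a_exact || b_exact) return a_exact && !b_exact;          /* rule 1 */
    int a_fd = (a.from == T_FLOAT && a.to == T_DOUBLE);
    int b_fd = (b.from == T_FLOAT && b.to == T_DOUBLE);
    if (a_fd || b_fd) return a_fd && !b_fd;                      /* rule 2 */
    return is_integer(a.from) && a.to == T_FLOAT &&
           is_integer(b.from) && b.to == T_DOUBLE;               /* rule 3 */
}

/* Inputs convert argument -> parameter; outputs convert parameter -> argument. */
static Conv conv_for(const Signature *s, int i, Type arg)
{
    Conv c;
    if (s->is_out[i]) { c.from = s->type[i]; c.to = arg; }
    else              { c.from = arg;        c.to = s->type[i]; }
    return c;
}

static int matches(const Signature *s, const Type *args, int nargs)
{
    if (s->count != nargs) return 0;
    for (int i = 0; i < nargs; ++i)
        if (!conv_exists(conv_for(s, i, args[i]))) return 0;
    return 1;
}

/* A is better than B if it is better for at least one argument and
   worse for none. */
static int sig_better(const Signature *a, const Signature *b,
                      const Type *args, int nargs)
{
    int any = 0;
    for (int i = 0; i < nargs; ++i) {
        Conv ca = conv_for(a, i, args[i]), cb = conv_for(b, i, args[i]);
        if (conv_better(cb, ca)) return 0;
        if (conv_better(ca, cb)) any = 1;
    }
    return any;
}

int resolve(const Signature *cands, int n, const Type *args, int nargs)
{
    for (int i = 0; i < n; ++i) {
        if (!matches(&cands[i], args, nargs)) continue;
        int best = 1;
        for (int j = 0; j < n && best; ++j) {
            if (j == i || !matches(&cands[j], args, nargs)) continue;
            if (!sig_better(&cands[i], &cands[j], args, nargs)) best = 0;
        }
        if (best) return i;
    }
    return -1;
}
```

      Using scalar stand-ins for prototypes (A), (B), and (C), the calling
      argument types (float, float) select (A), (float, uint) select (B),
      and (int, float) produce no single best match, in line with the first,
      second, and last examples above.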


     Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69

     (add to the list of geometry shader special variables, p. 69)

       in int gl_InvocationID;

     (add to the end of the section, p. 71)

     The input variable gl_InvocationID is available in the geometry language
     and is filled with an integer holding the invocation number associated
     with the given shader invocation.  If the program is linked to support
     multiple geometry shader invocations per input primitive, the invocations
     are numbered 0, 1, 2, ..., <N>-1.  gl_InvocationID is not available in the
     vertex or fragment language.


     Modify Section 7.2, Fragment Shader Special Variables, p. 72

     (add to the list of built-in variables)

       in int gl_SampleMaskIn[];

     The variable gl_SampleMaskIn is an array of integers, each holding a
     bitfield indicating the set of samples covered by the primitive generating
     the fragment during multisample rasterization.  The array has ceil(<s>/32)
     elements, where <s> is the maximum number of color samples supported by
     the implementation.  Bit <n> of word <w> in the bitfield is set if and
     only if the sample numbered <w>*32+<n> is considered covered for this
     fragment shader invocation.
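
     The word and bit arithmetic can be written out as a short C sketch.
     The helper names sample_mask_words and sample_covered are hypothetical,
     and the array is modelled as plain 32-bit integers.

```c
#include <stdint.h>

/* Number of array elements, ceil(s/32), for a maximum of s color samples. */
int sample_mask_words(int max_samples)
{
    return (max_samples + 31) / 32;
}

/* Nonzero if the sample numbered `sample` is marked as covered, that is,
   if bit n of word w is set, where sample == w*32 + n. */
int sample_covered(const int32_t *sample_mask_in, int sample)
{
    int w = sample / 32;
    int n = sample % 32;
    return (int)(((uint32_t)sample_mask_in[w] >> n) & 1u);
}
```

     For example, with two words holding the values 5 and 1, samples 0, 2,
     and 32 are covered and all others are not.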


     Modify Section 8.3, Common Functions, p. 84

     (add support for floating-point multiply-add)

     Syntax:

       genType fma(genType a, genType b, genType c);

     The function fma() performs a fused floating-point multiply-add to compute
     the value a*b+c.  The results of fma() may not be identical to evaluating
     the expression (a*b)+c, because the computation may be performed in a
     single operation with intermediate precision different from that used to
     compute a non-fma() expression.

     The results of fma() are guaranteed to be invariant given fixed inputs
     <a>, <b>, and <c>, as though the result were taken from a variable
     declared as "precise".


     (add support for single-precision frexp and ldexp functions)

     Syntax:

       genType frexp(genType x, out genIType exp);
       genType ldexp(genType x, in genIType exp);

     The function frexp() splits each single-precision floating-point number in
     <x> into a binary significand, a floating-point number in the range [0.5,
     1.0), and an integral exponent of two, such that:

       x = significand * 2 ^ exponent

     The significand is returned by the function; the exponent is returned in
     the parameter <exp>.  For a floating-point value of zero, the significand
     and exponent are both zero.  For a floating-point value that is an
     infinity or is not a number, the results of frexp() are undefined.

     If the input <x> is a vector, this operation is performed in a
     component-wise manner; the value returned by the function and the value
     written to <exp> are vectors with the same number of components as <x>.

     The function ldexp() builds a single-precision floating-point number from
     each significand component in <x> and the corresponding integral exponent
     of two in <exp>, returning:

       significand * 2 ^ exponent

     If this product is too large to be represented as a single-precision
     floating-point value, the result is considered undefined.

     If the input <x> is a vector, this operation is performed in a
     component-wise manner; the value passed in <exp> and the value returned by
     the function are vectors with the same number of components as <x>.
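
     The C standard library functions frexpf() and ldexpf() follow the same
     convention for scalar values and can serve as a CPU-side reference.
     The sketch below only wraps them; the wrapper names split_float and
     build_float are hypothetical.  Note that the C functions return a
     negative significand for negative inputs, so the range [0.5, 1.0)
     applies to the magnitude there.

```c
#include <math.h>

/* Splits x so that x == significand * 2^exponent, with the magnitude of
   the significand in [0.5, 1.0).  For example, 8.0f gives significand
   0.5 and exponent 4; zero gives significand 0 and exponent 0. */
float split_float(float x, int *exponent)
{
    return frexpf(x, exponent);
}

/* Rebuilds significand * 2^exponent. */
float build_float(float significand, int exponent)
{
    return ldexpf(significand, exponent);
}
```

     Passing the two results of split_float() back to build_float()
     reproduces the original finite value.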


     (add support for new integer built-in functions)

     Syntax:

       genIType bitfieldExtract(genIType value, int offset, int bits);
       genUType bitfieldExtract(genUType value, int offset, int bits);

       genIType bitfieldInsert(genIType base, genIType insert, int offset,
                               int bits);
       genUType bitfieldInsert(genUType base, genUType insert, int offset,
                               int bits);

       genIType bitfieldReverse(genIType value);
       genUType bitfieldReverse(genUType value);

       genIType bitCount(genIType value);
       genIType bitCount(genUType value);

       genIType findLSB(genIType value);
       genIType findLSB(genUType value);

       genIType findMSB(genIType value);
       genIType findMSB(genUType value);

     The function bitfieldExtract() extracts bits <offset> through
     <offset>+<bits>-1 from each component in <value>, returning them in the
     least significant bits of the corresponding component of the result.  For
     unsigned data types, the most significant bits of the result will be set
     to zero.  For signed data types, the most significant bits will be set to
     the value of bit <offset>+<bits>-1.  If <bits> is zero, the result will be
     zero.  The result will be undefined if <offset> or <bits> is negative, or
     if the sum of <offset> and <bits> is greater than the number of bits used
     to store the operand.  Note that for vector versions of bitfieldExtract(),
     a single pair of <offset> and <bits> values is shared for all components.

     The function bitfieldInsert() inserts the <bits> least significant bits of
     each component of <insert> into the corresponding component of <base>.
     The result will have bits numbered <offset> through <offset>+<bits>-1
     taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
     directly from the corresponding bits of <base>.  If <bits> is zero, the
     result will simply be <base>.  The result will be undefined if <offset> or
     <bits> is negative, or if the sum of <offset> and <bits> is greater than
     the number of bits used to store the operand.  Note that for vector
     versions of bitfieldInsert(), a single pair of <offset> and <bits> values
     is shared for all components.
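
     A scalar reference model of these two functions might look like the
     following C sketch.  The function names are hypothetical, 32-bit
     operands are assumed, and the cases the text leaves undefined (negative
     <offset> or <bits>, or <offset>+<bits> greater than 32) are not checked.

```c
#include <stdint.h>

/* Mask with the low `bits` bits set (1 <= bits <= 32). */
static uint32_t low_mask(int bits)
{
    return (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Unsigned extract: upper bits of the result are zero. */
uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    return (value >> offset) & low_mask(bits);
}

/* Signed extract: upper bits replicate bit offset+bits-1 of value. */
int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0) return 0;
    uint32_t field = bitfield_extract_u((uint32_t)value, offset, bits);
    uint32_t sign  = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);   /* sign extension */
}

/* Insert the low `bits` bits of insert into base at bit `offset`. */
uint32_t bitfield_insert(uint32_t base, uint32_t insert, int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t mask = low_mask(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

     For instance, extracting 8 bits at offset 8 from 0xABCD1234 yields
     0x12, and a signed 4-bit extract of the field 0xF yields -1.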

     The function bitfieldReverse() reverses the bits of <value>.  The bit
     numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
     <value>, where <bits> is the total number of bits used to represent
     <value>.

     The function bitCount() returns the number of one bits in the binary
     representation of <value>.

     The function findLSB() returns the bit number of the least significant one
     bit in the binary representation of <value>.  If <value> is zero, -1 will
     be returned.

     The function findMSB() returns the bit number of the most significant bit
     in the binary representation of <value>.  For positive integers, the
     result will be the bit number of the most significant one bit.  For
     negative integers, the result will be the bit number of the most
     significant zero bit.  For a <value> of zero or negative one, -1 will be
     returned.
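
     The remaining integer functions can be modelled for scalar 32-bit
     values as in the C sketch below.  The names are hypothetical, and the
     loops favor clarity over speed.

```c
#include <stdint.h>

/* Bit n of the result is taken from bit 31-n of the input. */
uint32_t bitfield_reverse(uint32_t value)
{
    uint32_t result = 0u;
    for (int n = 0; n < 32; ++n)
        result |= ((value >> (31 - n)) & 1u) << n;
    return result;
}

/* Number of one bits. */
int bit_count(uint32_t value)
{
    int count = 0;
    for (; value != 0u; value >>= 1)
        count += (int)(value & 1u);
    return count;
}

/* Bit number of the least significant one bit, or -1 for zero. */
int find_lsb(uint32_t value)
{
    if (value == 0u) return -1;
    int n = 0;
    while ((value & 1u) == 0u) { value >>= 1; ++n; }
    return n;
}

/* Unsigned: bit number of the most significant one bit, or -1 for zero. */
int find_msb_u(uint32_t value)
{
    int n = -1;
    for (; value != 0u; value >>= 1) ++n;
    return n;
}

/* Signed: for negative values, look for the most significant zero bit,
   which is the most significant one bit of the complement. */
int find_msb_i(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    return find_msb_u(value < 0 ? ~bits : bits);
}
```

     Both zero and negative one give -1 from find_msb_i(), matching the
     description of findMSB() above.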


     (add support for general packing functions)

     Syntax:

       uint      packUnorm2x16(vec2 v);
       uint      packUnorm4x8(vec4 v);
       uint      packSnorm4x8(vec4 v);

       vec2      unpackUnorm2x16(uint v);
       vec4      unpackUnorm4x8(uint v);
       vec4      unpackSnorm4x8(uint v);

     The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first
     convert each component of a two- or four-component vector of normalized
     floating-point values into 8- or 16-bit integer values.  Then, the results
     are packed into a 32-bit unsigned integer.  The first component of the
     vector will be written to the least significant bits of the output; the
     last component will be written to the most significant bits.

     The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8()
     first unpack a single 32-bit unsigned integer into a pair of 16-bit
     unsigned integers, four 8-bit unsigned integers, or four 8-bit signed
     integers.  Then, each component is converted to a normalized floating-point
     value to generate a two- or four-component vector.  The first component of
     the vector will be extracted from the least significant bits of the input;
     the last component will be extracted from the most significant bits.

     The conversion between fixed- and normalized floating-point values will be
     performed as below.

       function          conversion
       ---------------   -----------------------------------------------------
       packUnorm2x16     fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
       packUnorm4x8      fixed_val = round(clamp(float_val, 0, +1) * 255.0);
       packSnorm4x8      fixed_val = round(clamp(float_val, -1, +1) * 127.0);
       unpackUnorm2x16   float_val = fixed_val / 65535.0;
       unpackUnorm4x8    float_val = fixed_val / 255.0;
       unpackSnorm4x8    float_val = clamp(fixed_val / 127.0, -1, +1);
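
     The 8-bit rows of the table can be exercised with the C sketch below.
     The function names are hypothetical, and two details are assumptions
     of this model rather than statements of the text: halfway cases in
     round() are rounded away from zero, and NaN inputs are not handled.

```c
#include <stdint.h>

static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Round to nearest; halfway cases go away from zero (an assumption). */
static int32_t round_to_int(float x)
{
    return (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* First component lands in the least significant byte. */
uint32_t pack_unorm4x8(float x, float y, float z, float w)
{
    const float v[4] = { x, y, z, w };
    uint32_t packed = 0u;
    for (int i = 0; i < 4; ++i) {
        uint32_t fixed_val =
            (uint32_t)round_to_int(clampf(v[i], 0.0f, 1.0f) * 255.0f);
        packed |= fixed_val << (8 * i);
    }
    return packed;
}

uint32_t pack_snorm4x8(float x, float y, float z, float w)
{
    const float v[4] = { x, y, z, w };
    uint32_t packed = 0u;
    for (int i = 0; i < 4; ++i) {
        int32_t fixed_val = round_to_int(clampf(v[i], -1.0f, 1.0f) * 127.0f);
        packed |= ((uint32_t)fixed_val & 0xFFu) << (8 * i);
    }
    return packed;
}

/* Component i (0..3) of the unpacked vector. */
float unpack_unorm4x8_component(uint32_t packed, int i)
{
    return (float)((packed >> (8 * i)) & 0xFFu) / 255.0f;
}

float unpack_snorm4x8_component(uint32_t packed, int i)
{
    int32_t fixed_val = (int32_t)((packed >> (8 * i)) & 0xFFu);
    if (fixed_val >= 128) fixed_val -= 256;      /* two's complement byte */
    return clampf((float)fixed_val / 127.0f, -1.0f, 1.0f);
}
```

     The clamp in the signed unpack matters only for the byte value -128,
     which would otherwise map slightly below -1.0.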


     (add functions to get/set the bit encoding for floating-point values)

     32-bit floating-point data types in the OpenGL shading language are
     specified to be encoded according to the IEEE 754 specification for
     single-precision floating-point values.  The functions below allow shaders
     to convert floating-point values to and from signed or unsigned integers
     representing their encoding.

     To obtain signed or unsigned integer values holding the encoding of a
     floating-point value, use:

       genIType floatBitsToInt(genType value);
       genUType floatBitsToUint(genType value);

     Conversions are done on a component-by-component basis.

     To obtain a floating-point value corresponding to a signed or unsigned
     integer encoding, use:

       genType intBitsToFloat(genIType value);
       genType uintBitsToFloat(genUType value);
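
     On a CPU, the same reinterpretation is commonly written with memcpy().
     The C sketch below is a scalar analogue under the assumption that
     "float" is a 32-bit IEEE 754 value; the function names are
     hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the encoding of a float into an integer, without numeric conversion. */
int32_t float_bits_to_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

uint32_t float_bits_to_uint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Copy an integer encoding back into a float. */
float int_bits_to_float(int32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

float uint_bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

     For example, 1.0 has the encoding 0x3F800000, and the encoding
     0x40000000 corresponds to 2.0.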


     (support for unsigned integer add/subtract with carry-out)

     Syntax:

       genUType uaddCarry(genUType x, genUType y, out genUType carry);
       genUType usubBorrow(genUType x, genUType y, out genUType borrow);

     The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
     <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
     the sum was less than 2^32, or one otherwise.

     The function usubBorrow() subtracts the 32-bit unsigned integer or vector
     <y> from <x>, returning the difference if it is non-negative, or 2^32 plus
     the difference otherwise.  The value <borrow> is set to zero if x >= y, or
     one otherwise.
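
     In C, unsigned 32-bit arithmetic already wraps modulo 2^32, so a scalar
     model of both functions is short.  The names uadd_carry and usub_borrow
     are hypothetical, and output parameters are modelled as pointers.

```c
#include <stdint.h>

/* Returns (x + y) mod 2^32; *carry is 1 if the true sum was >= 2^32. */
uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;            /* wraps modulo 2^32 */
    *carry = (sum < x) ? 1u : 0u;    /* wrapped exactly when sum < x */
    return sum;
}

/* Returns x - y, or 2^32 + (x - y) if y > x; *borrow is 1 in that case. */
uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;                    /* wraps modulo 2^32 */
}
```

     Adding 1 to 0xFFFFFFFF returns 0 with a carry of 1; subtracting 1 from
     0 returns 0xFFFFFFFF with a borrow of 1.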


     (support for signed and unsigned multiplies, with 32-bit inputs and a
      64-bit result spanning two 32-bit outputs)

     Syntax:

       void umulExtended(genUType x, genUType y, out genUType msb,
                         out genUType lsb);
       void imulExtended(genIType x, genIType y, out genIType msb,
                         out genIType lsb);

     The functions umulExtended() and imulExtended() multiply 32-bit unsigned
     or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
     32 least significant bits are returned in <lsb>; the 32 most significant
     bits are returned in <msb>.
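
     Where a 64-bit integer type is available, the split into two 32-bit
     halves can be modelled directly.  The following C sketch uses
     hypothetical names and assumes two's complement signed integers.

```c
#include <stdint.h>

void umul_extended(uint32_t x, uint32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)x * (uint64_t)y;
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}

void imul_extended(int32_t x, int32_t y, int32_t *msb, int32_t *lsb)
{
    int64_t  product = (int64_t)x * (int64_t)y;
    uint64_t bits    = (uint64_t)product;     /* two's complement encoding */
    *msb = (int32_t)(uint32_t)(bits >> 32);
    *lsb = (int32_t)(uint32_t)bits;
}
```

     For example, 0xFFFFFFFF times 0xFFFFFFFF is 0xFFFFFFFE00000001, so
     <msb> receives 0xFFFFFFFE and <lsb> receives 1.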


     Modify Section 8.7, Texture Lookup Functions, p. 91

     (extend the basic versions of textureGather from ARB_texture_gather,
      allowing for optional component selection in a multi-component texture
      and for shadow mapping)

     Syntax:
       gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
       gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
       gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
       gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
       gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);

       vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
       vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
       vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
       vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord,
                          float refZ);
       vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);

     The textureGather() functions use the texture coordinates given by <coord>
     to determine a set of four texels to sample from the texture identified by
     <sampler>.  These functions return a four-component vector consisting of
     one component from each texel.  If specified, the value of <comp> must be
     a constant integer expression with a value of zero, one, two, or three,
     identifying the <x>, <y>, <z>, or <w> component of the four-component
     vector lookup result for each texel, respectively.  If <comp> is not
     specified, the <x> component of each texel will be used to generate the
     result vector.  As described in the OpenGL Specification, the vector
     selects the post-swizzle component corresponding to <comp> from each of
     the four texels, returning:

       vec4(T_i0_j1(coord, base).<comp>,
            T_i1_j1(coord, base).<comp>,
            T_i1_j0(coord, base).<comp>,
            T_i0_j0(coord, base).<comp>)

     For textureGather() functions using a shadow sampler type, each of the
     four texel lookups performs a depth comparison against the depth reference
     value passed in <refZ>, and returns the result of that comparison in the
     appropriate component of the result vector.  The parameter <comp> used for
     component selection is not supported for textureGather() functions with
     shadow sampler types.

     As with other texture lookup functions, the results of textureGather() are
     undefined for shadow samplers if the texture referenced is not a depth
     texture or has depth comparisons disabled; or for non-shadow samplers if
     the texture referenced is a depth texture with depth comparisons enabled.
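
     The ordering of the four gathered components for non-shadow samplers
     can be illustrated with a small software model in C.  Everything here
     is a simplification: the Texel type and gather_model() are
     hypothetical, the caller supplies the integer texel indices (i0, j0)
     of the footprint instead of deriving them from <coord>, and
     out-of-range indices are clamped to the edge as an assumed wrap mode.

```c
typedef struct { float rgba[4]; } Texel;

static int clamp_index(int i, int size)
{
    return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

/* texels: row-major array of width*height texels.
   comp:   0..3, selecting x, y, z, or w of each texel.
   result: T_i0_j1, T_i1_j1, T_i1_j0, T_i0_j0, in that order. */
void gather_model(const Texel *texels, int width, int height,
                  int i0, int j0, int comp, float result[4])
{
    int i1 = clamp_index(i0 + 1, width);
    int j1 = clamp_index(j0 + 1, height);
    i0 = clamp_index(i0, width);
    j0 = clamp_index(j0, height);

    result[0] = texels[j1 * width + i0].rgba[comp];
    result[1] = texels[j1 * width + i1].rgba[comp];
    result[2] = texels[j0 * width + i1].rgba[comp];
    result[3] = texels[j0 * width + i0].rgba[comp];
}
```

     For a 2x2 texture whose texels are numbered 0 to 3 in row-major order,
     gathering at (i0, j0) = (0, 0) returns texels 2, 3, 1, and 0 in the x,
     y, z, and w components of the result.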


     (extend the "Offset" versions of textureGather from ARB_texture_gather,
      allowing for optional component selection in a multi-component texture,
      non-constant offsets, and shadow mapping)

     Syntax:
       gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord,
                                 ivec2 offset[, int comp]);
       gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord,
                                 ivec2 offset[, int comp]);
       gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord,
                                 ivec2 offset[, int comp]);

       vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord,
                                float refZ, ivec2 offset);
       vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord,
                                float refZ, ivec2 offset);
       vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord,
                                float refZ, ivec2 offset);

     The textureGatherOffset() functions operate identically to
     textureGather(), except that the 2-component integer texel offset vector
     <offset> is applied as a (u,v) offset to determine the four texels to
     sample.  The value <offset> need not be constant; however, a limited range
     of offset values is supported.  If any component of <offset> is less than
     MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than
     MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture
     coordinates is undefined.  Note that <offset> does not apply to the layer
     coordinate for array textures.


     (add new "Offsets" versions of textureGather from ARB_texture_gather,
      allowing for optional component selection in a multi-component texture,
      separate non-constant offsets for each texel in the footprint, and shadow
      mapping)

     Syntax:
       gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord,
                                  ivec2 offsets[4][, int comp]);
       gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord,
                                  ivec2 offsets[4][, int comp]);
       gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord,
                                  ivec2 offsets[4][, int comp]);

       vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord,
                                 float refZ, ivec2 offsets[4]);
       vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord,
                                 float refZ, ivec2 offsets[4]);
       vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord,
                                 float refZ, ivec2 offsets[4]);

     The textureGatherOffsets() functions operate identically to
     textureGather(), except that the array of two-component integer vectors
     <offsets> is used to determine the location of the four texels to sample.
     Each of the four texels is obtained by applying the corresponding offset
     in the four-element array <offsets> as a (u,v) coordinate offset to the
     coordinates <coord>, identifying the four-texel LINEAR footprint, and then
     selecting the texel T_i0_j0 of that footprint.  The specified values in
     <offsets> must be constant.  A limited range of offset values is
     supported; the minimum and maximum offset values are
     implementation-dependent and given by
     MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and
     MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively.  Note that <offsets>
     does not apply to the layer coordinate for array textures.


     Modify Section 8.8, Fragment Processing Functions, p. 101

     (add new functions to the end of section, p. 102)

     Built-in interpolation functions are available to compute an interpolated
     value of a fragment shader input variable at a shader-specified (x,y)
     location.  A separate (x,y) location may be used for each invocation of
     the built-in function, and those locations may differ from the default
     (x,y) location used to produce the default value of the input.

       float interpolateAtCentroid(float interpolant);
       vec2 interpolateAtCentroid(vec2 interpolant);
       vec3 interpolateAtCentroid(vec3 interpolant);
       vec4 interpolateAtCentroid(vec4 interpolant);

       float interpolateAtSample(float interpolant, int sample);
       vec2 interpolateAtSample(vec2 interpolant, int sample);
       vec3 interpolateAtSample(vec3 interpolant, int sample);
       vec4 interpolateAtSample(vec4 interpolant, int sample);

       float interpolateAtOffset(float interpolant, vec2 offset);
       vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
       vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
       vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);

     The function interpolateAtCentroid() will return the value of the input
     varying <interpolant> sampled at a location inside both the pixel and
     the primitive being processed.  The value obtained would be the same value
     assigned to the input variable if declared with the "centroid" qualifier.

     The function interpolateAtSample() will return the value of the input
     varying <interpolant> at the location of the sample numbered <sample>.  If
     multisample buffers are not available, the input varying will be evaluated
     at the center of the pixel.  If the sample number given by <sample> does
     not exist, the position used to interpolate the input varying is
     undefined.

     The function interpolateAtOffset() will return the value of the input
     varying <interpolant> sampled at an offset from the center of the pixel
     specified by <offset>.  The two floating-point components of <offset>
     give the offset in pixels in the x and y directions, respectively.
     An offset of (0,0) identifies the center of the pixel.  The range and
     granularity of offsets supported by this function are
     implementation-dependent.

     For all of the interpolation functions, <interpolant> must be an input
     variable or an element of an input variable declared as an array.
     Component selection operators (e.g., ".xy") may not be used when
     specifying <interpolant>.  If <interpolant> is declared with a "flat" or
     "centroid" qualifier, the qualifier will have no effect on the
     interpolated value.  If <interpolant> is declared with the "noperspective"
     qualifier, the interpolated value will be computed without perspective
     correction.
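
     As a rough intuition for interpolateAtOffset(), consider an input that
     varies linearly in window space, as a "noperspective" input does within
     a primitive.  The C sketch below evaluates such an input at an offset
     from the pixel center.  The LinearInput type, its fields, and the
     function name are hypothetical, and the model ignores perspective
     correction and the implementation-dependent offset range and
     granularity.

```c
/* A window-space linear input: its value at the pixel center and its
   rate of change per pixel in x and y. */
typedef struct {
    float at_center;
    float ddx;
    float ddy;
} LinearInput;

/* Value of the input at (center + offset), with offset given in pixels. */
float interpolate_at_offset_model(LinearInput input,
                                  float offset_x, float offset_y)
{
    return input.at_center + input.ddx * offset_x + input.ddy * offset_y;
}
```

     An offset of (0,0) returns the value at the pixel center.  With a
     center value of 1.0 and per-pixel rates of 0.5 in x and -0.25 in y, an
     offset of (0.5, 0.25) gives 1.1875.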


     Modify Section 8.10, Geometry Shader Functions, p. 104

     (replace the section, using the following more general formulation)

     These functions are only available in geometry shaders.

     Syntax:

         void EmitStreamVertex(int stream);      // Geometry-only
         void EndStreamPrimitive(int stream);    // Geometry-only

         void EmitVertex();                      // Geometry-only
         void EndPrimitive();                    // Geometry-only

     Description:

     The function EmitStreamVertex() specifies that the vertex being generated
     by the geometry shader is completed.  A vertex is added to the current
     output primitive in the vertex stream numbered <stream> using the current
     values of all output variables associated with <stream>.  The values of
     any unwritten output variables associated with <stream> are undefined.
     The argument <stream> must be a constant integral expression.  The values
     of all output variables (for all output streams) are undefined after
     calling EmitStreamVertex().  If a geometry shader invocation has emitted
     more vertices than permitted by the output layout qualifier
     "max_vertices", the results of calling EmitStreamVertex() are undefined.

     The function EmitVertex() is equivalent to calling EmitStreamVertex() with
     <stream> set to zero.

     The function EndStreamPrimitive() specifies that the current output
     primitive for the vertex stream numbered <stream> is completed and that a
     new empty output primitive of the same type should be started.  The
     argument <stream> must be a constant integral expression.  This function
     does not emit a vertex.  If the output layout is declared to be "points",
     calling EndPrimitive() is optional.

     The function EndPrimitive() is equivalent to calling EndStreamPrimitive()
     with <stream> set to zero.

     A geometry shader starts with an output primitive containing no vertices
     for each stream.  When a geometry shader terminates, the current output
     primitive for each vertex stream is automatically completed.  It is not
     necessary to call EndPrimitive() or EndStreamPrimitive() for any stream
     where the geometry shader writes only a single primitive.

     Multiple vertex streams are supported only if the output primitive type is
     declared to be "points".  A program will fail to link if it contains a
     geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its
     output primitive type is not "points".


     Modify Section 9, Shading Language Grammar, p. 92

     !!! TBD !!!


 GLX Protocol

     None.

 Dependencies on ARB_gpu_shader_fp64

     This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
     of implicit conversions supported in the OpenGL Shading Language.  If more
     than one of these extensions is supported, an expression of one type may
     be converted to another type if that conversion is allowed by any of these
     specifications.

     If ARB_gpu_shader_fp64 or a similar extension introducing new data types
     is not supported, the function overloading rule in the GLSL specification
     preferring promotion of an input parameter from a smaller type to a larger
     type is never applicable, as all data types are of the same size.  That
     rule and the example referring to "double" should be removed.


 Dependencies on NV_gpu_shader5

     This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
     of implicit conversions supported in the OpenGL Shading Language.  If more
     than one of these extensions is supported, an expression of one type may
     be converted to another type if that conversion is allowed by any of these
     specifications.

     This specification and NV_gpu_shader5 both lift the restriction in GLSL
     1.50 requiring that indexing in arrays of samplers must be done with
     constant expressions.  However, this extension specifies that results are
     undefined if the indices would diverge if multiple shader invocations are
     run in lockstep.  NV_gpu_shader5 does not impose the non-divergent
     indexing requirement.

     If NV_gpu_shader5 is supported, integer data types are supported with four
     different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
     types are supported with three different precisions (16-, 32-, and
     64-bit).  The extension adds the following rule for output parameters,
     which is similar to the one present in this extension for input
     parameters:

        5. If the formal parameters in both matches are output parameters, a
           conversion from a type with a larger number of bits per component is
           better than a conversion from a type with a smaller number of bits
           per component.  For example, a conversion from an "int16_t" formal
           parameter type to "int"  is better than one from an "int8_t" formal
           parameter type to "int".

     Such a rule is not provided in this extension because there is no
     combination of types in this extension and ARB_gpu_shader_fp64 where this
     rule has any effect.


 Dependencies on ARB_sample_shading

     This extension builds upon the per-sample shading support provided by
     ARB_sample_shading to provide several new capabilities, including:

       * the built-in variable gl_SampleMaskIn[] indicates the set of samples
         covered by the input primitive corresponding to the fragment shader
         invocation; and

       * use of the "sample" qualifier on a fragment shader input forces
         per-sample shading, and specifies that the value of the input be
         evaluated per-sample.

     There is no interaction between the extensions, except that shaders using
     the features of this extension seem likely to use features from
     ARB_sample_shading as well.


 Dependencies on ARB_texture_gather

     This extension builds upon the textureGather() built-ins provided by
     ARB_texture_gather to provide several new capabilities, including:

       * allowing shaders to select any single component of a multi-component
         texture to produce the gathered 2x2 footprint;

       * allowing shaders to perform a per-sample depth comparison when
         gathering the 2x2 footprint for shadow sampler types;

       * allowing shaders to use arbitrary offsets computed at run-time to
         select a 2x2 footprint to gather from; and

       * allowing shaders to use separate independent offsets for each of the
         four texels returned, instead of requiring a fixed 2x2 footprint.

     Other than the fact that they provide similar functionality, there is no
     interaction between the extensions.

     Since this extension requires support for gathering from multi-component
     textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB
     is increased to 4.


 Errors

     INVALID_OPERATION is generated by GetProgram if <pname> is
     GEOMETRY_SHADER_INVOCATIONS and the program has not been linked
     successfully, or does not contain objects to form a geometry shader.


 New State

     Add the following state to Table 6.40, Program Object State, p. 378

                                                     Initial
     Get Value                 Type   Get Command     Value     Description                  Sec.  Attribute
     ------------------------- ----  ------------    -------    -------------------------   ------  -------
     GEOMETRY_SHADER_           Z+    GetProgramiv      1       number of times a geometry  6.1.16    -
       INVOCATIONS                                              shader should be executed
                                                                for each input primitive

 New Implementation Dependent State

                                                Min.
     Get Value               Type  Get Command  Value  Description                  Sec.      Attrib
     ----------------------  ----  -----------  -----  --------------------------   --------  ------
     MAX_GEOMETRY_SHADER_     Z+   GetIntegerv   32    maximum supported geometry   2.16.4      -
       INVOCATIONS                                     shader invocation count
     MIN_FRAGMENT_INTERP-     R    GetFloatv    -0.5   furthest negative offset     3.12.1      -
       OLATION_OFFSET                                   for interpolateAtOffset()
     MAX_FRAGMENT_INTERP-     R    GetFloatv    +0.5   furthest positive offset     3.12.1      -
       OLATION_OFFSET                                   for interpolateAtOffset()
     FRAGMENT_INTERPOLATION_  Z+   GetIntegerv    4    subpixel bits for            3.12.1      -
       OFFSET_BITS                                      interpolateAtOffset()
     MAX_VERTEX_STREAMS       Z+   GetIntegerv    4    total number of vertex       2.16.4      -
                                                        streams

     (Note:  The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB,
      added by ARB_texture_gather, is increased to 4.)

 Issues

     (1) This extension builds on the capability provided by
         ARB_sample_shading, adding a new built-in variable for the input
         sample mask.  It seems likely that a shader using this mask might also
         want to use one or more ARB_sample_shading built-ins.  Are such
         shaders required to include #extension lines for both extensions?

       UNRESOLVED:  It would be nice if it wasn't required.

     (2) How do the per-sample shading features of this extension interact with
         non-multisample rendering?

       RESOLVED:  Non-multisample rendering (due to no multisample buffer or
       MULTISAMPLE disabled) is treated as single-sample rendering.

     (3) This extension lifts the restriction requiring that indices into
         samplers be constant expressions, but makes the results undefined if
         the indices used would diverge in lockstep execution.  What is this
         good for?

       RESOLVED:  This allows shaders to index into samplers using integer
       uniforms, or with non-divergent values computed at run-time (e.g., loop
       counters).  Many implementations of this extension will be SIMD, running
       multiple shader invocations at once, and some implementations may have
       difficulty with accessing multiple textures in a single SIMD
       instruction.

       Note that the NV_gpu_shader5 extension similarly lifts the restriction
       but does not require non-divergent indexing.

     (4) What sort of implicit conversions should we support in this and
         related extensions?

       RESOLVED:  In GLSL 1.50, we have implicit conversion from "int" and
       "uint" to "float", as well as equivalent conversions for vector types.
       One of the primary motivations of this feature is to allow constants
       that are nominally integer values to be used in floating-point contexts
       without requiring special suffixes.  The following code compiles
       successfully in GLSL 1.50.

         float square(float x) {
           return x * x;
         }
         float f = 0;
         float g = f * 2;
         float h = square(3);

       The same code would fail on GLSL 1.10, because "0", "2", and "3" would
       need to be written as "0.0", "2.0", and "3.0", respectively.

       This extension adds implicit conversions from "int" to "uint" to allow
       for cases like:

         uint square(uint x) {
           return x * x;
         }
         uint v = square(2);

       This code is legal with this extension, but not in GLSL 1.50 ("2" would
       need to be replaced with "2U" or "uint(2)").

       ARB_gpu_shader_fp64 adds a new type "double", and we extend existing
       implicit conversions to allow for promotion of "int", "uint", and
       "float" to "double".

       Unlike C/C++, the general rule for implicit conversions in GLSL is that
       conversions are unidirectional.  If type A can be implicitly converted
       to type B, type B cannot be converted to type A.

     (5) Increasing the number of available implicit conversions means that
         there is the possibility of ambiguities in various operators.  How do
         we deal with these cases?

       RESOLVED:  For binary operators, the new implicit conversions mean that
       there may be multiple ways to resolve an expression.  For example, given
       the following declarations

         int i;
         uint u;

       the expression "i+u" could be resolved either by implicitly converting
       "i" to "uint", or by implicitly converting both values to either "float"
       or "double".  To resolve, we define a set of preferences for a common
       data type based on the types of the operands:

         - use a floating-point type if either operand is floating-point
         - use an unsigned integer type if either operand is unsigned
         - use a signed integer type otherwise

       If conversions to multiple precisions are supported, the
       lowest-precision available data type is preferred (e.g., int*float will
       be converted to float*float and not double*double).

       These rules should extend naturally if new basic data types are added.
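
       The preferences above can be modeled outside the shading language.  The
       following is a minimal sketch in C, not part of this extension; the
       names "scalar_type", "is_floating", and "common_type" are hypothetical
       and cover only the four scalar types discussed in this issue.

```c
#include <assert.h>

/* Hypothetical tags for the scalar types discussed above. */
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } scalar_type;

int is_floating(scalar_type t)
{
    return t == T_FLOAT || t == T_DOUBLE;
}

/* Common type chosen for a binary operator: floating-point if either
 * operand is floating-point, else unsigned if either operand is unsigned,
 * else signed.  Among floating-point types the lowest sufficient
 * precision is used. */
scalar_type common_type(scalar_type a, scalar_type b)
{
    assert(a <= T_DOUBLE && b <= T_DOUBLE);
    if (is_floating(a) || is_floating(b)) {
        /* "float" suffices unless an operand is already "double". */
        return (a == T_DOUBLE || b == T_DOUBLE) ? T_DOUBLE : T_FLOAT;
    }
    if (a == T_UINT || b == T_UINT)
        return T_UINT;
    return T_INT;
}
```

       Under this model, the "i+u" example resolves to "uint", and int*float
       resolves to "float" rather than "double".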

     (6) Increasing the number of available implicit conversions means that
         there is an increased possibility of ambiguity when function
         overloading is involved.  Additionally, this and related extensions
         add new function overloads.  How do we deal with these cases?

       RESOLVED:  The general rule for function overloading in GLSL 1.50 is
       that we first check for a function prototype that exactly matches the
       parameters passed to a function call.  If no match exists, we check for
       prototypes that can be matched by implicit conversions.  If more than
       one matching prototype can be matched by conversion, the function call
       is considered ambiguous and results in a compilation error.

       Unfortunately, when adding new implicit conversions, it is possible for
       cases that were formerly unambiguous to become ambiguous.  For backward
       compatibility purposes, it would be desirable to ensure that shaders
       that succeeded in old language versions should still compile if
       "upgraded" to more recent versions/extensions.  However, the new
       conversions and overloads might make this more difficult without
       modifying other language rules.  For example, the following prototypes
       are available for the standard built-in function min() on scalar values
       when this extension and ARB_gpu_shader_fp64 are supported:

         int     min(int a, int b);
         uint    min(uint a, uint b);
         float   min(float a, float b);
         double  min(double a, double b);

       In GLSL 1.50, a function call such as:

         float f;
         min(f, 1);

       would be considered unambiguous because the double-precision version of
       min() didn't exist and the call matched only the single-precision
       version.  However, with double-precision, implicit conversions can be
       used to resolve to either the single- or double-precision versions.

       To resolve this issue, we provide a set of rules that can be used to
       resolve multiple candidates to a "best match".  The rules for
       determining a best match are similar to those for C++ function
       overloading, but not exactly the same.  Like C++, these rules compare
       the conversions required on an argument-by-argument basis.  A function
       prototype A is better than function prototype B if:

         - A is better than B for one or more arguments
         - B is better than A for no arguments

       If a single function prototype is better than all others, that one is
       used.  Otherwise, we get the same ambiguity error as on previous GLSL
       versions.

       As far as argument-by-argument comparisons go, the order of preference
       is:

         - favor exact matches
         - prefer "promotions" (float->double) to other conversions
         - prefer conversions from int/uint to float over similar conversion to
           double

       If none of the rules apply, one match is considered neither better nor
       worse than the other.

       With these rules, the "min(f,1)" example above resolves to the "float"
       version, as is the case in GLSL 1.50.  However, there are other cases
       where ambiguity remains.  For example, consider the prototypes:

         int f(uint x);
         int f(float x);

       With GLSL 1.50 rules, "f(3)" would match the floating-point version, as
       no implicit conversions existed from "int" to "uint".  With the new
       implicit conversions, both prototypes match and neither is preferred.
       Because of the ambiguity, "f(3)" would fail to compile with this
       extension enabled, but should still compile on implementations
       supporting this extension if the extension is not enabled in GLSL source
       code.
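
       The best-match procedure described in this issue can be sketched as a
       small model.  The C code below is illustrative only and is not how any
       particular compiler implements overload resolution; the type tags, the
       "prototype" record, and all function names are hypothetical.  It
       encodes the three argument preferences above and the pairwise
       "better than" test, then checks the two examples from the text.

```c
#include <assert.h>

typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } scalar_type;

/* Hypothetical prototype record: up to two scalar parameters. */
typedef struct { int nparams; scalar_type params[2]; } prototype;

/* Implicit conversions with this extension plus ARB_gpu_shader_fp64. */
int can_convert(scalar_type from, scalar_type to)
{
    if (from == to) return 1;
    switch (from) {
    case T_INT:   return to == T_UINT || to == T_FLOAT || to == T_DOUBLE;
    case T_UINT:  return to == T_FLOAT || to == T_DOUBLE;
    case T_FLOAT: return to == T_DOUBLE;
    default:      return 0;
    }
}

/* Nonzero if converting <from> to <to_a> is preferred over <to_b>. */
int better_conversion(scalar_type from, scalar_type to_a, scalar_type to_b)
{
    if (to_a == to_b || to_b == from) return 0;
    if (to_a == from) return 1;                      /* exact match */
    if (from == T_FLOAT && to_a == T_DOUBLE) return 1;   /* promotion */
    if ((from == T_INT || from == T_UINT) &&
        to_a == T_FLOAT && to_b == T_DOUBLE) return 1;   /* float over double */
    return 0;
}

int matches(const prototype *p, const scalar_type *args, int nargs)
{
    assert(nargs >= 0 && nargs <= 2);
    if (p->nparams != nargs) return 0;
    for (int i = 0; i < nargs; i++)
        if (!can_convert(args[i], p->params[i])) return 0;
    return 1;
}

/* A is better than B if it is better for at least one argument and B is
 * better for none. */
int better_prototype(const prototype *a, const prototype *b,
                     const scalar_type *args, int nargs)
{
    int a_wins = 0;
    for (int i = 0; i < nargs; i++) {
        if (better_conversion(args[i], b->params[i], a->params[i])) return 0;
        if (better_conversion(args[i], a->params[i], b->params[i])) a_wins = 1;
    }
    return a_wins;
}

/* Returns the index of the best match, -1 if ambiguous, -2 if no match. */
int resolve_overload(const prototype *protos, int nprotos,
                     const scalar_type *args, int nargs)
{
    int any = 0;
    for (int i = 0; i < nprotos; i++) {
        if (!matches(&protos[i], args, nargs)) continue;
        any = 1;
        int best = 1;
        for (int j = 0; j < nprotos && best; j++) {
            if (j == i || !matches(&protos[j], args, nargs)) continue;
            best = better_prototype(&protos[i], &protos[j], args, nargs);
        }
        if (best) return i;
    }
    return any ? -1 : -2;
}

/* The scalar min() overloads and the f(uint)/f(float) pair from the text. */
const prototype min_protos[4] = {
    {2, {T_INT, T_INT}},     {2, {T_UINT, T_UINT}},
    {2, {T_FLOAT, T_FLOAT}}, {2, {T_DOUBLE, T_DOUBLE}},
};
const prototype f_protos[2] = { {1, {T_UINT}}, {1, {T_FLOAT}} };
const scalar_type min_args[2] = { T_FLOAT, T_INT };  /* min(f, 1) */
const scalar_type f_args[1] = { T_INT };             /* f(3)      */
```

       In this model, resolving "min(f,1)" selects index 2 (the "float"
       overload), while "f(3)" yields -1 because neither candidate is better
       than the other.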

     (7) The function overloading rules described in this extension describe
         conversions between data types with different sizes; however, all
         existing data types allowing implicit conversion (int, uint, float)
         are the same size.  Why do we specify these rules?

       RESOLVED:  This extension is specified at the same time as the related
       ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such
       types.  The rules are specified all in one place here so we don't have
       to replicate and extend the rules in the other extensions.  It also
       provides the ability to automatically convert from signed to unsigned
       integer types, as in the C programming language.

     (8) Should we support textureGather() for rectangle textures
         (sampler2DRect)?  They aren't in ARB_texture_gather.

       RESOLVED:  Yes.

     (9) How does the input sample mask interact with the fixed-function
         SampleCoverage and SampleMask state?  Will samples be removed from the
         input mask if they would be eliminated by these masks in the
         per-fragment operations?

       UNRESOLVED.

     (10) Should we support reading patches as geometry shader inputs, and if
          so, where?

       RESOLVED:  Not in this extension.  This capability will be provided in
       NV_gpu_shader5.

     (11) Should we support per-sample interpolation of attributes?  If so,
          how?

       RESOLVED:  Yes.  When multisample rasterization is enabled, qualifying
       one or more fragment shader inputs with "sample" will force per-sample
       interpolation of those attributes.  If the same shader includes other
       fragment inputs not qualified with sample, those attributes may be
       interpolated per-pixel (i.e., all samples get the same values, likely
       evaluated at the pixel center).

     (12) Should we reserve "sample" as a keyword for per-sample interpolation
          qualifiers, or use something more obscure, such as "per_sample"?

       RESOLVED:  This extension uses "sample".

     (13) What should be the base data type for the bitCount(), findLSB(), and
          findMSB() functions -- signed or unsigned integers?

       RESOLVED:  These functions will return signed values, with -1 returned
       by findLSB/findMSB if no bit is found.  Note that the shading language
       supports implicit conversions of signed integers to unsigned, which
       makes it easy enough if an unsigned result is desired.
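
       The return convention can be illustrated with a host-side model.  The
       C functions below are a sketch of the unsigned variants only; the
       names are hypothetical, and the loops are chosen for clarity rather
       than speed.  Signed results allow -1 to signal "no bit found".

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits set to 1 in v. */
int bit_count_u32(uint32_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the least significant 1 bit, or -1 if v is zero. */
int find_lsb_u32(uint32_t v)
{
    if (v == 0)
        return -1;
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    assert(i < 32);
    return i;
}

/* Index of the most significant 1 bit, or -1 if v is zero. */
int find_msb_u32(uint32_t v)
{
    int i = -1;
    while (v) {
        v >>= 1;
        i++;
    }
    return i;
}
```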

     (14) Why do EmitVertex() and EndPrimitive() begin with capitalized words
          while most of the other built-ins start with a lower-case (e.g.,
          emitVertex)?  Which precedent should the new per-vertex stream emit
          and end primitive functions follow?

       RESOLVED:  The inconsistency began with the original functions in
       EXT_geometry_shader4; the spec author can't recall the original reasons
       (if any).  Regardless, we decided to match the existing functions as
       closely as possible and use EmitStreamVertex() and EndStreamPrimitive().

     (15) How do the textureGather functions work with sRGB textures?

       RESOLVED:  Gamma-correction is applied to the texture source color
       before "gathering" and hence applies to all four components, unless the
       texture swizzle of the selected component is ALPHA in which case no
       gamma-correction is applied.

     (16) How should we support arrays of uniform blocks (i.e., multiple blocks
          in a group, each backed by a separate buffer object)?

       RESOLVED:  We will use instance names in the block definitions, which
       can be declared as regular arrays:

         uniform UniformData {
           vec4 stuff;
         } blocks[4];

       These four blocks will be referred to as "blocks[0]" through
       "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]"
       in the OpenGL API code.  The block member in this example will be
       referred to as "UniformData.stuff" in the API.  A similar approach was
       already adopted in GLSL 1.50, where geometry shaders supported arrays of
       input blocks that were treated similarly.  Since this spec depends on
       GLSL 1.50, little new spec language is required here.

     (17) What are instanced geometry shaders useful for?

       RESOLVED:  Instanced geometry shaders allow geometry programs that
       perform regular operations to run more efficiently.

       Consider a simple example of an algorithm that uses geometry shaders to
       render primitives to a cube map in a single pass.  Without instanced
       geometry shaders, the geometry shader to render triangles to the cube
       map would do something like:

         for (face = 0; face < 6; face++) {
           for (vertex = 0; vertex < 3; vertex++) {
             project vertex <vertex> onto face <face>, output position
             compute/copy attributes of emitted <vertex> to outputs
             output <face> to result.layer
             emit the projected vertex
           }
           end the primitive (next triangle)
         }

       This algorithm would output 18 vertices per input triangle, three for
       each cube face.  The six triangles emitted would be rasterized, one per
       face.  Geometry shaders that emit a large number of attributes have
       often posed performance challenges, since all the attributes must be
       stored somewhere until the emitted primitives are processed.  Large
       storage requirements may limit the number of threads that can be run in
       parallel and reduce overall performance.

       Instanced geometry shaders allow this example to be restructured to run
       with six separate invocations, one per face.  Each invocation projects
       the triangle to only a single face (identified by the invocation number)
       and emits only 3 vertices.  The reduced storage requirements allow more
       geometry shader invocations to be run in parallel, with greater overall
       efficiency.

       Additionally, the total number of attributes that can be emitted by a
       single geometry shader invocation is limited.  However, for instanced
       geometry shaders, that limit applies to each of <N> invocations which
       allows for a larger total output.  For example, if the GL implementation
       supports only 1024 components of output per invocation, the 18-vertex
       algorithm above could emit no more than 56 components per vertex.  The
       same algorithm implemented as a 3-vertex 6-invocation geometry program
       could theoretically allow for 341 components per vertex.
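
       The per-vertex figures in this example follow from integer division of
       the per-invocation limit by the number of vertices emitted.  A short
       sketch in C, with a hypothetical function name and using the 1024
       component limit assumed in the text:

```c
#include <assert.h>

/* Largest number of components each vertex may carry when one invocation
 * emits <vertices> vertices and is limited to <limit> total components. */
int max_components_per_vertex(int limit, int vertices)
{
    assert(limit >= 0 && vertices > 0);
    return limit / vertices;   /* integer division rounds down */
}
```

       With a limit of 1024, 18 vertices per invocation gives 56 components
       per vertex, and 3 vertices per invocation gives 341.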

     (18) Should EmitStreamVertex() and EndStreamPrimitive() accept a
          non-constant stream number?

       RESOLVED:  Not in this extension.  Requiring a constant stream number
       for each call simplifies code generation for the compiler.

     (19) Are there any restrictions on geometry shaders with multiple output
          streams?

       RESOLVED:  Yes, such geometry shaders are required to generate points;
       line strip and triangle strip outputs are not supported.

     (20) Since multi-stream geometry shaders only support points, why does
          EndStreamPrimitive() exist?  Neither it nor EndPrimitive() does anything
          useful when emitting points.

       RESOLVED:  This function was added for completeness, and would be useful
       if the requirement for emitting points were lifted by a future
       extension.

     (21) Should we provide mechanisms allowing shaders to examine or set the
          bit representation of floating-point numbers?

       RESOLVED:  Yes, we will provide functions to convert single-precision
       floats to/from signed and unsigned 32-bit integers.  The
       ARB_gpu_shader_fp64 extension will provide similar functionality for
       double-precision floats.  We chose to adopt the Java naming convention
       here -- converting a single-precision float to/from a signed integer is
       accomplished by the functions floatBitsToInt() and intBitsToFloat().

       Note that this functionality has also been forked off into a separate
       extension (ARB_shader_bit_encoding) that can be exported on
       implementations capable of performing such conversions but not capable
       of the full feature set of this extension and/or OpenGL 4.0.
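
       For readers more familiar with host code, the effect of these
       functions resembles a bit-for-bit copy between a float and an integer
       of the same size.  The C sketch below is an analogy only and assumes
       the host "float" is a 32-bit IEEE 754 value; the function names are
       hypothetical and simply mirror the shading language built-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed 32-bit integer. */
int32_t float_bits_to_int(float f)
{
    int32_t i;
    assert(sizeof f == sizeof i);   /* assumption: 32-bit float */
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
uint32_t float_bits_to_uint(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret the bits of a signed 32-bit integer as a float. */
float int_bits_to_float(int32_t i)
{
    float f;
    assert(sizeof f == sizeof i);
    memcpy(&f, &i, sizeof f);
    return f;
}
```

       No numeric conversion takes place; the value 1.0 maps to the integer
       0x3F800000 rather than to 1.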

     (22) What is the "precise" qualifier good for?

       RESOLVED:  Like "invariant", "precise" provides some invariance
       guarantees that are useful for certain algorithms.

       With an output position qualified as "invariant", we ensure that if the
       same geometry is processed by multiple shaders using the exact same
       code, it will be transformed in exactly the same way to ensure that we
       have no cracking or flickering in multi-pass algorithms using different
       shaders.

       With "precise", we ensure that an algorithm can be written to produce
       identical results on subtly different inputs.  For example, the order of
       vertices visible to a geometry or tessellation shader used to subdivide
       primitive edges might present an edge shared between two primitives in
       one direction for one primitive and the other direction for the adjacent
       primitive.  Even if the weights are identical in the two cases, there
       may be cracking if the computations are being done in an order-dependent
       manner.  If the position of a new vertex were provided by evaluating the
       function f() below with limited-precision floating-point math, it's not
       necessarily the case that f(a,b,c) == f(c,b,a) in the following code:

           float f(float x, float y, float z)
           {
             return (x + y) + z;
           }

       This function f() can be rewritten as follows with "precise" and a
       symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).

           float f(float x, float y, float z)
           {
             // Note that we intentionally compute "(x+z)" instead of "(x+y)"
             // here, because that value will be the same when <x> and <z>
             // are reversed.
             precise float result = (x + z) + y;
             return result;
           }

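       With this ordering, f(a,b,c) evaluates (a + c) + b and f(c,b,a)
       evaluates (c + a) + b, which are identical because floating-point
       addition is commutative.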

       The "precise" qualifier will disable certain optimizations and thus
       carries a performance cost.  The cost may be higher than "invariant",
       because "invariant" permits optimizations disallowed by "precise" as
       long as the compiler ensures that it always optimizes in the exact same
       manner.
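
       The order dependence discussed in this issue can be reproduced on a
       host CPU.  The C sketch below is an analogy, not shader code: C has no
       "precise" qualifier, and the example relies on the compiler not
       reassociating floating-point additions (i.e., no fast-math options).
       The function names and the input values are chosen for illustration.

```c
#include <assert.h>

/* Order-dependent version: swapping x and z changes which pair is
 * rounded first. */
float f_ordered(float x, float y, float z)
{
    return (x + y) + z;
}

/* Symmetric version: (x + z) is unchanged when x and z are swapped. */
float f_symmetric(float x, float y, float z)
{
    float xz = x + z;
    return xz + y;
}

/* Inputs chosen so that 1.0 is lost when added to a value near 1e20. */
void check_orderings(void)
{
    const float a = 1e20f, b = -1e20f, c = 1.0f;
    assert(f_ordered(a, b, c) != f_ordered(c, b, a));      /* 1.0 vs 0.0 */
    assert(f_symmetric(a, b, c) == f_symmetric(c, b, a));  /* both 0.0   */
}
```

       The symmetric form does not produce a more accurate sum; it only
       guarantees the same sum for both vertex orders, which is the property
       needed to avoid cracking.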

     (23) What computations will be affected by the "precise" qualifier, and
          what computations aren't?

       RESOLVED:  We will ensure precise computation of any expressions within
       a single function used directly or indirectly to produce the value of a
       variable qualified as "precise".

       We chose not to provide this guarantee across function boundaries, even
       if the results of a function are used in the computation of an output
       qualified as "precise".  Algorithms requiring the use of "precise" may
       have a mix of computations, some required to be precise, some not.  This
       function boundary rule may serve to limit the amount of computation
       indirectly forced to be precise.

       Additionally, the function boundary rule permits non-precise
       sub-operations in a computation required to be precise.  For example, a
       shader might need to compute a "precise" position by taking a weighted
       average as in the following code:

         precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]);

       However, if the main precision requirement is that the same result be
       generated when <p> and <w> are reversed, the following code also gets
       the job done, even if posmad() is implemented with multiply-add
       operations.

         vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; }
         precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) +
                             posmad(p[3], w[3], p[2]*w[2]));

       To generate precise results within a function, the function arguments
       and/or temporaries within the function body should be qualified as
       "precise" as needed.

       Note that when applying "precise" rules to assignments, indirect
       application of this rule applies on an assignment-by-assignment basis.
       In the following perverse example:

         float a,b,c,d,e,f;
         precise float g;
         f = a + b + c;
         ...
         f = c + d + e;
         g = f * 2.0;

       The first assignment to <f> need not be treated as "precise", since the
       value assigned will have no effect on the final value of the
       precise-qualified <g>.  The second assignment to <f> must be evaluated
       precisely.  The fact that one assignment to a variable needs to be
       treated as precise does not mean that the variable itself is implicitly
       treated as "precise".

     (24) Are "precise" qualifiers allowed on function arguments?  If so, what
          do they mean?  Can a return value for a function be declared as
          precise?

       RESOLVED:  Yes; the rules permit the use of "precise" on any variable
       declaration, including function arguments.  The code

         float f(precise in vec4 arg1, precise out vec4 arg2) { ... }

       specifies that any expressions used to assign values to <arg1> or <arg2>
       within f() will be evaluated in a precise manner.

       Expressions used to derive the value passed to the function f() as
       <arg1> will be treated as precise according to the normal rules.  The
       expression for <arg1> is treated as precise if and only if the function
       call is on the right-hand side of an assignment to a variable qualified
       as "precise" or is indirectly used in an assignment to such a variable.
       It is not automatically treated as precise just because the formal
       parameter <arg1> is qualified with "precise".

       For the purposes of this rule, variables passed as "out" parameters do
       not count as assignments.  Values assigned to an output parameter will
       not be evaluated precisely just because the caller provides a variable
       qualified as "precise".  When the output parameter itself is qualified
       as "precise", precise evaluation of that output is required within the
       callee.

       We chose not to permit function return values to be qualified as
       "precise", though we could have hypothetically allowed code such as:

         precise float f(float a, float b, float c) { return (a+b)+c; }

       To obtain a precise return value in such a case, use code such as:

         float f(float a, float b, float c)
         {
           precise float result = (a+b) + c;
           return result;
         }

     (25) How does texture gather interact with incomplete textures?

       RESOLVED:  For regular texture lookups, incomplete textures are
       considered to return a texel value with RGBA components of (0,0,0,1).
       For texture gather operations, each texel in the sampled footprint is
       considered to have RGBA components of (0,0,0,1).  When using the
       textureGather() function to select the R, G, or B component of an
       incomplete texture, (0,0,0,0) will be returned.  When selecting the A
       component, (1,1,1,1) will be returned.
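
       The rule above amounts to selecting one component from four texels
       that are all (0,0,0,1).  A minimal model in C, with hypothetical names
       and ignoring texture swizzle state:

```c
#include <assert.h>

/* Component selectors, in RGBA order. */
enum { COMP_R = 0, COMP_G = 1, COMP_B = 2, COMP_A = 3 };

/* Model of a gather from an incomplete texture: every texel in the 2x2
 * footprint is treated as (0,0,0,1), and the selected component of each
 * texel is written to result[]. */
void gather_incomplete(int comp, float result[4])
{
    static const float texel[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    assert(comp >= COMP_R && comp <= COMP_A);
    for (int i = 0; i < 4; i++)
        result[i] = texel[comp];
}
```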


 Revision History

     Rev.    Date    Author    Changes
     ----  --------  --------  -----------------------------------------
     16    03/30/12  pbrown    Fix typo in language restricting the use of
                               EmitStreamVertex()/EndStreamPrimitive() to
                               programs with an output primitive type of
                               points, not an input type of points (bug 8371).

     15    10/17/11  pbrown    Fix prototypes for textureGather and
                               textureGatherOffset to use vec2 coordinates for
                               "2DRect" sampler versions (bug 7964).

     14    01/27/11  pbrown    Add further clarification on the interaction
                               of texture gather and incomplete textures (bug
                               7289).

     13    09/24/10  pbrown    Clarify the interaction of texture gather
                               with swizzle (bug 5910), fixing conflicts
                               between API and GLSL spec language.
                               Consolidate into one copy in the API
                               spec.

     12    03/23/10  pbrown    Update issues section, both fixing/numbering
                               existing issues and including other issues
                               that were left behind in NV_gpu_shader5 when the
                               specs were refactored.

     11    03/23/10  Jon Leech Describe <offset> to interpolateAtOffset
                               without implying it is a constant expression
                               (Bug 6026).

     10    03/07/10  pbrown    Fix typo in an output stream qualifier example.

      9    03/05/10  pbrown    Modify function overloading rules to remove
                               most preferences when converting between
                               two different types.  The only preferences
                               that remain are promoting "float" to "double"
                               over other conversions, and preferring
                               conversion of integers to "float" to converting
                               to "double" (bug 5938).

      8    01/29/10  pbrown    Update the spec to require that the minimum
                               value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS
                               is 4 (bug 5919).

      7    01/21/10  pbrown    Clarify the rules for determining a best match
                               if implicit conversions can result in multiple
                               matching function prototypes.  Modify the rules
                               to pick a best match by comparing pairs of
                               functions, and using any function deemed better
                               than any other choice.  Modify the argument
                               conversion preference rules for overloading to
                               disfavor "int" to "uint" conversions, for
                               backward compatibility with previous GLSL
                               versions.  Add some new discussion of the
                               choices involved to the issues section (bug
                               5938).

      6    01/14/10  pbrown    Minor wording updates from spec reviews.

      5    12/10/09  pbrown    Functionality updates from spec review:
                               Rename fmad to fma.  Fix error in spec
                               language for negative diffs in usubBorrow.

      4    12/10/09  pbrown    Convert from EXT to ARB.

      3    12/08/09  pbrown    Miscellaneous fixes from spec review:  Added
                               missing implementation constants for
                               interpolation offset range and granularity;
                               added explicit section to OpenGL spec describing
                               shader requested interpolation modifiers and
                               functions.  Clean up more dangling "ThreadID"
                               references.  General typo fixes and language
                               clarifications.

      2    10/01/09  pbrown    Renamed gl_ThreadID to gl_InvocationID.

      1              pbrown    Internal revisions.