Name
ARB_gpu_shader5
Name Strings
GL_ARB_gpu_shader5
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Contributors
Barthold Lichtenbelt, NVIDIA
Bill Licea-Kane, AMD
Bruce Merry, ARM
Chris Dodd, NVIDIA
Eric Werness, NVIDIA
Graham Sellers, AMD
Greg Roth, NVIDIA
Jeff Bolz, NVIDIA
Nick Haemel, AMD
Pierre Boudier, AMD
Piers Daniell, NVIDIA
Notice
Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
http://www.khronos.org/registry/speccopyright.html
Specification Update Policy
Khronos-approved extension specifications are updated in response to
issues and bugs prioritized by the Khronos OpenGL Working Group. For
extensions which have been promoted to a core Specification, fixes will
first appear in the latest version of that core Specification, and will
eventually be backported to the extension document. This policy is
described in more detail at
https://www.khronos.org/registry/OpenGL/docs/update_policy.php
Status
Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
Approved by the Khronos Board of Promoters on March 10, 2010.
Version
Version 16, March 30, 2012
Number
ARB Extension #88
Dependencies
This extension is written against the OpenGL 3.2 (Compatibility Profile)
Specification.
This extension is written against Version 1.50 (Revision 09) of the OpenGL
Shading Language Specification.
OpenGL 3.2 and GLSL 1.50 are required.
This extension interacts with ARB_gpu_shader_fp64.
This extension interacts with NV_gpu_shader5.
This extension interacts with ARB_sample_shading.
This extension interacts with ARB_texture_gather.
Overview
This extension provides a set of new features to the OpenGL Shading
Language and related APIs to support capabilities of new GPUs, extending
the capabilities of version 1.50 of the OpenGL Shading Language. Shaders
using the new functionality provided by this extension should enable this
functionality via the construct
#extension GL_ARB_gpu_shader5 : require (or enable)
This extension provides a variety of new features for all shader types,
including:
* support for indexing into arrays of samplers using non-constant
indices, as long as the index doesn't diverge if multiple shader
invocations are run in lockstep;
* extending the uniform block capability of OpenGL 3.1 and 3.2 to allow
shaders to index into an array of uniform blocks;
* support for implicitly converting signed integer types to unsigned
types, as well as more general implicit conversion and function
overloading infrastructure to support new data types introduced by
other extensions;
* a "precise" qualifier allowing computations to be carried out exactly
as specified in the shader source to avoid optimization-induced
invariance issues (which might cause cracking in tessellation);
* new built-in functions supporting:
* fused floating-point multiply-add operations;
* splitting a floating-point number into a significand and exponent
(frexp), or building a floating-point number from a significand and
exponent (ldexp);
* integer bitfield manipulation, including functions to find the
position of the most or least significant set bit, count the number
of one bits, and bitfield insertion, extraction, and reversal;
* packing and unpacking vectors of small fixed-point data types into a
larger scalar; and
* converting floating-point values to or from their integer bit
encodings;
* extending the textureGather() built-in functions provided by
ARB_texture_gather:
* allowing shaders to select any single component of a multi-component
texture to produce the gathered 2x2 footprint;
* allowing shaders to perform a per-sample depth comparison when
gathering the 2x2 footprint using shadow sampler types;
* allowing shaders to use arbitrary offsets computed at run-time to
select a 2x2 footprint to gather from; and
* allowing shaders to use separate independent offsets for each of the
four texels returned, instead of requiring a fixed 2x2 footprint.
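The overview lists the integer bitfield built-ins without defining them in this section. As a rough aid, here is a minimal C sketch of the unsigned (uint) forms of bitfieldExtract, bitfieldInsert, bitfieldReverse, bitCount, findLSB, and findMSB, following our reading of their GLSL definitions. The *_model names are hypothetical, signed-integer variants (whose findMSB behaves differently for negative values) are not modeled, and offset/bits combinations that GLSL leaves undefined are assumed not to occur.

```c
#include <stdint.h>

/* Mask with the low <bits> bits set; handles bits == 32 without UB. */
static uint32_t low_mask_model(int bits)
{
    return (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Extract <bits> bits starting at bit <offset>; bits == 0 yields 0. */
static uint32_t bitfieldExtract_model(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    return (value >> offset) & low_mask_model(bits);
}

/* Replace bits [offset, offset+bits) of <base> with the low bits of <insert>. */
static uint32_t bitfieldInsert_model(uint32_t base, uint32_t insert,
                                     int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t mask = low_mask_model(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Bit n of the input becomes bit 31-n of the result. */
static uint32_t bitfieldReverse_model(uint32_t v)
{
    uint32_t r = 0u;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Number of one bits (clears the lowest set bit each iteration). */
static int bitCount_model(uint32_t v)
{
    int n = 0;
    while (v) { v &= v - 1u; ++n; }
    return n;
}

/* Index of the least significant set bit, or -1 if v == 0. */
static int findLSB_model(uint32_t v)
{
    if (v == 0u) return -1;
    int i = 0;
    while (!(v & 1u)) { v >>= 1; ++i; }
    return i;
}

/* Index of the most significant set bit, or -1 if v == 0. */
static int findMSB_model(uint32_t v)
{
    if (v == 0u) return -1;
    int i = 31;
    while (!(v & (1u << i))) --i;
    return i;
}
```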
This extension also provides some new capabilities for individual
shader types, including:
* support for instanced geometry shaders, where a geometry shader may be
run multiple times for each primitive, including a built-in
gl_InvocationID to identify the invocation number;
* support for emitting vertices in a geometry shader where each vertex
emitted may be directed independently at a specified vertex stream (as
provided by ARB_transform_feedback3), and where each shader output is
associated with a stream;
* support for reading a mask of covered samples in a fragment shader;
and
* support for interpolating a fragment shader input at a programmable
offset relative to the pixel center, a programmable sample number, or
at the centroid.
IP Status
No known IP claims.
New Procedures and Functions
None
New Tokens
Accepted by the <pname> parameter of GetProgramiv:
GEOMETRY_SHADER_INVOCATIONS 0x887F
Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
GetDoublev, and GetInteger64v:
MAX_GEOMETRY_SHADER_INVOCATIONS 0x8E5A
MIN_FRAGMENT_INTERPOLATION_OFFSET 0x8E5B
MAX_FRAGMENT_INTERPOLATION_OFFSET 0x8E5C
FRAGMENT_INTERPOLATION_OFFSET_BITS 0x8E5D
MAX_VERTEX_STREAMS 0x8E71
(note: MAX_GEOMETRY_SHADER_INVOCATIONS,
MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and
FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding
"NV" enums from NV_gpu_program5. MAX_VERTEX_STREAMS is also defined in
ARB_transform_feedback3.)
Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)
Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121
(add two unnumbered subsections after "Texture Access", p. 122)
Instanced Geometry Shaders
For each input primitive received by the geometry shader pipeline stage,
the geometry shader may be run once or multiple times. The number of
times a geometry shader should be executed for each input primitive may be
specified using a layout qualifier in a geometry shader of a linked
program. If the invocation count is not specified in any layout
qualifier, the invocation count will be one.
Each separate geometry shader invocation is assigned a unique invocation
number. For a geometry shader with <N> invocations, each input primitive
spawns <N> invocations, numbered 0 through <N>-1. The built-in input
variable gl_InvocationID may be used by a geometry shader invocation to
determine its invocation number.
When executing instanced geometry shaders, the output primitives generated
from each input primitive are passed to subsequent pipeline stages using
the shader invocation number to order the output. The first primitives
received by the subsequent pipeline stages are those emitted by the shader
invocation numbered zero, followed by those from the shader invocation
numbered one, and so forth. Additionally, all output primitives generated
from a given input primitive are passed to subsequent pipeline stages
before any output primitives generated from subsequent input primitives.
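The ordering guarantee can be modeled in C as follows. This is a minimal sketch, not a description of real hardware: invocations are deliberately "executed" in reverse order to stand in for arbitrary scheduling, each invocation emits exactly one primitive into its own slot, and the slots are forwarded in invocation order, one input primitive at a time. MAX_INVOCATIONS and run_instanced_gs are hypothetical names.

```c
#include <stddef.h>

#define MAX_INVOCATIONS 8  /* hypothetical bound; n must not exceed it */

/* One output primitive, tagged with the input primitive and the
   invocation (gl_InvocationID) that produced it. */
typedef struct { int primitive; int invocation; } OutputPrim;

/* Runs <n> invocations for each of <num_prims> input primitives and
   writes the forwarded output stream to <out>; returns its length. */
static size_t run_instanced_gs(int num_prims, int n, OutputPrim *out)
{
    size_t count = 0;
    for (int p = 0; p < num_prims; ++p) {
        OutputPrim slots[MAX_INVOCATIONS];
        /* Execution order is arbitrary; reverse order is used here. */
        for (int id = n - 1; id >= 0; --id) {
            slots[id].primitive = p;
            slots[id].invocation = id;
        }
        /* Hand-off to later stages is ordered by invocation number, and
           all of primitive p's outputs precede primitive p+1's. */
        for (int id = 0; id < n; ++id)
            out[count++] = slots[id];
    }
    return count;
}
```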
Geometry Shader Vertex Streams
Geometry shaders may emit primitives to multiple independent vertex
streams. Each vertex emitted by the geometry shader is directed at one of
the vertex streams. As vertices are received on each stream, they are
arranged into primitives of the type specified by the geometry shader
output primitive type. The shading language built-in functions
EndPrimitive() and EndStreamPrimitive() may be used to end the primitive
being assembled on a given vertex stream and start a new empty primitive
of the same type. If an implementation supports <N> vertex streams, the
individual streams are numbered 0 through <N>-1. There is no requirement
on the order of the streams to which vertices are emitted, and the number
of vertices emitted to each stream may be completely independent, subject
only to implementation-dependent output limits.
The primitives emitted to all vertex streams are passed to the transform
feedback stage to be captured and written to buffer objects in the manner
specified by the transform feedback state. The primitives emitted to all
streams but stream zero are discarded after transform feedback.
Primitives emitted to stream zero are passed to subsequent pipeline stages
for clipping, rasterization, and subsequent fragment processing.
Geometry shaders that emit vertices to multiple vertex streams are
currently limited to using only the "points" output primitive type. A
program will fail to link if it includes a geometry shader that calls the
EmitStreamVertex() built-in function and has any other output primitive
type parameter.
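The routing and link rules above can be modeled in C with a minimal sketch. It assumes four vertex streams (the real count is MAX_VERTEX_STREAMS), valid stream indices, and the "points" output type, so every emitted vertex is a complete primitive. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_STREAMS 4  /* assumed value of MAX_VERTEX_STREAMS */

typedef struct { int stream; float value; } EmittedVertex;

typedef struct {
    size_t captured[NUM_STREAMS]; /* seen by transform feedback, per stream */
    size_t rasterized;            /* passed on to clipping and rasterization */
} StreamCounts;

/* Link rule: a shader calling EmitStreamVertex() must output points. */
static bool stream_link_ok(bool uses_emit_stream_vertex, bool output_is_points)
{
    return !uses_emit_stream_vertex || output_is_points;
}

static StreamCounts route_vertices(const EmittedVertex *v, size_t n)
{
    StreamCounts c = {{0}, 0};
    for (size_t i = 0; i < n; ++i) {
        c.captured[v[i].stream]++;  /* every stream reaches transform feedback */
        if (v[i].stream == 0)
            c.rasterized++;         /* only stream 0 continues past it */
    }
    return c;
}
```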
Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)
Modify Section 3.3.1, Multisampling, p. 148
(add new paragraph at the end of the section, p. 149)
If MULTISAMPLE is enabled and the current program object includes a
fragment shader with one or more input variables qualified with "sample
in", the data associated with those variables will be assigned
independently. The values for each sample must be evaluated at the
location of the sample. The data associated with any other variables not
qualified with "sample in" need not be evaluated independently for each
sample.
Modify ARB_texture_gather, "Changes to Section 3.8.8"
(extend language describing the operation of textureGather, allowing the
new <comp> argument to select any of the four components from a
multi-component texel vector)
The textureGather and textureGatherOffset built-in shader functions... A
four-component vector is then assembled by taking a single component from
the swizzled texture source colors of the four texels, in the order
T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0. The selected component is
identified by the optional <comp> argument, where the values zero, one,
two, and three identify the Rs, Gs, Bs, or As component, respectively. If
<comp> is omitted, it is treated as identifying the Rs component.
Incomplete textures (section 3.8.10) are considered to return a texture
source color of (0,0,0,1) for all four source texels.
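Component selection and result ordering can be sketched in C as shown here. The sketch assumes <comp> is already known to be 0, 1, 2, or 3 and treats the omitted-argument case as 0. Texel, Vec4, and gather are hypothetical names, and the texels are assumed to be already swizzled.

```c
typedef struct { float r, g, b, a; } Texel;  /* swizzled source color */
typedef struct { float x, y, z, w; } Vec4;

/* Select Rs, Gs, Bs, or As for comp = 0, 1, 2, 3. */
static float select_component(Texel t, int comp)
{
    switch (comp) {
    case 1: return t.g;
    case 2: return t.b;
    case 3: return t.a;
    default: return t.r;  /* comp == 0, also used when <comp> is omitted */
    }
}

/* Assemble the result in the order T_i0_j1, T_i1_j1, T_i1_j0, T_i0_j0. */
static Vec4 gather(Texel t_i0_j0, Texel t_i1_j0, Texel t_i0_j1,
                   Texel t_i1_j1, int comp)
{
    Vec4 out = { select_component(t_i0_j1, comp),
                 select_component(t_i1_j1, comp),
                 select_component(t_i1_j0, comp),
                 select_component(t_i0_j0, comp) };
    return out;
}
```

With an incomplete texture, each texel is (0,0,0,1), so gathering component 3 returns (1,1,1,1) and any other component returns (0,0,0,0).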
(add further language describing textureGatherOffsets)
The textureGatherOffsets built-in functions from the OpenGL Shading
Language return a vector derived from sampling four texels in the image
array of level <level_base>. For each of the four texel offsets specified
by the <offsets> argument, the rules for the LINEAR minification filter
are applied to identify a 2x2 texel footprint, from which the single texel
T_i0_j0 is selected. A four-component vector is then assembled by taking
a single component from each of the four T_i0_j0 texels in the same manner
as for the textureGather function.
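Texel selection for textureGatherOffsets can be sketched in C as follows. It assumes a 2D texture with REPEAT wrapping, unnormalized coordinates u = s * width and v = t * height, and integer texel offsets added to u and v before the LINEAR rule i0 = wrap(floor(u - 1/2)), j0 = wrap(floor(v - 1/2)) is applied. The sketch returns only the four selected texel coordinates; the component step works the same way as for textureGather. All names are hypothetical.

```c
typedef struct { int i, j; } TexelCoord;

/* floor() for the small values used here, avoiding a libm dependency. */
static int ifloor(float x)
{
    int i = (int)x;               /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;
}

/* REPEAT wrap mode (assumed for this sketch). */
static int wrap_repeat(int i, int size)
{
    return ((i % size) + size) % size;
}

/* For each of the four offsets, locate the LINEAR 2x2 footprint and keep
   only its T_i0_j0 texel. */
static void gather_offsets_texels(float s, float t, int width, int height,
                                  const int offsets[4][2], TexelCoord out[4])
{
    float u = s * (float)width;
    float v = t * (float)height;
    for (int k = 0; k < 4; ++k) {
        out[k].i = wrap_repeat(ifloor(u + (float)offsets[k][0] - 0.5f), width);
        out[k].j = wrap_repeat(ifloor(v + (float)offsets[k][1] - 0.5f), height);
    }
}
```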
Modify Section 3.12.1, Shader Variables, p. 273
(insert prior to the last paragraph of the section, p. 274)
When interpolating built-in and user-defined varying variables, the default
screen-space location at which these variables are sampled is defined in
previous rasterization sections. The default location may be overridden by
interpolation qualifiers. When interpolating variables declared using
"centroid in", the variable is sampled at a location within the pixel
covered by the primitive generating the fragment. When interpolating
variables declared using "sample in" when MULTISAMPLE is enabled, the
fragment shader will be invoked separately for each covered sample and the
variable will be sampled at the corresponding sample point.
Additionally, built-in fragment shader functions provide further
fine-grained control over interpolation. The built-in functions
interpolateAtCentroid() and interpolateAtSample() will sample variables as
though they were declared with the "centroid" or "sample" qualifiers,
respectively. The built-in function interpolateAtOffset() will sample
variables at a specified (x,y) offset relative to the center of the pixel.
The range and granularity of offsets supported by this function are
implementation-dependent. If either component of the specified offset is
less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than
MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the
variable is undefined. Not all values of <offset> may be supported; x and
y offsets may be rounded to fixed-point values with the number of fraction
bits given by the implementation-dependent constant
FRAGMENT_INTERPOLATION_OFFSET_BITS.
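The range check and fixed-point snapping can be illustrated in C with a minimal sketch. The limits below are assumed example values, not queried ones; a real application would read them with glGetFloatv and glGetIntegerv. Round-to-nearest is also an assumption, since the text only says offsets "may be rounded". Perspective correction is ignored, so the attribute is treated as a plane a + dadx*x + dady*y around the pixel center. All names are hypothetical.

```c
#include <stdbool.h>

#define OFFSET_BITS 4          /* assumed FRAGMENT_INTERPOLATION_OFFSET_BITS */
#define MIN_OFFSET  (-0.5f)    /* assumed MIN_FRAGMENT_INTERPOLATION_OFFSET */
#define MAX_OFFSET  (0.4375f)  /* assumed MAX_FRAGMENT_INTERPOLATION_OFFSET */

/* Outside this range, the interpolation position is undefined. */
static bool offset_in_range(float o)
{
    return o >= MIN_OFFSET && o <= MAX_OFFSET;
}

/* Snap to a grid with OFFSET_BITS fraction bits (round to nearest). */
static float quantize_offset(float o)
{
    const float scale = (float)(1 << OFFSET_BITS);
    float scaled = o * scale;
    int q = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    return (float)q / scale;
}

/* Evaluate a linearly varying attribute at the pixel center plus offset. */
static float interpolate_at_offset(float a_center, float dadx, float dady,
                                   float ox, float oy)
{
    return a_center + dadx * quantize_offset(ox) + dady * quantize_offset(oy);
}
```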
Modify Section 3.12.2, Shader Execution, p. 274
(insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)
The built-in variable gl_SampleMaskIn[] is an integer array holding
bitfields indicating the set of fragment samples covered by the primitive
corresponding to the fragment shader invocation. The number of elements
in the array is ceil(<s>/32), where <s> is the maximum number of color
samples supported by the implementation. Bit <n> of element <w> in the
array is set if and only if the sample numbered <w>*32+<n> is considered
covered for this fragment shader invocation. When rendering to a
non-multisample buffer, or if multisample rasterization is disabled, all
bits are zero except for bit zero of the first array element. That bit
will be one if the pixel is covered and zero otherwise. Bits in the
sample mask corresponding to covered samples that will be killed due to
SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3). When
per-sample shading is active due to the use of a fragment input qualified
by "sample", only the bit for the current sample is set in
gl_SampleMaskIn. When OpenGL API state specifies multiple fragment shader
invocations for a given fragment, the sample mask for any single fragment
shader invocation may specify a subset of the covered samples for the
fragment. In this case, the bit corresponding to each covered sample will
be set in exactly one fragment shader invocation.
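The array sizing and bit addressing of gl_SampleMaskIn[] reduce to the following C arithmetic. In this minimal sketch, unsigned 32-bit words stand in for GLSL's int elements so that bit 31 can be set without implementation-defined conversions. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ceil(s / 32): number of array elements for <max_samples> samples. */
static int sample_mask_words(int max_samples)
{
    return (max_samples + 31) / 32;
}

/* Sample n lives in element n / 32, bit n % 32. */
static void set_sample_bit(uint32_t *mask, int n)
{
    mask[n / 32] |= 1u << (n % 32);
}

static bool sample_covered(const uint32_t *mask, int n)
{
    return (mask[n / 32] >> (n % 32)) & 1u;
}
```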
Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)
None.
Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)
Modify Section 6.1.16, Shader and Program Queries, p. 384
(add to long first paragraph, p. 386) ... If <pname> is
GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per
primitive will be returned. If GEOMETRY_VERTICES_OUT,
GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS
are queried for a program which has not been linked successfully, or which
does not contain objects to form a geometry shader, then an
INVALID_OPERATION error is generated.
Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)
None.
Additions to the AGL/GLX/WGL Specifications
None.
Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_ARB_gpu_shader5 : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_ARB_gpu_shader5 1
Modify Section 3.6, Keywords, p. 14
(add to the keyword list)
sample
Modify Section 4.1.7, Samplers, p. 23
(modify 1st paragraph of the section, deleting the restriction requiring
constant indexing of sampler arrays but still requiring uniform indexing
across invocations) ... Samplers may be aggregated into arrays within a
shader (using square brackets [ ]) and can be indexed with general integer
expressions. The results of accessing a sampler array with an
out-of-bounds index are undefined. ...
(add new paragraph restricting the use of general integer expression in
sampler array indexing) When indexing an array of samplers, the integer
expression used to index the array must be uniform across shader
invocations. If this restriction is not satisfied, the results of
accessing the sampler array are undefined. For the purposes of this
uniformity test, the index used for texture lookups performed inside a
loop is considered uniform for the <n>th loop iteration if all shader
invocations that execute the loop at least <n> times compute the same
index on that iteration. For texture lookups inside a function other than
main(), an index is considered uniform if the value is the same for all
invocations calling the function from the same point in the caller. For
nested loops and function calls, the uniformity test requires that the
index match only those other shader invocations with identical loop
iteration counts and function call chains.
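For a single loop, the per-iteration uniformity rule can be checked in C with a minimal sketch. Each invocation records how many iterations it ran and the sampler index it computed on each iteration. For every iteration n, the invocations that reached iteration n must agree. Nested loops and function call chains are not modeled, and MAX_ITERS and loop_index_uniform are hypothetical names.

```c
#include <stdbool.h>

#define MAX_ITERS 8  /* hypothetical bound; iters[k] must not exceed it */

/* iters[k]: iteration count of invocation k.
   idx[k][n]: sampler-array index computed by invocation k on iteration n. */
static bool loop_index_uniform(int num_inv, const int iters[],
                               int idx[][MAX_ITERS])
{
    for (int n = 0; n < MAX_ITERS; ++n) {
        bool have = false;
        int expected = 0;
        for (int k = 0; k < num_inv; ++k) {
            if (iters[k] <= n) continue;  /* invocation k never reached n */
            if (!have) {
                expected = idx[k][n];
                have = true;
            } else if (idx[k][n] != expected) {
                return false;             /* divergent index: undefined */
            }
        }
    }
    return true;
}
```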
Modify Section 4.1.10, Implicit Conversions, p. 27
(modify table of implicit conversions)
Type of expression      Can be implicitly converted to
---------------------   ------------------------------
int                     uint, float
ivec2                   uvec2, vec2
ivec3                   uvec3, vec3
ivec4                   uvec4, vec4
uint                    float
uvec2                   vec2
uvec3                   vec3
uvec4                   vec4
(modify second paragraph of the section) No implicit conversions are
provided to convert from unsigned to signed integer types or from
floating-point to integer types. There are no implicit array or structure
conversions.
(insert before the final paragraph of the section) When performing
implicit conversion for binary operators, there may be multiple data types
to which the two operands can be converted. For example, when adding an
int value to a uint value, both values can be implicitly converted to uint
and float. In such cases, a floating-point type is chosen if either
operand has a floating-point type. Otherwise, an unsigned integer type is
chosen if either operand has an unsigned integer type. Otherwise, a
signed integer type is chosen.
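The conversion table and the operand-type rule above can be modeled in C with a minimal sketch. It covers scalars only, since the vector rows follow the same pattern componentwise, and it omits the types added by other extensions (for example, double from ARB_gpu_shader_fp64). ScalarType and the function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { T_INT, T_UINT, T_FLOAT } ScalarType;

/* Implicit conversions permitted by the modified table. */
static bool can_implicitly_convert(ScalarType from, ScalarType to)
{
    if (from == to) return true;
    if (from == T_INT) return to == T_UINT || to == T_FLOAT;
    if (from == T_UINT) return to == T_FLOAT;
    return false;  /* no float -> integer, no uint -> int */
}

/* Common type for a binary operator with mixed operand types:
   floating-point wins, then unsigned, then signed. */
static ScalarType binary_op_type(ScalarType a, ScalarType b)
{
    if (a == T_FLOAT || b == T_FLOAT) return T_FLOAT;
    if (a == T_UINT || b == T_UINT) return T_UINT;
    return T_INT;
}
```

For int + uint, both uint and float are reachable, and the rule picks uint.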
Modify Section 4.3, Storage Qualifiers, p. 29
(add to first table on the page)
Qualifier      Meaning
-------------- ----------------------------------------
sample in      linkage with per-sample interpolation
sample out     linkage with per-sample interpolation
(modify third paragraph, p. 29) These interpolation qualifiers may only
precede the qualifiers in, centroid in, sample in, out, centroid out, or
sample out in a declaration. ...
Modify Section 4.3.4, Inputs, p. 31
(modify first paragraph of section) Shader input variables are declared
with the in, centroid in, or sample in storage qualifiers. ... Variables
declared as in, centroid in, or sample in may not be written to during
shader execution. ...
(modify third paragraph, p. 32) ... Fragment shader inputs get
per-fragment values, typically interpolated from a previous stage's
outputs. They are declared in fragment shaders with the in, centroid in,
or sample in storage qualifiers or the deprecated varying and centroid
varying storage qualifiers. ...
(add to examples immediately below)
sample in vec4 perSampleColor;
Modify Section 4.3.6, Outputs, p. 33
(modify first paragraph of section) Shader output variables are declared
with the out, centroid out, or sample out storage qualifiers. ...
(modify third paragraph of section) Vertex and geometry output variables
output per-vertex data and are declared using the out, centroid out, or
sample out storage qualifiers, or the deprecated varying storage
qualifier.
(add to examples immediately below)
sample out vec4 perSampleColor;
(modify last paragraph, p. 33) Fragment outputs output per-fragment data
and are declared using the out storage qualifier. It is an error to use
centroid out or sample out in a fragment shader. ...
Modify Section 4.3.7, Interface Blocks, p. 34
(modify last paragraph, p. 36, removing the requirement for indexing
uniform blocks using constant expressions) For uniform blocks declared as
arrays, each individual array element corresponds to a separate buffer
object backing one instance of the block. As the array size indicates the
number of buffer objects needed, uniform block array declarations must
specify an integral array size. Arbitrary indices may be used to index a
uniform block array; integral constant expressions are not required. If
the index used to access an array of uniform blocks is out-of-bounds, the
results of the access are undefined.
Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37
(modify last paragraph, p. 37, and subsequent paragraphs on p. 38)
Geometry shaders support input layout qualifiers. There are two types of
layout qualifiers used to specify an input primitive type and an
invocation count. The input primitive type and invocation count
qualifiers are allowed only on the interface qualifier in, not on an input
block, block member, or variable.
layout-qualifier-id
points
lines
lines_adjacency
triangles
triangles_adjacency
invocations = integer-constant
The identifiers "points", "lines", "lines_adjacency", "triangles", and
"triangles_adjacency" are used to specify the type of input primitive
accepted by the geometry shader, and only one of these is accepted. At
least one geometry shader (compilation unit) in a program must declare an
input primitive type, and all geometry shader input primitive type
declarations in a program must declare the same type. It is not required
that all geometry shaders in a program declare an input primitive type.
The identifier "invocations" is used to specify the number of times the
geometry shader is invoked for each input primitive received. Invocation
count declarations are optional. If no invocation count is declared in
any geometry shader in the program, the geometry shader will be run once
for each input primitive. If an invocation count is declared, all such
declarations must specify the same count. If a shader specifies an
invocation count greater than the implementation-dependent maximum, it
will fail to compile.
For example,
layout(triangles, invocations=6) in;
will establish that all inputs to the geometry shader are triangles and
that the geometry shader is run six times for each triangle processed.
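The program-wide handling of "invocations" declarations can be sketched in C as follows. In this sketch, each compilation unit contributes 0 if it has no invocations qualifier, or its declared count otherwise. The effective count is 1 when no unit declares one, and conflicting counts are reported as -1. Treating the conflict as a link-time failure is our assumption, since the text only says the declarations must match. The maximum would come from MAX_GEOMETRY_SHADER_INVOCATIONS. All names are hypothetical.

```c
#include <stdbool.h>

/* decls[u] == 0 means compilation unit u has no invocations qualifier.
   Returns the effective invocation count, or -1 if declarations differ. */
static int effective_invocations(const int decls[], int num_units)
{
    int count = 0;
    for (int u = 0; u < num_units; ++u) {
        if (decls[u] == 0) continue;
        if (count != 0 && decls[u] != count) return -1;
        count = decls[u];
    }
    return count == 0 ? 1 : count;  /* default: one invocation */
}

/* A declared count above the implementation maximum fails to compile. */
static bool invocations_compile_ok(int declared, int max_invocations)
{
    return declared <= max_invocations;
}
```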
All geometry shader input unsized array declarations ...
Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40
(modify second and subsequent paragraphs, p. 40)
Geometry shaders can have output layout qualifiers. There are three types
of output layout qualifiers used to specify an output primitive type, a
maximum output vertex count, and per-output stream numbers. The output
primitive type and output vertex count qualifiers are allowed only on the
interface qualifier out, not on an output block, block member, or variable
declaration. The output stream number qualifier is allowed on the
interface qualifier out, or on output blocks or variable declarations.
The layout qualifier identifiers for geometry shader outputs are
layout-qualifier-id
points
line_strip
triangle_strip
max_vertices = integer-constant
stream = integer-constant
The identifiers "points", "line_strip", and "triangle_strip" are used to
specify the type of output primitive produced by the geometry shader, and
only one of these is accepted. At least one geometry shader (compilation
unit) in a program must declare an output primitive type, and all geometry
shader output primitive type declarations in a program must declare the
same primitive type. It is not required that all geometry shaders in a
program declare an output primitive type.
The identifier "max_vertices" is used to specify the maximum number of
vertices the shader will ever emit in a single invocation. At least one
geometry shader (compilation unit) in a program must declare a maximum
output vertex count, and all geometry shader output vertex count
declarations in a program must declare the same count. It is not required
that all geometry shaders in a program declare a count.
In the example,
layout(triangle_strip, max_vertices = 60) out; // order does not matter
layout(max_vertices = 60) out; // redeclaration okay
layout(triangle_strip) out; // redeclaration okay
layout(points) out; // error, contradicts triangle_strip
layout(max_vertices = 30) out; // error, contradicts 60
all outputs from the geometry shader are triangles and at most 60 vertices
will be emitted by the shader. It is an error for the maximum number of
vertices to be greater than gl_MaxGeometryOutputVertices.
The identifier "stream" is used to specify that a geometry shader output
variable or block is associated with a particular vertex stream (numbered
beginning with zero). A default stream number may be declared at global
scope by qualifying interface qualifier out as in this example:
layout(stream = 1) out;
The stream number specified in such a declaration replaces any previous
default and applies to all subsequent block and variable declarations
until a new default is established. The initial default stream number is
zero.
Each output block or non-block output variable is associated with a vertex
stream. If the block or variable is declared with a stream qualifier, it
is associated with the specified stream; otherwise, it is associated with
the current default stream. A block member may be declared with a stream
qualifier, but the specified stream must match the stream associated with
the containing block. One example:
layout(stream=1) out; // default is now stream 1
out vec4 var1; // var1 gets default stream (1)
layout(stream=2) out Block1 { // "Block1" belongs to stream 2
layout(stream=2) vec4 var2; // redundant block member stream decl
layout(stream=3) vec2 var3; // ILLEGAL (must match block stream)
vec3 var4; // belongs to stream 2
};
layout(stream=0) out; // default is now stream 0
out vec4 var5; // var5 gets default stream (0)
out Block2 { // "Block2" gets default stream (0)
vec4 var6;
};
layout(stream=3) out vec4 var7; // var7 belongs to stream 3
If a geometry shader output block or variable is declared more than once,
all such declarations must associate the variable with the same vertex
stream. If any stream declaration specifies a non-existent stream number,
the shader will fail to compile.
Built-in geometry shader outputs are always associated with vertex stream
zero.
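The default-stream bookkeeping can be sketched in C and replayed against the declarations in the example above. In this minimal model, NO_STREAM marks a declaration without a stream qualifier, and only the resolution and block-member rules are covered. Checks for non-existent stream numbers and redeclarations are omitted. All names are hypothetical.

```c
#include <stdbool.h>

#define NO_STREAM (-1)  /* declaration carries no stream qualifier */

typedef struct { int default_stream; } StreamScope;  /* initially 0 */

/* layout(stream = s) out; */
static void set_default_stream(StreamScope *sc, int s)
{
    sc->default_stream = s;
}

/* Stream for an output block or non-block output variable. */
static int resolve_stream(const StreamScope *sc, int declared)
{
    return declared == NO_STREAM ? sc->default_stream : declared;
}

/* A block member may restate its block's stream but not name another. */
static bool member_stream_ok(int block_stream, int member_declared)
{
    return member_declared == NO_STREAM || member_declared == block_stream;
}
```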
Each vertex emitted by the geometry shader is assigned to a specific
stream, and the attributes of the emitted vertex are taken from the set of
output blocks and variables assigned to the targeted stream. After each
vertex is emitted, the values of all output variables become undefined.
Additionally, the output variables associated with each vertex stream may
share storage. Writing to an output variable associated with one stream
may overwrite output variables associated with any other stream. When
emitting each vertex, a geometry shader should write to all outputs
associated with the stream to which the vertex will be emitted and to no
outputs associated with any other stream.
Modify Section 4.3.9, Interpolation, p. 42
(modify first paragraph of section, add reference to sample in/out) The
presence of and type of interpolation is controlled by the storage
qualifiers centroid in, sample in, centroid out, and sample out, by the
optional interpolation qualifiers smooth, flat, and noperspective, and by
default behaviors established through the OpenGL API when no interpolation
qualifier is present. ...
(modify second paragraph) ... A variable may be qualified as flat centroid
or flat sample, which will mean the same thing as qualifying it only as
flat.
(replace last paragraph, p. 42)
When multisample rasterization is disabled, or for fragment shader input
variables qualified with neither "centroid in" nor "sample in", the value
of the assigned variable may be interpolated anywhere within the pixel and
a single value may be assigned to each sample within the pixel, to the
extent permitted by the OpenGL Specification.
When multisample rasterization is enabled, "centroid" and "sample" may be
used to control the location and frequency of the sampling of the
qualified fragment shader input. If a fragment shader input is qualified
with "centroid", a single value may be assigned to that variable for all
samples in the pixel, but that value must be interpolated at a location
that lies in both the pixel and in the primitive being rendered, including
any of the pixel's samples covered by the primitive. Because the location
at which the variable is sampled may be different in neighboring pixels,
derivatives of centroid-sampled inputs may be less accurate than those for
non-centroid interpolated variables. If a fragment shader input is
qualified with "sample", a separate value must be assigned to that
variable for each covered sample in the pixel, and that value must be
sampled at the location of the individual sample.
(Insert before Section 4.7, Order of Qualification, p. 47)
Section 4.Q, The Precise Qualifier
Some algorithms may require that floating-point computations be carried
out in exactly the manner specified in the source code, even if the
implementation supports optimizations that could produce nearly equivalent
results with higher performance. For example, many GL implementations
support a "multiply-add" that can compute values such as
float result = (float(a) * float(b)) + float(c);
in a single operation. The result of a floating-point multiply-add may
not always be identical to first doing a multiply yielding a
floating-point result, and then doing a floating-point add. By default,
implementations are permitted to perform optimizations that effectively
modify the order of the operations used to evaluate an expression, even if
those optimizations may produce slightly different results relative to
unoptimized code.
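The difference between the two evaluation strategies can be demonstrated in C with a minimal sketch that assumes IEEE-754 binary32 float and binary64 double. The unfused path rounds the product to float before adding. The single-rounding path computes in double, where the product of two floats is exact, so only the final conversion rounds. This equals a true fused multiply-add for the inputs tested here, but it is not a general FMA substitute, because a double sum can itself round. The function names are hypothetical.

```c
/* Unfused: round a*b to float, then round the sum. The volatile store
   forces the rounding and keeps the compiler from contracting the
   expression into an FMA instruction. */
static float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Single rounding for these inputs: a 24-bit by 24-bit product fits
   exactly in double's 53-bit significand. */
static float mul_add_single_rounding(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13, and c = -1, the exact product 1 - 2^-26 rounds to 1.0f in float, so the unfused result is 0. The single-rounding result keeps -2^-26.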
The qualifier "precise" will ensure that operations contributing to a
variable's value are performed in the order and with the precision
specified in the source code. Order of evaluation is determined by
operator precedence and parentheses, as described in Section 5.
Expressions must be evaluated with a precision consistent with the
operation; for example, multiplying two "float" values must produce a
single value with "float" precision. This effectively prohibits the
arbitrary use of fused multiply-add operations if the intermediate
multiply result is kept at a higher precision. For example:
precise out vec4 position;
declares that computations used to produce the value of "position" must be
performed precisely using the order and precision specified. As with the
invariant qualifier (section 4.6.1), the precise qualifier may be used to
qualify a built-in or previously declared user-defined variable as being
precise:
out vec3 Color;
precise Color; // make existing Color be precise
This qualifier will affect the evaluation of expressions used on the
right-hand side of an assignment if and only if:
* the variable assigned to is qualified as "precise"; or
* the value assigned is used later in the same function, either directly
or indirectly, on the right-hand side of an assignment to a variable
declared as "precise".
Expressions computed in a function are treated as precise only if assigned
to a variable qualified as "precise" in that same function. Any other
expressions within a function are not automatically treated as precise,
even if they are used to determine a value that is returned by the
function and directly assigned to a variable qualified as "precise".
Some examples of the use of "precise" include:
in vec4 a, b, c, d;
precise out vec4 v;
float func(float e, float f, float g, float h)
{
    return (e*f) + (g*h);               // no special precision
}
float func2(float e, float f, float g, float h)
{
    precise float result = (e*f) + (g*h);  // ensures a precise return value
    return result;
}
void func3(float i, float j, precise out float k)
{
    k = i * i + j;                      // precise, due to <k> declaration
}
void main(void)
{
    vec3 r = vec3(a * b);               // precise, used to compute v.xyz
    vec3 s = vec3(c * d);               // precise, used to compute v.xyz
    v.xyz = r + s;                      // precise
    v.w = (a.w * b.w) + (c.w * d.w);    // precise
    v.x = func(a.x, b.x, c.x, d.x);     // values computed in func()
                                        // are NOT precise
    v.x = func2(a.x, b.x, c.x, d.x);    // precise!
    func3(a.x * b.x, c.x * d.x, v.x);   // precise!
}
Modify Section 4.7, Order of Qualification, p. 47
When multiple qualifications are present, they must follow a strict order.
This order is as follows:
precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier precision-qualifier
Modify Section 5.9, Expressions, p. 57
(modify bulleted list as follows, adding support for implicit conversion
between signed and unsigned types)
Expressions in the shading language are built from the following:
* Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
types, and all matrix types.
...
* The operator modulus (%) operates on signed or unsigned integer scalars
or vectors. If the fundamental types of the operands do not match, the
conversions from Section 4.1.10 "Implicit Conversions" are applied to
produce matching types. ...
Modify Section 6.1, Function Definitions, p. 63
(modify description of overloading, beginning at the top of p. 64)
Function names can be overloaded. The same function name can be used for
multiple functions, as long as the parameter types differ. If a function
name is declared twice with the same parameter types, then the return
types and all qualifiers must also match, and it is the same function
being declared. For example,
vec4 f(in vec4 x, out vec4 y); // (A)
vec4 f(in vec4 x, out uvec4 y); // (B) okay, different argument type
vec4 f(in ivec4 x, out uvec4 y); // (C) okay, different argument type
int f(in vec4 x, out vec4 y); // error, only return type differs
vec4 f(in vec4 x, in vec4 y); // error, only qualifier differs
vec4 f(const in vec4 x, out vec4 y); // error, only qualifier differs
When function calls are resolved, an exact type match for all the
arguments is sought. If an exact match is found, all other functions are
ignored, and the exact match is used. If no exact match is found, then
the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
applied to find a match. Mismatched types on input parameters (in or
inout or default) must have a conversion from the calling argument type
to the formal parameter type. Mismatched types on output parameters (out
or inout) must have a conversion from the formal parameter type to the
calling argument type.
If implicit conversions can be used to find more than one matching
function, a single best-matching function is sought. To determine a best
match, the conversions between calling argument and formal parameter
types are compared for each function argument and pair of matching
functions. After these comparisons are performed, each pair of matching
functions is compared. A function definition A is considered a better
match than function definition B if:
* for at least one function argument, the conversion for that argument
in A is better than the corresponding conversion in B; and
* there is no function argument for which the conversion in B is better
than the corresponding conversion in A.
If a single function definition is considered a better match than every
other matching function definition, it will be used. Otherwise, a
semantic error occurs and the shader will fail to compile.
To determine whether the conversion for a single argument in one match is
better than that for another match, the following rules are applied, in
order:
1. An exact match is better than a match involving any implicit
conversion.
2. A match involving an implicit conversion from float to double is
better than a match involving any other implicit conversion.
3. A match involving an implicit conversion from either int or uint to
float is better than a match involving an implicit conversion from
either int or uint to double.
If none of the rules above apply to a particular pair of conversions,
neither conversion is considered better than the other.
For the function prototypes (A), (B), and (C) above, the following
examples show how the rules apply to different sets of calling argument
types:
f(vec4, vec4); // exact match of vec4 f(in vec4 x, out vec4 y)
f(vec4, uvec4); // exact match of vec4 f(in vec4 x, out uvec4 y)
f(vec4, ivec4); // matched to vec4 f(in vec4 x, out vec4 y)
// (C) not relevant, can't convert vec4 to
// ivec4. (A) better than (B) for 2nd
// argument (rule 2), same on first argument.
f(ivec4, vec4); // NOT matched. All three match by implicit
// conversion. (C) is better than (A) and (B)
// on the first argument. (A) is better than
// (B) and (C) on the second argument.
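The best-match procedure above can be prototyped outside a compiler. The C sketch below is a host-side model, not part of any GLSL or OpenGL API: the Conv classification and all function names are hypothetical, and it assumes the conversion needed for each argument of each candidate has already been determined (for out parameters, the conversion from the formal type to the calling argument type).

```c
#include <stdbool.h>

/* Hypothetical classification of the implicit conversion needed for one
   argument of one candidate function. */
typedef enum {
    CONV_EXACT,            /* types already match */
    CONV_FLOAT_TO_DOUBLE,  /* float -> double */
    CONV_INT_TO_FLOAT,     /* int or uint -> float */
    CONV_INT_TO_DOUBLE,    /* int or uint -> double */
    CONV_OTHER             /* any other conversion, e.g. int -> uint */
} Conv;

/* True if conversion a is better than conversion b under rules 1-3. */
static bool conv_better(Conv a, Conv b)
{
    if (a == b) return false;
    if (a == CONV_EXACT) return true;            /* rule 1 */
    if (b == CONV_EXACT) return false;
    if (a == CONV_FLOAT_TO_DOUBLE) return true;  /* rule 2 */
    if (b == CONV_FLOAT_TO_DOUBLE) return false;
    return a == CONV_INT_TO_FLOAT && b == CONV_INT_TO_DOUBLE;  /* rule 3 */
}

/* Candidate a beats candidate b if it is better for at least one argument
   and worse for none. */
static bool match_better(const Conv *a, const Conv *b, int nargs)
{
    bool some_better = false;
    for (int i = 0; i < nargs; i++) {
        if (conv_better(b[i], a[i])) return false;
        if (conv_better(a[i], b[i])) some_better = true;
    }
    return some_better;
}

/* Candidates are stored row-major: cands[c * nargs + i]. Returns the index
   of the single candidate that beats every other one, or -1 if ambiguous. */
static int best_match(const Conv *cands, int ncands, int nargs)
{
    for (int c = 0; c < ncands; c++) {
        bool beats_all = true;
        for (int d = 0; d < ncands && beats_all; d++)
            if (d != c && !match_better(&cands[c * nargs], &cands[d * nargs], nargs))
                beats_all = false;
        if (beats_all) return c;
    }
    return -1;
}
```

Encoding the f(ivec4, vec4) call above as rows (A) {CONV_INT_TO_FLOAT, CONV_EXACT}, (B) {CONV_INT_TO_FLOAT, CONV_INT_TO_FLOAT}, and (C) {CONV_EXACT, CONV_INT_TO_FLOAT} makes best_match() return -1, which corresponds to the compile-time ambiguity.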
Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69
(add to the list of geometry shader special variables, p. 69)
in int gl_InvocationID;
(add to the end of the section, p. 71)
The input variable gl_InvocationID is available in the geometry language
and is filled with an integer holding the invocation number associated
with the given shader invocation. If the program is linked to support
multiple geometry shader invocations per input primitive, the invocations
are numbered 0, 1, 2, ..., <N>-1. gl_InvocationID is not available in the
vertex or fragment language.
Modify Section 7.2, Fragment Shader Special Variables, p. 72
(add to the list of built-in variables)
in int gl_SampleMaskIn[];
The variable gl_SampleMaskIn is an array of integers, each holding a
bitfield indicating the set of samples covered by the primitive generating
the fragment during multisample rasterization. The array has ceil(<s>/32)
elements, where <s> is the maximum number of color samples supported by
the implementation. Bit <n> of word <w> in the bitfield is set if and
only if the sample numbered <w>*32+<n> is considered covered for this
fragment shader invocation.
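As an illustration of the word/bit layout, the following C sketch models gl_SampleMaskIn[] as an array of int32_t on the host. The helper names are hypothetical; this is not shader code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of 32-bit words needed for s samples: ceil(s / 32). */
static int sample_mask_words(int max_samples)
{
    return (max_samples + 31) / 32;
}

/* True if sample `sample` is marked covered: bit n of word w maps to
   sample w*32+n. Words are stored as signed ints, like the built-in. */
static bool sample_covered(const int32_t *mask_words, int sample)
{
    uint32_t word = (uint32_t)mask_words[sample / 32];  /* well-defined modulo 2^32 */
    return ((word >> (sample % 32)) & 1u) != 0u;
}
```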
Modify Section 8.3, Common Functions, p. 84
(add support for floating-point multiply-add)
Syntax:
genType fma(genType a, genType b, genType c);
The function fma() performs a fused floating-point multiply-add to compute
the value a*b+c. The results of fma() may not be identical to evaluating
the expression (a*b)+c, because the computation may be performed in a
single operation with intermediate precision different from that used to
compute a non-fma() expression.
The results of fma() are guaranteed to be invariant given fixed inputs
<a>, <b>, and <c>, as though the result were taken from a variable
declared as "precise".
(add support for single-precision frexp and ldexp functions)
Syntax:
genType frexp(genType x, out genIType exp);
genType ldexp(genType x, in genIType exp);
The function frexp() splits each single-precision floating-point number in
<x> into a binary significand, a floating-point number in the range [0.5,
1.0), and an integral exponent of two, such that:
x = significand * 2 ^ exponent
The significand is returned by the function; the exponent is returned in
the parameter <exp>. For a floating-point value of zero, the significand
and exponent are both zero. For a floating-point value that is an
infinity or is not a number, the results of frexp() are undefined.
If the input <x> is a vector, this operation is performed in a
component-wise manner; the value returned by the function and the value
written to <exp> are vectors with the same number of components as <x>.
The function ldexp() builds a single-precision floating-point number from
each significand component in <x> and the corresponding integral exponent
of two in <exp>, returning:
significand * 2 ^ exponent
If this product is too large to be represented as a single-precision
floating-point value, the result is considered undefined.
If the input <x> is a vector, this operation is performed in a
component-wise manner; the value passed in <exp> and returned by the
function are vectors with the same number of components as <x>.
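One way to see how frexp() relates to the IEEE 754 encoding is a host-side C model built on the exponent field. This is a sketch under stated assumptions: float is IEEE 754 binary32, subnormal inputs are not handled, and the names frexp_sketch() and ldexp_sketch() are hypothetical (C's own frexpf() and ldexpf() would serve in real host code).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frexp() model for one component. Zero yields (x, 0);
   subnormals are not handled; infinities and NaNs are undefined per the spec. */
static float frexp_sketch(float x, int *exp)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);               /* IEEE 754 encoding of x */
    if ((bits & 0x7FFFFFFFu) == 0u) {             /* +0.0 or -0.0 */
        *exp = 0;
        return x;
    }
    int biased = (int)((bits >> 23) & 0xFFu);
    *exp = biased - 126;                          /* 1.m * 2^(b-127) == 0.1m * 2^(b-126) */
    bits = (bits & 0x807FFFFFu) | (126u << 23);   /* keep sign and mantissa; magnitude in [0.5, 1.0) */
    float sig;
    memcpy(&sig, &bits, sizeof sig);
    return sig;
}

/* Hypothetical ldexp() model: each scaling by 2 or 0.5 is exact unless the
   result overflows or becomes subnormal. */
static float ldexp_sketch(float sig, int exp)
{
    while (exp > 0) { sig *= 2.0f; exp--; }
    while (exp < 0) { sig *= 0.5f; exp++; }
    return sig;
}
```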
(add support for new integer built-in functions)
Syntax:
genIType bitfieldExtract(genIType value, int offset, int bits);
genUType bitfieldExtract(genUType value, int offset, int bits);
genIType bitfieldInsert(genIType base, genIType insert, int offset,
int bits);
genUType bitfieldInsert(genUType base, genUType insert, int offset,
int bits);
genIType bitfieldReverse(genIType value);
genUType bitfieldReverse(genUType value);
genIType bitCount(genIType value);
genIType bitCount(genUType value);
genIType findLSB(genIType value);
genIType findLSB(genUType value);
genIType findMSB(genIType value);
genIType findMSB(genUType value);
The function bitfieldExtract() extracts bits <offset> through
<offset>+<bits>-1 from each component in <value>, returning them in the
least significant bits of the corresponding component of the result. For
unsigned data types, the most significant bits of the result will be set
to zero. For signed data types, the most significant bits will be set to
the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be
zero. The result will be undefined if <offset> or <bits> is negative, or
if the sum of <offset> and <bits> is greater than the number of bits used
to store the operand. Note that for vector versions of bitfieldExtract(),
a single pair of <offset> and <bits> values is shared for all components.
The function bitfieldInsert() inserts the <bits> least significant bits of
each component of <insert> into the corresponding component of <base>.
The result will have bits numbered <offset> through <offset>+<bits>-1
taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
directly from the corresponding bits of <base>. If <bits> is zero, the
result will simply be <base>. The result will be undefined if <offset> or
<bits> is negative, or if the sum of <offset> and <bits> is greater than
the number of bits used to store the operand. Note that for vector
versions of bitfieldInsert(), a single pair of <offset> and <bits> values
is shared for all components.
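The bit ranges described for bitfieldExtract() and bitfieldInsert() can be checked against a scalar 32-bit C model. The functions below are hypothetical host-side sketches for a single component; they assume valid arguments (non-negative <offset> and <bits> with <offset>+<bits> <= 32), since the spec leaves other cases undefined. The signed bitfieldInsert() operates on the same bits as the unsigned one and is omitted.

```c
#include <stdint.h>
#include <string.h>

/* Mask with the low `bits` bits set (bits in 1..32). */
static uint32_t low_mask(int bits)
{
    return bits == 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Reinterprets a 32-bit pattern as a two's-complement int32_t. */
static int32_t as_int32(uint32_t u)
{
    int32_t i;
    memcpy(&i, &u, sizeof i);
    return i;
}

static uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    return (value >> offset) & low_mask(bits);    /* upper result bits are zero */
}

static int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0) return 0;
    uint32_t field = bitfield_extract_u((uint32_t)value, offset, bits);
    uint32_t sign = 1u << (bits - 1);             /* was bit offset+bits-1 of the input */
    return as_int32((field ^ sign) - sign);       /* sign-extend from that bit */
}

static uint32_t bitfield_insert_u(uint32_t base, uint32_t insert, int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t mask = low_mask(bits) << offset;     /* bits offset..offset+bits-1 */
    return (base & ~mask) | ((insert << offset) & mask);
}
```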
The function bitfieldReverse() reverses the bits of <value>. The bit
numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
<value>, where <bits> is the total number of bits used to represent
<value>.
The function bitCount() returns the number of one bits in the binary
representation of <value>.
The function findLSB() returns the bit number of the least significant one
bit in the binary representation of <value>. If <value> is zero, -1 will
be returned.
The function findMSB() returns the bit number of the most significant bit
in the binary representation of <value>. For positive integers, the
result will be the bit number of the most significant one bit. For
negative integers, the result will be the bit number of the most
significant zero bit. For a <value> of zero or negative one, -1 will be
returned.
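A scalar C model of the remaining bit-query built-ins follows. The names are hypothetical, and the loops favor clarity over speed. Signed inputs to bitfieldReverse(), bitCount(), and findLSB() can be handled by converting to uint32_t first, since those functions depend only on the bit pattern; findMSB() is the one whose signed form behaves differently.

```c
#include <stdint.h>

static uint32_t bitfield_reverse_u(uint32_t v)
{
    uint32_t r = 0;
    for (int n = 0; n < 32; n++)
        if (v & (1u << n))
            r |= 1u << (31 - n);       /* bit n moves to bit (32-1)-n */
    return r;
}

static int bit_count_u(uint32_t v)
{
    int c = 0;
    while (v) { v &= v - 1u; c++; }    /* clears the lowest set bit each pass */
    return c;
}

static int find_lsb_u(uint32_t v)
{
    if (v == 0u) return -1;
    int n = 0;
    while (!(v & 1u)) { v >>= 1; n++; }
    return n;
}

static int find_msb_u(uint32_t v)
{
    for (int n = 31; n >= 0; n--)
        if (v & (1u << n)) return n;
    return -1;
}

/* Signed findMSB: for negative values, find the most significant zero bit,
   i.e. the most significant one bit of the complement. */
static int find_msb_i(int32_t v)
{
    uint32_t u = (uint32_t)v;          /* well-defined: reduces modulo 2^32 */
    return find_msb_u(v < 0 ? ~u : u);
}
```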
(add support for general packing functions)
Syntax:
uint packUnorm2x16(vec2 v);
uint packUnorm4x8(vec4 v);
uint packSnorm4x8(vec4 v);
vec2 unpackUnorm2x16(uint v);
vec4 unpackUnorm4x8(uint v);
vec4 unpackSnorm4x8(uint v);
The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first
convert each component of a two- or four-component vector of normalized
floating-point values into 8- or 16-bit integer values. Then, the results
are packed into a 32-bit unsigned integer. The first component of the
vector will be written to the least significant bits of the output; the
last component will be written to the most significant bits.
The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8()
first unpack a single 32-bit unsigned integer into a pair of 16-bit
unsigned integers, four 8-bit unsigned integers, or four 8-bit signed
integers. Then, each component is converted to a normalized floating-point
value to generate a two- or four-component vector. The first component of
the vector will be extracted from the least significant bits of the input;
the last component will be extracted from the most significant bits.
The conversion between fixed- and normalized floating-point values will be
performed as below.
function conversion
--------------- -----------------------------------------------------
packUnorm2x16 fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
packUnorm4x8 fixed_val = round(clamp(float_val, 0, +1) * 255.0);
packSnorm4x8 fixed_val = round(clamp(float_val, -1, +1) * 127.0);
unpackUnorm2x16 float_val = fixed_val / 65535.0;
unpackUnorm4x8 float_val = fixed_val / 255.0;
unpackSnorm4x8 float_val = clamp(fixed_val / 127.0, -1, +1);
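The conversion table can be exercised with a host-side C model of the pack and unpack functions. This is a sketch: the function names are hypothetical, and round() is modeled as round-half-away-from-zero, which is one behavior GLSL permits since it leaves the direction of exact .5 ties to the implementation.

```c
#include <stdint.h>

static float clamp_f(float x, float lo, float hi) { return x < lo ? lo : (x > hi ? hi : x); }

/* Nearest integer, ties away from zero. Done in double so that values just
   below a .5 boundary are not pushed over it by the addition. */
static int round_nearest(float x)
{
    double d = (double)x;
    return (int)(d >= 0.0 ? d + 0.5 : d - 0.5);
}

static uint32_t pack_unorm_2x16(const float v[2])
{
    uint32_t out = 0;
    for (int i = 0; i < 2; i++)   /* component 0 lands in the least significant bits */
        out |= (uint32_t)round_nearest(clamp_f(v[i], 0.0f, 1.0f) * 65535.0f) << (16 * i);
    return out;
}

static uint32_t pack_unorm_4x8(const float v[4])
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++)
        out |= (uint32_t)round_nearest(clamp_f(v[i], 0.0f, 1.0f) * 255.0f) << (8 * i);
    return out;
}

static uint32_t pack_snorm_4x8(const float v[4])
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        int fixed = round_nearest(clamp_f(v[i], -1.0f, 1.0f) * 127.0f);
        out |= ((uint32_t)fixed & 0xFFu) << (8 * i);   /* two's-complement byte */
    }
    return out;
}

static void unpack_unorm_2x16(uint32_t p, float out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (float)((p >> (16 * i)) & 0xFFFFu) / 65535.0f;
}

static void unpack_unorm_4x8(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (float)((p >> (8 * i)) & 0xFFu) / 255.0f;
}

static void unpack_snorm_4x8(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; i++) {
        int b = (int)((p >> (8 * i)) & 0xFFu);
        int fixed = b >= 128 ? b - 256 : b;             /* sign-extend the byte */
        out[i] = clamp_f((float)fixed / 127.0f, -1.0f, 1.0f);
    }
}
```

The final clamp in unpack_snorm_4x8() matters only for the byte 0x80 (-128), which would otherwise map slightly below -1.0.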
(add functions to get/set the bit encoding for floating-point values)
32-bit floating-point data types in the OpenGL shading language are
specified to be encoded according to the IEEE 754 specification for
single-precision floating-point values. The functions below allow shaders
to convert floating-point values to and from signed or unsigned integers
representing their encoding.
To obtain signed or unsigned integer values holding the encoding of a
floating-point value, use:
genIType floatBitsToInt(genType value);
genUType floatBitsToUint(genType value);
Conversions are done on a component-by-component basis.
To obtain a floating-point value corresponding to a signed or unsigned
integer encoding, use:
genType intBitsToFloat(genIType value);
genType uintBitsToFloat(genUType value);
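In C host code, the equivalent reinterpretation is usually written with memcpy(). The helper names below are hypothetical and assume float is an IEEE 754 binary32 type, mirroring the encoding described above for shader floats.

```c
#include <stdint.h>
#include <string.h>

/* Each helper copies the 32-bit pattern unchanged; memcpy avoids the
   undefined behavior of type-punning through incompatible pointers. */
static uint32_t float_bits_to_uint(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static int32_t  float_bits_to_int(float f)  { int32_t i;  memcpy(&i, &f, sizeof i); return i; }
static float uint_bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static float int_bits_to_float(int32_t i)   { float f; memcpy(&f, &i, sizeof f); return f; }
```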
(support for unsigned integer add/subtract with carry-out)
Syntax:
genUType uaddCarry(genUType x, genUType y, out genUType carry);
genUType usubBorrow(genUType x, genUType y, out genUType borrow);
The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
<y>, returning the sum modulo 2^32. The value <carry> is set to zero if
the sum was less than 2^32, or one otherwise.
The function usubBorrow() subtracts the 32-bit unsigned integer or vector
<y> from <x>, returning the difference if non-negative or 2^32 plus the
difference, otherwise. The value <borrow> is set to zero if x >= y, or
one otherwise.
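A scalar C model of these two functions, plus a hypothetical add_u64_pair() helper showing a typical use (chaining the carry into a wider addition), is sketched below; all names are illustrative.

```c
#include <stdint.h>

static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;            /* unsigned arithmetic wraps modulo 2^32 */
    *carry = (sum < x) ? 1u : 0u;    /* the sum wrapped iff it is smaller than an operand */
    return sum;
}

static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;                    /* equals 2^32 + (x - y) when x < y */
}

/* Hypothetical helper: 64-bit addition on {low, high} word pairs. */
static void add_u64_pair(const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
    uint32_t carry;
    out[0] = uadd_carry(a[0], b[0], &carry);
    out[1] = a[1] + b[1] + carry;    /* any carry out of bit 63 is dropped */
}
```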
(support for signed and unsigned multiplies, with 32-bit inputs and a
64-bit result spanning two 32-bit outputs)
Syntax:
void umulExtended(genUType x, genUType y, out genUType msb,
out genUType lsb);
void imulExtended(genIType x, genIType y, out genIType msb,
out genIType lsb);
The functions umulExtended() and imulExtended() multiply 32-bit unsigned
or signed integers or vectors <x> and <y>, producing a 64-bit result. The
32 least significant bits are returned in <lsb>; the 32 most significant
bits are returned in <msb>.
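A host-side C model can produce the same <msb>/<lsb> split from a 64-bit intermediate product. The names are hypothetical; memcpy() is used so that reinterpreting the high and low words as int32_t does not rely on implementation-defined conversions.

```c
#include <stdint.h>
#include <string.h>

static void umul_extended(uint32_t x, uint32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t p = (uint64_t)x * y;    /* full 64-bit product, cannot overflow */
    *msb = (uint32_t)(p >> 32);
    *lsb = (uint32_t)p;
}

static void imul_extended(int32_t x, int32_t y, int32_t *msb, int32_t *lsb)
{
    int64_t p = (int64_t)x * y;      /* |p| <= 2^62, always representable */
    uint64_t u = (uint64_t)p;        /* two's-complement bit pattern of p */
    uint32_t hi = (uint32_t)(u >> 32), lo = (uint32_t)u;
    memcpy(msb, &hi, sizeof hi);     /* reinterpret the bits as int32_t */
    memcpy(lsb, &lo, sizeof lo);
}
```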
Modify Section 8.7, Texture Lookup Functions, p. 91
(extend the basic versions of textureGather from ARB_texture_gather,
allowing for optional component selection in a multi-component texture
and for shadow mapping)
Syntax:
gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);
vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord,
float refZ);
vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);
The textureGather() functions use the texture coordinates given by <coord>
to determine a set of four texels to sample from the texture identified by
<sampler>. These functions return a four-component vector consisting of
one component from each texel. If specified, the value of <comp> must be
a constant integer expression with a value of zero, one, two, or three,
identifying the <x>, <y>, <z>, or <w> component of the four-component
vector lookup result for each texel, respectively. If <comp> is not
specified, the <x> component of each texel will be used to generate the
result vector. As described in the OpenGL Specification, the vector
selects the post-swizzle component corresponding to <comp> from each of
the four texels, returning:
vec4(T_i0_j1(coord, base).<comp>,
T_i1_j1(coord, base).<comp>,
T_i1_j0(coord, base).<comp>,
T_i0_j0(coord, base).<comp>)
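To make the texel ordering concrete, the following C sketch gathers one component from a 2x2 LINEAR footprint on the host. It is a simplified model, not the OpenGL sampler: it assumes normalized coordinates, CLAMP_TO_EDGE wrapping, texels stored row-major with j increasing with t, and it ignores texture swizzles and mipmapping. Texel, fetch(), and gather_sketch() are hypothetical names.

```c
/* Hypothetical RGBA texel storage, row-major: tex[j * w + i]. */
typedef struct { float c[4]; } Texel;

static int floor_to_int(float x)
{
    int i = (int)x;                      /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;   /* step down for negative non-integers */
}

static int clamp_int(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* CLAMP_TO_EDGE texel fetch (an assumption of this sketch). */
static float fetch(const Texel *tex, int w, int h, int i, int j, int comp)
{
    return tex[clamp_int(j, 0, h - 1) * w + clamp_int(i, 0, w - 1)].c[comp];
}

/* Gathers component `comp` (0..3) of the LINEAR footprint around (s, t),
   returned in the order (i0,j1), (i1,j1), (i1,j0), (i0,j0). */
static void gather_sketch(const Texel *tex, int w, int h, float s, float t,
                          int comp, float out[4])
{
    int i0 = floor_to_int(s * (float)w - 0.5f);
    int j0 = floor_to_int(t * (float)h - 0.5f);
    int i1 = i0 + 1, j1 = j0 + 1;
    out[0] = fetch(tex, w, h, i0, j1, comp);
    out[1] = fetch(tex, w, h, i1, j1, comp);
    out[2] = fetch(tex, w, h, i1, j0, comp);
    out[3] = fetch(tex, w, h, i0, j0, comp);
}
```

With s = t = 0.5 on a 2x2 texture, i0 = j0 = 0, so the result is (T(0,1), T(1,1), T(1,0), T(0,0)).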
For textureGather() functions using a shadow sampler type, each of the
four texel lookups performs a depth comparison against the depth reference
value passed in <refZ>, and returns the result of that comparison in the
appropriate component of the result vector. The parameter <comp> used for
component selection is not supported for textureGather() functions with
shadow sampler types.
As with other texture lookup functions, the results of textureGather() are
undefined for shadow samplers if the texture referenced is not a depth
texture or has depth comparisons disabled; or for non-shadow samplers if
the texture referenced is a depth texture with depth comparisons enabled.
(extend the "Offset" versions of textureGather from ARB_texture_gather,
allowing for optional component selection in a multi-component texture,
non-constant offsets, and shadow mapping)
Syntax:
gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord,
ivec2 offset[, int comp]);
gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord,
ivec2 offset[, int comp]);
gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord,
ivec2 offset[, int comp]);
vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord,
float refZ, ivec2 offset);
vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord,
float refZ, ivec2 offset);
vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord,
float refZ, ivec2 offset);
The textureGatherOffset() functions operate identically to
textureGather(), except that the 2-component integer texel offset vector
<offset> is applied as a (u,v) offset to determine the four texels to
sample. The value <offset> need not be constant; however, only a limited
range of offset values is supported. If any component of <offset> is less than
MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than
MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture
coordinates is undefined. Note that <offset> does not apply to the layer
coordinate for array textures.
(add new "Offsets" versions of textureGather from ARB_texture_gather,
allowing for optional component selection in a multi-component texture,
separate non-constant offsets for each texel in the footprint, and shadow
mapping)
Syntax:
gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord,
ivec2 offsets[4][, int comp]);
gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord,
ivec2 offsets[4][, int comp]);
gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord,
ivec2 offsets[4][, int comp]);
vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord,
float refZ, ivec2 offsets[4]);
vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord,
float refZ, ivec2 offsets[4]);
vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord,
float refZ, ivec2 offsets[4]);
The textureGatherOffsets() functions operate identically to
textureGather(), except that the array of two-component integer vectors
<offsets> is used to determine the location of the four texels to sample.
Each of the four texels is obtained by applying the corresponding offset
in the four-element array <offsets> as a (u,v) coordinate offset to the
coordinates <coord>, identifying the four-texel LINEAR footprint, and then
selecting the texel T_i0_j0 of that footprint. The specified values in
<offsets> must be constant. Only a limited range of offset values is
supported; the minimum and maximum offset values are
implementation-dependent and given by
MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and
MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively. Note that <offsets>
does not apply to the layer coordinate for array textures.
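The per-texel offsets can be modeled on the host in a similar way. The C sketch below assumes normalized coordinates, CLAMP_TO_EDGE wrapping, row-major texels with j increasing with t, and no swizzle; Texel and gather_offsets_sketch() are hypothetical names. Because each offset is a whole number of texels, the T_i0_j0 texel of each offset footprint is simply (i0 + du, j0 + dv).

```c
/* Hypothetical RGBA texel storage, row-major: tex[j * w + i]. */
typedef struct { float c[4]; } Texel;

static int floor_to_int(float x)
{
    int i = (int)x;                      /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;   /* step down for negative non-integers */
}

static int clamp_int(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

static void gather_offsets_sketch(const Texel *tex, int w, int h, float s, float t,
                                  const int offsets[4][2], int comp, float out[4])
{
    /* T_i0_j0 of the unoffset LINEAR footprint */
    int i0 = floor_to_int(s * (float)w - 0.5f);
    int j0 = floor_to_int(t * (float)h - 0.5f);
    for (int k = 0; k < 4; k++) {
        /* Shift by (du, dv) texels and keep only that footprint's T_i0_j0. */
        int i = clamp_int(i0 + offsets[k][0], 0, w - 1);
        int j = clamp_int(j0 + offsets[k][1], 0, h - 1);
        out[k] = tex[j * w + i].c[comp];
    }
}
```

Passing offsets {(0,1), (1,1), (1,0), (0,0)} reproduces the ordinary textureGather() ordering, which is a convenient consistency check.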
Modify Section 8.8, Fragment Processing Functions, p. 101
(add new functions to the end of section, p. 102)
Built-in interpolation functions are available to compute an interpolated
value of a fragment shader input variable at a shader-specified (x,y)
location. A separate (x,y) location may be used for each invocation of
the built-in function, and those locations may differ from the default
(x,y) location used to produce the default value of the input.
float interpolateAtCentroid(float interpolant);
vec2 interpolateAtCentroid(vec2 interpolant);
vec3 interpolateAtCentroid(vec3 interpolant);
vec4 interpolateAtCentroid(vec4 interpolant);
float interpolateAtSample(float interpolant, int sample);
vec2 interpolateAtSample(vec2 interpolant, int sample);
vec3 interpolateAtSample(vec3 interpolant, int sample);
vec4 interpolateAtSample(vec4 interpolant, int sample);
float interpolateAtOffset(float interpolant, vec2 offset);
vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);
The function interpolateAtCentroid() will return the value of the input
varying <interpolant> sampled at a location inside both the pixel and
the primitive being processed. The value obtained would be the same value
assigned to the input variable if declared with the "centroid" qualifier.
The function interpolateAtSample() will return the value of the input
varying <interpolant> at the location of the sample numbered <sample>. If
multisample buffers are not available, the input varying will be evaluated
at the center of the pixel. If the sample number given by <sample> does
not exist, the position used to interpolate the input varying is
undefined.
The function interpolateAtOffset() will return the value of the input
varying <interpolant> sampled at an offset from the center of the pixel
specified by <offset>. The two floating-point components of <offset>
give the offset in pixels in the x and y directions, respectively.
An offset of (0,0) identifies the center of the pixel. The range and
granularity of offsets supported by this function are
implementation-dependent.
For all of the interpolation functions, <interpolant> must be an input
variable or an element of an input variable declared as an array.
Component selection operators (e.g., ".xy") may not be used when
specifying <interpolant>. If <interpolant> is declared with a "flat" or
"centroid" qualifier, the qualifier will have no effect on the
interpolated value. If <interpolant> is declared with the "noperspective"
qualifier, the interpolated value will be computed without perspective
correction.
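For a "noperspective" input, the value is linear in window coordinates, so interpolateAtOffset() can be pictured as evaluating a plane equation at the pixel center plus <offset>. The C sketch below is only a model under that assumption: the Plane type and function names are hypothetical, and it ignores the implementation-dependent offset range and any subpixel snapping of offsets.

```c
/* Hypothetical model of a noperspective input as a window-space plane:
   v(x, y) = a*x + b*y + c over the primitive. */
typedef struct { float a, b, c; } Plane;

static float eval_plane(Plane p, float x, float y) { return p.a * x + p.b * y + p.c; }

/* Default evaluation at the center of the pixel whose lower-left corner is (px, py). */
static float interp_center(Plane p, int px, int py)
{
    return eval_plane(p, (float)px + 0.5f, (float)py + 0.5f);
}

/* interpolateAtOffset()-style evaluation: (dx, dy) is in pixels from the
   center, so (0, 0) reproduces interp_center(). */
static float interp_at_offset(Plane p, int px, int py, float dx, float dy)
{
    return eval_plane(p, (float)px + 0.5f + dx, (float)py + 0.5f + dy);
}
```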
Modify Section 8.10, Geometry Shader Functions, p. 104
(replace the section, using the following more general formulation)
These functions are only available in geometry shaders.
Syntax:
void EmitStreamVertex(int stream); // Geometry-only
void EndStreamPrimitive(int stream); // Geometry-only
void EmitVertex(); // Geometry-only
void EndPrimitive(); // Geometry-only
Description:
The function EmitStreamVertex() specifies that the vertex being generated
by the geometry shader is completed. A vertex is added to the current
output primitive in the vertex stream numbered <stream> using the current
values of all output variables associated with <stream>. The values of
any unwritten output variables associated with <stream> are undefined.
The argument <stream> must be a constant integral expression. The values
of all output variables (for all output streams) are undefined after
calling EmitStreamVertex(). If a geometry shader invocation has emitted
more vertices than permitted by the output layout qualifier
"max_vertices", the results of calling EmitStreamVertex() are undefined.
The function EmitVertex() is equivalent to calling EmitStreamVertex() with
<stream> set to zero.
The function EndStreamPrimitive() specifies that the current output
primitive for the vertex stream numbered <stream> is completed and that a
new empty output primitive of the same type should be started. The
argument <stream> must be a constant integral expression. This function
does not emit a vertex. If the output layout is declared to be "points",
calling EndStreamPrimitive() is optional.
The function EndPrimitive() is equivalent to calling EndStreamPrimitive()
with <stream> set to zero.
A geometry shader starts with an output primitive containing no vertices
for each stream. When a geometry shader terminates, the current output
primitive for each vertex stream is automatically completed. It is not
necessary to call EndPrimitive() or EndStreamPrimitive() for any stream
where the geometry shader writes only a single primitive.
Multiple vertex streams are supported only if the output primitive type is
declared to be "points". A program will fail to link if it contains a
geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its
output primitive type is not "points".
Modify Section 9, Shading Language Grammar, p. 92
!!! TBD !!!
GLX Protocol
None.
Dependencies on ARB_gpu_shader_fp64
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
of implicit conversions supported in the OpenGL Shading Language. If more
than one of these extensions is supported, an expression of one type may
be converted to another type if that conversion is allowed by any of these
specifications.
If ARB_gpu_shader_fp64 or a similar extension introducing new data types
is not supported, the function overloading rule in the GLSL specification
that prefers promoting an input parameter from a smaller type to a larger
type is never applicable, as all data types are of the same size. That
rule and the example referring to "double" should be removed.
Dependencies on NV_gpu_shader5
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
of implicit conversions supported in the OpenGL Shading Language. If more
than one of these extensions is supported, an expression of one type may
be converted to another type if that conversion is allowed by any of these
specifications.
This specification and NV_gpu_shader5 both lift the restriction in GLSL
1.50 requiring that indexing in arrays of samplers must be done with
constant expressions. However, this extension specifies that results are
undefined if the indices would diverge if multiple shader invocations are
run in lockstep. NV_gpu_shader5 does not impose the non-divergent
indexing requirement.
If NV_gpu_shader5 is supported, integer data types are supported with four
different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
types are supported with three different precisions (16-, 32-, and
64-bit). The extension adds the following rule for output parameters,
which is similar to the one present in this extension for input
parameters:
5. If the formal parameters in both matches are output parameters, a
conversion from a type with a larger number of bits per component is
better than a conversion from a type with a smaller number of bits
per component. For example, a conversion from an "int16_t" formal
parameter type to "int" is better than one from an "int8_t" formal
parameter type to "int".
Such a rule is not provided in this extension because there is no
combination of types in this extension and ARB_gpu_shader_fp64 where this
rule has any effect.
Dependencies on ARB_sample_shading
This extension builds upon the per-sample shading support provided by
ARB_sample_shading to provide several new capabilities, including:
* the built-in variable gl_SampleMaskIn[] indicates the set of samples
covered by the input primitive corresponding to the fragment shader
invocation; and
* use of the "sample" qualifier on a fragment shader input forces
per-sample shading, and specifies that the value of the input be
evaluated per-sample.
There is no interaction between the extensions, except that shaders using
the features of this extension seem likely to use features from
ARB_sample_shading as well.
Dependencies on ARB_texture_gather
This extension builds upon the textureGather() built-ins provided by
ARB_texture_gather to provide several new capabilities, including:
* allowing shaders to select any single component of a multi-component
texture to produce the gathered 2x2 footprint;
* allowing shaders to perform a per-sample depth comparison when
gathering the 2x2 footprint using shadow sampler types;
* allowing shaders to use arbitrary offsets computed at run-time to
select a 2x2 footprint to gather from; and
* allowing shaders to use separate independent offsets for each of the
four texels returned, instead of requiring a fixed 2x2 footprint.
Other than the fact that they provide similar functionality, there is no
interaction between the extensions.
Since this extension requires support for gathering from multi-component
textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB
is increased to 4.
Errors
INVALID_OPERATION is generated by GetProgram if <pname> is
GEOMETRY_SHADER_INVOCATIONS and the program has not been linked
successfully, or does not contain objects to form a geometry shader.
New State
Add the following state to Table 6.40, Program Object State, p. 378
Initial
Get Value Type Get Command Value Description Sec. Attribute
------------------------- ---- ------------ ------- ------------------------- ------ -------
GEOMETRY_SHADER_ Z+ GetProgramiv 1 number of times a geometry 6.1.16 -
INVOCATIONS shader should be executed
for each input primitive
New Implementation Dependent State
Min.
Get Value Type Get Command Value Description Sec. Attrib
---------------------- ---- ----------- ----- -------------------------- -------- ------
MAX_GEOMETRY_SHADER_ Z+ GetIntegerv 32 maximum supported geometry 2.16.4 -
INVOCATIONS shader invocation count
MIN_FRAGMENT_INTERP- R GetFloatv -0.5 furthest negative offset 3.12.1 -
OLATION_OFFSET for interpolateAtOffset()
MAX_FRAGMENT_INTERP- R GetFloatv +0.5 furthest positive offset 3.12.1 -
OLATION_OFFSET for interpolateAtOffset()
FRAGMENT_INTERPOLATION_ Z+ GetIntegerv 4 subpixel bits for 3.12.1 -
OFFSET_BITS interpolateAtOffset()
MAX_VERTEX_STREAMS Z+ GetIntegerv 4 total number of vertex 2.16.4 -
streams
(Note: The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB,
added by ARB_texture_gather, is increased to 4.)
Issues
(1) This extension builds on the capability provided by
ARB_sample_shading, adding a new built-in variable for the input
sample mask. It seems likely that a shader using this mask might also
want to use one or more ARB_sample_shading built-ins. Are such
shaders required to include #extension lines for both extensions?
UNRESOLVED: It would be nice if it wasn't required.
(2) How do the per-sample shading features of this extension interact with
non-multisample rendering?
RESOLVED: Non-multisample rendering (due to no multisample buffer or
MULTISAMPLE disabled) is treated as single-sample rendering.
(3) This extension lifts the restriction requiring that indices into
samplers be constant expressions, but makes the results undefined if
the indices used would diverge in lockstep execution. What is this
good for?
RESOLVED: This allows shaders to index into samplers using integer
uniforms, or with non-divergent values computed at run-time (e.g., loop
counters). Many implementations of this extension will be SIMD, running
multiple shader invocations at once, and some implementations may have
difficulty with accessing multiple textures in a single SIMD
instruction.
Note that the NV_gpu_shader5 extension similarly lifts the restriction
but does not require non-divergent indexing.
(4) What sort of implicit conversions should we support in this and
related extensions?
RESOLVED: In GLSL 1.50, we have implicit conversion from "int" and
"uint" to "float", as well as equivalent conversions for vector types.
One of the primary motivations of this feature is to allow constants
that are nominally integer values to be used in floating-point contexts
without requiring special suffixes. The following code compiles
successfully in GLSL 1.50.
float square(float x) {
return x * x;
}
float f = 0;
float g = f * 2;
float h = square(3);
The same code would fail on GLSL 1.1, because "0", "2", and "3" would
need to be written as "0.0", "2.0", and "3.0", respectively.
This extension adds implicit conversions from "int" to "uint" to allow
for cases like:
uint square(uint x) {
return x * x;
}
uint v = square(2);
This code is legal with this extension, but not in GLSL 1.50 ("2" would
need to be replaced with "2U" or "uint(2)").
ARB_gpu_shader_fp64 adds a new type "double", and we extend existing
implicit conversions to allow for promotion of "int", "uint", and
"float" to "double".
Unlike C/C++, the general rule for implicit conversions in GLSL is that
conversions are unidirectional. If type A can be implicitly converted
to type B, type B cannot be implicitly converted to type A.
(5) Increasing the number of available implicit conversions makes it
possible for expressions using various operators to become ambiguous.
How do we deal with these cases?
RESOLVED: For binary operators, the new implicit conversions mean that
there may be multiple ways to resolve an expression. For example, in
the following declaration
int i;
uint u;
the expression "i+u" could be resolved either by implicitly converting
"i" to "uint", or by implicitly converting both values to either "float"
or "double". To resolve, we define a set of preferences for a common
data type based on the types of the operands:
- use a floating-point type if either operand is floating-point
- use an unsigned integer type if either operand is unsigned
- use a signed integer type otherwise
If conversions to multiple precisions are supported, the
lowest-precision available data type is preferred (e.g., int*float will
be converted to float*float and not double*double).
These rules should extend naturally if new basic data types are added.
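The operand-type preferences above can be sketched as a small C function that picks the common type for a binary operator. This is an illustrative model, not compiler code: the enum and function names are hypothetical, only scalar types are modeled, and "double" is assumed to be available only when ARB_gpu_shader_fp64 is supported.

```c
/* Hypothetical scalar types participating in implicit conversions. */
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } BasicType;

static int is_floating(BasicType t)
{
    return t == T_FLOAT || t == T_DOUBLE;
}

/* Common type for a binary operator such as "a + b". */
BasicType common_type(BasicType a, BasicType b)
{
    if (is_floating(a) || is_floating(b)) {
        /* Prefer the lowest precision that can hold both operands:
         * double only if one operand is already double. */
        return (a == T_DOUBLE || b == T_DOUBLE) ? T_DOUBLE : T_FLOAT;
    }
    if (a == T_UINT || b == T_UINT) {
        return T_UINT; /* unsigned if either operand is unsigned */
    }
    return T_INT;      /* otherwise signed */
}
```

Under this model, "i+u" from the example resolves to "uint", and int*float resolves to float*float rather than double*double.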
(6) Increasing the number of available implicit conversions also
increases the possibility of ambiguity when function overloading is
involved, and this and related extensions add new function overloads.
How do we deal with these cases?
RESOLVED: The general rule for function overloading in GLSL 1.50 is
that we first check for a function prototype that exactly matches the
parameters passed to a function call. If no match exists, we check for
prototypes that can be matched by implicit conversions. If more than
one prototype can be matched by conversion, the function call is
considered ambiguous and results in a compilation error.
Unfortunately, when adding new implicit conversions, it is possible for
cases that were formerly unambiguous to become ambiguous. For backward
compatibility purposes, it would be desirable to ensure that shaders
that succeeded in old language versions should still compile if
"upgraded" to more recent versions/extensions. However, the new
conversions and overloads might make this more difficult without
modifying other language rules. For example, the following prototypes
are available for the standard built-in function min() on scalar values
when this extension and ARB_gpu_shader_fp64 are supported:
int min(int a, int b);
uint min(uint a, uint b);
float min(float a, float b);
double min(double a, double b);
In GLSL 1.50, a function call such as:
float f;
min(f, 1);
would be considered unambiguous because the double-precision version of
min() didn't exist and the call matched only the single-precision
version. However, with double-precision support, implicit conversions
allow the call to resolve to either the single- or double-precision
version.
To resolve this issue, we provide a set of rules that can be used to
resolve multiple candidates to a "best match". The rules for
determining a best match are similar to those for C++ function
overloading, but not exactly the same. Like C++, these rules compare
the conversions required on an argument-by-argument basis. A function
prototype A is better than function prototype B if:
- A is better than B for one or more arguments
- B is better than A for no arguments
If a single function prototype is better than all others, that one is
used. Otherwise, we get the same ambiguity error as on previous GLSL
versions.
As far as argument-by-argument comparisons go, the order of preference
is:
- favor exact matches
- prefer "promotions" (float->double) to other conversions
- prefer conversions from int/uint to float over similar conversion to
double
If none of the rules apply, one match is considered neither better nor
worse than the other.
With these rules, the "min(f,1)" example above resolves to the "float"
version, as is the case in GLSL 1.50. However, there are other cases
where ambiguity remains. For example, consider the prototypes:
int f(uint x);
int f(float x);
With GLSL 1.50 rules, "f(3)" would match the floating-point version, as
no implicit conversions existed from "int" to "uint". With the new
implicit conversions, both prototypes match and neither is preferred.
Because of the ambiguity, "f(3)" would fail to compile with this
extension enabled, but should still compile on implementations
supporting this extension if the extension is not enabled in GLSL source
code.
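The best-match rules from this issue can be modeled in C as a pairwise comparison of viable prototypes. This is a hypothetical sketch, not actual compiler code. The type, conversion, and function names are invented for illustration, only scalar int, uint, float, and double are modeled, and a return value of -1 covers both "no match" and "ambiguous".

```c
/* Hypothetical model of GLSL best-match overload resolution (scalars only). */
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } BasicType;

/* Conversion kinds distinguished by the preference rules. */
typedef enum {
    CONV_NONE,          /* no implicit conversion exists */
    CONV_EXACT,         /* exact match */
    CONV_PROMOTE,       /* float -> double */
    CONV_INT_TO_FLOAT,  /* int/uint -> float */
    CONV_INT_TO_DOUBLE, /* int/uint -> double */
    CONV_OTHER          /* int -> uint */
} Conv;

#define MAX_ARGS 4

typedef struct {
    int nparams;
    BasicType params[MAX_ARGS];
} Proto;

/* Classify converting an argument of type 'from' to a parameter 'to'. */
static Conv classify(BasicType from, BasicType to)
{
    if (from == to) return CONV_EXACT;
    if (from == T_FLOAT && to == T_DOUBLE) return CONV_PROMOTE;
    if (from == T_INT || from == T_UINT) {
        if (to == T_FLOAT) return CONV_INT_TO_FLOAT;
        if (to == T_DOUBLE) return CONV_INT_TO_DOUBLE;
        if (from == T_INT && to == T_UINT) return CONV_OTHER;
    }
    return CONV_NONE;
}

/* +1 if conversion a is better than b, -1 if worse, 0 if neither. */
static int compare_conv(Conv a, Conv b)
{
    if (a == b) return 0;
    if (a == CONV_EXACT) return 1;      /* favor exact matches */
    if (b == CONV_EXACT) return -1;
    if (a == CONV_PROMOTE) return 1;    /* prefer float -> double */
    if (b == CONV_PROMOTE) return -1;
    if (a == CONV_INT_TO_FLOAT && b == CONV_INT_TO_DOUBLE) return 1;
    if (a == CONV_INT_TO_DOUBLE && b == CONV_INT_TO_FLOAT) return -1;
    return 0;                           /* neither is better */
}

/* 1 if every argument can be implicitly converted to its parameter. */
static int viable(const Proto *p, const BasicType *args, int nargs)
{
    if (p->nparams != nargs) return 0;
    for (int i = 0; i < nargs; i++)
        if (classify(args[i], p->params[i]) == CONV_NONE) return 0;
    return 1;
}

/* A is better than B if A is better for at least one argument and
 * B is better for none. */
static int better_than(const Proto *a, const Proto *b,
                       const BasicType *args, int nargs)
{
    int better_somewhere = 0;
    for (int i = 0; i < nargs; i++) {
        int c = compare_conv(classify(args[i], a->params[i]),
                             classify(args[i], b->params[i]));
        if (c < 0) return 0;
        if (c > 0) better_somewhere = 1;
    }
    return better_somewhere;
}

/* Index of the prototype better than all other viable ones, or -1 if
 * no prototype matches or the call is ambiguous. */
int resolve_overload(const Proto *protos, int nprotos,
                     const BasicType *args, int nargs)
{
    for (int i = 0; i < nprotos; i++) {
        if (!viable(&protos[i], args, nargs)) continue;
        int beats_all = 1;
        for (int j = 0; j < nprotos && beats_all; j++) {
            if (j != i && viable(&protos[j], args, nargs) &&
                !better_than(&protos[i], &protos[j], args, nargs))
                beats_all = 0;
        }
        if (beats_all) return i;
    }
    return -1;
}
```

With the four min() prototypes listed earlier, (float, int) arguments resolve to the float version. It wins on both arguments (exact versus promotion, and int-to-float versus int-to-double). A single int argument against f(uint) and f(float) finds no prototype better than the other, so the call is ambiguous.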
(7) The function overloading rules described in this extension cover
conversions between data types with different sizes, yet all existing
data types allowing implicit conversion (int, uint, float) are the same
size. Why do we specify these rules?
RESOLVED: This extension is specified at the same time as the related
ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such
types. The rules are specified all in one place here so we don't have
to replicate and extend the rules in the other extensions. It also
provides the ability to automatically convert from signed to unsigned
integer types, as in the C programming language.
(8) Should we support textureGather() for rectangle textures
(sampler2DRect)? They aren't in ARB_texture_gather.
RESOLVED: Yes.
(9) How does the input sample mask interact with the fixed-function
SampleCoverage and SampleMask state? Will samples be removed from the
input mask if they would be eliminated by these masks in the
per-fragment operations?
UNRESOLVED.
(10) Should we support reading patches as geometry shader inputs, and if
so, where?
RESOLVED: Not in this extension. This capability will be provided in
NV_gpu_shader5.
(11) Should we support per-sample interpolation of attributes? If so,
how?
RESOLVED: Yes. When multisample rasterization is enabled, qualifying
one or more fragment shader inputs with "sample" will force per-sample
interpolation of those attributes. If the same shader includes other
fragment inputs not qualified with sample, those attributes may be
interpolated per-pixel (i.e., all samples get the same values, likely
evaluated at the pixel center).
(12) Should we reserve "sample" as a keyword for per-sample interpolation
qualifiers, or use something more obscure, such as "per_sample"?
RESOLVED: This extension uses "sample".
(13) What should be the base data type for the bitCount(), findLSB(), and
findMSB() functions -- signed or unsigned integers?
RESOLVED: These functions will return signed values, with -1 returned
by findLSB/findMSB if no bit is found. Note that the shading language
supports implicit conversions of signed integers to unsigned, which
makes it easy enough if an unsigned result is desired.
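The signed return convention can be illustrated with host-side C equivalents. This is a sketch under stated assumptions: the function names are hypothetical stand-ins for bitCount(), findLSB(), and findMSB(), and the signed findMSB() variant follows the GLSL rule that negative inputs report the most significant 0 bit, so both 0 and -1 return -1.

```c
#include <stdint.h>

/* Number of bits set to 1 (models bitCount()). */
int bit_count(uint32_t value)
{
    int count = 0;
    while (value) {
        value &= value - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Index of the least significant 1 bit, or -1 if none (models findLSB()). */
int find_lsb(uint32_t value)
{
    if (value == 0) return -1;
    int bit = 0;
    while ((value & 1u) == 0) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Index of the most significant 1 bit, or -1 if none. */
int find_msb_unsigned(uint32_t value)
{
    int bit = -1;
    while (value) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Signed variant: for negative values, search for the most significant
 * 0 bit instead, by inverting the bits first. */
int find_msb_signed(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    if (value < 0) bits = ~bits;
    return find_msb_unsigned(bits);
}
```

Returning a signed int keeps -1 available as the "no bit found" sentinel; a caller that wants an unsigned result can rely on the implicit int-to-uint conversion.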
(14) Why do EmitVertex() and EndPrimitive() begin with capitalized words
while most of the other built-ins start with a lower-case letter (e.g.,
emitVertex)? Which precedent should the new per-stream vertex-emit and
end-primitive functions follow?
RESOLVED: The inconsistency began with the original functions in
EXT_geometry_shader4; the spec author can't recall the original reasons
(if any). Regardless, we decided to match the existing functions as
closely as possible and use EmitStreamVertex() and EndStreamPrimitive().
(15) How do the textureGather functions work with sRGB textures?
RESOLVED: Gamma-correction is applied to the texture source color
before "gathering" and hence applies to all four components, unless the
texture swizzle of the selected component is ALPHA in which case no
gamma-correction is applied.
(16) How should we support arrays of uniform blocks (i.e., multiple blocks
in a group, each backed by a separate buffer object)?
RESOLVED: We will use instance names in the block definitions, which
can be declared as regular arrays:
uniform UniformData {
vec4 stuff;
} blocks[4];
These four blocks will be referred to as "blocks[0]" through
"blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]"
in the OpenGL API code. The block member in this example will be
referred to as "UniformData.stuff" in the API. A similar approach was
already adopted in GLSL 1.50, where geometry shaders supported arrays of
input blocks that were treated similarly. Since this spec depends on
GLSL 1.50, little new spec language is required here.
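On the API side, an application typically builds one name per array element and passes it to a query such as glGetUniformBlockIndex(). The following C helper is a hypothetical sketch that only formats those names (for example "UniformData[2]"). It deliberately makes no GL calls so that it stays self-contained.

```c
#include <stdio.h>

/* Hypothetical helper: writes the API-side name of element 'index' of a
 * uniform block array (e.g. "UniformData[2]") into 'buf'. Returns the
 * length of the name, or -1 if 'buf' is too small to hold it. */
int uniform_block_array_name(char *buf, size_t size,
                             const char *block_name, int index)
{
    int n = snprintf(buf, size, "%s[%d]", block_name, index);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Note that the API name uses the block name ("UniformData"), not the shader-side instance name ("blocks").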
(17) What are instanced geometry shaders useful for?
RESOLVED: Instanced geometry shaders allow geometry programs that
perform regular operations to run more efficiently.
Consider a simple example of an algorithm that uses geometry shaders to
render primitives to a cube map in a single pass. Without instanced
geometry shaders, the geometry shader to render triangles to the cube
map would do something like:
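layout(triangles) in;
layout(triangle_strip, max_vertices = 18) out;
uniform mat4 faceTransform[6];  // hypothetical: one projection per cube face
void main()
{
    for (int face = 0; face < 6; face++) {
        for (int vertex = 0; vertex < 3; vertex++) {
            // project vertex <vertex> onto face <face>, output position
            gl_Position = faceTransform[face] * gl_in[vertex].gl_Position;
            // compute/copy attributes of emitted <vertex> to outputs
            gl_Layer = face;  // output <face> as the target layer
            EmitVertex();     // emit the projected vertex
        }
        EndPrimitive();       // end the primitive (next triangle)
    }
}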
This algorithm would output 18 vertices per input triangle, three for
each cube face. The six triangles emitted would be rasterized, one per
face. Geometry shaders that emit a large number of attributes have
often posed performance challenges, since all the attributes must be
stored somewhere until the emitted primitives are processed. Large storage
requirements may limit the number of threads that can be run in parallel
and reduce overall performance.
Instanced geometry shaders allow this example to be restructured to run
with six separate invocations, one per face. Each invocation projects
the triangle to only a single face (identified by the invocation number)
and emits only 3 vertices. The reduced storage requirements allow more
geometry shader invocations to be run in parallel, with greater overall
efficiency.
Additionally, the total number of attributes that can be emitted by a
single geometry shader invocation is limited. However, for instanced
geometry shaders, that limit applies to each of the <N> invocations, which
allows for a larger total output. For example, if the GL implementation
supports only 1024 components of output per invocation, the 18-vertex
algorithm above could emit no more than 56 components per vertex. The
same algorithm implemented as a 3-vertex 6-invocation geometry program
could theoretically allow for 341 components per vertex.
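The arithmetic in this example can be checked with a few lines of C. The 1024-component limit is the illustrative figure used above, not a required implementation value, and the function names are hypothetical.

```c
/* Per-vertex component budget when one invocation may emit at most
 * 'max_components' in total across 'vertices' vertices. Integer
 * division rounds down, since only whole components can be emitted. */
int components_per_vertex(int max_components, int vertices)
{
    return max_components / vertices;
}

/* Total output available across 'invocations' instanced invocations,
 * each subject to its own 'max_components' limit. */
int total_components(int max_components, int invocations)
{
    return max_components * invocations;
}
```

With 1024 components per invocation, 18 vertices leave 56 components each. Three vertices leave 341, and six invocations together can emit up to 6144 components.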
(18) Should EmitStreamVertex() and EndStreamPrimitive() accept a
non-constant stream number?
RESOLVED: Not in this extension. Requiring a constant stream number
for each call simplifies code generation for the compiler.
(19) Are there any restrictions on geometry shaders with multiple output
streams?
RESOLVED: Yes, such geometry shaders are required to generate points;
line strip and triangle strip outputs are not supported.
(20) Since multi-stream geometry shaders only support points, why does
EndStreamPrimitive() exist? Neither it nor EndPrimitive() does anything
useful when emitting points.
RESOLVED: This function was added for completeness, and would be useful
if the requirement for emitting points were lifted by a future
extension.
(21) Should we provide mechanisms allowing shaders to examine or set the
bit representation of floating-point numbers?
RESOLVED: Yes, we will provide functions to convert single-precision
floats to/from signed and unsigned 32-bit integers. The
ARB_gpu_shader_fp64 extension will provide similar functionality for
double-precision floats. We chose to adopt the Java naming convention
here -- converting a single-precision float to/from a signed integer is
accomplished by the functions floatBitsToInt() and intBitsToFloat().
Note that this functionality has also been forked off into a separate
extension (ARB_shader_bit_encoding) that can be exported on
implementations capable of performing such conversions but not capable
of the full feature set of this extension and/or OpenGL 4.0.
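The bit-reinterpretation functions have straightforward C analogues, sketched below under the assumption that float is a 32-bit IEEE-754 type. The snake_case names are hypothetical stand-ins for floatBitsToInt(), floatBitsToUint(), intBitsToFloat(), and uintBitsToFloat(). memcpy is used because it copies the bits without violating C's strict-aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed integer. */
int32_t float_bits_to_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterpret the bits of a float as an unsigned integer. */
uint32_t float_bits_to_uint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterpret a signed integer's bits as a float. */
float int_bits_to_float(int32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Reinterpret an unsigned integer's bits as a float. */
float uint_bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 1.0 has the bit pattern 0x3F800000, and -0.0 differs from 0.0 only in the sign bit.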
(22) What is the "precise" qualifier good for?
RESOLVED: Like "invariant", "precise" provides invariance guarantees
that are useful for certain algorithms.
With an output position qualified as "invariant", we ensure that if the
same geometry is processed by multiple shaders using the exact same
code, it will be transformed in exactly the same way to ensure that we
have no cracking or flickering in multi-pass algorithms using different
shaders.
With "precise", we ensure that an algorithm can be written to produce
identical results on subtly different inputs. For example, the order of
vertices visible to a geometry or tessellation shader used to subdivide
primitive edges might present an edge shared between two primitives in
one direction for one primitive and the other direction for the adjacent
primitive. Even if the weights are identical in the two cases, there
may be cracking if the computations are being done in an order-dependent
manner. If the position of a new vertex were computed by evaluating the
function f() below with limited-precision floating-point math, it's not
necessarily the case that f(a,b,c) == f(c,b,a) in the following code:
float f(float x, float y, float z)
{
return (x + y) + z;
}
This function f() can be rewritten as follows with "precise" and a
symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).
float f(float x, float y, float z)
{
// Note that we intentionally compute "(x+z)" instead of "(x+y)"
// here, because that value will be the same when <x> and <z>
// are reversed.
precise float result = (x + z) + y;
return result;
}
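With the original f(), f(a,b,c) == f(c,b,a) requires (a + b) + c ==
(c + b) + a, which need not hold in floating-point arithmetic. With the
rewritten f(), the two calls compute (a + c) + b and (c + a) + b, which
are equal because floating-point addition is commutative.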
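The order dependence can be demonstrated in C with single-precision values chosen so that rounding absorbs a small operand. This sketch assumes IEEE-754 float arithmetic and a compiler that does not reassociate (for example, no -ffast-math). C has no "precise" qualifier; in GLSL, "precise" is what would prevent the compiler from reordering the symmetric version. The function names are hypothetical.

```c
/* Order-dependent form from the issue: (x + y) + z. */
float f_original(float x, float y, float z)
{
    return (x + y) + z;
}

/* Symmetric form: (x + z) + y gives the same result when x and z are
 * swapped, because IEEE-754 addition is commutative. */
float f_symmetric(float x, float y, float z)
{
    return (x + z) + y;
}
```

With a = 1e20, b = -1e20, and c = 1, f_original(a,b,c) computes (1e20 - 1e20) + 1 = 1, but f_original(c,b,a) computes (1 - 1e20) + 1e20. Here the 1 is lost to rounding, so the result is 0. f_symmetric returns the same value for both argument orders.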
The "precise" qualifier will disable certain optimizations and thus
carries a performance cost. The cost may be higher than "invariant",
because "invariant" permits optimizations disallowed by "precise" as
long as the compiler ensures that it always optimizes in the exact same
manner.
(23) What computations will be affected by the "precise" qualifier, and
what computations aren't?
RESOLVED: We will ensure precise computation of any expressions within
a single function used directly or indirectly to produce the value of a
variable qualified as "precise".
We chose not to provide this guarantee across function boundaries, even
if the results of a function are used in the computation of an output
qualified as "precise". Algorithms requiring the use of "precise" may
have a mix of computations, some required to be precise, some not. This
function boundary rule may serve to limit the amount of computation
indirectly forced to be precise.
Additionally, the function boundary rule permits non-precise sub-operations in
a computation required to be precise. For example, a shader might need
to compute a "precise" position by taking a weighted average as in the
following code:
precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]);
However, if the main precision requirement is that the same result be
generated when <p> and <w> are reversed, the following code also gets
the job done, even if posmad() is implemented with multiply-add
operations.
vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; }
precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) +
posmad(p[3], w[3], p[2]*w[2]));
To generate precise results within a function, the function arguments
and/or temporaries within the function body should be qualified as
"precise" as needed.
Note that when applying "precise" rules to assignments, indirect
application of this rule applies on an assignment-by-assignment basis.
In the following perverse example:
float a,b,c,d,e,f;
precise float g;
f = a + b + c;
...
f = c + d + e;
g = f * 2.0;
The first assignment to <f> need not be treated as "precise", since the
value assigned will have no effect on the final value of the
precise-qualified <g>. The second assignment to <f> must be evaluated
precisely. The fact that one assignment to a variable needs to be
treated as precise does not mean that the variable itself is implicitly
treated as "precise".
(24) Are "precise" qualifiers allowed on function arguments? If so, what
do they mean? Can a return value for a function be declared as
precise?
RESOLVED: Yes; the rules permit the use of "precise" on any variable
declaration, including function arguments. The code
float f(precise in vec4 arg1, precise out vec4 arg2) { ... }
specifies that any expressions used to assign values to <arg1> or <arg2>
within f() will be evaluated in a precise manner.
Expressions used to derive the value passed to the function f() as
<arg1> will be treated as precise according to the normal rules. The
expression for <arg1> is treated as precise if and only if the function
call is on the right-hand side of an assignment to a variable qualified
as "precise" or is indirectly used in an assignment to such a variable.
It is not automatically treated as precise just because the formal
parameter <arg1> is qualified with "precise".
For the purposes of this rule, variables passed as "out" parameters do
not count as assignments. Values assigned to an output parameter will
not be evaluated precisely just because the caller provides a variable
qualified as "precise". When the output parameter itself is qualified
as "precise", precise evaluation of that output is required within the
callee.
We chose not to permit function return values to be qualified as
"precise", though we could have hypothetically allowed code such as:
precise float f(float a, float b, float c) { return (a+b)+c; }
To obtain a precise return value in such a case, use code such as:
float f(float a, float b, float c)
{
precise float result = (a+b) + c;
return result;
}
(25) How does texture gather interact with incomplete textures?
RESOLVED: For regular texture lookups, incomplete textures are
considered to return a texel value with RGBA components of (0,0,0,1).
For texture gather operations, each texel in the sampled footprint is
considered to have RGBA components of (0,0,0,1). When using the
textureGather() function to select the R, G, or B component of an
incomplete texture, (0,0,0,0) will be returned. When selecting the A
component, (1,1,1,1) will be returned.
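The result for incomplete textures can be modeled in a few lines of C. This is a hypothetical sketch: the struct and function names are invented, the texture swizzle state is ignored, and <comp> is assumed to be 0, 1, 2, or 3 for R, G, B, or A, as in textureGather().

```c
typedef struct { float x, y, z, w; } Vec4;

/* Models textureGather() on an incomplete texture: each of the four
 * texels in the footprint is treated as RGBA (0,0,0,1), and the selected
 * component of each texel fills one result component. 'comp' must be
 * in the range 0..3. */
Vec4 gather_incomplete(int comp)
{
    const float texel[4] = { 0.0f, 0.0f, 0.0f, 1.0f }; /* R, G, B, A */
    float v = texel[comp];
    Vec4 result = { v, v, v, v };
    return result;
}
```

Selecting R, G, or B therefore yields (0,0,0,0), and selecting A yields (1,1,1,1).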
Revision History
Rev. Date Author Changes
---- -------- -------- -----------------------------------------
16 03/30/12 pbrown Fix typo in language restricting the use of
EmitStreamVertex()/EndStreamPrimitive() to
programs with an output primitive type of
points, not an input type of points (bug 8371).
15 10/17/11 pbrown Fix prototypes for textureGather and
textureGatherOffset to use vec2 coordinates for
"2DRect" sampler versions (bug 7964).
14 01/27/11 pbrown Add further clarification on the interaction
of texture gather and incomplete textures (bug
7289).
13 09/24/10 pbrown Clarify the interaction of texture gather
with swizzle (bug 5910), fixing conflicts
between API and GLSL spec language.
Consolidate into one copy in the API
spec.
12 03/23/10 pbrown Update issues section, both fixing/numbering
existing issues and including other issues
that were left behind in NV_gpu_shader5 when the
specs were refactored.
11 03/23/10 Jon Leech Describe <offset> to interpolateAtOffset
without implying it is a constant expression
(Bug 6026).
10 03/07/10 pbrown Fix typo in an output stream qualifier example.
9 03/05/10 pbrown Modify function overloading rules to remove
most preferences when converting between
two different types. The only preferences
that remain are promoting "float" to "double"
over other conversions, and preferring
conversion of integers to "float" to converting
to "double" (bug 5938).
8 01/29/10 pbrown Update the spec to require that the minimum
value for MAX_PROGRAM_TEXTURE_GATHER_-
COMPONENTS is 4 (bug 5919).
7 01/21/10 pbrown Clarify the rules for determining a best match
if implicit conversions can result in multiple
matching function prototypes. Modify the rules
to pick a best match by comparing pairs of
functions, and using any function deemed better
than any other choice. Modify the argument
conversion preference rules for overloading to
disfavor "int" to "uint" conversions, for
backward compatibility with previous GLSL
versions. Add some new discussion of the
choices involved to the issues section (bug
5938).
6 01/14/10 pbrown Minor wording updates from spec reviews.
5 12/10/09 pbrown Functionality updates from spec review:
Rename fmad to fma. Fix error in spec
language for negative diffs in usubBorrow.
4 12/10/09 pbrown Convert from EXT to ARB.
3 12/08/09 pbrown Miscellaneous fixes from spec review: Added
missing implementation constants for
interpolation offset range and granularity;
added explicit section to OpenGL spec describing
shader requested interpolation modifiers and
functions. Clean up more dangling "ThreadID"
references. General typo fixes and language
clarifications.
2 10/01/09 pbrown Renamed gl_ThreadID to gl_InvocationID.
1 pbrown Internal revisions.