| Name |
| |
| ARB_gpu_shader5 |
| |
| Name Strings |
| |
| GL_ARB_gpu_shader5 |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| |
| Contributors |
| |
| Barthold Lichtenbelt, NVIDIA |
| Bill Licea-Kane, AMD |
| Bruce Merry, ARM |
| Chris Dodd, NVIDIA |
| Eric Werness, NVIDIA |
| Graham Sellers, AMD |
| Greg Roth, NVIDIA |
| Jeff Bolz, NVIDIA |
| Nick Haemel, AMD |
| Pierre Boudier, AMD |
| Piers Daniell, NVIDIA |
| |
| Notice |
| |
| Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at |
| http://www.khronos.org/registry/speccopyright.html |
| |
| Specification Update Policy |
| |
| Khronos-approved extension specifications are updated in response to |
| issues and bugs prioritized by the Khronos OpenGL Working Group. For |
| extensions which have been promoted to a core Specification, fixes will |
| first appear in the latest version of that core Specification, and will |
| eventually be backported to the extension document. This policy is |
| described in more detail at |
| https://www.khronos.org/registry/OpenGL/docs/update_policy.php |
| |
| Status |
| |
| Complete. Approved by the ARB at the 2010/01/22 F2F meeting. |
| Approved by the Khronos Board of Promoters on March 10, 2010. |
| |
| Version |
| |
| Version 16, March 30, 2012 |
| |
| Number |
| |
| ARB Extension #88 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 3.2 (Compatibility Profile) |
| Specification. |
| |
| This extension is written against Version 1.50 (Revision 09) of the OpenGL |
| Shading Language Specification. |
| |
| OpenGL 3.2 and GLSL 1.50 are required. |
| |
| This extension interacts with ARB_gpu_shader_fp64. |
| |
| This extension interacts with NV_gpu_shader5. |
| |
| This extension interacts with ARB_sample_shading. |
| |
| This extension interacts with ARB_texture_gather. |
| |
| Overview |
| |
| This extension provides a set of new features to the OpenGL Shading |
| Language and related APIs to support capabilities of new GPUs, extending |
| the capabilities of version 1.50 of the OpenGL Shading Language. Shaders |
| using the new functionality provided by this extension should enable this |
| functionality via the construct |
| |
| #extension GL_ARB_gpu_shader5 : require (or enable) |
| |
| This extension provides a variety of new features for all shader types, |
| including: |
| |
| * support for indexing into arrays of samplers using non-constant |
| indices, as long as the index doesn't diverge if multiple shader |
| invocations are run in lockstep; |
| |
| * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow |
| shaders to index into an array of uniform blocks; |
| |
| * support for implicitly converting signed integer types to unsigned |
| types, as well as more general implicit conversion and function |
| overloading infrastructure to support new data types introduced by |
| other extensions; |
| |
| * a "precise" qualifier allowing computations to be carried out exactly |
| as specified in the shader source to avoid optimization-induced |
| invariance issues (which might cause cracking in tessellation); |
| |
| * new built-in functions supporting: |
| |
| * fused floating-point multiply-add operations; |
| |
| * splitting a floating-point number into a significand and exponent |
| (frexp), or building a floating-point number from a significand and |
| exponent (ldexp); |
| |
| * integer bitfield manipulation, including functions to find the |
| position of the most or least significant set bit, count the number |
| of one bits, and bitfield insertion, extraction, and reversal; |
| |
| * packing and unpacking vectors of small fixed-point data types into a |
| larger scalar; and |
| |
| * converting floating-point values to or from their integer bit |
| encodings; |
| |
| * extending the textureGather() built-in functions provided by |
| ARB_texture_gather: |
| |
| * allowing shaders to select any single component of a multi-component |
| texture to produce the gathered 2x2 footprint; |
| |
| * allowing shaders to perform a per-sample depth comparison when |
| gathering the 2x2 footprint for shadow sampler types; |
| |
| * allowing shaders to use arbitrary offsets computed at run-time to |
| select a 2x2 footprint to gather from; and |
| |
| * allowing shaders to use separate independent offsets for each of the |
| four texels returned, instead of requiring a fixed 2x2 footprint. |
| |
| This extension also provides some new capabilities for individual |
| shader types, including: |
| |
| * support for instanced geometry shaders, where a geometry shader may be |
| run multiple times for each primitive, including a built-in |
| gl_InvocationID to identify the invocation number; |
| |
| * support for emitting vertices in a geometry shader where each vertex |
| emitted may be directed independently at a specified vertex stream (as |
| provided by ARB_transform_feedback3), and where each shader output is |
| associated with a stream; |
| |
| * support for reading a mask of covered samples in a fragment shader; |
| and |
| |
| * support for interpolating a fragment shader input at a programmable |
| offset relative to the pixel center, a programmable sample number, or |
| at the centroid. |
| |
| IP Status |
| |
| No known IP claims. |
| |
| New Procedures and Functions |
| |
| None |
| |
| New Tokens |
| |
| Accepted by the <pname> parameter of GetProgramiv: |
| |
| GEOMETRY_SHADER_INVOCATIONS 0x887F |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, |
| GetDoublev, and GetInteger64v: |
| |
| MAX_GEOMETRY_SHADER_INVOCATIONS 0x8E5A |
| MIN_FRAGMENT_INTERPOLATION_OFFSET 0x8E5B |
| MAX_FRAGMENT_INTERPOLATION_OFFSET 0x8E5C |
| FRAGMENT_INTERPOLATION_OFFSET_BITS 0x8E5D |
| MAX_VERTEX_STREAMS 0x8E71 |
| |
| (note: MAX_GEOMETRY_SHADER_INVOCATIONS, |
| MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and |
| FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding |
| "NV" enums from NV_gpu_program5. MAX_VERTEX_STREAMS is also defined in |
| ARB_transform_feedback3.) |
| |
| |
| Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (OpenGL Operation) |
| |
| Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121 |
| |
| (add two unnumbered subsections after "Texture Access", p. 122) |
| |
| Instanced Geometry Shaders |
| |
| For each input primitive received by the geometry shader pipeline stage, |
| the geometry shader may be run once or multiple times. The number of |
| times a geometry shader should be executed for each input primitive may be |
| specified using a layout qualifier in a geometry shader of a linked |
| program. If the invocation count is not specified in any layout |
| qualifier, the invocation count will be one. |
| |
| Each separate geometry shader invocation is assigned a unique invocation |
| number. For a geometry shader with <N> invocations, each input primitive |
| spawns <N> invocations, numbered 0 through <N>-1. The built-in input |
| variable gl_InvocationID may be used by a geometry shader invocation to |
| determine its invocation number. |
| |
| When executing instanced geometry shaders, the output primitives generated |
| from each input primitive are passed to subsequent pipeline stages using |
| the shader invocation number to order the output. The first primitives |
| received by the subsequent pipeline stages are those emitted by the shader |
| invocation numbered zero, followed by those from the shader invocation |
| numbered one, and so forth. Additionally, all output primitives generated |
| from a given input primitive are passed to subsequent pipeline stages |
| before any output primitives generated from subsequent input primitives. |
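| |
| (non-normative illustration) The ordering rule above can be sketched in C |
| as a pair of nested loops. This is only a model of the specified order, |
| not of how an implementation schedules work; the type gs_run and the |
| function gs_output_order are hypothetical names introduced for this sketch. |
| |

```c
#include <assert.h>
#include <stddef.h>

/* One geometry shader run, in the order its output reaches later stages.
 * (Hypothetical type, introduced only for this illustration.) */
typedef struct {
    int input_primitive;  /* index of the input primitive */
    int invocation;       /* value seen in gl_InvocationID for this run */
} gs_run;

/* Fills `order` with num_primitives * invocations entries and returns the
 * number of entries written.  All output of input primitive p precedes any
 * output of primitive p+1; within one primitive, invocation 0 comes first. */
size_t gs_output_order(int num_primitives, int invocations, gs_run *order)
{
    size_t n = 0;

    assert(num_primitives >= 0 && invocations >= 1 && order != NULL);

    for (int p = 0; p < num_primitives; ++p) {
        for (int i = 0; i < invocations; ++i) {
            order[n].input_primitive = p;
            order[n].invocation = i;
            ++n;
        }
    }
    return n;
}
```

| |
| For two input primitives and three invocations, the sketch yields the |
| sequence (0,0), (0,1), (0,2), (1,0), (1,1), (1,2). |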
| |
| |
| Geometry Shader Vertex Streams |
| |
| Geometry shaders may emit primitives to multiple independent vertex |
| streams. Each vertex emitted by the geometry shader is directed at one of |
| the vertex streams. As vertices are received on each stream, they are |
| arranged into primitives of the type specified by the geometry shader |
| output primitive type. The shading language built-in functions |
| EndPrimitive() and EndStreamPrimitive() may be used to end the primitive |
| being assembled on a given vertex stream and start a new empty primitive |
| of the same type. If an implementation supports <N> vertex streams, the |
| individual streams are numbered 0 through <N>-1. There is no requirement |
| on the order of the streams to which vertices are emitted, and the number |
| of vertices emitted to each stream may be completely independent, subject |
| only to implementation-dependent output limits. |
| |
| The primitives emitted to all vertex streams are passed to the transform |
| feedback stage to be captured and written to buffer objects in the manner |
| specified by the transform feedback state. The primitives emitted to all |
| streams but stream zero are discarded after transform feedback. |
| Primitives emitted to stream zero are passed to subsequent pipeline stages |
| for clipping, rasterization, and subsequent fragment processing. |
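| |
| (non-normative illustration) A minimal C model of the routing described |
| above, assuming the "points" output primitive type so that each emitted |
| vertex is one primitive. The names stream_routing and route_points are |
| hypothetical. Whether transform feedback actually records a given point |
| still depends on the transform feedback state, which is not modelled here. |
| |

```c
#include <assert.h>
#include <stddef.h>

/* Counts of points handed to each consumer (hypothetical type). */
typedef struct {
    int to_transform_feedback;  /* points passed to the transform feedback stage */
    int to_rasterizer;          /* points passed on for clipping/rasterization */
} stream_routing;

/* vertex_stream[v] is the stream number that vertex v was emitted to. */
stream_routing route_points(const int *vertex_stream, int vertex_count,
                            int num_streams)
{
    stream_routing r = { 0, 0 };

    assert(vertex_count >= 0 && num_streams >= 1);
    assert(vertex_count == 0 || vertex_stream != NULL);

    for (int v = 0; v < vertex_count; ++v) {
        assert(vertex_stream[v] >= 0 && vertex_stream[v] < num_streams);
        r.to_transform_feedback++;      /* every stream reaches this stage */
        if (vertex_stream[v] == 0)
            r.to_rasterizer++;          /* only stream zero continues */
    }
    return r;
}
```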
| |
| Geometry shaders that emit vertices to multiple vertex streams are |
| currently limited to using only the "points" output primitive type. A |
| program will fail to link if it includes a geometry shader that calls the |
| EmitStreamVertex() built-in function and has any other output primitive |
| type parameter. |
| |
| |
| Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Rasterization) |
| |
| Modify Section 3.3.1, Multisampling, p. 148 |
| |
| (add new paragraph at the end of the section, p. 149) |
| |
| If MULTISAMPLE is enabled and the current program object includes a |
| fragment shader with one or more input variables qualified with "sample |
| in", the data associated with those variables will be assigned |
| independently. The values for each sample must be evaluated at the |
| location of the sample. The data associated with any other variables not |
| qualified with "sample in" need not be evaluated independently for each |
| sample. |
| |
| |
| Modify ARB_texture_gather, "Changes to Section 3.8.8" |
| |
| (extend language describing the operation of textureGather, allowing the |
| new <comp> argument to select any of the four components from a |
| multi-component texel vector) |
| |
| The textureGather and textureGatherOffset built-in shader functions... A |
| four-component vector is then assembled by taking a single component from |
| the swizzled texture source colors of the four texels, in the order |
| T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0. The selected component is |
| identified by the optional <comp> argument, where the values zero, one, |
| two, and three identify the Rs, Gs, Bs, or As component, respectively. If |
| <comp> is omitted, it is treated as identifying the Rs component. |
| Incomplete textures (section 3.8.10) are considered to return a texture |
| source color of (0,0,0,1) for all four source texels. |
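| |
| (non-normative illustration) The assembly of the gathered vector can be |
| sketched in C as follows. The function gather_component and the array |
| layout texel[j][i][c] (row j, column i, component c) are assumptions made |
| for this sketch; the input is taken to be the already swizzled texture |
| source colors of the 2x2 footprint. |
| |

```c
#include <assert.h>

/* texel[j][i][c]: row j (0 or 1), column i (0 or 1), component c
 * (0 = Rs, 1 = Gs, 2 = Bs, 3 = As).  Writes the four selected components
 * to `result` in the order T_i0_j1, T_i1_j1, T_i1_j0, T_i0_j0. */
void gather_component(float texel[2][2][4], int comp, float result[4])
{
    assert(comp >= 0 && comp <= 3);

    result[0] = texel[1][0][comp];  /* T_i0_j1 */
    result[1] = texel[1][1][comp];  /* T_i1_j1 */
    result[2] = texel[0][1][comp];  /* T_i1_j0 */
    result[3] = texel[0][0][comp];  /* T_i0_j0 */
}
```

| |
| Passing zero for comp corresponds to omitting the <comp> argument. |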
| |
| (add further language describing textureGatherOffsets) |
| |
| The textureGatherOffsets built-in functions from the OpenGL Shading |
| Language return a vector derived from sampling four texels in the image |
| array of level <level_base>. For each of the four texel offsets specified |
| by the <offsets> argument, the rules for the LINEAR minification filter |
| are applied to identify a 2x2 texel footprint, from which the single texel |
| T_i0_j0 is selected. A four-component vector is then assembled by taking |
| a single component from each of the four T_i0_j0 texels in the same manner |
| as for the textureGather function. |
| |
| |
| Modify Section 3.12.1, Shader Variables, p. 273 |
| |
| (insert prior to the last paragraph of the section, p. 274) |
| |
| When interpolating built-in and user-defined varying variables, the default |
| screen-space location at which these variables are sampled is defined in |
| previous rasterization sections. The default location may be overridden by |
| interpolation qualifiers. When interpolating variables declared using |
| "centroid in", the variable is sampled at a location within the pixel |
| covered by the primitive generating the fragment. When interpolating |
| variables declared using "sample in" when MULTISAMPLE is enabled, the |
| fragment shader will be invoked separately for each covered sample and the |
| variable will be sampled at the corresponding sample point. |
| |
| Additionally, built-in fragment shader functions provide further |
| fine-grained control over interpolation. The built-in functions |
| interpolateAtCentroid() and interpolateAtSample() will sample variables as |
| though they were declared with the "centroid" or "sample" qualifiers, |
| respectively. The built-in function interpolateAtOffset() will sample |
| variables at a specified (x,y) offset relative to the center of the pixel. |
| The range and granularity of offsets supported by this function is |
| implementation-dependent. If either component of the specified offset is |
| less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than |
| MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the |
| variable is undefined. Not all values of <offset> may be supported; x and |
| y offsets may be rounded to fixed-point values with the number of fraction |
| bits given by the implementation-dependent constant |
| FRAGMENT_INTERPOLATION_OFFSET_BITS. |
| |
| |
| Modify Section 3.12.2, Shader Execution, p. 274 |
| |
| (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277) |
| |
| The built-in variable gl_SampleMaskIn[] is an integer array holding |
| bitfields indicating the set of fragment samples covered by the primitive |
| corresponding to the fragment shader invocation. The number of elements |
| in the array is ceil(<s>/32), where <s> is the maximum number of color |
| samples supported by the implementation. Bit <n> of element <w> in the |
| array is set if and only if the sample numbered <w>*32+<n> is considered |
| covered for this fragment shader invocation. When rendering to a |
| non-multisample buffer, or if multisample rasterization is disabled, all |
| bits are zero except for bit zero of the first array element. That bit |
| will be one if the pixel is covered and zero otherwise. Bits in the |
| sample mask corresponding to covered samples that will be killed due to |
| SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3). When |
| per-sample shading is active due to the use of a fragment input qualified |
| by "sample", only the bit for the current sample is set in |
| gl_SampleMaskIn. When OpenGL API state specifies multiple fragment shader |
| invocations for a given fragment, the sample mask for any single fragment |
| shader invocation may specify a subset of the covered samples for the |
| fragment. In this case, the bit corresponding to each covered sample will |
| be set in exactly one fragment shader invocation. |
| |
| |
| Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Per-Fragment Operations and the Frame Buffer) |
| |
| None. |
| |
| Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Special Functions) |
| |
| None. |
| |
| Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (State and State Requests) |
| |
| Modify Section 6.1.16, Shader and Program Queries, p. 384 |
| |
| (add to long first paragraph, p. 386) ... If <pname> is |
| GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per |
| primitive will be returned. If GEOMETRY_VERTICES_OUT, |
| GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS |
| is queried for a program which has not been linked successfully, or which |
| does not contain objects to form a geometry shader, then an |
| INVALID_OPERATION error is generated. |
| |
| |
| Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) |
| Specification (Invariance) |
| |
| None. |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| None. |
| |
| Modifications to The OpenGL Shading Language Specification, Version 1.50 |
| (Revision 09) |
| |
| Including the following line in a shader can be used to control the |
| language features described in this extension: |
| |
| #extension GL_ARB_gpu_shader5 : <behavior> |
| |
| where <behavior> is as specified in section 3.3. |
| |
| New preprocessor #defines are added to the OpenGL Shading Language: |
| |
| #define GL_ARB_gpu_shader5 1 |
| |
| |
| Modify Section 3.6, Keywords, p. 14 |
| |
| (add to the keyword list) |
| |
| sample |
| |
| |
| Modify Section 4.1.7, Samplers, p. 23 |
| |
| (modify 1st paragraph of the section, deleting the restriction requiring |
| constant indexing of sampler arrays but still requiring uniform indexing |
| across invocations) ... Samplers may be aggregated into arrays within a |
| shader (using square brackets [ ]) and can be indexed with general integer |
| expressions. The results of accessing a sampler array with an |
| out-of-bounds index are undefined. ... |
| |
| (add new paragraph restricting the use of general integer expression in |
| sampler array indexing) When indexing an array of samplers, the integer |
| expression used to index the array must be uniform across shader |
| invocations. If this restriction is not satisfied, the results of |
| accessing the sampler array are undefined. For the purposes of this |
| uniformity test, the index used for texture lookups performed inside a |
| loop is considered uniform for the <n>th loop iteration if all shader |
| invocations that execute the loop at least <n> times compute the same |
| index on that iteration. For texture lookups inside a function other than |
| main(), an index is considered uniform if the value is the same for all |
| invocations calling the function from the same point in the caller. For |
| nested loops and function calls, the uniformity test requires that the |
| index match only those other shader invocations with identical loop |
| iteration counts and function call chains. |
| |
| |
| Modify Section 4.1.10, Implicit Conversions, p. 27 |
| |
| (modify table of implicit conversions) |
| |
| Type of expression     Can be implicitly converted to |
| ---------------------  ------------------------------ |
| int uint, float |
| ivec2 uvec2, vec2 |
| ivec3 uvec3, vec3 |
| ivec4 uvec4, vec4 |
| |
| uint float |
| uvec2 vec2 |
| uvec3 vec3 |
| uvec4 vec4 |
| |
| (modify second paragraph of the section) No implicit conversions are |
| provided to convert from unsigned to signed integer types or from |
| floating-point to integer types. There are no implicit array or structure |
| conversions. |
| |
| (insert before the final paragraph of the section) When performing |
| implicit conversion for binary operators, there may be multiple data types |
| to which the two operands can be converted. For example, when adding an |
| int value to a uint value, both operands can be brought to a common type |
| of either uint or float. In such cases, a floating-point type is chosen if either |
| operand has a floating-point type. Otherwise, an unsigned integer type is |
| chosen if either operand has an unsigned integer type. Otherwise, a |
| signed integer type is chosen. |
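| |
| (non-normative illustration) The selection rule above amounts to a simple |
| priority order, sketched here in C for scalar base types. The enum |
| base_type and the function binary_operand_type are hypothetical names; |
| vector sizes are ignored in this sketch. |
| |

```c
#include <assert.h>

/* Base types taking part in implicit conversion (hypothetical enum). */
typedef enum { BASE_INT, BASE_UINT, BASE_FLOAT } base_type;

/* Type to which both operands of a binary operator are converted:
 * floating-point wins over unsigned, unsigned wins over signed. */
base_type binary_operand_type(base_type lhs, base_type rhs)
{
    assert(lhs <= BASE_FLOAT && rhs <= BASE_FLOAT);

    if (lhs == BASE_FLOAT || rhs == BASE_FLOAT)
        return BASE_FLOAT;
    if (lhs == BASE_UINT || rhs == BASE_UINT)
        return BASE_UINT;
    return BASE_INT;
}
```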
| |
| |
| Modify Section 4.3, Storage Qualifiers, p. 29 |
| |
| (add to first table on the page) |
| |
| Qualifier Meaning |
| -------------- ---------------------------------------- |
| sample in linkage with per-sample interpolation |
| sample out linkage with per-sample interpolation |
| |
| (modify third paragraph, p. 29) These interpolation qualifiers may only |
| precede the qualifiers in, centroid in, sample in, out, centroid out, or |
| sample out in a declaration. ... |
| |
| |
| Modify Section 4.3.4, Inputs, p. 31 |
| |
| (modify first paragraph of section) Shader input variables are declared |
| with the in, centroid in, or sample in storage qualifiers. ... Variables |
| declared as in, centroid in, or sample in may not be written to during |
| shader execution. ... |
| |
| (modify third paragraph, p. 32) ... Fragment shader inputs get |
| per-fragment values, typically interpolated from a previous stage's |
| outputs. They are declared in fragment shaders with the in, centroid in, |
| or sample in storage qualifiers or the deprecated varying and centroid |
| varying storage qualifiers. ... |
| |
| (add to examples immediately below) |
| |
| sample in vec4 perSampleColor; |
| |
| |
| Modify Section 4.3.6, Outputs, p. 33 |
| |
| (modify first paragraph of section) Shader output variables are declared |
| with the out, centroid out, or sample out storage qualifiers. ... |
| |
| (modify third paragraph of section) Vertex and geometry output variables |
| output per-vertex data and are declared using the out, centroid out, or |
| sample out storage qualifiers, or the deprecated varying storage |
| qualifier. |
| |
| (add to examples immediately below) |
| |
| sample out vec4 perSampleColor; |
| |
| (modify last paragraph, p. 33) Fragment outputs output per-fragment data |
| and are declared using the out storage qualifier. It is an error to use |
| centroid out or sample out in a fragment shader. ... |
| |
| |
| Modify Section 4.3.7, Interface Blocks, p. 34 |
| |
| (modify last paragraph, p. 36, removing the requirement for indexing |
| uniform blocks using constant expressions) For uniform blocks declared as |
| arrays, each individual array element corresponds to a separate buffer |
| object backing one instance of the block. As the array size indicates the |
| number of buffer objects needed, uniform block array declarations must |
| specify an integral array size. Arbitrary indices may be used to index a |
| uniform block array; integral constant expressions are not required. If |
| the index used to access an array of uniform blocks is out-of-bounds, the |
| results of the access are undefined. |
| |
| |
| Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37 |
| |
| (modify last paragraph, p. 37, and subsequent paragraphs on p. 38) |
| |
| Geometry shaders support input layout qualifiers. There are two types of |
| layout qualifiers used to specify an input primitive type and an |
| invocation count. The input primitive type and invocation count |
| qualifiers are allowed only on the interface qualifier in, not on an input |
| block, block member, or variable. |
| |
| layout-qualifier-id |
| points |
| lines |
| lines_adjacency |
| triangles |
| triangles_adjacency |
| invocations = integer-constant |
| |
| The identifiers "points", "lines", "lines_adjacency", "triangles", and |
| "triangles_adjacency" are used to specify the type of input primitive |
| accepted by the geometry shader, and only one of these is accepted. At |
| least one geometry shader (compilation unit) in a program must declare an |
| input primitive type, and all geometry shader input primitive type |
| declarations in a program must declare the same type. It is not required |
| that all geometry shaders in a program declare an input primitive type. |
| |
| The identifier "invocations" is used to specify the number of times the |
| geometry shader is invoked for each input primitive received. Invocation |
| count declarations are optional. If no invocation count is declared in |
| any geometry shader in the program, the geometry shader will be run once |
| for each input primitive. If an invocation count is declared, all such |
| declarations must specify the same count. If a shader specifies an |
| invocation count greater than the implementation-dependent maximum, it |
| will fail to compile. |
| |
| For example, |
| |
| layout(triangles, invocations=6) in; |
| |
| will establish that all inputs to the geometry shader are triangles and |
| that the geometry shader is run six times for each triangle processed. |
| |
| All geometry shader input unsized array declarations ... |
| |
| |
| Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40 |
| |
| (modify second and subsequent paragraphs, p. 40) |
| |
| Geometry shaders can have output layout qualifiers. There are three types |
| of output layout qualifiers used to specify an output primitive type, a |
| maximum output vertex count, and per-output stream numbers. The output |
| primitive type and output vertex count qualifiers are allowed only on the |
| interface qualifier out, not on an output block, block member, or variable |
| declaration. The output stream number qualifier is allowed on the |
| interface qualifier out, or on output blocks or variable declarations. |
| |
| The layout qualifier identifiers for geometry shader outputs are |
| |
| layout-qualifier-id |
| points |
| line_strip |
| triangle_strip |
| max_vertices = integer-constant |
| stream = integer-constant |
| |
| The identifiers "points", "line_strip", and "triangle_strip" are used to |
| specify the type of output primitive produced by the geometry shader, and |
| only one of these is accepted. At least one geometry shader (compilation |
| unit) in a program must declare an output primitive type, and all geometry |
| shader output primitive type declarations in a program must declare the |
| same primitive type. It is not required that all geometry shaders in a |
| program declare an output primitive type. |
| |
| The identifier "max_vertices" is used to specify the maximum number of |
| vertices the shader will ever emit in a single invocation. At least one |
| geometry shader (compilation unit) in a program must declare a maximum |
| output vertex count, and all geometry shader output vertex count |
| declarations in a program must declare the same count. It is not required |
| that all geometry shaders in a program declare a count. |
| |
| In the example, |
| |
| layout(triangle_strip, max_vertices = 60) out; // order does not matter |
| layout(max_vertices = 60) out; // redeclaration okay |
| layout(triangle_strip) out; // redeclaration okay |
| layout(points) out; // error, contradicts triangle_strip |
| layout(max_vertices = 30) out; // error, contradicts 60 |
| |
| all outputs from the geometry shader are triangles and at most 60 vertices |
| will be emitted by the shader. It is an error for the maximum number of |
| vertices to be greater than gl_MaxGeometryOutputVertices. |
| |
| The identifier "stream" is used to specify that a geometry shader output |
| variable or block is associated with a particular vertex stream (numbered |
| beginning with zero). A default stream number may be declared at global |
| scope by qualifying interface qualifier out as in this example: |
| |
| layout(stream = 1) out; |
| |
| The stream number specified in such a declaration replaces any previous |
| default and applies to all subsequent block and variable declarations |
| until a new default is established. The initial default stream number is |
| zero. |
| |
| Each output block or non-block output variable is associated with a vertex |
| stream. If the block or variable is declared with a stream qualifier, it |
| is associated with the specified stream; otherwise, it is associated with |
| the current default stream. A block member may be declared with a stream |
| qualifier, but the specified stream must match the stream associated with |
| the containing block. One example: |
| |
| layout(stream=1) out; // default is now stream 1 |
| out vec4 var1; // var1 gets default stream (1) |
| layout(stream=2) out Block1 { // "Block1" belongs to stream 2 |
| layout(stream=2) vec4 var2; // redundant block member stream decl |
| layout(stream=3) vec2 var3; // ILLEGAL (must match block stream) |
| vec3 var4; // belongs to stream 2 |
| }; |
| layout(stream=0) out; // default is now stream 0 |
| out vec4 var5; // var5 gets default stream (0) |
| out Block2 { // "Block2" gets default stream (0) |
| vec4 var6; |
| }; |
| layout(stream=3) out vec4 var7; // var7 belongs to stream 3 |
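| |
| (non-normative illustration) The bookkeeping implied by the example above |
| can be modelled in C as a single piece of state, the current default |
| stream. All names below (gs_stream_state, NO_STREAM_QUALIFIER, and the |
| gs_* functions) are hypothetical and exist only for this sketch; it does |
| not describe how any compiler is implemented. |
| |

```c
#include <assert.h>
#include <stdbool.h>

/* Marker for a declaration that carries no stream qualifier
 * (hypothetical, for this sketch only). */
#define NO_STREAM_QUALIFIER (-1)

typedef struct {
    int default_stream;  /* set by "layout(stream = N) out;" */
} gs_stream_state;

/* The initial default stream number is zero. */
void gs_stream_init(gs_stream_state *s)
{
    s->default_stream = 0;
}

/* Models "layout(stream = N) out;" at global scope. */
void gs_set_default_stream(gs_stream_state *s, int stream)
{
    assert(stream >= 0);
    s->default_stream = stream;
}

/* Stream of an output block or non-block output variable. */
int gs_resolve_stream(const gs_stream_state *s, int qualifier)
{
    return qualifier == NO_STREAM_QUALIFIER ? s->default_stream : qualifier;
}

/* A block member may repeat the block's stream but may not change it. */
bool gs_member_stream_ok(int block_stream, int member_qualifier)
{
    return member_qualifier == NO_STREAM_QUALIFIER ||
           member_qualifier == block_stream;
}
```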
| |
| If a geometry shader output block or variable is declared more than once, |
| all such declarations must associate the variable with the same vertex |
| stream. If any stream declaration specifies a non-existent stream number, |
| the shader will fail to compile. |
| |
| Built-in geometry shader outputs are always associated with vertex stream |
| zero. |
| |
| Each vertex emitted by the geometry shader is assigned to a specific |
| stream, and the attributes of the emitted vertex are taken from the set of |
| output blocks and variables assigned to the targeted stream. After each |
| vertex is emitted, the values of all output variables become undefined. |
| Additionally, the output variables associated with each vertex stream may |
| share storage. Writing to an output variable associated with one stream |
| may overwrite output variables associated with any other stream. When |
| emitting each vertex, a geometry shader should write to all outputs |
| associated with the stream to which the vertex will be emitted and to no |
| outputs associated with any other stream. |
| |
| |
| Modify Section 4.3.9, Interpolation, p. 42 |
| |
| (modify first paragraph of section, add reference to sample in/out) The |
| presence of and type of interpolation is controlled by the storage |
| qualifiers centroid in, sample in, centroid out, and sample out, by the |
| optional interpolation qualifiers smooth, flat, and noperspective, and by |
| default behaviors established through the OpenGL API when no interpolation |
| qualifier is present. ... |
| |
| (modify second paragraph) ... A variable may be qualified as flat centroid |
| or flat sample, which will mean the same thing as qualifying it only as |
| flat. |
| |
| (replace last paragraph, p. 42) |
| |
| When multisample rasterization is disabled, or for fragment shader input |
| variables qualified with neither "centroid in" nor "sample in", the value |
| of the assigned variable may be interpolated anywhere within the pixel and |
| a single value may be assigned to each sample within the pixel, to the |
| extent permitted by the OpenGL Specification. |
| |
| When multisample rasterization is enabled, "centroid" and "sample" may be |
| used to control the location and frequency of the sampling of the |
| qualified fragment shader input. If a fragment shader input is qualified |
| with "centroid", a single value may be assigned to that variable for all |
| samples in the pixel, but that value must be interpolated at a location |
| that lies in both the pixel and in the primitive being rendered, including |
| any of the pixel's samples covered by the primitive. Because the location |
| at which the variable is sampled may be different in neighboring pixels, |
| derivatives of centroid-sampled inputs may be less accurate than those for |
| non-centroid interpolated variables. If a fragment shader input is |
| qualified with "sample", a separate value must be assigned to that |
| variable for each covered sample in the pixel, and that value must be |
| sampled at the location of the individual sample. |
| |
| |
| (Insert before Section 4.7, Order of Qualification, p. 47) |
| |
| Section 4.Q, The Precise Qualifier |
| |
| Some algorithms may require that floating-point computations be carried |
| out in exactly the manner specified in the source code, even if the |
| implementation supports optimizations that could produce nearly equivalent |
| results with higher performance. For example, many GL implementations |
| support a "multiply-add" that can compute values such as |
| |
| float result = (float(a) * float(b)) + float(c); |
| |
| in a single operation. The result of a floating-point multiply-add may |
| not always be identical to first doing a multiply yielding a |
| floating-point result, and then doing a floating-point add. By default, |
| implementations are permitted to perform optimizations that effectively |
| modify the order of the operations used to evaluate an expression, even if |
| those optimizations may produce slightly different results relative to |
| unoptimized code. |
| |
| The qualifier "precise" will ensure that operations contributing to a |
| variable's value are performed in the order and with the precision |
| specified in the source code. Order of evaluation is determined by |
| operator precedence and parentheses, as described in Section 5. |
| Expressions must be evaluated with a precision consistent with the |
| operation; for example, multiplying two "float" values must produce a |
| single value with "float" precision. This effectively prohibits the |
| arbitrary use of fused multiply-add operations if the intermediate |
| multiply result is kept at a higher precision. For example: |
| |
| precise out vec4 position; |
| |
| declares that computations used to produce the value of "position" must be |
| performed precisely using the order and precision specified. As with the |
| invariant qualifier (section 4.6.1), the precise qualifier may be used to |
| qualify a built-in or previously declared user-defined variable as being |
| precise: |
| |
| out vec3 Color; |
| precise Color; // make existing Color be precise |
| |
| This qualifier will affect the evaluation of expressions used on the |
| right-hand side of an assignment if and only if: |
| |
| * the variable assigned to is qualified as "precise"; or |
| |
| * the value assigned is used later in the same function, either directly |
| or indirectly, on the right-hand side of an assignment to a variable |
| declared as "precise". |
| |
| Expressions computed in a function are treated as precise only if assigned |
| to a variable qualified as "precise" in that same function. Any other |
| expressions within a function are not automatically treated as precise, |
| even if they are used to determine a value that is returned by the |
| function and directly assigned to a variable qualified as "precise". |
| |
| Some examples of the use of "precise" include: |
| |
| in vec4 a, b, c, d; |
| precise out vec4 v; |
| |
| float func(float e, float f, float g, float h) |
| { |
| return (e*f) + (g*h); // no special precision |
| } |
| |
| float func2(float e, float f, float g, float h) |
| { |
| precise float result = (e*f) + (g*h); // ensures a precise return value |
| return result; |
| } |
| |
| void func3(float i, float j, precise out float k) |
| { |
| k = i * i + j; // precise, due to <k> declaration |
| } |
| |
| void main(void) |
| { |
| vec3 r = vec3(a * b); // precise, used to compute v.xyz |
| vec3 s = vec3(c * d); // precise, used to compute v.xyz |
| v.xyz = r + s; // precise |
| v.w = (a.w * b.w) + (c.w * d.w); // precise |
| v.x = func(a.x, b.x, c.x, d.x); // values computed in func() |
| // are NOT precise |
| v.x = func2(a.x, b.x, c.x, d.x); // precise! |
| func3(a.x * b.x, c.x * d.x, v.x); // precise! |
| } |
| |
| |
| Modify Section 4.7, Order of Qualification, p. 47 |
| |
| When multiple qualifications are present, they must follow a strict order. |
| This order is as follows: |
| |
| precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier |
| precision-qualifier |
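| |
| For instance, a declaration combining several qualifiers in the order |
| above could be written as in this sketch (the variable name is |
| hypothetical, and the extension is assumed to be enabled): |
| |
| precise invariant centroid out vec4 shadedColor; |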
| |
| |
| Modify Section 5.9, Expressions, p. 57 |
| |
| (modify bulleted list as follows, adding support for implicit conversion |
| between signed and unsigned types) |
| |
| Expressions in the shading language are built from the following: |
| |
| * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector |
| types, and all matrix types. |
| |
| ... |
| |
| * The operator modulus (%) operates on signed or unsigned integer scalars |
| or vectors. If the fundamental types of the operands do not match, the |
| conversions from Section 4.1.10 "Implicit Conversions" are applied to |
| produce matching types. ... |
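| |
| A short sketch of a modulus expression with mismatched operand types, |
| assuming the implicit conversions of Section 4.1.10 as modified by this |
| extension allow int to be converted to uint (the function name is |
| hypothetical): |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uint wrapIndex(int i, uint n) |
| { |
|   // <i> is implicitly converted from int to uint before the modulus, |
|   // so both operands and the result have type uint. |
|   return i % n; |
| } |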
| |
| |
| Modify Section 6.1, Function Definitions, p. 63 |
| |
| (modify description of overloading, beginning at the top of p. 64) |
| |
| Function names can be overloaded. The same function name can be used for |
| multiple functions, as long as the parameter types differ. If a function |
| name is declared twice with the same parameter types, then the return |
| types and all qualifiers must also match, and it is the same function |
| being declared. For example, |
| |
| vec4 f(in vec4 x, out vec4 y); // (A) |
| vec4 f(in vec4 x, out uvec4 y); // (B) okay, different argument type |
| vec4 f(in ivec4 x, out uvec4 y); // (C) okay, different argument type |
| |
| int f(in vec4 x, out ivec4 y); // error, only return type differs |
| vec4 f(in vec4 x, in vec4 y); // error, only qualifier differs |
| vec4 f(const in vec4 x, out vec4 y); // error, only qualifier differs |
| |
| When function calls are resolved, an exact type match for all the |
| arguments is sought. If an exact match is found, all other functions are |
| ignored, and the exact match is used. If no exact match is found, then |
| the implicit conversions in Section 4.1.10 (Implicit Conversions) will be |
| applied to find a match. Mismatched types on input parameters (in or |
| inout or default) must have a conversion from the calling argument type |
| to the formal parameter type. Mismatched types on output parameters (out |
| or inout) must have a conversion from the formal parameter type to the |
| calling argument type. |
| |
| If implicit conversions can be used to find more than one matching |
| function, a single best-matching function is sought. To determine a best |
| match, the conversions between calling argument and formal parameter |
| types are compared for each function argument and pair of matching |
| functions. After these comparisons are performed, each pair of matching |
| functions is compared. A function definition A is considered a better |
| match than function definition B if: |
| |
| * for at least one function argument, the conversion for that argument |
| in A is better than the corresponding conversion in B; and |
| |
| * there is no function argument for which the conversion in B is better |
| than the corresponding conversion in A. |
| |
| If a single function definition is considered a better match than every |
| other matching function definition, it will be used. Otherwise, a |
| semantic error occurs and the shader will fail to compile. |
| |
| To determine whether the conversion for a single argument in one match is |
| better than that for another match, the following rules are applied, in |
| order: |
| |
| 1. An exact match is better than a match involving any implicit |
| conversion. |
| |
| 2. A match involving an implicit conversion from float to double is |
| better than a match involving any other implicit conversion. |
| |
| 3. A match involving an implicit conversion from either int or uint to |
| float is better than a match involving an implicit conversion from |
| either int or uint to double. |
| |
| If none of the rules above apply to a particular pair of conversions, |
| neither conversion is considered better than the other. |
| |
| For the function prototypes (A), (B), and (C) above, the following |
| examples show how the rules apply to different sets of calling argument |
| types: |
| |
| f(vec4, vec4); // exact match of vec4 f(in vec4 x, out vec4 y) |
| f(vec4, uvec4); // exact match of vec4 f(in vec4 x, out uvec4 y) |
| f(vec4, ivec4); // matched to vec4 f(in vec4 x, out vec4 y) |
| // (C) not relevant, can't convert vec4 to |
| // ivec4. (A) better than (B) for 2nd |
| // argument (rule 2), same on first argument. |
| f(ivec4, vec4); // NOT matched. All three match by implicit |
| // conversion. (C) is better than (A) and (B) |
| // on the first argument. (A) is better than |
| // (B) and (C). |
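| |
| As a further sketch of how these rules play out (the function names are |
| hypothetical, and the int-to-uint conversion of Section 4.1.10 is |
| assumed): |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| float g(float x) { return x; } |
| float g(uint x) { return float(x); } |
| |
| float callExamples(void) |
| { |
|   float r = g(1.0) + g(2u); // both calls are exact matches |
|   // g(1) would fail to compile: int converts implicitly to both uint |
|   // and float, and none of rules 1-3 prefers either conversion. |
|   return r; |
| } |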
| |
| |
| Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69 |
| |
| (add to the list of geometry shader special variables, p. 69) |
| |
| in int gl_InvocationID; |
| |
| (add to the end of the section, p. 71) |
| |
| The input variable gl_InvocationID is available in the geometry language |
| and is filled with an integer holding the invocation number associated |
| with the given shader invocation. If the program is linked to support |
| multiple geometry shader invocations per input primitive, the invocations |
| are numbered 0, 1, 2, ..., <N>-1. gl_InvocationID is not available in the |
| vertex or fragment language. |
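| |
| The following geometry shader is a minimal sketch of one possible use, |
| assuming the "invocations" input layout qualifier described elsewhere in |
| this extension. It runs four invocations per input triangle and routes |
| each invocation to its own layer; the choice of four is arbitrary. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| layout(triangles, invocations = 4) in; |
| layout(triangle_strip, max_vertices = 3) out; |
| |
| void main(void) |
| { |
|   for (int i = 0; i < 3; i++) { |
|     gl_Position = gl_in[i].gl_Position; |
|     gl_Layer = gl_InvocationID; // takes the values 0..3 here |
|     EmitVertex(); |
|   } |
|   EndPrimitive(); |
| } |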
| |
| |
| Modify Section 7.2, Fragment Shader Special Variables, p. 72 |
| |
| (add to the list of built-in variables) |
| |
| in int gl_SampleMaskIn[]; |
| |
| The variable gl_SampleMaskIn is an array of integers, each holding a |
| bitfield indicating the set of samples covered by the primitive generating |
| the fragment during multisample rasterization. The array has ceil(<s>/32) |
| elements, where <s> is the maximum number of color samples supported by |
| the implementation. Bit <n> of word <w> in the bitfield is set if and |
| only if the sample numbered <w>*32+<n> is considered covered for this |
| fragment shader invocation. |
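| |
| As an illustrative sketch, a fragment shader could estimate coverage from |
| the first word of the mask. The uniform sampleCount is hypothetical and |
| would be supplied by the application; the sketch assumes at most 32 |
| samples, so only word 0 is examined. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uniform int sampleCount; // hypothetical: samples per pixel |
| out vec4 fragColor; |
| |
| void main(void) |
| { |
|   // Word 0 holds samples 0..31; bit <n> is set if sample <n> is covered. |
|   int covered = bitCount(gl_SampleMaskIn[0]); |
|   float coverage = float(covered) / float(max(sampleCount, 1)); |
|   fragColor = vec4(coverage); |
| } |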
| |
| |
| Modify Section 8.3, Common Functions, p. 84 |
| |
| (add support for floating-point multiply-add) |
| |
| Syntax: |
| |
| genType fma(genType a, genType b, genType c); |
| |
| The function fma() performs a fused floating-point multiply-add to compute |
| the value a*b+c. The results of fma() may not be identical to evaluating |
| the expression (a*b)+c, because the computation may be performed in a |
| single operation with intermediate precision different from that used to |
| compute a non-fma() expression. |
| |
| The results of fma() are guaranteed to be invariant given fixed inputs |
| <a>, <b>, and <c>, as though the result were taken from a variable |
| declared as "precise". |
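| |
| One common use is polynomial evaluation by Horner's rule. The following |
| sketch is illustrative only; the function and parameter names are |
| hypothetical. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| // Evaluates c0 + c1*x + c2*x*x with two fused multiply-adds. |
| float horner2(float x, float c0, float c1, float c2) |
| { |
|   return fma(fma(c2, x, c1), x, c0); |
| } |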
| |
| |
| (add support for single-precision frexp and ldexp functions) |
| |
| Syntax: |
| |
| genType frexp(genType x, out genIType exp); |
| genType ldexp(genType x, in genIType exp); |
| |
| The function frexp() splits each single-precision floating-point number in |
| <x> into a binary significand, a floating-point number in the range [0.5, |
| 1.0), and an integral exponent of two, such that: |
| |
| x = significand * 2 ^ exponent |
| |
| The significand is returned by the function; the exponent is returned in |
| the parameter <exp>. For a floating-point value of zero, the significand |
| and exponent are both zero. For a floating-point value that is an |
| infinity or is not a number, the results of frexp() are undefined. |
| |
| If the input <x> is a vector, this operation is performed in a |
| component-wise manner; the value returned by the function and the value |
| written to <exp> are vectors with the same number of components as <x>. |
| |
| The function ldexp() builds a single-precision floating-point number from |
| each significand component in <x> and the corresponding integral exponent |
| of two in <exp>, returning: |
| |
| significand * 2 ^ exponent |
| |
| If this product is too large to be represented as a single-precision |
| floating-point value, the result is considered undefined. |
| |
| If the input <x> is a vector, this operation is performed in a |
| component-wise manner; the value passed in <exp> and returned by the |
| function are vectors with the same number of components as <x>. |
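| |
| The two functions are inverses for finite inputs, as the following |
| sketch shows (the function names are hypothetical): |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| float rebuild(float x) |
| { |
|   int e; |
|   float m = frexp(x, e); // e.g. x = 8.0 gives m = 0.5, e = 4 |
|   return ldexp(m, e); // 0.5 * 2^4 = 8.0 |
| } |
| |
| float timesEight(float x) |
| { |
|   return ldexp(x, 3); // x * 2^3 |
| } |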
| |
| |
| (add support for new integer built-in functions) |
| |
| Syntax: |
| |
| genIType bitfieldExtract(genIType value, int offset, int bits); |
| genUType bitfieldExtract(genUType value, int offset, int bits); |
| |
| genIType bitfieldInsert(genIType base, genIType insert, int offset, |
| int bits); |
| genUType bitfieldInsert(genUType base, genUType insert, int offset, |
| int bits); |
| |
| genIType bitfieldReverse(genIType value); |
| genUType bitfieldReverse(genUType value); |
| |
| genIType bitCount(genIType value); |
| genIType bitCount(genUType value); |
| |
| genIType findLSB(genIType value); |
| genIType findLSB(genUType value); |
| |
| genIType findMSB(genIType value); |
| genIType findMSB(genUType value); |
| |
| The function bitfieldExtract() extracts bits <offset> through |
| <offset>+<bits>-1 from each component in <value>, returning them in the |
| least significant bits of the corresponding component of the result. For |
| unsigned data types, the most significant bits of the result will be set |
| to zero. For signed data types, the most significant bits will be set to |
| the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be |
| zero. The result will be undefined if <offset> or <bits> is negative, or |
| if the sum of <offset> and <bits> is greater than the number of bits used |
| to store the operand. Note that for vector versions of bitfieldExtract(), |
| a single pair of <offset> and <bits> values is shared for all components. |
| |
| The function bitfieldInsert() inserts the <bits> least significant bits of |
| each component of <insert> into the corresponding component of <base>. |
| The result will have bits numbered <offset> through <offset>+<bits>-1 |
| taken from bits 0 through <bits>-1 of <insert>, and all other bits taken |
| directly from the corresponding bits of <base>. If <bits> is zero, the |
| result will simply be <base>. The result will be undefined if <offset> or |
| <bits> is negative, or if the sum of <offset> and <bits> is greater than |
| the number of bits used to store the operand. Note that for vector |
| versions of bitfieldInsert(), a single pair of <offset> and <bits> values |
| is shared for all components. |
| |
| The function bitfieldReverse() reverses the bits of <value>. The bit |
| numbered <n> of the result will be taken from bit (<bits>-1)-<n> of |
| <value>, where <bits> is the total number of bits used to represent |
| <value>. |
| |
| The function bitCount() returns the number of one bits in the binary |
| representation of <value>. |
| |
| The function findLSB() returns the bit number of the least significant one |
| bit in the binary representation of <value>. If <value> is zero, -1 will |
| be returned. |
| |
| The function findMSB() returns the bit number of the most significant bit |
| in the binary representation of <value>. For positive integers, the |
| result will be the bit number of the most significant one bit. For |
| negative integers, the result will be the bit number of the most |
| significant zero bit. For a <value> of zero or negative one, -1 will be |
| returned. |
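| |
| The sketch below works through a few values by hand to illustrate the |
| definitions above. The function names and the choice of bit positions |
| are hypothetical. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| // Bits 4..11 of <word>, zero-extended: 0xABCDu yields 0xBCu. |
| uint getField(uint word) { return bitfieldExtract(word, 4, 8); } |
| |
| // Bits 4..7 of <word>, sign-extended from bit 7: 0xF0 yields -1. |
| int getSignedField(int word) { return bitfieldExtract(word, 4, 4); } |
| |
| // Replaces bits 4..11: (0xABCDu, 0x12u) yields 0xA12Du. |
| uint setField(uint word, uint v) { return bitfieldInsert(word, v, 4, 8); } |
| |
| // Floor of log2(v) for v > 0: 1u yields 0, 0x80u yields 7; 0u yields -1. |
| int floorLog2(uint v) { return findMSB(v); } |
| |
| // Number of trailing zero bits for v != 0: 0x50u yields 4. |
| int trailingZeros(uint v) { return findLSB(v); } |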
| |
| |
| (add support for general packing functions) |
| |
| Syntax: |
| |
| uint packUnorm2x16(vec2 v); |
| uint packUnorm4x8(vec4 v); |
| uint packSnorm4x8(vec4 v); |
| |
| vec2 unpackUnorm2x16(uint v); |
| vec4 unpackUnorm4x8(uint v); |
| vec4 unpackSnorm4x8(uint v); |
| |
| The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first |
| convert each component of a two- or four-component vector of normalized |
| floating-point values into 8- or 16-bit integer values. Then, the results |
| are packed into a 32-bit unsigned integer. The first component of the |
| vector will be written to the least significant bits of the output; the |
| last component will be written to the most significant bits. |
| |
| The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8() |
| first unpack a single 32-bit unsigned integer into a pair of 16-bit |
| unsigned integers, four 8-bit unsigned integers, or four 8-bit signed |
| integers. Then, each component is converted to a normalized floating-point |
| value to generate a two- or four-component vector. The first component of |
| the vector will be extracted from the least significant bits of the input; |
| the last component will be extracted from the most significant bits. |
| |
| The conversion between fixed- and normalized floating-point values will be |
| performed as below. |
| |
| function conversion |
| --------------- ----------------------------------------------------- |
| packUnorm2x16 fixed_val = round(clamp(float_val, 0, +1) * 65535.0); |
| packUnorm4x8 fixed_val = round(clamp(float_val, 0, +1) * 255.0); |
| packSnorm4x8 fixed_val = round(clamp(float_val, -1, +1) * 127.0); |
| unpackUnorm2x16 float_val = fixed_val / 65535.0; |
| unpackUnorm4x8 float_val = fixed_val / 255.0; |
| unpackSnorm4x8 float_val = clamp(fixed_val / 127.0, -1, +1); |
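| |
| As a worked sketch of the component order and conversions above (the |
| function names are hypothetical): |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| // vec4(1.0, 0.0, 0.0, 1.0) packs to 0xFF0000FFu: x lands in bits 0..7 |
| // and w in bits 24..31. |
| uint packColor(vec4 c) { return packUnorm4x8(c); } |
| |
| // 0xFF0000FFu unpacks back to vec4(1.0, 0.0, 0.0, 1.0). |
| vec4 unpackColor(uint p) { return unpackUnorm4x8(p); } |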
| |
| |
| (add functions to get/set the bit encoding for floating-point values) |
| |
| 32-bit floating-point data types in the OpenGL shading language are |
| specified to be encoded according to the IEEE 754 specification for |
| single-precision floating-point values. The functions below allow shaders |
| to convert floating-point values to and from signed or unsigned integers |
| representing their encoding. |
| |
| To obtain signed or unsigned integer values holding the encoding of a |
| floating-point value, use: |
| |
| genIType floatBitsToInt(genType value); |
| genUType floatBitsToUint(genType value); |
| |
| Conversions are done on a component-by-component basis. |
| |
| To obtain a floating-point value corresponding to a signed or unsigned |
| integer encoding, use: |
| |
| genType intBitsToFloat(genIType value); |
| genType uintBitsToFloat(genUType value); |
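| |
| The following sketch shows two typical uses, assuming the IEEE 754 |
| single-precision layout described above (1 sign bit, 8 exponent bits, |
| 23 fraction bits). The function names are hypothetical. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| // Negates <x> by toggling the sign bit of its encoding. |
| float flipSign(float x) |
| { |
|   return uintBitsToFloat(floatBitsToUint(x) ^ 0x80000000u); |
| } |
| |
| // Unbiased exponent of a normalized value; 1.0 (0x3F800000u) yields 0. |
| int unbiasedExponent(float x) |
| { |
|   return int(bitfieldExtract(floatBitsToUint(x), 23, 8)) - 127; |
| } |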
| |
| |
| (support for unsigned integer add/subtract with carry-out) |
| |
| Syntax: |
| |
| genUType uaddCarry(genUType x, genUType y, out genUType carry); |
| genUType usubBorrow(genUType x, genUType y, out genUType borrow); |
| |
| The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and |
| <y>, returning the sum modulo 2^32. The value <carry> is set to zero if |
| the sum was less than 2^32, or one otherwise. |
| |
| The function usubBorrow() subtracts the 32-bit unsigned integer or vector |
| <y> from <x>, returning the difference if non-negative or 2^32 plus the |
| difference, otherwise. The value <borrow> is set to zero if x >= y, or |
| one otherwise. |
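| |
| These functions can be chained to build wider arithmetic. The sketch |
| below adds and subtracts 64-bit unsigned values stored in a uvec2, with |
| the low word in .x and the high word in .y; that storage convention and |
| the function names are assumptions of the example. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uvec2 add64(uvec2 a, uvec2 b) |
| { |
|   uint carry; |
|   uint lo = uaddCarry(a.x, b.x, carry); |
|   uint hi = a.y + b.y + carry; // wraps modulo 2^32 |
|   return uvec2(lo, hi); |
| } |
| |
| uvec2 sub64(uvec2 a, uvec2 b) |
| { |
|   uint borrow; |
|   uint lo = usubBorrow(a.x, b.x, borrow); |
|   uint hi = a.y - b.y - borrow; // wraps modulo 2^32 |
|   return uvec2(lo, hi); |
| } |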
| |
| |
| (support for signed and unsigned multiplies, with 32-bit inputs and a |
| 64-bit result spanning two 32-bit outputs) |
| |
| Syntax: |
| |
| void umulExtended(genUType x, genUType y, out genUType msb, |
| out genUType lsb); |
| void imulExtended(genIType x, genIType y, out genIType msb, |
| out genIType lsb); |
| |
| The functions umulExtended() and imulExtended() multiply 32-bit unsigned |
| or signed integers or vectors <x> and <y>, producing a 64-bit result. The |
| 32 least significant bits are returned in <lsb>; the 32 most significant |
| bits are returned in <msb>. |
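| |
| A brief sketch of the output order, returning the product as a uvec2 |
| with the low word in .x (the function name and that convention are |
| hypothetical): |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uvec2 mulWide(uint x, uint y) |
| { |
|   uint hi, lo; |
|   umulExtended(x, y, hi, lo); // <msb> first, then <lsb> |
|   // 0xFFFFFFFFu * 0xFFFFFFFFu gives hi = 0xFFFFFFFEu, lo = 0x00000001u. |
|   return uvec2(lo, hi); |
| } |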
| |
| |
| Modify Section 8.7, Texture Lookup Functions, p. 91 |
| |
| (extend the basic versions of textureGather from ARB_texture_gather, |
| allowing for optional component selection in a multi-component texture |
| and for shadow mapping) |
| |
| Syntax: |
| gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]); |
| gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]); |
| gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]); |
| gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]); |
| gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]); |
| |
| vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ); |
| vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ); |
| vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ); |
| vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord, |
| float refZ); |
| vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ); |
| |
| The textureGather() functions use the texture coordinates given by <coord> |
| to determine a set of four texels to sample from the texture identified by |
| <sampler>. These functions return a four-component vector consisting of |
| one component from each texel. If specified, the value of <comp> must be |
| a constant integer expression with a value of zero, one, two, or three, |
| identifying the <x>, <y>, <z>, or <w> component of the four-component |
| vector lookup result for each texel, respectively. If <comp> is not |
| specified, the <x> component of each texel will be used to generate the |
| result vector. As described in the OpenGL Specification, the vector |
| selects the post-swizzle component corresponding to <comp> from each of |
| the four texels, returning: |
| |
| vec4(T_i0_j1(coord, base).<comp>, |
| T_i1_j1(coord, base).<comp>, |
| T_i1_j0(coord, base).<comp>, |
| T_i0_j0(coord, base).<comp>) |
| |
| For textureGather() functions using a shadow sampler type, each of the |
| four texel lookups performs a depth comparison against the depth reference |
| value passed in <refZ>, and returns the result of that comparison in the |
| appropriate component of the result vector. The parameter <comp> used for |
| component selection is not supported for textureGather() functions with |
| shadow sampler types. |
| |
| As with other texture lookup functions, the results of textureGather() are |
| undefined for shadow samplers if the texture referenced is not a depth |
| texture or has depth comparisons disabled; or for non-shadow samplers if |
| the texture referenced is a depth texture with depth comparisons enabled. |
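| |
| The following fragment shader is a sketch of both forms. All uniform and |
| input names are hypothetical; colorTex is assumed to be a color texture |
| and shadowMap a depth texture with depth comparisons enabled. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uniform sampler2D colorTex; |
| uniform sampler2DShadow shadowMap; |
| in vec2 uv; |
| in float refDepth; |
| out vec4 fragColor; |
| |
| void main(void) |
| { |
|   // Gather the <w> component (comp = 3) of the four footprint texels. |
|   vec4 alphas = textureGather(colorTex, uv, 3); |
| |
|   // Gather four depth comparison results and average them. |
|   vec4 lit = textureGather(shadowMap, uv, refDepth); |
|   float shadow = dot(lit, vec4(0.25)); |
| |
|   fragColor = vec4(vec3(shadow), dot(alphas, vec4(0.25))); |
| } |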
| |
| |
| (extend the "Offset" versions of textureGather from ARB_texture_gather, |
| allowing for optional component selection in a multi-component texture, |
| non-constant offsets, and shadow mapping) |
| |
| Syntax: |
| gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord, |
| ivec2 offset[, int comp]); |
| gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord, |
| ivec2 offset[, int comp]); |
| gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord, |
| ivec2 offset[, int comp]); |
| |
| vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord, |
| float refZ, ivec2 offset); |
| vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord, |
| float refZ, ivec2 offset); |
| vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord, |
| float refZ, ivec2 offset); |
| |
| The textureGatherOffset() functions operate identically to |
| textureGather(), except that the 2-component integer texel offset vector |
| <offset> is applied as a (u,v) offset to determine the four texels to |
| sample. The value <offset> need not be constant; however, a limited range |
| of offset values is supported. If any component of <offset> is less than |
| MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than |
| MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture |
| coordinates is undefined. Note that <offset> does not apply to the layer |
| coordinate for array textures. |
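| |
| A sketch of a run-time offset follows. The uniform tapOffset is |
| hypothetical; the application is assumed to keep it within the range |
| given by MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and |
| MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uniform sampler2D colorTex; |
| uniform ivec2 tapOffset; // need not be a constant expression |
| in vec2 uv; |
| out vec4 fragColor; |
| |
| void main(void) |
| { |
|   vec4 reds = textureGatherOffset(colorTex, uv, tapOffset); |
|   vec4 blues = textureGatherOffset(colorTex, uv, tapOffset, 2); |
|   fragColor = vec4(dot(reds, vec4(0.25)), 0.0, |
|                    dot(blues, vec4(0.25)), 1.0); |
| } |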
| |
| |
| (add new "Offsets" versions of textureGather from ARB_texture_gather, |
| allowing for optional component selection in a multi-component texture, |
| separate offsets for each texel in the footprint, and shadow |
| mapping) |
| |
| Syntax: |
| gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord, |
| ivec2 offsets[4][, int comp]); |
| gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord, |
| ivec2 offsets[4][, int comp]); |
| gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord, |
| ivec2 offsets[4][, int comp]); |
| |
| vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord, |
| float refZ, ivec2 offsets[4]); |
| vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord, |
| float refZ, ivec2 offsets[4]); |
| vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord, |
| float refZ, ivec2 offsets[4]); |
| |
| The textureGatherOffsets() functions operate identically to |
| textureGather(), except that the array of two-component integer vectors |
| <offsets> is used to determine the location of the four texels to sample. |
| Each of the four texels is obtained by applying the corresponding offset |
| in the four-element array <offsets> as a (u,v) coordinate offset to the |
| coordinates <coord>, identifying the four-texel LINEAR footprint, and then |
| selecting the texel T_i0_j0 of that footprint. The specified values in |
| <offsets> must be constant. A limited range of offset values is |
| supported; the minimum and maximum offset values are |
| implementation-dependent and given by |
| MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and |
| MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively. Note that <offsets> |
| does not apply to the layer coordinate for array textures. |
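| |
| As a sketch, four widely spaced taps can be fetched in one call. The |
| offset values and all names are hypothetical, and the offsets are |
| assumed to lie within the supported range. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| uniform sampler2D colorTex; |
| in vec2 uv; |
| out vec4 fragColor; |
| |
| // Constant offsets, one per returned texel. |
| const ivec2 taps[4] = ivec2[4](ivec2(0, 0), ivec2(2, 0), |
|                                ivec2(0, 2), ivec2(2, 2)); |
| |
| void main(void) |
| { |
|   vec4 reds = textureGatherOffsets(colorTex, uv, taps); |
|   fragColor = vec4(dot(reds, vec4(0.25))); |
| } |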
| |
| |
| Modify Section 8.8, Fragment Processing Functions, p. 101 |
| |
| (add new functions to the end of section, p. 102) |
| |
| Built-in interpolation functions are available to compute an interpolated |
| value of a fragment shader input variable at a shader-specified (x,y) |
| location. A separate (x,y) location may be used for each invocation of |
| the built-in function, and those locations may differ from the default |
| (x,y) location used to produce the default value of the input. |
| |
| float interpolateAtCentroid(float interpolant); |
| vec2 interpolateAtCentroid(vec2 interpolant); |
| vec3 interpolateAtCentroid(vec3 interpolant); |
| vec4 interpolateAtCentroid(vec4 interpolant); |
| |
| float interpolateAtSample(float interpolant, int sample); |
| vec2 interpolateAtSample(vec2 interpolant, int sample); |
| vec3 interpolateAtSample(vec3 interpolant, int sample); |
| vec4 interpolateAtSample(vec4 interpolant, int sample); |
| |
| float interpolateAtOffset(float interpolant, vec2 offset); |
| vec2 interpolateAtOffset(vec2 interpolant, vec2 offset); |
| vec3 interpolateAtOffset(vec3 interpolant, vec2 offset); |
| vec4 interpolateAtOffset(vec4 interpolant, vec2 offset); |
| |
| The function interpolateAtCentroid() will return the value of the input |
| varying <interpolant> sampled at a location inside both the pixel and |
| the primitive being processed. The value obtained would be the same value |
| assigned to the input variable if declared with the "centroid" qualifier. |
| |
| The function interpolateAtSample() will return the value of the input |
| varying <interpolant> at the location of the sample numbered <sample>. If |
| multisample buffers are not available, the input varying will be evaluated |
| at the center of the pixel. If the sample number given by <sample> does |
| not exist, the position used to interpolate the input varying is |
| undefined. |
| |
| The function interpolateAtOffset() will return the value of the input |
| varying <interpolant> sampled at an offset from the center of the pixel |
| specified by <offset>. The two floating-point components of <offset> |
| give the offset in pixels in the x and y directions, respectively. |
| An offset of (0,0) identifies the center of the pixel. The range and |
| granularity of offsets supported by this function is |
| implementation-dependent. |
| |
| For all of the interpolation functions, <interpolant> must be an input |
| variable or an element of an input variable declared as an array. |
| Component selection operators (e.g., ".xy") may not be used when |
| specifying <interpolant>. If <interpolant> is declared with a "flat" or |
| "centroid" qualifier, the qualifier will have no effect on the |
| interpolated value. If <interpolant> is declared with the "noperspective" |
| qualifier, the interpolated value will be computed without perspective |
| correction. |
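| |
| The three functions can be combined in one shader, as in this sketch. |
| The input name and the offset value are hypothetical; whether the offset |
| is supported exactly depends on the implementation's range and |
| granularity. |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| in vec4 color; // passed whole, without component selection |
| out vec4 fragColor; |
| |
| void main(void) |
| { |
|   vec4 c0 = interpolateAtCentroid(color); |
|   vec4 c1 = interpolateAtSample(color, 0); |
|   vec4 c2 = interpolateAtOffset(color, vec2(0.25, -0.25)); |
|   fragColor = (c0 + c1 + c2) / 3.0; |
| } |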
| |
| |
| Modify Section 8.10, Geometry Shader Functions, p. 104 |
| |
| (replace the section, using the following more general formulation) |
| |
| These functions are only available in geometry shaders. |
| |
| Syntax: |
| |
| void EmitStreamVertex(int stream); // Geometry-only |
| void EndStreamPrimitive(int stream); // Geometry-only |
| |
| void EmitVertex(); // Geometry-only |
| void EndPrimitive(); // Geometry-only |
| |
| Description: |
| |
| The function EmitStreamVertex() specifies that the vertex being generated |
| by the geometry shader is completed. A vertex is added to the current |
| output primitive in the vertex stream numbered <stream> using the current |
| values of all output variables associated with <stream>. The values of |
| any unwritten output variables associated with <stream> are undefined. |
| The argument <stream> must be a constant integral expression. The values |
| of all output variables (for all output streams) are undefined after |
| calling EmitStreamVertex(). If a geometry shader invocation has emitted |
| more vertices than permitted by the output layout qualifier |
| "max_vertices", the results of calling EmitStreamVertex() are undefined. |
| |
| The function EmitVertex() is equivalent to calling EmitStreamVertex() with |
| <stream> set to zero. |
| |
| The function EndStreamPrimitive() specifies that the current output |
| primitive for the vertex stream numbered <stream> is completed and that a |
| new empty output primitive of the same type should be started. The |
| argument <stream> must be a constant integral expression. This function |
| does not emit a vertex. If the output layout is declared to be "points", |
| calling EndPrimitive() is optional. |
| |
| The function EndPrimitive() is equivalent to calling EndStreamPrimitive() |
| with <stream> set to zero. |
| |
| A geometry shader starts with an output primitive containing no vertices |
| for each stream. When a geometry shader terminates, the current output |
| primitive for each vertex stream is automatically completed. It is not |
| necessary to call EndPrimitive() or EndStreamPrimitive() for any stream |
| where the geometry shader writes only a single primitive. |
| |
| Multiple vertex streams are supported only if the output primitive type is |
| declared to be "points". A program will fail to link if it contains a |
| geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its |
| output primitive type is not "points". |
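| |
| The following geometry shader is a sketch of two vertex streams, assuming |
| the "stream" output layout qualifier described elsewhere in this |
| extension. The variable names are hypothetical. Outputs are rewritten |
| before each emit because their values are undefined after |
| EmitStreamVertex(). |
| |
| #version 150 |
| #extension GL_ARB_gpu_shader5 : enable |
| |
| layout(points) in; |
| layout(points, max_vertices = 2) out; |
| |
| in vec4 velocity[]; |
| |
| layout(stream = 0) out vec4 positionOut; |
| layout(stream = 1) out vec4 velocityOut; |
| |
| void main(void) |
| { |
|   positionOut = gl_in[0].gl_Position; |
|   EmitStreamVertex(0); |
| |
|   velocityOut = velocity[0]; |
|   EmitStreamVertex(1); |
| } |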
| |
| |
| Modify Section 9, Shading Language Grammar, p. 92 |
| |
| !!! TBD !!! |
| |
| |
| GLX Protocol |
| |
| None. |
| |
| Dependencies on ARB_gpu_shader_fp64 |
| |
| This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set |
| of implicit conversions supported in the OpenGL Shading Language. If more |
| than one of these extensions is supported, an expression of one type may |
| be converted to another type if that conversion is allowed by any of these |
| specifications. |
| |
| If ARB_gpu_shader_fp64 or a similar extension introducing new data types |
| is not supported, the function overloading rule in the GLSL specification |
| preferring promotion of an input parameter to a smaller type over promotion |
| to a larger type is never applicable, as all data types are of the same |
| size. That rule and the example referring to "double" should be removed. |
| |
| |
| Dependencies on NV_gpu_shader5 |
| |
| This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set |
| of implicit conversions supported in the OpenGL Shading Language. If more |
| than one of these extensions is supported, an expression of one type may |
| be converted to another type if that conversion is allowed by any of these |
| specifications. |
| |
| This specification and NV_gpu_shader5 both lift the restriction in GLSL |
| 1.50 requiring that indexing in arrays of samplers must be done with |
| constant expressions. However, this extension specifies that results are |
| undefined if the indices would diverge if multiple shader invocations are |
| run in lockstep. NV_gpu_shader5 does not impose the non-divergent |
| indexing requirement. |
| |
| If NV_gpu_shader5 is supported, integer data types are supported with four |
| different precisions (8-, 16-, 32-, and 64-bit) and floating-point data |
| types are supported with three different precisions (16-, 32-, and |
| 64-bit). The extension adds the following rule for output parameters, |
| which is similar to the one present in this extension for input |
| parameters: |
| |
| 5. If the formal parameters in both matches are output parameters, a |
| conversion from a type with a larger number of bits per component is |
| better than a conversion from a type with a smaller number of bits |
| per component. For example, a conversion from an "int16_t" formal |
| parameter type to "int" is better than one from an "int8_t" formal |
| parameter type to "int". |
| |
| Such a rule is not provided in this extension because there is no |
| combination of types in this extension and ARB_gpu_shader_fp64 where this |
| rule has any effect. |
| |
| |
| Dependencies on ARB_sample_shading |
| |
| This extension builds upon the per-sample shading support provided by |
| ARB_sample_shading to provide several new capabilities, including: |
| |
| * the built-in variable gl_SampleMaskIn[] indicates the set of samples |
| covered by the input primitive corresponding to the fragment shader |
| invocation; and |
| |
| * use of the "sample" qualifier on a fragment shader input forces |
| per-sample shading, and specifies that the value of the input be |
| evaluated per-sample. |
| |
| There is no interaction between the extensions, except that shaders using |
| the features of this extension seem likely to use features from |
| ARB_sample_shading as well. |
| |
| |
| Dependencies on ARB_texture_gather |
| |
| This extension builds upon the textureGather() built-ins provided by |
| ARB_texture_gather to provide several new capabilities, including: |
| |
| * allowing shaders to select any single component of a multi-component |
| texture to produce the gathered 2x2 footprint; |
| |
| * allowing shaders to perform a per-sample depth comparison when |
| gathering the 2x2 footprint using shadow sampler types; |
| |
| * allowing shaders to use arbitrary offsets computed at run-time to |
| select a 2x2 footprint to gather from; and |
| |
| * allowing shaders to use separate independent offsets for each of the |
| four texels returned, instead of requiring a fixed 2x2 footprint. |
| |
| Other than the fact that they provide similar functionality, there is no |
| interaction between the extensions. |
| |
| Since this extension requires support for gathering from multi-component |
| textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB |
| is increased to 4. |
| |
| |
| Errors |
| |
| INVALID_OPERATION is generated by GetProgramiv if <pname> is |
| GEOMETRY_SHADER_INVOCATIONS and the program has not been linked |
| successfully, or does not contain objects to form a geometry shader. |
| |
| |
| New State |
| |
| Add the following state to Table 6.40, Program Object State, p. 378 |
| |
|                                                  Initial |
| Get Value                    Type  Get Command   Value    Description                 Sec.    Attribute |
| ---------------------------  ----  ------------  -------  --------------------------  ------  --------- |
| GEOMETRY_SHADER_INVOCATIONS  Z+    GetProgramiv  1        number of times a geometry  6.1.16  - |
|                                                           shader should be executed |
|                                                           for each input primitive |
| |
| New Implementation Dependent State |
| |
|                                                        Min. |
| Get Value                           Type  Get Command  Value  Description                 Sec.    Attrib |
| ----------------------------------  ----  -----------  -----  --------------------------  ------  ------ |
| MAX_GEOMETRY_SHADER_INVOCATIONS     Z+    GetIntegerv  32     maximum supported geometry  2.16.4  - |
|                                                               shader invocation count |
| MIN_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv    -0.5   furthest negative offset    3.12.1  - |
|                                                               for interpolateAtOffset() |
| MAX_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv    +0.5   furthest positive offset    3.12.1  - |
|                                                               for interpolateAtOffset() |
| FRAGMENT_INTERPOLATION_OFFSET_BITS  Z+    GetIntegerv  4      subpixel bits for           3.12.1  - |
|                                                               interpolateAtOffset() |
| MAX_VERTEX_STREAMS                  Z+    GetIntegerv  4      total number of vertex      2.16.4  - |
|                                                               streams |
| |
| (Note: The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB, |
| added by ARB_texture_gather, is increased to 4.) |
| |
| Issues |
| |
| (1) This extension builds on the capability provided by |
| ARB_sample_shading, adding a new built-in variable for the input |
| sample mask. It seems likely that a shader using this mask might also |
| want to use one or more ARB_sample_shading built-ins. Are such |
| shaders required to include #extension lines for both extensions? |
| |
| UNRESOLVED: It would be nice if it wasn't required. |
| |
| (2) How do the per-sample shading features of this extension interact with |
| non-multisample rendering? |
| |
| RESOLVED: Non-multisample rendering (due to no multisample buffer or |
| MULTISAMPLE disabled) is treated as single-sample rendering. |
| |
| (3) This extension lifts the restriction requiring that indices into |
| samplers be constant expressions, but makes the results undefined if |
| the indices used would diverge in lockstep execution. What is this |
| good for? |
| |
| RESOLVED: This allows shaders to index into samplers using integer |
| uniforms, or with non-divergent values computed at run-time (e.g., loop |
| counters). Many implementations of this extension will be SIMD, running |
| multiple shader invocations at once, and some implementations may have |
| difficulty with accessing multiple textures in a single SIMD |
| instruction. |
| |
| Note that the NV_gpu_shader5 extension similarly lifts the restriction |
| but does not require non-divergent indexing. |
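| |
| As a rough illustration of what "non-divergent" means here, the C sketch |
| below models a group of invocations running in lockstep and checks |
| whether they all use the same sampler index. The function name and the |
| idea of passing the group as an array are hypothetical; the size and |
| makeup of a lockstep group are implementation details that this |
| extension does not define. |
| |

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every invocation in a lockstep group uses the same
 * sampler array index, i.e. the indexing is non-divergent. An empty group
 * is treated as non-divergent. */
static bool sampler_index_is_uniform(const int *index, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (index[i] != index[0]) {
            return false;
        }
    }
    return true;
}
```

| |
| An index taken from an integer uniform, or from a loop counter that all |
| invocations advance together, would pass such a check. An index derived |
| from per-fragment data generally would not, and the results are then |
| undefined under this extension. |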
| |
| (4) What sort of implicit conversions should we support in this and |
| related extensions? |
| |
| RESOLVED: In GLSL 1.50, we have implicit conversion from "int" and |
| "uint" to "float", as well as equivalent conversions for vector types. |
| One of the primary motivations of this feature is to allow constants |
| that are nominally integer values to be used in floating-point contexts |
| without requiring special suffixes. The following code compiles |
| successfully in GLSL 1.50. |
| |
| float square(float x) { |
| return x * x; |
| } |
| float f = 0; |
| float g = f * 2; |
| float h = square(3); |
| |
| The same code would fail on GLSL 1.1, because "0", "2", and "3" would |
| need to be written as "0.0", "2.0", and "3.0", respectively. |
| |
| This extension adds implicit conversions from "int" to "uint" to allow |
| for cases like: |
| |
| uint square(uint x) { |
| return x * x; |
| } |
| uint v = square(2); |
| |
| This code is legal with this extension, but not in GLSL 1.50 ("2" would |
| need to be replaced with "2U" or "uint(2)"). |
| |
| ARB_gpu_shader_fp64 adds a new type "double", and we extend existing |
| implicit conversions to allow for promotion of "int", "uint", and |
| "float" to "double". |
| |
| Unlike C/C++, the general rule for implicit conversions in GLSL is that |
| conversions are unidirectional. If type A can be implicitly converted |
| to type B, type B can not be converted to type A. |
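| |
| The conversions discussed in this issue can be summarized as a small |
| table. The C sketch below is illustrative only: the enum and function |
| names are hypothetical, and the table assumes that both this extension |
| and ARB_gpu_shader_fp64 are supported, with scalar types only. |
| |

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE, T_COUNT } BasicType;

/* conv[from][to] is true when "from" may be implicitly converted to "to". */
static const bool conv[T_COUNT][T_COUNT] = {
    /* to:         int    uint   float  double */
    /* int    */ { false, true,  true,  true  },
    /* uint   */ { false, false, true,  true  },
    /* float  */ { false, false, false, true  },
    /* double */ { false, false, false, false },
};

static bool implicitly_converts(BasicType from, BasicType to)
{
    assert(from < T_COUNT && to < T_COUNT);
    return conv[from][to];
}

/* Checks the unidirectional property: if A converts to B, then B must
 * not convert back to A. */
static bool conversions_are_unidirectional(void)
{
    for (int a = 0; a < T_COUNT; a++) {
        for (int b = 0; b < T_COUNT; b++) {
            if (conv[a][b] && conv[b][a]) {
                return false;
            }
        }
    }
    return true;
}
```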
| |
| (5) Increasing the number of available implicit conversions means that |
| there is the possibility of ambiguities in various operators. How do |
| we deal with these cases? |
| |
| RESOLVED: For binary operators, the new implicit conversions mean that |
| there may be multiple ways to resolve an expression. For example, given |
| the following declarations |
| |
| int i; |
| uint u; |
| |
| the expression "i+u" could be resolved either by implicitly converting |
| "i" to "uint", or by implicitly converting both values to either "float" |
| or "double". To resolve, we define a set of preferences for a common |
| data type based on the types of the operands: |
| |
| - use a floating-point type if either operand is floating-point |
| - use an unsigned integer type if either operand is unsigned |
| - use a signed integer type otherwise |
| |
| If conversions to multiple precisions are supported, the |
| lowest-precision available data type is preferred (e.g., int*float will |
| be converted to float*float and not double*double). |
| |
| These rules should extend naturally if new basic data types are added. |
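| |
| A minimal sketch of these preferences in C follows, assuming only the |
| scalar types "int", "uint", "float", and "double" exist. The enum and |
| the function name are hypothetical and are not part of any API. |
| |

```c
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } BasicType;

/* Picks the common type for a binary operator: floating-point if either
 * operand is floating-point, else unsigned if either operand is unsigned,
 * else signed. Among floating-point types the lowest precision that both
 * operands can reach is used, so "double" is chosen only when an operand
 * is already "double". */
static BasicType common_type(BasicType a, BasicType b)
{
    if (a == T_DOUBLE || b == T_DOUBLE) {
        return T_DOUBLE;
    }
    if (a == T_FLOAT || b == T_FLOAT) {
        return T_FLOAT;
    }
    if (a == T_UINT || b == T_UINT) {
        return T_UINT;
    }
    return T_INT;
}
```

| |
| Under this sketch the "i+u" example resolves to "uint", and int*float |
| resolves to "float" rather than "double". |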
| |
| (6) Increasing the number of available implicit conversions means that |
| there is an increased possibility of ambiguity when function |
| overloading is involved. Additionally, this and related extensions |
| add new function overloads. How do we deal with these cases? |
| |
| RESOLVED: The general rule for function overloading in GLSL 1.50 is |
| that we first check for a function prototype that exactly matches the |
| parameters passed to a function call. If no match exists, we check for |
| prototypes that can be matched by implicit conversions. If more than |
| one matching prototype can be matched by conversion, the function call |
| is considered ambiguous and results in a compilation error. |
| |
| Unfortunately, when adding new implicit conversions, it is possible for |
| cases that were formerly unambiguous to become ambiguous. For backward |
| compatibility purposes, it would be desirable to ensure that shaders |
| that succeeded in old language versions should still compile if |
| "upgraded" to more recent versions/extensions. However, the new |
| conversions and overloads might make this more difficult without |
| modifying other language rules. For example, the following prototypes |
| are available for the standard built-in function min() on scalar values |
| when this extension and ARB_gpu_shader_fp64 are supported: |
| |
| int min(int a, int b); |
| uint min(uint a, uint b); |
| float min(float a, float b); |
| double min(double a, double b); |
| |
| In GLSL 1.50, a function call such as: |
| |
| float f; |
| min(f, 1); |
| |
| would be considered unambiguous because the double-precision version of |
| min() didn't exist and the call matched only the single-precision |
| version. However, with double-precision, implicit conversions can be |
| used to resolve to either the single- or double-precision versions. |
| |
| To resolve this issue, we provide a set of rules that can be used to |
| resolve multiple candidates to a "best match". The rules for |
| determining a best match are similar to those for C++ function |
| overloading, but not exactly the same. Like C++, these rules compare |
| the conversions required on an argument-by-argument basis. A function |
| prototype A is better than function prototype B if: |
| |
| - A is better than B for one or more arguments |
| - B is better than A for no arguments |
| |
| If a single function prototype is better than all others, that one is |
| used. Otherwise, we get the same ambiguity error as on previous GLSL |
| versions. |
| |
| As far as argument-by-argument comparisons go, the order of preference |
| is: |
| |
| - favor exact matches |
| - prefer "promotions" (float->double) to other conversions |
| - prefer conversions from int/uint to float over similar conversion to |
| double |
| |
| If none of the rules apply, one match is considered neither better nor |
| worse than the other. |
| |
| With these rules, the "min(f,1)" example above resolves to the "float" |
| version, as is the case in GLSL 1.50. However, there are other cases |
| where ambiguity remains. For example, consider the prototypes: |
| |
| int f(uint x); |
| int f(float x); |
| |
| With GLSL 1.50 rules, "f(3)" would match the floating-point version, as |
| no implicit conversions existed from "int" to "uint". With the new |
| implicit conversions, both prototypes match and neither is preferred. |
| Because of the ambiguity, "f(3)" would fail to compile with this |
| extension enabled, but should still compile on implementations |
| supporting this extension if the extension is not enabled in GLSL source |
| code. |
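| |
| The best-match procedure described in this issue can be sketched in C as |
| follows. This is a simplified model, not compiler source: all type and |
| function names are hypothetical, only the scalar types "int", "uint", |
| "float", and "double" are modeled, and every prototype in a candidate |
| set is assumed to take the same number of parameters as the call. |
| |

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } BasicType;

#define MAX_PARAMS 4

typedef struct {
    BasicType param[MAX_PARAMS];   /* formal parameter types */
} Prototype;

/* True if an argument of type "arg" can be passed to a formal parameter
 * of type "formal", either exactly or by implicit conversion. */
static bool can_pass(BasicType arg, BasicType formal)
{
    if (arg == formal) return true;
    switch (arg) {
    case T_INT:   return formal == T_UINT || formal == T_FLOAT ||
                         formal == T_DOUBLE;
    case T_UINT:  return formal == T_FLOAT || formal == T_DOUBLE;
    case T_FLOAT: return formal == T_DOUBLE;
    default:      return false;
    }
}

static bool matches(const BasicType *args, int n, const Prototype *p)
{
    for (int i = 0; i < n; i++) {
        if (!can_pass(args[i], p->param[i])) return false;
    }
    return true;
}

/* True if passing "arg" to formal type "a" is preferred over formal "b". */
static bool arg_better(BasicType arg, BasicType a, BasicType b)
{
    bool arg_is_integer = (arg == T_INT || arg == T_UINT);
    if (a == b) return false;
    if (a == arg) return true;     /* exact match beats any conversion */
    if (b == arg) return false;
    if (arg == T_FLOAT && a == T_DOUBLE) return true;   /* promotion */
    if (arg_is_integer && a == T_FLOAT && b == T_DOUBLE) return true;
    return false;                  /* neither is preferred */
}

/* A is better than B if it is better for at least one argument and B is
 * better for none. */
static bool proto_better(const BasicType *args, int n,
                         const Prototype *a, const Prototype *b)
{
    bool better_somewhere = false;
    for (int i = 0; i < n; i++) {
        if (arg_better(args[i], b->param[i], a->param[i])) return false;
        if (arg_better(args[i], a->param[i], b->param[i])) {
            better_somewhere = true;
        }
    }
    return better_somewhere;
}

/* Returns the index of the single best prototype, or -1 if nothing
 * matches or the call is ambiguous. */
static int resolve_overload(const BasicType *args, int n,
                            const Prototype *protos, int count)
{
    assert(n <= MAX_PARAMS);
    for (int c = 0; c < count; c++) {
        bool best = matches(args, n, &protos[c]);
        for (int o = 0; o < count && best; o++) {
            if (o != c && matches(args, n, &protos[o]) &&
                !proto_better(args, n, &protos[c], &protos[o])) {
                best = false;
            }
        }
        if (best) return c;
    }
    return -1;
}
```

| |
| With the four min() prototypes listed above, a call with argument types |
| (float, int) selects the "float" prototype in this model, while a call |
| with a single "int" argument against the (uint) and (float) prototypes |
| of f() is reported as ambiguous. |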
| |
| (7) The function overloading rules described in this extension describe |
| conversions between data types with different sizes; however, all |
| existing data types allowing implicit conversion (int, uint, float) |
| are the same size. Why do we specify these rules? |
| |
| RESOLVED: This extension is specified at the same time as the related |
| ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such |
| types. The rules are specified all in one place here so we don't have |
| to replicate and extend the rules in the other extensions. It also |
| provides the ability to automatically convert from signed to unsigned |
| integer types, as in the C programming language. |
| |
| (8) Should we support textureGather() for rectangle textures |
| (sampler2DRect)? They aren't in ARB_texture_gather. |
| |
| RESOLVED: Yes. |
| |
| (9) How does the input sample mask interact with the fixed-function |
| SampleCoverage and SampleMask state? Will samples be removed from the |
| input mask if they would be eliminated by these masks in the |
| per-fragment operations? |
| |
| UNRESOLVED. |
| |
| (10) Should we support reading patches as geometry shader inputs, and if |
| so, where? |
| |
| RESOLVED: Not in this extension. This capability will be provided in |
| NV_gpu_shader5. |
| |
| (11) Should we support per-sample interpolation of attributes? If so, |
| how? |
| |
| RESOLVED: Yes. When multisample rasterization is enabled, qualifying |
| one or more fragment shader inputs with "sample" will force per-sample |
| interpolation of those attributes. If the same shader includes other |
| fragment inputs not qualified with sample, those attributes may be |
| interpolated per-pixel (i.e., all samples get the same values, likely |
| evaluated at the pixel center). |
| |
| (12) Should we reserve "sample" as a keyword for per-sample interpolation |
| qualifiers, or use something more obscure, such as "per_sample"? |
| |
| RESOLVED: This extension uses "sample". |
| |
| (13) What should be the base data type for the bitCount(), findLSB(), and |
| findMSB() functions -- signed or unsigned integers? |
| |
| RESOLVED: These functions will return signed values, with -1 returned |
| by findLSB/findMSB if no bit is found. Note that the shading language |
| supports implicit conversions of signed integers to unsigned, which |
| makes it easy enough if an unsigned result is desired. |
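| |
| For reference, the signed return convention can be modeled in C as shown |
| below. This sketch covers only 32-bit unsigned inputs; the function |
| names are hypothetical stand-ins for the GLSL built-ins, and the |
| signed-input variants are not modeled. |
| |

```c
#include <stdint.h>

/* Number of bits set to one in "value". */
static int bit_count(uint32_t value)
{
    int count = 0;
    while (value != 0) {
        value &= value - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Bit number of the least significant one bit, or -1 if no bit is set. */
static int find_lsb(uint32_t value)
{
    int bit = 0;
    if (value == 0) {
        return -1;
    }
    while ((value & 1u) == 0) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Bit number of the most significant one bit, or -1 if no bit is set. */
static int find_msb(uint32_t value)
{
    int bit = -1;
    while (value != 0) {
        value >>= 1;
        bit++;
    }
    return bit;
}
```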
| |
| (14) Why do EmitVertex() and EndPrimitive() begin with capitalized words |
| while most of the other built-ins start with a lower-case (e.g., |
| emitVertex)? Which precedent should the new per-vertex stream emit |
| and end primitive functions follow? |
| |
| RESOLVED: The inconsistency began with the original functions in |
| EXT_geometry_shader4; the spec author can't recall the original reasons |
| (if any). Regardless, we decided to match the existing functions as |
| closely as possible and use EmitStreamVertex() and EndStreamPrimitive(). |
| |
| (15) How do the textureGather functions work with sRGB textures? |
| |
| RESOLVED: Gamma-correction is applied to the texture source color |
| before "gathering" and hence applies to all four components, unless the |
| texture swizzle of the selected component is ALPHA in which case no |
| gamma-correction is applied. |
| |
| (16) How should we support arrays of uniform blocks (i.e., multiple blocks |
| in a group, each backed by a separate buffer object)? |
| |
| RESOLVED: We will use instance names in the block definitions, which |
| can be declared as regular arrays: |
| |
| uniform UniformData { |
| vec4 stuff; |
| } blocks[4]; |
| |
| The four blocks will be referred to as "blocks[0]" through |
| "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]" |
| in the OpenGL API code. The block member in this example will be |
| referred to as "UniformData.stuff" in the API. A similar approach was |
| already adopted in GLSL 1.50, where geometry shaders supported arrays of |
| input blocks that were treated similarly. Since this spec depends on |
| GLSL 1.50, little new spec language is required here. |
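| |
| On the API side, an application would typically build the per-block name |
| as a string before querying its index. The helper below is a |
| hypothetical C sketch of that formatting step only; it makes no GL |
| calls. |
| |

```c
#include <stdio.h>

/* Formats the API-side name of one block in an array of uniform blocks,
 * e.g. "UniformData[2]". Returns the value returned by snprintf(). */
static int format_block_name(char *out, size_t size,
                             const char *block_name, unsigned index)
{
    return snprintf(out, size, "%s[%u]", block_name, index);
}
```

| |
| The resulting string is the kind of name that would be passed to |
| glGetUniformBlockIndex() for the block array in the example above. |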
| |
| (17) What are instanced geometry shaders useful for? |
| |
| RESOLVED: Instanced geometry shaders allow geometry programs that |
| perform regular operations to run more efficiently. |
| |
| Consider a simple example of an algorithm that uses geometry shaders to |
| render primitives to a cube map in a single pass. Without instanced |
| geometry shaders, the geometry shader to render triangles to the cube |
| map would do something like: |
| |
| for (face = 0; face < 6; face++) { |
| for (vertex = 0; vertex < 3; vertex++) { |
| project vertex <vertex> onto face <face>, output position |
| compute/copy attributes of emitted <vertex> to outputs |
| output <face> to result.layer |
| emit the projected vertex |
| } |
| end the primitive (next triangle) |
| } |
| |
| This algorithm would output 18 vertices per input triangle, three for |
| each cube face. The six triangles emitted would be rasterized, one per |
| face. Geometry shaders that emit a large number of attributes have |
| often posed performance challenges, since all the attributes must be |
| stored somewhere until the emitted primitives are processed. Large storage |
| requirements may limit the number of threads that can be run in parallel |
| and reduce overall performance. |
| |
| Instanced geometry shaders allow this example to be restructured to run |
| with six separate invocations, one per face. Each invocation projects |
| the triangle to only a single face (identified by the invocation number) |
| and emits only 3 vertices. The reduced storage requirements allow more |
| geometry shader invocations to be run in parallel, with greater overall |
| efficiency. |
| |
| Additionally, the total number of attributes that can be emitted by a |
| single geometry shader invocation is limited. However, for instanced |
| geometry shaders, that limit applies to each of <N> invocations which |
| allows for a larger total output. For example, if the GL implementation |
| supports only 1024 components of output per invocation, the 18-vertex |
| algorithm above could emit no more than 56 components per vertex. The |
| same algorithm implemented as a 3-vertex 6-invocation geometry program |
| could theoretically allow for 341 components per vertex. |
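| |
| The arithmetic behind those figures is a simple integer division, |
| sketched below in C. The function name is hypothetical, and the |
| 1024-component budget is only the example value used above. |
| |

```c
/* Largest number of components each vertex may carry when a single
 * geometry shader invocation emits "vertices" vertices within a budget
 * of "max_components" total output components. */
static unsigned components_per_vertex(unsigned max_components,
                                      unsigned vertices)
{
    if (vertices == 0) {
        return 0;
    }
    return max_components / vertices;
}
```

| |
| Here components_per_vertex(1024, 18) gives 56 for the single-invocation |
| version, and components_per_vertex(1024, 3) gives 341 for each of the |
| six invocations. |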
| |
| (18) Should EmitStreamVertex() and EndStreamPrimitive() accept a |
| non-constant stream number? |
| |
| RESOLVED: Not in this extension. Requiring a constant stream number |
| for each call simplifies code generation for the compiler. |
| |
| (19) Are there any restrictions on geometry shaders with multiple output |
| streams? |
| |
| RESOLVED: Yes, such geometry shaders are required to generate points; |
| line strip and triangle strip outputs are not supported. |
| |
| (20) Since multi-stream geometry shaders only support points, why does |
| EndStreamPrimitive() exist? Neither it nor EndPrimitive() does anything |
| useful when emitting points. |
| |
| RESOLVED: This function was added for completeness, and would be useful |
| if the requirement for emitting points were lifted by a future |
| extension. |
| |
| (21) Should we provide mechanisms allowing shaders to examine or set the |
| bit representation of floating-point numbers? |
| |
| RESOLVED: Yes, we will provide functions to convert single-precision |
| floats to/from signed and unsigned 32-bit integers. The |
| ARB_gpu_shader_fp64 extension will provide similar functionality for |
| double-precision floats. We chose to adopt the Java naming convention |
| here -- converting a single-precision float to/from a signed integer is |
| accomplished by the functions floatBitsToInt() and intBitsToFloat(). |
| |
| Note that this functionality has also been forked off into a separate |
| extension (ARB_shader_bit_encoding) that can be exported on |
| implementations capable of performing such conversions but not capable |
| of the full feature set of this extension and/or OpenGL 4.0. |
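| |
| For readers more familiar with C, the bit-level conversions behave like |
| the memcpy()-based helpers sketched below. The C function names are |
| hypothetical analogues of the GLSL built-ins, and the sketch assumes a |
| 32-bit IEEE 754 "float". |
| |

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Reinterprets the bits of a float as a signed 32-bit integer. */
static int32_t float_bits_to_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterprets the bits of a float as an unsigned 32-bit integer. */
static uint32_t float_bits_to_uint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterprets a signed 32-bit integer as a float. */
static float int_bits_to_float(int32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Reinterprets an unsigned 32-bit integer as a float. */
static float uint_bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

| |
| No numeric conversion takes place: the bit pattern is preserved, so a |
| round trip through either integer type returns the original float. |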
| |
| (22) What is the "precise" qualifier good for? |
| |
| RESOLVED: Like "invariant", "precise" provides some invariance |
| guarantees that are useful for certain algorithms. |
| |
| With an output position qualified as "invariant", we ensure that if the |
| same geometry is processed by multiple shaders using the exact same |
| code, it will be transformed in exactly the same way to ensure that we |
| have no cracking or flickering in multi-pass algorithms using different |
| shaders. |
| |
| With "precise", we ensure that an algorithm can be written to produce |
| identical results on subtly different inputs. For example, the order of |
| vertices visible to a geometry or tessellation shader used to subdivide |
| primitive edges might present an edge shared between two primitives in |
| one direction for one primitive and the other direction for the adjacent |
| primitive. Even if the weights are identical in the two cases, there |
| may be cracking if the computations are being done in an order-dependent |
| manner. If the position of a new vertex were provided by evaluating the |
| function f() below with limited-precision floating-point math, it's not |
| necessarily the case that f(a,b,c) == f(c,b,a) in the following code: |
| |
| float f(float x, float y, float z) |
| { |
| return (x + y) + z; |
| } |
| |
| This function f() can be rewritten as follows with "precise" and a |
| symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a). |
| |
| float f(float x, float y, float z) |
| { |
| // Note that we intentionally compute "(x+z)" instead of "(x+y)" |
| // here, because that value will be the same when <x> and <z> |
| // are reversed. |
| precise float result = (x + z) + y; |
| return result; |
| } |
| |
| With this ordering, f(a,b,c) evaluates (a + c) + b and f(c,b,a) evaluates |
| (c + a) + b, which produce the same value. |
| |
| The "precise" qualifier will disable certain optimizations and thus |
| carries a performance cost. The cost may be higher than "invariant", |
| because "invariant" permits optimizations disallowed by "precise" as |
| long as the compiler ensures that it always optimizes in the exact same |
| manner. |
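| |
| The order dependence discussed in this issue can be reproduced on a CPU. |
| The C sketch below is an illustration of the arithmetic only, assuming |
| IEEE 754 single precision with round-to-nearest; the function names are |
| hypothetical. C has no "precise" qualifier, so "volatile" temporaries |
| are used here merely to force each intermediate sum to be rounded to |
| "float". |
| |

```c
/* Order-dependent sum, mirroring the first f() above: (x + y) + z. */
static float sum_in_order(float x, float y, float z)
{
    volatile float partial = x + y;   /* rounded to single precision */
    volatile float result = partial + z;
    return result;
}

/* Symmetric sum, mirroring the rewritten f(): (x + z) + y. Swapping the
 * first and last arguments leaves the partial sum unchanged. */
static float sum_symmetric(float x, float y, float z)
{
    volatile float partial = x + z;   /* rounded to single precision */
    volatile float result = partial + y;
    return result;
}
```

| |
| With a = 1.0e8, b = -1.0e8, and c = 1.0, sum_in_order(a, b, c) yields 1.0 |
| but sum_in_order(c, b, a) yields 0.0, because 1.0 is lost when it is |
| added to -1.0e8 first. sum_symmetric() returns the same value for both |
| argument orders. |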
| |
| (23) What computations will be affected by the "precise" qualifier, and |
| what computations aren't? |
| |
| RESOLVED: We will ensure precise computation of any expressions within |
| a single function used directly or indirectly to produce the value of a |
| variable qualified as "precise". |
| |
| We chose not to provide this guarantee across function boundaries, even |
| if the results of a function are used in the computation of an output |
| qualified as "precise". Algorithms requiring the use of "precise" may |
| have a mix of computations, some required to be precise, some not. This |
| function boundary rule may serve to limit the amount of computation |
| indirectly forced to be precise. |
| |
| Additionally, the function boundary rule permits non-precise sub-operations in |
| a computation required to be precise. For example, a shader might need |
| to compute a "precise" position by taking a weighted average as in the |
| following code: |
| |
| precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]); |
| |
| However, if the main precision requirement is that the same result be |
| generated when <p> and <w> are reversed, the following code also gets |
| the job done, even if posmad() is implemented with multiply-add |
| operations. |
| |
| vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; } |
| precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) + |
| posmad(p[3], w[3], p[2]*w[2])); |
| |
| To generate precise results within a function, the function arguments |
| and/or temporaries within the function body should be qualified as |
| "precise" as needed. |
| |
| Note that when applying "precise" rules to assignments, indirect |
| application of this rule applies on an assignment-by-assignment basis. |
| In the following perverse example: |
| |
| float a,b,c,d,e,f; |
| precise float g; |
| f = a + b + c; |
| ... |
| f = c + d + e; |
| g = f * 2.0; |
| |
| The first assignment to <f> need not be treated as "precise", since the |
| value assigned will have no effect on the final value of the |
| precise-qualified <g>. The second assignment to <f> must be evaluated |
| precisely. The fact that one assignment to a variable needs to be |
| treated as precise does not mean that the variable itself is implicitly |
| treated as "precise". |
| |
| (24) Are "precise" qualifiers allowed on function arguments? If so, what |
| do they mean? Can a return value for a function be declared as |
| precise? |
| |
| RESOLVED: Yes; the rules permit the use of "precise" on any variable |
| declaration, including function arguments. The code |
| |
| float f(precise in vec4 arg1, precise out vec4 arg2) { ... } |
| |
| specifies that any expressions used to assign values to <arg1> or <arg2> |
| within f() will be evaluated in a precise manner. |
| |
| Expressions used to derive the value passed to the function f() as |
| <arg1> will be treated as precise according to the normal rules. The |
| expression for <arg1> is treated as precise if and only if the function |
| call is on the right-hand side of an assignment to a variable qualified |
| as "precise" or is indirectly used in an assignment to such a variable. |
| It is not automatically treated as precise just because the formal |
| parameter <arg1> is qualified with "precise". |
| |
| For the purposes of this rule, variables passed as "out" parameters do |
| not count as assignments. Values assigned to an output parameter will |
| not be evaluated precisely just because the caller provides a variable |
| qualified as "precise". When the output parameter itself is qualified |
| as "precise", precise evaluation of that output is required within the |
| callee. |
| |
| We chose not to permit function return values to be qualified as |
| "precise", though we could have hypothetically allowed code such as: |
| |
| precise float f(float a, float b, float c) { return (a+b)+c; } |
| |
| To obtain a precise return value in such a case, use code such as: |
| |
| float f(float a, float b, float c) |
| { |
| precise float result = (a+b) + c; |
| return result; |
| } |
| |
| (25) How does texture gather interact with incomplete textures? |
| |
| RESOLVED: For regular texture lookups, incomplete textures are |
| considered to return a texel value with RGBA components of (0,0,0,1). |
| For texture gather operations, each texel in the sampled footprint is |
| considered to have RGBA components of (0,0,0,1). When using the |
| textureGather() function to select the R, G, or B component of an |
| incomplete texture, (0,0,0,0) will be returned. When selecting the A |
| component, (1,1,1,1) will be returned. |
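| |
| The rule above amounts to selecting one component from four identical |
| (0,0,0,1) texels. A minimal C model is sketched below; the Vec4 type and |
| the function name are hypothetical and stand in for the GLSL built-in. |
| |

```c
#include <assert.h>

typedef struct {
    float v[4];
} Vec4;

/* Models a gather of component "comp" (0=R, 1=G, 2=B, 3=A) from a 2x2
 * footprint of an incomplete texture, where every texel is treated as
 * (0,0,0,1). */
static Vec4 gather_incomplete(int comp)
{
    static const float texel[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    Vec4 result;
    assert(comp >= 0 && comp < 4);
    for (int i = 0; i < 4; i++) {
        result.v[i] = texel[comp];   /* same component from each texel */
    }
    return result;
}
```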
| |
| |
| Revision History |
| |
| Rev. Date Author Changes |
| ---- -------- -------- ----------------------------------------- |
| 16 03/30/12 pbrown Fix typo in language restricting the use of |
| EmitStreamVertex()/EndStreamPrimitive() to |
| programs with an output primitive type of |
| points, not an input type of points (bug 8371). |
| |
| 15 10/17/11 pbrown Fix prototypes for textureGather and |
| textureGatherOffset to use vec2 coordinates for |
| "2DRect" sampler versions (bug 7964). |
| |
| 14 01/27/11 pbrown Add further clarification on the interaction |
| of texture gather and incomplete textures (bug |
| 7289). |
| |
| 13 09/24/10 pbrown Clarify the interaction of texture gather |
| with swizzle (bug 5910), fixing conflicts |
| between API and GLSL spec language. |
| Consolidate into one copy in the API |
| spec. |
| |
| 12 03/23/10 pbrown Update issues section, both fixing/numbering |
| existing issues and including other issues |
| that were left behind in NV_gpu_shader5 when the |
| specs were refactored. |
| |
| 11 03/23/10 Jon Leech Describe <offset> to interpolateAtOffset |
| without implying it is a constant expression |
| (Bug 6026). |
| |
| 10 03/07/10 pbrown Fix typo in an output stream qualifier example. |
| |
| 9 03/05/10 pbrown Modify function overloading rules to remove |
| most preferences when converting between |
| two different types. The only preferences |
| that remain are promoting "float" to "double" |
| over other conversions, and preferring |
| conversion of integers to "float" to converting |
| to "double" (bug 5938). |
| |
| 8 01/29/10 pbrown Update the spec to require that the minimum |
| value for MAX_PROGRAM_TEXTURE_GATHER_- |
| COMPONENTS is 4 (bug 5919). |
| |
| 7 01/21/10 pbrown Clarify the rules for determining a best match |
| if implicit conversions can result in multiple |
| matching function prototypes. Modify the rules |
| to pick a best match by comparing pairs of |
| functions, and using any function deemed better |
| than any other choice. Modify the argument |
| conversion preference rules for overloading to |
| disfavor "int" to "uint" conversions, for |
| backward compatibility with previous GLSL |
| versions. Add some new discussion of the |
| choices involved to the issues section (bug |
| 5938). |
| |
| 6 01/14/10 pbrown Minor wording updates from spec reviews. |
| |
| 5 12/10/09 pbrown Functionality updates from spec review: |
| Rename fmad to fma. Fix error in spec |
| language for negative diffs in usubBorrow. |
| |
| 4 12/10/09 pbrown Convert from EXT to ARB. |
| |
| 3 12/08/09 pbrown Miscellaneous fixes from spec review: Added |
| missing implementation constants for |
| interpolation offset range and granularity; |
| added explicit section to OpenGL spec describing |
| shader requested interpolation modifiers and |
| functions. Clean up more dangling "ThreadID" |
| references. General typo fixes and language |
| clarifications. |
| |
| 2 10/01/09 pbrown Renamed gl_ThreadID to gl_InvocationID. |
| |
| 1 pbrown Internal revisions. |