Name | |

NV_gpu_shader5 | |

Name Strings | |

GL_NV_gpu_shader5 | |

Contact | |

Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) | |

Contributors | |

Barthold Lichtenbelt, NVIDIA | |

Chris Dodd, NVIDIA | |

Eric Werness, NVIDIA | |

Greg Roth, NVIDIA | |

Jeff Bolz, NVIDIA | |

Piers Daniell, NVIDIA | |

Daniel Rakos, AMD | |

Mathias Heyer, NVIDIA | |

Status | |

Shipping. | |

Version | |

Last Modified Date: 04/16/2016 | |

NVIDIA Revision: 10 | |

Number | |

OpenGL Extension #389 | |

OpenGL ES Extension #260 | |

Dependencies | |

This extension is written against the OpenGL 3.2 (Compatibility Profile) | |

Specification. | |

This extension is written against version 1.50 (revision 09) of the OpenGL | |

Shading Language Specification. | |

If implemented in OpenGL, OpenGL 3.2 and GLSL 1.50 are required. | |

If implemented in OpenGL, ARB_gpu_shader5 is required. | |

This extension interacts with ARB_gpu_shader5. | |

This extension interacts with ARB_gpu_shader_fp64. | |

This extension interacts with ARB_tessellation_shader. | |

This extension interacts with NV_shader_buffer_load. | |

This extension interacts with EXT_direct_state_access. | |

This extension interacts with EXT_vertex_attrib_64bit and | |

NV_vertex_attrib_integer_64bit. | |

This extension interacts with OpenGL ES 3.1 (dated October 29th 2014). | |

This extension interacts with OpenGL ES Shading Language 3.1 (revision 3). | |

If implemented in OpenGL ES, OpenGL ES 3.1 and GLSL ES 3.10 are required. | |

If implemented in OpenGL ES, OES/EXT_gpu_shader5 and

EXT_shader_implicit_conversions are required.

This extension interacts with OES/EXT_tessellation_shader.

This extension interacts with OES/EXT_geometry_shader.

Overview | |

This extension provides a set of new features to the OpenGL Shading | |

Language and related APIs to support capabilities of new GPUs. Shaders | |

using the new functionality provided by this extension should enable this | |

functionality via the construct | |

#extension GL_NV_gpu_shader5 : require (or enable) | |

This extension was developed concurrently with the ARB_gpu_shader5 | |

extension, and provides a superset of the features provided there. The | |

features common to both extensions are documented in the ARB_gpu_shader5 | |

specification; this document describes only the additional language features

not available via ARB_gpu_shader5. A shader that enables this extension | |

via an #extension directive also implicitly enables the common | |

capabilities provided by ARB_gpu_shader5. | |

In addition to the capabilities of ARB_gpu_shader5, this extension | |

provides a variety of new features for all shader types, including: | |

* support for a full set of 8-, 16-, 32-, and 64-bit scalar and vector | |

data types, including uniform API, uniform buffer object, and shader | |

input and output support; | |

* the ability to aggregate samplers into arrays, index these arrays with | |

arbitrary expressions, and not require that non-constant indices be | |

uniform across all shader invocations; | |

* new built-in functions to pack and unpack 64-bit integer types into a | |

two-component 32-bit integer vector; | |

* new built-in functions to pack and unpack 32-bit unsigned integer | |

types into a two-component 16-bit floating-point vector; | |

* new built-in functions to convert double-precision floating-point | |

values to or from their 64-bit integer bit encodings; | |

* new built-in functions to compute the composite of a set of boolean | |

conditions across a group of shader threads;

* vector relational functions supporting comparisons of vectors of 8-, | |

16-, and 64-bit integer types or 16-bit floating-point types; and | |

* extending texel offset support to allow loading texel offsets from | |

regular integer operands computed at run-time, except for lookups with | |

gradients (textureGrad*). | |
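
As an informal illustration, the 64-bit pack/unpack and bit-encoding built-ins listed above can be modeled in host-side C. This is a sketch only: the type and function names are hypothetical, and the assumption that the first vector component carries the 32 least significant bits is not stated in this overview.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a two-component 32-bit unsigned vector. */
typedef struct { uint32_t x, y; } uvec2_model;

/* Pack two 32-bit words into one 64-bit value.
   Assumption: x is the low word and y is the high word. */
static uint64_t pack_u64_from_2x32(uvec2_model v)
{
    return ((uint64_t)v.y << 32) | (uint64_t)v.x;
}

/* Split a 64-bit value back into its low (x) and high (y) words. */
static uvec2_model unpack_u64_to_2x32(uint64_t value)
{
    uvec2_model v;
    v.x = (uint32_t)(value & 0xFFFFFFFFu);
    v.y = (uint32_t)(value >> 32);
    return v;
}

/* Reinterpret a double as its 64-bit encoding (no numeric conversion). */
static int64_t double_bits_to_int64(double d)
{
    int64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Reinterpret a 64-bit encoding as a double (no numeric conversion). */
static double int64_bits_to_double(int64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Copying the bytes with memcpy, rather than casting, keeps the bit pattern intact, which is the point of the bit-encoding functions.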

This extension also provides additional support for processing patch | |

primitives (introduced by ARB_tessellation_shader). | |

ARB_tessellation_shader requires the use of a tessellation evaluation | |

shader when processing patches, which means that patches will never | |

survive past the tessellation pipeline stage. This extension lifts that | |

restriction, and allows patches to proceed further in the pipeline and be | |

used | |

* as input to a geometry shader, using a new "patches" layout qualifier; | |

* as input to transform feedback; | |

* by fixed-function rasterization stages, in which case the patches are | |

drawn as independent points. | |

Additionally, it allows geometry shaders to read per-patch attributes | |

written by a tessellation control shader using input variables declared | |

with "patch in". | |

New Procedures and Functions | |

void Uniform1i64NV(int location, int64EXT x); | |

void Uniform2i64NV(int location, int64EXT x, int64EXT y); | |

void Uniform3i64NV(int location, int64EXT x, int64EXT y, int64EXT z); | |

void Uniform4i64NV(int location, int64EXT x, int64EXT y, int64EXT z, | |

int64EXT w); | |

void Uniform1i64vNV(int location, sizei count, const int64EXT *value); | |

void Uniform2i64vNV(int location, sizei count, const int64EXT *value); | |

void Uniform3i64vNV(int location, sizei count, const int64EXT *value); | |

void Uniform4i64vNV(int location, sizei count, const int64EXT *value); | |

void Uniform1ui64NV(int location, uint64EXT x); | |

void Uniform2ui64NV(int location, uint64EXT x, uint64EXT y); | |

void Uniform3ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z); | |

void Uniform4ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z, | |

uint64EXT w); | |

void Uniform1ui64vNV(int location, sizei count, const uint64EXT *value); | |

void Uniform2ui64vNV(int location, sizei count, const uint64EXT *value); | |

void Uniform3ui64vNV(int location, sizei count, const uint64EXT *value); | |

void Uniform4ui64vNV(int location, sizei count, const uint64EXT *value); | |

void GetUniformi64vNV(uint program, int location, int64EXT *params); | |

(The following function is also provided by NV_shader_buffer_load.) | |

void GetUniformui64vNV(uint program, int location, uint64EXT *params); | |

(All of the following ProgramUniform* functions are supported if and only | |

if implemented in OpenGL ES or EXT_direct_state_access is supported.) | |

void ProgramUniform1i64NV(uint program, int location, int64EXT x); | |

void ProgramUniform2i64NV(uint program, int location, int64EXT x, | |

int64EXT y); | |

void ProgramUniform3i64NV(uint program, int location, int64EXT x, | |

int64EXT y, int64EXT z); | |

void ProgramUniform4i64NV(uint program, int location, int64EXT x, | |

int64EXT y, int64EXT z, int64EXT w); | |

void ProgramUniform1i64vNV(uint program, int location, sizei count, | |

const int64EXT *value); | |

void ProgramUniform2i64vNV(uint program, int location, sizei count, | |

const int64EXT *value); | |

void ProgramUniform3i64vNV(uint program, int location, sizei count, | |

const int64EXT *value); | |

void ProgramUniform4i64vNV(uint program, int location, sizei count, | |

const int64EXT *value); | |

void ProgramUniform1ui64NV(uint program, int location, uint64EXT x); | |

void ProgramUniform2ui64NV(uint program, int location, uint64EXT x, | |

uint64EXT y); | |

void ProgramUniform3ui64NV(uint program, int location, uint64EXT x, | |

uint64EXT y, uint64EXT z); | |

void ProgramUniform4ui64NV(uint program, int location, uint64EXT x, | |

uint64EXT y, uint64EXT z, uint64EXT w); | |

void ProgramUniform1ui64vNV(uint program, int location, sizei count, | |

const uint64EXT *value); | |

void ProgramUniform2ui64vNV(uint program, int location, sizei count, | |

const uint64EXT *value); | |

void ProgramUniform3ui64vNV(uint program, int location, sizei count, | |

const uint64EXT *value); | |

void ProgramUniform4ui64vNV(uint program, int location, sizei count, | |

const uint64EXT *value); | |

New Tokens | |

Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and | |

GetTransformFeedbackVarying: | |

INT64_NV 0x140E | |

UNSIGNED_INT64_NV 0x140F | |

INT8_NV 0x8FE0 | |

INT8_VEC2_NV 0x8FE1 | |

INT8_VEC3_NV 0x8FE2 | |

INT8_VEC4_NV 0x8FE3 | |

INT16_NV 0x8FE4 | |

INT16_VEC2_NV 0x8FE5 | |

INT16_VEC3_NV 0x8FE6 | |

INT16_VEC4_NV 0x8FE7 | |

INT64_VEC2_NV 0x8FE9 | |

INT64_VEC3_NV 0x8FEA | |

INT64_VEC4_NV 0x8FEB | |

UNSIGNED_INT8_NV 0x8FEC | |

UNSIGNED_INT8_VEC2_NV 0x8FED | |

UNSIGNED_INT8_VEC3_NV 0x8FEE | |

UNSIGNED_INT8_VEC4_NV 0x8FEF | |

UNSIGNED_INT16_NV 0x8FF0 | |

UNSIGNED_INT16_VEC2_NV 0x8FF1 | |

UNSIGNED_INT16_VEC3_NV 0x8FF2 | |

UNSIGNED_INT16_VEC4_NV 0x8FF3 | |

UNSIGNED_INT64_VEC2_NV 0x8FF5 | |

UNSIGNED_INT64_VEC3_NV 0x8FF6 | |

UNSIGNED_INT64_VEC4_NV 0x8FF7 | |

FLOAT16_NV 0x8FF8 | |

FLOAT16_VEC2_NV 0x8FF9 | |

FLOAT16_VEC3_NV 0x8FFA | |

FLOAT16_VEC4_NV 0x8FFB | |

(If ARB_tessellation_shader is supported, the following enum is accepted | |

by a new primitive.) | |

Accepted by the <primitiveMode> parameter of BeginTransformFeedback: | |

PATCHES | |

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification | |

(OpenGL Operation) | |

Modify Section 2.6.1, Begin and End, p. 22 | |

(Extend language describing PATCHES introduced by ARB_tessellation_shader. | |

In particular, add the following to the end of the description of the

primitive type.)

If a patch primitive is drawn, each patch is drawn separately as a

collection of points, with each patch vertex defining a separate point.

Extra vertices from an incomplete patch are never drawn. | |

Modify Section 2.14.3, Vertex Attributes, p. 86 | |

(modify the second paragraph, p. 87) ... exceeds MAX_VERTEX_ATTRIBS. For | |

the purposes of this comparison, attribute variables of the type i64vec3, | |

u64vec3, i64vec4, and u64vec4 count as consuming twice as many attributes | |

as equivalent single-precision types. | |
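
The counting rule above can be sketched as a small C helper. The function name and parameters are hypothetical, and the sketch covers only scalar and vector attributes (not matrices), each of which otherwise consumes one attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Attributes consumed by one scalar or vector attribute variable.
   is_64bit_int: true for int64_t/uint64_t based types.
   components:   1 for scalars, 2..4 for vectors. */
static int attrib_slots(bool is_64bit_int, int components)
{
    assert(components >= 1 && components <= 4);
    /* Only i64vec3, u64vec3, i64vec4, and u64vec4 count double. */
    return (is_64bit_int && components >= 3) ? 2 : 1;
}
```

Summing attrib_slots() over all active attributes gives the value compared against MAX_VERTEX_ATTRIBS under this model.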

(extend the list of types in the first paragraph, p. 88) | |

... UNSIGNED_INT_VEC3, UNSIGNED_INT_VEC4, INT8_NV, INT8_VEC2_NV, | |

INT8_VEC3_NV, INT8_VEC4_NV, INT16_NV, INT16_VEC2_NV, INT16_VEC3_NV, | |

INT16_VEC4_NV, INT64_NV, INT64_VEC2_NV, INT64_VEC3_NV, INT64_VEC4_NV, | |

UNSIGNED_INT8_NV, UNSIGNED_INT8_VEC2_NV, UNSIGNED_INT8_VEC3_NV, | |

UNSIGNED_INT8_VEC4_NV, UNSIGNED_INT16_NV, UNSIGNED_INT16_VEC2_NV, | |

UNSIGNED_INT16_VEC3_NV, UNSIGNED_INT16_VEC4_NV, UNSIGNED_INT64_NV, | |

UNSIGNED_INT64_VEC2_NV, UNSIGNED_INT64_VEC3_NV, UNSIGNED_INT64_VEC4_NV, | |

FLOAT16_NV, FLOAT16_VEC2_NV, FLOAT16_VEC3_NV, or FLOAT16_VEC4_NV. | |

Modify Section 2.14.4, Uniform Variables, p. 89 | |

(modify third paragraph, p. 90) ... uniform variable storage for a vertex | |

shader. A scalar or vector uniform with 64-bit integer components

will consume no more than 2<n> components, where <n> is 1 for scalars, and | |

the component count for vectors. A link error is generated ... | |

(add to Table 2.13, p. 96) | |

Type Name Token            Keyword
----------------------     ----------------
INT8_NV                    int8_t
INT8_VEC2_NV               i8vec2
INT8_VEC3_NV               i8vec3
INT8_VEC4_NV               i8vec4
INT16_NV                   int16_t
INT16_VEC2_NV              i16vec2
INT16_VEC3_NV              i16vec3
INT16_VEC4_NV              i16vec4
INT64_NV                   int64_t
INT64_VEC2_NV              i64vec2
INT64_VEC3_NV              i64vec3
INT64_VEC4_NV              i64vec4
UNSIGNED_INT8_NV           uint8_t
UNSIGNED_INT8_VEC2_NV      u8vec2
UNSIGNED_INT8_VEC3_NV      u8vec3
UNSIGNED_INT8_VEC4_NV      u8vec4
UNSIGNED_INT16_NV          uint16_t
UNSIGNED_INT16_VEC2_NV     u16vec2
UNSIGNED_INT16_VEC3_NV     u16vec3
UNSIGNED_INT16_VEC4_NV     u16vec4
UNSIGNED_INT64_NV          uint64_t
UNSIGNED_INT64_VEC2_NV     u64vec2
UNSIGNED_INT64_VEC3_NV     u64vec3
UNSIGNED_INT64_VEC4_NV     u64vec4
FLOAT16_NV                 float16_t
FLOAT16_VEC2_NV            f16vec2
FLOAT16_VEC3_NV            f16vec3
FLOAT16_VEC4_NV            f16vec4
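
An application that inspects the <type> value returned by GetActiveUniform or GetActiveAttrib might map it back to a shading language keyword with a lookup table. The following C sketch uses the token values from the "New Tokens" section and the pairing from the table above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { unsigned token; const char *keyword; } type_entry;

/* Token values from "New Tokens", keywords from the Table 2.13 additions. */
static const type_entry nv_types[] = {
    { 0x8FE0, "int8_t"    }, { 0x8FE1, "i8vec2"  },
    { 0x8FE2, "i8vec3"    }, { 0x8FE3, "i8vec4"  },
    { 0x8FE4, "int16_t"   }, { 0x8FE5, "i16vec2" },
    { 0x8FE6, "i16vec3"   }, { 0x8FE7, "i16vec4" },
    { 0x140E, "int64_t"   }, { 0x8FE9, "i64vec2" },
    { 0x8FEA, "i64vec3"   }, { 0x8FEB, "i64vec4" },
    { 0x8FEC, "uint8_t"   }, { 0x8FED, "u8vec2"  },
    { 0x8FEE, "u8vec3"    }, { 0x8FEF, "u8vec4"  },
    { 0x8FF0, "uint16_t"  }, { 0x8FF1, "u16vec2" },
    { 0x8FF2, "u16vec3"   }, { 0x8FF3, "u16vec4" },
    { 0x140F, "uint64_t"  }, { 0x8FF5, "u64vec2" },
    { 0x8FF6, "u64vec3"   }, { 0x8FF7, "u64vec4" },
    { 0x8FF8, "float16_t" }, { 0x8FF9, "f16vec2" },
    { 0x8FFA, "f16vec3"   }, { 0x8FFB, "f16vec4" }
};

/* Return the keyword for a type token, or NULL if the token is not one
   of the 28 types added by this extension. */
static const char *keyword_for_token(unsigned token)
{
    const size_t count = sizeof nv_types / sizeof nv_types[0];
    size_t i;
    assert(count == 28);
    for (i = 0; i < count; ++i) {
        if (nv_types[i].token == token)
            return nv_types[i].keyword;
    }
    return NULL;
}
```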

(modify list of commands at the bottom of p. 99) | |

void Uniform{1,2,3,4}{i64,ui64}NV(int location, T value); | |

void Uniform{1,2,3,4}{i64,ui64}vNV(int location, sizei count, const T *value);

(insert after fourth paragraph, p. 100) The Uniform*i64{v}NV and | |

Uniform*ui64{v}NV commands will load <count> sets of one to four 64-bit | |

signed or unsigned integer values into a uniform location defined as a | |

64-bit signed or unsigned integer scalar or vector type.
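
To make the "<count> sets" wording concrete, the sketch below models how the <value> array of a vector-form command is consumed: <count> consecutive groups of one to four 64-bit values, tightly packed. It is a mock written for illustration and makes no GL calls; the uniform_model type and mock_uniform_i64v function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one uniform location: an array of up to 4
   elements, each a 64-bit integer scalar or vector with 1..4 components. */
typedef struct {
    int     components;          /* 1 for int64_t, 2 for i64vec2, ... */
    int64_t storage[4 * 4];
} uniform_model;

/* Mock of the Uniform{1,2,3,4}i64vNV pattern: copy <count> sets of
   <components> values from a tightly packed array.
   Returns the number of 64-bit values consumed from <value>. */
static size_t mock_uniform_i64v(uniform_model *u, int count,
                                const int64_t *value)
{
    size_t n;
    assert(u->components >= 1 && u->components <= 4);
    assert(count >= 0 && count <= 4);
    n = (size_t)count * (size_t)u->components;
    memcpy(u->storage, value, n * sizeof value[0]);
    return n;
}
```

Under this model, loading two elements of an i64vec2 array reads four consecutive int64 values from <value>.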

(modify "Uniform Buffer Object Storage", p. 102, adding two bullets after | |

the last "Members of type", and modifying the subsequent bullet) | |

* Members of type int8_t, int16_t, and int64_t are extracted from a | |

buffer object by reading a single byte, short, or int64-typed value at | |

the specified offset. | |

* Members of type uint8_t, uint16_t, and uint64_t are extracted from a | |

buffer object by reading a single ubyte, ushort, or uint64-typed value | |

at the specified offset. | |

* Members of type float16_t are extracted from a buffer object by reading | |

a single half-typed value at the specified offset. | |

* Vectors with N elements with basic data types of bool, int, uint, | |

float, double, int8_t, int16_t, int64_t, uint8_t, uint16_t, uint64_t, | |

or float16_t are extracted as N values in consecutive memory locations | |

beginning at the specified offset, with components stored in order with | |

the first (X) component at the lowest offset. The GL data type used for | |

component extraction is derived according to the rules for scalar | |

members above. | |
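
The extraction rules above can be sketched in C against a CPU-side copy of the buffer contents. The helper names are hypothetical, and the sketch assumes the copy uses the same byte order as the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a single short-typed value (an int16_t member) at a byte offset. */
static int16_t read_i16(const unsigned char *buf, size_t size, size_t offset)
{
    int16_t v;
    assert(offset + sizeof v <= size);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Read a single uint64-typed value (a uint64_t member) at a byte offset. */
static uint64_t read_u64(const unsigned char *buf, size_t size, size_t offset)
{
    uint64_t v;
    assert(offset + sizeof v <= size);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Read an N-component i16 vector: N values in consecutive memory
   locations, with the first (X) component at the lowest offset. */
static void read_i16vec(const unsigned char *buf, size_t size, size_t offset,
                        int n, int16_t *out)
{
    int i;
    assert(n >= 2 && n <= 4);
    for (i = 0; i < n; ++i)
        out[i] = read_i16(buf, size, offset + (size_t)i * sizeof(int16_t));
}
```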

Modify Section 2.14.6, Varying Variables, p. 106 | |

(modify third paragraph, p. 107) ... For the purposes of counting input | |

and output components consumed by a shader, variables declared as vectors, | |

matrices, and arrays will all consume multiple components. Each component | |

of variables declared as 64-bit integer scalars or vectors will be

counted as consuming two components. | |

(add after the bulleted list, p. 108) For the purposes of counting the | |

total number of components to capture, each component of outputs declared | |

as 64-bit integer scalars or vectors will be counted as consuming two | |

components. | |
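
As a rough illustration of the two counting rules in this section, the C helper below computes the components consumed by one scalar or vector variable. The name and parameters are hypothetical, and the sketch assumes each component is counted exactly once before the 64-bit doubling; it does not cover matrices or implementation-specific padding.

```c
#include <assert.h>
#include <stdbool.h>

/* Components counted for one input, output, or captured variable.
   components:   1 for scalars, 2..4 for vectors.
   array_len:    1 for non-arrays, otherwise the number of elements.
   is_64bit_int: true for int64_t/uint64_t based types. */
static int counted_components(int components, int array_len, bool is_64bit_int)
{
    assert(components >= 1 && components <= 4);
    assert(array_len >= 1);
    /* Each 64-bit integer component is counted as two components. */
    return components * array_len * (is_64bit_int ? 2 : 1);
}
```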

Modify Section 2.15.1, Geometry Shader Input Primitives, p. 118 | |

(add new qualifier at the end of the section, p. 120) | |

Patches (patches) | |

Geometry shaders that operate on patches are valid for the PATCHES | |

primitive type. The number of vertices available to each program | |

invocation is equal to the vertex count of the variable-size patch, with | |

vertices presented to the geometry shader in the order specified in the | |

patch. | |

Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121 | |

(add to the end of "Geometry Shader Inputs", p. 123) | |

Geometry shaders also support built-in and user-defined per-primitive | |

inputs. The following built-in inputs, not replicated per-vertex and not | |

contained in gl_in[], are supported: | |

* The variable gl_PatchVerticesIn is filled with the number of

vertices in the input primitive. | |

* The variables gl_TessLevelOuter[] and gl_TessLevelInner[] are arrays | |

holding outer and inner tessellation levels of an input patch. If a | |

tessellation control shader is active, the tessellation levels will be | |

taken from the corresponding outputs of the tessellation control | |

shader. Otherwise, the default levels provided as patch parameters | |

are used. Tessellation level values loaded in these variables will be

the values prior to the clamping and rounding operations performed by the

primitive generator as described in Section 2.X.2 of | |

ARB_tessellation_shader. For triangular tessellation, | |

gl_TessLevelOuter[3] and gl_TessLevelInner[1] will be undefined. For | |

isoline tessellation, gl_TessLevelOuter[2], gl_TessLevelOuter[3], and | |

both values in gl_TessLevelInner[] are undefined. | |

Additionally, a geometry shader with an input primitive type of "patches" | |

may declare per-patch input variables using the qualifier "patch in". | |

Unlike per-vertex inputs, per-patch inputs do not correspond to any | |

specific vertex in the input primitive, and are not indexed by vertex | |

number. Per-patch inputs declared as arrays have multiple values for the | |

input patch; similarly declared per-vertex inputs would indicate a single | |

value for each vertex in the output patch. User-defined per-patch input | |

variables are filled with corresponding per-patch output values written by | |

the tessellation control shader. If no tessellation control shader is | |

active, all such variables are undefined. | |

Per-patch input variables and the built-in inputs "gl_PatchVerticesIn", | |

"gl_TessLevelOuter[]", and "gl_TessLevelInner[]" are supported only for | |

geometry shaders with an input primitive type of "patches". A program | |

will fail to link if any such variable is used in a geometry shader with an

input primitive type other than "patches".

Modify Section 2.19, Transform Feedback, p. 130 | |

(add to Table 2.14, p. 131) | |

Transform Feedback
primitiveMode              allowed render primitive modes
----------------------     ---------------------------------
PATCHES                    PATCHES

(modify first paragraph, p. 131) ... <primitiveMode> is one of TRIANGLES, | |

LINES, POINTS, or PATCHES and specifies the type of primitives that will | |

be recorded into the buffer objects bound for transform feedback (see | |

below). ... | |

(modify last paragraph, p. 131 and first paragraph, p. 132, adding patch | |

support, and dealing with capture of 8- and 16-bit components) | |

When an individual point, line, triangle, or patch primitive reaches the | |

transform feedback stage ... When capturing line, triangle, and patch | |

primitives, all attributes ... For multi-component varying variables or | |

varying array elements, the individual components are written in order. | |

For variables with 8- or 16-bit fixed- or floating-point components, | |

individual components will be converted to and stored as equivalent values | |

of type "int", "uint", or "float". The value for any attribute specified | |

... | |
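
The widening of 8- and 16-bit components during capture can be sketched in C as follows. The function names are hypothetical. The sketch assumes the 16-bit floating-point layout of 1 sign bit, 5 exponent bits, and 10 mantissa bits, and an IEEE 754 binary32 "float" on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed and unsigned small integers are stored as equivalent 32-bit values. */
static int32_t  widen_i8(int8_t v)    { return (int32_t)v; }
static uint32_t widen_u16(uint16_t v) { return (uint32_t)v; }

/* Convert a 16-bit float encoding to the equivalent 32-bit float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    assert(sizeof f == sizeof bits);
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                              /* signed zero */
        } else {
            /* Subnormal half: shift the mantissa until it is normalized. */
            int shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; ++shift; }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);      /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Every 16-bit float value is exactly representable as a 32-bit float, so this conversion loses no information.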

(modify next-to-last paragraph, p. 132) ... is not incremented. If | |

transform feedback receives a primitive that fits in the remaining space | |

after such an overflow occurs, that primitive may or may not be recorded. | |

Primitives that fail to fit in the remaining space are never recorded. | |

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification | |

(Rasterization) | |

None. | |

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification | |

(Per-Fragment Operations and the Frame Buffer) | |

None. | |

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification | |

(Special Functions) | |

None. | |

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification | |

(State and State Requests) | |

Modify Section 6.1.15, Shader and Program Queries, p. 332 | |

(add to the first list of commands, p. 337) | |

void GetUniformi64vNV(uint program, int location, int64EXT *params); | |

void GetUniformui64vNV(uint program, int location, uint64EXT *params); | |

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) | |

Specification (Invariance) | |

None. | |

Additions to the AGL/GLX/WGL Specifications | |

None. | |

Modifications to The OpenGL Shading Language Specification, Version 1.50 | |

(Revision 09) | |

Including the following line in a shader can be used to control the | |

language features described in this extension: | |

#extension GL_NV_gpu_shader5 : <behavior> | |

where <behavior> is as specified in section 3.3. | |

New preprocessor #defines are added to the OpenGL Shading Language: | |

#define GL_NV_gpu_shader5 1 | |

If the features of this extension are enabled by an #extension directive, | |

shading language features documented in the ARB_gpu_shader5 extension will | |

also be provided. | |

Modify Section 3.6, Keywords, p. 15 | |

(add the following to the list of reserved keywords) | |

int8_t i8vec2 i8vec3 i8vec4 | |

int16_t i16vec2 i16vec3 i16vec4 | |

int32_t i32vec2 i32vec3 i32vec4 | |

int64_t i64vec2 i64vec3 i64vec4 | |

uint8_t u8vec2 u8vec3 u8vec4 | |

uint16_t u16vec2 u16vec3 u16vec4 | |

uint32_t u32vec2 u32vec3 u32vec4 | |

uint64_t u64vec2 u64vec3 u64vec4 | |

float16_t f16vec2 f16vec3 f16vec4 | |

float32_t f32vec2 f32vec3 f32vec4 | |

float64_t f64vec2 f64vec3 f64vec4 | |

(note: the "float64_t" and "f64vec*" types are available if and only if | |

ARB_gpu_shader_fp64 is also supported) | |

Modify Section 4.1, Basic Types, p. 18 | |

(add to the basic "Transparent Types" table, p. 18) | |

Types Meaning | |

-------- ---------------------------------------------------------- | |

int8_t an 8-bit signed integer | |

i8vec2 a two-component signed integer vector (8-bit components) | |

i8vec3 a three-component signed integer vector (8-bit components) | |

i8vec4 a four-component signed integer vector (8-bit components) | |

int16_t a 16-bit signed integer | |

i16vec2 a two-component signed integer vector (16-bit components) | |

i16vec3 a three-component signed integer vector (16-bit components) | |

i16vec4 a four-component signed integer vector (16-bit components) | |

int32_t a 32-bit signed integer | |

i32vec2 a two-component signed integer vector (32-bit components) | |

i32vec3 a three-component signed integer vector (32-bit components) | |

i32vec4 a four-component signed integer vector (32-bit components) | |

int64_t a 64-bit signed integer | |

i64vec2 a two-component signed integer vector (64-bit components) | |

i64vec3 a three-component signed integer vector (64-bit components) | |

i64vec4 a four-component signed integer vector (64-bit components) | |

uint8_t a single 8-bit unsigned integer

u8vec2 a two-component unsigned integer vector (8-bit components) | |

u8vec3 a three-component unsigned integer vector (8-bit components) | |

u8vec4 a four-component unsigned integer vector (8-bit components) | |

uint16_t a 16-bit unsigned integer | |

u16vec2 a two-component unsigned integer vector (16-bit components) | |

u16vec3 a three-component unsigned integer vector (16-bit components) | |

u16vec4 a four-component unsigned integer vector (16-bit components) | |

uint32_t a 32-bit unsigned integer | |

u32vec2 a two-component unsigned integer vector (32-bit components) | |

u32vec3 a three-component unsigned integer vector (32-bit components) | |

u32vec4 a four-component unsigned integer vector (32-bit components) | |

uint64_t a 64-bit unsigned integer | |

u64vec2 a two-component unsigned integer vector (64-bit components) | |

u64vec3 a three-component unsigned integer vector (64-bit components) | |

u64vec4 a four-component unsigned integer vector (64-bit components) | |

float16_t a single 16-bit floating-point value | |

f16vec2 a two-component floating-point vector (16-bit components) | |

f16vec3 a three-component floating-point vector (16-bit components) | |

f16vec4 a four-component floating-point vector (16-bit components) | |

float32_t a single 32-bit floating-point value | |

f32vec2 a two-component floating-point vector (32-bit components) | |

f32vec3 a three-component floating-point vector (32-bit components) | |

f32vec4 a four-component floating-point vector (32-bit components) | |

float64_t a single 64-bit floating-point value | |

f64vec2 a two-component floating-point vector (64-bit components) | |

f64vec3 a three-component floating-point vector (64-bit components) | |

f64vec4 a four-component floating-point vector (64-bit components) | |

Modify Section 4.1.3, Integers, p. 20 | |

(add after the first paragraph of the section, p. 20) | |

Variables with the types "int8_t", "int16_t", and "int64_t" represent | |

signed integer values with exactly 8, 16, or 64 bits, respectively. | |

Variables with the types "uint8_t", "uint16_t", and "uint64_t" represent

unsigned integer values with exactly 8, 16, or 64 bits, respectively.

Variables with the types "int32_t" and "uint32_t" represent signed and

unsigned integer values with 32 bits, and are equivalent to "int" and | |

"uint" types, respectively. | |

(modify the grammar, p. 21, adding "L" and "UL" suffixes) | |

integer-suffix: one of | |

u U l L ul UL | |

(modify next-to-last paragraph, p. 21) ... When the suffix "u" or "U" is | |

present, the literal has type <uint>. When the suffix "l" or "L" is | |

present, the literal has type <int64_t>. When the suffix "ul" or "UL" is | |

present, the literal has type <uint64_t>. Otherwise, the type is | |

<int>. ... | |

Modify Section 4.1.4, Floats, p. 22 | |

(insert after second paragraph, p. 22) | |

Variables of type "float16_t" represent floating-point values using exactly 16

bits and are stored using the 16-bit floating-point representation

described in the OpenGL Specification. Variables of type "float32_t"

and "float64_t" represent floating-point values with 32 or 64 bits, and are

equivalent to "float" and "double" types, respectively. | |

Modify Section 4.1.7, Samplers, p. 23 | |

(modify 1st paragraph of the section, deleting the restriction requiring | |

constant indexing of sampler arrays) ... Samplers may be aggregated into

arrays within a shader (using square brackets [ ]) and can be indexed with | |

general integer expressions. The results of accessing a sampler array | |

with an out-of-bounds index are undefined. ... | |

(remove the additional restriction added by ARB_gpu_shader5 making a | |

similar edit requiring uniform indexing across shader invocations for | |

defined results. NV_gpu_shader5 has no such limitation.) | |

Modify Section 4.1.10, Implicit Conversions, p. 27 | |

(modify table of implicit conversions) | |

                          Can be implicitly
Type of expression        converted to
--------------------      -----------------------------------------------
int                       uint, int64_t, uint64_t, float, double(*)
ivec2                     uvec2, i64vec2, u64vec2, vec2, dvec2(*)
ivec3                     uvec3, i64vec3, u64vec3, vec3, dvec3(*)
ivec4                     uvec4, i64vec4, u64vec4, vec4, dvec4(*)

int8_t   int16_t          int, int64_t, uint, uint64_t, float, double(*)
i8vec2   i16vec2          ivec2, i64vec2, uvec2, u64vec2, vec2, dvec2(*)
i8vec3   i16vec3          ivec3, i64vec3, uvec3, u64vec3, vec3, dvec3(*)
i8vec4   i16vec4          ivec4, i64vec4, uvec4, u64vec4, vec4, dvec4(*)

int64_t                   uint64_t, double(*)
i64vec2                   u64vec2, dvec2(*)
i64vec3                   u64vec3, dvec3(*)
i64vec4                   u64vec4, dvec4(*)

uint                      uint64_t, float, double(*)
uvec2                     u64vec2, vec2, dvec2(*)
uvec3                     u64vec3, vec3, dvec3(*)
uvec4                     u64vec4, vec4, dvec4(*)

uint8_t  uint16_t         uint, uint64_t, float, double(*)
u8vec2   u16vec2          uvec2, u64vec2, vec2, dvec2(*)
u8vec3   u16vec3          uvec3, u64vec3, vec3, dvec3(*)
u8vec4   u16vec4          uvec4, u64vec4, vec4, dvec4(*)

uint64_t                  double(*)
u64vec2                   dvec2(*)
u64vec3                   dvec3(*)
u64vec4                   dvec4(*)

float                     double(*)
vec2                      dvec2(*)
vec3                      dvec3(*)
vec4                      dvec4(*)

float16_t                 float, double(*)
f16vec2                   vec2, dvec2(*)
f16vec3                   vec3, dvec3(*)
f16vec4                   vec4, dvec4(*)

(*) if ARB_gpu_shader_fp64 is supported

(Note: Expressions of type "int32_t", "uint32_t", "float32_t", and | |

"float64_t" are treated as identical to those of type "int", "uint", | |

"float", and "double", respectively. Implicit conversions to and from | |

these explicitly-sized types are allowed whenever conversions involving | |

the equivalent base type are allowed.) | |

(modify second paragraph of the section) No implicit conversions are | |

provided to convert from unsigned to signed integer types, from | |

floating-point to integer types, from higher-precision to lower-precision | |

types, from 8-bit to 16-bit types, or between matrix types. There are no | |

implicit array or structure conversions. | |

(add before the final paragraph of the section, p. 27) When performing

implicit conversion for binary operators, there may be multiple data types | |

to which the two operands can be converted. For example, when adding an | |

int8_t value to a uint16_t value, both values can be implicitly converted | |

to uint, uint64_t, float, and double. In such cases, a floating-point | |

type is chosen if either operand has a floating-point type. Otherwise, an | |

unsigned integer type is chosen if either operand has an unsigned integer | |

type. Otherwise, a signed integer type is chosen. If operands can be | |

converted to both 32- and 64-bit versions of the chosen base data type, | |

the 32-bit version is used. | |
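
The selection rule for binary operators can be sketched in C with a simplified model of scalar types. The enum, struct, and function names are hypothetical. The sketch assumes that the two operand types differ, that a common type exists in the conversion table, and that ARB_gpu_shader_fp64 is supported so that a 64-bit floating-point result means "double".

```c
#include <assert.h>

typedef enum { KIND_SIGNED, KIND_UNSIGNED, KIND_FLOAT } base_kind;

/* A scalar type: its base kind and its width in bits (8, 16, 32, or 64). */
typedef struct { base_kind kind; int bits; } scalar_type;

/* Choose the type both operands are implicitly converted to. */
static scalar_type common_type(scalar_type a, scalar_type b)
{
    scalar_type r;
    assert(a.kind != b.kind || a.bits != b.bits);   /* types must differ */

    /* Floating-point wins over unsigned, which wins over signed. */
    if (a.kind == KIND_FLOAT || b.kind == KIND_FLOAT)
        r.kind = KIND_FLOAT;
    else if (a.kind == KIND_UNSIGNED || b.kind == KIND_UNSIGNED)
        r.kind = KIND_UNSIGNED;
    else
        r.kind = KIND_SIGNED;

    /* Prefer the 32-bit version; a 64-bit operand has no implicit
       conversion to a 32-bit type, so it forces a 64-bit result. */
    r.bits = (a.bits == 64 || b.bits == 64) ? 64 : 32;
    return r;
}
```

For the example in the text, an int8_t operand and a uint16_t operand yield an unsigned 32-bit result, that is, "uint".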

Modify Section 4.3.4, Inputs, p. 31 | |

(modify third paragraph of section, p. 31, allowing explicitly-sized | |

types) ... Vertex shader input variables can only be signed and unsigned

integers, floats, doubles, explicitly-sized integers and floating-point | |

values, vectors of any of these types, and matrices. ... | |

(modify edits done in ARB_tessellation_shader adding support for "patch | |

in", allowing for geometry shaders as well) Additionally, tessellation | |

evaluation and geometry shaders support per-patch input variables declared | |

with the "patch in" qualifier. Per-patch input ... | |

(modify third paragraph, p. 32) ... Fragment inputs can only be signed and | |

unsigned integers, floats, doubles, explicitly-sized integers and | |

floating-point values, vectors of any of these types, matrices, or arrays | |

or structures of these. Fragment inputs declared as signed or unsigned | |

integers, doubles, 64-bit floating-point values, including vectors, | |

matrices, or arrays derived from those types, must be qualified as "flat". | |

Modify Section 4.3.6, Outputs, p. 33 | |

(modify third paragraph of the section, p. 33) ... They can only be signed | |

and unsigned integers, floats, doubles, explicitly-sized integers and | |

floating-point values, vectors of any of these types, matrices, or arrays | |

or structures of these. | |

(modify last paragraph, p. 33) ... Fragment outputs can only be signed | |

and unsigned integers, floats, explicitly-sized integers and | |

floating-point values with 32 or fewer bits, vectors of any of these | |

types, or arrays of these. Doubles, 64-bit integers or floating-point | |

values, vectors or arrays of those types, matrices, and structures cannot | |

be output. ... | |

Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37 | |

(add to the list of qualifiers for geometry shaders, p. 37) | |

layout-qualifier-id: | |

... | |

triangles_adjacency | |

patches | |

(modify the "size of input arrays" table, p. 38) | |

Layout           Size of Input Arrays
------------     --------------------
patches          gl_MaxPatchVertices

(add paragraph below that table, p. 38) | |

When using the input primitive type "patches", the geometry shader is used | |

to process a set of patches with vertex counts that may vary from patch to | |

patch. For the purposes of input array sizing, patches are treated as | |

having a vertex count fixed at the implementation-dependent maximum patch | |

size, gl_MaxPatchVertices. If a shader reads an input corresponding to a | |

vertex not found in the patch being processed, the values read are | |

undefined. | |

Modify Section 5.4.1, Conversion and Scalar Constructors, p. 49 | |

(add after first list of constructor examples) | |

Similar constructors are provided to convert to and from explicitly-sized | |

scalar data types, as well: | |

float(uint8_t) // converts an 8-bit uint value to a float | |

int64_t(double) // converts a double value to a 64-bit int | |

float64_t(int16_t) // converts a 16-bit int value to a 64-bit float | |

uint16_t(bool) // converts a Boolean value to a 16-bit uint | |

(replace final two paragraphs, p. 49, and the first paragraph, p. 50, | |

using more general language) | |

When constructors are used to convert any floating-point type to any | |

integer type, the fractional part of the floating-point value is dropped. | |

It is undefined to convert a negative floating-point value to an unsigned

integer type. | |

When a constructor is used to convert any integer or floating-point type | |

to bool, 0 and 0.0 are converted to false, and non-zero values are | |

converted to true. When a constructor is used to convert a bool to any | |

integer or floating-point type, false is converted to 0 or 0.0, and true | |

is converted to 1 or 1.0. | |

Constructors converting between signed and unsigned integers with the same | |

bit count always preserve the bit pattern of the input. This will change | |

the value of the argument if its most significant bit is set, converting a | |

negative signed integer to a large unsigned integer, or vice versa. | |
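
(Informal illustration, not part of the specification edits.) The three conversion rules above can be modeled in C roughly as follows. The helper names are hypothetical and exist only for this sketch; C casts stand in for the shading language constructors, which is an assumption that holds for in-range values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models int64_t(double): the fractional part is dropped. */
static int64_t int64_from_double(double d) { return (int64_t)d; }

/* Models uint(float). Negative inputs are undefined per the text above,
 * so this sketch rejects them instead of picking a result. */
static uint32_t uint_from_float(float f) {
    assert(f >= 0.0f);
    return (uint32_t)f;
}

/* Models bool(float): 0.0 becomes false, any non-zero value becomes true. */
static bool bool_from_float(float f) { return f != 0.0f; }

/* Models float(bool): false becomes 0.0, true becomes 1.0. */
static float float_from_bool(bool b) { return b ? 1.0f : 0.0f; }

/* Models uint(int): same bit count, so the bit pattern is preserved. */
static uint32_t uint_from_int(int32_t i) { return (uint32_t)i; }
```

For example, under this model int64_from_double(-2.75) yields -2, and uint_from_int(-1) yields 0xFFFFFFFF, the large unsigned value described in the last paragraph.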

Modify Section 5.9, Expressions, p. 57 | |

(modify bulleted list as follows, adding support for expressions with | |

64-bit integer types) | |

Expressions in the shading language are built from the following: | |

* Constants of type bool, int, int64_t, uint, uint64_t, float, all vector | |

types, and all matrix types. | |

... | |

* The arithmetic binary operators add (+), subtract (-), multiply (*), and | |

divide (/) operate on 32-bit integer, 64-bit integer, and floating-point | |

scalars, vectors, and matrices. If the fundamental types of the | |

operands do not match, the conversions from Section 4.1.10 "Implicit | |

Conversions" are applied to produce matching types. ... | |

* The operator modulus (%) operates on 32- and 64-bit integer scalars or

vectors. If the fundamental types of the operands do not match, the | |

conversions from Section 4.1.10 "Implicit Conversions" are applied to | |

produce matching types. ... | |

* The arithmetic unary operators negate (-), post- and pre-increment and | |

decrement (-- and ++) operate on 32-bit integer, 64-bit integer, and | |

floating-point values (including vectors and matrices). ... | |

* The relational operators greater than (>), less than (<), greater than or

equal (>=), and less than or equal (<=) operate only on scalar 32-bit

integer, 64-bit integer, and floating-point expressions. The result is

scalar Boolean. The

fundamental type of the two operands must match, either as specified, or | |

after one of the implicit type conversions specified in Section 4.1.10. | |

... | |

* The equality operators equal (==), and not equal (!=) operate only on | |

scalar 32-bit integer, 64-bit integer, and floating-point expressions. | |

The result is scalar Boolean. The fundamental type of the two operands | |

must match, either as specified, or after one of the implicit type | |

conversions specified in Section 4.1.10. ... | |

Modify Section 6.1, Function Definitions, p. 63 | |

(ARB_gpu_shader5 adds a set of rules for defining whether implicit | |

conversions for one matching function definition are better or worse than | |

those for another. These comparisons are done argument by argument. | |

Extend the edits made by ARB_gpu_shader5 to add several new rules for | |

comparing implicit conversions for a single argument, corresponding to the | |

new data types introduced by this extension.) | |

To determine whether the conversion for a single argument in one match is | |

better than that for another match, the following rules are applied, in | |

order: | |

1. An exact match is better than a match involving any implicit | |

conversion. | |

2. A match involving a conversion from a signed integer, unsigned | |

integer, or floating-point type to a similar type having a larger | |

number of bits is better than a match involving any other conversion.

The set of conversions qualifying under this rule are: | |

source types destination types | |

----------------- ----------------- | |

int8_t, int16_t int, int64_t | |

int int64_t | |

uint8_t, uint16_t uint, uint64_t | |

uint uint64_t | |

float16_t float | |

float double | |

3. A match involving one conversion in rule 2 is better than a match | |

involving another conversion in rule 2 if: | |

(a) both conversions start with the same type and the first | |

conversion is to a type with a smaller number of bits (e.g., | |

converting from int16_t to int is preferred to converting | |

int16_t to int64_t), or | |

(b) both conversions end with the same type and the first | |

conversion is from a type with a larger number of bits (e.g., | |

converting an "out" parameter from int16_t to int is preferred | |

to converting from int8_t to int).

4. A match involving an implicit conversion from any integer type to | |

float is better than a match involving an implicit conversion from | |

any integer type to double. | |
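
(Informal illustration, not part of the specification edits.) The four per-argument rules above can be sketched as a comparison function in C. The type model, the function names, and the return convention are all hypothetical. The sketch reads rule 2 as "a conversion listed in the table beats one that is not listed", and it covers only the rules given here, not the remainder of the ARB_gpu_shader5 overload language.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a scalar type: its class and its bit count. */
typedef enum { KIND_SINT, KIND_UINT, KIND_FLOAT } kind_model;
typedef struct { kind_model kind; int bits; } type_model;

static bool same_type(type_model a, type_model b) {
    return a.kind == b.kind && a.bits == b.bits;
}

/* Rule 2: widening within one class, restricted to the table above. */
static bool is_rule2(type_model src, type_model dst) {
    if (src.kind != dst.kind || dst.bits <= src.bits) return false;
    if (src.kind == KIND_FLOAT)
        return (src.bits == 16 && dst.bits == 32) ||
               (src.bits == 32 && dst.bits == 64);
    return dst.bits == 32 || dst.bits == 64;
}

/* True for an integer source converted to a float type of the given size. */
static bool is_int_to(type_model src, type_model dst, int float_bits) {
    return src.kind != KIND_FLOAT && dst.kind == KIND_FLOAT &&
           dst.bits == float_bits;
}

/* >0: conversion A is better; <0: B is better; 0: no rule decides. */
static int compare_conversions(type_model src_a, type_model dst_a,
                               type_model src_b, type_model dst_b) {
    bool exact_a = same_type(src_a, dst_a);
    bool exact_b = same_type(src_b, dst_b);
    if (exact_a != exact_b) return exact_a ? 1 : -1;      /* rule 1 */
    if (exact_a) return 0;

    bool r2_a = is_rule2(src_a, dst_a);
    bool r2_b = is_rule2(src_b, dst_b);
    if (r2_a != r2_b) return r2_a ? 1 : -1;               /* rule 2 */
    if (r2_a) {
        if (same_type(src_a, src_b) && dst_a.bits != dst_b.bits)
            return dst_a.bits < dst_b.bits ? 1 : -1;      /* rule 3(a) */
        if (same_type(dst_a, dst_b) && src_a.bits != src_b.bits)
            return src_a.bits > src_b.bits ? 1 : -1;      /* rule 3(b) */
        return 0;
    }
    if (is_int_to(src_a, dst_a, 32) && is_int_to(src_b, dst_b, 64))
        return 1;                                         /* rule 4 */
    if (is_int_to(src_a, dst_a, 64) && is_int_to(src_b, dst_b, 32))
        return -1;
    return 0;
}

/* The two examples quoted in rule 3. */
static void rule3_examples(void) {
    const type_model i8 = {KIND_SINT, 8}, i16 = {KIND_SINT, 16};
    const type_model i32 = {KIND_SINT, 32}, i64 = {KIND_SINT, 64};
    assert(compare_conversions(i16, i32, i16, i64) > 0);  /* 3(a) */
    assert(compare_conversions(i16, i32, i8, i32) > 0);   /* 3(b) */
}
```

A full overload resolver would apply this comparison argument by argument across candidate functions; that outer step is left out here.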

Modify Section 7.1, Vertex and Geometry Shader Special Variables, p. 69 | |

(NOTE: These edits are written against the re-organized section in the | |

ARB_tessellation_shader specification.) | |

(add to the list of built-in inputs for geometry shaders) In the geometry

language, built-in input and output variables are intrinsically declared | |

as: | |

in int gl_PatchVerticesIn; | |

patch in float gl_TessLevelOuter[4]; | |

patch in float gl_TessLevelInner[2]; | |

... | |

The input variable gl_PatchVerticesIn behaves as in the identically-named | |

tessellation control and evaluation shader inputs. | |

The input variables gl_TessLevelOuter[] and gl_TessLevelInner[] behave as | |

in the identically-named tessellation evaluation shader inputs. | |

Modify Chapter 8, Built-in Functions, p. 81 | |

(add to description of generic types, last paragraph of p. 69) ... Where | |

the input arguments (and corresponding output) can be int64_t, i64vec2, | |

i64vec3, or i64vec4, <genI64Type> is used as the argument. Where the | |

input arguments (and corresponding output) can be uint64_t, u64vec2, | |

u64vec3, or u64vec4, <genU64Type> is used as the argument. | |

Modify Section 8.3, Common Functions, p. 84 | |

(add support for 64-bit integer packing and unpacking functions) | |

Syntax: | |

int64_t packInt2x32(ivec2 v); | |

uint64_t packUint2x32(uvec2 v); | |

ivec2 unpackInt2x32(int64_t v); | |

uvec2 unpackUint2x32(uint64_t v); | |

The functions packInt2x32() and packUint2x32() return a signed or unsigned | |

64-bit integer obtained by packing the components of a two-component | |

signed or unsigned integer vector, respectively. The first vector | |

component specifies the 32 least significant bits; the second component | |

specifies the 32 most significant bits. | |

The functions unpackInt2x32() and unpackUint2x32() return a signed or | |

unsigned integer vector built from a 64-bit signed or unsigned integer | |

scalar, respectively. The first component of the vector contains the 32 | |

least significant bits of the input; the second component contains the 32

most significant bits. | |
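
(Informal illustration, not part of the specification edits.) The bit layout used by packUint2x32() and unpackUint2x32() can be modeled in C as shown below. The struct and function names are hypothetical; the signed variants are assumed to use the same layout at the bit level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a uvec2. */
typedef struct { uint32_t x, y; } uvec2_model;

/* First component supplies the low 32 bits, second the high 32 bits. */
static uint64_t pack_uint2x32(uvec2_model v) {
    return ((uint64_t)v.y << 32) | (uint64_t)v.x;
}

/* Inverse of pack_uint2x32(). */
static uvec2_model unpack_uint2x32(uint64_t v) {
    uvec2_model r;
    r.x = (uint32_t)(v & 0xFFFFFFFFu);
    r.y = (uint32_t)(v >> 32);
    return r;
}

/* Usage: a value survives a pack/unpack round trip. */
static void pack_uint2x32_example(void) {
    uvec2_model v = { 0x89ABCDEFu, 0x01234567u };
    uint64_t packed = pack_uint2x32(v);
    uvec2_model back = unpack_uint2x32(packed);
    assert(packed == 0x0123456789ABCDEFull);
    assert(back.x == v.x && back.y == v.y);
}
```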

(add support for 16-bit floating-point packing and unpacking functions) | |

Syntax: | |

uint packFloat2x16(f16vec2 v); | |

f16vec2 unpackFloat2x16(uint v); | |

The function packFloat2x16() returns an unsigned integer obtained by | |

interpreting the components of a two-component 16-bit floating-point | |

vector as integers according to the OpenGL Specification, and then packing the

two 16-bit integers into a 32-bit unsigned integer. The first vector | |

component specifies the 16 least significant bits of the result; the | |

second component specifies the 16 most significant bits. | |

The function unpackFloat2x16() returns a two-component vector with 16-bit | |

floating-point components obtained by unpacking a 32-bit unsigned integer | |

into a pair of 16-bit values, and interpreting those values as 16-bit | |

floating-point numbers according to the OpenGL Specification. The first | |

component of the vector is obtained from the 16 least significant bits of | |

the input; the second component is obtained from the 16 most significant | |

bits. | |
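
(Informal illustration, not part of the specification edits.) The sketch below models packFloat2x16() and unpackFloat2x16() on raw 16-bit encodings, and adds a decoder that widens one 16-bit encoding to a C float. All names are hypothetical. The decoder assumes the usual 1-5-10 sign/exponent/mantissa layout for 16-bit floats and an IEEE 754 binary32 "float" on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x_bits fills the 16 least significant bits, y_bits the 16 most. */
static uint32_t pack_half2x16_bits(uint16_t x_bits, uint16_t y_bits) {
    return ((uint32_t)y_bits << 16) | (uint32_t)x_bits;
}

static void unpack_half2x16_bits(uint32_t v, uint16_t *x_bits,
                                 uint16_t *y_bits) {
    assert(x_bits != NULL && y_bits != NULL);
    *x_bits = (uint16_t)(v & 0xFFFFu);
    *y_bits = (uint16_t)(v >> 16);
}

/* Widens a 16-bit floating-point encoding to a 32-bit float. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t man = (uint32_t)h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {            /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0u) {        /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0u) {        /* signed zero */
        bits = sign;
    } else {                       /* subnormal: renormalize the mantissa */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0u) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this model, the encodings 0x3C00 and 0xC000 (1.0 and -2.0) pack to 0xC0003C00 when passed as the first and second component, respectively.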

(add functions to get/set the bit encoding for floating-point values) | |

64-bit floating-point data types in the OpenGL shading language are | |

specified to be encoded according to the IEEE specification for | |

double-precision floating-point values. The functions below allow shaders | |

to convert double-precision floating-point values to and from 64-bit | |

signed or unsigned integers representing their encoding. | |

To obtain signed or unsigned integer values holding the encoding of a | |

floating-point value, use: | |

genI64Type doubleBitsToInt64(genDType value); | |

genU64Type doubleBitsToUint64(genDType value); | |

Conversions are done on a component-by-component basis. | |

To obtain a floating-point value corresponding to a signed or unsigned | |

integer encoding, use: | |

genDType int64BitsToDouble(genI64Type value); | |

genDType uint64BitsToDouble(genU64Type value); | |
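
(Informal illustration, not part of the specification edits.) For a single component, the bit-encoding functions behave like a byte-wise copy between a double and a 64-bit integer. The C sketch below uses hypothetical names and assumes the host "double" is an IEEE 754 double-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models doubleBitsToUint64() for one component. */
static uint64_t double_bits_to_uint64(double value) {
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Models uint64BitsToDouble() for one component. */
static double uint64_bits_to_double(uint64_t bits) {
    double value;
    assert(sizeof bits == sizeof value);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The copy leaves the encoding untouched, so converting a value to its bits and back returns the original value.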

(add functions to evaluate predicates over groups of threads) | |

Syntax: | |

bool anyThreadNV(bool value); | |

bool allThreadsNV(bool value); | |

bool allThreadsEqualNV(bool value); | |

Implementations of the OpenGL Shading Language may, but are not required to,

run multiple shader threads for a single stage as a SIMD thread group,

where individual execution threads are assigned to thread groups in an | |

undefined, implementation-dependent order. Algorithms may benefit from | |

being able to evaluate a composite of boolean values over all active | |

threads in the thread group. | |

The function anyThreadNV() returns true if and only if <value> is true for | |

at least one active thread in the group. The function allThreadsNV() | |

returns true if and only if <value> is true for all active threads in the | |

group. The function allThreadsEqualNV() returns true if <value> is the | |

same for all active threads in the group; the result of | |

allThreadsEqualNV() will be true if and only if anyThreadNV() and | |

allThreadsNV() would return the same value. | |

Since these functions depend on the values of <value> in an undefined

group of threads, the value returned by these functions is largely | |

undefined. However, anyThreadNV() is guaranteed to return true if <value> | |

is true, and allThreadsNV() is guaranteed to return false if <value> is | |

false. | |

Since implementations are generally not required to combine threads into | |

groups, simply returning <value> for anyThreadNV() and allThreadsNV() and | |

returning true for allThreadsEqualNV() is a legal implementation of these | |

functions. | |
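
(Informal illustration, not part of the specification edits.) The relationship between the three predicates can be modeled in C by treating a thread group as an array holding <value> for each active thread. The function names and the array representation are hypothetical; real implementations choose group membership in an undefined manner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* values[i] is <value> for the i-th active thread; the calling thread is
 * always active, so a group is never empty. */
static bool any_thread(const bool *values, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (values[i]) return true;
    }
    return false;
}

static bool all_threads(const bool *values, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (!values[i]) return false;
    }
    return true;
}

/* True exactly when any_thread() and all_threads() agree. */
static bool all_threads_equal(const bool *values, size_t n) {
    return any_thread(values, n) == all_threads(values, n);
}
```

A group of size one reproduces the trivial implementation described above: any_thread() and all_threads() both return <value>, and all_threads_equal() returns true.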

Modify Section 8.6, Vector Relational Functions, p. 90 | |

(modify the first paragraph, p. 90, adding support for relational | |

functions operating on explicitly-sized types) | |

Relational and equality operators (<, <=, >, >=, ==, !=) are defined (or | |

reserved) to operate on scalars and produce scalar Boolean results. For | |

vector results, use the following built-in functions. In the definitions | |

below, the following terms are used as placeholders for all vector types | |

for a given fundamental data type: | |

placeholder fundamental types | |

----------- ------------------------------------------------ | |

bvec bvec2, bvec3, bvec4 | |

ivec ivec2, ivec3, ivec4, i8vec2, i8vec3, i8vec4, | |

i16vec2, i16vec3, i16vec4, i64vec2, i64vec3, i64vec4 | |

uvec uvec2, uvec3, uvec4, u8vec2, u8vec3, u8vec4, | |

u16vec2, u16vec3, u16vec4, u64vec2, u64vec3, u64vec4 | |

vec vec2, vec3, vec4, dvec2(*), dvec3(*), dvec4(*), | |

f16vec2, f16vec3, f16vec4 | |

(*) only if ARB_gpu_shader_fp64 is supported | |

In all cases, the sizes of the input and return vectors for any | |

particular call must match. | |

Modify Section 8.7, Texture Lookup Functions, p. 91 | |

(modify text for textureOffset() functions, p. 94, allowing non-constant | |

offsets) | |

Do a texture lookup as in texture but with offset added to the (u,v,w) | |

texel coordinates before looking up each texel. The value <offset> need | |

not be constant; however, a limited range of offset values are supported. | |

If any component of <offset> is less than MIN_PROGRAM_TEXEL_OFFSET_EXT or | |

greater than MAX_PROGRAM_TEXEL_OFFSET_EXT, the offset applied to the | |

texture coordinates is undefined. Note that offset does not apply to the | |

layer coordinate for texture arrays. This is explained in detail in | |

section 3.9.9 of the OpenGL Specification (Version 3.2, Compatibility | |

Profile), where offset is (delta_u, delta_v, delta_w). Note that texel | |

offsets are also not supported for cube maps. | |

(Note: This lifting of the constant offset restriction also applies to | |

texelFetchOffset, p. 95, textureProjOffset, p. 95, textureLodOffset, | |

p. 96, textureProjLodOffset, p. 96.) | |

(modify the description of the textureGradOffset() functions, p. 97, | |

preserving the restriction on constant offsets) | |

Do a texture lookup with both explicit gradient and offset, as described | |

in textureGrad and textureOffset. For these functions, the offset value | |

must be a constant expression. A limited range of offset values are | |

supported; the minimum and maximum offset values are | |

implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET and | |

MAX_PROGRAM_TEXEL_OFFSET, respectively. | |

(modify the description of the textureProjGradOffset() functions, | |

p. 98, preserving the restriction on constant offsets) | |

Do a texture lookup projectively and with explicit gradient as described | |

in textureProjGrad, as well as with offset, as described in textureOffset. | |

For these functions, the offset value must be a constant expression. A | |

limited range of offset values are supported; the minimum and maximum | |

offset values are implementation-dependent and given by | |

MIN_PROGRAM_TEXEL_OFFSET and MAX_PROGRAM_TEXEL_OFFSET, respectively. | |

(modify the description of the textureGatherOffsets() functions, | |

added in ARB_gpu_shader5, to remove the restriction on constant offsets) | |

The textureGatherOffsets() functions operate identically ... | |

selecting the texel T_i0_j0 of that footprint. The specified values in | |

<offsets> need not be constant. A limited range of ... | |

Modify Section 9, Shading Language Grammar, p. 92 | |

!!! TBD !!! | |

GLX Protocol | |

TBD | |

Interactions with OpenGL ES 3.1 | |

If implemented in OpenGL ES, NV_gpu_shader5 acts as a superset | |

of functionality provided by OES_gpu_shader5. | |

A shader that enables this extension | |

via an #extension directive also implicitly enables the common | |

capabilities provided by OES_gpu_shader5. | |

Replace references to ARB_gpu_shader5 with OES_gpu_shader5 and | |

EXT_shader_implicit_conversions (as appropriate). | |

Replace references to ARB_geometry_shader with OES/EXT_geometry_shader. | |

Replace references to ARB_tessellation_shader with OES/EXT_tessellation_shader. | |

Replace references to int64EXT and uint64EXT with int64 and uint64, | |

respectively. | |

The specification should be edited as follows to include new | |

ProgramUniform* functions. | |

(modify the ProgramUniform* language) | |

The following commands: | |

.... | |

void ProgramUniform{1,2,3,4}{i64,ui64}NV | |

(uint program, int location, T value);

void ProgramUniform{1,2,3,4}{i64,ui64}vNV | |

(uint program, int location, const T *value); | |

operate identically to the corresponding command where "Program" is | |

deleted from the name (and extension suffixes are dropped or updated | |

appropriately) except, rather than updating the currently active program | |

object, these "Program" commands update the program object named by the | |

<program> parameter. ... | |

Changes to Section 2.6.1 "Begin and End" don't apply. | |

Disregard introduction of 64-bit integer or floating-point vertex

attribute types. | |

Interactions with OpenGL ES Shading Language 3.10, revision 3 | |

If implemented in GLSL ES, NV_gpu_shader5 acts as a superset | |

of functionality provided by OES_gpu_shader5 and | |

EXT_shader_implicit_conversions. | |

A shader that enables this extension via an #extension directive | |

also implicitly enables the common capabilities provided by | |

OES_gpu_shader5 and EXT_shader_implicit_conversions. | |

Replace references to ARB_tessellation_shader with OES/EXT_tessellation_shader. | |

Implicit conversions between GLSL ES types are introduced by

EXT_shader_implicit_conversions instead of ARB_gpu_shader5. | |

Disregard the notion of 'double' types as vertex shader inputs. | |

Section 4.1.7.2 "Images" | |

Remove the third sentence, which restricts access to arrays of images to

constant integral expressions. This leaves indexing of image arrays at

the 'dynamically uniform integral expressions' default introduced by

OES_gpu_shader5.

Modify Section 4.3.9 "Interface Blocks", as modified by OES_gpu_shader5

NV_gpu_shader5 also lifts OES_gpu_shader5 restrictions with | |

regard to indexing into arrays of uniform blocks and shader

storage blocks. | |

Change sentence | |

"All indices used to index a shader storage block array must be | |

constant integral expressions. A uniform block array can only | |

be indexed with a dynamically uniform integral expression, | |

otherwise results are undefined." into | |

"Arbitrary indices may be used to index a uniform block array; | |

integral constant expressions are not required. If the index | |

used to access an array of uniform blocks is out-of-bounds, | |

the results of the access are undefined." | |

Indexing into arrays of shader storage blocks defaults to | |

'dynamically uniform integral expressions'. | |

Changes to Section 4.3.9, p.48 "Interface Blocks" | |

Replace the sentence | |

"All indices used to index a shader storage block array must be | |

constant integral expressions. A uniform block array can only | |

be indexed with a dynamically uniform integral expression, | |

otherwise results are undefined." | |

with | |

"Arbitrary indices may be used to index a uniform block array; | |

integral constant expressions are not required. If the index | |

used to access an array of uniform blocks is out-of-bounds, the | |

results of the access are undefined." | |

4.4.1.1 "Compute Shader Inputs" change | |

"layout-qualifier-id: | |

local_size_x = integer-constant | |

local_size_y = integer-constant | |

local_size_z = integer-constant" into | |

"layout-qualifier-id: | |

local_size_x = integer-constant-expression | |

local_size_y = integer-constant-expression | |

local_size_z = integer-constant-expression" | |

Section 4.4.1.gs "Geometry Shader Inputs" change | |

"<layout-qualifier-id> | |

... | |

invocations = integer-constant" into | |

"<layout-qualifier-id> | |

... | |

invocations = integer-constant-expression" | |

Section 4.4.2 "Output Layout Qualifiers" change | |

"layout-qualifier-id: | |

location = integer-constant" into | |

"layout-qualifier-id: | |

location = integer-constant-expression" | |

Section 4.4.2.ts "Tessellation Control Outputs" change

"layout-qualifier-id:

vertices = integer-constant" into | |

"layout-qualifier-id: | |

vertices = integer-constant-expression" | |

Section 4.4.3 "Uniform Variable Layout Qualifiers" change | |

"layout-qualifier-id: | |

location = integer-constant" into | |

"layout-qualifier-id: | |

location = integer-constant-expression" | |

Section 4.4.4 "Uniform and Shader Storage Block Layout Qualifiers" change | |

"layout-qualifier-id: | |

... | |

binding = integer-constant" into | |

"layout-qualifier-id: | |

... | |

binding = integer-constant-expression" | |

Section 4.4.5 "Opaque Uniform Layout Qualifiers" change | |

"layout-qualifier-id: | |

binding = integer-constant" into | |

"layout-qualifier-id: | |

binding = integer-constant-expression" | |

Change sentence | |

"A link-time error will result if two shaders in a program | |

specify different integer-constant bindings for the same | |

opaque-uniform name." into | |

"A link-time error will result if two shaders in a program | |

specify different bindings for the same opaque-uniform | |

name." | |

Section 4.4.6 "Atomic Counter Layout Qualifiers" change | |

"layout-qualifier-id: | |

binding = integer-constant | |

offset = integer-constant" into | |

"layout-qualifier-id: | |

binding = integer-constant-expression | |

offset = integer-constant-expression" | |

Section 4.4.7 "Format Layout Qualifiers" change | |

"layout-qualifier-id: | |

... | |

binding = integer-constant" into | |

"layout-qualifier-id: | |

... | |

binding = integer-constant-expression" | |

Section 4.7.3 "Precision Qualifiers" | |

After "Literal constants do not have precision qualifiers." add | |

"Neither do explicitly sized types such as int8_t, uint32_t, | |

float16_t etc." | |

Dependencies on OES_gpu_shader5 | |

In addition to allowing arbitrary indexing of arrays of samplers, this

extension also lifts OES_gpu_shader5 restrictions for indexing | |

arrays of images and shader storage blocks. Additionally, it allows | |

usage of 'integer-constant-expressions' for layout qualifiers that | |

formerly took 'integer-constant'. | |

In Section 'Overview': change the bullet point | |

"* the ability to aggregate samplers into arrays...." | |

to | |

"* the ability to index into arrays of samplers, uniforms and shader | |

storage blocks with arbitrary expressions, and not require that | |

non-constant indices be uniform across all shader invocations." | |

"* the ability to index into arrays of images using dynamically | |

uniform integers." | |

"* the ability to use 'integer-constant-expressions' in place of | |

'integer-constant' for layout qualifiers." | |

Dependencies on OES/EXT_tessellation_shader | |

If implemented in GLSL ES and OES/EXT_tessellation_shader is not | |

supported, language introduced by this extension describing | |

processing patches in geometry shaders, transform feedback, and | |

rasterization should be removed. | |

If implemented in GLSL ES and OES/EXT_tessellation_shader is supported, | |

it is legal to send patches past the tessellation stage -- the | |

following language from OES/EXT_tessellation_shader is removed: | |

Patch primitives are not supported by pipeline stages below the | |

tessellation evaluation shader. | |

Dependencies on OES/EXT_geometry_shader | |

If implemented in GLSL ES and OES/EXT_geometry_shader is not supported, | |

disregard all changes to geometry shader related functionality. | |

Dependencies on ARB_gpu_shader5 | |

This extension also incorporates all the changes to the OpenGL Shading | |

Language made by ARB_gpu_shader5; enabling this extension by a #extension | |

directive in shader code also enables all features of ARB_gpu_shader5 as | |

though the shader code has also declared | |

#extension GL_ARB_gpu_shader5 : enable | |

The converse is not true; implementations supporting both extensions | |

should not provide the shading language features in this extension if | |

shader code #extension directives enable only ARB_gpu_shader5. | |

This specification and ARB_gpu_shader5 both lift the restriction in GLSL | |

1.50 requiring that indexing in arrays of samplers must be done with | |

constant expressions. However, ARB_gpu_shader5 specifies that results are | |

undefined if the indices would diverge if multiple shader invocations are | |

run in lockstep. This extension does not impose the non-divergent | |

indexing requirement. | |

Dependencies on ARB_gpu_shader_fp64 | |

This extension and ARB_gpu_shader_fp64 both provide support for shading | |

language variables with 64-bit components. If both extensions are | |

supported, the various edits describing this new support should be | |

combined. | |

If ARB_gpu_shader_fp64 is not supported, the following edits should be | |

removed: | |

* language adding the data types "float64_t", "f64vec2", "f64vec3", and | |

"f64vec4"; | |

* language allowing implicit conversions of various types to double, | |

dvec2, dvec3, or dvec4; and | |

* the built-in functions doubleBitsToInt64(), doubleBitsToUint64(), | |

int64BitsToDouble(), and uint64BitsToDouble(). | |

Dependencies on ARB_tessellation_shader | |

If ARB_tessellation_shader is not supported, language introduced by this | |

extension describing processing patches in geometry shaders, transform | |

feedback, and rasterization should be removed. | |

If this extension and ARB_tessellation_shader are supported, it is legal | |

to send patches past the tessellation stage -- the following language from | |

ARB_tessellation_shader is removed: | |

Patch primitives are not supported by pipeline stages below the | |

tessellation evaluation shader. If there is no active program object or | |

the active program object does not contain a tessellation evaluation | |

shader, the error INVALID_OPERATION is generated by Begin (or vertex | |

array commands that implicitly call Begin) if the primitive mode is | |

PATCHES. | |

Dependencies on NV_shader_buffer_load | |

If NV_shader_buffer_load is supported, that specification should be edited | |

as follows, to allow pointers to dereference the new data types added by | |

this extension. | |

Modify "Section 2.20.X, Shader Memory Access" from NV_shader_buffer_load. | |

(add rules for loads of variables having the new data types from this | |

extension to the list of bullets following "When a shader dereferences a | |

pointer variable") | |

- Data of type "int8_t", "int16_t", "int32_t", and "int64_t" are read

from or written to memory as a single 8-, 16-, 32-, or 64-bit signed | |

integer value at the specified GPU address. | |

- Data of type "uint8_t", "uint16_t", "uint32_t", and "uint64_t" are read

from or written to memory as a single 8-, 16-, 32-, or 64-bit unsigned | |

integer value at the specified GPU address. | |

- Data of type "float16_t", "float32_t", and "float64_t" are read from or | |

written to memory as a single 16-, 32-, or 64-bit floating-point value | |

at the specified GPU address. | |

Dependencies on EXT_direct_state_access | |

If EXT_direct_state_access is supported, that specification should be | |

edited as follows to include new ProgramUniform* functions. | |

(modify the ProgramUniform* language) | |

The following commands: | |

.... | |

void ProgramUniform{1,2,3,4}{i64,ui64}NV | |

(uint program, int location, T value);

void ProgramUniform{1,2,3,4}{i64,ui64}vNV | |

(uint program, int location, const T *value); | |

operate identically to the corresponding command where "Program" is | |

deleted from the name (and extension suffixes are dropped or updated | |

appropriately) except, rather than updating the currently active program | |

object, these "Program" commands update the program object named by the | |

<program> parameter. ... | |

Dependencies on EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit | |

The EXT_vertex_attrib_64bit extension provides the ability to specify | |

64-bit floating-point vertex attributes in a GLSL vertex shader and to

specify the values of these attributes via the OpenGL API. To

successfully compile vertex shaders with fp64 input variables, it is

necessary to include | |

#extension GL_EXT_vertex_attrib_64bit : enable | |

in the shader text. | |

However, this extension is considered to enable 64-bit | |

floating-point and integer inputs. Provided EXT_vertex_attrib_64bit | |

and NV_vertex_attrib_integer_64bit are supported, including the | |

following code in a vertex shader | |

#extension GL_NV_gpu_shader5 : enable | |

will enable 64-bit floating-point or integer input variables whose | |

values would be specified using the OpenGL API mechanisms found in | |

the EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit | |

extensions. | |

Errors | |

None. | |

New State | |

None. | |

New Implementation Dependent State | |

None. | |

Issues | |

(1) What implicit conversions are supported by this extension on top of | |

those provided by related extensions? | |

RESOLVED: ARB_gpu_shader5 and ARB_gpu_shader_fp64 provide new implicit | |

conversions from "int" to "uint", and from "int", "uint", and "float" to | |

"double". | |

This extension provides integer types of multiple sizes and supports | |

implicit conversions from small integer types to 32- or 64-bit integer | |

types of the same signedness, as well as float and double. It also | |

provides floating-point types of multiple sizes and supports implicit | |

conversions from smaller to larger types. Additionally, it supports | |

conversion from 64-bit integer types to double. | |

(2) How do these implicit conversions impact binary operators? | |

RESOLVED: For binary operators, we prefer converting to a common type | |

that is as close as possible in size and type to the original | |

expression. | |

(3) How do these implicit conversions impact function overloading rules? | |

RESOLVED: We extend the preference rules in ARB_gpu_shader5 to account | |

for the new data types, adding rules to: | |

* favor new "promotions" in integer/floating point types (previously, | |

the only promotion was float-to-double) | |

* for promotions, favor conversion to the type closer in size (e.g., | |

prefer converting from int16_t to int over converting to int64_t) | |

(4) What should be done to distinguish between 32- and 64-bit integer | |

constants? | |

RESOLVED: We will use "L" and "UL" to identify signed and unsigned | |

64-bit integer constants; the use of "L" matches a similar ("long") | |

suffix in the C programming language. C leaves the size of integer | |

types implementation-dependent, and many implementations require an "LL" | |

suffix to declare 64-bit integer constants. With our size definitions, | |

"L" will be considered sufficient to make an integer constant 64-bit. | |

(5) Should we provide support for vertex attributes with 64-bit components,

and if so, how should the support be provided in the OpenGL API? | |

RESOLVED: Yes, this seems like useful functionality, particularly for | |

applications wanting to provide double-precision or 64-bit integer data | |

to shaders performing computations on such types. We provide | |

VertexAttribL* entry points for 64-bit components in the separate | |

EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions, which

should be supported on all implementations supporting this extension. | |

(6) Should we allow vertex attributes with 8- or 16-bit components in the | |

shading language, and if so, how does it interact with the OpenGL API? | |

RESOLVED: Yes, but we will use existing APIs to specify such | |

attributes, which already typically allow 8- and 16-bit components on | |

the API side. Vertex attribute components (other than 64-bit ones) | |

specified by the API will be converted from the type specified in the | |

vertex attribute commands to the component type of the attribute. For | |

floating-point values, that may involve 16-to-32 bit conversion or vice | |

versa. For integer types, that may involve dropping all but the least | |

significant bits of attribute components. | |

(7) Should we support uniforms with double or 64-bit attribute types, and | |

if so, how? Should we support uniforms with <32-bit components, and | |

if so, how? | |

RESOLVED: We will support uniforms of all component types, either in a | |

buffer object (via OpenGL 3.1 or ARB_uniform_buffer_object) or in | |

storage associated with the program. | |

When uniforms are stored in a buffer object, they are stored using their

native data types according to the pre-existing packing and layout | |

rules. Those rules were already written to be able to accommodate both | |

the larger and smaller new data types. | |

Uniforms stored in program objects are loaded with Uniform* APIs. There | |

are no pre-existing uniform APIs accepting doubles or other "long" | |

types, so there was no clear need to add an extra "L" to the name to | |

distinguish from other APIs like we do with VertexAttribL* APIs. | |

Uniforms with 8- and 16- bit components are loaded with the "larger" | |

Uniform*{i,ui,f} APIs; it didn't seem worth it to add numerous entry | |

points to the APIs to handle all those new types. | |

(8) How do the uniform loading commands introduced by this extension
interact with similar commands added by NV_shader_buffer_load?

RESOLVED: NV_shader_buffer_load provided the command Uniformui64NV to
load pointer uniforms with a single 64-bit unsigned integer. This
extension provides vectors of 64-bit unsigned integers, so we needed
Uniform{2,3,4}ui64NV commands. We chose to provide a Uniform1ui64NV
command, which will be functionally equivalent to Uniformui64NV.

(9) How will transform feedback work for capturing variables with double
or 64-bit components? Should we support transform feedback on
variables with components with fewer than 32 bits?

RESOLVED: Transform feedback will support variables with any component
size. Components with fewer than 32 bits are converted to their
equivalent 32-bit types.

For doubles and variables with 64-bit components, each component
captured will count as a 64-bit value and occupy two components for the
purpose of component counting rules. This could be a problem for the
SEPARATE_ATTRIBS mode, since the minimum component limit is four, which
would not be sufficient to capture a dvec3 or dvec4. However,
implementations supporting this extension should also be able to support
ARB_transform_feedback3, which extends INTERLEAVED_ATTRIBS mode to
capture vertex attribute values interleaved into multiple buffers. That
functionality effectively obsoletes the SEPARATE_ATTRIBS mode, since it
is a functional superset.
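
The counting rule above can be checked with a little arithmetic. The C
sketch below is an illustration only: the helper names and the constant
MIN_SEPARATE_COMPONENTS are hypothetical, and the value four is simply
the minimum limit quoted in the text, not a value queried from an
implementation.

```c
#include <stdbool.h>

/* Minimum per-attribute component limit for SEPARATE_ATTRIBS mode, as
 * quoted in the text above (assumed here, not queried from GL). */
enum { MIN_SEPARATE_COMPONENTS = 4 };

/* Hypothetical helper: number of components counted for a captured
 * vector with vec_size elements. Components narrower than 32 bits are
 * widened to 32 bits and count once; 64-bit components count twice. */
static int counted_components(int vec_size, int bits_per_component)
{
    int per_component = (bits_per_component == 64) ? 2 : 1;
    return vec_size * per_component;
}

/* True if the vector fits within the minimum SEPARATE_ATTRIBS limit. */
static bool fits_separate_minimum(int vec_size, int bits_per_component)
{
    return counted_components(vec_size, bits_per_component)
           <= MIN_SEPARATE_COMPONENTS;
}
```

With these assumptions a dvec2 counts as four components and still fits,
while a dvec3 (six) and a dvec4 (eight) exceed the minimum limit.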

We considered support for capturing 8- and 16-bit values directly, which
had a number of problems. First, full byte addressing might impose both
alignment issues (e.g., capturing a uint8_t followed by a float might
misalign the float) and additional hardware implementation burdens. One
other option would be to pack multiple values into a 32-bit integer
(e.g., f16vec2 would be packed with .x in the LSBs and .y in the MSBs).
This could work, even with word addressing, but would require padding
for odd sizes (e.g., f16vec3 padded to two words, with the second word
holding only .z). It would also have endianness issues; packed values
would look like arrays of the corresponding smaller type on
little-endian systems, but not on big-endian ones.
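
The rejected packing scheme can be sketched in C to make the padding and
endianness concerns concrete. This is only a model of the option
discussed above, not behavior defined by this extension; all helper
names are hypothetical, and 16-bit values are treated as opaque bit
patterns.

```c
#include <stdint.h>
#include <string.h>

/* Pack two 16-bit values into one 32-bit word, with x in the least
 * significant bits and y in the most significant bits. */
static uint32_t pack_2x16(uint16_t x, uint16_t y)
{
    return (uint32_t)x | ((uint32_t)y << 16);
}

/* Number of 32-bit words needed to hold n 16-bit components; an odd
 * count leaves the upper half of the last word as padding. */
static int words_for_16bit_components(int n)
{
    return (n + 1) / 2;
}

/* Returns 1 if the bytes of the packed word in memory are identical to
 * the bytes of the array {x, y}. This is expected to hold on
 * little-endian hosts only. */
static int packed_matches_array(uint16_t x, uint16_t y)
{
    uint32_t word = pack_2x16(x, y);
    uint16_t arr[2] = { x, y };
    return memcmp(&word, arr, sizeof word) == 0;
}
```

In this model a three-component 16-bit vector needs two words, and
whether the packed word reads back as a two-element array depends on the
byte order of the host.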

(10) What precision will be used for computation, storage, and inter-stage
transfer of 8- and 16-bit component data types?

RESOLVED: The components may be considered to occupy a full 32 bits for
the purposes of input/output component count limits. 8- and 16-bit
values should, however, be passed at that precision.

(11) Is the new support for non-constant texel offsets completely
orthogonal?

RESOLVED: No. Non-constant offsets are not supported for the existing
functions textureGradOffset() and textureProjGradOffset().

(12) Should we provide functions like intBitsToFloat() that operate on
16-bit floating-point values?

RESOLVED: Not in this extension. Such conversions can be performed
using the following code:

    uint16_t float16BitsToUint16(float16_t v)
    {
        return uint16_t(packFloat2x16(f16vec2(v, 0)));
    }

    float16_t uint16BitsToFloat16(uint16_t v)
    {
        return unpackFloat2x16(uint(v)).x;
    }

(13) Should we provide distinct sized types for 32-bit integers and
floats, and 64-bit floats? Should we provide those types as aliases
for existing unsized types? Or should we provide no such types at
all?

RESOLVED: We will provide sized versions of these types, which are
defined as completely equivalent to unsized types according to the
following table:

    unsized type     sized types
    -------------    ---------------
    int              int32_t
    uint             uint32_t
    float            float32_t
    double           float64_t

Vector types with sized and unsized components have equivalent
relationships.

Note that the nominally "unsized" data types in the GLSL 1.30 spec are
actually sized. The specification explicitly defines signed and unsigned
integers (int, uint) to be 32-bit values. It also defines
floating-point values to "match the IEEE single precision floating-point
definition for precision and dynamic range", which are also 32-bit
values.

This type equivalence has minor implications on function overloading:

  * You can't declare separate versions of a function with an "int"
    argument in one version and an "int32_t" argument in another.

  * Because there is no implicit conversion between equivalent types, we
    will get an exact match if an argument is declared with one type
    (e.g., "int") in the caller and a textually different but equivalent
    type ("int32_t") in the function.

Note that the type equivalence also applies to API data type queries.
For example, the type INT will be returned for a variable declared as
"int32_t".

(14) What are functions like anyThreadNV() and allThreadsNV() good for?

RESOLVED: If an implementation performs SIMD thread execution,
divergent branching may result in reduced performance if the "if" and
"else" blocks of an "if" statement are executed sequentially. For
example, an algorithm may have both a "fast path" that performs a
computation quickly for a subset of all cases and a "slow path" that
performs a computation slowly but correctly. When performing SIMD
execution, code like the following:

    if (condition) {
        result = do_fast_path(...);
    } else {
        result = do_slow_path(...);
    }

may end up executing *both* the fast and slow paths for a SIMD thread
group if <condition> diverges, and may execute more slowly than simply
executing the slow path unconditionally. These functions allow code
like:

    if (allThreadsNV(condition)) {
        result = do_fast_path(...);
    } else {
        result = do_slow_path(...);
    }

that executes the fast path if and only if it can be used for *all*
threads in the group. For thread groups where <condition> diverges,
this algorithm would unconditionally run the slow path, but would never
run both in sequence.

There may be other cases where "voting" across shader invocations may be
useful. Note that we provide no control over how shader invocations may
be packed within a SIMD thread group, unlike various "compute" APIs
(CUDA, OpenCL).
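
The trade-off described above can be modeled on the CPU. The C sketch
below is a simplified cost model, not a description of any hardware: a
thread group is an array of per-thread conditions, and a divergent
branch is assumed to pay for every path that at least one thread takes.
The function names and the cost parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of allThreadsNV(): true only if the condition holds for every
 * thread in the group. */
static bool all_threads(const bool *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (!cond[i])
            return false;
    return true;
}

/* Model of anyThreadNV(): true if the condition holds for at least one
 * thread in the group. */
static bool any_thread(const bool *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (cond[i])
            return true;
    return false;
}

/* Cost of "if (condition) fast else slow" for the whole group: every
 * path taken by at least one thread is executed in sequence. */
static int cost_plain_branch(const bool *cond, size_t n,
                             int fast_cost, int slow_cost)
{
    int cost = 0;
    if (any_thread(cond, n))
        cost += fast_cost;   /* some thread takes the fast path */
    if (!all_threads(cond, n))
        cost += slow_cost;   /* some thread takes the slow path */
    return cost;
}

/* Cost of "if (allThreadsNV(condition)) fast else slow": the vote is
 * uniform across the group, so exactly one path is executed. */
static int cost_voted_branch(const bool *cond, size_t n,
                             int fast_cost, int slow_cost)
{
    return all_threads(cond, n) ? fast_cost : slow_cost;
}
```

With a fast-path cost of 1 and a slow-path cost of 10, a group in which
one thread fails the condition costs 11 with the plain branch and 10
with the voted branch under this model; a uniform group costs the same
either way.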

(15) Can the 64-bit uniform APIs be used to load values for uniforms of
type "bool", "bvec2", "bvec3", or "bvec4"?

RESOLVED: No. OpenGL 2.0 and beyond did allow "bool" variables to be
set with Uniform*i* and Uniform*f APIs, and OpenGL 3.0 extended that
support to Uniform*ui* for orthogonality. But it seems pointless to
extend this capability forward to 64-bit Uniform APIs as well.

(19) The ARB_tessellation_shader extension adds support for patch
primitives that might survive to the transform feedback stage. How
are such primitives captured?

RESOLVED: If patch primitives survive to the transform feedback stage,
they are recorded on a patch-by-patch basis. Incomplete patches are not
recorded. As with other primitive types, if the transform feedback
buffers do not contain enough space to capture an entire patch, no
vertices are recorded.

Note that the only way to get patch primitives all the way to transform
feedback is to have tessellation evaluation and geometry shaders
disabled; the output streams from both of those shader stages are
collections of points, lines, or triangles.

(20) Previous transform feedback allowed capturing only fixed-size
primitives; this extension supports variable-sized patches. What
interactions does this functionality have with transform feedback
buffer overflow?

RESOLVED: With fixed-size point, line, or triangle primitives, once any
primitive fails to be recorded due to insufficient space, all subsequent
primitives would also fail. With variable-size patch primitives, the
transform feedback stage might first receive a large patch that doesn't
fit, followed by a smaller patch that could squeeze into the remaining
space.

To allow for different types of implementation of this extension without
requiring special-case handling of this corner case, we've chosen to
leave this behavior undefined -- the smaller patch may or may not be
recorded.
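
The corner case above can be made concrete with a small simulation. The
C sketch below models two hypothetical implementation strategies, both
of which are permitted because the behavior is undefined: one stops
recording after the first overflow, the other records any later patch
that still fits. Sizes and capacity are measured in vertices; all names
are illustrative.

```c
#include <stddef.h>

/* Strategy 1: once a primitive fails to fit, nothing later is
 * recorded. */
static size_t recorded_stop_on_overflow(const int *sizes, size_t n,
                                        int capacity)
{
    size_t recorded = 0;
    int used = 0;
    for (size_t i = 0; i < n; ++i) {
        if (used + sizes[i] > capacity)
            break;              /* overflow: stop recording */
        used += sizes[i];
        ++recorded;
    }
    return recorded;
}

/* Strategy 2: a primitive that does not fit is skipped, but a later,
 * smaller primitive may still be recorded. */
static size_t recorded_skip_on_overflow(const int *sizes, size_t n,
                                        int capacity)
{
    size_t recorded = 0;
    int used = 0;
    for (size_t i = 0; i < n; ++i) {
        if (used + sizes[i] > capacity)
            continue;           /* overflow: skip this one only */
        used += sizes[i];
        ++recorded;
    }
    return recorded;
}
```

For patches of 16, 32, and 4 vertices and room for 40 vertices, the
first strategy records one patch and the second records two. For
fixed-size primitives the two strategies always agree, which is why the
question did not arise before.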

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     10   04/16/16  mheyer    Add OpenGL ES interactions (written before
                              revision 9, but not published)

      9   02/19/16  pbrown    Clarify that non-constant offset vectors are
                              supported in textureGatherOffsets().

      8   09/11/14  pbrown    Fix incorrect implicit conversions, which
                              follow the general pattern of little->big
                              and int->uint->float. Thanks to Daniel
                              Rakos, author of similar functionality in
                              the AMD_gpu_shader_int64 spec.

      7   11/08/10  pbrown    Fix typos in description of packFloat2x16 and
                              unpackFloat2x16.

      6   03/23/10  pbrown    Update overview, dependencies, remove references
                              to old extension names. Extend the function
                              overloading prioritization rules from
                              ARB_gpu_shader5 to account for new data types.
                              Major overhaul of the issues section to match
                              the refactoring done to produce ARB specs.

      5   03/08/10  pbrown    Add interaction with EXT_vertex_attrib_64bit and
                              NV_vertex_attrib_integer_64bit; enabling this
                              extension automatically enables 64-bit floating-
                              point and integer vertex inputs.

      4   03/01/10  pbrown    Fix prototype for GetUniformui64vNV.

      3   01/14/10  pbrown    Fix with updated enum assignments.

      2   12/08/09  pbrown    Add explicit component counting rules for
                              64-bit integer attributes similar to those
                              in the ARB_gpu_shader_fp64 spec.

      1             pbrown    Internal revisions.