| Name |
| |
| NV_shader_atomic_fp16_vector |
| |
| Name Strings |
| |
| GL_NV_shader_atomic_fp16_vector |
| |
| Contact |
| |
| Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com) |
| |
| Contributors |
| |
| Pat Brown, NVIDIA |
| Mathias Heyer, NVIDIA |
| |
| Status |
| |
| Shipping |
| |
| Version |
| |
| Last Modified Date: February 4, 2015 |
| NVIDIA Revision: 3 |
| |
| Number |
| |
| OpenGL Extension #474 |
| OpenGL ES Extension #261 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 4.3 (Compatibility Profile) |
| Specification. |
| |
| This extension is written against version 4.30 of the OpenGL Shading |
| Language Specification. |
| |
| This extension interacts with NV_shader_buffer_store and NV_gpu_shader5. |
| |
| This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and |
| NV_gpu_program5_mem_extended. |
| |
| This extension requires NV_gpu_shader5. |
| |
| This extension interacts with NV_shader_storage_buffer_object. |
| |
| This extension interacts with NV_compute_program5. |
| |
| This extension interacts with NV_image_formats. |
| |
| This extension interacts with OES_shader_image_atomic. |
| |
| Overview |
| |
| This extension provides GLSL built-in functions and assembly opcodes |
| allowing shaders to perform a limited set of atomic read-modify-write |
| operations to buffer or texture memory with 16-bit floating point vector |
| surface formats. |
| |
| New Procedures and Functions |
| |
| None. |
| |
| New Tokens |
| |
| None. |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| None. |
| |
| GLX Protocol |
| |
| None. |
| |
| Modifications to the OpenGL Shading Language Specification, Version 4.30 |
| |
| Including the following line in a shader can be used to control the |
| language features described in this extension: |
| |
| #extension GL_NV_shader_atomic_fp16_vector : <behavior> |
| |
| where <behavior> is as specified in section 3.3. |
| |
| New preprocessor #defines are added to the OpenGL Shading Language: |
| |
| #define GL_NV_shader_atomic_fp16_vector 1 |
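
The following is a minimal sketch, not part of the specification, of how a shader might use the preprocessor #define to select between the native fp16 vector atomics and a fallback. The buffer name "Accum", the uniform "contribution", the helper "accumulate", and the macro "HAVE_FP16_VECTOR_ATOMICS" are hypothetical. The fallback is a swapped-in technique (a compare-and-swap loop over a packed 32-bit word) and assumes that an f16vec2 in buffer memory stores its first component in the low 16 bits, matching packHalf2x16.

```glsl
#version 430

// The macro is predefined when the implementation supports the extension,
// so it can guard the #extension directives themselves.
#ifdef GL_NV_shader_atomic_fp16_vector
#extension GL_NV_gpu_shader5 : require                 // f16vec2 type
#extension GL_NV_shader_atomic_fp16_vector : require
#define HAVE_FP16_VECTOR_ATOMICS 1                     // hypothetical macro
#else
#define HAVE_FP16_VECTOR_ATOMICS 0
#endif

layout(local_size_x = 64) in;

uniform vec2 contribution;   // hypothetical input

#if HAVE_FP16_VECTOR_ATOMICS

layout(std430, binding = 0) buffer Accum { f16vec2 total; };

void accumulate(vec2 v)
{
    // Native path: each component is updated atomically.
    atomicAdd(total, f16vec2(v));
}

#else

// Assumed to be the same 32 bits of storage, viewed as a packed pair of halves.
layout(std430, binding = 0) buffer Accum { uint totalBits; };

void accumulate(vec2 v)
{
    // Fallback path: retry until no other invocation modified the word
    // between the read and the compare-and-swap.
    uint old = totalBits;
    uint assumed;
    do {
        assumed = old;
        uint desired = packHalf2x16(unpackHalf2x16(assumed) + v);
        old = atomicCompSwap(totalBits, assumed, desired);
    } while (old != assumed);
}

#endif

void main()
{
    accumulate(contribution);
}
```

Because the fallback replaces the whole 32-bit word at once, it updates both components together, which is stronger than the per-component guarantee this extension provides.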
| |
| Modify Section 8.11, Atomic Memory Functions (p. 163) |
| |
| Add before the table of functions: |
| |
| Some atomic memory operations are supported on two- and four-component |
| vectors with 16-bit floating-point components. |
| |
| Add new functions to the table: |
| |
| // Computes a new value per-component using the specified operation. |
| // Atomicity is only guaranteed on a per-component basis. |
| f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data); |
| f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data); |
| f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data); |
| f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data); |
| f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data); |
| f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data); |
| f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data); |
| f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data); |
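
As an illustration only, the sketch below shows one way the f16vec4 form of atomicAdd might be used on a shader storage buffer. The block names "Accum" and "Samples" and their members are hypothetical, and the sketch assumes the implementation accepts 16-bit vector members in std430 blocks. The return value is assumed to follow the existing scalar atomics of Section 8.11, which return the original contents of the memory.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                 // f16vec4 type
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical buffer layouts, for illustration only.
layout(std430, binding = 0) buffer Accum {
    f16vec4 sums[];
};

layout(std430, binding = 1) readonly buffer Samples {
    vec4 samples[];
};

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(samples.length())) {
        return;
    }

    // Many invocations may target the same bin; the add is atomic for
    // each component, but not for the four-component vector as a whole.
    uint bin = i % uint(sums.length());
    atomicAdd(sums[bin], f16vec4(samples[i]));
}
```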
| |
| |
| Modify Section 8.12, Image Functions (p. 164) |
| |
| Add before the table of functions: |
| |
| Some atomic memory operations are supported on two- and four-component |
| vectors with 16-bit floating-point components, for images with format |
| qualifiers of <rg16f> and <rgba16f>. |
| |
| Add new functions to the table: |
| |
| // Computes a new value per-component using the specified operation. |
| // Atomicity is only guaranteed on a per-component basis. |
| f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data); |
| f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data); |
| f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data); |
| f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data); |
| f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data); |
| f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data); |
| f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data); |
| f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data); |
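
A minimal sketch of the image variant follows, assuming a compute shader and an <rgba16f> image. The names "peakImage" and "srcTex" are hypothetical, and the two resources are assumed to have the same dimensions. It keeps a running per-component maximum of the source texels.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                 // f16vec4 type
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 8, local_size_y = 8) in;

// Hypothetical resources; image units and texture units are bound separately.
layout(rgba16f, binding = 0) uniform image2D peakImage;
layout(binding = 0) uniform sampler2D srcTex;

void main()
{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(p, imageSize(peakImage)))) {
        return;
    }

    vec4 c = texelFetch(srcTex, p, 0);

    // For image2D, IMAGE_PARAMS expands to (image, ivec2 coordinate).
    imageAtomicMax(peakImage, p, f16vec4(c));
}
```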
| |
| Dependencies on OES_shader_image_atomic |
| |
| If implemented in OpenGL ES and OES_shader_image_atomic is not |
| supported, do not introduce additional imageAtomic* functions. |
| |
| Dependencies on NV_image_formats |
| |
| If implemented in OpenGL ES and NV_image_formats is not |
| supported, remove references to two-component images of format |
| <rg16f>. |
| |
| Dependencies on NV_shader_buffer_store and NV_gpu_shader5 |
| |
| If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following |
| functions should be added to the "Section 8.Y, Shader Memory Functions" |
| language in the NV_shader_buffer_store specification: |
| |
| // Computes a new value per-component using the specified operation. |
| // Atomicity is only guaranteed on a per-component basis. |
| f16vec2 atomicAdd(f16vec2 *address, f16vec2 data); |
| f16vec4 atomicAdd(f16vec4 *address, f16vec4 data); |
| f16vec2 atomicMin(f16vec2 *address, f16vec2 data); |
| f16vec4 atomicMin(f16vec4 *address, f16vec4 data); |
| f16vec2 atomicMax(f16vec2 *address, f16vec2 data); |
| f16vec4 atomicMax(f16vec4 *address, f16vec4 data); |
| f16vec2 atomicExchange(f16vec2 *address, f16vec2 data); |
| f16vec4 atomicExchange(f16vec4 *address, f16vec4 data); |
| |
| Dependencies on NV_gpu_program5, NV_shader_buffer_store, and |
| NV_gpu_program5_mem_extended |
| |
| If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector" |
| is specified in an assembly program, "F16X2" and "F16X4" should be allowed |
| as storage modifiers to the ATOM instruction for the atomic operations |
| "ADD", "MIN", "MAX", and "EXCH". These operate on each of the two or four |
| fp16 values independently. Atomicity is only guaranteed on a per-component |
| basis. |
| |
| (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension, |
| as extended by NV_gpu_program5:) |
| |
| + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector) |
| |
| If a program specifies the "NV_shader_atomic_fp16_vector" option, it may |
| use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to |
| perform atomic floating-point add, minimum, maximum, or exchange |
| operations. |
| |
| (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:) |
| |
| atomic     storage |
| modifier   modifiers            operation |
| --------   ------------------   -------------------------------------- |
| ADD        U32, S32, U64,       compute a sum |
|            F16X2, F16X4 |
| MIN        U32, S32,            compute minimum |
|            F16X2, F16X4 |
| MAX        U32, S32,            compute maximum |
|            F16X2, F16X4 |
| EXCH       U32, S32, F32,       exchange memory with operand |
|            F16X2, F16X4 |
| ... |
| |
| Dependencies on EXT_shader_image_load_store and NV_gpu_program5 |
| |
| If EXT_shader_image_load_store and NV_gpu_program5 are supported and |
| "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program, |
| "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM |
| instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH". |
| These operate on each of the two or four fp16 values independently. |
| Atomicity is only guaranteed on a per-component basis. |
| |
| (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on |
| NV_gpu_program5" portion of the EXT_shader_image_load_store specification) |
| |
| atomic     storage |
| modifier   modifiers            operation |
| --------   ------------------   -------------------------------------- |
| ADD        U32, S32,            compute a sum |
|            F16X2, F16X4 |
| MIN        U32, S32,            compute minimum |
|            F16X2, F16X4 |
| MAX        U32, S32,            compute maximum |
|            F16X2, F16X4 |
| EXCH       U32, S32, F32,       exchange memory with operand |
|            F16X2, F16X4 |
| ... |
| |
| Dependencies on NV_shader_storage_buffer_object |
| |
| If NV_shader_storage_buffer_object is supported and "OPTION |
| NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2" |
| and "F16X4" should be allowed as storage modifiers to the ATOMB instruction |
| for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on |
| each of the two or four fp16 values independently. Atomicity is only |
| guaranteed on a per-component basis. |
| |
| (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on |
| NV_gpu_program5" portion of the NV_shader_storage_buffer_object |
| specification) |
| |
| atomic     storage |
| modifier   modifiers            operation |
| --------   ------------------   -------------------------------------- |
| ADD        U32, S32, U64,       compute a sum |
|            F32, F16X2, F16X4 |
| MIN        U32, S32,            compute minimum |
|            F16X2, F16X4 |
| MAX        U32, S32,            compute maximum |
|            F16X2, F16X4 |
| EXCH       U32, S32, F32,       exchange memory with operand |
|            F16X2, F16X4 |
| ... |
| |
| Dependencies on NV_compute_program5 |
| |
| If NV_compute_program5 is supported and "OPTION |
| NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2" |
| and "F16X4" should be allowed as storage modifiers to the ATOMS instruction |
| for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on |
| each of the two or four fp16 values independently. Atomicity is only |
| guaranteed on a per-component basis. |
| |
| (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on |
| NV_gpu_program5" portion of the NV_compute_program5 specification) |
| |
| atomic     storage |
| modifier   modifiers            operation |
| --------   ------------------   -------------------------------------- |
| ADD        U32, S32, U64,       compute a sum |
|            F32, F16X2, F16X4 |
| MIN        U32, S32,            compute minimum |
|            F16X2, F16X4 |
| MAX        U32, S32,            compute maximum |
|            F16X2, F16X4 |
| EXCH       U32, S32, F32,       exchange memory with operand |
|            F16X2, F16X4 |
| ... |
| |
| |
| Errors |
| |
| None. |
| |
| New State |
| |
| None. |
| |
| New Implementation Dependent State |
| |
| None. |
| |
| Issues |
| |
| (1) Should we allow "partial" atomics to a f16vec2 or f16vec4, only |
| modifying some of the components? |
| |
| RESOLVED: No. An application that needs this behavior can supply |
| identity values in the components it wants left unmodified, so that the |
| atomic has no effect on them (e.g., add zero, min with +infinity, or max |
| with -infinity). This works for atomicAdd, atomicMin, and atomicMax, but |
| not for atomicExchange. |
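
The workaround described in this resolution could be sketched as follows. The block "Stats", its members, and the uniforms "delta" and "candidate" are hypothetical, and the sketch assumes the implementation accepts 16-bit vector members in std430 blocks. The infinities are built from their IEEE 754 single-precision bit patterns and are assumed to convert to the corresponding fp16 infinities.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                 // f16vec2 type
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical buffer layout, for illustration only.
layout(std430, binding = 0) buffer Stats {
    f16vec2 sumXY;
    f16vec2 minXY;
    f16vec2 maxXY;
};

uniform float delta;       // hypothetical input
uniform float candidate;   // hypothetical input

void main()
{
    float posInf = uintBitsToFloat(0x7F800000u);
    float negInf = uintBitsToFloat(0xFF800000u);

    // Update only .x of each vector; .y receives a value that leaves the
    // stored component unchanged under the given operation.
    atomicAdd(sumXY, f16vec2(delta,     0.0));
    atomicMin(minXY, f16vec2(candidate, posInf));
    atomicMax(maxXY, f16vec2(candidate, negInf));
}
```

No such value exists for atomicExchange, since an exchange always overwrites the stored component.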
| |
| (2) Are these vector atomics guaranteed to update all components of the |
| vector atomically? |
| |
| RESOLVED: No. The spec only guarantees that individual components of a |
| vector be updated atomically. The initial implementation of this |
| extension will only atomically update pairs of components. For many of |
| the algorithms supported by this extension (computing component-wise sums, |
| minimums, or maximums of multi-component vectors), it is not necessary to |
| update all components in a vector as a single unit. |
| |
| (3) What support should we provide for four-component vectors? |
| |
| RESOLVED: All of image, global, buffer, and shared memory atomic |
| operations will fully support two- and four-component variants. While one |
| might emulate some four-component atomic operations using pairs of |
| two-component operations, we choose to support four-component operations |
| universally. Supporting atomics on four-component vectors seems useful, |
| as it supports computing sums, minimums, or maximums on RGBA color values |
| and other data with more than two components. |
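
For comparison, the emulation mentioned in this resolution might look like the sketch below, which is not part of the specification. The buffer layout, the uniform "color", and the helper "emulatedAdd4" are hypothetical: RGBA value i is assumed to occupy two consecutive f16vec2 elements.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                 // f16vec2 / f16vec4 types
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical layout: RGBA value i occupies halves[2*i] (RG) and
// halves[2*i + 1] (BA).
layout(std430, binding = 0) buffer Accum {
    f16vec2 halves[];
};

uniform vec4 color;   // hypothetical input

// Emulates a four-component atomic add with a pair of two-component adds.
// As with the native four-component form, atomicity is only guaranteed
// per component, so the two halves may be observed at different times.
f16vec4 emulatedAdd4(uint index, f16vec4 data)
{
    f16vec2 rg = atomicAdd(halves[2u * index],      data.xy);
    f16vec2 ba = atomicAdd(halves[2u * index + 1u], data.zw);
    return f16vec4(rg, ba);
}

void main()
{
    emulatedAdd4(gl_WorkGroupID.x, f16vec4(color));
}
```

The native f16vec4 overloads avoid the need to declare the storage as pairs of f16vec2 values and keep the shader code uniform across image, global, buffer, and shared memory.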
| |
| Revision History |
| |
| Revision 2 |
| - Add OpenGL ES interactions |
| Revision 1 |
| - Internal revisions. |