
 Name

     NV_shader_atomic_fp16_vector

 Name Strings

     GL_NV_shader_atomic_fp16_vector

 Contact

     Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

 Contributors

     Pat Brown, NVIDIA
     Mathias Heyer, NVIDIA

 Status

     Shipping

 Version

     Last Modified Date:         February 4, 2015
     NVIDIA Revision:            3

 Number

     OpenGL Extension #474
     OpenGL ES Extension #261

 Dependencies

     This extension is written against the OpenGL 4.3 (Compatibility Profile)
     Specification.

     This extension is written against version 4.30 of the OpenGL Shading
     Language Specification.

     This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

     This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and
     NV_gpu_program5_mem_extended.

     This extension requires NV_gpu_shader5.

     This extension interacts with NV_shader_storage_buffer_object.

     This extension interacts with NV_compute_program5.

     This extension interacts with EXT_shader_image_load_store.

     This extension interacts with NV_image_formats.

     This extension interacts with OES_shader_image_atomic.

 Overview

     This extension provides GLSL built-in functions and assembly opcodes
     allowing shaders to perform a limited set of atomic read-modify-write
     operations to buffer or texture memory with 16-bit floating-point vector
     surface formats.

 New Procedures and Functions

     None.

 New Tokens

     None.

 Additions to the AGL/GLX/WGL Specifications

     None.

 GLX Protocol

     None.

 Modifications to the OpenGL Shading Language Specification, Version 4.30

     Including the following line in a shader can be used to control the
     language features described in this extension:

       #extension GL_NV_shader_atomic_fp16_vector : <behavior>

     where <behavior> is as specified in section 3.3.

     New preprocessor #defines are added to the OpenGL Shading Language:

       #define GL_NV_shader_atomic_fp16_vector         1

     Modify Section 8.11, Atomic Memory Functions (p. 163)

     Add before the table of functions:

     Some atomic memory operations are supported on two- and four-component
     vectors with 16-bit floating-point components.

     Add new functions to the table:

         // Computes a new value per-component using the specified operation.
         // Atomicity is only guaranteed on a per-component basis.
         f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
         f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
         f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
         f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
         f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
         f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
         f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
         f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);
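
     As a rough illustration of the per-component semantics, the following
     Python sketch models these built-ins sequentially on the CPU. It is not
     part of this specification. It assumes, as for the existing GLSL atomic
     functions, that each call returns the value <mem> held before the update.
     It rounds results to fp16 using Python's half-precision struct format.
     It does not model NaN operands, overflow, or concurrency. The helper
     names (to_f16, atomic_vec) are hypothetical.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest fp16 value (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Component-wise operations standing in for atomicAdd, atomicMin,
# atomicMax and atomicExchange; NaN handling is not modelled.
OPS = {
    "add": lambda m, d: to_f16(m + d),
    "min": lambda m, d: min(m, d),
    "max": lambda m, d: max(m, d),
    "exchange": lambda m, d: d,
}

def atomic_vec(mem: list, data: list, op: str) -> list:
    """Model of 'f16vecN atomicOp(inout f16vecN mem, f16vecN data)'.

    Updates mem in place and returns its previous contents. Components are
    processed independently, mirroring the per-component atomicity
    guarantee; this single-threaded model does not exercise concurrency.
    """
    if len(mem) not in (2, 4) or len(data) != len(mem):
        raise ValueError("expected two- or four-component vectors")
    old = list(mem)
    for i, d in enumerate(data):
        mem[i] = OPS[op](mem[i], to_f16(d))
    return old
```

     In this model, adding (1.0, 0.0) to a stored (2048.0, 0.0) leaves the
     first component at 2048.0. The exact sum 2049 lies halfway between the
     neighbouring fp16 values 2048 and 2050, so it rounds to even.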


     Modify Section 8.12, Image Functions (p. 164)

     Add before the table of functions:

     Some atomic memory operations are supported on two- and four-component
     vectors with 16-bit floating-point components, for images with format
     qualifiers of <rg16f> and <rgba16f>.

     Add new functions to the table:

         // Computes a new value per-component using the specified operation.
         // Atomicity is only guaranteed on a per-component basis.
         f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
         f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
         f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
         f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
         f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
         f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
         f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
         f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);
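
     The image variants apply the same component-wise update to a single
     texel. The Python sketch below models an <rgba16f> image as a linear
     array of texels, each holding four little-endian fp16 values in r, g, b,
     a order. That layout, the class name Rgba16fImage, and the choice to
     reject out-of-range coordinates are illustrative assumptions; real GPU
     image layouts are implementation-specific.

```python
import struct

class Rgba16fImage:
    """Hypothetical CPU model of a 2D image with the rgba16f format qualifier.

    Assumes a linear layout of four little-endian fp16 values (r, g, b, a)
    per texel; real GPU memory layouts are implementation-specific.
    """
    TEXEL = struct.Struct("<4e")  # 8 bytes per texel

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.data = bytearray(self.TEXEL.size * width * height)  # all 0.0

    def _offset(self, x: int, y: int) -> int:
        # This model simply rejects out-of-range coordinates.
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("texel coordinate out of range")
        return self.TEXEL.size * (y * self.width + x)

    def load(self, x: int, y: int) -> list:
        return list(self.TEXEL.unpack_from(self.data, self._offset(x, y)))

    def atomic_add(self, x: int, y: int, value: list) -> list:
        """Model of 'f16vec4 imageAtomicAdd(image, ivec2(x, y), value)'.

        Adds component-wise, rounds each sum to fp16 when packing, and
        returns the texel's previous contents.
        """
        old = self.load(x, y)
        new = [o + v for o, v in zip(old, value)]
        self.TEXEL.pack_into(self.data, self._offset(x, y), *new)
        return old
```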

 Dependencies on OES_shader_image_atomic

     If implemented in OpenGL ES and OES_shader_image_atomic is not
     supported, do not introduce additional imageAtomic* functions.

 Dependencies on NV_image_formats

     If implemented in OpenGL ES and NV_image_formats is not
     supported, remove references to two-component images of format
     <rg16f>.

 Dependencies on NV_shader_buffer_store and NV_gpu_shader5

     If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following
     functions should be added to the "Section 8.Y, Shader Memory Functions"
     language in the NV_shader_buffer_store specification:

       // Computes a new value per-component using the specified operation.
       // Atomicity is only guaranteed on a per-component basis.
       f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
       f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
       f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
       f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
       f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
       f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
       f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
       f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);

 Dependencies on NV_gpu_program5, NV_shader_buffer_store, and
 NV_gpu_program5_mem_extended

     If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
     is specified in an assembly program, "F16X2" and "F16X4" should be allowed
     as storage modifiers to the ATOM instruction for the atomic operations
     "ADD", "MIN", "MAX" and "EXCH". These operate on each of the two or four
     fp16 values independently. Atomicity is only guaranteed on a per-component
     basis.
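
     The "F16X2" storage modifier treats each 32-bit word as two packed fp16
     values that are operated on independently. As a result, no carry crosses
     from the low half into the high half, as it would for a "U32" add on the
     same bits. The Python sketch below models the "ADD" operation with the
     "F16X2" modifier on an array of 32-bit words. It assumes the first
     component occupies the low 16 bits, matching the ordering used by GLSL
     packHalf2x16. The helper names are hypothetical.

```python
import struct

def unpack_f16x2(word: int) -> tuple:
    """Split a 32-bit word into two fp16 values, low 16 bits first (assumed)."""
    return struct.unpack("<2e", word.to_bytes(4, "little"))

def pack_f16x2(a: float, b: float) -> int:
    """Pack two values (rounded to fp16) into one 32-bit word, a in the low half."""
    return int.from_bytes(struct.pack("<2e", a, b), "little")

def atom_add_f16x2(mem: list, index: int, operand: int) -> int:
    """Model of an atomic ADD with the F16X2 storage modifier.

    Adds the two fp16 halves of mem[index] and operand independently,
    stores the packed result, and returns the previous 32-bit word.
    """
    old = mem[index]
    x0, y0 = unpack_f16x2(old)
    x1, y1 = unpack_f16x2(operand)
    mem[index] = pack_f16x2(x0 + x1, y0 + y1)
    return old
```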

     (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
     as extended by NV_gpu_program5:)

       + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

       If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
       use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to
       perform atomic floating-point add, minimum, maximum, or exchange
       operations.

     (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

       atomic     storage
       modifier   modifiers            operation
       --------   ------------------   --------------------------------------
        ADD       U32, S32, U64,       compute a sum
                  F16X2, F16X4
        MIN       U32, S32,            compute minimum
                  F16X2, F16X4
        MAX       U32, S32,            compute maximum
                  F16X2, F16X4
        EXCH      U32, S32, F32,       exchange memory with operand
                  F16X2, F16X4
        ...

 Dependencies on EXT_shader_image_load_store and NV_gpu_program5

     If EXT_shader_image_load_store and NV_gpu_program5 are supported and
     "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program,
     "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM
     instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
     These operate on each of the two or four fp16 values independently.
     Atomicity is only guaranteed on a per-component basis.

     (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
     NV_gpu_program5" portion of the EXT_shader_image_load_store
     specification:)

       atomic     storage
       modifier   modifiers       operation
       --------   -------------   --------------------------------------
        ADD       U32, S32,       compute a sum
                  F16X2, F16X4
        MIN       U32, S32,       compute minimum
                  F16X2, F16X4
        MAX       U32, S32,       compute maximum
                  F16X2, F16X4
        EXCH      U32, S32, F32,  exchange memory with operand
                  F16X2, F16X4
        ...

 Dependencies on NV_shader_storage_buffer_object

     If NV_shader_storage_buffer_object is supported and "OPTION
     NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
     and "F16X4" should be allowed as storage modifiers to the ATOMB instruction
     for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
     each of the two or four fp16 values independently. Atomicity is only
     guaranteed on a per-component basis.

     (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
     NV_gpu_program5" portion of the NV_shader_storage_buffer_object
     specification:)

       atomic     storage
       modifier   modifiers          operation
       --------   -------------      --------------------------------------
        ADD       U32, S32, U64,     compute a sum
                  F32, F16X2, F16X4
        MIN       U32, S32,          compute minimum
                  F16X2, F16X4
        MAX       U32, S32,          compute maximum
                  F16X2, F16X4
        EXCH      U32, S32, F32,     exchange memory with operand
                  F16X2, F16X4
        ...

 Dependencies on NV_compute_program5

     If NV_compute_program5 is supported and "OPTION
     NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
     and "F16X4" should be allowed as storage modifiers to the ATOMS instruction
     for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
     each of the two or four fp16 values independently. Atomicity is only
     guaranteed on a per-component basis.

     (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
     NV_gpu_program5" portion of the NV_compute_program5 specification:)

       atomic     storage
       modifier   modifiers          operation
       --------   -------------      --------------------------------------
        ADD       U32, S32, U64,     compute a sum
                  F32, F16X2, F16X4
        MIN       U32, S32,          compute minimum
                  F16X2, F16X4
        MAX       U32, S32,          compute maximum
                  F16X2, F16X4
        EXCH      U32, S32, F32,     exchange memory with operand
                  F16X2, F16X4
        ...


 Errors

     None.

 New State

     None.

 New Implementation Dependent State

     None.

 Issues

     (1) Should we allow "partial" atomics to a f16vec2 or f16vec4, only
     modifying some of the components?

     RESOLVED: No. An application that needs this can pass "special" values
     in the components it does not want to change, so that the atomic has no
     effect on them (e.g., add zero, take the minimum with +infinity, or take
     the maximum with -infinity).  This works for atomicAdd, atomicMin, and
     atomicMax, but not for atomicExchange, which has no such neutral value.
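
     The workaround above can be sketched as building an operand in which the
     components to be left alone hold a neutral value. In the Python model
     below, -0.0 is used as the neutral addend rather than +0.0. In IEEE
     arithmetic, (-0.0) + (+0.0) yields +0.0 and would therefore change a
     stored -0.0. Whether a given implementation preserves signed zeros in
     fp16 atomics is not specified here. The names are hypothetical, and NaN
     inputs are not considered.

```python
import math
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Operand that leaves a stored (non-NaN) component unchanged. For add, -0.0
# is used: (-0.0) + (-0.0) stays -0.0, whereas (-0.0) + (+0.0) is +0.0.
NEUTRAL = {"add": -0.0, "min": math.inf, "max": -math.inf}
COMBINE = {"add": lambda m, d: m + d, "min": min, "max": max}

def masked_operand(data: list, mask: list, op: str) -> list:
    """Build an operand that only affects the components where mask is True.

    atomicExchange has no neutral value, so "exchange" is rejected.
    """
    if op not in NEUTRAL:
        raise ValueError(f"no neutral operand exists for {op!r}")
    return [d if keep else NEUTRAL[op] for d, keep in zip(data, mask)]

def apply(mem: list, operand: list, op: str) -> list:
    """Sequential model of the component-wise update an atomic would perform."""
    return [to_f16(COMBINE[op](m, d)) for m, d in zip(mem, operand)]
```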

     (2) Are these vector atomics guaranteed to update all components of the
     vector atomically?

     RESOLVED:  No.  The spec only guarantees that individual components of a
     vector be updated atomically.  The initial implementation of this
     extension will only atomically update pairs of components.  For many of
     the algorithms supported by this extension (computing component-wise sums,
     minimums, or maximums of multi-component vectors), it is not necessary to
     update all components in a vector as a single unit.
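
     The argument that component-wise reductions do not need whole-vector
     atomicity can be illustrated with a deterministic interleaving model.
     Following the note about the initial implementation, the Python sketch
     below assumes that an f16vec4 occupies two 32-bit words and that each
     word (one pair of components) is updated atomically on its own. The
     generator-based scheduling and all names are hypothetical.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def vec4_atomic_add(mem: list, data: list):
    """Generator modelling a four-component atomicAdd as two pair updates.

    mem holds two pairs, (x, y) and (z, w), each standing for one 32-bit
    word. Updating one pair is treated as a single indivisible step; the
    generator yields after each step so other work can be interleaved.
    """
    for word in (0, 1):
        d0, d1 = data[2 * word], data[2 * word + 1]
        a, b = mem[word]
        mem[word] = (to_f16(a + d0), to_f16(b + d1))
        yield

def snapshot(mem: list) -> list:
    """Flatten the two pairs into an (x, y, z, w) list as a reader would see it."""
    return [c for pair in mem for c in pair]
```

     Suppose invocation A updates (x, y), invocation B then completes both of
     its pair updates, and A finally updates (z, w). A reader that looks
     between A's two steps sees a vector in which only x and y have changed.
     Even so, the final component-wise sums are the same as if each vector
     update had been indivisible.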

     (3) What support should we provide for four-component vectors?

     RESOLVED:  All of image, global, buffer, and shared memory atomic
     operations will fully support two- and four-component variants.  While one
     might emulate some four-component atomic operations using pairs of
     two-component operations, we choose to support four-component operations
     universally.  Supporting atomics on four-component vectors seems useful,
     as it supports computing sums, minimums, or maximums on RGBA color values
     and other data with more than two components.

 Revision History

     Revision 2
     - Add OpenGL ES interactions
     Revision 1
     - Internal revisions.