| Name |
| |
| NV_shader_buffer_store |
| |
| Name Strings |
| |
| none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5) |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| |
| Status |
| |
| Shipping. |
| |
| Version |
| |
| Last Modified Date: August 13, 2012 |
| NVIDIA Revision: 5 |
| |
| Number |
| |
| 390 |
| |
| Dependencies |
| |
| OpenGL 3.0 and GLSL 1.30 are required. |
| |
| This extension is written against the OpenGL 3.2 (Compatibility Profile) |
| specification, dated July 24, 2009. |
| |
| This extension is written against version 1.50.09 of the OpenGL Shading |
| Language Specification. |
| |
| OpenGL 3.0 and GLSL 1.30 are required. |
| |
| NV_shader_buffer_load is required. |
| |
| NV_gpu_program5 and/or NV_gpu_shader5 is required. |
| |
| This extension interacts with EXT_shader_image_load_store. |
| |
| This extension interacts with NV_gpu_shader5. |
| |
| This extension interacts with NV_gpu_program5. |
| |
| This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object, |
| and ARB_compute_shader. |
| |
| Overview |
| |
| This extension builds upon the mechanisms added by the |
| NV_shader_buffer_load extension to allow shaders to perform random-access |
| reads to buffer object memory without using dedicated buffer object |
| binding points. Instead, it allowed an application to make a buffer |
| object resident, query a GPU address (pointer) for the buffer object, and |
| then use that address as a pointer in shader code. This approach allows |
| shaders to access a large number of buffer objects without needing to |
| repeatedly bind buffers to a limited number of fixed-functionality binding |
| points. |
| |
| This extension lifts the restriction from NV_shader_buffer_load that |
| disallows writes. In particular, the MakeBufferResidentNV function now |
| allows READ_WRITE and WRITE_ONLY access modes, and the shading language is |
| extended to allow shaders to write through (GPU address) pointers. |
| Additionally, the extension provides built-in functions to perform atomic |
| memory transactions to buffer object memory. |
| |
| As with the shader writes provided by the EXT_shader_image_load_store |
| extension, writes to buffer object memory using this extension are weakly |
| ordered to allow for parallel or distributed shader execution. The |
| EXT_shader_image_load_store extension provides mechanisms allowing for |
| finer control of memory transaction order, and those mechanisms apply |
| equally to buffer object stores using this extension. |
| |
| |
| New Procedures and Functions |
| |
| None. |
| |
| New Tokens |
| |
| Accepted by the <barriers> parameter of MemoryBarrierNV: |
| |
| SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV 0x00000010 |
| |
| Accepted by the <access> parameter of MakeBufferResidentNV: |
| |
| READ_WRITE |
| WRITE_ONLY |
| |
| |
| Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (OpenGL Operation) |
| |
| Modify Section 2.9, Buffer Objects, p. 46 |
| |
| (extend the language inserted by NV_shader_buffer_load in its "Append to |
| Section 2.9 (p. 45) to allow READ_WRITE and WRITE_ONLY mappings) |
| |
| The data store of a buffer object may be made accessible to the GL |
| via shader buffer loads and stores by calling: |
| |
| void MakeBufferResidentNV(enum target, enum access); |
| |
| <access> may be READ_ONLY, READ_WRITE, and WRITE_ONLY. If a shader loads |
| from a buffer with WRITE_ONLY <access> or stores to a buffer with |
| READ_ONLY <access>, the results of that shader operation are undefined and |
| may lead to application termination. <target> may be any of the buffer |
| targets accepted by BindBuffer. |
| |
| The data store of a buffer object may be made inaccessible to the GL |
| via shader buffer loads and stores by calling: |
| |
| void MakeBufferNonResidentNV(enum target); |
| |
| |
| Modify "Section 2.20.X, Shader Memory Access" introduced by the |
| NV_shader_buffer_load specification, to reflect that shaders may store to |
| buffer object memory. |
| |
| (first paragraph) Shaders may load from or store to buffer object memory |
| by dereferencing pointer variables. ... |
| |
| (second paragraph) When a shader dereferences a pointer variable, data are |
| read from or written to buffer object memory according to the following |
| rules: |
| |
| (modify the paragraph after the end of the alignment and stride rules, |
| allowing for writes, and also providing rules forbidding reads to |
| WRITE_ONLY mappings or vice-versa) If a shader reads or writes to a GPU |
| memory address that does not correspond to a buffer object made resident |
| by MakeBufferResidentNV, the results of the operation are undefined and |
| may result in application termination. If a shader reads from a buffer |
| object made resident with an <access> parameter of WRITE_ONLY, or writes |
| to a buffer object made resident with an <access> parameter of READ_ONLY, |
| the results of the operation are also undefined and may lead to |
| application termination. |
| |
| Incorporate the contents of "Section 2.14.X, Shader Memory Access" from |
| the EXT_shader_image_load_store specification into the same "Shader memory |
| Access", with the following edits. |
| |
| (modify first paragraph to reference pointers) Shaders may perform |
| random-access reads and writes to texture or buffer object memory using |
| pointers or with built-in image load, store, and atomic functions, as |
| described in the OpenGL Shading Language Specification. ... |
| |
| (add to list of bits in <barriers> in MemoryBarrierNV) |
| |
| - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV: Memory accesses using pointers and |
| assembly program global loads, stores, and atomics issued after the |
| barrier will reflect data written by shaders prior to the barrier. |
| Additionally, memory writes using pointers issued after the barrier |
| will not execute until memory accesses (loads, stores, texture |
| fetches, vertex fetches, etc) initiated prior to the barrier complete. |
| |
| (modify second paragraph after the list of <barriers> bits) To allow for |
| independent shader threads to communicate by reads and writes to a common |
| memory address, pointers and image variables in the OpenGL shading |
| language may be declared as "coherent". Buffer object or texture memory |
| accessed through such variables may be cached only if... |
| |
| (add to the coherency guidelines) |
| |
| - Data written using pointers in one rendering pass and read by the shader |
| in a later pass need not use coherent variables or memoryBarrier(). |
| Calling MemoryBarrierNV() with the SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV |
| set in <barriers> between passes is necessary. |
| |
| |
| Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Rasterization) |
| |
| None. |
| |
| |
| Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Per-Fragment Operations and the Frame Buffer) |
| |
| None. |
| |
| |
| Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (Special Functions) |
| |
| None. |
| |
| |
| Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification |
| (State and State Requests) |
| |
| None. |
| |
| |
| Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) |
| Specification (Invariance) |
| |
| None. |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| None. |
| |
| GLX Protocol |
| |
| None. |
| |
| |
| Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision |
| 09) |
| |
| Modify Section 4.3.X, Memory Access Qualifiers, as added by |
| EXT_shader_image_load_store |
| |
| (modify second paragraph) Memory accesses to image and pointer variables |
| declared using the "coherent" storage qualifier are performed coherently |
| with similar accesses from other shader threads. ... |
| |
| (modify fourth paragraph) Memory accesses to image and pointer variables |
| declared using the "volatile" storage qualifier must treat the underlying |
| memory as though it could be read or written at any point during shader |
| execution by some source other than the executing thread. ... |
| |
| (modify fifth paragraph) Memory accesses to image and pointer variables |
| declared using the "restrict" storage qualifier may be compiled assuming |
| that the variable used to perform the memory access is the only way to |
| access the underlying memory using the shader stage in question. ... |
| |
| (modify sixth paragraph) Memory accesses to image and pointer variables |
| declared using the "const" storage qualifier may only read the underlying |
| memory, which is treated as read-only. ... |
| |
| (insert after seventh paragraph) |
| |
| In pointer variable declarations, the "coherent", "volatile", "restrict", |
| and "const" qualifiers can be positioned anywhere in the declaration, and |
| may apply qualify either a pointer or the underlying data being pointed |
| to, depending on its position in the declaration. Each qualifier to the |
| right of the basic data type in a declaration is considered to apply to |
| whatever type is found immediately to its left; qualifiers to the left of |
| the basic type are considered to apply to that basic type. To interpret |
| the meaning of qualifiers in pointer declarations, it is useful to read |
| the declaration from right to left as in the following examples. |
| |
| int * * const a; // a is a constant pointer to a pointer to int |
| int * volatile * b; // b is a pointer to a volatile pointer to int |
| int const * * c; // c is a pointer to a pointer to a constant int |
| const int * * d; // d is like c |
| int const * const * // e is a constant pointer to a constant pointer |
| const e; // to a constant int |
| |
| For pointer types, the "restrict" qualifier can be used to qualify |
| pointers, but not non-pointer types being pointed to. |
| |
| int * restrict a; // a is a restricted pointer to int |
| int restrict * b; // b qualifies "int" as restricted - illegal |
| |
| (modify eighth paragraph) The "coherent", "volatile", and "restrict" |
| storage qualifiers may only be used on image and pointer variables, and |
| may not be used on variables of any other type. ... |
| |
| (modify last paragraph) The values of image and pointer variables |
| qualified with "coherent," "volatile," "restrict", or "const" may not be |
| assigned to function parameters or l-values lacking such qualifiers. |
| |
| (add examples for the last paragraph) |
| |
| int volatile * var1; |
| int * var2; |
| int * restrict var3; |
| var1 = var2; // OK, adding "volatile" is allowed |
| var2 = var3; // illegal, stripping "restrict" is not |
| |
| |
| Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load |
| |
| (modify second paragraph, allowing storing through pointers) The pointer |
| dereference operator ... The result of a pointer dereference may be used |
| as the left-hand side of an assignment. |
| |
| |
| Modify Section 8.Y, Shader Memory Functions, as added by |
| EXT_shader_image_load_store |
| |
| (modify first paragraph) Shaders of all types may read and write the |
| contents of textures and buffer objects using pointers and image |
| variables. ... |
| |
| (modify description of memoryBarrier) memoryBarrier() can be used to |
| control the ordering of memory transactions issued by a shader thread. |
| When called, it will wait on the completion of all memory accesses |
| resulting from the use of pointers and image variables prior to calling |
| the function. ... |
| |
| (add the following paragraphs to the end of the section) |
| |
| If multiple threads need to atomically access shared memory addresses |
| using pointers, they may do so using the following built-in functions. |
| The following atomic memory access functions allow a shader thread to |
| read, modify, and write an address in memory in a manner that guarantees |
| that no other shader thread can modify the memory between the read and the |
| write. All of these functions read a single data element from memory, |
| compute a new value based on the value read from memory and one or more |
| other values passed to the function, and writes the result back to the |
| same memory address. The value returned to the caller is always the data |
| element originally read from memory. |
| |
| Syntax: |
| |
| uint atomicAdd(uint *address, uint data); |
| int atomicAdd(int *address, int data); |
| uint64_t atomicAdd(uint64_t *address, uint64_t data); |
| |
| uint atomicMin(uint *address, uint data); |
| int atomicMin(int *address, int data); |
| |
| uint atomicMax(uint *address, uint data); |
| int atomicMax(int *address, int data); |
| |
| uint atomicIncWrap(uint *address, uint wrap); |
| |
| uint atomicDecWrap(uint *address, uint wrap); |
| |
| uint atomicAnd(uint *address, uint data); |
| int atomicAnd(int *address, int data); |
| |
| uint atomicOr(uint *address, uint data); |
| int atomicOr(int *address, int data); |
| |
| uint atomicXor(uint *address, uint data); |
| int atomicXor(int *address, int data); |
| |
| uint atomicExchange(uint *address, uint data); |
| int atomicExchange(int *address, uint data); |
| uint64_t atomicExchange(uint64_t *address, uint64_t data); |
| |
| uint atomicCompSwap(uint *address, uint compare, uint data); |
| int atomicCompSwap(int *address, int compare, int data); |
| uint64_t atomicCompSwap(uint64_t *address, uint64_t compare, |
| uint64_t data); |
| |
| Description: |
| |
| atomicAdd() computes the new value written to <address> by adding the |
| value of <data> to the contents of <address>. This function supports 32- |
| and 64-bit unsigned integer operands, and 32-bit signed integer operands. |
| |
| atomicMin() computes the new value written to <address> by taking the |
| minimum of the value of <data> and the contents of <address>. This |
| function supports 32-bit signed and unsigned integer operands. |
| |
| atomicMax() computes the new value written to <address> by taking the |
| maximum of the value of <data> and the contents of <address>. This |
| function supports 32-bit signed and unsigned integer operands. |
| |
| atomicIncWrap() computes the new value written to <address> by adding one |
| to the contents of <address>, and then forcing the result to zero if and |
| only if the incremented value is greater than or equal to <wrap>. This |
| function supports only 32-bit unsigned integer operands. |
| |
| atomicDecWrap() computes the new value written to <address> by subtracting |
| one from the contents of <address>, and then forcing the result to |
| <wrap>-1 if the original value read from <address> was either zero or |
| greater than <wrap>. This function supports only 32-bit unsigned integer |
| operands. |
| |
| atomicAnd() computes the new value written to <address> by performing a |
| bitwise and of the value of <data> and the contents of <address>. This |
| function supports 32-bit signed and unsigned integer operands. |
| |
| atomicOr() computes the new value written to <address> by performing a |
| bitwise or of the value of <data> and the contents of <address>. This |
| function supports 32-bit signed and unsigned integer operands. |
| |
| atomicXor() computes the new value written to <address> by performing a |
| bitwise exclusive or of the value of <data> and the contents of <address>. |
| This function supports 32-bit signed and unsigned integer operands. |
| |
| atomicExchange() uses the value of <data> as the value written to |
| <address>. This function supports 32- and 64-bit unsigned integer |
| operands and 32-bit signed integer operands. |
| |
| atomicCompSwap() compares the value of <compare> and the contents of |
| <address>. If the values are equal, <data> is written to <address>; |
| otherwise, the original contents of <address> are preserved. This |
| function supports 32- and 64-bit unsigned integer operands and 32-bit |
| signed integer operands. |
| |
| |
| Modify Section 9, Shading Language Grammar, p. 105 |
| |
| !!! TBD: Add grammar constructs for memory access qualifiers, allowing |
| memory access qualifiers before or after the type and the "*" |
| characters indicating pointers in a variable declaration. |
| |
| |
| Dependencies on EXT_shader_image_load_store |
| |
| This specification incorporates the memory access ordering and |
| synchronization discussion from EXT_shader_image_load_store verbatim. |
| |
| If EXT_shader_image_load_store is not supported, this spec should be |
| construed to introduce: |
| |
| * the shader memory access language from that specification, including |
| the MemoryBarrierNV() command and the tokens accepted by <barriers> |
| from that specification; |
| |
| * the memoryBarrier() function to the OpenGL shading language |
| specification; and |
| |
| * the capability and spec language allowing applications to enable early |
| depth tests. |
| |
| Dependencies on NV_gpu_shader5 |
| |
| This specification requires either NV_gpu_shader5 or NV_gpu_program5. |
| |
| If NV_gpu_shader5 is supported, use of the new shading language features |
| described in this extension requires |
| |
| #extension GL_NV_gpu_shader5 : enable |
| |
| If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading |
| Language Specification should be removed. |
| |
| Dependencies on NV_gpu_program5 |
| |
| If NV_gpu_program5 is supported, the extension provides support for stores |
| and atomic memory transactions to buffer object memory. Stores are |
| provided by the STORE opcode; atomics are provided by the ATOM opcode. No |
| "OPTION" line is required for these features, which are implied by |
| NV_gpu_program5 program headers such as "!!NVfp5.0". The operation of |
| these opcodes is described in the NV_gpu_program5 extension specification. |
| |
| Note also that NV_gpu_program5 also supports the LOAD opcode originally |
| added by the NV_shader_buffer_load and the MEMBAR opcode originally |
| provided by EXT_shader_image_load_store. |
| |
| |
| Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and |
| ARB_compute_shader |
| |
| If GLSL 4.30 is supported, add the following atomic memory functions to |
| section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification: |
| |
| uint atomicIncWrap(inout uint mem, uint wrap); |
| uint atomicDecWrap(inout uint mem, uint wrap); |
| |
| with the following documentation |
| |
| atomicIncWrap() computes the new value written to <mem> by adding one to |
| the contents of <mem>, and then forcing the result to zero if and only |
| if the incremented value is greater than or equal to <wrap>. This |
| function supports only 32-bit unsigned integer operands. |
| |
| atomicDecWrap() computes the new value written to <mem> by subtracting |
| one from the contents of <mem>, and then forcing the result to <wrap>-1 |
| if the original value read from <mem> was either zero or greater than |
| <wrap>. This function supports only 32-bit unsigned integer operands. |
| |
| Additionally, add the following functions to the section: |
| |
| uint64_t atomicAdd(inout uint64_t mem, uint data); |
| uint64_t atomicExchange(inout uint64_t mem, uint data); |
| uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare, |
| uint64_t data); |
| |
| If ARB_shader_storage_buffer_object or ARB_compute_shader are supported, |
| make similar edits to the functions documented in the |
| ARB_shader_storage_buffer object extension. |
| |
| These functions are available if and only if GL_NV_gpu_shader5 is enabled |
| via the "#extension" directive. |
| |
| |
| Errors |
| |
| None |
| |
| New State |
| |
| None. |
| |
| Issues |
| |
| (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply? |
| |
| RESOLVED: The primary reason for this limitation to exist was the lack |
| of 64-bit integer support in shaders (see issue 15 of |
| NV_shader_buffer_load). Given that this extension is being released at |
| the same time as NV_gpu_shader5 which adds 64-bit integer support, it |
| is expected that this maximum address will match the maximum address |
| supported by the GPU's address space, or will be equal to "~0ULL" |
| indicating that any GPU address returned by the GL will be usable in a |
| shader. |
| |
| (2) What qualifiers should be supported on pointer variables, and how can |
| they be used in declarations? |
| |
| RESOLVED: We will support the qualifiers "coherent", "volatile", |
| "restrict", and "const" to be used in pointer declarations. "coherent" |
| is taken from EXT_shader_image_load_store and is used to ensure that |
| memory accesses from different shader threads are cached coherently |
| (i.e., will be able to see each other when complete). "volatile" and |
| "const" behave is as in C. |
| |
| "restrict" behaves as in the C99 standard, and can be used to indicate |
| that no other pointer points to the same underlying data. This permits |
| optimizations that would otherwise be impossible if the compiler has to |
| assume that a pair of pointers might end up pointing to the same data. |
| For example, in standard C/C++, a loop like: |
| |
| int *a, *b; |
| a[0] = b[0] + b[0]; |
| a[1] = b[0] + b[1]; |
| a[2] = b[0] + b[2]; |
| |
| would need to reload b[0] for each assignment because a[0] or a[1] |
| might point at the same data as b[0]. With restrict, the compiler can |
| assume that b[0] is not modified by any of the instructions and load it |
| just once. |
| |
| (3) What amount of automatic synchronization is provided for buffer object |
| writes through pointers? |
| |
| RESOLVED: Use of MemoryBarrierEXT() is required, and there is no |
| automatic synchronization when buffers are bound or unbound. With |
| resident buffers, there are no well-defined binding points in the first |
| place -- all resident buffers are effectively "bound". |
| |
| Implicit synchronization is difficult, as it might require some |
| combination of: |
| |
| - tracking which buffers might be written (randomly) in the shader |
| itself; |
| |
| - assuming that if a shader that performs writes is executed, all |
| bytes of all resident buffers could be modified and thus must be |
| treated as dirty; |
| |
| - idling at the end of each primitive or draw call, so that the |
| results of all previous commands are complete. |
| |
| Since normal OpenGL operation is pipelined, idling would result in a |
| significant performance impact since pipelining would otherwise allow |
| fragment shader execution for draw call N while simultaneously |
| performing vertex shader execution for draw call N+1. |
| |
| |
| Revision History |
| |
| Rev. Date Author Changes |
| ---- -------- -------- ----------------------------------------- |
| 5 08/13/12 pbrown Add interaction with OpenGL 4.3 (and related ARB |
| extensions) supporting atomic{Inc,Dec}Wrap and |
| 64-bit unsigned integer atomics to shared and |
| shader storage buffer memory. |
| |
| 4 04/13/10 pbrown Remove the floating-point version of atomicAdd(). |
| |
| 3 03/23/10 pbrown Minor cleanups to the dependency sections. |
| Fixed obsolete extension names. Add an issue |
| on synchronization. |
| |
| 2 03/16/10 pbrown Updated memory access qualifiers section |
| (volatile, coherent, restrict, const) for |
| pointers. Added language to document how |
| these qualifiers work in possibly complicated |
| expression. |
| |
| 1 pbrown Internal revisions. |