Name
ARB_shader_storage_buffer_object
Name Strings
GL_ARB_shader_storage_buffer_object
Contact
Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
Contributors
Jeff Bolz, NVIDIA
Piers Daniell, NVIDIA
Christophe Riccio, AMD
Graham Sellers, AMD
Bruce Merry
John Kessenich
Notice
Copyright (c) 2012-2014 The Khronos Group Inc. Copyright terms at
http://www.khronos.org/registry/speccopyright.html
Specification Update Policy
Khronos-approved extension specifications are updated in response to
issues and bugs prioritized by the Khronos OpenGL Working Group. For
extensions which have been promoted to a core Specification, fixes will
first appear in the latest version of that core Specification, and will
eventually be backported to the extension document. This policy is
described in more detail at
https://www.khronos.org/registry/OpenGL/docs/update_policy.php
Status
Complete.
Approved by the ARB on 2012/06/12.
Version
Last Modified Date: April 28, 2014
Revision: 16
Number
ARB Extension #137
Dependencies
OpenGL 4.0 (either core or compatibility profile) is required.
OpenGL 4.3 or ARB_program_interface_query is required.
This extension is written against the OpenGL 4.2 (Compatibility Profile)
Specification.
This extension interacts with OpenGL 4.3 and ARB_compute_shader.
This extension interacts with OpenGL 4.3 and ARB_program_interface_query.
This extension interacts with NV_bindless_texture.
Overview
This extension provides the ability for OpenGL shaders to perform random
access reads, writes, and atomic memory operations on variables stored in
a buffer object. Application shader code can declare sets of variables
(referred to as "buffer variables") arranged into interface blocks in a
manner similar to that done with uniform blocks in OpenGL 3.1. In both
cases, the values of the variables declared in a given interface block are
taken from a buffer object bound to a binding point associated with the
block. Buffer objects used in this extension are referred to as "shader
storage buffers".
While the capability provided by this extension is similar to that
provided by OpenGL 3.1 and ARB_uniform_buffer_object, there are several
significant differences. Most importantly, shader code is allowed to
write to shader storage buffers, while uniform buffers are always
read-only. Shader storage buffers have a separate set of binding points,
with different counts and size limits. The maximum usable size for shader
storage buffers is implementation-dependent, but its minimum value is
substantially larger than the minimum for uniform buffers.
The ability to write to buffer objects creates the potential for multiple
independent shader invocations to read and write the same underlying
memory. The same issue exists with the ARB_shader_image_load_store
extension provided in OpenGL 4.2, which can write to texture objects and
buffers. In both cases, the specification makes few guarantees related to
the relative order of memory reads and writes performed by the shader
invocations. For ARB_shader_image_load_store, the OpenGL API and shading
language do provide some control over memory transactions; those
mechanisms also affect reads and writes of shader storage buffers. In the
OpenGL API, the glMemoryBarrier() call can be used to ensure that certain
memory operations related to commands issued prior to the barrier complete
before other operations related to commands issued after the barrier.
Additionally, the shading language provides the memoryBarrier() function
to control the relative order of memory accesses within individual shader
invocations and provides various memory qualifiers controlling how the
memory corresponding to individual variables is accessed.
New Procedures and Functions
void ShaderStorageBlockBinding(uint program, uint storageBlockIndex,
uint storageBlockBinding);
New Tokens
Accepted by the <target> parameters of BindBuffer, BufferData,
BufferSubData, MapBuffer, UnmapBuffer, GetBufferSubData, and
GetBufferPointerv:
SHADER_STORAGE_BUFFER 0x90D2
Accepted by the <pname> parameter of GetIntegerv, GetIntegeri_v,
GetBooleanv, GetInteger64v, GetFloatv, GetDoublev, GetBooleani_v,
GetFloati_v, GetDoublei_v, and GetInteger64i_v:
SHADER_STORAGE_BUFFER_BINDING 0x90D3
Accepted by the <pname> parameter of GetIntegeri_v, GetBooleani_v,
GetFloati_v, GetDoublei_v, and GetInteger64i_v:
SHADER_STORAGE_BUFFER_START 0x90D4
SHADER_STORAGE_BUFFER_SIZE 0x90D5
Accepted by the <pname> parameter of GetIntegerv, GetBooleanv,
GetInteger64v, GetFloatv, and GetDoublev:
MAX_VERTEX_SHADER_STORAGE_BLOCKS 0x90D6
MAX_GEOMETRY_SHADER_STORAGE_BLOCKS 0x90D7
MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS 0x90D8
MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS 0x90D9
MAX_FRAGMENT_SHADER_STORAGE_BLOCKS 0x90DA
MAX_COMPUTE_SHADER_STORAGE_BLOCKS 0x90DB
MAX_COMBINED_SHADER_STORAGE_BLOCKS 0x90DC
MAX_SHADER_STORAGE_BUFFER_BINDINGS 0x90DD
MAX_SHADER_STORAGE_BLOCK_SIZE 0x90DE
SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT 0x90DF
Accepted in the <barriers> bitfield in glMemoryBarrier:
SHADER_STORAGE_BARRIER_BIT 0x2000
Also, add a new alias for the existing token
MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS:
MAX_COMBINED_SHADER_OUTPUT_RESOURCES 0x8F39 (alias)
Additions to Chapter 2 of the OpenGL 4.2 (Compatibility Profile) Specification
(OpenGL Operation)
Modify Section 2.9, Buffer Objects, p. 56
(Add to Table 2.9, p. 57)
Target Name            Purpose               Described in section(s)
---------------------  --------------------  -----------------------
SHADER_STORAGE_BUFFER  read-write storage    2.14.X
                       for shaders
(modify next-to-last paragraph, p. 58) target must be one of
ATOMIC_COUNTER_BUFFER, SHADER_STORAGE_BUFFER, TRANSFORM_FEEDBACK_BUFFER,
UNIFORM_BUFFER. ...
Modify Section 2.14.7, Uniform Variables, p. 113
(modify Table 2.16, pp. 122-125)
Add a new column labeled "Buffer". Include dots for all the types on
p. 122 (including BOOL types not supported for "Attrib" and "Xfb"). Add
dots for the "DOUBLE_MAT*" rows on p. 123. Add no dots for any image or
sampler types.
In the description of the table (p. 125), add a new sentence: Types whose
"Buffer" column is marked may be declared as buffer variables (see
section 2.14.X).
Modify unnumbered "Standard Uniform Block Layout" section, p. 132
(insert a new paragraph at the end of the section, at the bottom of
p. 133) Shader storage blocks (section 2.14.X) also support the "std140"
layout qualifier, as well as a "std430" layout qualifier not supported for
uniform blocks. When using the "std430" storage layout, shader storage
blocks will be laid out in buffer storage identically to uniform and
shader storage blocks using the "std140" layout, except that the base
alignment of arrays of scalars and vectors in rule (4) and of structures
in rule (9) are not rounded up to a multiple of the base alignment of a vec4.
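The difference for arrays of scalars and vectors can be illustrated with a short, non-normative C sketch. It assumes the usual base alignments from rules (1) through (3) of the "Standard Uniform Block Layout" section (N, 2N, and 4N basic machine units for one-, two-, and three- or four-component types) and that a vec4 has a base alignment of 16. The function names are hypothetical and not part of this extension; structures (rule 9) are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment (a power of two). */
size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/*
 * Base alignment of a scalar or vector whose components are
 * component_size basic machine units wide (assumed rules 1-3).
 */
size_t vector_base_alignment(size_t component_size, unsigned components)
{
    assert(components >= 1 && components <= 4);
    if (components == 1)
        return component_size;
    if (components == 2)
        return 2 * component_size;
    return 4 * component_size;
}

/*
 * Array stride for an array of scalars or vectors (rule 4).  With
 * std140 the element alignment is rounded up to that of a vec4 (16);
 * with std430 that rounding step is skipped.
 */
size_t array_stride(size_t component_size, unsigned components, int is_std430)
{
    size_t stride = vector_base_alignment(component_size, components);
    if (!is_std430)
        stride = round_up(stride, 16);
    return stride;
}
```

Under these assumptions, an array of float has a stride of 16 in "std140" and 4 in "std430", while an array of vec3 has a stride of 16 in both layouts.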
Add new section immediately before Section 2.14.8, Subroutine Uniform
Variables (p. 135)
2.14.X, Shader Buffer Variables
Shaders can declare named /buffer variables/, as described in the OpenGL
Shading Language Specification. Sets of buffer variables are grouped into
interface blocks called /shader storage blocks/. The values of each
buffer variable in a shader storage block are read from or written to the
data store of a buffer object bound to the binding point associated with
the block. The values of active buffer variables may be changed by
executing shaders that assign values to them or perform atomic memory
operations on them, by modifying the contents of the bound buffer object's
data store with commands such as BufferSubData, by binding a new buffer
object to the binding point associated with the block, or by changing the
binding point associated with the block.
Buffer variables in shader storage blocks are represented in memory in the
same way as uniforms stored in uniform blocks, as described in the
"Uniform Buffer Object Storage" subsection of Section 2.14.7. When a
program is linked successfully, each active buffer variable is assigned an
offset relative to the base of the buffer object binding associated with
its shader storage block. For buffer variables declared as arrays and
matrices, strides between array elements or matrix columns or rows will
also be assigned. Offsets and strides of buffer variables will be
assigned in an implementation-dependent manner unless the shader storage
block is declared using the "std140" or "std430" storage layout
qualifiers. For "std140" and "std430" shader storage blocks, offsets will
be assigned using the method described in the "Standard Uniform Block
Layout" subsection of Section 2.14.7. If a program is re-linked, existing
buffer variable offsets and strides are invalidated, and a new set of
active variables, offsets, and strides will be generated.
The total amount of buffer object storage that can be accessed in any
shader storage block is subject to an implementation-dependent limit. The
maximum amount of available space, in basic machine units, can be queried
by calling GetIntegerv with the constant MAX_SHADER_STORAGE_BLOCK_SIZE.
If the amount of storage required for any shader storage block exceeds
this limit, a program will fail to link.
If the number of active shader storage blocks referenced by the shaders in
a program exceeds implementation-dependent limits, the program will fail
to link. The limits for vertex, tessellation control, tessellation
evaluation, geometry, fragment, and compute shaders can be obtained by
calling GetIntegerv with pname values of MAX_VERTEX_SHADER_STORAGE_BLOCKS,
MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS,
MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS,
MAX_GEOMETRY_SHADER_STORAGE_BLOCKS, MAX_FRAGMENT_SHADER_STORAGE_BLOCKS,
and MAX_COMPUTE_SHADER_STORAGE_BLOCKS, respectively. Additionally, a
program will fail to link if the sum of the number of active shader
storage blocks referenced by each shader stage in a program exceeds the
value of the implementation-dependent limit
MAX_COMBINED_SHADER_STORAGE_BLOCKS. If a shader storage block in a
program is referenced by multiple shaders, each such reference counts
separately against this combined limit.
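The counting rule can be sketched in C as follows. This is a non-normative model, not an implementation of the linker: the function name, the stage ordering, and the array-based interface are assumptions made for illustration. The per-stage counts are expected to include a block once for every stage that references it, which is how a shared block ends up being charged more than once against the combined limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* vertex, tess control, tess evaluation, geometry, fragment, compute */
enum { STAGE_COUNT = 6 };

/*
 * Returns true if the per-stage and combined block counts fit the
 * limits.  blocks_per_stage[i] is the number of active shader storage
 * blocks referenced by stage i; a block referenced by two stages is
 * present in both counts.
 */
bool storage_block_counts_ok(const int blocks_per_stage[STAGE_COUNT],
                             const int max_per_stage[STAGE_COUNT],
                             int max_combined)
{
    int total = 0;
    for (size_t i = 0; i < STAGE_COUNT; ++i) {
        assert(blocks_per_stage[i] >= 0);
        if (blocks_per_stage[i] > max_per_stage[i])
            return false;   /* per-stage limit exceeded */
        total += blocks_per_stage[i];
    }
    return total <= max_combined;   /* combined limit */
}
```

For example, a single block referenced by both the vertex and fragment stages contributes two to the combined total, so it would not link against a hypothetical combined limit of one.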
When a named shader storage block is declared by multiple shaders in a
program, it must be declared identically in each shader. The buffer
variables within the block must be declared with the same names, types,
qualification, and declaration order. If a program contains multiple
shaders with different declarations for the same named shader storage
block, the program will fail to link.
Regions of buffer objects are bound as storage for shader storage blocks
by calling one of the commands BindBufferRange or BindBufferBase (see
section 2.9.1) with target set to SHADER_STORAGE_BUFFER. In addition to
the general errors described in section 2.9.1, BindBufferRange will
generate an INVALID_VALUE error if <index> is greater than or equal to the
value of MAX_SHADER_STORAGE_BUFFER_BINDINGS, or if <offset> is not a
multiple of the implementation-dependent alignment requirement (the value
of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT).
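The two additional checks can be expressed as a small predicate. The C sketch below is illustrative only; the function name and parameters are hypothetical, and the general errors of section 2.9.1 (for example a negative offset) are deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if (index, offset) passes the two checks specific to
 * SHADER_STORAGE_BUFFER; false corresponds to an INVALID_VALUE error.
 * max_bindings stands for MAX_SHADER_STORAGE_BUFFER_BINDINGS and
 * offset_alignment for SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.
 */
bool ssbo_bind_range_args_ok(uint32_t index, int64_t offset,
                             uint32_t max_bindings, int64_t offset_alignment)
{
    assert(offset_alignment > 0);
    if (index >= max_bindings)
        return false;                   /* index out of range */
    if (offset % offset_alignment != 0)
        return false;                   /* misaligned offset */
    return true;
}
```

An application would typically query both limits once with GetIntegerv and round its sub-allocation offsets up to the alignment before binding.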
Each of a program's active shader storage blocks has a corresponding
shader storage buffer object binding point. When a program object is
linked, the shader storage buffer object binding point assigned to each of
its active shader storage blocks is reset to the value specified by the
corresponding "binding" layout qualifier, if present, or zero otherwise.
After a program is linked, the command
void ShaderStorageBlockBinding(uint program, uint storageBlockIndex,
uint storageBlockBinding);
changes the binding point of the active shader storage block with an
assigned index of <storageBlockIndex> in program object <program>. The
error INVALID_VALUE
is generated if <storageBlockIndex> is not an active shader storage block
index in <program>, or if <storageBlockBinding> is greater than or equal
to the value of MAX_SHADER_STORAGE_BUFFER_BINDINGS. If successful,
ShaderStorageBlockBinding specifies that <program> will use the data
store of the buffer object bound to the binding point
<storageBlockBinding> to read and write the values of the buffer
variables in the shader storage block identified by <storageBlockIndex>.
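The interaction between linking and ShaderStorageBlockBinding can be modeled with a small table of per-block binding points. The following C sketch is a non-normative model that runs without a GL context; the structure, the function names, and the MAX_BLOCKS cap are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_BLOCKS = 16 };   /* arbitrary cap for this model only */

struct program_model {
    size_t   active_blocks;               /* active shader storage blocks */
    int      layout_binding[MAX_BLOCKS];  /* "binding" qualifier, -1 if absent */
    unsigned binding[MAX_BLOCKS];         /* current binding point per block */
};

/* Linking resets every block to its layout qualifier, or to zero. */
void model_link(struct program_model *p)
{
    assert(p->active_blocks <= MAX_BLOCKS);
    for (size_t i = 0; i < p->active_blocks; ++i)
        p->binding[i] = p->layout_binding[i] >= 0
                      ? (unsigned)p->layout_binding[i] : 0u;
}

/*
 * Mirrors the two INVALID_VALUE checks described above.  Returns false
 * (and changes nothing) where the GL would generate the error.
 */
bool model_block_binding(struct program_model *p, size_t block_index,
                         unsigned new_binding, unsigned max_bindings)
{
    if (block_index >= p->active_blocks || new_binding >= max_bindings)
        return false;
    p->binding[block_index] = new_binding;
    return true;
}
```

Note that in this model, as in the text above, a binding changed through the API does not survive a re-link: model_link restores the value from the layout qualifier.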
When executing shaders that access shader storage blocks, the binding
point corresponding to each active shader storage block must be populated
with a buffer object with a size no smaller than the minimum required size
of the shader storage block (the value of BUFFER_SIZE for the appropriate
SHADER_STORAGE_BUFFER resource). For binding points populated by
BindBufferRange, the size in question is the value of the <size> parameter
or the size of the buffer minus the value of the <offset> parameter,
whichever is smaller. If any active shader storage block is not backed by
a sufficiently large buffer object, the results of shader execution are
undefined, and may result in GL interruption or termination. Shaders may
be executed to process the primitives and vertices specified between Begin
and End, or by vertex array commands (see section 2.8). Shaders may also
be executed as a result of DrawPixels, Bitmap, or RasterPos* commands.
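The size rule for binding points populated by BindBufferRange can be sketched in C as follows. The function names are hypothetical, and clamping the remaining space to zero when <offset> lies beyond the end of the buffer is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Usable size of a BindBufferRange binding: the smaller of <size> and
 * (buffer size - <offset>).
 */
int64_t effective_binding_size(int64_t buffer_size, int64_t offset,
                               int64_t size)
{
    assert(buffer_size >= 0 && offset >= 0 && size >= 0);
    int64_t remaining = buffer_size > offset ? buffer_size - offset : 0;
    return size < remaining ? size : remaining;
}

/*
 * True if the binding is large enough for a block whose minimum
 * required size is block_min_size; otherwise shader results would be
 * undefined.
 */
bool binding_backs_block(int64_t buffer_size, int64_t offset,
                         int64_t size, int64_t block_min_size)
{
    return effective_binding_size(buffer_size, offset, size) >= block_min_size;
}
```

For a 1024-unit buffer bound at offset 768 with a requested size of 512, only 256 units are usable, so a block requiring 300 units would not be adequately backed.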
Modify Section 2.14.12, Shader Execution (p. 145)
(add new sub-section before "Shader Inputs", p. 151)
Shader Storage Buffer Access
Shaders have the ability to read and write to buffer memory via buffer
variables in shader storage blocks. The maximum numbers of shader storage
blocks available to shaders are the values of the implementation-dependent
constants
* MAX_VERTEX_SHADER_STORAGE_BLOCKS (for vertex shaders),
* MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS (for tessellation control
shaders),
* MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS (for tessellation evaluation
shaders),
* MAX_GEOMETRY_SHADER_STORAGE_BLOCKS (for geometry shaders),
* MAX_FRAGMENT_SHADER_STORAGE_BLOCKS (for fragment shaders), and
* MAX_COMPUTE_SHADER_STORAGE_BLOCKS (for compute shaders).
All active shaders combined cannot use more than the value of
MAX_COMBINED_SHADER_STORAGE_BLOCKS shader storage blocks. If more than one
pipeline stage accesses the same shader storage block, each such access
counts separately against this combined limit.
(add to the list of bullets in the "Validation" section on p. 153)
* The sum of the number of active shader storage blocks used by the
current program objects exceeds the combined limit on the number of
active shader storage blocks (MAX_COMBINED_SHADER_STORAGE_BLOCKS).
Modify Section 2.14.13, Shader Memory Access (p. 153)
(modify last paragraph, p. 153) Shaders may perform random-access reads
and writes to texture or buffer object memory by using built-in image
load, store, and atomic functions operating on shader image variables, or
by reading from, assigning to, or performing atomic memory operations on
shader buffer variables, as described in the OpenGL Shading Language
Specification. The ability to perform such random-access reads and writes
in systems that may be highly pipelined results in ordering and
synchronization issues discussed in the sections below.
(add to list of MemoryBarrier <barriers> bullets, p. 158)
* SHADER_STORAGE_BARRIER_BIT: Memory accesses using shader buffer
variables issued after the barrier will reflect data written by
shaders prior to the barrier. Additionally, assignments to and atomic
operations performed on shader buffer variables after the barrier will
not execute until all memory accesses (e.g., loads, stores, texture
fetches, vertex fetches) initiated prior to the barrier complete.
Additions to Chapter 3 of the OpenGL 4.2 (Compatibility Profile) Specification
(Rasterization)
Modify Section 3.10.22, Texture Image Loads and Stores (p. 358)
(modify first paragraph, p. 367) Implementations may support a limited
combined number of image units, shader storage blocks, and active fragment
shader outputs (see section 4.2.1). A link error will be generated if the
sum of the number of active image uniforms used in all shaders, the number
of active shader storage blocks, and the number of active fragment shader
outputs exceeds the implementation-dependent value of
MAX_COMBINED_SHADER_OUTPUT_RESOURCES.
Additions to Chapter 4 of the OpenGL 4.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 4.2 (Compatibility Profile) Specification
(Special Functions)
None.
Additions to Chapter 6 of the OpenGL 4.2 (Compatibility Profile) Specification
(State and State Requests)
Modify Section 6.1.15, Buffer Object Queries (p. 490)
(add to end of section)
To query which buffer objects are bound to the array of shader storage
buffer binding points and will be used as the storage for active shader
storage blocks, call GetIntegeri_v with <param> set to
SHADER_STORAGE_BUFFER_BINDING. <index> must be in the range zero to the
value of MAX_SHADER_STORAGE_BUFFER_BINDINGS-1. The name of the buffer
object bound to <index> is returned in <values>. If no buffer object is
bound for <index>, zero is returned in <values>.
To query the starting offset or size of the range of each buffer object
binding used for shader storage buffers, call GetInteger64i_v with <param>
set to SHADER_STORAGE_BUFFER_START or SHADER_STORAGE_BUFFER_SIZE
respectively. <index> must be in the range zero to the value of
MAX_SHADER_STORAGE_BUFFER_BINDINGS-1. If the parameter (starting offset
or size) was not specified when the buffer object was bound (e.g. if
bound with BindBufferBase), or if no buffer object is bound to <index>, zero
is returned.
Additions to Appendix A of the OpenGL 4.2 (Compatibility Profile) Specification
(Invariance)
Modify Section A.1, Repeatability (p. 583)
(modify last sentence of the first paragraph, p. 583) ... This
repeatability requirement doesn't apply when using shaders containing side
effects (image stores, image atomic operations, atomic counter operations,
buffer variable stores, buffer variable atomic operations), because these
memory operations are not guaranteed to be processed in a defined order.
Modify Section A.3, Invariance (p. 584)
(modify first sentence of the paragraph after Rule 5, p. 586) If a
sequence of GL commands specifies primitives to be rendered with shaders
containing side effects (image stores, image atomic operations, atomic
counter operations, buffer variable stores, buffer variable atomic
operations), invariance rules are relaxed. ...
(modify first paragraph, p. 587) When any sequence of GL commands triggers
shader invocations that perform image stores, image atomic operations,
atomic counter operations, buffer variable stores, or buffer variable
atomic operations and subsequent GL commands read the memory written by
those shader invocations, these operations must be explicitly
synchronized. For more details, see Section 2.14.13, Shader Memory Access.
Additions to Appendix D of the OpenGL 4.2 (Compatibility Profile) Specification
(Shared Objects and Multiple Contexts)
Modify Section D.3, Propagating State Changes, p. 611
(modify second bullet, p. 612)
* Rendering commands that trigger shader invocations, where the shader
performs image stores, image atomic operations, atomic counter
operations, buffer variable stores, or buffer variable atomic
operations.
Additions to the OpenGL Shading Language 4.20 Specification
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_ARB_shader_storage_buffer_object : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_ARB_shader_storage_buffer_object 1
Modify Section 3.6, Keywords (p. 15)
(add to list of keywords)
buffer
Modify Section 4.1.9, Arrays (p. 29)
(modify first paragraph of the section, p. 29, adding an exception
allowing general indexing of the last array of a shader storage block)
... Except for the last declared member of a shader storage block
(section 4.3.X), the size of an array must be declared before it is
indexed with anything other than an integral constant expression. The
size of an array must be declared before passing it as an argument to a
function. ...
(modify last paragraph, p. 30) ... This returns a type int. If an array
has been explicitly sized, the value returned by the length method is
a constant expression. If an array has not been explicitly
sized and is not the last declared member of a shader storage block, the
value returned by the length method is not a constant
expression and will be determined when a program is linked. If an array
has not been explicitly sized and is the last declared member of a shader
storage block, the value returned will not be a constant expression and
will be determined at run time based on
the size of the buffer object providing storage for the block. For such
arrays, the value returned by the length method will be undefined if the
array is contained in an array of shader storage blocks that is indexed
with a non-constant expression less than zero or greater than or equal
to the number of blocks in the array.
(add a new paragraph to end of the section, at the bottom of p. 30) In a
shader storage block, the last member may be declared without an explicit
size. In this case, the effective array size is inferred at run-time from
the size of the data store backing the interface block. Such unsized
arrays may be indexed with general integer expressions, but may not be
passed as an argument to a function or indexed with a negative constant
expression.
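The text above states only that the effective size is inferred from the size of the backing data store. One plausible reading, offered here as an assumption rather than as normative behavior, is that the length is the number of whole array elements that fit between the start of the array and the end of the bound range. The C sketch below models that reading; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed model of the run-time length of an unsized trailing array:
 * the number of complete elements that fit in the backing store after
 * the array's offset, never less than zero.
 *
 *   backing_size  usable size of the bound buffer range
 *   array_offset  offset of the array within the block
 *   array_stride  stride between consecutive array elements
 */
int64_t unsized_array_length(int64_t backing_size, int64_t array_offset,
                             int64_t array_stride)
{
    assert(array_stride > 0);
    int64_t bytes = backing_size - array_offset;
    return bytes > 0 ? bytes / array_stride : 0;
}
```

Under this model, a 1024-unit range whose trailing array starts at offset 16 with a stride of 16 would report a length of 63.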
Modify Section 4.3, Storage Qualifiers (p. 36)
Storage
Qualifier   Meaning
----------  ---------------------------------------------------
buffer      value is stored in a buffer object, and can be read
            or written by shader invocations and the OpenGL API
Modify Section 4.3.3, Constant Expressions (p. 38)
(modify first bullet, p. 39, clarifying that the length() method only
produces constant expressions on explicitly sized objects, since we now
allow it on implicitly sized or unsized arrays)
* valid use of the length() method on an explicitly sized object, whether
or not the object itself is constant (implicitly sized or unsized arrays
do not return a constant expression)
Insert after Section 4.3.5, Uniform (p. 40)
4.3.X, Buffer Variables
The <buffer> qualifier is used to declare global variables whose values
are stored in the data store of a buffer object bound through the OpenGL
API. Buffer variables can be read and written, with the underlying
storage shared among all active shader invocations. Buffer variable
memory reads and writes within a single shader invocation are processed in
order. However, the order of reads and writes performed in one invocation
relative to those performed by another invocation is largely undefined.
Buffer variables may be qualified with memory qualifiers affecting how the
underlying memory is accessed, as described in Section 4.10.
The "buffer" qualifier can be used with any of the basic data types, or
when declaring a variable whose type is a structure, or an array of any of
these.
Buffer variables may only be declared inside interface blocks (Section
4.3.7), which are referred to as shader storage blocks. It is illegal to
declare buffer variables at global scope (outside a block). Buffer
variables cannot have initializers.
There are implementation-dependent limits on the number of the shader
storage blocks used for each type of shader, the combined number of shader
storage blocks used for a program, and the amount of storage required by
each individual shader storage block. If any of these limits are
exceeded, it will cause a compile-time or link-time error.
If multiple shaders are linked together, then they will share a single
global buffer variable name space, including within a language as well as
across languages. Hence, the types of buffer variables with the same name
must match across all shaders that are linked into a single program.
Modify Section 4.3.7, Interface Blocks (p. 43)
(modify first paragraph) Input, output, uniform, and buffer variable
declarations can be grouped into named interface blocks ... A uniform
block is backed by the application with a buffer object. A block of
buffer variables, called a shader storage block, is also backed by the
application with a buffer object. ...
(modify second paragraph) An interface block is started by an in, out,
uniform, or buffer keyword, followed by ...
(add "buffer" to the grammar rules)
interface-qualifier:
    in
    out
    uniform
    buffer
(modify first paragraph, p. 44) Types and declarators are the same as for
other input, output, uniform, and buffer variable declarations...
(modify third paragraph, p. 44) If no optional qualifier is used in a
member-declaration, the qualification of the variable is just in, out,
uniform, or buffer as determined by <interface-qualifier>. ... Input
variables, output variables, uniform variables, and buffer variables can
only be in in blocks, out blocks, uniform blocks, and shader storage
blocks, respectively. Repeating the "in", "out", "uniform", or "buffer"
interface qualifier for a member's storage qualifier is optional. ...
(modify fourth paragraph, p. 44) For this section, define an interface to
be one of these:
* All the uniforms of a program. This spans all compilation units linked
together within one program.
* All the buffer variables of a program.
* The boundary between adjacent programmable pipeline stages: ...
(modify next-to-last paragraph, p. 45) For uniform or shader storage
blocks declared as an array, each individual array element corresponds to
a separate buffer object bind range, backing one instance of the block. As
the array size indicates the number of buffer objects needed, uniform and
shader storage block array declarations must specify an array size. A
uniform or shader storage block array can only be indexed with a
dynamically uniform integral expression, otherwise results are undefined.
(modify last paragraph of the section, p. 46) There are
implementation-dependent limits on the number of uniform blocks and the
number of shader storage blocks that can be used per stage. If either
limit is exceeded, it will cause a link error.
Modify Section 4.4.1.2, Geometry Shader Inputs (p. 49)
(modify example at the top of p. 51, since it's now legal to take the
length of implicitly sized arrays)
// code sequence within one shader...
in vec4 Color1[];      // legal, size still unknown
in vec4 Color2[2];     // legal, size is 2
in vec4 Color3[3];     // illegal, input sizes are inconsistent
layout(lines) in;      // legal for Color2, input size is 2, matching Color2
in vec4 Color4[3];     // illegal, contradicts layout of lines
layout(lines) in;      // legal, matches other layout() declaration
layout(triangles) in;  // illegal, does not match earlier layout()
                       // declaration
Modify Section 4.4.3, Uniform Block Layout Qualifiers (p. 57). Rename
section title to "Uniform and Shader Storage Block Layout Qualifiers".
(modify first paragraph) Layout qualifiers can be used for uniform and
shader storage blocks, but not for non-block uniform declarations. The
layout qualifier identifiers for uniform and shader storage blocks are
layout-qualifier-id
    shared
    packed
    std140
    std430
    row_major
    column_major
    binding = integer-constant
(modify last paragraph, p. 57) Uniform and shader storage block layout
qualifiers can be declared for global scope, on a single uniform or shader
storage block, or on a single block member declaration.
(modify first paragraph, p. 58) Default layouts are established (except
for binding) at global scope for uniform blocks as
layout(layout-qualifier-id-list) uniform;
and for shader storage blocks as
layout(layout-qualifier-id-list) buffer;
... The result becomes the new default qualification scoped to subsequent
uniform or shader storage block definitions.
(modify third paragraph, p. 58) The initial state of compilation is as if
the following were declared:
layout(shared, column_major) uniform;
layout(shared, column_major) buffer;
(modify fourth paragraph, p. 58) Uniform and shader storage blocks can be
declared with optional layout qualifiers, and so can their individual
member declarations. Such block layout qualification is scoped only to the
content of the block. As with global layout declarations, block layout
qualification first inherits from the current default qualification and
then overrides it. Similarly, individual member layout qualification is
scoped just to the member declaration, and inherits from and overrides the
block's qualification.
(modify the fifth paragraph, p. 58) The shared qualifier overrides only
the std140, std430, and packed qualifiers; other qualifiers are
inherited. The compiler/linker will ensure that multiple programs and
programmable stages containing this definition will share the same memory
layout for this block, as long as all arrays are declared with explicit
sizes and all matrices have matching row_major and/or column_major
qualifications (which may come from a declaration outside the block
definition). ...
(modify sixth paragraph, p. 58) The packed qualifier overrides only std140,
std430, and shared; other qualifiers are inherited. ... Attempts to share
a packed uniform or shader storage block across programs or stages will
generally fail. ...
(modify seventh paragraph, p. 58) The std140 and std430 qualifiers
override only the packed, shared, std140, and std430 qualifiers; other
qualifiers are inherited. The std430 qualifier is supported only for
shader storage blocks; a shader using the std430 qualifier on a uniform
block will fail to compile. ...
(modify eighth paragraph, p. 58) Layout qualifiers on member declarations
cannot use the shared, packed, std140, or std430 qualifiers. ...
(modify last paragraph, p. 58) The <binding> identifier specifies the
buffer binding point corresponding to the uniform or shader storage block,
which will be used to obtain the values of the member variables of the
block. It is an error to specify the binding identifier for the global
scope or for block member declarations. Any uniform or shader storage
block declared without a binding identifier is initially assigned to block
binding point zero. After a program is linked, the binding points used
for uniform and shader storage blocks declared with or without a binding
identifier can be updated by the OpenGL API.
(modify second paragraph, p. 59) If the <binding> identifier is used with
a uniform or shader storage block instanced as an array then the first
element of the array takes the specified block binding and each subsequent
element takes the next consecutive block binding point.
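The consecutive assignment of binding points to the elements of a block array can be sketched in C as follows. This is an illustrative model with hypothetical function names. The range check treats the whole array as a unit: every element, from the first binding through the first binding plus the array size minus one, must fall below the implementation-dependent maximum number of bindings for the block type.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Binding point used by element `element` of a block array declared
 * with layout(binding = first_binding).
 */
int block_array_element_binding(int first_binding, int element)
{
    assert(element >= 0);
    return first_binding + element;
}

/*
 * True if every element of an array of array_size blocks starting at
 * first_binding uses a binding point in [0, max_bindings).
 */
bool block_array_bindings_in_range(int first_binding, int array_size,
                                   int max_bindings)
{
    assert(array_size >= 1);
    /* last element uses first_binding + array_size - 1 */
    return first_binding >= 0 && first_binding <= max_bindings - array_size;
}
```

With a hypothetical maximum of 8 bindings, a two-element block array declared with binding 6 is acceptable (it uses points 6 and 7), while the same array declared with binding 7 is not.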
(modify third paragraph, p. 59) If the binding point for any uniform or
shader storage block instance is less than zero or greater than or equal
to the implementation-dependent maximum number of bindings for the block
type (uniform or shader storage), a compilation error will occur. When
the binding identifier is used with a uniform or shader storage block
instanced as an array of size <N>, all elements of the array from
<binding> through <binding>+<N>-1 must be within this range.
Modify Section 4.10, Memory Qualifiers (p. 71)
(modify first paragraph of section, p. 71, removing the "Only" from "Only
variables") Variables declared as image types (the basic opaque types with
"image" in their keyword) can be qualified with a memory qualifier.
(add to the end of the third paragraph, p. 73) ... It is an error to
qualify an image variable with both "readonly" and "writeonly".
(insert after third paragraph, p. 73) The memory qualifiers "coherent",
"volatile", "restrict", "readonly", and "writeonly" may be used in the
declaration of buffer variables (i.e., members of shader storage blocks).
When a buffer variable is declared with a memory qualifier, the behavior
specified for memory accesses involving image variables described above
applies identically to memory accesses involving that buffer variable. It
is an error to assign to a buffer variable qualified with "readonly" or to
read from a buffer variable qualified with "writeonly".
Additionally, memory qualifiers may also be used in the declaration of
shader storage blocks. When a block declaration is qualified with a
memory qualifier, it is as if all of its members were declared with the
same memory qualifier. For example, the block declaration
coherent buffer Block {
readonly vec4 member1;
vec4 member2;
};
is equivalent to
buffer Block {
coherent readonly vec4 member1;
coherent vec4 member2;
};
Memory qualifiers are only supported in the declarations of image
variables, buffer variables, and shader storage blocks; it is an error to
use such qualifiers in any other declaration.
Modify Section 5.5, Vector and Scalar Components and Length, p. 79
(modify last paragraph of section, p. 81) ... The type returned by
.length() on a vector is int, and the value returned is considered a
constant expression.
Modify Section 5.6, Matrix Components, p. 81
(modify last paragraph of section, p. 81) ... The type returned by
.length() on a matrix is int, and the value returned is considered a
constant expression.
Modify Section 5.9, Expressions, p. 83
(insert after 4th bullet of section, p. 83, correcting the oversight that
.length() can also be used on vectors and matrices)
* an expression of vector or matrix type with the length method applied
Insert new section after Section 8.10, Atomic Counter Functions (p. 149)
8.X Atomic Memory Functions
Atomic memory functions perform atomic operations on an individual signed
or unsigned integer found in buffer object or shared variable storage.
All of the atomic memory operations read a value from memory, compute a
new value using one of the operations described below, write the new value
to memory, and return the original value read. The contents of the memory
being updated by the atomic operation are guaranteed not to be modified by
any other assignment or atomic memory function in any shader invocation
between the time the original value is read and the time the new value is
written.
Atomic memory functions are supported only for a limited set of variables.
A shader will fail to compile if the value passed to the <mem> argument of
an atomic memory function does not correspond to a buffer or shared
variable. It is acceptable to pass an element of an array or a single
component of a vector to the <mem> argument of an atomic memory function,
as long as the underlying array or vector is a buffer or shared variable.
Functions:
uint atomicAdd(inout uint mem, uint data);
int atomicAdd(inout int mem, int data);
Computes a new value by adding the value of <data> to the contents
of <mem>.
uint atomicMin(inout uint mem, uint data);
int atomicMin(inout int mem, int data);
Computes a new value by taking the minimum of the value of <data>
and the contents of <mem>.
uint atomicMax(inout uint mem, uint data);
int atomicMax(inout int mem, int data);
Computes a new value by taking the maximum of the value of <data>
and the contents of <mem>.
uint atomicAnd(inout uint mem, uint data);
int atomicAnd(inout int mem, int data);
Computes a new value by performing a bit-wise and of the value of
<data> and the contents of <mem>.
uint atomicOr(inout uint mem, uint data);
int atomicOr(inout int mem, int data);
Computes a new value by performing a bit-wise or of the value of
<data> and the contents of <mem>.
uint atomicXor(inout uint mem, uint data);
int atomicXor(inout int mem, int data);
Computes a new value by performing a bit-wise exclusive or of the
value of <data> and the contents of <mem>.
uint atomicExchange(inout uint mem, uint data);
int atomicExchange(inout int mem, int data);
Computes a new value by simply copying the value of <data>.
uint atomicCompSwap(inout uint mem, uint compare, uint data);
int atomicCompSwap(inout int mem, int compare, int data);
Compares the value of <compare> and the contents of <mem>. If the
values are equal, the new value is given by <data>; otherwise, it is
taken from the original contents of <mem>.
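The "return the original value" behavior shared by these functions can be modeled on the CPU with C11 atomics. This is only a sketch of the semantics described above, not GLSL and not how any implementation is required to work; the model_* function names are hypothetical.

```c
#include <stdatomic.h>

/* CPU-side model of atomicAdd(): adds <data> to *mem and returns the
 * value that was read before the addition. */
static unsigned model_atomic_add(atomic_uint *mem, unsigned data)
{
    return atomic_fetch_add(mem, data);
}

/* CPU-side model of atomicCompSwap(): stores <data> only if *mem equals
 * <compare>, and returns the original contents of *mem in either case. */
static unsigned model_atomic_comp_swap(atomic_uint *mem, unsigned compare,
                                       unsigned data)
{
    unsigned expected = compare;
    /* On failure, <expected> is overwritten with the current contents;
     * on success it still holds <compare>, which was the original value. */
    atomic_compare_exchange_strong(mem, &expected, data);
    return expected;
}
```

A caller can tell whether a compare-and-swap took effect by comparing the returned value against <compare>.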
Additions to the AGL/EGL/GLX/WGL Specifications
None
GLX Protocol
TBD
Dependencies on OpenGL 4.3 and ARB_compute_shader:
If OpenGL 4.3 and ARB_compute_shader are not supported, any references to
uses of shader storage blocks in compute shaders, as well as the enumerant
MAX_COMPUTE_SHADER_STORAGE_BLOCKS, should be removed. Additionally, this
extension provides GLSL atomic memory functions that can be used with
buffer variables (from this extension) and shared variables (from
ARB_compute_shader). If ARB_compute_shader is not supported, references
to shared variables should be removed from the language describing these
functions.
Note that no "#extension" directive is necessary to use atomic memory
functions on shared variables in compute shaders.
Dependencies on OpenGL 4.3 and ARB_program_interface_query
If OpenGL 4.3 and ARB_program_interface_query are not supported, it
wouldn't be possible to use GLSL query APIs to enumerate active buffer
variables and shader storage blocks used by a program. We require that
OpenGL 4.3 or ARB_program_interface_query be supported; this shouldn't be
a problem for any implementations of this extension.
Dependencies on NV_bindless_texture
If NV_bindless_texture is supported (and enabled via the #extension
directive), the restriction that image and sampler variables must be
uniform variables not in blocks is lifted. In this case, image and
sampler variables may be members in shader storage blocks.
If an image variable is declared as a member of a shader storage block,
the memory qualifiers on such variable declarations apply to the memory
holding the block member and *not* the memory referenced by the image. If
it is necessary to apply a memory qualifier to the memory referenced by an
image variable found inside a shader storage block, it's possible to embed
the image variable declaration in a structure and then embed the structure
in a block. In the following example:
struct S {
readonly image2D x;
};
buffer Block {
S m;
};
"readonly" is considered to apply to the memory pointed to by the image
variable <x>. In this example:
buffer Block {
readonly image2D m;
};
"readonly" is considered to apply to the memory holding the image handle.
It would be illegal to write to <m>, but it would be legal to write to
the texture memory pointed to by <m> (i.e., you can pass <m> to
imageStore).
Errors
INVALID_VALUE is generated by BindBufferRange if <target> is
SHADER_STORAGE_BUFFER and <index> is greater than or equal to the value of
MAX_SHADER_STORAGE_BUFFER_BINDINGS.
INVALID_VALUE is generated by BindBufferRange if <target> is
SHADER_STORAGE_BUFFER and <offset> is not a multiple of the value of
SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.
INVALID_VALUE is generated by ShaderStorageBlockBinding if
<storageBlockIndex> is not an active shader storage block index of
<program>.
INVALID_VALUE is generated by ShaderStorageBlockBinding if
<storageBlockBinding> is greater than or equal to the value of
MAX_SHADER_STORAGE_BUFFER_BINDINGS.
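Applications can avoid the BindBufferRange errors above by checking arguments before the call. The following C sketch models only the two conditions listed in this section; the helper names are hypothetical, and <max_bindings> and <alignment> are assumed to hold the queried values of MAX_SHADER_STORAGE_BUFFER_BINDINGS and SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT (with <alignment> greater than zero and <offset> non-negative).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round <offset> up to the next multiple of
 * <alignment>, e.g. when sub-allocating ranges from one large buffer. */
static int64_t align_ssbo_offset(int64_t offset, int64_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: true if <index> and <offset> would not trigger
 * either of the INVALID_VALUE conditions listed for BindBufferRange. */
static bool bind_range_args_valid(uint32_t index, int64_t offset,
                                  uint32_t max_bindings, int64_t alignment)
{
    return index < max_bindings && offset % alignment == 0;
}
```

For example, with the minimum-capability values from the tables in this extension (8 bindings, 256-byte alignment), a range that would naturally start at byte 300 has to be moved to byte 512.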
New State
Add new table, labeled "Shader Storage Buffer State", after Table 6.58
(Atomic Counter State), p. 562:
Get Value                         Type  Get Command      Initial Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
SHADER_STORAGE_BUFFER_BINDING     Z+    GetIntegerv      0              Current value of generic    2.14.X
                                                                        shader storage buffer
                                                                        binding
SHADER_STORAGE_BUFFER_BINDING     n*Z+  GetIntegeri_v    0              Buffer object bound         2.14.X
                                                                        to each shader storage
                                                                        buffer binding point
SHADER_STORAGE_BUFFER_START       n*Z+  GetInteger64i_v  0              Start offset of             2.14.X
                                                                        binding range for each
                                                                        shader storage buffer
SHADER_STORAGE_BUFFER_SIZE        n*Z+  GetInteger64i_v  0              Size of binding range for   2.14.X
                                                                        each shader storage buffer
New Implementation Dependent State
Add to Table 6.66, Implementation Dependent Vertex Shader Limits, p. 570
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_VERTEX_SHADER_STORAGE_BLOCKS  Z+    GetIntegerv      0              Number of shader storage    2.14.X
                                                                        blocks accessed by a
                                                                        vertex shader
Add to Table 6.67, Implementation Dependent Tessellation Shader Limits, p. 571
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_TESS_CONTROL_SHADER_          Z+    GetIntegerv      0              Number of shader storage    2.14.X
  STORAGE_BLOCKS                                                        blocks accessed by a
                                                                        tess. control shader
MAX_TESS_EVALUATION_SHADER_       Z+    GetIntegerv      0              Number of shader storage    2.14.X
  STORAGE_BLOCKS                                                        blocks accessed by a
                                                                        tess. evaluation shader
Add to Table 6.68, Implementation Dependent Geometry Shader Limits, p. 572
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_GEOMETRY_SHADER_STORAGE_      Z+    GetIntegerv      0              Number of shader storage    2.14.X
  BLOCKS                                                                blocks accessed by a
                                                                        geometry shader
Add to Table 6.69, Implementation Dependent Fragment Shader Limits, p. 573
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_FRAGMENT_SHADER_STORAGE_      Z+    GetIntegerv      8              Number of shader storage    2.14.X
  BLOCKS                                                                blocks accessed by a
                                                                        fragment shader
Add to new table in ARB_compute_shader, Implementation Dependent Compute Shader Limits
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_COMPUTE_SHADER_STORAGE_       Z+    GetIntegerv      8              Number of shader storage    2.14.X
  BLOCKS                                                                blocks accessed by a
                                                                        compute shader
Add to Table 6.70, Implementation Dependent Aggregate Shader Limits, p. 574
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  ------
MAX_COMBINED_SHADER_STORAGE_      Z+    GetIntegerv      8              Number of shader storage    2.14.X
  BLOCKS                                                                blocks accessed by a
                                                                        program
MAX_SHADER_STORAGE_BLOCK_SIZE     Z+    GetInteger64v    2^24           Maximum size in basic       2.14.X
                                                                        machine units of a shader
                                                                        storage block
SHADER_STORAGE_BUFFER_OFFSET_     Z+    GetIntegerv      256            Minimum required alignment  2.14.X
  ALIGNMENT                                                             for shader storage buffer
                                                                        binding offsets
MAX_SHADER_STORAGE_BUFFER_        Z+    GetIntegerv      8              Maximum number of shader    2.14.X
  BINDINGS                                                              storage buffer bindings
                                                                        in the context
Modify Table 6.71, Implementation Dependent Aggregate Shader Limits (cont.), p. 575
Get Value                         Type  Get Command      Minimum Value  Description                 Sec.
--------------------------------  ----  ---------------  -------------  --------------------------  -------
MAX_COMBINED_SHADER_OUTPUT_       Z+    GetIntegerv      8              limit on active image       3.10.22
  RESOURCES                                                             units, shader storage
                                                                        blocks, and fragment outputs
(The only change here is a rename of the token formerly called
MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS.)
Sample Code
The following example code records a list of fragment (x,y) coordinates
and colors in rasterized primitives into a buffer object. Fragment shader
code would include:
#extension GL_ARB_shader_storage_buffer_object : require
// Use an atomic counter to keep a running count of the number of
// fragments recorded in the shader storage buffer.
layout(binding=0, offset=0) uniform atomic_uint fragmentCounter;
// Keep a uniform with the number of fragments that can be recorded in
// the buffer.
uniform uint maxFragmentCount;
// Structure with the per-fragment information to record.
struct FragmentData {
ivec2 position;
vec4 color;
};
// Shader storage block holding an array <fragments> declared without
// a fixed size. Application code should determine how many fragments
// it wants to record and allocate a buffer appropriately. With the
// "std140" layout, each FragmentData record will take 32B. With other
// layouts, the stride of the array is implementation-dependent. The
// "binding=2" layout qualifier says that the block <Fragments> should
// be associated with shader storage buffer binding point #2.
layout(std140, binding=2) buffer Fragments {
FragmentData fragments[];
};
in vec4 color;
void main()
{
uint fragmentNumber = atomicCounterIncrement(fragmentCounter);
if (fragmentNumber < maxFragmentCount) {
fragments[fragmentNumber].position = ivec2(gl_FragCoord.xy);
fragments[fragmentNumber].color = color;
}
}
In application code
#define NFRAGMENTS 100000
#define FRAGMENT_SIZE 32 // known due to "std140" usage
GLuint fragmentBuffer, counterBuffer;
// Generate, bind, and specify the data store to hold fragments. The
// NULL pointer in BufferData says that the initial buffer contents are
// undefined. They will be filled in by the fragment shader code.
glGenBuffers(1, &fragmentBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, fragmentBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, NFRAGMENTS*FRAGMENT_SIZE,
NULL, GL_DYNAMIC_DRAW);
// Generate, bind, and specify the data store for the atomic counter.
glGenBuffers(1, &counterBuffer);
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, counterBuffer);
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), NULL,
GL_DYNAMIC_DRAW);
// Reset the atomic counter to zero, then draw stuff. This will record
// values into the shader storage buffer as fragments are generated.
GLuint zero = 0;
glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &zero);
glUseProgram(program);
glDrawElements(GL_TRIANGLES, ...);
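// You could inspect the contents with a call such as the following.
// The glMemoryBarrier() command ensures that the shader writes to the
// storage buffer are visible through the mapping.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
void *ptr = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);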
...
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
// You could also use the storage buffer contents for vertex pulling.
// The glMemoryBarrier() command ensures that the data writes to the
// storage buffer complete prior to vertex pulling.
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
glBindBuffer(GL_ARRAY_BUFFER, fragmentBuffer);
glVertexAttribIPointer(0, 2, GL_INT, FRAGMENT_SIZE,
(void*)0);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, FRAGMENT_SIZE,
(void*)16);
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glDrawArrays(GL_POINTS, ...);
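When reading the buffer back, application code needs a CPU-side view of each record. The following C sketch mirrors the "std140" layout of FragmentData used above (position at offset 0, color at offset 16, 32 bytes per record). The type and function names are hypothetical, and the sketch assumes a 4-byte float.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the shader's FragmentData structure
 * under "std140": ivec2 at offset 0, 8 bytes of padding, vec4 at 16. */
typedef struct FragmentRecord {
    int32_t position[2];  /* ivec2 position */
    int32_t pad[2];       /* padding so <color> starts on a 16B boundary */
    float   color[4];     /* vec4 color */
} FragmentRecord;

_Static_assert(sizeof(FragmentRecord) == 32,
               "record size must match FRAGMENT_SIZE");
_Static_assert(offsetof(FragmentRecord, color) == 16,
               "color offset must match the vertex attribute offset");

/* The shader increments the counter even for fragments it does not
 * record, so the final counter value can exceed the buffer capacity.
 * Clamp it to get the number of valid records. */
static uint32_t recorded_fragment_count(uint32_t counter_value,
                                        uint32_t capacity)
{
    return counter_value < capacity ? counter_value : capacity;
}
```

The pointer returned by glMapBuffer in the example above could then be treated as an array of FragmentRecord holding recorded_fragment_count(counter, NFRAGMENTS) valid entries, where <counter> is the value read back from the atomic counter buffer.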
Conformance Tests
TBD
Issues
(1) The main goal of this extension is to allow C-style GLSL shader code
to write to buffer objects without using roundabout hacks like
creating buffer textures and using shader image loads and stores.
What other approaches could we take to achieving the same thing?
RESOLVED: We are using "shader storage blocks" as an abstraction
similar to uniform blocks, except that we allow shaders to write to
"shader storage blocks". Other options considered include:
- Use uniform blocks, but with a special layout qualifier (e.g.,
"writeonly" or "readwrite") that implies different semantics and
implementation-dependent limits. This would avoid the need for a new
storage qualifier in the shading language, and could also avoid adding
new GL APIs to enumerate active buffer variables and shader storage
blocks. However, it would have the disadvantage of shoehorning two
features, which might be implemented very differently in hardware,
into a single abstraction.
- Use C-style pointer syntax as in NV_shader_buffer_store, but treat the
pointers as referring to a buffer binding rather than a specific GPU
address. In this approach, pointers might be required to be uniform.
(In NV_shader_buffer_store, pointers are just data. They can be
passed as uniforms, uniform block members, shader inputs/outputs,
reconstructed from texture data, or however the application wants to
pass them.)
(2) When using shader storage blocks to append records to a buffer, the
storage is provided by a buffer object. There doesn't seem to be any
reason why the shader really needs to know the "length" of the buffer.
It might therefore want to declare global storage blocks containing
unsized arrays. Should we allow this? If so, how does that interact
with bounds checking? What does it mean for the ".length()" method in
GLSL? What would happen if you tried to pass such an array as a
function parameter? What does it mean for a possible introspection
API allowing applications to query how big the block needs to be?
RESOLVED: We will support shader storage blocks whose last member is an
unsized array. For this unsized array, the effective size will be
determined at run-time from the size of the data store. Such unsized
arrays can be indexed with general integer expressions (other than
negative constant expressions, which are generally forbidden for array
indexing in GLSL). The ".length()" method is not supported, nor is
passing the array as a function argument.
When using the ARB_program_interface_query extension to enumerate the
set of active buffer variables, only the first element of arrays (sized
or unsized) will be enumerated; the array size and offsets for array
elements other than the first can be determined by querying the
TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE properties of the buffer
variable.
The bounds checking rules for unsized arrays at the end of shader
storage blocks are the same as for uniform blocks. If the array is
accessed using an index pointing at memory beyond the end of the buffer
object associated with the shader storage blocks, the results are
undefined and can lead to program termination; see also issue (7).
Other options considered here included having the shader declare an
array with a dummy size that's either unrealistically small (1 or 2) or
unrealistically large, and providing guarantees like:
- (small) if the last element of the storage block is an array, we
have defined behavior for indexed accesses off the end of the array,
as long as the effective offset is contained within the buffer; or
- (large) if the buffer is too small for the large declared array,
we have defined behavior for accesses to array elements as long as
the effective offset is contained within the buffer.
Note that it wouldn't be possible for the application to determine the
stride of an array of structures if it were declared with a size of 1.
For a size of 2 or larger, you could use
offset(array[1].member) - offset(array[0].member)
for "shared" layouts at least, but that's not possible if there is no
"array[1].member".
(3) Do we allow arrays of shader storage blocks?
RESOLVED: Yes; we already allow arrays of uniform blocks, where each
block instance has an identical layout but is backed by a separate
buffer object. It seems like we should do this here for consistency.
If we had overloaded the existing uniform block APIs (e.g., by applying
a "readwrite" layout qualifier to uniform blocks), it would be really
weird if we disallowed arrays of writeable uniform blocks since we
already allow it for regular (read-only) uniform blocks.
(4) We have typically provided some sort of "introspection" API where
application code written with no explicit knowledge of the shaders
used can discover properties of active variables. Should we provide
some here? If so, any pitfalls?
RESOLVED: Yes, we will provide an introspection API, but not as part of
this extension. Instead, we require support for the
ARB_program_interface_query extension, which provides a generic
mechanism for enumerating the set of active resources for a number of
"interfaces". This API includes interfaces for all active shader
storage blocks as well as all active buffer variables. Supporting
enumeration of these new resources was one of the primary motivations
for the generic ARB_program_interface_query extension; however, that
extension also added enumeration support for other resources that
previously had no enumeration API.
The enumeration of buffer variables follows slightly different rules
than other variables; in particular, only the first element of members
declared as arrays is enumerated. The previous enumeration rules would
have awful consequences when applied to large arrays of structures in
shader storage blocks. For example, the following declaration would
report 80K active uniforms, starting with "records[0].position" and
ending with "records[39999].texcoord". Ouch!
struct FragmentData {
vec4 position;
vec2 texcoord;
};
buffer FragmentInfo {
FragmentData records[40000];
};
Regular uniforms and UBOs also have exactly the same problem; the
primary difference is that current implementation limits on uniform
storage provide a bound on how bad this could get. Even those limits
might not actually bound the GetActiveUniform* badness, as the spec
doesn't require a program to link successfully for GetActiveUniform* to
enumerate uniforms.
(5) Uniform blocks already have a well-established usage model, for which
implementations may have dedicated support as well as limits that
reflect this usage model. If we were to overload uniform blocks, some
new uses might not meet this limit and usage model. Is that a
problem?
RESOLVED: Yes, it could have been a problem if we had overloaded
uniform blocks. Implementations may be able to distinguish between
different types of uniform blocks, which might be implemented
differently. One might be able to distinguish based on the size of the
block as well as the layout qualifier (i.e., "readwrite" might be
"different" than "readonly").
Note that if an implementation wants to use the size of the block as a
factor for determining how the block is accessed, this would introduce a
new wrinkle into the unsized array use case above. That might not be a
huge deal; implementations could make a worst-case assumption and treat
the effective size of an unsized array as resulting in a maximum-size
buffer object.
Note that this consideration applies equally to purely read-only uniform
storage. For example, implementations might have a limit on the size of
uniform blocks that can be accessed by shaders with accelerated hardware
support. However, applications might well want to store large data sets
in buffer objects and access them using random-access reads in shader
code. OpenGL 4.2's mechanisms allow data to be pulled from buffer
objects for vertex shaders using vertex buffers (but only using the
vertex/instance number as an index). Data can also be read from a
texture buffer object via texelFetch(), but that doesn't allow for more
complex data structures (as noted in the "write" example above).
It would be desirable to have a mechanism to allow random access reads
to "large" buffer objects, even if the implementation and performance
characteristics are different from regular UBO usage.
NVIDIA's NV_shader_buffer_load extension fills this need by allowing the
use of read-only pointers. That extension has been supported for a
longer time and is supported on more platforms than the
NV_shader_buffer_store mentioned above.
(6) The size of uniform blocks on typical OpenGL 3/4 implementations is
64KB. Is this good enough for shader storage buffers, or do we need a
higher limit?
RESOLVED: 64K is not good enough; a higher limit is required. The
current specification requires implementations to support shader storage
blocks of at least 2^24 bytes (16MB). Implementations may support
larger sizes; the maximum size can be determined by querying
MAX_SHADER_STORAGE_BLOCK_SIZE. Because implementations may choose to
support block sizes >= 2^31 bytes, applications should query the maximum
size with GetInteger64v().
(7) How are write accesses to shader storage blocks bounds-checked?
RESOLVED: For shader storage blocks, we use the same language found in
the current OpenGL 4.2 specification for uniform blocks, which
guarantees no bounds-checking:
| If any active uniform block is not backed by a sufficiently
| large buffer object, the results of shader execution are
| undefined, and may result in GL interruption or termination.
It would be desirable to have a "robustness" feature that provides
more solid guarantees when accessing outside the bounds of a buffer
object range. However, such a feature is not present in the existing
ARB_robustness specification and is considered orthogonal to the
functionality being added here.
If we were to add bounds-checking here or in the future, there may still
be issues of how bounds-checking would be performed, with multiple use
cases. For example, some existing UBO hardware might include hardware
bounds checking (e.g., return zeroes if accessing off the end of a
buffer object), but that support might not be extended to cover writes
or even some other read-only use cases.
If shader-based bounds checking is required, using code inserted by the
compiler, we'd have to figure out how to specify it. In particular,
we'd have to figure out what granularity the check should be done at. At the
byte/word level? Using the first index of the array?
struct FragmentData {
vec4 position;
vec2 texcoord;
};
layout(writeonly,binding=2) uniform FragmentInfo {
FragmentData records[40000];
};
In the example above, let's assume that the structure was tightly
packed, where each element of <records> requires exactly 24 bytes -- 16
for <position> and 8 for <texcoord>. If we bound a 32-byte buffer, what
would happen to reads/writes of records[1].position? Are reads/writes
of the x/y components guaranteed to work, with "out-of-bounds" behavior
on z/w? What about a 31-byte buffer -- do you read/write partial data
for records[1].position.y? What about a 40-byte buffer, which contains
sufficient storage for all of records[1].position? Is it guaranteed to
work, or should we allow implementations to treat accesses to array
elements as out of bounds unless the buffer contains storage for the
entire element (including records[1].texcoord in this case)?
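The byte arithmetic behind these questions can be worked through with a small C sketch. It only counts how many whole 4-byte components of a field lie inside the buffer, which is one possible granularity and not a rule adopted by this specification; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of whole 4-byte components of a field that
 * fall inside a buffer of <buffer_size> bytes. The field starts at byte
 * <field_offset> and has <components> components. */
static size_t components_in_bounds(size_t field_offset, size_t components,
                                   size_t buffer_size)
{
    size_t fit;
    if (buffer_size <= field_offset)
        return 0;
    fit = (buffer_size - field_offset) / 4;
    return fit < components ? fit : components;
}
```

With the tightly packed 24-byte records assumed above, records[1].position starts at byte 24. A 32-byte buffer holds two whole components (x and y), a 31-byte buffer holds one whole component plus part of y, and a 40-byte buffer holds all four.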
(8) Should we provide new "packing" layout qualifiers to augment the
existing vec4-centric "std140" rule for uniform blocks?
RESOLVED: Yes, add a new "std430" layout that provides for tighter
packing of arrays and structures. With "std140", the base alignment of
arrays of scalars and vectors and of structures is always a multiple of
the base alignment of a vec4 (16B), which means that the stride of an
array of type "float", "int", or "uint" is 16B instead of 4B. With
"std430", such arrays will now be tightly packed.
Note that in the "std430" packing, arrays of vec3s are still not tightly
packed; vec3 types still require a 16B alignment as in "std140".
Note that the "std430" layout is supported only for shader storage
blocks, and not for uniform blocks.
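The difference in array strides can be illustrated with a short C sketch. It covers only arrays of scalars and vectors with 4-byte components (float, int, uint and their vector types), as described in this issue; doubles, matrices, and structures are out of scope. The function names are hypothetical.

```c
#include <stddef.h>

/* Base alignment in bytes of a scalar or vector with <components>
 * 4-byte components (1 to 4): scalars align to 4, two-component vectors
 * to 8, and three- and four-component vectors to 16. */
static size_t base_alignment(size_t components)
{
    return components == 1 ? 4 : components == 2 ? 8 : 16;
}

/* Array stride under "std430": the element's own base alignment. */
static size_t array_stride_std430(size_t components)
{
    return base_alignment(components);
}

/* Array stride under "std140": the element's base alignment rounded up
 * to the base alignment of a vec4 (16 bytes). */
static size_t array_stride_std140(size_t components)
{
    size_t a = base_alignment(components);
    return a < 16 ? 16 : a;
}
```

An array of float therefore has a stride of 16 bytes under "std140" and 4 bytes under "std430", while an array of vec3 has a stride of 16 bytes under both.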
(9) Should we allow memory qualifiers ("coherent", "volatile", "restrict",
"readonly", and "writeonly") to apply to entire shader storage blocks?
To individual shader storage block members?
RESOLVED: We allow memory qualifiers to apply to both shader storage
blocks and block members (buffer variables). When a memory qualifier is
applied to a block declaration, it is considered to apply to all block
members.
Note that the extension NV_bindless_texture allows image variables
(which accept memory qualifiers) to be declared as members of shader
storage blocks (which also accept memory qualifiers). This spec adds an
interaction that says that if this case occurs, the qualifier is
considered to apply to the image handle, stored in the block, and not
the memory referenced by the image.
(10) Should we allow mutable assignments of storage blocks to binding
points?
RESOLVED: Yes, allow them in a manner similar to uniform blocks. By
contrast, OpenGL 4.2's atomic counter buffer feature requires the
"binding=N" layout in atomic counter declarations and doesn't let you
change the binding used post-link. We decided to use the same behavior as
uniform blocks, since the functionality seems so similar.
(11) Is this extension/feature really needed? Isn't it possible to do
something similar in unextended OpenGL 4.2?
RESOLVED: Yes, it's possible to achieve similar functionality in
unextended OpenGL 4.2, but something cleaner is clearly desirable.
One of the intended uses of OpenGL 4.2's atomic counter feature
(ARB_shader_atomic_counters) is to allow shader invocations to write
values generated by shaders into a buffer object, using the atomic
counters to reserve a unique slot number in an array of outputs. The
array itself is accessed by associating the buffer object with a buffer
texture (ARB_texture_buffer_object) and writing to that texture using
shader image stores (ARB_shader_image_load_store). There are a number
of unfortunate limitations of this approach:
* Buffers written to using image stores must have a 1- to 4-component
texture format associated with them. It's not possible to write out
an array of structures, though one can use multiple buffers with
each buffer holding a separate member.
* The image store function takes a canonical vec4/ivec4/uvec4 value to
write, regardless of the value stored. If you're only storing a
float or a vec2, you need to use a constructor (or a swizzle hack)
to generate a vec4 in which the extra components are ignored.
* The image store function takes signed integer coordinates (like the
texelFetch built-ins). However, the atomic counter returns an
unsigned value, and GLSL doesn't support implicit conversions from
unsigned to signed.
* Image stores to buffers require the use of a buffer texture, even
though we don't ever use it as a texture.
The solution offered here is far more direct -- shader code simply
declares the format of the buffer object as an interface block and can
read and write the buffer using normal shader code.
(12) Are there other extensions providing similar functionality?
RESOLVED: Yes. The NVIDIA extension NV_shader_buffer_store also
provides a mechanism where buffer objects can be written to with regular
shader code. Using that extension, an application is able to query a
GPU address of a buffer, make that buffer resident, and then access the
buffer in GLSL code using the queried GPU address as a pointer.
Applications using NV_shader_buffer_store are required to ensure that
pointers are valid and no automatic bounds checking is provided.
This proposed extension is intended to provide GLSL functionality
similar to what you can get with NV_shader_buffer_store, but without
general pointers. Instead, this extension uses bindings, with shader
code effectively extracting a pointer from the bound buffer.
(13) Do we need some sort of limit on the combined sum of actively used
shader storage blocks and other resources, similar to what we had for
image units in OpenGL 4.2
(MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS)?
RESOLVED: Yes. For this extension, we just add shader storage blocks
to the set of resources that have a combined limit and also create a new
general token name (MAX_COMBINED_SHADER_OUTPUT_RESOURCES) that is a new
alias of the old combined limit token.
Some OpenGL 4.2 and 4.3 implementations need to share a single set of
internal hardware resources to handle fragment shader outputs, image
loads and stores (from OpenGL 4.2 and ARB_shader_image_load_store), as
well as shader storage buffers. We specify that a link error will occur
if a program requires more of these internal resources than are
available. It is expected that implementations without a need for a
combined limit will expose a limit greater than or equal to the sum of
the individual limits for each shader stage and resource type.
This link error may have interaction problems with the
ARB_separate_shader_objects extension and OpenGL 4.1. When linking a
separable program, the linker will not know anything about the usage of
fragment shader outputs, image units, and shader storage blocks from
other programs that could be in use at the same time as the program
being linked. This makes it seemingly impossible to enforce a combined
limit. In practice, this is unlikely to be a problem because the
implementations needing to enforce this combined limit will support the
use of image uniforms and shader storage blocks only in fragment and
compute shaders, and those two stages can't run concurrently.
(14) Are accesses to shader storage buffers coherent with other accesses
to the same underlying resource (e.g., image loads/stores, texture
fetches)? In the same shader invocation? In different shader
invocations?
RESOLVED: No; we don't guarantee coherent accesses between shader
resources of different types. Spec language corresponding to this issue
will be proposed outside this extension.
(15) Do we really need to have a combined limit on the sum of the number
of active shader storage blocks for each program stage
(MAX_COMBINED_SHADER_STORAGE_BLOCKS)?
RESOLVED: We include such a limit, following the precedent of providing
a combined limit for each new resource with per-stage limits. It's not
clear that this combined limit is needed by any current implementation,
though we envision an implementation that could have a set of physical
resources shared between shader stages without providing a full set of
resources for every stage.
Some implementations do need a combined limit on the number of fragment
shader outputs, image uniforms, and shader storage blocks, which is
handled by the separate MAX_COMBINED_SHADER_OUTPUT_RESOURCES limit
discussed in issue (13).
(16) How does an application determine the required buffer object size for
a shader storage block whose last member is an unsized array?
RESOLVED: The ARB_program_interface_query extension includes a property
BUFFER_SIZE that can be queried for active shader storage blocks. For
blocks where all members have known storage requirements, the value of
this property gives the minimum buffer size required to back the shader
storage block.
For shader storage blocks ending in an unsized array, the BUFFER_SIZE
property returns the minimum buffer size needed to store a single
element in the unsized array. The actual storage requirements are a
function of the number of elements the application wants to store in the
buffer object. If an application needs to store N elements in the
unsized array, the required size can be derived by
    minimum_size = buffer_size + (N-1) * top_level_stride
where <buffer_size> is the value of the BUFFER_SIZE property of the
shader storage block, and <top_level_stride> is the value of the
TOP_LEVEL_STRIDE property for the unsized array.
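The size computation above can be sketched as a small C helper. The
function name is hypothetical. The sketch assumes <buffer_size> and
<top_level_stride> have already been obtained from the BUFFER_SIZE and
TOP_LEVEL_STRIDE queries, and it treats a request for zero elements like
a request for one, since BUFFER_SIZE already accounts for a single
element.

```c
#include <stddef.h>

/* Hypothetical helper: minimum buffer object size needed to back a
 * shader storage block whose last member is an unsized array holding
 * n elements.
 *   buffer_size      - queried BUFFER_SIZE of the block
 *   top_level_stride - queried TOP_LEVEL_STRIDE of the unsized array */
static size_t ssbo_min_size(size_t buffer_size,
                            size_t top_level_stride,
                            size_t n)
{
    if (n == 0) {
        n = 1;  /* assumption of this sketch: never go below BUFFER_SIZE */
    }
    return buffer_size + (n - 1) * top_level_stride;
}
```

For example, with a BUFFER_SIZE of 32 and a TOP_LEVEL_STRIDE of 16,
storing 4 elements requires 32 + 3 * 16 = 80 bytes.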
Note that when using the "std140" layout qualifier, applications can
determine the layout of shader storage blocks without any queries by
following the layout rules documented in the API specification.
(17) Should we provide GLSL constants for the implementation-dependent
limits in this specification (e.g., gl_MaxVertexShaderStorageBlocks)?
RESOLVED: No. It's not clear that these constants are of any real
value, and they've been specified inconsistently. In particular, we
have a bunch of constants for atomic counters, atomic counter buffers,
and image units/uniforms, but we don't have any limits for uniform
blocks (ARB_uniform_buffer_object).
(18) Other than the last member of a shader storage block, should we allow
block members declared without an explicit size?
RESOLVED: Yes, for consistency with the rest of GLSL. GLSL in general
allows for arrays declared without a size. Such arrays are implicitly
sized by the compiler based on usage. For example, if a shader includes
code such as:
    uniform int array[]; // no explicit size
    ...
    expression = array[2] * array[9]; // only references to <array>
the array is likely to be implicitly sized to 10 elements, since it
needs to provide storage for array[9]. These implicitly sized arrays
are also permitted in interface blocks, such as uniform blocks.
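The implicit sizing rule in the example above (one more than the largest
constant index used) can be modeled with a short C sketch. This is not
compiler code; the function name is hypothetical, and returning zero for
an array that is never referenced is an assumption of the sketch rather
than a statement about GLSL.

```c
#include <stddef.h>

/* Hypothetical model of implicit array sizing: given the constant
 * indices used to reference an array declared without a size, return
 * the smallest size that provides storage for all of them. */
static int implicit_array_size(const int *indices, size_t count)
{
    int size = 0;  /* assumption: no references yields size 0 */
    for (size_t i = 0; i < count; ++i) {
        if (indices[i] + 1 > size) {
            size = indices[i] + 1;
        }
    }
    return size;
}
```

Applied to the references array[2] and array[9], the model yields a size
of 10.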
When an implicitly sized array is declared in shader code, there are
limitations on how the array can be used. Such arrays may not be passed
to functions in their entirety or used by the ".length()" method.
Additionally, the array may only be indexed with integer constant
expressions.
If the last member of a shader storage block is declared as an array
without an explicit size, it will be considered to be an explicitly
unsized array whose size will be inferred at run-time based on the
provided buffer object. Such arrays can be indexed with arbitrary
expressions, but can not be passed as function arguments or be used by
the ".length()" method.
Note that when using uniform or shader storage blocks using the "shared"
or "std140" layout qualifier, shaders should avoid using implicitly
sized arrays. In this case, the size will be inferred by the compiler
based on shader code and might not be computed identically for multiple
programs using the same block.
(19) Should the ".length()" method be supported for unsized arrays at the
end of a shader storage block? If not, how can shader code determine
the effective size of an unsized array?
RESOLVED: In previous versions of GLSL, the ".length()" method is not
supported for arrays without a declared size, which means that its value
is known at compile time. As a result, the value returned by
".length()" is considered a constant expression.
In this extension, we allow unsized arrays at the end of shader storage
blocks, and allow the ".length()" method to be used to determine the
size of such arrays based on the size of the provided buffer object.
The array size can be derived by reversing the process described in
issue (16):
    array.length() =
      max((buffer_object_size - offset_of_array) / stride_of_array, 0)
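The formula above can be sketched in C as follows. The function name is
hypothetical and the parameter names simply mirror the formula. The
sketch uses signed 64-bit arithmetic so that a buffer smaller than the
array's offset produces a non-positive quotient that is clamped to zero,
and it assumes <stride_of_array> is greater than zero.

```c
#include <stdint.h>

/* Hypothetical helper: effective element count of an unsized array at
 * the end of a shader storage block, given the size of the backing
 * buffer. Assumes stride_of_array > 0. Integer division discards any
 * partial trailing element. */
static int64_t ssbo_array_length(int64_t buffer_object_size,
                                 int64_t offset_of_array,
                                 int64_t stride_of_array)
{
    int64_t n = (buffer_object_size - offset_of_array) / stride_of_array;
    return n > 0 ? n : 0;  /* max(..., 0) from the formula */
}
```

For example, an 80-byte buffer with the array at offset 16 and a stride
of 16 gives (80 - 16) / 16 = 4 elements, while an 8-byte buffer gives 0.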
Given that we will support the ".length()" method on unsized arrays, we
will also support it on implicitly sized arrays for consistency. For such
arrays, the array size will be determined at link time but will not be
considered a constant expression.
Revision History
Revision 16, April 28, 2014 (pbrown)
- Fix typo in description of MAX_COMBINED_SHADER_STORAGE_BLOCKS.
Revision 15, September 23, 2013 (Jon Leech)
- Fix typo ShaderStorageBinding -> ShaderStorageBlockBinding in the
description of that command (Bug 10715).
Revision 14, September 6, 2013 (Jon Leech)
- Fix typo SHADER_STORAGE_BLOCK -> SHADER_STORAGE_BUFFER in the
description of ShaderStorageBlockBinding (Bug 10795).
Revision 13, June 1, 2012 (pbrown)
- Mark issues (8) and (9) as resolved.
Revision 12, May 31, 2012 (pbrown)
- Modify spec to allow the "std430" layout qualifier only on shader
storage blocks, not uniform blocks (bug 8992).
Revision 11, May 14, 2012 (pbrown)
- Further clarify the interaction with ARB_compute_shader on atomic
memory functions; add a clarification that no #extension directive is
needed to use these functions on shared memory variables in compute
shaders.
Revision 10, May 8, 2012 (pbrown)
- Add explicit language specifying that the value returned by the
.length() method for unsized arrays is undefined when the array is in
an array of blocks dereferenced with an out-of-bounds index.
Revision 9, May 7, 2012 (pbrown)
- Allow the use of the .length() method on unsized and implicitly sized
arrays. For unsized arrays in shader storage blocks, .length() will
be computed from the size of the associated buffer object. For
implicitly sized arrays, .length() will be determined at link time.
Revision 8, May 3, 2012 (pbrown)
- Add a "std430" layout qualifier supporting more tightly packed arrays
and structures relative to "std140" for issue (8).
- Add support for memory qualifiers on shader storage block declarations
for issue (9), also add more explicit language on how these qualifiers
work on buffer variables.
- Add spec language making it illegal to use "readonly" and "writeonly"
memory qualifiers on the same declaration.
- Remove built-in constants for shader storage block implementation
limits, as described in issue (17).
- Mark various spec issues as resolved per the Khronos F2F.
- Add interaction with NV_bindless_texture, describing the behavior of
memory qualifiers on image variables inside shader storage blocks.
Revision 7, April 25, 2012 (pbrown)
- Remove the GLSL spec language generally disallowing unsized arrays in
interface blocks (bug 8837). We have supported implicitly sized
arrays in blocks in previous versions of GLSL and decided to retain
backward compatibility.
- Added a warning in the description of the "shared" layout qualifier
indicating that such blocks might not be shareable between programs if
they contain implicitly-sized array members.
- Minor typo/wording fixes.
- Fixed token table to describe all the general query functions
(e.g., GetIntegerv, GetInteger64v) where certain tokens can be used.
- Update the spec to require dynamically uniform indexing on arrays of
shader storage blocks.
- Added issues (18) and (19).
Revision 6, April 16, 2012 (pbrown)
- Tentatively add built-in constants for implementation limits on shader
storage blocks, as well as new issue (17) on the topic.
Revision 5, April 13, 2012 (pbrown)
- Add missing #extension and #define built-in documentation for the GLSL
part of the extension.
- Add GLSL spec language documenting support for unsized arrays at the
end of shader storage blocks.
- Add GLSL spec language generally disallowing unsized arrays in
interface blocks, including input/output blocks, uniform blocks, and
shader storage buffers (bug 8837). This borrows from similar language
where unsized arrays are not permitted in structures.
- Extend the tables describing API tokens enumerating GLSL types to
indicate the set of types that can be used for buffer variables.
- Add sample code.
- Update language for several issues, and mark them as resolved.
- Add an issue indicating how an application can determine the required
size of a shader storage buffer when using unsized arrays.
Revision 4, April 12, 2012 (pbrown)
- Remove the enumeration APIs for buffer variables and shader storage
blocks; these resources can only be enumerated using the new APIs
provided by the ARB_program_interface_query extension.
- Add an interaction with ARB_program_interface_query, and have this
spec require that extension to ensure that the queries are available.
- Add a new interaction with ARB_compute_shader; the atomic memory
functions provided in this extension for buffer variables can also be
used for shared variables in compute shaders. Also add new compute
shader limit for active storage blocks.
- Add values for new enumerants in this extension.
- Fix up the "New Procedures and Functions" and "New Tokens" sections.
- Assign enumerant values for all tokens.
- Add a new token MAX_COMBINED_SHADER_OUTPUT_RESOURCES that's an alias
for MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS. That combined
limit now needs to apply to fragment outputs, image units, and shader
storage blocks.
- General cleanup of API specification language for shader storage
blocks.
- Add documentation of per-stage and combined limits in "Shader Execution"
spec language, and a validation error for exceeding combined limits with
separate program objects.
- Add new edits to Appendix A and Appendix D.
- Add appropriate text to the Dependencies, New Errors, New State, and
New Implementation-Dependent State sections.
- Add some new issues; update issue (13).
Revision 3, January 23, 2012 (pbrown)
- Add actual spec language in place of the previous "here's our options"
overview. Clean up the overview and issues section to reflect the
general approach chosen in the initial feature discussion.
- Note: Lists of new enumerants, functions, state, and errors have not
been built yet.
Revision 2, January 3, 2012 (pbrown)
- Move issues from overview to separate section in preparation for
further edits; no other changes.
Revision 1, October 26, 2011 (pbrown)
- Initial sketch/proposal, containing only an introduction and issues
list.