Name Strings
Daniel Koch, NVIDIA Corporation
Neil Henning, Codeplay
Contributors to GL_KHR_shader_subgroup (GLSL)
James Glanville, Imagination
Jan-Harald Fredriksen, Arm
Graeme Leese, Broadcom
Jesse Hall, Google
Approved by the OpenGL Working Group on 2019-05-29
Approved by the OpenGL ES Working Group on 2019-05-29
Approved by the Khronos Promoters on 2019-07-26
Last Modified: 2019-07-26
Revision: 8
ARB Extension #196
OpenGL ES Extension #321
This extension is written against the OpenGL 4.6 Specification
(Core Profile), dated July 30, 2017.
This extension requires OpenGL 4.3 or OpenGL ES 3.1.
This extension requires the KHR_shader_subgroup GLSL extension.
This extension interacts with ARB_gl_spirv and OpenGL 4.6.
This extension interacts with ARB_spirv_extensions and OpenGL 4.6.
This extension interacts with OpenGL ES 3.x.
This extension interacts with ARB_shader_draw_parameters and
SPV_KHR_shader_draw_parameters.
This extension interacts with SPV_KHR_storage_buffer_storage_class.
This extension requires SPIR-V 1.3 when SPIR-V is supported in OpenGL.
This extension enables support for the KHR_shader_subgroup shading
language extension in OpenGL and OpenGL ES.
The extension adds API queries to be able to query
- the size of subgroups in this implementation (SUBGROUP_SIZE_KHR)
- which shader stages support subgroup operations
(SUBGROUP_SUPPORTED_STAGES_KHR)
- which subgroup features are supported (SUBGROUP_SUPPORTED_FEATURES_KHR)
- whether quad subgroup operations are supported in all
stages supporting subgroup operations (SUBGROUP_QUAD_ALL_STAGES_KHR)
In OpenGL implementations supporting SPIR-V, this extension enables the
minimal subset of SPIR-V 1.3 which is required to support the subgroup
features that are supported by the implementation.
In OpenGL ES implementations, this extension does NOT add support for
SPIR-V or for any of the built-in shading language functions (8.18)
that have genDType (double) prototypes.
New Procedures and Functions
New Tokens
Accepted as the <pname> argument for GetIntegerv and
Accepted as the <pname> argument for GetBooleanv:
Returned as a bitfield in the <data> argument when GetIntegerv
is queried with a <pname> of SUBGROUP_SUPPORTED_STAGES_KHR
(existing tokens)
Returned as bitfield in the <data> argument when GetIntegerv
is queried with a <pname> of SUBGROUP_SUPPORTED_FEATURES_KHR:
Modifications to the OpenGL 4.6 Specification (Core Profile)
Add a new Chapter SG, "Subgroups"
A subgroup is a set of invocations that can synchronize and share data
with each other efficiently. An invocation group is partitioned into
one or more subgroups.
Subgroup operations are divided into various categories as described
below.
SG.1 Subgroup Operations
Subgroup operations are divided into a number of categories as
described in this section.
SG.1.1 Basic Subgroup Operations
The basic subgroup operations allow two classes of functionality within
shaders - elect and barrier. Invocations within a subgroup can choose a
single invocation to perform some task for the subgroup as a whole using
elect. Invocations within a subgroup can perform a subgroup barrier to
ensure the ordering of execution or memory accesses within a subgroup.
Barriers can be performed on buffer memory accesses, shared memory
accesses, and image memory accesses to ensure that any results written are
visible by other invocations within the subgroup. A _subgroupBarrier_ can
also be used to perform a full execution control barrier. A full execution
control barrier will ensure that each active invocation within the
subgroup reaches a point of execution before any are allowed to continue.
SG.1.2 Vote Subgroup Operations
The vote subgroup operations allow invocations within a subgroup to
compare values across a subgroup. The types of votes enabled are:
* Do all active subgroup invocations agree that an expression is true?
* Do any active subgroup invocations evaluate an expression to true?
* Do all active subgroup invocations have the same value of an expression?
These operations are useful in combination with control flow in that
they allow for developers to check whether conditions match across the
subgroup and choose potentially faster code-paths in these cases.
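The three vote types can be illustrated with a small host-side model. This is only a sketch of the semantics described above, not shader code: the function names, the array-based representation of a subgroup, and the MAX_SUBGROUP_SIZE macro are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBGROUP_SIZE 128 /* upper bound on subgroup size, see SG.2.1 */

/* value[i] is what invocation i supplies; active[i] says whether it votes. */

/* True when every active invocation supplied a true value. */
bool vote_all(const bool *value, const bool *active, size_t n) {
    assert(n <= MAX_SUBGROUP_SIZE);
    for (size_t i = 0; i < n; ++i)
        if (active[i] && !value[i]) return false;
    return true;
}

/* True when at least one active invocation supplied a true value. */
bool vote_any(const bool *value, const bool *active, size_t n) {
    assert(n <= MAX_SUBGROUP_SIZE);
    for (size_t i = 0; i < n; ++i)
        if (active[i] && value[i]) return true;
    return false;
}

/* True when all active invocations supplied the same integer value. */
bool vote_all_equal(const int *value, const bool *active, size_t n) {
    assert(n <= MAX_SUBGROUP_SIZE);
    bool seen = false;
    int first = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!active[i]) continue;
        if (!seen) { first = value[i]; seen = true; }
        else if (value[i] != first) return false;
    }
    return true;
}
```

Note that inactive invocations never influence the result, which is why a vote can succeed inside divergent control flow even when some invocations in the subgroup would disagree.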
SG.1.3 Arithmetic Subgroup Operations
The arithmetic subgroup operations allow invocations to perform scan
and reduction operations across a subgroup. For reduction operations,
each invocation in a subgroup will obtain the same result of these
arithmetic operations applied across the subgroup. For scan operations,
each invocation in the subgroup will perform an inclusive or exclusive
scan, cumulatively applying the operation across the invocations in a
subgroup in linear order. The operations supported
are add, mul, min, max, and, or, xor.
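The difference between a reduction and the two scan flavours is easiest to see on concrete numbers. The following host-side sketch models the add operation only; it assumes that all invocations are active and that the scan visits invocations in increasing invocation index. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* in[i] is the value contributed by invocation i of the subgroup. */

/* Reduction: every invocation would receive this same total. */
int reduce_add(const int *in, size_t n) {
    assert(n <= 128);
    int acc = 0;
    for (size_t i = 0; i < n; ++i) acc += in[i];
    return acc;
}

/* Inclusive scan: out[i] includes invocation i's own contribution. */
void inclusive_scan_add(const int *in, int *out, size_t n) {
    assert(n <= 128);
    int acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Exclusive scan: out[i] covers only the invocations before i, so the
 * first invocation receives the identity of the operation (0 for add). */
void exclusive_scan_add(const int *in, int *out, size_t n) {
    assert(n <= 128);
    int acc = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = acc;
        acc += in[i];
    }
}
```

For the inputs 1, 2, 3, 4 the reduction yields 10, the inclusive scan yields 1, 3, 6, 10, and the exclusive scan yields 0, 1, 3, 6.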
SG.1.4 Ballot Subgroup Operations
The ballot subgroup operations allow invocations to perform more
complex votes across the subgroup. The ballot functionality allows
all invocations within a subgroup to provide a boolean value and get
as a result what each invocation provided as their boolean value. The
broadcast functionality allows values to be broadcast from an
invocation to all other invocations within the subgroup, given that
the invocation to be broadcast from is known at shader compilation
time.
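A ballot can be thought of as packing one boolean per invocation into a bitmask. The sketch below models that on the host, using four 32-bit words so that the maximum subgroup size of 128 fits. The word layout, the helper names, and the treatment of inactive invocations as contributing a zero bit are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i of the mask is set when invocation i is active and voted true. */
void subgroup_ballot(const bool *value, const bool *active, size_t n,
                     uint32_t mask[4]) {
    assert(n <= 128);
    for (size_t k = 0; k < 4; ++k) mask[k] = 0;
    for (size_t i = 0; i < n; ++i)
        if (active[i] && value[i])
            mask[i / 32] |= (uint32_t)1 << (i % 32);
}

/* Recover the boolean that invocation i contributed to the ballot. */
bool ballot_bit_extract(const uint32_t mask[4], size_t i) {
    assert(i < 128);
    return ((mask[i / 32] >> (i % 32)) & 1u) != 0;
}

/* Number of invocations that voted true. */
unsigned ballot_bit_count(const uint32_t mask[4]) {
    unsigned count = 0;
    for (size_t k = 0; k < 4; ++k)
        for (uint32_t w = mask[k]; w != 0; w &= w - 1) ++count;
    return count;
}

/* Broadcast: every invocation reads the value held by invocation src. */
int subgroup_broadcast(const int *in, size_t n, size_t src) {
    assert(src < n);
    return in[src];
}
```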
SG.1.5 Shuffle Subgroup Operations
The shuffle subgroup operations allow invocations to read values from
other invocations within a subgroup.
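As a rough host-side model, a shuffle is a gather: each invocation names the invocation it wants to read from. The sketch below shows a general shuffle and an XOR-indexed variant. The function names are hypothetical, and keeping the invocation's own value when the XOR source falls outside the subgroup is a modelling choice, not something this document specifies.

```c
#include <assert.h>
#include <stddef.h>

/* General shuffle: invocation i reads the value held by invocation src[i]. */
void shuffle(const int *in, const size_t *src, int *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(src[i] < n);
        out[i] = in[src[i]];
    }
}

/* XOR shuffle: invocation i reads from invocation (i ^ mask). When that
 * index is out of range this model simply keeps the invocation's own value. */
void shuffle_xor(const int *in, size_t mask, int *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        size_t s = i ^ mask;
        out[i] = (s < n) ? in[s] : in[i];
    }
}
```

With the inputs 10, 20, 30, 40, an XOR mask of 1 swaps neighbouring pairs and produces 20, 10, 40, 30.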
SG.1.6 Shuffle Relative Subgroup Operations
The shuffle relative subgroup operations allow invocations to read
values from other invocations within the subgroup relative to the
current invocation in the group. The relative operations supported
allow data to be shifted up and down through the invocations within
a subgroup.
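Shifting data up or down can be modelled on the host as reading from a fixed offset relative to the current invocation index. In this sketch the function names are hypothetical, and invocations whose source would fall outside the subgroup keep their own value, which is purely a modelling choice.

```c
#include <assert.h>
#include <stddef.h>

/* Shift up: invocation i reads from invocation i - delta, so data moves
 * towards higher invocation indices. */
void shuffle_up(const int *in, size_t delta, int *out, size_t n) {
    assert(n <= 128);
    for (size_t i = 0; i < n; ++i)
        out[i] = (i >= delta) ? in[i - delta] : in[i];
}

/* Shift down: invocation i reads from invocation i + delta, so data moves
 * towards lower invocation indices. */
void shuffle_down(const int *in, size_t delta, int *out, size_t n) {
    assert(n <= 128);
    for (size_t i = 0; i < n; ++i)
        out[i] = (delta < n - i) ? in[i + delta] : in[i];
}
```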
SG.1.7 Clustered Subgroup Operations
The clustered subgroup operations allow invocations to perform
arithmetic operations among partitions of a subgroup, such that the
operation is only performed within the subgroup invocations within a
partition. The partitions for clustered subgroup operations are
consecutive power-of-two size groups of invocations and the cluster size
must be known at compilation time. The operations supported are
add, mul, min, max, and, or, xor.
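The partitioning into consecutive power-of-two clusters can be sketched on the host as follows. This models only the add operation with all invocations active; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool is_power_of_two(size_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Each run of `cluster` consecutive invocations is reduced independently,
 * and every invocation receives the sum of its own cluster. */
void clustered_add(const int *in, int *out, size_t n, size_t cluster) {
    assert(is_power_of_two(cluster));
    for (size_t base = 0; base < n; base += cluster) {
        size_t end = (n - base < cluster) ? n : base + cluster;
        int sum = 0;
        for (size_t i = base; i < end; ++i) sum += in[i];
        for (size_t i = base; i < end; ++i) out[i] = sum;
    }
}
```

With eight invocations holding 1 through 8 and a cluster size of 4, the first four invocations receive 10 and the last four receive 26.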
SG.1.8 Quad Subgroup Operations
The quad subgroup operations allow clusters of 4 invocations (a quad),
to share data efficiently with each other. For fragment shaders, if the
value of SUBGROUP_SIZE_KHR is at least 4, each quad corresponds to one
of the groups of four shader invocations used for derivatives. The order
in which the fragments appear within the quad is implementation-defined.
In OpenGL and OpenGL ES, the order of invocations within a quad may
depend on the rendering orientation and whether rendering to a framebuffer
object or to the default framebuffer (window).
This language supersedes the quad arrangement described in the GLSL
KHR_shader_subgroup document.
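As an illustration of data sharing inside a quad, the sketch below models a quad broadcast on the host. It assumes that quads are formed from consecutive groups of four invocation indices and says nothing about where each invocation sits on screen, since that ordering is implementation-defined. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Every invocation reads the value held by position `lane` (0..3) of its
 * own quad. Quads are modelled as invocations 4k .. 4k+3. */
void quad_broadcast(const int *in, unsigned lane, int *out, size_t n) {
    assert(lane < 4);
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; ++i) {
        size_t quad_base = i & ~(size_t)3;
        out[i] = in[quad_base + lane];
    }
}
```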
SG.2 Subgroup Queries
SG.2.1 Subgroup Size
The subgroup size is the maximum number of invocations in a subgroup.
This is an implementation-dependent value which can be obtained by
calling GetIntegerv with a <pname> of SUBGROUP_SIZE_KHR. This value
is also provided in the gl_SubgroupSize built-in shading language
variable. The subgroup size must be at least 1, and must be a power
of 2. The maximum number of invocations an implementation can support
per subgroup is 128.
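An application might sanity-check the value returned for SUBGROUP_SIZE_KHR against the constraints above. The helpers below are a hypothetical sketch: the argument is assumed to be the integer obtained from GetIntegerv, and the subgroup count is only a lower bound, because nothing here says an implementation fills every subgroup completely.

```c
#include <assert.h>
#include <stdbool.h>

/* At least 1, at most 128, and a power of two. */
bool subgroup_size_is_valid(int size) {
    return size >= 1 && size <= 128 && (size & (size - 1)) == 0;
}

/* Fewest subgroups that could hold `invocations` invocations. */
unsigned min_subgroups_for(unsigned invocations, unsigned subgroup_size) {
    assert(subgroup_size_is_valid((int)subgroup_size));
    return (invocations + subgroup_size - 1) / subgroup_size;
}
```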
SG.2.2 Subgroup Supported Stages
Subgroup operations may not be supported in all shader stages. To
determine which shader stages support the subgroup operations, call
GetIntegerv with a <pname> of SUBGROUP_SUPPORTED_STAGES_KHR. On
return, <data> will contain the bitwise OR of the *_SHADER_BIT flags
indicating which of the vertex, tessellation control, tessellation
evaluation, geometry, fragment, and compute shader stages support
subgroup operations. All implementations must support at least
SG.2.3 Subgroup Supported Operations
To determine which subgroup operations are supported by an
implementation, call GetIntegerv with a <pname> of
SUBGROUP_SUPPORTED_FEATURES_KHR. On return, <data> will
contain the bitwise OR of the SUBGROUP_FEATURE_*_BIT_KHR
flags indicating which subgroup operations are supported by the
implementation. Possible values include:
* SUBGROUP_FEATURE_BASIC_BIT_KHR indicates the GL supports shaders
with the KHR_shader_subgroup_basic extension enabled. See SG.1.1.
* SUBGROUP_FEATURE_VOTE_BIT_KHR indicates the GL supports shaders
with the KHR_shader_subgroup_vote extension enabled. See SG.1.2.
* SUBGROUP_FEATURE_ARITHMETIC_BIT_KHR indicates the GL supports
shaders with the KHR_shader_subgroup_arithmetic extension enabled.
See SG.1.3.
* SUBGROUP_FEATURE_BALLOT_BIT_KHR indicates the GL supports
shaders with the KHR_shader_subgroup_ballot extension enabled.
See SG.1.4.
* SUBGROUP_FEATURE_SHUFFLE_BIT_KHR indicates the GL supports
shaders with the KHR_shader_subgroup_shuffle extension enabled.
See SG.1.5.
* SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT_KHR indicates the GL
supports shaders with the KHR_shader_subgroup_shuffle_relative
extension enabled. See SG.1.6.
* SUBGROUP_FEATURE_CLUSTERED_BIT_KHR indicates the GL supports
shaders with the KHR_shader_subgroup_clustered extension enabled.
See SG.1.7.
* SUBGROUP_FEATURE_QUAD_BIT_KHR indicates the GL supports shaders
with the KHR_shader_subgroup_quad extension enabled. See SG.1.8.
All implementations must support SUBGROUP_FEATURE_BASIC_BIT_KHR.
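Decoding the returned bitfield is ordinary bit testing. The sketch below is self-contained so that it compiles without GL headers. The numeric token values are not listed in this section and are assumptions here; a real application should take them from its GL extension header. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMED token values, defined locally for illustration only. */
#define SUBGROUP_FEATURE_BASIC_BIT_KHR            0x00000001u
#define SUBGROUP_FEATURE_VOTE_BIT_KHR             0x00000002u
#define SUBGROUP_FEATURE_ARITHMETIC_BIT_KHR       0x00000004u
#define SUBGROUP_FEATURE_BALLOT_BIT_KHR           0x00000008u
#define SUBGROUP_FEATURE_SHUFFLE_BIT_KHR          0x00000010u
#define SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT_KHR 0x00000020u
#define SUBGROUP_FEATURE_CLUSTERED_BIT_KHR        0x00000040u
#define SUBGROUP_FEATURE_QUAD_BIT_KHR             0x00000080u

/* Writes the names of the supported categories into names[] (room for 8)
 * and returns how many were written. */
size_t describe_features(unsigned features, const char *names[8]) {
    static const struct { unsigned bit; const char *name; } table[] = {
        { SUBGROUP_FEATURE_BASIC_BIT_KHR,            "basic" },
        { SUBGROUP_FEATURE_VOTE_BIT_KHR,             "vote" },
        { SUBGROUP_FEATURE_ARITHMETIC_BIT_KHR,       "arithmetic" },
        { SUBGROUP_FEATURE_BALLOT_BIT_KHR,           "ballot" },
        { SUBGROUP_FEATURE_SHUFFLE_BIT_KHR,          "shuffle" },
        { SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT_KHR, "shuffle_relative" },
        { SUBGROUP_FEATURE_CLUSTERED_BIT_KHR,        "clustered" },
        { SUBGROUP_FEATURE_QUAD_BIT_KHR,             "quad" },
    };
    assert(names != NULL);
    size_t count = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (features & table[i].bit) names[count++] = table[i].name;
    return count;
}

/* The basic category is mandatory, so a bitfield without it is suspect. */
bool features_include_required(unsigned features) {
    return (features & SUBGROUP_FEATURE_BASIC_BIT_KHR) != 0;
}
```

In an application, the `features` argument would be the integer written by GetIntegerv when queried with SUBGROUP_SUPPORTED_FEATURES_KHR.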
SG.2.4 Subgroup Quads Support
To determine whether subgroup quad operations (See SG.1.8) are
available in all stages, call GetBooleanv with a <pname> of
SUBGROUP_QUAD_ALL_STAGES_KHR. On return, <data> will be TRUE
if subgroup quad operations are supported in all shader stages
which support subgroup operations. FALSE is returned if subgroup quad
operations are not supported, or if they are restricted to fragment
and compute stages.
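Combining the three queries, an application could decide whether quad operations are usable in a given stage along the following lines. This is a sketch under stated assumptions: the *_SHADER_BIT values and the quad feature bit are defined locally for illustration and should come from the GL headers in real code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMED values, defined locally so the sketch compiles without GL headers. */
#define VERTEX_SHADER_BIT             0x00000001u
#define FRAGMENT_SHADER_BIT           0x00000002u
#define GEOMETRY_SHADER_BIT           0x00000004u
#define COMPUTE_SHADER_BIT            0x00000020u
#define SUBGROUP_FEATURE_QUAD_BIT_KHR 0x00000080u

/* stage_bit:        exactly one *_SHADER_BIT flag
 * supported_stages: result of the SUBGROUP_SUPPORTED_STAGES_KHR query
 * features:         result of the SUBGROUP_SUPPORTED_FEATURES_KHR query
 * quad_all_stages:  result of the SUBGROUP_QUAD_ALL_STAGES_KHR query */
bool quad_ops_usable(unsigned stage_bit, unsigned supported_stages,
                     unsigned features, bool quad_all_stages) {
    assert(stage_bit != 0 && (stage_bit & (stage_bit - 1)) == 0);
    if (!(features & SUBGROUP_FEATURE_QUAD_BIT_KHR)) return false;
    if (!(supported_stages & stage_bit)) return false;
    if (quad_all_stages) return true;
    /* Otherwise quad operations are restricted to fragment and compute. */
    return stage_bit == FRAGMENT_SHADER_BIT || stage_bit == COMPUTE_SHADER_BIT;
}
```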
Modifications to Appendix C of the OpenGL 4.6 (Core Profile) Specification
(The OpenGL SPIR-V Execution Environment)
Modifications to section C.1 (Required Versions and Formats) [p661]
Replace the first sentence with the following:
"Implementations must support the 1.0 and 1.3 versions of SPIR-V
and the 1.0 version of the SPIR-V Extended Instructions
for the OpenGL Shading Language (see section 1.3.4)."
Modifications to section C.2 (Valid SPIR-V Built-In Variable
Decorations) [661]
Add the following rows to Table C.1 (Built-in Variable Decorations)
NumSubgroups (if SUBGROUP_FEATURE_BASIC_BIT_KHR is supported)
SubgroupId (if SUBGROUP_FEATURE_BASIC_BIT_KHR is supported)
SubgroupSize (if SUBGROUP_FEATURE_BASIC_BIT_KHR is supported)
SubgroupLocalInvocationId (if SUBGROUP_FEATURE_BASIC_BIT_KHR is supported)
SubgroupEqMask (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
SubgroupGeMask (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
SubgroupGtMask (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
SubgroupLeMask (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
SubgroupLtMask (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
Additions to section C.3 (Valid SPIR-V Capabilities):
Add the following rows to Table C.2 (Valid SPIR-V Capabilities):
GroupNonUniform (if SUBGROUP_FEATURE_BASIC_BIT_KHR is supported)
GroupNonUniformVote (if SUBGROUP_FEATURE_VOTE_BIT_KHR is supported)
GroupNonUniformArithmetic (if SUBGROUP_FEATURE_ARITHMETIC_BIT_KHR is supported)
GroupNonUniformBallot (if SUBGROUP_FEATURE_BALLOT_BIT_KHR is supported)
GroupNonUniformShuffle (if SUBGROUP_FEATURE_SHUFFLE_BIT_KHR is supported)
GroupNonUniformShuffleRelative (if SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT_KHR is supported)
GroupNonUniformClustered (if SUBGROUP_FEATURE_CLUSTERED_BIT_KHR is supported)
GroupNonUniformQuad (if SUBGROUP_FEATURE_QUAD_BIT_KHR is supported)
Additions to section C.4 (Validation Rules):
Make the following changes to the validation rules:
Add *Subgroup* to the list of acceptable scopes for memory.
* *Scope* for *Non Uniform Group Operations* must be limited to:
- *Subgroup*
* If OpControlBarrier is used in fragment, vertex, tessellation
evaluation, or geometry stages, the execution Scope must be
*Subgroup*.
* "`Result Type`" for *Non Uniform Group Operations* must be
limited to 32-bit float, 32-bit integer, boolean, or vectors
of these types. If the Float64 capability is enabled, double
and vectors of double types are also permitted.
* If OpGroupNonUniformBallotBitCount is used, the group operation
must be one of:
- *Reduce*
- *InclusiveScan*
- *ExclusiveScan*
Add the following restrictions (disallowing SPIR-V 1.1, 1.2, and
1.3 features not related to subgroups):
* The *LocalSizeId* Execution Mode must not be used.
[[If SPV_KHR_storage_buffer_storage_class is not supported]]
* The *StorageBuffer* Storage Class must not be used.
* The *DependencyInfinite* and *DependencyLength* Loop Control
masks must not be used.
[[If SPV_KHR_shader_draw_parameters or OpenGL 4.6 is not supported]]
* The *DrawParameters* Capability must not be used.
* The *StorageBuffer16BitAccess*, *UniformAndStorageBuffer16BitAccess*,
*StoragePushConstant16*, and *StorageInputOutput16* Capabilities must
not be used.
* The *DeviceGroup*, *MultiView*, *VariablePointersStorageBuffer*, and
*VariablePointers* Capabilities must not be used.
* The *OpModuleProcessed*, *OpDecorateId*, and *OpExecutionModeId*
Instructions must not be used.
Modifications to the OpenGL Shading Language Specification, Version 4.60
See the separate KHR_shader_subgroup GLSL document.
Dependencies on ARB_gl_spirv and OpenGL 4.6
If ARB_gl_spirv or OpenGL 4.6 are not supported, ignore all
references to SPIR-V functionality.
Dependencies on ARB_spirv_extensions and OpenGL 4.6
If ARB_spirv_extensions or OpenGL 4.6 are not supported, ignore
references to the ability to advertise additional SPIR-V extensions.
Dependencies on OpenGL ES 3.x
If implemented in OpenGL ES, ignore all references to SPIR-V and to
GLSL built-in functions which utilize the genDType (double) types.
Dependencies on ARB_shader_draw_parameters and SPV_KHR_shader_draw_parameters
If neither OpenGL 4.6 nor both ARB_shader_draw_parameters and
SPV_KHR_shader_draw_parameters are supported, the *DrawParameters*
Capability is not supported.
Dependencies on SPV_KHR_storage_buffer_storage_class
If SPV_KHR_storage_buffer_storage_class is not supported, the
*StorageBuffer* Storage Class must not be used.
Additions to the AGL/GLX/WGL Specifications
New State
New Implementation Dependent State
Additions to table 2.53 - Implementation Dependent Values
Get Value                        Type  Get Command  Value       Description                                         Sec.
-------------------------------  ----  -----------  ----------  --------------------------------------------------  ------
SUBGROUP_SIZE_KHR                Z+    GetIntegerv  1           No. of invocations in each subgroup                 SG.2.1
SUBGROUP_SUPPORTED_STAGES_KHR    E     GetIntegerv  Sec SG.2.2  Bitfield of stages that subgroups are supported in  SG.2.2
SUBGROUP_SUPPORTED_FEATURES_KHR  E     GetIntegerv  Sec SG.2.3  Bitfield of subgroup operations supported           SG.2.3
SUBGROUP_QUAD_ALL_STAGES_KHR     B     GetBooleanv  -           Quad subgroups supported in all stages              SG.2.4
1. What should we name this extension?
DISCUSSION. We will use the same name as the GLSL extension
in order to minimize confusion. This has been done for other
extensions and people seem to have figured it out. Other
options considered: KHR_subgroups, KHR_shader_subgroup_operations,
RESOLVED: use KHR_shader_subgroup to match the GLSL extension.
2. What should happen if subgroup operations are attempted on
unsupported stages?
DISCUSSION: There are basically two options:
A. compile or link-time error?
B. draw time invalid_op error?
Seems like Option (A) would be more user friendly, and there doesn't
seem to be much point in requiring an implementation to
support compiling the functionality in stages they won't work in.
Typically this should be detectable by an implementation at compile
time since this will just require them to reject shaders with
#extension GL_KHR_shader_subgroup* in shader stages that they don't
support. However, for SPIR-V implementations, this may happen at
lowering time, so it may happen at either compile or link-time.
RESOLVED: Compile or link-time error.
3. How should we enable SPIR-V support for this extension?
DISCUSSION: Options could include:
A. add support for SPIR-V 1.1, 1.2, and 1.3.
B. add support for only the subgroups capabilities from SPIR-V 1.3.
Doing option (A) seems like a weird way of submarining support
for new versions of SPIR-V into OpenGL, and it seems like there
should be a separate extension for that.
If option (B) is selected, we need to be sure to disallow other
new capabilities that are added in SPIR-V 1.1, 1.2, and 1.3
RESOLVED: (B) only add support for subgroup capabilities from SPIR-V
1.3. If a future GL core version incorporates this extension it should
add support for all of SPIR-V 1.3.
4. What functionality of SPIR-V 1.1, 1.2, and 1.3 needs to be disallowed?
Additions that aren't gated by specific capabilities and are disallowed
are the following:
LocalSizeId (1.2)
DependencyInfinite (1.1)
DependencyLength (1.1)
OpModuleProcessed (1.1)
OpDecorateId (1.2)
OpExecutionModeId (1.2)
Additions that are gated by graphics-compatible capabilities not
being enabled by this extension (but could be enabled by other
extensions):
Capabilities                              Enabling extension
StorageBuffer (1.3)                       SPV_KHR_storage_buffer_storage_class
DrawParameters (1.3)                      SPV_KHR_shader_draw_parameters
  - BaseVertex
  - BaseInstance
  - DrawIndex
DeviceGroup (1.3)                         SPV_KHR_device_group
  - DeviceIndex
MultiView (1.3)                           SPV_KHR_multiview
  - ViewIndex
StorageBuffer16BitAccess (1.3)            SPV_KHR_16bit_storage
StorageUniformBufferBlock16 (1.3)         SPV_KHR_16bit_storage
UniformAndStorageBuffer16BitAccess (1.3)  SPV_KHR_16bit_storage
StorageUniform16 (1.3)                    SPV_KHR_16bit_storage
StoragePushConstant16 (1.3)               SPV_KHR_16bit_storage
StorageInputOutput16 (1.3)                SPV_KHR_16bit_storage
VariablePointersStorageBuffer (1.3)       SPV_KHR_variable_pointers
VariablePointers (1.3)                    SPV_KHR_variable_pointers
5. Given Issues (3) and (4), what exactly are the additional SPIR-V
requirements being added by this extension?
RESOLVED: We add support for the following from SPIR-V 1.3:
Capabilities (3.31) Enabling API Feature
Builtins (3.21)            Enabling Capability
SubgroupSize               GroupNonUniform
NumSubgroups               GroupNonUniform
SubgroupId                 GroupNonUniform
SubgroupLocalInvocationId  GroupNonUniform
SubgroupEqMask             GroupNonUniformBallot
SubgroupGeMask             GroupNonUniformBallot
SubgroupGtMask             GroupNonUniformBallot
SubgroupLeMask             GroupNonUniformBallot
SubgroupLtMask             GroupNonUniformBallot
Group Operations           Enabling Capability
Reduce                     GroupNonUniformArithmetic, GroupNonUniformBallot
InclusiveScan              GroupNonUniformArithmetic, GroupNonUniformBallot
ExclusiveScan              GroupNonUniformArithmetic, GroupNonUniformBallot
ClusteredReduce            GroupNonUniformClustered
Non-Uniform Instructions           Enabling Capability
OpGroupNonUniformElect             GroupNonUniform
OpGroupNonUniformAll               GroupNonUniformVote
OpGroupNonUniformAny               GroupNonUniformVote
OpGroupNonUniformAllEqual          GroupNonUniformVote
OpGroupNonUniformBroadcast         GroupNonUniformBallot
OpGroupNonUniformBroadcastFirst    GroupNonUniformBallot
OpGroupNonUniformBallot            GroupNonUniformBallot
OpGroupNonUniformInverseBallot     GroupNonUniformBallot
OpGroupNonUniformBallotBitExtract  GroupNonUniformBallot
OpGroupNonUniformBallotBitCount    GroupNonUniformBallot
OpGroupNonUniformBallotFindLSB     GroupNonUniformBallot
OpGroupNonUniformBallotFindMSB     GroupNonUniformBallot
OpGroupNonUniformShuffle           GroupNonUniformShuffle
OpGroupNonUniformShuffleXor        GroupNonUniformShuffle
OpGroupNonUniformShuffleUp         GroupNonUniformShuffleRelative
OpGroupNonUniformShuffleDown       GroupNonUniformShuffleRelative
OpGroupNonUniformIAdd              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformFAdd              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformIMul              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformFMul              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformSMin              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformUMin              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformFMin              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformSMax              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformUMax              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformFMax              GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformBitwiseAnd        GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformBitwiseOr         GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformBitwiseXor        GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformLogicalAnd        GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformLogicalOr         GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformLogicalXor        GroupNonUniformArithmetic, GroupNonUniformClustered
OpGroupNonUniformQuadBroadcast     GroupNonUniformQuad
OpGroupNonUniformQuadSwap          GroupNonUniformQuad
*Subgroup* as an acceptable memory scope.
OpControlBarrier in fragment, vertex, tessellation evaluation, tessellation
control, and geometry stages with the *Subgroup* execution Scope.
Revision History
Rev. Date Author Changes
---- ----------- -------- -------------------------------------------
8 2019-07-26 dgkoch Update status and assign extension numbers
7 2019-05-22 dgkoch Resync language with Vulkan spec. Address feedback
from Graeme. Relax quad ordering definition.
6 2019-03-28 dgkoch rename to KHR_shader_subgroup, update some issues
5 2018-05-30 dgkoch Address feedback from Graeme and Jesse.
4 2018-05-28 dgkoch change ALLSTAGES -> ALL_STAGES, fix typos
3 2018-05-23 dgkoch Add overview and interactions, add SPIR-V 1.3
restrictions, Issues 4 and 5.
2 2018-04-26 dgkoch Various updates to match latest vulkan spec
Assign tokens. Add SPIR-V support.
1 2018-01-19 dgkoch Initial revision.