| Name |
| |
| NV_shader_thread_group |
| |
| Name Strings |
| |
| GL_NV_shader_thread_group |
| |
| Contributors |
| |
| Jeannot Breton, NVIDIA |
| Pat Brown, NVIDIA |
| Eric Werness, NVIDIA |
| Mark Kilgard, NVIDIA |
| |
| Contact |
| |
| Jeannot Breton, NVIDIA Corporation (jbreton 'at' nvidia.com) |
| |
| Status |
| |
| Shipping. |
| |
| Version |
| |
| Last Modified Date: 7/21/2015 |
| NVIDIA Revision: 4 |
| |
| Number |
| |
| OpenGL Extension #447 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 4.3 (Compatibility Profile) |
| Specification. |
| |
| This extension is written against version 4.30 (revision 07) of the OpenGL |
| Shading Language Specification. |
| |
| OpenGL 4.3 and GLSL 4.3 are required. |
| |
| This extension interacts with NV_gpu_program5 |
| |
| This extension interacts with NV_compute_program5 |
| |
| This extension interacts with NV_tessellation_program5 |
| |
| Overview |
| |
| Implementations of the OpenGL Shading Language may, but are not required |
| to, run multiple shader threads for a single stage as a SIMD thread group, |
| where individual execution threads are assigned to thread groups in an |
| undefined, implementation-dependent order. This extension provides a set |
| of new features to the OpenGL Shading Language to query thread states and |
| to share data between fragments within a 2x2 pixel quad. |
| |
| More specifically, the following functionality is added: |
| |
| * New uniform variables and tokens to query the number of threads in a |
| warp, the number of warps running on an SM, and the number of SMs on |
| the GPU. |
| |
| * New shader inputs to query the thread id, the warp id and the SM id. |
| |
| * New shader inputs to query if a fragment shader thread is a helper |
| thread. |
| |
| * New shader built-in functions to query the state of a Boolean condition |
| over all threads in a thread group. |
| |
| * New shader built-in functions to query which threads are active within |
| a thread group. |
| |
| * New fragment shader built-in functions to share data between fragments |
| within a 2x2 pixel quad. |
| |
| Shaders using the new functionality provided by this extension should |
| enable it via the construct |
| |
| #extension GL_NV_shader_thread_group : require (or enable) |
| |
| This extension also specifies some modifications to the program assembly |
| language to support the thread state query and thread data sharing |
| functionalities. |
| |
| Note that in this extension specification warp and thread group have the |
| same meaning. A warp is a group of threads that get executed in lockstep. |
| Each thread in a warp executes the same instruction of a program, but on |
| different data. |
| |
| New Procedures and Functions |
| |
| None |
| |
| |
| New Tokens |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, |
| GetFloatv, and GetDoublev: |
| |
| WARP_SIZE_NV 0x9339 |
| WARPS_PER_SM_NV 0x933A |
| SM_COUNT_NV 0x933B |
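| |
| As an illustration only, the following C sketch reads the three limits |
| through the tokens above. To stay self-contained it includes no OpenGL |
| header: get_integer_fn, thread_group_limits, query_thread_group_limits |
| and fake_get_integer are hypothetical names, and the values returned by |
| fake_get_integer are made up. A real application would presumably pass a |
| thin wrapper around GetIntegerv, called with a current context that |
| exposes this extension. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Token values from this extension; C headers conventionally add a GL_ prefix. */
#ifndef GL_WARP_SIZE_NV
#define GL_WARP_SIZE_NV     0x9339
#define GL_WARPS_PER_SM_NV  0x933A
#define GL_SM_COUNT_NV      0x933B
#endif

/* Hypothetical callback type with the same shape as GetIntegerv. */
typedef void (*get_integer_fn)(unsigned int pname, int *data);

/* Hypothetical container for the three queried limits. */
typedef struct {
    int warp_size;
    int warps_per_sm;
    int sm_count;
} thread_group_limits;

/* Query the three limits through the supplied getter. */
thread_group_limits query_thread_group_limits(get_integer_fn get_integer)
{
    thread_group_limits limits = { 0, 0, 0 };
    assert(get_integer != NULL);
    get_integer(GL_WARP_SIZE_NV, &limits.warp_size);
    get_integer(GL_WARPS_PER_SM_NV, &limits.warps_per_sm);
    get_integer(GL_SM_COUNT_NV, &limits.sm_count);
    return limits;
}

/* Stand-in getter so the sketch runs without a GL context; values are made up. */
void fake_get_integer(unsigned int pname, int *data)
{
    switch (pname) {
    case GL_WARP_SIZE_NV:    *data = 32; break;
    case GL_WARPS_PER_SM_NV: *data = 64; break;
    case GL_SM_COUNT_NV:     *data = 16; break;
    default:                 break;  /* unknown token: leave *data untouched */
    }
}
```
| |
| Calling query_thread_group_limits(fake_get_integer) returns the made-up |
| values supplied by the stand-in getter. |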
| |
| |
| Modifications to The OpenGL Shading Language Specification, Version 4.30 |
| (Revision 07) |
| |
| Including the following line in a shader can be used to control the |
| language features described in this extension: |
| |
| #extension GL_NV_shader_thread_group : <behavior> |
| |
| where <behavior> is as specified in section 3.3. |
| |
| New preprocessor #defines are added to the OpenGL Shading Language: |
| |
| #define GL_NV_shader_thread_group 1 |
| |
| Modify Section 7.1, Built-In Language Variables, p. 110 |
| |
| (Add to the list of built-in variables for the compute, vertex, geometry, |
| tessellation control, tessellation evaluation and fragment languages) |
| |
| in uint gl_ThreadInWarpNV; |
| in uint gl_ThreadEqMaskNV; |
| in uint gl_ThreadGeMaskNV; |
| in uint gl_ThreadGtMaskNV; |
| in uint gl_ThreadLeMaskNV; |
| in uint gl_ThreadLtMaskNV; |
| in uint gl_WarpIDNV; |
| in uint gl_SMIDNV; |
| |
| (Add to the list of built-in variables for the fragment language) |
| |
| in bool gl_HelperThreadNV; |
| |
| (Add these paragraphs at the end of this section) |
| |
| The variable gl_ThreadInWarpNV holds the id of the thread within the |
| thread group (or warp). This variable is in the range 0 to |
| gl_WarpSizeNV-1, where gl_WarpSizeNV is the total number of threads in a |
| warp. |
| |
| The variable gl_ThreadEqMaskNV is a bitfield in which only the bit whose |
| position equals the current thread id is set. The variable |
| gl_ThreadGeMaskNV is a bitfield in which the bits at positions greater |
| than or equal to the current thread id are set. The variable |
| gl_ThreadGtMaskNV is a bitfield in which the bits at positions greater |
| than the current thread id are set. The variable gl_ThreadLeMaskNV is a |
| bitfield in which the bits at positions lower than or equal to the |
| current thread id are set. The variable gl_ThreadLtMaskNV is a bitfield |
| in which the bits at positions lower than the current thread id are set. |
| |
| The values of gl_ThreadEqMaskNV, gl_ThreadGeMaskNV, gl_ThreadGtMaskNV, |
| gl_ThreadLeMaskNV and gl_ThreadLtMaskNV are derived from the value of |
| gl_ThreadInWarpNV using simple bit-shift arithmetic; they don't take into |
| account the value of the thread group active mask. For example, if the |
| application wants a bitfield in which bits lower than or equal to the |
| current thread id are set only for active threads, the value of |
| gl_ThreadLeMaskNV will need to be ANDed with the thread group active |
| mask. |
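| |
| The bit-shift arithmetic described above can be sketched on the CPU in C. |
| This is only a reference model under the assumption that a warp has at |
| most 32 threads; the function names are hypothetical, and active_mask |
| stands in for the thread group active mask. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Assumption: at most 32 threads per warp, so each mask fits in 32 bits. */

/* Models gl_ThreadEqMaskNV: only the bit of the current thread. */
uint32_t thread_eq_mask(uint32_t thread_id)
{
    assert(thread_id < 32u);
    return UINT32_C(1) << thread_id;
}

/* Models gl_ThreadLtMaskNV: all bits below the current thread. */
uint32_t thread_lt_mask(uint32_t thread_id)
{
    return thread_eq_mask(thread_id) - 1u;
}

/* Models gl_ThreadLeMaskNV: the current thread's bit and all bits below. */
uint32_t thread_le_mask(uint32_t thread_id)
{
    return thread_lt_mask(thread_id) | thread_eq_mask(thread_id);
}

/* Models gl_ThreadGtMaskNV: all bits above the current thread. */
uint32_t thread_gt_mask(uint32_t thread_id)
{
    return ~thread_le_mask(thread_id);
}

/* Models gl_ThreadGeMaskNV: the current thread's bit and all bits above. */
uint32_t thread_ge_mask(uint32_t thread_id)
{
    return ~thread_lt_mask(thread_id);
}

/* The "Le" mask restricted to active threads, as in the example above. */
uint32_t active_le_mask(uint32_t thread_id, uint32_t active_mask)
{
    return thread_le_mask(thread_id) & active_mask;
}
```
| |
| For thread id 5 the model yields 0x00000020 (Eq), 0x0000001F (Lt), |
| 0x0000003F (Le), 0xFFFFFFC0 (Gt) and 0xFFFFFFE0 (Ge). |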
| |
| The variable gl_WarpIDNV holds the warp id of the executing thread. This |
| variable is in the range 0 to gl_WarpsPerSMNV-1, where gl_WarpsPerSMNV is |
| the maximum number of warps executing on an SM. |
| |
| The variable gl_SMIDNV holds the SM id of the executing thread. This |
| variable is in the range 0 to gl_SMCountNV-1, where gl_SMCountNV is the |
| number of SMs on the GPU. |
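| |
| Because each of the three ids has a bounded range, they can be flattened |
| into a single index. The C sketch below shows one possible flattening; it |
| is not part of this extension, and resident_thread_slot and its |
| parameters are hypothetical names standing in for gl_SMIDNV, gl_WarpIDNV, |
| gl_ThreadInWarpNV, gl_WarpsPerSMNV and gl_WarpSizeNV. |
| |
```c
#include <assert.h>
#include <stdint.h>

/*
 * Flatten (SM id, warp id, thread id) into one index in the range
 * 0 .. sm_count * warps_per_sm * warp_size - 1.
 */
uint32_t resident_thread_slot(uint32_t sm_id, uint32_t warp_id,
                              uint32_t thread_in_warp,
                              uint32_t warps_per_sm, uint32_t warp_size)
{
    assert(warp_id < warps_per_sm);      /* range stated for gl_WarpIDNV */
    assert(thread_in_warp < warp_size);  /* range stated for gl_ThreadInWarpNV */
    return (sm_id * warps_per_sm + warp_id) * warp_size + thread_in_warp;
}
```
| |
| With 64 warps per SM and 32 threads per warp (made-up values), SM 2, |
| warp 3, thread 5 maps to slot (2*64 + 3)*32 + 5 = 4197. |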
| |
| The variable gl_HelperThreadNV specifies if the current thread is a helper |
| thread. In implementations supporting this extension, fragment shader |
| invocations may be arranged in SIMD thread groups of 2x2 fragments called |
| "quads". When a fragment shader instruction is executed on a quad, it's |
| possible that some fragments within the quad will execute the instruction |
| even if they are not covered by the primitive. Those threads are called |
| helper threads. Their outputs will be discarded and they will not execute |
| global store functions, but the intermediate values they compute can still |
| be used by thread group sharing functions or by fragment derivative |
| functions like dFdx and dFdy. |
| |
| |
| Modify Section 7.4, Built-In Uniform State, p. 125 |
| |
| (Add to the list of built-in uniform variable declarations) |
| |
| uniform uint gl_WarpSizeNV; |
| uniform uint gl_WarpsPerSMNV; |
| uniform uint gl_SMCountNV; |
| |
| (Add this paragraph at the end of this section) |
| |
| The variable gl_WarpSizeNV is the total number of threads in a warp. The |
| variable gl_WarpsPerSMNV is the maximum number of warps executing on an |
| SM. The variable gl_SMCountNV is the number of SMs on the GPU. |
| |
| |
| Modify Section 8.3, Common Functions, p. 133 |
| |
| (add a function to query which threads are active within a thread group) |
| |
| Syntax: |
| |
| uint activeThreadsNV(void) |
| |
| In the value returned by activeThreadsNV(), bit <N> is set to 1 if the |
| corresponding thread in the SIMD thread group is executing the call to |
| activeThreadsNV() and 0 otherwise. A bit in the return value may be set |
| to zero due to conditional flow control (e.g., returning from a function, |
| executing the "else" part of an "if" statement) or because the SIMD |
| thread group was dispatched without a full collection of threads. |
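| |
| A mask of this form lends itself to a few simple derived queries. The C |
| sketch below is a CPU-side illustration, not part of this extension: the |
| function names are hypothetical, active_mask stands in for the value |
| returned by activeThreadsNV(), and a 32-thread warp is assumed. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits, i.e. how many threads are active. */
unsigned count_active_threads(uint32_t active_mask)
{
    unsigned count = 0;
    while (active_mask != 0) {
        active_mask &= active_mask - 1u;  /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Lowest-numbered active thread, or -1 when no thread is active. */
int lowest_active_thread(uint32_t active_mask)
{
    for (int thread = 0; thread < 32; ++thread) {
        if ((active_mask >> thread) & 1u) {
            return thread;
        }
    }
    return -1;
}

/* Position of thread_id among the active threads (0 for the first one). */
unsigned active_rank(uint32_t active_mask, uint32_t thread_id)
{
    assert(thread_id < 32u);
    /* Same bit pattern as gl_ThreadLtMaskNV for this thread id. */
    uint32_t lower = (UINT32_C(1) << thread_id) - 1u;
    return count_active_threads(active_mask & lower);
}
```
| |
| For an active mask of 0x000000F0, four threads are active, thread 4 is |
| the lowest active thread, and thread 6 has rank 2 among them. |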
| |
| (add a function to query the state of a Boolean condition over all the |
| threads in a thread group) |
| |
| Syntax: |
| |
| uint ballotThreadNV(bool value) |
| |
| The function ballotThreadNV() computes a 32-bit bitfield. It looks at the |
| condition <value> for each active thread of a thread group and sets to 1 |
| each bit for which the condition in the corresponding thread is true. Bits |
| for threads with false condition are set to 0. Bits for inactive threads |
| are also set to 0. It's possible to query the active thread mask by |
| calling the function activeThreadsNV. |
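| |
| The behavior described for ballotThreadNV() can be modeled on the CPU as |
| follows. This C sketch is a reference model only, assuming a 32-thread |
| warp; ballot_thread, any_thread and all_threads are hypothetical names, |
| value[i] is the condition seen by thread i, and active_mask models the |
| result of activeThreadsNV(). |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WARP_SIZE 32  /* assumed warp size, matching the 32-bit result */

/* Set bit i only when thread i is active and its condition is true. */
uint32_t ballot_thread(const bool value[MODEL_WARP_SIZE], uint32_t active_mask)
{
    uint32_t result = 0;
    for (uint32_t thread = 0; thread < MODEL_WARP_SIZE; ++thread) {
        bool active = ((active_mask >> thread) & 1u) != 0;
        if (active && value[thread]) {
            result |= UINT32_C(1) << thread;
        }
    }
    return result;
}

/* True when the condition held for at least one active thread. */
bool any_thread(uint32_t ballot)
{
    return ballot != 0;
}

/* True when the condition held for every active thread. */
bool all_threads(uint32_t ballot, uint32_t active_mask)
{
    return ballot == active_mask;
}
```
| |
| If threads 0, 3 and 4 report true but only threads 0 to 3 are active |
| (active mask 0x0000000F), the model returns 0x00000009: the bit for |
| thread 4 stays 0 because that thread is inactive. |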
| |
| (add a function to share data between fragments in a quad) |
| |
| Syntax: |
| |
| float quadSwizzle0NV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzle0NV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzle0NV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzle0NV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| float quadSwizzle1NV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzle1NV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzle1NV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzle1NV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| float quadSwizzle2NV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzle2NV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzle2NV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzle2NV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| float quadSwizzle3NV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzle3NV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzle3NV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzle3NV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| float quadSwizzleXNV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzleXNV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzleXNV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzleXNV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| float quadSwizzleYNV(float swizzledValue, [float unswizzledValue]) |
| vec2 quadSwizzleYNV(vec2 swizzledValue, [vec2 unswizzledValue]) |
| vec3 quadSwizzleYNV(vec3 swizzledValue, [vec3 unswizzledValue]) |
| vec4 quadSwizzleYNV(vec4 swizzledValue, [vec4 unswizzledValue]) |
| |
| In implementations supporting this extension, if a primitive covers a |
| fragment at (x,y), its fragment shader invocation will be arranged in a |
| SIMD thread group with fragment shader invocations corresponding to three |
| neighboring pixels. These four invocations are arranged in a 2x2 grid, |
| called a "quad". If the neighbors of a fragment are not covered by the |
| primitive, fragment shader invocations will still be generated. The |
| implementation may compute differences between values in these threads to |
| estimate derivatives for dFdx(), dFdy(), and for texture lookups with |
| automatic LOD calculations. |
| |
| Fragments may have different locations in the quads based on the type of |
| render target. |
| |
| When rendering to a window, fragments within a quad follow this pattern: |
| |
| --------------------------------------------------- |
| | gl_ThreadInWarpNV 4N+0 | gl_ThreadInWarpNV 4N+1 | |
| | pixel (X+0,Y+1) | pixel (X+1,Y+1) | |
| --------------------------------------------------- |
| | gl_ThreadInWarpNV 4N+2 | gl_ThreadInWarpNV 4N+3 | |
| | pixel (X+0,Y+0) | pixel (X+1,Y+0) | |
| --------------------------------------------------- |
| |
| |
| When rendering to a framebuffer object, fragments within a quad follow this |
| pattern: |
| |
| --------------------------------------------------- |
| | gl_ThreadInWarpNV 4N+2 | gl_ThreadInWarpNV 4N+3 | |
| | pixel (X+0,Y+1) | pixel (X+1,Y+1) | |
| --------------------------------------------------- |
| | gl_ThreadInWarpNV 4N+0 | gl_ThreadInWarpNV 4N+1 | |
| | pixel (X+0,Y+0) | pixel (X+1,Y+0) | |
| --------------------------------------------------- |
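| |
| The two layouts above can be summarized as a mapping from |
| gl_ThreadInWarpNV to a pixel offset relative to (X,Y). The C sketch below |
| is only an illustration of the two diagrams; quad_offset and |
| quad_pixel_offset are hypothetical names. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a fragment relative to pixel (X,Y) of its quad. */
typedef struct {
    int dx;
    int dy;
} quad_offset;

/* Map a thread id to its pixel offset for the window and FBO layouts. */
quad_offset quad_pixel_offset(uint32_t thread_in_warp, bool rendering_to_fbo)
{
    uint32_t lane = thread_in_warp & 3u;  /* position 4N+lane within the quad */
    int row = (int)(lane >> 1);           /* 0 for lanes 0,1; 1 for lanes 2,3 */
    quad_offset offset;
    offset.dx = (int)(lane & 1u);
    /* Window: lanes 0,1 sit at Y+1. Framebuffer object: lanes 0,1 sit at Y+0. */
    offset.dy = rendering_to_fbo ? row : 1 - row;
    return offset;
}
```
| |
| For example, thread 4N+0 maps to offset (0,1) when rendering to a window |
| and to offset (0,0) when rendering to a framebuffer object. |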
| |
| There are six quadSwizzle functions that allow fragments within a quad to |
| exchange data. Each function reads a floating point operand |
| <swizzledValue>, which can come from any fragment in the quad. An |
| optional second floating point operand <unswizzledValue>, which comes |
| from the current fragment, can be added to <swizzledValue>. The functions |
| differ only in which fragment of the 2x2 pixel quad supplies the |
| <swizzledValue> operand. |
| |
| quadSwizzle0NV will read the <swizzledValue> operand from fragment 0: |
| |
| result[thread N] = swizzledValue[thread 0] + unswizzledValue[thread N] |
| |
| |
| quadSwizzle1NV will read the <swizzledValue> operand from fragment 1: |
| |
| result[thread N] = swizzledValue[thread 1] + unswizzledValue[thread N] |
| |
| |
| quadSwizzle2NV will read the <swizzledValue> operand from fragment 2: |
| |
| result[thread N] = swizzledValue[thread 2] + unswizzledValue[thread N] |
| |
| |
| quadSwizzle3NV will read the <swizzledValue> operand from fragment 3: |
| |
| result[thread N] = swizzledValue[thread 3] + unswizzledValue[thread N] |
| |
| |
| quadSwizzleXNV will read the <swizzledValue> operand for each fragment |
| from its neighbor in X: |
| |
| result[thread 0] = swizzledValue[thread 1] + unswizzledValue[thread 0] |
| result[thread 1] = swizzledValue[thread 0] + unswizzledValue[thread 1] |
| result[thread 2] = swizzledValue[thread 3] + unswizzledValue[thread 2] |
| result[thread 3] = swizzledValue[thread 2] + unswizzledValue[thread 3] |
| |
| |
| quadSwizzleYNV will read the <swizzledValue> operand for each fragment |
| from its neighbor in Y: |
| |
| result[thread 0] = swizzledValue[thread 2] + unswizzledValue[thread 0] |
| result[thread 1] = swizzledValue[thread 3] + unswizzledValue[thread 1] |
| result[thread 2] = swizzledValue[thread 0] + unswizzledValue[thread 2] |
| result[thread 3] = swizzledValue[thread 1] + unswizzledValue[thread 3] |
| |
| |
| If any thread in a 2x2 pixel quad is inactive, the quad is divergent. In |
| this case, the quadSwizzle functions return 0 for all fragments in the |
| quad. |
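| |
| The six result formulas and the divergence rule can be collected into a |
| single CPU-side reference model. The C sketch below is an illustration |
| for scalar operands only; quad_swizzle_mode, quad_swizzle_source and |
| quad_swizzle are hypothetical names, and quad_active_mask is an assumed |
| 4-bit mask with one bit per thread of the quad. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* One mode per quadSwizzle function. */
typedef enum {
    QSWZ_MODE_0 = 0,
    QSWZ_MODE_1 = 1,
    QSWZ_MODE_2 = 2,
    QSWZ_MODE_3 = 3,
    QSWZ_MODE_X = 4,
    QSWZ_MODE_Y = 5
} quad_swizzle_mode;

/* Index of the quad thread whose swizzledValue thread n reads. */
unsigned quad_swizzle_source(quad_swizzle_mode mode, unsigned n)
{
    assert(n < 4u);
    switch (mode) {
    case QSWZ_MODE_X: return n ^ 1u;  /* neighbor in X: 0<->1, 2<->3 */
    case QSWZ_MODE_Y: return n ^ 2u;  /* neighbor in Y: 0<->2, 1<->3 */
    default:          return (unsigned)mode;  /* fixed fragment 0..3 */
    }
}

/* result[n] = swizzled[source(n)] + unswizzled[n], or 0 for a divergent quad. */
void quad_swizzle(quad_swizzle_mode mode,
                  const float swizzled[4], const float unswizzled[4],
                  unsigned quad_active_mask, float result[4])
{
    int divergent = (quad_active_mask & 0xFu) != 0xFu;
    for (unsigned n = 0; n < 4u; ++n) {
        if (divergent) {
            result[n] = 0.0f;
        } else {
            result[n] = swizzled[quad_swizzle_source(mode, n)] + unswizzled[n];
        }
    }
}
```
| |
| With swizzled values {1, 2, 3, 4} and unswizzled values {10, 20, 30, 40}, |
| the X mode yields {12, 21, 34, 43} and the Y mode yields |
| {13, 24, 31, 42}; any mask other than 0xF yields all zeros. |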
| |
| |
| Dependencies on NV_gpu_program5 |
| |
| If NV_gpu_program5 is supported and "OPTION NV_shader_thread_group" is |
| specified in an assembly program, the following edits are made to extend |
| the assembly programming model documented in the NV_gpu_program4 extension |
| and extended by NV_gpu_program5. |
| |
| If NV_gpu_program5 is not supported, or if "OPTION NV_shader_thread_group" |
| is not specified in an assembly program, the contents of this dependencies |
| section should be ignored. |
| |
| Modify Section 2.X.2, Program Grammar |
| |
| (add the following rules to the NV_gpu_program4 and |
| NV_gpu_program5 base grammars) |
| |
| <VECTORop> ::= "TGBALLOT" |
| |
| <stateSingleItem> ::= "state" "." <stateThreadItem> |
| |
| <stateThreadItem> ::= "thread" "." <stateThreadProperty> |
| |
| <stateThreadProperty> ::= "warpsize" |
| | "warpspersm" |
| | "smcount" |
| |
| (add/change the following rules to the NV_fragment_program4 and |
| NV_gpu_program5 base grammars) |
| |
| <VECTORop> ::= "QSWZ0" |
| | "QSWZ1" |
| | "QSWZ2" |
| | "QSWZ3" |
| | "QSWZX" |
| | "QSWZY" |
| |
| <attribBasic> ::= <fragPrefix> "threadid" |
| | <fragPrefix> "threadeqmask" |
| | <fragPrefix> "threadltmask" |
| | <fragPrefix> "threadlemask" |
| | <fragPrefix> "threadgtmask" |
| | <fragPrefix> "threadgemask" |
| | <fragPrefix> "warpid" |
| | <fragPrefix> "smid" |
| | <fragPrefix> "helperthread" |
| |
| (add/change the following rules to the NV_vertex_program4 and |
| NV_gpu_program5 base grammars) |
| |
| <attribBasic> ::= <vtxPrefix> "threadid" |
| | <vtxPrefix> "threadeqmask" |
| | <vtxPrefix> "threadltmask" |
| | <vtxPrefix> "threadlemask" |
| | <vtxPrefix> "threadgtmask" |
| | <vtxPrefix> "threadgemask" |
| | <vtxPrefix> "warpid" |
| | <vtxPrefix> "smid" |
| |
| (add/change the following rules to the NV_geometry_program4 and |
| NV_gpu_program5 base grammars) |
| |
| <attribBasic> ::= <primPrefix> "threadid" |
| | <primPrefix> "threadeqmask" |
| | <primPrefix> "threadltmask" |
| | <primPrefix> "threadlemask" |
| | <primPrefix> "threadgtmask" |
| | <primPrefix> "threadgemask" |
| | <primPrefix> "warpid" |
| | <primPrefix> "smid" |
| |
| Modify Section 2.X.3.2 of the NV_gpu_program4 specification, Program |
| Attribute Variables. |
| |
| (Add the table entries and relevant text describing the fragment program |
| input variables used to query thread states.) |
| |
| Fragment Attribute Binding  Components  Underlying State |
| --------------------------  ----------  --------------------------------- |
| ... |
| fragment.threadid           (id,-,-,-)  id of the current thread |
| fragment.threadeqmask       (m,-,-,-)   mask with the current thread |
| fragment.threadltmask       (m,-,-,-)   mask with lower thread |
| fragment.threadlemask       (m,-,-,-)   mask with lower or equal thread |
| fragment.threadgtmask       (m,-,-,-)   mask with greater thread |
| fragment.threadgemask       (m,-,-,-)   mask with greater or equal thread |
| fragment.warpid             (id,-,-,-)  warp id of the current thread |
| fragment.smid               (id,-,-,-)  SM id of the current thread |
| fragment.helperthread       (k,-,-,-)   current thread is a helper thread |
| ... |
| |
| If a fragment attribute binding matches "fragment.threadid", the "x" |
| component is filled with the thread id of the current thread. The thread |
| id is an unsigned integer in the range 0 to 31. |
| |
| If a fragment attribute binding matches "fragment.threadeqmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bit equal to the current thread id is set. |
| |
| If a fragment attribute binding matches "fragment.threadltmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower than the current thread id are set. |
| |
| If a fragment attribute binding matches "fragment.threadlemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower or equal to the current thread id are set. |
| |
| If a fragment attribute binding matches "fragment.threadgtmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater than the current thread id are set. |
| |
| If a fragment attribute binding matches "fragment.threadgemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater or equal to the current thread id are set. |
| |
| If a fragment attribute binding matches "fragment.warpid", the "x" |
| component is filled with the warp id of the current thread. The warp id is |
| an unsigned integer whose range is hardware-dependent. |
| |
| If a fragment attribute binding matches "fragment.smid", the "x" component |
| is filled with the SM id of the current thread. The SM id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| If a fragment attribute binding matches "fragment.helperthread", the "x" |
| component is an integer value equal to -1 when the current thread is a |
| helper thread and 0 otherwise. In implementations supporting this |
| extension, fragment program invocations may be arranged in SIMD thread |
| groups of 2x2 fragments called "quads". When a fragment program instruction |
| is executed on a quad, it's possible that some fragments within the quad |
| will execute the instruction even if they are not covered by the primitive. |
| Those threads are called helper threads. Their outputs will be discarded |
| and they will not execute global store instructions, but the intermediate |
| values they compute can still be used by thread group sharing instructions |
| or by fragment derivative instructions like DDX and DDY. |
| |
| (Add the table entries and relevant text describing the vertex program |
| attribute variables used to query thread states.) |
| |
| Vertex Attribute Binding  Components  Underlying State |
| ------------------------  ----------  --------------------------------- |
| ... |
| vertex.threadid           (id,-,-,-)  id of the current thread |
| vertex.threadeqmask       (m,-,-,-)   mask with the current thread |
| vertex.threadltmask       (m,-,-,-)   mask with lower thread |
| vertex.threadlemask       (m,-,-,-)   mask with lower or equal thread |
| vertex.threadgtmask       (m,-,-,-)   mask with greater thread |
| vertex.threadgemask       (m,-,-,-)   mask with greater or equal thread |
| vertex.warpid             (id,-,-,-)  warp id of the current thread |
| vertex.smid               (id,-,-,-)  SM id of the current thread |
| ... |
| |
| If a vertex attribute binding matches "vertex.threadid", the "x" component |
| is filled with the thread id of the current thread. The thread id is an |
| unsigned integer in the range 0 to 31. |
| |
| If a vertex attribute binding matches "vertex.threadeqmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bit equal to the current thread id is set. |
| |
| If a vertex attribute binding matches "vertex.threadltmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower than the current thread id are set. |
| |
| If a vertex attribute binding matches "vertex.threadlemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower or equal to the current thread id are set. |
| |
| If a vertex attribute binding matches "vertex.threadgtmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater than the current thread id are set. |
| |
| If a vertex attribute binding matches "vertex.threadgemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater or equal to the current thread id are set. |
| |
| If a vertex attribute binding matches "vertex.warpid", the "x" component is |
| filled with the warp id of the current thread. The warp id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| If a vertex attribute binding matches "vertex.smid", the "x" component |
| is filled with the SM id of the current thread. The SM id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| |
| (Add the table entries and relevant text describing the geometry program |
| attribute variables used to query thread states.) |
| |
| Geometry Attribute Binding  Components  Underlying State |
| --------------------------  ----------  --------------------------------- |
| ... |
| primitive.threadid          (id,-,-,-)  id of the current thread |
| primitive.threadeqmask      (m,-,-,-)   mask with the current thread |
| primitive.threadltmask      (m,-,-,-)   mask with lower thread |
| primitive.threadlemask      (m,-,-,-)   mask with lower or equal thread |
| primitive.threadgtmask      (m,-,-,-)   mask with greater thread |
| primitive.threadgemask      (m,-,-,-)   mask with greater or equal thread |
| primitive.warpid            (id,-,-,-)  warp id of the current thread |
| primitive.smid              (id,-,-,-)  SM id of the current thread |
| ... |
| |
| If a geometry attribute binding matches "primitive.threadid", the "x" |
| component is filled with the thread id of the current thread. The thread |
| id is an unsigned integer in the range 0 to 31. |
| |
| If a geometry attribute binding matches "primitive.threadeqmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bit equal to the current thread id is set. |
| |
| If a geometry attribute binding matches "primitive.threadltmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower than the current thread id are set. |
| |
| If a geometry attribute binding matches "primitive.threadlemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| lower or equal to the current thread id are set. |
| |
| If a geometry attribute binding matches "primitive.threadgtmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater than the current thread id are set. |
| |
| If a geometry attribute binding matches "primitive.threadgemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which bits |
| greater or equal to the current thread id are set. |
| |
| If a geometry attribute binding matches "primitive.warpid", the "x" |
| component is filled with the warp id of the current thread. The warp id is |
| an unsigned integer whose range is hardware-dependent. |
| |
| If a geometry attribute binding matches "primitive.smid", the "x" component |
| is filled with the SM id of the current thread. The SM id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| |
| (add the following subsection to section 2.X.3.3, Parameters) |
| |
| Thread Group Property Bindings |
| |
| Binding                  Components  Underlying State |
| -----------------------  ----------  ------------------------------------ |
| state.thread.warpsize    (x,-,-,-)   total number of threads in a warp |
| state.thread.warpspersm  (x,-,-,-)   maximum number of warps executing on |
|                                      an SM |
| state.thread.smcount     (x,-,-,-)   number of SMs on the GPU |
| |
| If a program parameter binding matches "state.thread.warpsize", the "x" |
| component of the program parameter variable is filled with an integer value |
| indicating the total number of threads in a warp. The "y", "z", and "w" |
| components are undefined. |
| |
| If a program parameter binding matches "state.thread.warpspersm", the "x" |
| component of the program parameter variable is filled with an integer value |
| indicating the maximum number of warps executing on an SM. The "y", "z", and |
| "w" components are undefined. |
| |
| If a program parameter binding matches "state.thread.smcount", the "x" |
| component of the program parameter variable is filled with an integer value |
| indicating the number of SMs on the GPU. The "y", "z", and "w" components |
| are undefined. |
| |
| |
| Modify Section 2.X.4, Program Execution Environment |
| |
| (Add the table entries and relevant text describing the program |
| instruction to query thread conditions.) |
| |
| Instr- Modifiers |
| uction V F I C S H D Out Inputs Description |
| ------- -- - - - - - - --- -------- -------------------------------- |
| ... |
| TGBALLOT 50 X X X X - - F vu v query a boolean in thread group |
| ... |
| |
| |
| (Add the table entries and relevant text describing the fragment program |
| instructions to exchange data between threads.) |
| |
| Instr- Modifiers |
| uction V F I C S H D Out Inputs Description |
| ------- -- - - - - - - --- -------- -------------------------------- |
| ... |
| QSWZ0 50 X - - - - - F v v,v add fragment 0 in a quad |
| QSWZ1 50 X - - - - - F v v,v add fragment 1 in a quad |
| QSWZ2 50 X - - - - - F v v,v add fragment 2 in a quad |
| QSWZ3 50 X - - - - - F v v,v add fragment 3 in a quad |
| QSWZX 50 X - - - - - F v v,v add fragments horizontally |
| QSWZY 50 X - - - - - F v v,v add fragments vertically |
| ... |
| |
| |
| (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension, |
| as extended by NV_gpu_program5) |
| |
| + Shader thread group (NV_shader_thread_group) |
| |
| If a fragment program specifies the "NV_shader_thread_group" option, it |
| may use the "fragment.threadid", "fragment.threadeqmask", |
| "fragment.threadltmask", "fragment.threadlemask", "fragment.threadgtmask", |
| "fragment.threadgemask", "fragment.warpid", "fragment.smid", |
| "fragment.helperthread", "state.thread.warpsize", "state.thread.warpspersm" |
| and "state.thread.smcount" bindings. It may also use the "TGBALLOT", |
| "QSWZ0", "QSWZ1", "QSWZ2", "QSWZ3", "QSWZX" and "QSWZY" instructions. If |
| this option is not specified, a program will fail to compile if it uses |
| those instructions or bindings. |
| |
| If a vertex program specifies the "NV_shader_thread_group" option, it may |
| use the "vertex.threadid", "vertex.threadeqmask", "vertex.threadltmask", |
| "vertex.threadlemask", "vertex.threadgtmask", "vertex.threadgemask", |
| "vertex.warpid", "vertex.smid", "state.thread.warpsize", |
| "state.thread.warpspersm" and "state.thread.smcount" bindings. It may also |
| use the "TGBALLOT" instruction. If this option is not specified, a program |
| will fail to compile if it uses those instructions or bindings. |
| |
| If a geometry program specifies the "NV_shader_thread_group" option, it |
| may use the "primitive.threadid", "primitive.threadeqmask", |
| "primitive.threadltmask", "primitive.threadlemask", |
| "primitive.threadgtmask", "primitive.threadgemask", "primitive.warpid", |
| "primitive.smid", "state.thread.warpsize", "state.thread.warpspersm" and |
| "state.thread.smcount" bindings. It may also use the "TGBALLOT" |
| instruction. If this option is not specified, a program will fail to |
| compile if it uses those instructions or bindings. |
| |
| Section 2.X.8.Z, QSWZ0: add fragment 0 data to all fragments in a quad |
| |
| The QSWZ0 instruction produces a floating point result by adding the |
| first operand, a floating point value from fragment 0, to the second |
| operand, another floating point value from the current fragment. |
| |
| quadSwizzle0NV is the GLSL function that implements the same functionality |
| as the QSWZ0 assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzle0NV in more detail; that |
| description also applies to QSWZ0. |
| |
| |
| Section 2.X.8.Z, QSWZ1: add fragment 1 data to all fragments in a quad |
| |
| The QSWZ1 instruction produces a floating point result by adding the |
| first operand, a floating point value from fragment 1, to the second |
| operand, another floating point value from the current fragment. |
| |
| quadSwizzle1NV is the GLSL function that implements the same functionality |
| as the QSWZ1 assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzle1NV in more detail; that |
| description also applies to QSWZ1. |
| |
| |
| Section 2.X.8.Z, QSWZ2: add fragment 2 data to all fragments in a quad |
| |
| The QSWZ2 instruction produces a floating point result by adding the |
| first operand, a floating point value from fragment 2, to the second |
| operand, another floating point value from the current fragment. |
| |
| quadSwizzle2NV is the GLSL function that implements the same functionality |
| as the QSWZ2 assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzle2NV in more detail; that |
| description also applies to QSWZ2. |
| |
| |
| Section 2.X.8.Z, QSWZ3: add fragment 3 data to all fragments in a quad |
| |
| The QSWZ3 instruction produces a floating point result by adding the |
| first operand, a floating point value from fragment 3, to the second |
| operand, another floating point value from the current fragment. |
| |
| quadSwizzle3NV is the GLSL function that implements the same functionality |
| as the QSWZ3 assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzle3NV in more detail; that |
| description also applies to QSWZ3. |
| |
| |
| Section 2.X.8.Z, QSWZX: add fragments in a quad horizontally |
| |
| The QSWZX instruction produces a floating point result by adding the |
| first operand, a floating point value from the fragment neighbor in X to |
| the current fragment, to the second operand, another floating point value |
| from the current fragment. |
| |
| quadSwizzleXNV is the GLSL function that implements the same functionality |
| as the QSWZX assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzleXNV in more detail; that |
| description also applies to QSWZX. |
| |
| |
| Section 2.X.8.Z, QSWZY: add fragments in a quad vertically |
| |
| The QSWZY instruction produces a floating-point result by adding the |
| first operand, a floating-point value read from the current fragment's |
| neighbor in Y within the quad, to the second operand, a floating-point |
| value from the current fragment. |
| |
| quadSwizzleYNV is the GLSL function that implements the same functionality |
| as the QSWZY assembly instruction. Section 8.3 of the OpenGL Shading |
| Language Specification describes quadSwizzleYNV in more detail; that |
| description also applies to QSWZY. |
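| |
| As an illustration only, the addition performed by the QSWZ0-QSWZ3, QSWZX |
| and QSWZY instructions can be modeled on the CPU in C. This is a sketch, |
| not the hardware or driver implementation: the QuadOperands type and the |
| quad_swizzle_* helpers are hypothetical names, and the sketch assumes that |
| fragments i and i^1 are neighbors in X and that fragments i and i^2 are |
| neighbors in Y. |
| |
```c
#include <assert.h>

/* Hypothetical CPU model of the two operands held by the four fragments
 * of one 2x2 quad; index i is the fragment number within the quad. */
typedef struct {
    float swizzled[4];   /* first operand, read from another fragment   */
    float unswizzled[4]; /* second operand, read from the current one   */
} QuadOperands;

/* QSWZ0..QSWZ3: the source fragment 'src' is fixed by the opcode. */
static float quad_swizzle_n(const QuadOperands *q, unsigned frag, unsigned src)
{
    assert(frag < 4u && src < 4u);
    return q->swizzled[src] + q->unswizzled[frag];
}

/* QSWZX: assumption: the X neighbor of fragment i is fragment i ^ 1. */
static float quad_swizzle_x(const QuadOperands *q, unsigned frag)
{
    assert(frag < 4u);
    return q->swizzled[frag ^ 1u] + q->unswizzled[frag];
}

/* QSWZY: assumption: the Y neighbor of fragment i is fragment i ^ 2. */
static float quad_swizzle_y(const QuadOperands *q, unsigned frag)
{
    assert(frag < 4u);
    return q->swizzled[frag ^ 2u] + q->unswizzled[frag];
}
```
| |
| With swizzled values {1, 2, 4, 8} and unswizzled values {10, 20, 30, 40}, |
| quad_swizzle_x for fragment 0 yields 2 + 10 = 12, and quad_swizzle_n for |
| fragment 1 with source fragment 3 yields 8 + 20 = 28. |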
| |
| |
| Section 2.X.8.Z, TGBALLOT: query a boolean condition over a thread group |
| |
| The TGBALLOT instruction produces a result vector by reading a vector |
| operand for each active thread in the current thread group and comparing |
| each component to zero. Each component of the result vector is an integer |
| bitmask in which bit <n> is set if the corresponding component of the |
| operand is non-zero for thread <n>, and clear otherwise. |
| |
| When the instruction is inside a conditional control flow block, or when |
| a thread group cannot be completely filled, only a subset of the threads |
| in the thread group is active and executes the TGBALLOT instruction. The |
| bits corresponding to inactive threads are set to 0. The active thread |
| mask can therefore be queried by executing TGBALLOT with 1 as its operand. |
| |
|   tmp = VectorLoad(op0); |
|   result = { 0, 0, 0, 0 }; |
|   for (all active threads) { |
|     if ([thread]tmp.x != 0) result.x |= 1 << thread; |
|     if ([thread]tmp.y != 0) result.y |= 1 << thread; |
|     if ([thread]tmp.z != 0) result.z |= 1 << thread; |
|     if ([thread]tmp.w != 0) result.w |= 1 << thread; |
|   } |
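| |
| The pseudocode above can be sketched for a single component as runnable |
| C. This is an illustrative CPU model only; tg_ballot, the WARP_SIZE |
| constant and the explicit active_mask parameter are assumptions made for |
| the example (on the GPU the set of active threads is implicit). |
| |
```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32u /* assumption: 32 threads per thread group */

/* Single-component model of TGBALLOT: value[t] is the operand seen by
 * thread t, and bit t of active_mask tells whether thread t is active. */
static uint32_t tg_ballot(const int32_t value[WARP_SIZE], uint32_t active_mask)
{
    uint32_t result = 0u;
    for (uint32_t thread = 0u; thread < WARP_SIZE; ++thread) {
        uint32_t bit = UINT32_C(1) << thread;
        /* Inactive threads never contribute, so their bits stay 0. */
        if ((active_mask & bit) != 0u && value[thread] != 0) {
            result |= bit;
        }
    }
    return result;
}
```
| |
| Passing an operand of 1 for every thread returns active_mask itself, |
| which mirrors the active thread mask query described above. |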
| |
| Dependencies on NV_tessellation_program5 |
| |
| If NV_tessellation_program5 is supported and |
| "OPTION NV_shader_thread_group" is specified in an assembly program, the |
| following edits are made to extend the assembly programming model |
| documented in the NV_gpu_program4 extension and extended by NV_gpu_program5 |
| and NV_tessellation_program5. |
| |
| If NV_tessellation_program5 is not supported, or if |
| "OPTION NV_shader_thread_group" is not specified in an assembly program, |
| the contents of this dependencies section should be ignored. |
| |
| |
| Modify Section 2.X.2, Program Grammar |
| |
| (add/change the following rules to the NV_gpu_program5 base grammars for |
| tessellation control programs) |
| |
| <attribBasic> ::= <primPrefix> "threadid" |
|                 | <primPrefix> "threadeqmask" |
|                 | <primPrefix> "threadltmask" |
|                 | <primPrefix> "threadlemask" |
|                 | <primPrefix> "threadgtmask" |
|                 | <primPrefix> "threadgemask" |
|                 | <primPrefix> "warpid" |
|                 | <primPrefix> "smid" |
| |
| (add/change the following rules to the NV_gpu_program5 base grammars for |
| tessellation evaluation programs) |
| |
| <attribBasic> ::= <primPrefix> "threadid" |
|                 | <primPrefix> "threadeqmask" |
|                 | <primPrefix> "threadltmask" |
|                 | <primPrefix> "threadlemask" |
|                 | <primPrefix> "threadgtmask" |
|                 | <primPrefix> "threadgemask" |
|                 | <primPrefix> "warpid" |
|                 | <primPrefix> "smid" |
| |
| |
| Modify Section 2.X.3.2 of the NV_tessellation_program5 specification, |
| Program Attribute Variables. |
| |
| (Add the table entries and relevant text describing the tessellation |
| control and evaluation program attribute variables used to query thread |
| state.) |
| |
| |
| Primitive Binding Suffix   Components Underlying State |
| -------------------------- ---------- ---------------------------- |
| ... |
| primitive.threadid         (id,-,-,-) id of the current thread |
| primitive.threadeqmask     (m,-,-,-)  mask with the current thread |
| primitive.threadltmask     (m,-,-,-)  mask with lower thread |
| primitive.threadlemask     (m,-,-,-)  mask with lower or equal thread |
| primitive.threadgtmask     (m,-,-,-)  mask with greater thread |
| primitive.threadgemask     (m,-,-,-)  mask with greater or equal thread |
| primitive.warpid           (id,-,-,-) warp id of the current thread |
| primitive.smid             (id,-,-,-) SM id of the current thread |
| ... |
| |
| If an attribute binding matches "primitive.threadid", the "x" component is |
| filled with the thread id of the current thread. The thread id is an |
| unsigned integer in the range 0 to 31. |
| |
| If an attribute binding matches "primitive.threadeqmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bit whose position equals the current thread id is set. |
| |
| If an attribute binding matches "primitive.threadltmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions lower than the current thread id are set. |
| |
| If an attribute binding matches "primitive.threadlemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions lower than or equal to the current thread id are set. |
| |
| If an attribute binding matches "primitive.threadgtmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions greater than the current thread id are set. |
| |
| If an attribute binding matches "primitive.threadgemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions greater than or equal to the current thread id are set. |
| |
| If an attribute binding matches "primitive.warpid", the "x" component is |
| filled with the warp id of the current thread. The warp id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| If an attribute binding matches "primitive.smid", the "x" component is |
| filled with the SM id of the current thread. The SM id is an unsigned |
| integer whose range is hardware-dependent. |
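| |
| The relationship between the thread id and the five mask bindings can be |
| sketched in C as follows. The ThreadMasks type and the thread_masks |
| helper are hypothetical names used only for this illustration; the |
| sketch assumes the 32-bit masks and the 0 to 31 thread id range stated |
| above. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values of the five mask bindings. */
typedef struct {
    uint32_t eq; /* threadeqmask */
    uint32_t lt; /* threadltmask */
    uint32_t le; /* threadlemask */
    uint32_t gt; /* threadgtmask */
    uint32_t ge; /* threadgemask */
} ThreadMasks;

/* Derive all five masks from a thread id in the range 0..31. */
static ThreadMasks thread_masks(uint32_t thread_id)
{
    ThreadMasks m;
    assert(thread_id < 32u);
    m.eq = UINT32_C(1) << thread_id; /* single bit at the thread's position */
    m.lt = m.eq - 1u;                /* every bit below that position       */
    m.le = m.lt | m.eq;              /* below, plus the thread's own bit    */
    m.gt = ~m.le;                    /* every bit above that position       */
    m.ge = ~m.lt;                    /* above, plus the thread's own bit    */
    return m;
}
```
| |
| For thread id 5 this gives eq = 0x00000020, lt = 0x0000001F, |
| le = 0x0000003F, gt = 0xFFFFFFC0 and ge = 0xFFFFFFE0. |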
| |
| (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension, |
| as extended by NV_gpu_program5 and NV_tessellation_program5) |
| |
| + Shader thread group (NV_shader_thread_group) |
| |
| If a program specifies the "NV_shader_thread_group" option, it may use |
| the "primitive.threadid", "primitive.threadeqmask", |
| "primitive.threadltmask", "primitive.threadlemask", |
| "primitive.threadgtmask", "primitive.threadgemask", "primitive.warpid", |
| "primitive.smid", "state.thread.warpsize", "state.thread.warpspersm" and |
| "state.thread.smcount" bindings. It may also use the "TGBALLOT" |
| instruction. If this option is not specified, a program will fail to |
| compile if it uses those bindings. |
| |
| |
| Dependencies on NV_compute_program5 |
| |
| If NV_compute_program5 is supported and "OPTION NV_shader_thread_group" is |
| specified in an assembly program, the following edits are made to extend |
| the assembly programming model documented in the NV_gpu_program4 extension |
| and extended by NV_gpu_program5 and NV_compute_program5. |
| |
| If NV_compute_program5 is not supported, or if |
| "OPTION NV_shader_thread_group" is not specified in an assembly program, |
| the contents of this dependencies section should be ignored. |
| |
| Modify Section 2.X.2, Program Grammar |
| |
| (add the following rules to the grammar) |
| |
| <attribBasic> ::= "invocation" "." "threadid" |
|                 | "invocation" "." "threadeqmask" |
|                 | "invocation" "." "threadltmask" |
|                 | "invocation" "." "threadlemask" |
|                 | "invocation" "." "threadgtmask" |
|                 | "invocation" "." "threadgemask" |
|                 | "invocation" "." "warpid" |
|                 | "invocation" "." "smid" |
| |
| Modify Section 2.X.3.2 of the NV_compute_program5 specification, Program |
| Attribute Variables. |
| |
| (Add the table entries and relevant text describing the compute program |
| input variables used to query thread state.) |
| |
| Attribute Binding          Components Underlying State |
| -------------------------- ---------- ---------------------------- |
| ... |
| invocation.threadid        (id,-,-,-) id of the current thread |
| invocation.threadeqmask    (m,-,-,-)  mask with the current thread |
| invocation.threadltmask    (m,-,-,-)  mask with lower thread |
| invocation.threadlemask    (m,-,-,-)  mask with lower or equal thread |
| invocation.threadgtmask    (m,-,-,-)  mask with greater thread |
| invocation.threadgemask    (m,-,-,-)  mask with greater or equal thread |
| invocation.warpid          (id,-,-,-) warp id of the current thread |
| invocation.smid            (id,-,-,-) SM id of the current thread |
| ... |
| |
| If a compute attribute binding matches "invocation.threadid", the "x" |
| component is filled with the thread id of the current thread. The thread |
| id is an unsigned integer in the range 0 to 31. |
| |
| If a compute attribute binding matches "invocation.threadeqmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bit whose position equals the current thread id is set. |
| |
| If a compute attribute binding matches "invocation.threadltmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions lower than the current thread id are set. |
| |
| If a compute attribute binding matches "invocation.threadlemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions lower than or equal to the current thread id are set. |
| |
| If a compute attribute binding matches "invocation.threadgtmask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions greater than the current thread id are set. |
| |
| If a compute attribute binding matches "invocation.threadgemask", the "x" |
| component is filled with a 32-bit unsigned integer bitfield in which the |
| bits at positions greater than or equal to the current thread id are set. |
| |
| If a compute attribute binding matches "invocation.warpid", the "x" |
| component is filled with the warp id of the current thread. The warp id is |
| an unsigned integer whose range is hardware-dependent. |
| |
| If a compute attribute binding matches "invocation.smid", the "x" component |
| is filled with the SM id of the current thread. The SM id is an unsigned |
| integer whose range is hardware-dependent. |
| |
| (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension, |
| as extended by NV_gpu_program5 and NV_compute_program5) |
| |
| |
| + Shader thread group (NV_shader_thread_group) |
| |
| If a program specifies the "NV_shader_thread_group" option, it may use the |
| "invocation.threadid", "invocation.threadeqmask", |
| "invocation.threadltmask", "invocation.threadlemask", |
| "invocation.threadgtmask", "invocation.threadgemask", "invocation.warpid", |
| "invocation.smid", "state.thread.warpsize", "state.thread.warpspersm" and |
| "state.thread.smcount" bindings. It may also use the "TGBALLOT" |
| instruction. If this option is not specified, a program will fail to |
| compile if it uses those bindings. |
| |
| |
| Errors |
| |
| None. |
| |
| New State |
| |
| None. |
| |
| New Implementation Dependent State |
| |
|                                     Minimum |
| Get Value        Type  Get Command  Value    Description             Sec.     Attrib |
| ---------------  ----  -----------  -------  ----------------------  -------  ------ |
| WARP_SIZE_NV     Z+    GetIntegerv  1        total number of         2.X.3.3  - |
|                                              threads in a warp. |
| |
| WARPS_PER_SM_NV  Z+    GetIntegerv  1        maximum number of       2.X.3.3  - |
|                                              warps executing on an |
|                                              SM. |
| |
| SM_COUNT_NV      Z+    GetIntegerv  1        number of SMs on the    2.X.3.3  - |
|                                              GPU. |
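| |
| As a hedged sketch of how an application might read these three values, |
| the C fragment below queries them through a function pointer with the |
| same parameter types as glGetIntegerv. The token values are those listed |
| for this extension in glext.h. The ThreadGroupLimits type, the helper |
| functions and the fake_get_integerv stand-in (with made-up numbers) are |
| hypothetical and exist only so that the example runs without a GL |
| context. |
| |
```c
#include <assert.h>

/* Token values for this extension; guarded in case GL headers define them. */
#ifndef GL_WARP_SIZE_NV
#define GL_WARP_SIZE_NV    0x9339
#define GL_WARPS_PER_SM_NV 0x933A
#define GL_SM_COUNT_NV     0x933B
#endif

/* Same parameter types as glGetIntegerv (GLenum, GLint *). A real build
 * would also need the platform's GL calling convention (APIENTRY). */
typedef void (*GetIntegervFn)(unsigned int pname, int *data);

typedef struct {
    int warp_size;    /* WARP_SIZE_NV    */
    int warps_per_sm; /* WARPS_PER_SM_NV */
    int sm_count;     /* SM_COUNT_NV     */
} ThreadGroupLimits;

static ThreadGroupLimits query_thread_group_limits(GetIntegervFn get_integerv)
{
    ThreadGroupLimits l = { 0, 0, 0 };
    assert(get_integerv != 0);
    get_integerv(GL_WARP_SIZE_NV, &l.warp_size);
    get_integerv(GL_WARPS_PER_SM_NV, &l.warps_per_sm);
    get_integerv(GL_SM_COUNT_NV, &l.sm_count);
    return l;
}

/* Upper bound on concurrently executing threads implied by the limits. */
static long max_concurrent_threads(const ThreadGroupLimits *l)
{
    return (long)l->warp_size * l->warps_per_sm * l->sm_count;
}

/* Hypothetical stand-in for glGetIntegerv; the numbers are made up. */
static void fake_get_integerv(unsigned int pname, int *data)
{
    switch (pname) {
    case GL_WARP_SIZE_NV:    *data = 32; break;
    case GL_WARPS_PER_SM_NV: *data = 64; break;
    case GL_SM_COUNT_NV:     *data = 16; break;
    default:                 *data = 0;  break;
    }
}
```
| |
| In an application, glGetIntegerv would be passed in place of |
| fake_get_integerv once a context that exposes GL_NV_shader_thread_group |
| is current. |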
| |
| |
| Issues |
| |
| None. |
| |
| |
| Revision History |
| |
| Rev.  Date      Author   Changes |
| ----  --------  -------  ----------------------------------------- |
| 4     7/21/15   jbreton  Update the layout of threads within a quad for |
|                          window and framebuffer object rendering. |
| 3     2/14/14   jbreton  Rename the extension from NVX to NV. |
| 2     9/4/13    jbreton  Add helperThread attribute binding. |
| 1     12/19/12  jbreton  Internal revisions. |