| Name |
| |
| NV_gpu_multicast |
| |
| Name Strings |
| |
| GL_NV_gpu_multicast |
| |
| Contact |
| |
| Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com) |
| Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com) |
| |
| Contributors |
| |
| Christoph Kubisch, NVIDIA |
| Mark Kilgard, NVIDIA |
| Robert Menzel, NVIDIA |
| Kevin Lefebvre, NVIDIA |
| Ralf Biermann, NVIDIA |
| |
| Status |
| |
| Shipping in NVIDIA release 370.XX drivers and up. |
| |
| Version |
| |
| Last Modified Date: January 3, 2019 |
| Revision: 6 |
| |
| Number |
| |
| OpenGL Extension #494 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 4.5 specification |
| (Compatibility Profile), dated February 2, 2015. |
| |
| This extension requires ARB_copy_image. |
| |
| This extension interacts with ARB_sample_locations. |
| |
| This extension interacts with ARB_sparse_buffer. |
| |
| This extension requires EXT_direct_state_access. |
| |
| Overview |
| |
| This extension enables novel multi-GPU rendering techniques by providing application control |
| over a group of linked GPUs with identical hardware configuration. |
| |
| Multi-GPU rendering techniques fall into two categories: implicit and explicit. Existing |
| explicit approaches like WGL_NV_gpu_affinity have two main drawbacks: CPU overhead and |
| application complexity. An application must manage one context per GPU and multi-pump the API |
| stream. Implicit multi-GPU rendering techniques avoid these issues by broadcasting rendering |
| from one context to multiple GPUs. Common implicit approaches include alternate-frame |
| rendering (AFR), split-frame rendering (SFR) and multi-GPU anti-aliasing. They each have |
| drawbacks. AFR scales nicely but interacts poorly with inter-frame dependencies. SFR can |
| improve latency but has challenges with offscreen rendering and scaling of vertex processing. |
| With multi-GPU anti-aliasing, each GPU renders the same content with alternate sample |
| positions and the driver blends the result to improve quality. This also has issues with |
| offscreen rendering and can conflict with other anti-aliasing techniques. |
| |
| These issues with implicit multi-GPU rendering all have the same root cause: the driver lacks |
| adequate knowledge to accelerate every application. To resolve this, NV_gpu_multicast |
| provides fine-grained, explicit application control over multiple GPUs with a single context. |
| |
| Key points: |
| |
| - One context controls multiple GPUs. Every GPU in the linked group can access every object. |
| |
| - Rendering is broadcast. Each draw is repeated across all GPUs in the linked group. |
| |
| - Each GPU gets its own instance of all framebuffers, allowing individualized output for each |
| GPU. Input data can be customized for each GPU using buffers created with the storage flag |
| PER_GPU_STORAGE_BIT_NV and the new command MulticastBufferSubDataNV. |
| |
| - New interfaces provide mechanisms to transfer textures and buffers from one GPU to another. |
| |
| New Procedures and Functions |
| |
| void RenderGpuMaskNV(bitfield mask); |
| |
| void MulticastBufferSubDataNV( |
| bitfield gpuMask, uint buffer, |
| intptr offset, sizeiptr size, |
| const void *data); |
| |
| void MulticastCopyBufferSubDataNV( |
| uint readGpu, bitfield writeGpuMask, |
| uint readBuffer, uint writeBuffer, |
| intptr readOffset, intptr writeOffset, sizeiptr size); |
| |
| void MulticastCopyImageSubDataNV( |
| uint srcGpu, bitfield dstGpuMask, |
| uint srcName, enum srcTarget, |
| int srcLevel, |
| int srcX, int srcY, int srcZ, |
| uint dstName, enum dstTarget, |
| int dstLevel, |
| int dstX, int dstY, int dstZ, |
| sizei srcWidth, sizei srcHeight, sizei srcDepth); |
| |
| void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu, |
| int srcX0, int srcY0, int srcX1, int srcY1, |
| int dstX0, int dstY0, int dstX1, int dstY1, |
| bitfield mask, enum filter); |
| |
| void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start, |
| sizei count, const float *v); |
| |
| void MulticastBarrierNV(void); |
| |
| void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask); |
| |
| void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params); |
| void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params); |
| void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params); |
| void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params); |
| |
| New Tokens |
| |
| Accepted in the <flags> parameter of BufferStorage and NamedBufferStorageEXT: |
| |
| PER_GPU_STORAGE_BIT_NV 0x0800 |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and |
| GetDoublev: |
| |
| MULTICAST_GPUS_NV 0x92BA |
| RENDER_GPU_MASK_NV 0x9558 |
| |
| Accepted as a value for <pname> for the TexParameter{if}, TexParameter{if}v, |
| TextureParameter{if}, TextureParameter{if}v, MultiTexParameter{if}EXT and |
| MultiTexParameter{if}vEXT commands and for the <value> parameter of GetTexParameter{if}v, |
| GetTextureParameter{if}vEXT and GetMultiTexParameter{if}vEXT: |
| |
| PER_GPU_STORAGE_NV 0x9548 |
| |
| Accepted by the <pname> parameter of GetMultisamplefv: |
| |
| MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV 0x9549 |
| |
| Additions to the OpenGL 4.5 Specification (Compatibility Profile) |
| |
| (Add a new chapter after chapter 19 "Compute Shaders") |
| |
| 20 Multicast Rendering |
| |
| Some implementations support multiple linked GPUs driven by a single context. Often the |
| distribution of work to individual GPUs is managed by the GL without client knowledge. This |
| chapter specifies commands for explicitly distributing work across GPUs in a linked group. |
| Rendering can be enabled or disabled for specific GPUs. Draw commands are multicast, or |
| repeated across all enabled GPUs. Objects are shared by all GPUs; however, each GPU has its |
| own instance (copy) of many resources, including framebuffers. When each GPU has its own |
| instance of a resource, it is considered to have per-GPU storage. When all GPUs share a |
| single instance of a resource, this is considered GPU-shared storage. |
| |
| The mechanism for linking GPUs is implementation specific, as is the mechanism for enabling |
| multicast rendering support (if necessary). The number of GPUs usable for multicast rendering |
| by a context can be queried by calling GetIntegerv with the symbolic constant |
| MULTICAST_GPUS_NV. This number is constant for the lifetime of a context. Individual GPUs |
| are identified using zero-based indices in the range [0, n-1], where n is the number of |
| multicast GPUs. GPUs are also identified by bitmasks of the form 2^i, where i is the GPU |
| index. A set of GPUs is specified by the union of masks for each GPU in the set. |
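| |
| The index and mask encoding described above can be sketched in C as follows. This is a |
| minimal illustration, not part of the extension: the helper names gpu_mask_from_index and |
| gpu_mask_all are hypothetical, and a 32-bit mask (at most 32 GPUs) is assumed. |

```c
#include <stdint.h>

/* Hypothetical helper: bitmask identifying a single GPU, i.e. 2^i.
 * Assumes gpu_index < 32. */
static uint32_t gpu_mask_from_index(unsigned gpu_index)
{
    return (uint32_t)1u << gpu_index;
}

/* Hypothetical helper: mask covering all n multicast GPUs, i.e. (2^n)-1.
 * gpu_count would normally come from GetIntegerv(MULTICAST_GPUS_NV).
 * Assumes 1 <= gpu_count <= 32. */
static uint32_t gpu_mask_all(unsigned gpu_count)
{
    if (gpu_count >= 32u)
        return 0xFFFFFFFFu;
    return ((uint32_t)1u << gpu_count) - 1u;
}
```

| A set of GPUs is then formed with bitwise OR, for example |
| gpu_mask_from_index(0) | gpu_mask_from_index(2) for GPUs 0 and 2. |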
| |
| 20.1 Controlling Individual GPUs |
| |
| Render commands are restricted to a specific set of GPUs with |
| |
| void RenderGpuMaskNV(bitfield mask); |
| |
| The following errors apply to RenderGpuMaskNV: |
| |
| INVALID_OPERATION is generated |
| * if <mask> is zero, |
| * if <mask> is not zero and <mask> is greater than or equal to 2^n, where n is equal |
| to MULTICAST_GPUS_NV, |
| * if issued between BeginConditionalRender and the corresponding EndConditionalRender. |
| |
| If the command does not generate an error, RENDER_GPU_MASK_NV is set to <mask>. The default |
| value of RENDER_GPU_MASK_NV is (2^n)-1. |
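| |
| An application may want to check a mask before passing it to RenderGpuMaskNV. The sketch |
| below mirrors the first two error conditions listed above. The function name |
| render_gpu_mask_is_valid is hypothetical, a 32-bit mask is assumed, and the conditional |
| rendering restriction is not covered because it depends on GL state. |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side check: a render mask is acceptable when it is
 * non-zero and less than 2^n, where n is the value of MULTICAST_GPUS_NV.
 * Assumes gpu_count <= 32. */
static bool render_gpu_mask_is_valid(uint32_t mask, unsigned gpu_count)
{
    if (mask == 0u || gpu_count == 0u)
        return false;
    if (gpu_count >= 32u)
        return true; /* every non-zero 32-bit mask is in range */
    return mask < ((uint32_t)1u << gpu_count);
}
```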
| |
| Render commands are skipped for a GPU that is not present in RENDER_GPU_MASK_NV. Render |
| commands include draw calls, clears, compute dispatches, and copies or pixel path operations |
| that write to a framebuffer (e.g. DrawPixels, BlitFramebuffer). For a full list of render |
| commands see section 2.4 (page 26). MulticastBlitFramebufferNV is an exception to this |
| policy: while it is a rendering command, it specifies its own source and destination GPUs. |
| Note that buffer and texture updates are not affected by RENDER_GPU_MASK_NV. |
| |
| 20.2 Multi-GPU Buffer Storage |
| |
| Like other resources, buffer objects can have two types of storage, per-GPU storage or |
| GPU-shared storage. Per-GPU storage can be explicitly requested using the |
| PER_GPU_STORAGE_BIT_NV flag with BufferStorage/NamedBufferStorageEXT. If this flag is not |
| set, the type of storage used is undefined. The implementation may use either type and |
| transition between them at any time. Client reads of a buffer with per-GPU storage may source |
| from any GPU. |
| |
| The following rules apply to buffer objects with per-GPU storage: |
| |
| When mapped, updates apply to all GPUs (only WRITE_ONLY access is supported). |
| When bound to UNIFORM_BUFFER, client uniform updates apply to all GPUs. |
| When used as the write buffer for CopyBufferSubData or CopyNamedBufferSubData, writes apply |
| to all GPUs. |
| |
| The following commands affect storage on all GPUs, even if the buffer object has per-GPU |
| storage: |
| |
| BufferSubData, NamedBufferSubData, ClearBufferSubData, and ClearNamedBufferData |
| |
| An INVALID_VALUE error is generated if BufferStorage/NamedBufferStorageEXT is called with |
| PER_GPU_STORAGE_BIT_NV set together with MAP_READ_BIT or SPARSE_STORAGE_BIT_ARB. |
| An INVALID_OPERATION error is generated if a buffer with PER_GPU_STORAGE_BIT_NV is bound to |
| UNIFORM_BUFFER and GetUniformfv, GetUniformiv, GetUniformuiv or GetUniformdv is called. |
| |
| To modify buffer object data on one or more GPUs, the client may use the command |
| |
| void MulticastBufferSubDataNV( |
| bitfield gpuMask, uint buffer, |
| intptr offset, sizeiptr size, |
| const void *data); |
| |
| This command operates similarly to NamedBufferSubData, except that it updates the per-GPU |
| buffer data on the set of GPUs defined by <gpuMask>. If <buffer> has GPU-shared storage, |
| <gpuMask> is ignored and the shared instance of the buffer is updated. |
| |
| An INVALID_VALUE error is generated if <gpuMask> is zero or is greater than or equal to 2^n, |
| where n is equal to MULTICAST_GPUS_NV. |
| An INVALID_OPERATION error is generated if <buffer> is not the name of an existing buffer |
| object. |
| An INVALID_VALUE error is generated if <offset> or <size> is negative, or if <offset> + <size> |
| is greater than the value of BUFFER_SIZE for the buffer object. |
| An INVALID_OPERATION error is generated if any part of the specified buffer range is mapped |
| with MapBufferRange or MapBuffer (see section 6.3), unless it was mapped with |
| MAP_PERSISTENT_BIT set in the MapBufferRange access flags. |
| An INVALID_OPERATION error is generated if the BUFFER_IMMUTABLE_STORAGE flag of the buffer |
| object is TRUE and the value of BUFFER_STORAGE_FLAGS for the buffer does not have the |
| DYNAMIC_STORAGE_BIT set. |
| |
| To copy between buffers created with PER_GPU_STORAGE_BIT_NV, the client may use the command |
| |
| void MulticastCopyBufferSubDataNV( |
| uint readGpu, bitfield writeGpuMask, |
| uint readBuffer, uint writeBuffer, |
| intptr readOffset, intptr writeOffset, sizeiptr size); |
| |
| This command operates similarly to CopyNamedBufferSubData, with the exception that it operates |
| on per-GPU instances of the buffer object. The read GPU index is specified by <readGpu> and |
| the set of write GPUs is specified by the mask in <writeGpuMask>. The following errors apply |
| to MulticastCopyBufferSubDataNV: |
| |
| An INVALID_OPERATION error is generated if <readBuffer> or <writeBuffer> is not the name of an |
| existing buffer object. |
| An INVALID_VALUE error is generated if any of <readOffset>, <writeOffset>, or <size> is |
| negative, if <readOffset> + <size> exceeds the size of the source buffer object, or if |
| <writeOffset> + <size> exceeds the size of the destination buffer object. |
| An INVALID_OPERATION error is generated if either the source or destination buffer object is |
| mapped, unless it was mapped with MAP_PERSISTENT_BIT set in the Map*BufferRange access |
| flags. |
| An INVALID_OPERATION error is generated if the value of BUFFER_STORAGE_FLAGS for <readBuffer> |
| or <writeBuffer> does not have PER_GPU_STORAGE_BIT_NV set. |
| An INVALID_VALUE error is generated if <readGpu> is greater than or equal to |
| MULTICAST_GPUS_NV. |
| An INVALID_OPERATION error is generated if <writeGpuMask> is zero. An INVALID_VALUE error is |
| generated if <writeGpuMask> is not zero and <writeGpuMask> is greater than or equal to 2^n, |
| where n is equal to MULTICAST_GPUS_NV. |
| An INVALID_VALUE error is generated if the source and destination are the same buffer object, |
| <readGpu> is present in <writeGpuMask>, and the ranges [<readOffset>; <readOffset> + <size>) |
| and [<writeOffset>; <writeOffset> + <size>) overlap. |
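| |
| The overlap rule in the last error above can be sketched as a client-side predicate. This |
| is an illustration only: the function name multicast_copy_overlaps is hypothetical, and it |
| assumes <readGpu> is less than 32 and that the offsets and size are already non-negative. |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: returns true when a MulticastCopyBufferSubDataNV
 * call would hit the overlap error, i.e. the source and destination are the
 * same buffer, the read GPU is also a write GPU, and the half-open ranges
 * [read_offset, read_offset + size) and [write_offset, write_offset + size)
 * intersect. */
static bool multicast_copy_overlaps(uint32_t read_buffer, uint32_t write_buffer,
                                    unsigned read_gpu, uint32_t write_gpu_mask,
                                    int64_t read_offset, int64_t write_offset,
                                    int64_t size)
{
    if (read_buffer != write_buffer)
        return false;
    if ((write_gpu_mask & ((uint32_t)1u << read_gpu)) == 0u)
        return false; /* the read GPU is not written, so no conflict */
    return read_offset < write_offset + size &&
           write_offset < read_offset + size;
}
```

| Copying within one buffer from GPU 0 to GPU 1 only (mask 0x2) is therefore allowed even |
| when the byte ranges coincide. |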
| |
| 20.3 Multi-GPU Framebuffers and Textures |
| |
| All buffers in the default framebuffer as well as renderbuffers receive per-GPU storage. By |
| default, storage for textures is undefined: it may be per-GPU or GPU-shared and can transition |
| between the types at any time. Per-GPU storage can be specified via |
| [Multi]Tex[ture]Parameter{if}[v] with PER_GPU_STORAGE_NV for the <pname> argument and TRUE for |
| the value. For this storage parameter to take effect, it must be specified after the texture |
| object is created and before the texture contents are defined by TexImage*, TexStorage* or |
| TextureStorage*. |
| |
| 20.3.1 Copying Image Data Between GPUs |
| |
| To copy texel data between GPUs, the client may use the command: |
| |
| void MulticastCopyImageSubDataNV( |
| uint srcGpu, bitfield dstGpuMask, |
| uint srcName, enum srcTarget, |
| int srcLevel, |
| int srcX, int srcY, int srcZ, |
| uint dstName, enum dstTarget, |
| int dstLevel, |
| int dstX, int dstY, int dstZ, |
| sizei srcWidth, sizei srcHeight, sizei srcDepth); |
| |
| This command operates equivalently to CopyImageSubData, except that it takes a source GPU and |
| a destination GPU set defined by <srcGpu> and <dstGpuMask> (respectively). Texel data is |
| copied from the source GPU to all destination GPUs. The following errors apply to |
| MulticastCopyImageSubDataNV: |
| |
| INVALID_ENUM is generated |
| * if either <srcTarget> or <dstTarget> |
| - is not RENDERBUFFER or a valid non-proxy texture target |
| - is TEXTURE_BUFFER, or |
| - is one of the cubemap face selectors described in table 3.17, |
| * if the target does not match the type of the object. |
| |
| INVALID_OPERATION is generated |
| * if either object is a texture and the texture is not complete, |
| * if the source and destination formats are not compatible, |
| * if the source and destination number of samples do not match, |
| * if one image is compressed and the other is uncompressed and the |
| block size of the compressed image is not equal to the texel size |
| of the uncompressed image. |
| |
| INVALID_VALUE is generated |
| * if <srcGpu> is greater than or equal to MULTICAST_GPUS_NV, |
| * if <dstGpuMask> is zero, |
| * if <dstGpuMask> is greater than or equal to 2^n, where n is equal to |
| MULTICAST_GPUS_NV, |
| * if either <srcName> or <dstName> does not correspond to a valid |
| renderbuffer or texture object according to the corresponding |
| target parameter, or |
| * if the specified level is not a valid level for the image, or |
| * if the dimensions of either subregion exceed the boundaries |
| of the corresponding image object, or |
| * if the image format is compressed and the dimensions of the |
| subregion fail to meet the alignment constraints of the format. |
| |
| To copy pixel values from one GPU to another use the following command: |
| |
| void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu, |
| int srcX0, int srcY0, int srcX1, int srcY1, |
| int dstX0, int dstY0, int dstX1, int dstY1, |
| bitfield mask, enum filter); |
| |
| This command operates equivalently to BlitNamedFramebuffer except that it takes a source GPU |
| and a destination GPU defined by <srcGpu> and <dstGpu> (respectively). Pixel values are |
| copied from the read framebuffer on the source GPU to the draw framebuffer on the destination |
| GPU. |
| |
| In addition to the errors generated by BlitNamedFramebuffer (see listing starting on page |
| 634), calling MulticastBlitFramebufferNV will generate INVALID_VALUE if <srcGpu> or <dstGpu> |
| is greater than or equal to MULTICAST_GPUS_NV. |
| |
| 20.3.2 Per-GPU Sample Locations |
| |
| Programmable sample locations can be customized for each GPU and framebuffer using the |
| following command: |
| |
| void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start, |
| sizei count, const float *v); |
| |
| An INVALID_OPERATION error is generated by MulticastFramebufferSampleLocationsfvNV if |
| <framebuffer> is not the name of an existing framebuffer object. |
| |
| INVALID_VALUE is generated if the sum of <start> and <count> is greater than |
| PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB. |
| |
| An INVALID_VALUE error is generated if <gpu> is greater than or equal to MULTICAST_GPUS_NV. |
| |
| This is equivalent to FramebufferSampleLocationsfvARB except that it sets |
| MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV at the appropriate offset for the specified GPU. |
| Just as with FramebufferSampleLocationsfvARB, FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB |
| must be enabled for these sample locations to take effect. FramebufferSampleLocationsfvARB |
| and NamedFramebufferSampleLocationsfvARB also set MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV |
| but for the specified sample across all multicast GPUs. If <gpu> is 0, |
| MulticastFramebufferSampleLocationsfvNV updates PROGRAMMABLE_SAMPLE_LOCATION_ARB in addition |
| to MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV. |
| |
| The programmed sample locations can be retrieved using GetMultisamplefv with <pname> set to |
| MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and indices calculated as follows: |
| |
| index_x = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i; |
| index_y = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i + 1; |
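| |
| The index calculation above can be written as a pair of small C helpers. The names |
| sample_index_x and sample_index_y are hypothetical, and table_size stands for the value an |
| application has queried for PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB. |

```c
#include <stddef.h>

/* Hypothetical helper: index of the x coordinate of sample sample_i on the
 * given GPU, for use with GetMultisamplefv and
 * MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV. */
static size_t sample_index_x(unsigned gpu, unsigned table_size, unsigned sample_i)
{
    return (size_t)gpu * table_size + 2u * (size_t)sample_i;
}

/* Hypothetical helper: the y coordinate is stored directly after x. */
static size_t sample_index_y(unsigned gpu, unsigned table_size, unsigned sample_i)
{
    return sample_index_x(gpu, table_size, sample_i) + 1u;
}
```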
| |
| 20.4 Interactions with Other Copy Functions |
| |
| Many existing commands can be used to copy between resources with GPU-shared, per-GPU or |
| undefined storage. For example: ReadPixels, GetBufferSubData or TexImage2D with a pixel |
| unpack buffer. The following table defines how the storage of the resource influences the |
| behavior of these copies. |
| |
| Table 20.1 Behavior of Copy Commands with Multi-GPU Storage |
| |
| Source Destination Behavior |
| ---------- ----------- ----------------------------------------------------------------------- |
| GPU-shared GPU-shared There is just one source and one destination. Copy from source to |
| destination. |
| GPU-shared per-GPU There is a single source. Copy it to the destination on all GPUs. |
| GPU-shared undefined Either of the above behaviors for a GPU-shared source may apply. |
| |
| per-GPU GPU-shared Copy from the GPU with the lowest index set in RENDER_GPU_MASK_NV |
| to the shared destination. |
| per-GPU per-GPU Implementations are encouraged to copy from source to destination |
| separately on each GPU. This is not required. If and when this is not |
| feasible, the copy should source from the GPU with the lowest index set |
| in RENDER_GPU_MASK_NV. |
| per-GPU undefined Either of the above behaviors for a per-GPU source may apply. |
| |
| undefined GPU-shared Either of the above behaviors for a GPU-shared destination may apply. |
| undefined per-GPU Either of the above behaviors for a per-GPU destination may apply. |
| undefined undefined Any of the above behaviors may apply. |
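| |
| Several rows of Table 20.1 source from "the GPU with the lowest index set in |
| RENDER_GPU_MASK_NV". An application that needs to predict which GPU that is could compute |
| it as sketched below. The function name lowest_gpu_in_mask is hypothetical and a 32-bit |
| mask is assumed. |

```c
#include <stdint.h>

/* Hypothetical helper: index of the lowest GPU present in a mask, or -1 if
 * the mask is empty. */
static int lowest_gpu_in_mask(uint32_t mask)
{
    int index = 0;

    if (mask == 0u)
        return -1;
    while ((mask & 1u) == 0u) {
        mask >>= 1;
        ++index;
    }
    return index;
}
```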
| |
| 20.5 Multi-GPU Synchronization |
| |
| MulticastCopyImageSubDataNV and MulticastCopyBufferSubDataNV each provide implicit |
| synchronization with previous work on the source GPU. MulticastBlitFramebufferNV is |
| different, providing implicit synchronization with previous work on the destination GPU. |
| In both cases, synchronization of the copies can be achieved with calls to the barrier |
| command: |
| |
| void MulticastBarrierNV(void); |
| |
| This is called to block all GPUs until all previous commands have been completed by all GPUs, |
| and all writes have landed. To guarantee consistency, synchronization must be placed between |
| any two accesses by multiple GPUs to the same memory when at least one of the accesses is a |
| write. This includes accesses to both the source and the destination. The safest approach is |
| to call MulticastBarrierNV immediately before and after each copy that involves multiple GPUs. |
| |
| GPU writes and reads to/from GPU-shared locations require synchronization as well. GPU writes |
| such as transform feedback, shader image store, CopyTexImage and CopyBufferSubData are not |
| automatically synchronized with writes by other GPUs. GPU reads such as texture fetches, |
| shader image loads and CopyTexImage are likewise not synchronized with writes by other GPUs. |
| Existing barriers such as TextureBarrier and MemoryBarrier only provide consistency guarantees |
| for rendering, writes and reads on a single GPU. |
| |
| In some cases it may be desirable to have one or more GPUs wait for an operation to complete |
| on another GPU without synchronizing all GPUs with MulticastBarrierNV. This can be performed |
| with the following command: |
| |
| void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask); |
| |
| INVALID_VALUE is generated |
| * if <signalGpu> is greater than or equal to MULTICAST_GPUS_NV, |
| * if <waitGpuMask> is zero, |
| * if <waitGpuMask> is greater than or equal to 2^n, where n is equal to |
| MULTICAST_GPUS_NV, or |
| * if <signalGpu> is present in <waitGpuMask>. |
| |
| MulticastWaitSyncNV provides the same consistency guarantees as MulticastBarrierNV but only |
| between the GPUs specified by <signalGpu> and <waitGpuMask> in a single direction. It forces |
| the GPUs specified by <waitGpuMask> to wait until the GPU specified by <signalGpu> has completed |
| all previous commands and writes associated with those commands. |
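| |
| The argument rules for MulticastWaitSyncNV listed above can be sketched as a client-side |
| check. The function name wait_sync_args_are_valid is hypothetical, and a 32-bit mask with |
| at most 32 GPUs is assumed. |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check mirroring the INVALID_VALUE conditions of
 * MulticastWaitSyncNV. gpu_count is the value of MULTICAST_GPUS_NV. */
static bool wait_sync_args_are_valid(unsigned signal_gpu, uint32_t wait_gpu_mask,
                                     unsigned gpu_count)
{
    if (gpu_count == 0u || gpu_count > 32u)
        return false; /* outside the assumptions of this sketch */
    if (signal_gpu >= gpu_count)
        return false;
    if (wait_gpu_mask == 0u)
        return false;
    if (gpu_count < 32u && wait_gpu_mask >= ((uint32_t)1u << gpu_count))
        return false;
    /* A GPU cannot wait on itself. */
    if ((wait_gpu_mask & ((uint32_t)1u << signal_gpu)) != 0u)
        return false;
    return true;
}
```

| With two GPUs, (signalGpu 0, waitGpuMask 0x2) and (signalGpu 1, waitGpuMask 0x1) pass this |
| check, while (signalGpu 0, waitGpuMask 0x1) does not. |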
| |
| 20.6 Multi-GPU Queries |
| |
| Queries are performed across all multicast GPUs. Each query object stores independent result |
| values for each GPU. The result value for a specific GPU can be queried using one of the |
| following commands: |
| |
| void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params); |
| void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params); |
| void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params); |
| void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params); |
| |
| The behavior of these commands matches the GetQueryObject* equivalent commands, except they |
| return the result value for the specified GPU. A query may be available on one GPU but not on |
| another, so it may be necessary to check QUERY_RESULT_AVAILABLE for each GPU. GetQueryObject* |
| return query results and availability for GPU 0 only. |
| |
| In addition to the errors generated by GetQueryObject* (see the listing in section 4.2 on page |
| 49), calling MulticastGetQueryObject* will generate INVALID_VALUE if <gpu> is greater than or |
| equal to MULTICAST_GPUS_NV. |
| |
| Additions to Chapter 8 of the OpenGL 4.5 (Compatibility Profile) Specification |
| (Textures and Samplers) |
| |
| Modify Section 8.10 (Texture Parameters) |
| |
| Insert the following paragraph before Table 8.25 (Texture parameters and their values): |
| |
| If <pname> is PER_GPU_STORAGE_NV, then the state is stored in the texture, but only takes |
| effect the next time storage is allocated for a texture using TexImage*, TexStorage* or |
| TextureStorage*. If the value of TEXTURE_IMMUTABLE_FORMAT is TRUE, then PER_GPU_STORAGE_NV |
| cannot be changed and an error is generated. |
| |
| Additions to Table 8.26 Texture parameters and their values |
| |
| Name Type Legal values |
| ------------------ ------- ------------ |
| PER_GPU_STORAGE_NV boolean TRUE, FALSE |
| |
| Additions to Chapter 10 of the OpenGL 4.5 (Compatibility Profile) Specification |
| (Vertex Specification and Drawing Commands) |
| |
| Modify Section 10.9 (Conditional Rendering) |
| |
| Replace the following text: |
| |
| If the result (SAMPLES_PASSED) of the query is zero, or if the result (ANY_SAMPLES_PASSED |
| or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE, all rendering commands described in |
| section 2.4 are discarded and have no effect when issued between BeginConditionalRender |
| and the corresponding EndConditionalRender |
| |
| with this text: |
| |
| For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is |
| zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE, |
| all rendering commands described in section 2.4 are discarded by this GPU and have no |
| effect when issued between BeginConditionalRender and the corresponding |
| EndConditionalRender |
| |
| Similarly replace the following: |
| |
| If the result (SAMPLES_PASSED) of the query is non-zero, or if the result |
| (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is TRUE, such commands are not |
| discarded. |
| |
| with this: |
| |
| For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is |
| non-zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is |
| TRUE, such commands are not discarded. |
| |
| Finally, replace all instances of "the GL" with "each active render GPU". |
| |
| Additions to Chapter 14 of the OpenGL 4.5 (Compatibility Profile) Specification |
| (Fixed-Function Primitive Assembly and Rasterization) |
| |
| Modify Section 14.3.1 (Multisampling) |
| |
| Replace the following text: |
| |
| The location for sample <i> is taken from v[2*(i-start)] and v[2*(i-start)+1]. |
| |
| with the following: |
| |
| These commands set the sample locations for all multicast GPUs in |
| MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV. The location for sample <i> on |
| gpu <g> is taken from v[g*N+2*(i-start)] and v[g*N+2*(i-start)+1]. |
| |
| Replace the following error generated by GetMultisamplefv: |
| |
| An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB or |
| PROGRAMMABLE_SAMPLE_LOCATION_ARB. |
| |
| with the following: |
| |
| An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB, |
| PROGRAMMABLE_SAMPLE_LOCATION_ARB or MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV. |
| |
| Add the following to the list of errors generated by GetMultisamplefv: |
| |
| An INVALID_VALUE error is generated if <pname> is |
| MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and <index> is greater than or equal to the |
| value of PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB multiplied by the value of |
| MULTICAST_GPUS_NV. |
| |
| Replace the following pseudocode (in both locations): |
| |
| float *table = FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB; |
| sample_location.xy = (table[2*sample_i], table[2*sample_i+1]); |
| |
| with the following: |
| |
| float *table = MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV; |
| table += PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB * gpu; |
| sample_location.xy = (table[2*sample_i], table[2*sample_i+1]); |
| |
| Additions to the WGL/GLX/EGL/AGL Specifications |
| |
| None |
| |
| Dependencies on ARB_sample_locations |
| |
| If ARB_sample_locations is not supported, section 20.3.2 and any references to |
| MulticastFramebufferSampleLocationsfvNV and MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV should |
| be removed. The modifications to Section 14.3.1 (Multisampling) should also be removed. |
| |
| Dependencies on ARB_sparse_buffer |
| |
| If ARB_sparse_buffer is not supported, any reference to SPARSE_STORAGE_BIT_ARB should be |
| removed. |
| |
| Errors |
| |
| Relaxation of INVALID_ENUM errors |
| --------------------------------- |
| GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and GetDoublev now accept new tokens as |
| described in the "New Tokens" section. |
| |
| New State |
| |
| Additions to Table 23.4 Rasterization |
| Initial |
| Get Value Type Get Command Value Description Sec. Attribute |
| -------------------------- ------ ----------- ----- ----------------------- ---- --------- |
| RENDER_GPU_MASK_NV Z+ GetIntegerv * Mask of GPUs that have 20.1 - |
| writes enabled |
| * See section 20.1 |
| |
| Additions to Table 23.19 Textures (state per texture object) |
| |
| Initial |
| Get Value Type Get Command Value Description Sec. |
| --------- ---- ----------- ------- ----------- ---- |
| PER_GPU_STORAGE_NV B GetTexParameter FALSE Per-GPU storage requested 20.3 |
| |
| |
| Additions to Table 23.30 Framebuffer (state per framebuffer object) |
| |
| Get Value Get Command Type Initial Value Description Sec. Attribute |
| --------- ----------- ---- ------------- ----------- ---- --------- |
| MULTICAST_PROGRAMMABLE_- GetMultisamplefv * (0.5,0.5) Programmable sample 20.3.2 - |
| SAMPLE_LOCATION_NV |
| |
| * The type here is "2* x n x 2 x R[0,1]" which is equivalent to PROGRAMMABLE_SAMPLE_LOCATION_ARB |
| but with sample locations for all multicast GPUs (one after the other). |
| |
| New Implementation Dependent State |
| |
| Add to Table 23.82, Implementation-Dependent Values, p. 784 |
| |
| Minimum |
| Get Value Type Get Command Value Description Sec. Attribute |
| ---------------------------- ------ ------------- ----- ---------------------- ---- --------- |
| MULTICAST_GPUS_NV Z+ GetIntegerv 1 Number of linked GPUs 20.0 - |
| usable for multicast |
| |
| Backwards Compatibility |
| |
| This extension replaces NVX_linked_gpu_multicast. The enumerant values for MULTICAST_GPUS_NV |
| and PER_GPU_STORAGE_BIT_NV match those of MAX_LGPU_GPUS_NVX and LGPU_SEPARATE_STORAGE_BIT_NVX |
| (respectively). MulticastBufferSubDataNV, MulticastCopyImageSubDataNV and MulticastBarrierNV |
| behave analogously to LGPUNamedBufferSubDataNVX, LGPUCopyImageSubDataNVX and LGPUInterlockNVX |
| (respectively). |
| |
| Sample Code |
| |
| Binocular stereo rendering example using NV_gpu_multicast with single GPU fallback: |
| |
| struct ViewData { |
| GLint viewport_index; |
| GLfloat mvp[16]; |
| GLfloat modelview[16]; |
| }; |
| ViewData leftViewData = { 0, {...}, {...} }; |
| ViewData rightViewData = { 1, {...}, {...} }; |
| |
| GLuint ubo[2]; |
| glCreateBuffers(2, &ubo[0]); |
| |
| if (has_NV_gpu_multicast) { |
| glNamedBufferStorage(ubo[0], size, NULL, GL_PER_GPU_STORAGE_BIT_NV | GL_DYNAMIC_STORAGE_BIT); |
| glMulticastBufferSubDataNV(0x1, ubo[0], 0, size, &leftViewData); |
| glMulticastBufferSubDataNV(0x2, ubo[0], 0, size, &rightViewData); |
| } else { |
| glNamedBufferStorage(ubo[0], size, &leftViewData, 0); |
| glNamedBufferStorage(ubo[1], size, &rightViewData, 0); |
| } |
| |
| glViewportIndexedf(0, 0, 0, 640, 480); // left viewport |
| glViewportIndexedf(1, 640, 0, 640, 480); // right viewport |
| // Vertex shader sets gl_ViewportIndex according to viewport_index in UBO |
| |
| glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); |
| |
| if (has_NV_gpu_multicast) { |
| glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]); |
| drawScene(); |
| // Make GPU 1 wait for glClear above to complete on GPU 0 |
| glMulticastWaitSyncNV(0, 0x2); |
| // Copy right viewport from GPU 1 to GPU 0 |
| glMulticastCopyImageSubDataNV(1, 0x1, |
| renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0, |
| renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0, |
| 640, 480, 1); |
| // Make GPU 0 wait for GPU 1 copy to GPU 0 |
| glMulticastWaitSyncNV(1, 0x1); |
| } else { |
| glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]); |
| drawScene(); |
| glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[1]); |
| drawScene(); |
| } |
| // Both viewports are now present in GPU 0's renderbuffer |
| |
| Issues |
| |
| (1) Should we provide an explicit inter-GPU synchronization API? Will this make the implementation |
| easier or harder for the driver and applications? |
| |
| RESOLVED. Yes. A naive implementation of implicit synchronization would simply synchronize the |
| GPUs before and after each copy. Smart implicit synchronization would have to track all APIs |
| that can modify buffers and textures, creating an excessive burden for driver implementation |
| and maintenance. An application can track dependencies more easily and outperform a naive |
| driver implementation using explicit synchronization. |
| |
| (2) How does this extension interact with queries (e.g. occlusion queries)? |
| |
| RESOLVED. Queries are performed separately on each GPU. The standard GetQueryObject* APIs |
| return query results for GPU 0 only. However, GetQueryBufferObject* can be used to retrieve |
| query results for all GPUs through a buffer with separate storage (PER_GPU_STORAGE_BIT_NV). |
| |
| (3) Are copy operations controlled by the render mask? |
| |
| RESOLVED. Copies which write to the framebuffer are considered render commands and implicitly |
| controlled by the render mask. Copies between textures and buffers are not considered render |
| commands so they are not influenced by the mask. If masked copies are desired, use |
| MulticastCopyImageSubDataNV, MulticastCopyBufferSubDataNV or MulticastBlitFramebufferNV. |
| These commands explicitly specify the GPU source and destination and are not influenced by the |
| render mask. |
| |
| (4) What happens if the MulticastCopyBufferSubDataNV source and destination buffers are the same? |
| |
| RESOLVED. When the source and destination involve the same GPU, MulticastCopyBufferSubDataNV |
| matches the behavior of CopyBufferSubData: overlapped copies are not allowed and an |
| INVALID_VALUE error results. When the source and destination do not involve the same GPU, |
| overlapping copies are allowed and no error is generated. |
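| |
| The rule in issue (4) can be illustrated with a small, self-contained sketch. The helper |
| copy_overlap_is_error and its parameters are hypothetical and are not part of this |
| extension; the sketch only assumes that GPU indices are below 32 and that ranges are |
| half-open, [offset, offset + size). |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the extension: returns true when a
 * MulticastCopyBufferSubDataNV-style copy would be rejected under the
 * rule described in issue (4). Assumes readGpu < 32. */
static bool copy_overlap_is_error(unsigned readGpu, uint32_t writeGpuMask,
                                  bool sameBuffer,
                                  int64_t readOffset, int64_t writeOffset,
                                  int64_t size)
{
    /* The rule only applies when one buffer is both source and destination. */
    if (!sameBuffer)
        return false;

    /* Copies that do not write to the source GPU may overlap freely. */
    if ((writeGpuMask & (UINT32_C(1) << readGpu)) == 0)
        return false;

    /* Overlap test for the half-open ranges [offset, offset + size). */
    return readOffset < writeOffset + size && writeOffset < readOffset + size;
}
```

| For example, copying bytes [0, 32) to [16, 48) of the same buffer is flagged when GPU 0 is |
| both the source and a destination, but not when GPU 0 is the source and only GPU 1 is |
| written. |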
| |
| (5) How does this extension interact with CopyTexImage2D? |
| |
| RESOLVED. The behavior depends on the storage type of the target. See section 20.4. Since |
| CopyTexImage* sources from the framebuffer, the source always has per-GPU storage. |
| |
| (6) Should we provide a mechanism to modify viewports independently for each GPU? |
| |
| RESOLVED. No. This can be achieved using multicast UBOs and ARB_shader_viewport_layer_array. |
| |
| (7) Should we add a present API that automatically displays content from a specific GPU? It |
| could abstract the transport mechanism, copying when necessary. |
| |
| RESOLVED. No. Transfers should be avoided to maximize performance and minimize latency. |
| Minimizing transfers requires application awareness of display connectivity to assign |
| rendering appropriately. Hiding transfers behind an API would also prevent some interesting |
| multi-GPU rendering techniques (e.g. checkerboard-style split rendering). |
| |
| WGL_NV_bridged_display can be used to enable display from multiple GPUs without copies. |
| |
| (8) Should we expose the extension on single-GPU configurations? |
| |
| RESOLVED. Yes, this is recommended. It allows more code sharing between multi-GPU and |
| single-GPU code paths. If there is only one GPU present, MULTICAST_GPUS_NV will be 1. It |
| may also be 1 if explicit GPU control is unavailable (e.g. if the active multi-GPU rendering |
| mode prevents it). Note that in revisions 5 and prior of this extension the minimum for |
| MULTICAST_GPUS_NV was 2. |
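| |
| One way an application might share code between the single-GPU and multi-GPU paths is to |
| derive its GPU masks from the queried MULTICAST_GPUS_NV value instead of hard-coding them. |
| The following is a minimal sketch; all_gpus_mask is a hypothetical helper, not an API of |
| this extension, and it assumes the GPU count is between 1 and 32. |

```c
#include <stdint.h>

/* Hypothetical helper: builds a bitmask with one bit set per GPU, given the
 * value queried from MULTICAST_GPUS_NV. Assumes 1 <= gpuCount <= 32. */
static uint32_t all_gpus_mask(unsigned gpuCount)
{
    /* Shifting a 32-bit value by 32 is undefined in C, so handle it apart. */
    if (gpuCount >= 32)
        return UINT32_MAX;
    return (UINT32_C(1) << gpuCount) - 1;
}
```

| With a count of 1 the mask is 0x1, so the same mask-driven code path degenerates to plain |
| single-GPU rendering. |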
| |
| (9) Should glGet*BufferParameter* return the PER_GPU_STORAGE_BIT_NV bit when |
| BUFFER_STORAGE_FLAGS is queried? |
| |
| RESOLVED. Yes. BUFFER_STORAGE_FLAGS must match the flags parameter input to *BufferStorage, as |
| specified in table 6.3. |
| |
| (10) Can a query be complete/available on one GPU and not another? |
| |
| RESOLVED. Yes. Independent query completion is important for conditional rendering. It |
| allows each GPU to begin conditional rendering in mode QUERY_WAIT without waiting on other |
| GPUs. |
| |
| (11) How can custom texel data be uploaded to each GPU for a given texture? |
| |
| The easiest way is to create staging textures with the custom texel data and then copy it |
| to a texture with per-GPU storage using MulticastCopyImageSubDataNV. |
| |
| (12) Should we allow the waitGpuMask in MulticastWaitSyncNV to include the signal GPU? |
| |
| RESOLVED. No. There is no reason for a GPU to wait on itself. This is effectively a no-op in |
| the command stream. Furthermore it is easy to confuse GPU indices and masks, so it is |
| beneficial to explicitly generate an error in this case. |
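| |
| Because GPU indices and GPU masks are easy to mix up, an application may want to validate |
| its arguments before calling MulticastWaitSyncNV. The sketch below uses hypothetical helper |
| names (gpu_index_to_mask, wait_mask_is_valid) that are not part of this extension, and it |
| assumes GPU indices below 32. |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts a GPU index into its single-bit mask.
 * Assumes gpuIndex < 32. */
static uint32_t gpu_index_to_mask(unsigned gpuIndex)
{
    return UINT32_C(1) << gpuIndex;
}

/* Hypothetical check mirroring issue (12): a wait is only meaningful when
 * the signaling GPU is not among the waiting GPUs. */
static bool wait_mask_is_valid(unsigned signalGpu, uint32_t waitGpuMask)
{
    return (waitGpuMask & gpu_index_to_mask(signalGpu)) == 0;
}
```

| Under this check, the argument pairs (0, 0x2) and (1, 0x1) used in the stereo rendering |
| sample are accepted, while (0, 0x1) is rejected because GPU 0 would wait on itself. |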
| |
| (13) Will support for NVX_linked_gpu_multicast continue? |
| |
| RESOLVED. NVX_linked_gpu_multicast is deprecated and applications should switch to |
| NV_gpu_multicast. However, implementations are encouraged to continue supporting |
| NVX_linked_gpu_multicast for backwards compatibility. |
| |
| (14) Does RenderGpuMaskNV work with immediate mode rendering? |
| |
| RESOLVED. Yes, the render GPU mask applies to immediate mode rendering the same as other |
| rendering. Note that RenderGpuMaskNV is not one of the commands allowed between Begin and End |
| (see section 10.7.5) so the render mask must be set before Begin is called. |
| |
| Revision History |
| |
| Rev. Date Author Changes |
| ---- -------- -------- ----------------------------------------------- |
| 6 01/03/19 jschnarr reduce MULTICAST_GPUS_NV minimum to 1 |
| clarify that MULTICAST_GPUS_NV is constant for a context |
| 5 10/07/16 jschnarr trivial typo fix |
| 4 07/21/16 mjk registered |
| 3 06/15/16 jschnarr R370 release |