 Name

     NVX_linked_gpu_multicast

 Name Strings

     GL_NVX_linked_gpu_multicast

 Contact

     Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com)
     Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com)

 Contributors

     Christoph Kubisch, NVIDIA
     Mark Kilgard, NVIDIA

 Status

     Shipping in NVIDIA release 361 drivers.

 Version

     Last Modified Date:         July 21, 2016
     NVIDIA Revision:            4

 Number

     OpenGL Extension #493

 Dependencies

     This extension is written against the OpenGL 4.5 specification (Compatibility Profile), dated
     February 2, 2015.

     This extension interacts with ARB_sparse_buffer.

     This extension interacts with ARB_copy_image.

     This extension interacts with EXT_direct_state_access.

     This extension interacts with ARB_shader_viewport_layer_array.

 Overview

     This extension enables novel multi-GPU rendering techniques by providing application control
     over a group of linked GPUs with identical hardware configuration.

     Multi-GPU rendering techniques fall into two categories: implicit and explicit.  Existing
     explicit approaches like WGL_NV_gpu_affinity have two main drawbacks: CPU overhead and
     application complexity.  An application must manage one context per GPU and multi-pump the API
     stream.  Implicit multi-GPU rendering techniques avoid these issues by broadcasting rendering
     from one context to multiple GPUs.  Common implicit approaches include alternate-frame
     rendering (AFR), split-frame rendering (SFR) and multi-GPU anti-aliasing.  They each have
     drawbacks.  AFR scales nicely but interacts poorly with inter-frame dependencies.  SFR can
     improve latency but has challenges with offscreen rendering and scaling of vertex processing.
     With multi-GPU anti-aliasing, each GPU renders the same content with alternate sample
     positions and the driver blends the result to improve quality.  This also has issues with
     offscreen rendering and can conflict with other anti-aliasing techniques.

     These issues with implicit multi-GPU rendering all have the same root cause: the driver lacks
     adequate knowledge to accelerate every application.  To resolve this, NVX_linked_gpu_multicast
     provides application control over multiple GPUs with a single context.

     Key points:

     - One context controls multiple GPUs.  Every GPU in the linked group can access every object.

     - Rendering is broadcast.  Each draw is repeated across all GPUs in the linked group.

     - Each GPU gets its own instance of all framebuffers and attached textures, allowing
       individualized output for each GPU.  Input data can be customized for each GPU using buffers
       created with the storage flag LGPU_SEPARATE_STORAGE_BIT_NVX and a new API,
       LGPUNamedBufferSubDataNVX.

     - Textures can be transferred from one GPU to another using LGPUCopyImageSubDataNVX.


 New Procedures and Functions

     void LGPUNamedBufferSubDataNVX(
         bitfield gpuMask, uint buffer,
         intptr offset, sizeiptr size,
         const void *data);

     void LGPUCopyImageSubDataNVX(
         uint sourceGpu, bitfield destinationGpuMask,
         uint srcName, enum srcTarget,
         int srcLevel,
         int srcX, int srcY, int srcZ,
         uint dstName, enum dstTarget,
         int dstLevel,
         int dstX, int dstY, int dstZ,
         sizei width, sizei height, sizei depth);

     void LGPUInterlockNVX(void);

 New Tokens

     Accepted in the <flags> parameter of BufferStorage and
     NamedBufferStorageEXT:

         LGPU_SEPARATE_STORAGE_BIT_NVX               0x0800

     Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
     GetInteger64v, GetFloatv, and GetDoublev:

         MAX_LGPU_GPUS_NVX                           0x92BA

 Additions to the OpenGL 4.5 Specification (Compatibility Profile)

     (Add a new chapter after chapter 19 "Compute Shaders")

     20 Multicast Rendering

     This chapter specifies commands for using multiple GPUs in a linked group.  Commands are
     multicast, or repeated across all linked GPUs.  Objects are shared by all GPUs; however, each
     GPU has its own instance (copy) of many resources, including framebuffers.  When each GPU has
     its own instance of a resource, it is considered to have per-GPU storage.  When all GPUs share
     a single instance of a resource, this is considered GPU-shared storage.

     The mechanism for linking GPUs is implementation specific, as is the process-global mechanism
     for enabling multicast rendering support (if necessary).  The number of GPUs usable for
     multicast rendering by a context can be queried by calling GetIntegerv with the symbolic
     constant MAX_LGPU_GPUS_NVX.  Individual GPUs are identified using zero-based indices in the
     range [0, n-1], where n is the number of multicast GPUs.  GPUs are also identified by
     bitmasks of the form 2^i, where i is the GPU index.  A set of GPUs is specified by the union of
     masks for each GPU in the set.
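     The C sketch below illustrates this index and mask convention.  It builds a GPU mask from
     zero-based indices and checks that a mask names only existing GPUs.  The helper names are
     hypothetical and not part of the extension.  In an application, <n> would be the value returned
     by GetIntegerv(MAX_LGPU_GPUS_NVX), and masks are assumed to fit in a 32-bit GLbitfield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for the GPU mask convention; not part of the
 * extension. GPU i is identified by the mask 2^i. */

/* Mask for one GPU index. GLbitfield is 32 bits wide, so index < 32. */
static unsigned gpu_bit(unsigned index)
{
    assert(index < 32u);
    return 1u << index;
}

/* Union of the masks of every GPU in the set. */
static unsigned gpu_mask_from_indices(const unsigned *indices, size_t count)
{
    unsigned mask = 0u;
    for (size_t i = 0; i < count; ++i)
        mask |= gpu_bit(indices[i]);
    return mask;
}

/* Application-side sanity check: true if the mask is non-zero and names
 * only GPUs in [0, n-1], where n is the value of MAX_LGPU_GPUS_NVX.
 * (The errors listed by the extension cover only a zero mask.) */
static bool gpu_mask_in_range(unsigned mask, unsigned n)
{
    unsigned all = (n >= 32u) ? ~0u : (1u << n) - 1u;
    return mask != 0u && (mask & ~all) == 0u;
}
```

     With two GPUs, mask 0x1 selects GPU 0, 0x2 selects GPU 1, and 0x3 selects both.  This is how
     the Sample Code section addresses each eye.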

     20.1 Multi-GPU Buffer Storage

     Like other resources, buffer objects can have two types of storage: per-GPU storage or
     GPU-shared storage.  Per-GPU storage can be explicitly requested using the
     LGPU_SEPARATE_STORAGE_BIT_NVX flag with BufferStorage/NamedBufferStorageEXT.  If this flag is
     not set, the type of storage used is undefined.  The implementation may use either type
     and transition between them at any time.  Client reads of a buffer with per-GPU storage may
     source from any GPU.

     The following rules apply to buffer objects with per-GPU storage:

       When mapped with WRITE_ONLY access, writes apply to all GPUs.
       When bound to UNIFORM_BUFFER, client uniform updates apply to all GPUs.
       When used as the write buffer for CopyBufferSubData or CopyNamedBufferSubData, writes apply to
       all GPUs.

     The following commands affect storage on all GPUs, even if the buffer object has per-GPU
     storage:

       BufferSubData, NamedBufferSubData, ClearBufferSubData, and ClearNamedBufferData

     An INVALID_VALUE error is generated if BufferStorage or NamedBufferStorageEXT is called with
     LGPU_SEPARATE_STORAGE_BIT_NVX set in <flags> together with MAP_PERSISTENT_BIT or
     SPARSE_STORAGE_BIT_ARB.
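     The restriction above can be written as a client-side pre-check.  The helper below is
     hypothetical and is not an entry point of the extension.  The token values match those listed
     in New Tokens and in the standard OpenGL headers.  They are defined locally only when those
     headers have not already provided them.

```c
#include <stdbool.h>

/* Token values as in the extension and the OpenGL headers; defined here
 * only if <GL/glext.h> has not already provided them. */
#ifndef GL_MAP_PERSISTENT_BIT
#define GL_MAP_PERSISTENT_BIT 0x0040
#endif
#ifndef GL_DYNAMIC_STORAGE_BIT
#define GL_DYNAMIC_STORAGE_BIT 0x0100
#endif
#ifndef GL_SPARSE_STORAGE_BIT_ARB
#define GL_SPARSE_STORAGE_BIT_ARB 0x0400
#endif
#ifndef GL_LGPU_SEPARATE_STORAGE_BIT_NVX
#define GL_LGPU_SEPARATE_STORAGE_BIT_NVX 0x0800
#endif

/* Hypothetical pre-check: returns false when BufferStorage or
 * NamedBufferStorageEXT would generate INVALID_VALUE because per-GPU
 * storage was requested together with persistent mapping or sparse
 * storage. Other BufferStorage flag errors are not modeled. */
static bool lgpu_storage_flags_ok(unsigned flags)
{
    if ((flags & GL_LGPU_SEPARATE_STORAGE_BIT_NVX) == 0u)
        return true; /* the restriction applies only to per-GPU storage */
    return (flags & (GL_MAP_PERSISTENT_BIT | GL_SPARSE_STORAGE_BIT_ARB)) == 0u;
}
```

     The flags used in the Sample Code section, LGPU_SEPARATE_STORAGE_BIT_NVX | DYNAMIC_STORAGE_BIT,
     pass this check.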

     To modify buffer object data on one or more GPUs, the client may use the command

     void LGPUNamedBufferSubDataNVX(
         bitfield gpuMask, uint buffer,
         intptr offset, sizeiptr size,
         const void *data);

     This function operates similarly to NamedBufferSubData, except that it updates the per-GPU
     buffer data on the set of GPUs defined by <gpuMask>.

     An INVALID_VALUE error is generated if <gpuMask> is zero.
     An INVALID_OPERATION error is generated if <buffer> is not the name of an existing buffer
     object.
     An INVALID_VALUE error is generated if <offset> or <size> is negative, or if <offset> + <size>
     is greater than the value of BUFFER_SIZE for the buffer object.
     An INVALID_OPERATION error is generated if any part of the specified buffer range is mapped
     with MapBufferRange or MapBuffer (see section 6.3), unless it was mapped with
     MAP_PERSISTENT_BIT set in the MapBufferRange access flags.
     An INVALID_OPERATION error is generated if the BUFFER_IMMUTABLE_STORAGE flag of the buffer
     object is TRUE and the value of BUFFER_STORAGE_FLAGS for the buffer does not have the
     DYNAMIC_STORAGE_BIT set.
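     The following CPU-side model (not driver code) makes the per-GPU semantics concrete by storing
     one byte array per GPU.  The modeled LGPUNamedBufferSubDataNVX writes only to the GPUs selected
     by <gpuMask>.  The modeled NamedBufferSubData writes to every GPU, as this section requires.
     All names are hypothetical, and the GPU count and buffer size are fixed for brevity.  A NULL
     pointer stands for a name that is not an existing buffer object.  The error checks run in the
     order listed above, but this specification does not mandate any particular order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef GL_NO_ERROR
#define GL_NO_ERROR 0
#endif
#ifndef GL_INVALID_VALUE
#define GL_INVALID_VALUE 0x0501
#endif
#ifndef GL_INVALID_OPERATION
#define GL_INVALID_OPERATION 0x0502
#endif

#define MODEL_GPUS 2          /* stands in for MAX_LGPU_GPUS_NVX */
#define MODEL_BUFFER_SIZE 64  /* stands in for BUFFER_SIZE */

/* Hypothetical CPU model of a buffer object with per-GPU storage. */
typedef struct {
    unsigned char data[MODEL_GPUS][MODEL_BUFFER_SIZE]; /* one copy per GPU */
    bool immutable;        /* BUFFER_IMMUTABLE_STORAGE */
    bool dynamic_storage;  /* DYNAMIC_STORAGE_BIT in BUFFER_STORAGE_FLAGS */
    bool mapped;           /* a range is currently mapped ... */
    bool map_persistent;   /* ... with MAP_PERSISTENT_BIT */
    long map_offset, map_length;
} ModelBuffer;

/* Models LGPUNamedBufferSubDataNVX: checks the listed errors, then updates
 * only the per-GPU copies selected by gpu_mask. */
static int model_lgpu_buffer_subdata(ModelBuffer *buf, unsigned gpu_mask,
                                     long offset, long size, const void *data)
{
    if (gpu_mask == 0u)
        return GL_INVALID_VALUE;
    if (buf == NULL) /* not an existing buffer object */
        return GL_INVALID_OPERATION;
    if (offset < 0 || size < 0 || offset + size > MODEL_BUFFER_SIZE)
        return GL_INVALID_VALUE;
    if (buf->mapped && !buf->map_persistent && size > 0 &&
        offset < buf->map_offset + buf->map_length &&
        buf->map_offset < offset + size)
        return GL_INVALID_OPERATION; /* part of the range is mapped */
    if (buf->immutable && !buf->dynamic_storage)
        return GL_INVALID_OPERATION;
    for (unsigned gpu = 0; gpu < MODEL_GPUS; ++gpu)
        if (gpu_mask & (1u << gpu)) /* bits for absent GPUs are ignored */
            memcpy(&buf->data[gpu][offset], data, (size_t)size);
    return GL_NO_ERROR;
}

/* Models NamedBufferSubData on the same buffer: every GPU's copy changes. */
static int model_buffer_subdata(ModelBuffer *buf, long offset, long size,
                                const void *data)
{
    return model_lgpu_buffer_subdata(buf, (1u << MODEL_GPUS) - 1u,
                                     offset, size, data);
}
```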

     20.2 Multi-GPU Framebuffers and Textures

     All buffers in the default framebuffer as well as renderbuffers and textures bound to
     framebuffer objects receive per-GPU storage.  Storage for other textures is undefined: it may
     be per-GPU or GPU-shared and can transition between the types at any time.

     To copy texel data between GPUs, the client may use the command

     void LGPUCopyImageSubDataNVX(
         uint sourceGpu, bitfield destinationGpuMask,
         uint srcName, enum srcTarget,
         int srcLevel,
         int srcX, int srcY, int srcZ,
         uint dstName, enum dstTarget,
         int dstLevel,
         int dstX, int dstY, int dstZ,
         sizei width, sizei height, sizei depth);

     This function operates similarly to CopyImageSubData, except that it takes a source GPU
     and a destination GPU set defined by <destinationGpuMask>.

     INVALID_ENUM is generated
      * if either <srcTarget> or <dstTarget>
       - is not RENDERBUFFER or a valid non-proxy texture target
       - is TEXTURE_BUFFER, or
       - is one of the cubemap face selectors described in table 3.17,
      * if the target does not match the type of the object.

     INVALID_OPERATION is generated
      * if either object is a texture and the texture is not complete,
      * if the source and destination formats are not compatible,
      * if the source and destination number of samples do not match,
       * if one image is compressed and the other is uncompressed and the
         block size of the compressed image is not equal to the texel size
         of the uncompressed image.

     INVALID_VALUE is generated
      * if <sourceGpu> is greater than or equal to MAX_LGPU_GPUS_NVX,
      * if <destinationGpuMask> is zero,
      * if either <srcName> or <dstName> does not correspond to a valid
        renderbuffer or texture object according to the corresponding
        target parameter, or
      * if the specified level is not a valid level for the image, or
       * if the dimensions of either subregion exceed the boundaries
         of the corresponding image object, or
      * if the image format is compressed and the dimensions of the
        subregion fail to meet the alignment constraints of the format.
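     The GPU routing of the copy can be modeled on the CPU in the same way.  In this hypothetical
     sketch, each GPU holds its own instance of a small single-channel 2D image (per-GPU storage, as
     for a framebuffer attachment).  The copy reads a region from <sourceGpu>'s instance and writes
     it into the destination image on every GPU in <destinationGpuMask>.  Only the GPU-selection
     and bounds checks are modeled; targets, levels, depth, formats, and sample counts are omitted.

```c
#include <stdint.h>

#ifndef GL_NO_ERROR
#define GL_NO_ERROR 0
#endif
#ifndef GL_INVALID_VALUE
#define GL_INVALID_VALUE 0x0501
#endif

#define MODEL_GPUS 2 /* stands in for MAX_LGPU_GPUS_NVX */
#define IMG_W 4
#define IMG_H 2

/* Hypothetical per-GPU instances of one single-channel 2D image. */
typedef struct {
    uint8_t texels[MODEL_GPUS][IMG_H][IMG_W];
} ModelImage;

/* Models the GPU routing of LGPUCopyImageSubDataNVX: read the region from
 * source_gpu's instance of src and write it into dst's instance on every
 * GPU in dst_mask. Overlapping regions of one instance are not handled. */
static int model_lgpu_copy_image(const ModelImage *src, unsigned source_gpu,
                                 unsigned dst_mask, ModelImage *dst,
                                 int src_x, int src_y, int dst_x, int dst_y,
                                 int width, int height)
{
    if (source_gpu >= MODEL_GPUS || dst_mask == 0u)
        return GL_INVALID_VALUE;
    if (width < 0 || height < 0 || src_x < 0 || src_y < 0 ||
        dst_x < 0 || dst_y < 0 ||
        src_x + width > IMG_W || src_y + height > IMG_H ||
        dst_x + width > IMG_W || dst_y + height > IMG_H)
        return GL_INVALID_VALUE; /* subregion exceeds the image bounds */
    for (unsigned gpu = 0; gpu < MODEL_GPUS; ++gpu) {
        if ((dst_mask & (1u << gpu)) == 0u)
            continue; /* this GPU is not a destination */
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                dst->texels[gpu][dst_y + y][dst_x + x] =
                    src->texels[source_gpu][src_y + y][src_x + x];
    }
    return GL_NO_ERROR;
}
```

     Copying the right half of GPU 1's instance into GPU 0 (source GPU 1, mask 0x1) mirrors the
     Sample Code section.  GPU 0 ends up with both halves, and GPU 1's instance is unchanged.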


     20.3 Multi-GPU Synchronization

     LGPUCopyImageSubDataNVX provides implicit synchronization with previous rendering to the given
     texture or renderbuffer on the source GPU.  Synchronization of the copy with the destination
     GPU(s) is achieved with the interlock function:

       void LGPUInterlockNVX(void)

     This is called to synchronize all linked GPUs to the same point in the API stream.  To
     guarantee consistency, the interlock command must be used as a barrier between any two
     accesses by multiple GPUs to the same memory when at least one of the accesses is a write.
     For consistent copies between GPUs, synchronization is required before and after each copy:

     1. Prior to each call to LGPUCopyImageSubDataNVX, LGPUInterlockNVX() must be called after
     the most recent read or write of the target image by a destination GPU.

     2. After each call to LGPUCopyImageSubDataNVX, LGPUInterlockNVX() must be called
     prior to any future read or write of the target image by a destination GPU.
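     Both rules can be checked mechanically for a single target image.  The hypothetical validator
     below walks a recorded command stream and reports the first command that breaks rule 1 or
     rule 2.  Each entry in the stream is one of three things: a read or write of the image by a set
     of GPUs, an LGPUCopyImageSubDataNVX into the image with a destination mask, or an
     LGPUInterlockNVX call.  Source-side ordering is not tracked, because the copy synchronizes
     implicitly with prior rendering on the source GPU.

```c
#include <stddef.h>

/* Hypothetical record of the commands that touch one target image. */
typedef enum { CMD_ACCESS, CMD_COPY, CMD_INTERLOCK } CmdKind;

typedef struct {
    CmdKind kind;
    unsigned gpu_mask; /* ACCESS: GPUs reading/writing the image;
                          COPY: destinationGpuMask; INTERLOCK: unused */
} Cmd;

/* Returns 0 if the stream obeys both interlock rules; otherwise returns the
 * violated rule (1 or 2) and stores the offending command's index. */
static int check_interlock_rules(const Cmd *cmds, size_t count, size_t *bad_index)
{
    unsigned accessed = 0u; /* GPUs that accessed the image since the last interlock */
    unsigned copied = 0u;   /* destination GPUs of copies since the last interlock */

    for (size_t i = 0; i < count; ++i) {
        switch (cmds[i].kind) {
        case CMD_INTERLOCK:
            accessed = 0u;
            copied = 0u;
            break;
        case CMD_ACCESS:
            /* Rule 2: after a copy, a destination GPU must not touch the
             * image until an interlock has been issued. */
            if (cmds[i].gpu_mask & copied) {
                *bad_index = i;
                return 2;
            }
            accessed |= cmds[i].gpu_mask;
            break;
        case CMD_COPY:
            /* Rule 1: an interlock must follow the most recent access by
             * any destination GPU before the copy is issued. */
            if (cmds[i].gpu_mask & accessed) {
                *bad_index = i;
                return 1;
            }
            copied |= cmds[i].gpu_mask;
            break;
        }
    }
    return 0;
}
```

     For the Sample Code section, the stream is:
       - clear and draw on GPUs 0 and 1 (mask 0x3),
       - interlock,
       - copy from GPU 1 with mask 0x1,
       - interlock.
     Removing the first interlock triggers rule 1.  Removing the second interlock triggers rule 2
     as soon as GPU 0 next reads or writes the image.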

     GPU writes to and reads from GPU-shared locations require synchronization as well.  GPU writes
     such as transform feedback, shader image stores, CopyTexImage, and CopyBufferSubData are not
     automatically synchronized with writes by other GPUs.  Likewise, GPU reads such as texture
     fetches, shader image loads, and CopyTexImage are not synchronized with writes by other GPUs.
     Existing barriers such as TextureBarrier and MemoryBarrier only provide consistency guarantees
     for rendering, writes and reads on a single GPU.


 Additions to the AGL/GLX/WGL Specifications

     None

 GLX Protocol

     None

 Errors

     Relaxation of INVALID_ENUM errors
     ---------------------------------
     GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and GetDoublev now accept new tokens as
     described in the "New Tokens" section.

 New State

     None

 New Implementation Dependent State

     Add to Table 23.82, Implementation-Dependent Values, p. 784

                                                 Minimum
     Get Value               Type   Get Command  Value   Description               Sec.  Attribute
     ----------------------  ----   -----------  ------- -----------------------   ----  ---------
     MAX_LGPU_GPUS_NVX        Z+   GetIntegerv      2    Maximum number of         6.9     -
                                                         usable GPUs

 Sample Code

     Binocular stereo rendering example using NVX_linked_gpu_multicast with single GPU fallback:

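     // Mirrors the shader's uniform block, assuming it is declared with
     // layout(std140); under std140 each mat4 starts at a 16-byte offset.
     struct ViewData {
         GLint   viewport_index;
         GLint   pad[3];          // std140 padding so that mvp starts at offset 16
         GLfloat mvp[16];
         GLfloat modelview[16];
     };
     ViewData leftViewData = { 0, {0}, {...}, {...} };
     ViewData rightViewData = { 1, {0}, {...}, {...} };
     const GLsizeiptr size = sizeof(ViewData);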

     GLuint ubo[2];
     glCreateBuffers(2, &ubo[0]);

     if (has_NVX_linked_gpu_multicast) {
         glNamedBufferStorage(ubo[0], size, NULL, GL_LGPU_SEPARATE_STORAGE_BIT_NVX | GL_DYNAMIC_STORAGE_BIT);
         glLGPUNamedBufferSubDataNVX(0x1, ubo[0], 0, size, &leftViewData);
         glLGPUNamedBufferSubDataNVX(0x2, ubo[0], 0, size, &rightViewData);
     } else {
         glNamedBufferStorage(ubo[0], size, &leftViewData, 0);
         glNamedBufferStorage(ubo[1], size, &rightViewData, 0);
     }

     glViewportIndexedf(0, 0, 0, 640, 480);  // left viewport
     glViewportIndexedf(1, 640, 0, 640, 480);  // right viewport
     // Vertex shader sets gl_ViewportIndex according to viewport_index in UBO

     glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

     if (has_NVX_linked_gpu_multicast) {
         glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
         drawScene();
         // Make GPU 1 wait for glClear above to complete on GPU 0
         glLGPUInterlockNVX();
         // Copy right viewport from GPU 1 to GPU 0; renderBuffer is assumed to be the
         // color renderbuffer attached to the bound draw framebuffer
         glLGPUCopyImageSubDataNVX(1, 0x1,
                                   renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                   renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                   640, 480, 1);
         // Make GPU 0 wait for GPU 1 copy to GPU 0
         glLGPUInterlockNVX();
     } else {
         glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
         drawScene();
         glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[1]);
         drawScene();
     }
     // Both viewports are now present in GPU 0's renderbuffer

 Issues

   (1) Should we provide an explicit inter-GPU synchronization API?  Will this make the implementation
     easier or harder for the driver and applications?

     RESOLVED. Yes. A naive implementation of implicit synchronization would simply interlock the
     GPUs before and after each copy.  Smart implicit synchronization would have to track all APIs
     that can modify buffers and textures, creating an excessive burden for driver implementation
     and maintenance.  An application can track dependencies more easily and outperform a naive
     driver implementation using explicit synchronization.

   (2) How does this extension interact with queries (e.g. occlusion queries)?

     RESOLVED. Queries are performed separately on each GPU. The standard GetQueryObject* APIs
     return query results for GPU 0 only. However, GetQueryBufferObject* can be used to retrieve
     query results for all GPUs through a buffer with separate storage (LGPU_SEPARATE_STORAGE_BIT_NVX).

   (3) Which textures and buffers have separate storage for each GPU?

     The default framebuffer and all framebuffer attachments (renderbuffers and textures), as well as
     buffers allocated with LGPU_SEPARATE_STORAGE_BIT_NVX.  Other buffers and textures may or may not
     have separate storage.

   (4) Should we provide a mechanism to modify viewports independently for each GPU?

     RESOLVED. No. This can be achieved using multicast UBOs and ARB_shader_viewport_layer_array.

   (5) Should we expose this extension on single-GPU configurations?

     RESOLVED. No. The extension provides no value unless MAX_LGPU_GPUS_NVX > 1.  Limiting exposure
     to these configurations guarantees that at least two GPUs will be available when the extension
     is reported.

   (6) Can rendering be enabled/disabled on a specific subset of GPUs?

     This functionality will be added in a future version of this extension.

   (7) Should glGet*BufferParameter* return the LGPU_SEPARATE_STORAGE_BIT_NVX bit when
     BUFFER_STORAGE_FLAGS is queried?

     RESOLVED. Yes. BUFFER_STORAGE_FLAGS must match the flags parameter input to *BufferStorage, as
     specified in table 6.3.

 Revision History

     Rev.    Date    Author    Changes
     ----  --------  --------  -----------------------------------------
      4    07/21/16  mjk       Register extension