blob: 4d2f03f28a9ecce6a877553654ee9d719a14a3b2 [file] [log] [blame]
Name Strings
Matt Craighead, NVIDIA Corporation (mcraighead 'at'
Copyright NVIDIA Corporation, 2000, 2001, 2002.
IP Status
NVIDIA Proprietary.
Shipping (version 1.0)
NVIDIA Date: November 7, 2002 (version 1.0)
Written based on the wording of the OpenGL 1.3 specification.
If this extension is implemented, the WGL or GLX memory allocator
interface specified in NV_vertex_array_range must also be
implemented. Please refer to the NV_vertex_array_range specification
for further information on this interface.
The vertex array range extension is intended to improve the
efficiency of OpenGL vertex arrays. OpenGL vertex arrays' coherency
model and ability to access memory from arbitrary locations in memory
prevented implementations from using DMA (Direct Memory Access)
Many image-intensive applications, such as those that use dynamically
generated textures, face similar problems. These applications would
like to be able to sustain throughputs of hundreds of millions of
pixels per second through DrawPixels and hundreds of millions of
texels per second through TexSubImage.
However, the same restrictions that limited vertex throughput also
limit pixel throughput.
By the time that any pixel operation that reads data from user memory
returns, OpenGL requires that it must be safe for the application to
start using that memory for a different purpose. This coherency
model prevents asynchronous DMA transfers directly out of the user's
There are also no restrictions on the pointer provided to pixel
operations or on the size of the data. To facilitate DMA
implementations, the driver needs to know in advance what region of
the address space to lock down.
Vertex arrays faced both of these restrictions already, but pixel
operations have one additional complicating factor -- they are
bidirectional. Vertex array data is always being transfered from the
application to the driver and the HW, whereas pixel operations
sometimes transfer data to the application from the driver and HW.
Note that the types of memory that are suitable for DMA for reading
and writing purposes are often different. For example, on many PC
platforms, DMA pulling is best accomplished with write-combined
(uncached) AGP memory, while pushing data should use cached memory so
that the application can read the data efficiently once it has been
read back over the AGP bus.
This extension defines an API where an application can specify two
pixel data ranges, which are analogous to vertex array ranges, except
that one is for operations where the application is reading data
(e.g. glReadPixels) and one is for operations where the application
is writing data (e.g. glDrawPixels, glTexSubImage2D, etc.). Each
pixel data range has a pointer to its start and a length in bytes.
When the pixel data range is enabled, and if the pointer specified
as the argument to a pixel operation is inside the corresponding
pixel data range, the implementation may choose to asynchronously
pull data from the pixel data range or push data to the pixel data
range. Data pulled from outside the pixel data range is undefined,
while pushing data to outside the pixel data range produces undefined
The application may synchronize with the hardware in one of two ways:
by flushing the pixel data range (or causing an implicit flush) or by
using the NV_fence extension to insert fences in the command stream.
* The vertex array range extension required that all active vertex
arrays must be located inside the vertex array range. Should
this extension be equally strict?
RESOLVED: No, because a user may want to use the pixel data range
for one type of operation (say, texture downloads) but still be
able to use standard non-PDR pixel operations for everything
else. Requiring that apps disable PDR every time such an
operation occurs would be burdensome and make it difficult to
integrate this extension into a larger app with minimal changes.
So, for each pixel operation, we will look at the pointer
provided by the application. If it's inside the PDR, the PDR
rules apply, and if it's not inside the PDR, it's a standard GL
pixel operation, even if some of the data is actually inside the
* Reads and writes may require different types of memory. How do
we handle this?
RESOLVED: The allocator interface already provides the ability to
specify different read and write frequencies. A buffer for a
write PDR should probably be allocated with a high write
frequency and low read frequency, while a read PDR's buffer
should have a low write and high read frequency.
Having two PDRs is essential because a single application may
want to perform both asynchronous reads and writes
* What happens if a PDR pixel operation pulls data from a location
outside the PDR?
RESOLVED: The data pulled is undefined, and program termination
may result.
* What happens if a PDR pixel operation pushes data to a location
outside the PDR?
RESOLVED: The contents of that memory location become undefined,
and program termination may result.
* What happens if the hardware can't support the operation?
RESOLVED: The operation may be slow, because we may need to, for
example, read the pixel data out of uncached memory with the CPU,
but it should still work. So this should never be a problem; in
fact, it means that a basic implementation that accelerates only,
say, one operation is quite trivial.
* Should there be any limitations to what operations should be
RESOLVED: No, in theory any pixel operation that accesses a
user's buffer can work with PDR. This includes Bitmap,
PolygonStipple, GetTexImage, ConvolutionFilter2D, etc. Many are
unlikely to be accelerated, but there is no reason to place
arbitrary restrictions. A list of possibly supported operations
is provided for OpenGL 1.2.1 with ARB_imaging support and for all
the extensions currently supported by NVIDIA. Developers should
carefully read the Implementation Details provided by their
vendor before using the extension.
* Should PixelMap and GetPixelMap be supported?
RESOLVED: Yes. They're not really pixel path operations, but,
again, there is no good reason to omit operations, and they _are_
operations that pass around big chunks of pixel-related data. If
we support PolygonStipple, surely we should support this.
* Can the PDRs and the VAR overlap and/or be the same buffer?
RESOLVED: Yes. In fact, it is expected that one of the preferred
modes of usage for this extension will be to use the same AGP
buffer for both the write PDR and the VAR, so it can be used for
both dynamic texturing and dynamic geometry.
* Can video memory buffers be used?
RESOLVED: Yes, assuming the implementation supports using them
for PDR. On systems with AGP Fast Writes, this may be
interesting in some cases. Another possible use for this is to
treat a video memory buffer as an offscreen surface, where
DrawPixels can be thought of as a blit from offscreen memory to
a GL surface, and ReadPixels can be thought of as a blit from a
GL surface to offscreen memory. This technique should be used
with caution, because there are other alternatives, such as
pbuffers, aux buffers, and even textures.
* Do we want to support more than one read and one write PDR?
RESOLVED: No, but I could imagine uses for it. For example, an
app could use two system memory buffers (one read, one write PDR)
and a single video memory buffer (both read and write). Do we
need a scheme where an unlimited number of PDR buffers can be
specified? Ugh. I hope not. I can't think of a good reason to
use more than 3 buffers, and even that is stretching it.
* Do we want a separate enable for both the read and write PDR?
RESOLVED: Yes. In theory, they are completely independent, and
we should treat them as such.
* Is there an equivalent to the VAR validity check?
RESOLVED: No. When a vertex array call occurs, all the vertex
array state is already set. We can know in advance whether all
the pointers, strides, etc. are set up in a satisfactory way.
However, for a pixel operation, much of the state is provided on
the same function call that performs the operation. For example,
the pixel format of the data may need to match that of the
framebuffer. We can't know this without looking at the format
and type arguments.
An alternative might be some sort of "proxy" mechanism for pixel
operations, but this seems to be very complicated.
* Do we want a more generalized API? What stops us from needing a
DMA extension for every single conceivable use in the future?
RESOLVED: No, this is good enough. Since new extensions will
probably require new semantics anyhow, we'll just live with that.
Maybe if the ARB wants to create a more generic "DMA" extension,
these issues can be revisited.
* How do applications synchronize with the hardware?
RESOLVED: A new command, FlushPixelDataRangeNV, is provided, that
is analogous to FlushVertexArrayRangeNV. Applications can also
use the Finish command. The NV_fence extension is best for
applications that need fine-grained synchronization.
* Should enabling or disabling a PDR induce an implicit PDR flush?
RESOLVED: No. In the VAR extension, enabling and disabling the
VAR does induce a VAR flush, but this has proven to be more
problematic than helpful, because it makes it much more difficult
to switch between VAR and non-VAR rendering; the VAR2 extension
lifts this restriction, and there is no reason to get this wrong
a second time.
The PDR extension does not suffer from the problem of enabling
and disabling frequently, because non-PDR operations are
permitted simply by providing a pointer outside of the PDR, but
there is no clear reason why the enable or disable should cause
a quite unnecessary PDR flush.
* Should this state push/pop?
RESOLVED: Yes, but via a Push/PopClientAttrib and the
GL_CLIENT_PIXEL_STORE_BIT bit. Although this is heavyweight
state, VAR also allowed push/pop. It does fit nicely into an
existing category, too.
* Should making another context current cause a PDR flush?
RESOLVED: No. There's no fundamental reason it should. Note
that apps should be careful to not free their memory until the
hardware is not using it... note also that this decision is
inconsistent with VAR, which did guarantee a flush here.
* Is the read PDR guaranteed to give you either old or new values,
or is it truly undefined?
RESOLVED: Undefined. This may ease implementation constraints
slightly. Apps must not rely at all on the contents of the
region where the readback is occurring until it is known to be
An example of how an implementation might conceivably require
this is as follows. Suppose that a piece of hardware, for some
reason, can only write full 32-byte chunks of data. Any bytes
that were supposed to be unwritten are in fact trashed by the
hardware, filled with garbage. By careful fixups (read the
contents before the operation, restore when done), the driver may
be able to hide this fact, but a requirement that either new or
old data must show up would be violated.
Or, more trivially, you might implement certain pixel operations
as an in-place postprocess on the returned data.
It is not anticipated that NVIDIA implementations will need this
flexibility, but it is nevertheless provided.
* How should an application allocate its PDR memory?
The app should use wglAllocateMemoryNV, even for a read PDR in
system memory. Using malloc may result in suboptimal
performance, because the driver will not be able to choose an
optimal memory type. For ReadPixels to system memory, you might
set a read frequency of 1.0, a write frequency of 0.0, and a
priority of 1.0. The driver might allocate PCI memory, or
physically contiguous PCI memory, or cachable AGP memory, all
depending on the performance characteristics of the device.
While memory from malloc will work, it does not allow the driver
to make these decisions, and it will certainly never give you AGP
Write PDR memory for purposes of streaming textures, etc. works
exactly the same as VAR memory for streaming vertices. You can,
and in fact are encouraged to, use the same circular buffer for
both vertices and textures.
If you have different needs (not just streaming textures or
asynchronous readbacks), you may want your pixel data in video
New Procedures and Functions
void PixelDataRangeNV(enum target, sizei length, void *pointer)
void FlushPixelDataRangeNV(enum target)
New Tokens
Accepted by the <target> parameter of PixelDataRangeNV and
FlushPixelDataRangeNV, and by the <cap> parameter of
EnableClientState, DisableClientState, and IsEnabled:
Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
GetFloatv, and GetDoublev:
Accepted by the <pname> parameter of GetPointerv:
Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation)
Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization)
Add new section to Section 3.6, "Pixel Rectangles", on page 113:
"3.6.7 Write Pixel Data Range Operation
Applications can enhance the performance of DrawPixels and other
commands that transfer large amounts of pixel data by using a pixel
data range. The command
void PixelDataRangeNV(enum target, sizei length, void *pointer)
specifies one of the current pixel data ranges. When the write pixel
data range is enabled and valid, pixel data transfers from within
the pixel data range are potentially faster. The pixel data range is
a contiguous region of (virtual) address space for placing pixel
data. The "pointer" parameter is a pointer to the base of the pixel
data range. The "length" pointer is the length of the pixel data
range in basic machine units (typically unsigned bytes). For the
write pixel data range, "target" must be WRITE_PIXEL_DATA_RANGE_NV.
The pixel data range address space region extends from "pointer"
to "pointer + length - 1" inclusive.
There is some system burden associated with establishing a pixel data
range (typically, the memory range must be locked down). If either
the pixel data range pointer or size is set to zero, the previously
established pixel data range is released (typically, unlocking the
The pixel data range may not be established for operating system
dependent reasons, and therefore, not valid. Reasons that a pixel
data range cannot be established include spanning different memory
types, the memory could not be locked down, alignment restrictions
are not met, etc.
The write pixel data range is enabled or disabled by calling
EnableClientState or DisableClientState with the symbolic constant
The write pixel data range is valid when the following conditions are
o PixelDataRangeNV has been called with a non-null pointer and
non-zero size, for target WRITE_PIXEL_DATA_RANGE_NV.
o The write pixel data range has been established.
o An implementation-dependent validity check based on the
pointer alignment, size, and underlying memory type of the
write pixel data range region of memory.
Otherwise, the write pixel data range is not valid.
The commands, such as DrawPixels, that may be made faster by the
write pixel data range are listed in the Appendix.
When the write pixel data range is valid, an attempt will be made to
accelerate these commands if and only if the data pointer argument to
the command lies within the write pixel data range. No attempt will
be made to accelerate commands whose base pointer is outside this
range. Accessing data outside the write pixel data range when the
base pointer lies within the range and the range is valid will
produce undefined results and may cause program termination.
The standard OpenGL pixel data coherency model requires that pixel
data be extracted from the user's buffer immediately, before the
pixel command returns. When the write pixel data range is valid,
this model is relaxed so that changes made to pixel data until the
next "write pixel data range flush" may affect pixel commands in non-
sequential ways. That is, a call to a pixel command that precedes
a change to pixel data (without an intervening "write pixel data
range flush") may access the changed data; though a call to a pixel
command following a change to pixel data must always access the
changed data, and never the original data.
A 'write pixel data range flush' occurs when one of the following
operations occur:
o Finish returns.
o FlushPixelDataRangeNV (with target WRITE_PIXEL_DATA_RANGE_NV)
o PixelDataRangeNV (with target WRITE_PIXEL_DATA_RANGE_NV)
The client state required to implement the write pixel data range
consists of an enable bit, a memory pointer, and an integer size.
If the memory mapping of pages within the pixel data range changes,
using the pixel data range has undefined effects. To ensure that the
pixel data range reflects the address space's current state, the
application is responsible for calling PixelDataRange again after any
memory mapping changes within the pixel data range."
Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment
Operations and the Frame Buffer)
Add new section to Section 4.3, "Pixel Draw/Read State", on page 180:
"4.3.5 Read Pixel Data Range Operation
The read pixel data range is similar to the write pixel data range
(see section 3.6.7), but is specified with PixelDataRangeNV with a
target READ_PIXEL_DATA_RANGE_NV. It is exactly analogous to the
write pixel data range, but applies to commands where OpenGL returns
pixel data to the caller, such as ReadPixels. The list of commands
to which the read pixel data range applies can be found in the
Validity checks and flushes of the read pixel data range behave in a
manner exactly analogous to those of the write pixel data range,
though any implementation-dependent checks may differ between the two
types of pixel data range.
The standard OpenGL pixel data coherency model requires that pixel
data be written into the user's buffer immediately, before the
pixel command returns. When the read pixel data range is valid,
this model is relaxed so that this data may not necessarily be
available until the next "read pixel data range flush". Until such
point in time, an attempt to read the buffer returns undefined
If both the read and write pixel data ranges are valid and overlap,
then all operations involving both in the same thread are
automatically synchronized. That is, the write pixel data range
operation will automatically wait for any pending read pixel data
range results to become available before attempting to retrieve them.
However, if the operations are performed from different threads, the
user is responsible for all such synchronization.
Read pixel data range operations are also synchronized with vertex
array range operations in the same way.
The client state required to implement the read pixel data range
consists of an enable bit, a memory pointer, and an integer size."
Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions)
Add the following to the end of Section 5.4 "Display Lists" (page
"PixelDataRangeNV and FlushPixelDataRangeNV are not complied into
display lists but are executed immediately.
If a display list is compiled while WRITE_PIXEL_DATA_RANGE_NV is
enabled, all commands affected by that enable are accumulated into a
display list as if WRITE_PIXEL_DATA_RANGE_NV is disabled.
The state of the read pixel data range does not affect display list
compilation, because those commands that might be accelerated by a
read pixel data range are commands that are executed immediately
rather than being compiled into a display list (ReadPixels and
GetTexImage, for example)."
Additions to Chapter 6 of the OpenGL 1.3 Specification (State and
State Requests)
Additions to the GLX Specification
"OpenGL implementations using GLX indirect rendering should fail to
set up the pixel data range and will not accelerate any pixel
operations using it. Additionally, glXAllocateMemoryNV always fails
to allocate memory (returns NULL) when used with an indirect
rendering context."
GLX Protocol
INVALID_OPERATION is generated if PixelDataRangeNV or
FlushPixelDataRangeNV is called between the execution of Begin and
the corresponding execution of End.
INVALID_ENUM is generated if PixelDataRangeNV or
FlushPixelDataRangeNV is called when target is not
INVALID_VALUE is generated if PixelDataRangeNV is called when length
is negative.
New State
Get Value Get Command Type Value Attrib
--------- ----------- ---- ------- ------
WRITE_PIXEL_DATA_RANGE_NV IsEnabled B False pixel-store
READ_PIXEL_DATA_RANGE_NV IsEnabled B False pixel-store
WRITE_PIXEL_DATA_RANGE_POINTER_NV GetPointerv Z+ 0 pixel-store
READ_PIXEL_DATA_RANGE_POINTER_NV GetPointerv Z+ 0 pixel-store
WRITE_PIXEL_DATA_RANGE_LENGTH_NV GetIntegerv Z+ 0 pixel-store
READ_PIXEL_DATA_RANGE_LENGTH_NV GetIntegerv Z+ 0 pixel-store
Appendix: Operations Supported
In unextended OpenGL 1.3 with ARB_imaging support, the following
commands may take advantage of the write PDR:
In unextended OpenGL 1.3 with ARB_imaging support, the following
commands may take advantage of the read PDR:
No other extensions shipping in the NVIDIA OpenGL drivers add any
other new commands that may take advantage of this extension,
although in a few cases there are new commands that alias to other
commands that may be accelerated by this extension. These commands
glCompressedTexImage1DARB (ARB_texture_compression)
glCompressedTexImage2DARB (ARB_texture_compression)
glCompressedTexImage3DARB (ARB_texture_compression)
glCompressedTexSubImage1DARB (ARB_texture_compression)
glCompressedTexSubImage2DARB (ARB_texture_compression)
glCompressedTexSubImage3DARB (ARB_texture_compression)
glColorSubTableEXT (EXT_paletted_texture)
glColorTableEXT (EXT_paletted_texture)
glGetCompressedTexImageARB (ARB_texture_compression)
glTexImage3DEXT (EXT_texture3D)
glTexSubImage3DEXT (EXT_texture3D)
NVIDIA Implementation Details
In the Release 40 OpenGL drivers, the NV_pixel_data_range extension
is supported on all GeForce/Quadro-class hardware. The following
commands may potentially be accelerated in this release:
The following type/format/buffer format sets are accelerated for
type format buffer format
GL_UNSIGNED_SHORT_5_6_5 GL_RGB 16-bit color (PCs only -- Macs use 555)
GL_UNSIGNED_INT_8_8_8_8_REV GL_BGRA 32-bit color w/ alpha
GL_UNSIGNED_BYTE GL_BGRA 32-bit color w/ alpha (little endian only)
GL_UNSIGNED_INT_24_8_NV GL_DEPTH_STENCIL_NV 24-bit depth, 8-bit stencil
The following internalformat/type/format sets are accelerated for
internalformat type format
The following internalformat/type/format sets will be accelerated for
glTex[Sub]Image2D on little-endian machines only:
internalformat type format
All compressed texture formats are supported for
The following restrictions apply to all commands:
- No pixel transfer operations of any kind may be in use.
- The base address of the PDR must be aligned to a 32-byte boundary.
- The data pointer must be aligned to boundaries of the size of one
group of pixels. For example, GL_UNSIGNED_SHORT_5_6_5 data must
be aligned to 2-byte boundaries, GL_UNSIGNED_INT_24_8_NV data must
be aligned to 4-byte boundaries, and GL_BGRA/GL_UNSIGNED_BYTE data
must be aligned to 4-byte boundaries (not 1-byte boundaries).
Compressed texture data must be aligned to a block boundary.
No additional restrictions apply to glReadPixels or
The following additional restrictions apply to glTex[Sub]Image2D:
- The texture must fit in video memory.
- The texture must have a border size of zero.
- The stride (in bytes) between two lines of source data must not
exceed 65535.
- For non-rectangle textures, the width and height of the destination
mipmap level must not exceed 2048, nor be below 2; also, the
destination mipmap level must not be 2x2 (for 16-bit textures) or
2x2, 4x2, or 2x4 (for 8-bit textures).
Future software releases may increase the number of accelerated
commands and the number of accelerated data formats for each command.
Note also that although all of the formats and commands listed are
guaranteed to be accelerated, there may be limitations in the actual
implementation not as strict as those stated here; for example, some
data formats not listed here may turn out to be accelerated.
However, it is highly recommended that you stick to the formats and
commands listed in this section. In cases where actual restrictions
are less strict, future implementations may very well enforce the
listed restriction.
It is also possible that some of these restrictions may become _more_
strict on future chips; though at present no such additional
restrictions are known to be likely. Such restrictions would likely
take the form of more stringent pitch or alignment restrictions, if
they proved to be necessary.
In practice, you should expect that several of these restrictions
will be more lenient in a future release.
Revision History
November 7, 2002 - Updated implementation details section with most
up-to-date rules on PDR usage. Lifted rule that texture downloads
must be 2046 pixels in size or smaller. Removed support for 8-bit
texture downloads. Increased max TexSubImage pitch to 65535 from