 Name

     NV_pixel_data_range

 Name Strings

     GL_NV_pixel_data_range

 Contact

     Matt Craighead, NVIDIA Corporation (mcraighead 'at' nvidia.com)

 Notice

     Copyright NVIDIA Corporation, 2000, 2001, 2002.

 IP Status

     NVIDIA Proprietary.

 Status

     Shipping (version 1.0)

 Version

     NVIDIA Date: November 7, 2002 (version 1.0)

 Number

     284

 Dependencies

     Written based on the wording of the OpenGL 1.3 specification.

     If this extension is implemented, the WGL or GLX memory allocator
     interface specified in NV_vertex_array_range must also be
     implemented.  Please refer to the NV_vertex_array_range specification
     for further information on this interface.

 Overview

     The vertex array range extension is intended to improve the
     efficiency of OpenGL vertex arrays.  OpenGL vertex arrays' coherency
     model and ability to access memory from arbitrary locations in memory
     prevented implementations from using DMA (Direct Memory Access)
     operations.

     Many image-intensive applications, such as those that use dynamically
     generated textures, face similar problems.  These applications would
     like to be able to sustain throughputs of hundreds of millions of
     pixels per second through DrawPixels and hundreds of millions of
     texels per second through TexSubImage.

     However, the same restrictions that limited vertex throughput also
     limit pixel throughput.

     By the time that any pixel operation that reads data from user memory
     returns, OpenGL requires that it must be safe for the application to
     start using that memory for a different purpose.  This coherency
     model prevents asynchronous DMA transfers directly out of the user's
     buffer.

     There are also no restrictions on the pointer provided to pixel
     operations or on the size of the data.  To facilitate DMA
     implementations, the driver needs to know in advance what region of
     the address space to lock down.

     Vertex arrays faced both of these restrictions already, but pixel
     operations have one additional complicating factor -- they are
     bidirectional.  Vertex array data is always being transferred from the
     application to the driver and the HW, whereas pixel operations
     sometimes transfer data to the application from the driver and HW.
     Note that the types of memory that are suitable for DMA for reading
     and writing purposes are often different.  For example, on many PC
     platforms, DMA pulling is best accomplished with write-combined
     (uncached) AGP memory, while pushing data should use cached memory so
     that the application can read the data efficiently once it has been
     read back over the AGP bus.

     This extension defines an API where an application can specify two
     pixel data ranges, which are analogous to vertex array ranges, except
     that one is for operations where the application is reading data
     (e.g. glReadPixels) and one is for operations where the application
     is writing data (e.g. glDrawPixels, glTexSubImage2D, etc.).  Each
     pixel data range has a pointer to its start and a length in bytes.

     When the pixel data range is enabled, and if the pointer specified
     as the argument to a pixel operation is inside the corresponding
     pixel data range, the implementation may choose to asynchronously
     pull data from the pixel data range or push data to the pixel data
     range.  Data pulled from outside the pixel data range is undefined,
     while pushing data to outside the pixel data range produces undefined
     results.

     The application may synchronize with the hardware in one of two ways:
     by flushing the pixel data range (or causing an implicit flush) or by
     using the NV_fence extension to insert fences in the command stream.

 Issues

     *   The vertex array range extension required that all active vertex
         arrays must be located inside the vertex array range.  Should
         this extension be equally strict?

         RESOLVED: No, because a user may want to use the pixel data range
         for one type of operation (say, texture downloads) but still be
         able to use standard non-PDR pixel operations for everything
         else.  Requiring that apps disable PDR every time such an
         operation occurs would be burdensome and make it difficult to
         integrate this extension into a larger app with minimal changes.
         So, for each pixel operation, we will look at the pointer
         provided by the application.  If it's inside the PDR, the PDR
         rules apply, and if it's not inside the PDR, it's a standard GL
         pixel operation, even if some of the data is actually inside the
         PDR.

     *   Reads and writes may require different types of memory.  How do
         we handle this?

         RESOLVED: The allocator interface already provides the ability to
         specify different read and write frequencies.  A buffer for a
         write PDR should probably be allocated with a high write
         frequency and low read frequency, while a read PDR's buffer
         should have a low write and high read frequency.

         Having two PDRs is essential because a single application may
         want to perform both asynchronous reads and writes
         simultaneously.

     *   What happens if a PDR pixel operation pulls data from a location
         outside the PDR?

         RESOLVED: The data pulled is undefined, and program termination
         may result.

     *   What happens if a PDR pixel operation pushes data to a location
         outside the PDR?

         RESOLVED: The contents of that memory location become undefined,
         and program termination may result.

     *   What happens if the hardware can't support the operation?

         RESOLVED: The operation may be slow, because we may need to, for
         example, read the pixel data out of uncached memory with the CPU,
         but it should still work.  So this should never be a problem; in
         fact, it means that a basic implementation that accelerates only,
         say, one operation is quite trivial.

     *   Should there be any limitations to what operations should be
         supported?

         RESOLVED: No, in theory any pixel operation that accesses a
         user's buffer can work with PDR.  This includes Bitmap,
         PolygonStipple, GetTexImage, ConvolutionFilter2D, etc.  Many are
         unlikely to be accelerated, but there is no reason to place
         arbitrary restrictions.  A list of possibly supported operations
         is provided for OpenGL 1.2.1 with ARB_imaging support and for all
         the extensions currently supported by NVIDIA.  Developers should
         carefully read the Implementation Details provided by their
         vendor before using the extension.

     *   Should PixelMap and GetPixelMap be supported?

         RESOLVED: Yes.  They're not really pixel path operations, but,
         again, there is no good reason to omit operations, and they _are_
         operations that pass around big chunks of pixel-related data.  If
         we support PolygonStipple, surely we should support this.

     *   Can the PDRs and the VAR overlap and/or be the same buffer?

         RESOLVED: Yes.  In fact, it is expected that one of the preferred
         modes of usage for this extension will be to use the same AGP
         buffer for both the write PDR and the VAR, so it can be used for
         both dynamic texturing and dynamic geometry.

     *   Can video memory buffers be used?

         RESOLVED: Yes, assuming the implementation supports using them
         for PDR.  On systems with AGP Fast Writes, this may be
         interesting in some cases.  Another possible use for this is to
         treat a video memory buffer as an offscreen surface, where
         DrawPixels can be thought of as a blit from offscreen memory to
         a GL surface, and ReadPixels can be thought of as a blit from a
         GL surface to offscreen memory.  This technique should be used
         with caution, because there are other alternatives, such as
         pbuffers, aux buffers, and even textures.

     *   Do we want to support more than one read and one write PDR?

         RESOLVED: No, but I could imagine uses for it.  For example, an
         app could use two system memory buffers (one read, one write PDR)
         and a single video memory buffer (both read and write).  Do we
         need a scheme where an unlimited number of PDR buffers can be
         specified?  Ugh.  I hope not.  I can't think of a good reason to
         use more than 3 buffers, and even that is stretching it.

     *   Do we want a separate enable for both the read and write PDR?

         RESOLVED: Yes.  In theory, they are completely independent, and
         we should treat them as such.

     *   Is there an equivalent to the VAR validity check?

         RESOLVED: No.  When a vertex array call occurs, all the vertex
         array state is already set.  We can know in advance whether all
         the pointers, strides, etc. are set up in a satisfactory way.
         However, for a pixel operation, much of the state is provided on
         the same function call that performs the operation.  For example,
         the pixel format of the data may need to match that of the
         framebuffer.  We can't know this without looking at the format
         and type arguments.

         An alternative might be some sort of "proxy" mechanism for pixel
         operations, but this seems to be very complicated.

     *   Do we want a more generalized API?  What stops us from needing a
         DMA extension for every single conceivable use in the future?

         RESOLVED: No, this is good enough.  Since new extensions will
         probably require new semantics anyhow, we'll just live with that.
         Maybe if the ARB wants to create a more generic "DMA" extension,
         these issues can be revisited.

     *   How do applications synchronize with the hardware?

         RESOLVED: A new command, FlushPixelDataRangeNV, is provided, that
         is analogous to FlushVertexArrayRangeNV.  Applications can also
         use the Finish command.  The NV_fence extension is best for
         applications that need fine-grained synchronization.

     *   Should enabling or disabling a PDR induce an implicit PDR flush?

         RESOLVED: No.  In the VAR extension, enabling and disabling the
         VAR does induce a VAR flush, but this has proven to be more
         problematic than helpful, because it makes it much more difficult
         to switch between VAR and non-VAR rendering; the VAR2 extension
         lifts this restriction, and there is no reason to get this wrong
         a second time.

         The PDR extension does not suffer from the problem of enabling
         and disabling frequently, because non-PDR operations are
         permitted simply by providing a pointer outside of the PDR, but
         there is no clear reason why the enable or disable should cause
         a quite unnecessary PDR flush.

     *   Should this state push/pop?

         RESOLVED: Yes, but via a Push/PopClientAttrib and the
         GL_CLIENT_PIXEL_STORE_BIT bit.  Although this is heavyweight
         state, VAR also allowed push/pop.  It does fit nicely into an
         existing category, too.

     *   Should making another context current cause a PDR flush?

         RESOLVED: No.  There's no fundamental reason it should.  Note
         that apps should be careful to not free their memory until the
         hardware is not using it... note also that this decision is
         inconsistent with VAR, which did guarantee a flush here.

     *   Is the read PDR guaranteed to give you either old or new values,
         or is it truly undefined?

         RESOLVED: Undefined.  This may ease implementation constraints
         slightly.  Apps must not rely at all on the contents of the
         region where the readback is occurring until it is known to be
         finished.

         An example of how an implementation might conceivably require
         this is as follows.  Suppose that a piece of hardware, for some
         reason, can only write full 32-byte chunks of data.  Any bytes
         that were supposed to be unwritten are in fact trashed by the
         hardware, filled with garbage.  By careful fixups (read the
         contents before the operation, restore when done), the driver may
         be able to hide this fact, but a requirement that either new or
         old data must show up would be violated.

         Or, more trivially, you might implement certain pixel operations
         as an in-place postprocess on the returned data.

         It is not anticipated that NVIDIA implementations will need this
         flexibility, but it is nevertheless provided.

     *   How should an application allocate its PDR memory?

         The app should use wglAllocateMemoryNV, even for a read PDR in
         system memory.  Using malloc may result in suboptimal
         performance, because the driver will not be able to choose an
         optimal memory type.  For ReadPixels to system memory, you might
         set a read frequency of 1.0, a write frequency of 0.0, and a
         priority of 1.0.  The driver might allocate PCI memory, or
         physically contiguous PCI memory, or cachable AGP memory, all
         depending on the performance characteristics of the device.
         While memory from malloc will work, it does not allow the driver
         to make these decisions, and it will certainly never give you AGP
         memory.

         Write PDR memory for purposes of streaming textures, etc. works
         exactly the same as VAR memory for streaming vertices.  You can,
         and in fact are encouraged to, use the same circular buffer for
         both vertices and textures.

         If you have different needs (not just streaming textures or
         asynchronous readbacks), you may want your pixel data in video
         memory.
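
     The shared circular buffer described above can be sketched as a
     small sub-allocator.  This is only an illustration: PdrRing,
     pdr_ring_init, and pdr_ring_alloc are hypothetical names, not part
     of this extension, and the sketch assumes the application
     synchronizes (with a PDR flush or an NV_fence fence) before reusing
     a region that the hardware may still be reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the extension: hands out chunks of a
 * single buffer that serves as the write PDR (and, optionally, the VAR). */
typedef struct {
    size_t size;   /* total bytes in the buffer */
    size_t head;   /* next free byte offset */
    size_t align;  /* chunk alignment in bytes, a power of two */
} PdrRing;

void pdr_ring_init(PdrRing *r, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    r->size = size;
    r->head = 0;
    r->align = align;
}

/* Returns the byte offset of a chunk of `bytes` bytes, or (size_t)-1 if
 * the request can never fit.  *wrapped is set to 1 when the allocator
 * restarted at offset 0; the caller must then synchronize with the
 * hardware before overwriting data at the start of the buffer. */
size_t pdr_ring_alloc(PdrRing *r, size_t bytes, int *wrapped)
{
    size_t start = (r->head + r->align - 1) & ~(r->align - 1);

    *wrapped = 0;
    if (bytes == 0 || bytes > r->size)
        return (size_t)-1;
    if (start > r->size || bytes > r->size - start) {
        start = 0;        /* not enough room at the tail: wrap around */
        *wrapped = 1;
    }
    r->head = start + bytes;
    return start;
}
```

     An application would add the returned offset to the base pointer it
     passed to PixelDataRangeNV and hand the resulting address to a
     command such as TexSubImage2D.  A real allocator would track
     per-chunk fences rather than a single wrap flag.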

 New Procedures and Functions

     void PixelDataRangeNV(enum target, sizei length, void *pointer)
     void FlushPixelDataRangeNV(enum target)

 New Tokens

     Accepted by the <target> parameter of PixelDataRangeNV and
     FlushPixelDataRangeNV, and by the <cap> parameter of
     EnableClientState, DisableClientState, and IsEnabled:

         WRITE_PIXEL_DATA_RANGE_NV                      0x8878
         READ_PIXEL_DATA_RANGE_NV                       0x8879

     Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
     GetFloatv, and GetDoublev:

         WRITE_PIXEL_DATA_RANGE_LENGTH_NV               0x887A
         READ_PIXEL_DATA_RANGE_LENGTH_NV                0x887B

     Accepted by the <pname> parameter of GetPointerv:

         WRITE_PIXEL_DATA_RANGE_POINTER_NV              0x887C
         READ_PIXEL_DATA_RANGE_POINTER_NV               0x887D

 Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation)

     None.

 Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization)


     Add new section to Section 3.6, "Pixel Rectangles", on page 113:

     "3.6.7  Write Pixel Data Range Operation

     Applications can enhance the performance of DrawPixels and other
     commands that transfer large amounts of pixel data by using a pixel
     data range.  The command

        void PixelDataRangeNV(enum target, sizei length, void *pointer)

     specifies one of the current pixel data ranges.  When the write pixel
     data range is enabled and valid, pixel data transfers from within
     the pixel data range are potentially faster.  The pixel data range is
     a contiguous region of (virtual) address space for placing pixel
     data.  The "pointer" parameter is a pointer to the base of the pixel
     data range.  The "length" parameter is the length of the pixel data
     range in basic machine units (typically unsigned bytes).  For the
     write pixel data range, "target" must be WRITE_PIXEL_DATA_RANGE_NV.

     The pixel data range address space region extends from "pointer"
     to "pointer + length - 1" inclusive.

     There is some system burden associated with establishing a pixel data
     range (typically, the memory range must be locked down).  If either
     the pixel data range pointer or size is set to zero, the previously
     established pixel data range is released (typically, unlocking the
     memory).

     A pixel data range may fail to be established for operating system
     dependent reasons, in which case it is not valid.  Reasons that a
     pixel data range cannot be established include spanning different
     memory types, failure to lock down the memory, unmet alignment
     restrictions, etc.

     The write pixel data range is enabled or disabled by calling
     EnableClientState or DisableClientState with the symbolic constant
     WRITE_PIXEL_DATA_RANGE_NV.

     The write pixel data range is valid when the following conditions are
     met:

       o  WRITE_PIXEL_DATA_RANGE_NV is enabled.

       o  PixelDataRangeNV has been called with a non-null pointer and
          non-zero size, for target WRITE_PIXEL_DATA_RANGE_NV.

       o  The write pixel data range has been established.

       o  An implementation-dependent validity check, based on the
          pointer alignment, size, and underlying memory type of the
          write pixel data range region of memory, succeeds.

     Otherwise, the write pixel data range is not valid.

     The commands, such as DrawPixels, that may be made faster by the
     write pixel data range are listed in the Appendix.

     When the write pixel data range is valid, an attempt will be made to
     accelerate these commands if and only if the data pointer argument to
     the command lies within the write pixel data range.  No attempt will
     be made to accelerate commands whose base pointer is outside this
     range.  Accessing data outside the write pixel data range when the
     base pointer lies within the range and the range is valid will
     produce undefined results and may cause program termination.

     The standard OpenGL pixel data coherency model requires that pixel
     data be extracted from the user's buffer immediately, before the
     pixel command returns.  When the write pixel data range is valid,
     this model is relaxed so that changes made to pixel data until the
     next "write pixel data range flush" may affect pixel commands in non-
     sequential ways.  That is, a call to a pixel command that precedes
     a change to pixel data (without an intervening "write pixel data
     range flush") may access the changed data; though a call to a pixel
     command following a change to pixel data must always access the
     changed data, and never the original data.

     A 'write pixel data range flush' occurs when one of the following
     operations occurs:

        o  Finish returns.

        o  FlushPixelDataRangeNV (with target WRITE_PIXEL_DATA_RANGE_NV)
           returns.

        o  PixelDataRangeNV (with target WRITE_PIXEL_DATA_RANGE_NV)
           returns.

     The client state required to implement the write pixel data range
     consists of an enable bit, a memory pointer, and an integer size.

     If the memory mapping of pages within the pixel data range changes,
     using the pixel data range has undefined effects.  To ensure that the
     pixel data range reflects the address space's current state, the
     application is responsible for calling PixelDataRangeNV again after any
     memory mapping changes within the pixel data range."
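
     The range rule in section 3.6.7 (the range extends from "pointer"
     to "pointer + length - 1" inclusive, and only the base pointer of a
     command decides whether the PDR rules apply) can be illustrated
     with a small application-side check.  This is a sketch under stated
     assumptions: the PixelDataRange struct and both function names are
     hypothetical and do not describe how any driver performs the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the client state: base pointer and length. */
typedef struct {
    const void *pointer;
    size_t      length;   /* in basic machine units (bytes) */
} PixelDataRange;

/* Nonzero if p lies in [pointer, pointer + length - 1]. */
int pdr_contains(const PixelDataRange *r, const void *p)
{
    uintptr_t base, addr;

    assert(r != NULL);
    base = (uintptr_t)r->pointer;
    addr = (uintptr_t)p;
    return r->pointer != NULL && r->length != 0 &&
           addr >= base && addr - base < r->length;
}

/* Nonzero if the whole transfer [p, p + bytes - 1] stays inside the
 * range.  A transfer whose base pointer is inside the range but whose
 * data runs past the end is the undefined case described in the text. */
int pdr_contains_span(const PixelDataRange *r, const void *p, size_t bytes)
{
    uintptr_t base, addr;

    if (!pdr_contains(r, p))
        return 0;
    base = (uintptr_t)r->pointer;
    addr = (uintptr_t)p;
    return bytes <= r->length - (addr - base);
}
```

     If pdr_contains fails, the command is an ordinary OpenGL pixel
     operation.  If pdr_contains succeeds but pdr_contains_span fails,
     the application has a bug that may cause program termination.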

 Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment
 Operations and the Frame Buffer)

     Add new section to Section 4.3, "Pixel Draw/Read State", on page 180:

     "4.3.5  Read Pixel Data Range Operation

     The read pixel data range is similar to the write pixel data range
     (see section 3.6.7), but is specified with PixelDataRangeNV with a
     target READ_PIXEL_DATA_RANGE_NV.  It is exactly analogous to the
     write pixel data range, but applies to commands where OpenGL returns
     pixel data to the caller, such as ReadPixels.  The list of commands
     to which the read pixel data range applies can be found in the
     Appendix.

     Validity checks and flushes of the read pixel data range behave in a
     manner exactly analogous to those of the write pixel data range,
     though any implementation-dependent checks may differ between the two
     types of pixel data range.

     The standard OpenGL pixel data coherency model requires that pixel
     data be written into the user's buffer immediately, before the
     pixel command returns.  When the read pixel data range is valid,
     this model is relaxed so that this data may not necessarily be
     available until the next "read pixel data range flush".  Until such
     point in time, an attempt to read the buffer returns undefined
     values.

     If both the read and write pixel data ranges are valid and overlap,
     then all operations involving both in the same thread are
     automatically synchronized.  That is, the write pixel data range
     operation will automatically wait for any pending read pixel data
     range results to become available before attempting to retrieve them.
     However, if the operations are performed from different threads, the
     user is responsible for all such synchronization.

     Read pixel data range operations are also synchronized with vertex
     array range operations in the same way.

     The client state required to implement the read pixel data range
     consists of an enable bit, a memory pointer, and an integer size."

 Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions)

     Add the following to the end of Section 5.4 "Display Lists" (page
     179):

     "PixelDataRangeNV and FlushPixelDataRangeNV are not compiled into
     display lists but are executed immediately.

     If a display list is compiled while WRITE_PIXEL_DATA_RANGE_NV is
     enabled, all commands affected by that enable are accumulated into a
     display list as if WRITE_PIXEL_DATA_RANGE_NV is disabled.

     The state of the read pixel data range does not affect display list
     compilation, because those commands that might be accelerated by a
     read pixel data range are commands that are executed immediately
     rather than being compiled into a display list (ReadPixels and
     GetTexImage, for example)."

 Additions to Chapter 6 of the OpenGL 1.3 Specification (State and
 State Requests)

     None.

 Additions to the GLX Specification

     "OpenGL implementations using GLX indirect rendering should fail to
     set up the pixel data range and will not accelerate any pixel
     operations using it.  Additionally, glXAllocateMemoryNV always fails
     to allocate memory (returns NULL) when used with an indirect
     rendering context."

 GLX Protocol

     None

 Errors

     INVALID_OPERATION is generated if PixelDataRangeNV or
     FlushPixelDataRangeNV is called between the execution of Begin and
     the corresponding execution of End.

     INVALID_ENUM is generated if PixelDataRangeNV or
     FlushPixelDataRangeNV is called when target is not
     WRITE_PIXEL_DATA_RANGE_NV or READ_PIXEL_DATA_RANGE_NV.

     INVALID_VALUE is generated if PixelDataRangeNV is called when length
     is negative.
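
     The three error conditions above can be sketched as a mock of the
     argument checks for PixelDataRangeNV.  Several things here are
     assumptions: the order in which the errors are tested (this
     specification does not define a precedence), the PdrClientState
     struct, the function name pdr_set_range, and the choice to store a
     null pointer and zero length when a range is released.  The token
     values are those listed under New Tokens; the error codes are the
     standard OpenGL values.

```c
#include <assert.h>
#include <stddef.h>

#define GL_NO_ERROR                   0
#define GL_INVALID_ENUM               0x0500
#define GL_INVALID_VALUE              0x0501
#define GL_INVALID_OPERATION          0x0502
#define GL_WRITE_PIXEL_DATA_RANGE_NV  0x8878
#define GL_READ_PIXEL_DATA_RANGE_NV   0x8879

/* Hypothetical model of the pointer/length client state per target. */
typedef struct { void *pointer; int length; } PdrRange;
typedef struct { PdrRange write, read; } PdrClientState;

/* Returns the error that would be generated, or GL_NO_ERROR.  State is
 * modified only when no error is generated. */
unsigned pdr_set_range(PdrClientState *s, int inside_begin_end,
                       unsigned target, int length, void *pointer)
{
    PdrRange *r;

    assert(s != NULL);
    if (inside_begin_end)
        return GL_INVALID_OPERATION;
    if (target == GL_WRITE_PIXEL_DATA_RANGE_NV)
        r = &s->write;
    else if (target == GL_READ_PIXEL_DATA_RANGE_NV)
        r = &s->read;
    else
        return GL_INVALID_ENUM;
    if (length < 0)
        return GL_INVALID_VALUE;

    /* A null pointer or zero length releases the range (section 3.6.7). */
    if (pointer == NULL || length == 0) {
        r->pointer = NULL;
        r->length = 0;
    } else {
        r->pointer = pointer;
        r->length = length;
    }
    return GL_NO_ERROR;
}
```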

 New State

                                                               Initial
    Get Value                          Get Command     Type    Value    Attrib
    ---------                          -----------     ----    -------  ------
    WRITE_PIXEL_DATA_RANGE_NV          IsEnabled       B       False    pixel-store
    READ_PIXEL_DATA_RANGE_NV           IsEnabled       B       False    pixel-store
    WRITE_PIXEL_DATA_RANGE_POINTER_NV  GetPointerv     Z+      0        pixel-store
    READ_PIXEL_DATA_RANGE_POINTER_NV   GetPointerv     Z+      0        pixel-store
    WRITE_PIXEL_DATA_RANGE_LENGTH_NV   GetIntegerv     Z+      0        pixel-store
    READ_PIXEL_DATA_RANGE_LENGTH_NV    GetIntegerv     Z+      0        pixel-store

 Appendix: Operations Supported

     In unextended OpenGL 1.3 with ARB_imaging support, the following
     commands may take advantage of the write PDR:

         glBitmap
         glColorSubTable
         glColorTable
         glCompressedTexImage1D
         glCompressedTexImage2D
         glCompressedTexImage3D
         glCompressedTexSubImage1D
         glCompressedTexSubImage2D
         glCompressedTexSubImage3D
         glConvolutionFilter1D
         glConvolutionFilter2D
         glDrawPixels
         glPixelMapfv
         glPixelMapuiv
         glPixelMapusv
         glPolygonStipple
         glSeparableFilter2D
         glTexImage1D
         glTexImage2D
         glTexImage3D
         glTexSubImage1D
         glTexSubImage2D
         glTexSubImage3D

     In unextended OpenGL 1.3 with ARB_imaging support, the following
     commands may take advantage of the read PDR:

         glGetColorTable
         glGetCompressedTexImage
         glGetConvolutionFilter
         glGetHistogram
         glGetMinmax
         glGetPixelMapfv
         glGetPixelMapuiv
         glGetPixelMapusv
         glGetPolygonStipple
         glGetSeparableFilter
         glGetTexImage
         glReadPixels

     No other extensions shipping in the NVIDIA OpenGL drivers add any
     other new commands that may take advantage of this extension,
     although in a few cases there are new commands that alias to other
     commands that may be accelerated by this extension.  These commands
     are:

         glCompressedTexImage1DARB (ARB_texture_compression)
         glCompressedTexImage2DARB (ARB_texture_compression)
         glCompressedTexImage3DARB (ARB_texture_compression)
         glCompressedTexSubImage1DARB (ARB_texture_compression)
         glCompressedTexSubImage2DARB (ARB_texture_compression)
         glCompressedTexSubImage3DARB (ARB_texture_compression)
         glColorSubTableEXT (EXT_paletted_texture)
         glColorTableEXT (EXT_paletted_texture)
         glGetCompressedTexImageARB (ARB_texture_compression)
         glTexImage3DEXT (EXT_texture3D)
         glTexSubImage3DEXT (EXT_texture3D)

 NVIDIA Implementation Details

     In the Release 40 OpenGL drivers, the NV_pixel_data_range extension
     is supported on all GeForce/Quadro-class hardware.  The following
     commands may potentially be accelerated in this release:

         glReadPixels
         glTexImage2D
         glTexSubImage2D
         glCompressedTexImage2D
         glCompressedTexImage3D
         glCompressedTexSubImage2D

     The following type/format/buffer format sets are accelerated for
     glReadPixels:

       type                            format               buffer format
       -----------------------------------------------------------------------------------------------
       GL_UNSIGNED_SHORT_5_6_5         GL_RGB               16-bit color (PCs only -- Macs use 555)
       GL_UNSIGNED_INT_8_8_8_8_REV     GL_BGRA              32-bit color w/ alpha
       GL_UNSIGNED_BYTE                GL_BGRA              32-bit color w/ alpha (little endian only)
       GL_UNSIGNED_SHORT               GL_DEPTH_COMPONENT   16-bit depth
       GL_UNSIGNED_INT_24_8_NV         GL_DEPTH_STENCIL_NV  24-bit depth, 8-bit stencil

     The following internalformat/type/format sets are accelerated for
     glTex[Sub]Image2D:

       internalformat              type                            format
       -------------------------------------------------------------------------------
       GL_RGB5                     GL_UNSIGNED_SHORT_5_6_5         GL_RGB
       GL_RGB8                     GL_UNSIGNED_INT_8_8_8_8_REV     GL_BGRA
       GL_RGBA4                    GL_UNSIGNED_SHORT_4_4_4_4_REV   GL_BGRA
       GL_RGB5_A1                  GL_UNSIGNED_SHORT_1_5_5_5_REV   GL_BGRA
       GL_RGBA8                    GL_UNSIGNED_INT_8_8_8_8_REV     GL_BGRA

       GL_DEPTH_COMPONENT16_SGIX   GL_UNSIGNED_SHORT               GL_DEPTH_COMPONENT
       GL_DEPTH_COMPONENT24_SGIX   GL_UNSIGNED_INT_24_8_NV         GL_DEPTH_STENCIL_NV

     The following internalformat/type/format sets will be accelerated for
     glTex[Sub]Image2D on little-endian machines only:

       internalformat              type                            format
       -------------------------------------------------------------------------------
       GL_LUMINANCE8_ALPHA8        GL_UNSIGNED_BYTE                GL_LUMINANCE_ALPHA

       GL_RGB8                     GL_UNSIGNED_BYTE                GL_BGRA
       GL_RGBA8                    GL_UNSIGNED_BYTE                GL_BGRA

     All compressed texture formats are supported for
     glCompressedTex[Sub]Image[2,3]D.

     The following restrictions apply to all commands:
     - No pixel transfer operations of any kind may be in use.
     - The base address of the PDR must be aligned to a 32-byte boundary.
     - The data pointer must be aligned to boundaries of the size of one
       group of pixels.  For example, GL_UNSIGNED_SHORT_5_6_5 data must
       be aligned to 2-byte boundaries, GL_UNSIGNED_INT_24_8_NV data must
       be aligned to 4-byte boundaries, and GL_BGRA/GL_UNSIGNED_BYTE data
       must be aligned to 4-byte boundaries (not 1-byte boundaries).
       Compressed texture data must be aligned to a block boundary.
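
     The two alignment restrictions above can be checked by the
     application before it issues a command.  The following is a minimal
     sketch; the function names and the PDR_BASE_ALIGNMENT macro are
     hypothetical, and the caller is assumed to know the size in bytes
     of one group of pixels (or one compressed block) for its format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDR_BASE_ALIGNMENT 32u  /* bytes, from the restriction list */

/* Nonzero if the PDR base address sits on a 32-byte boundary. */
int pdr_base_is_aligned(const void *base)
{
    return ((uintptr_t)base % PDR_BASE_ALIGNMENT) == 0;
}

/* Nonzero if the data pointer is aligned to one group of pixels.
 * group_bytes is 2 for GL_UNSIGNED_SHORT_5_6_5, 4 for
 * GL_UNSIGNED_INT_24_8_NV and for GL_BGRA/GL_UNSIGNED_BYTE, and the
 * block size for compressed texture data. */
int pdr_data_is_aligned(const void *data, size_t group_bytes)
{
    assert(group_bytes != 0);
    return ((uintptr_t)data % group_bytes) == 0;
}
```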

     No additional restrictions apply to glReadPixels or
     glCompressedTex[Sub]Image[2,3]D.

     The following additional restrictions apply to glTex[Sub]Image2D:
     - The texture must fit in video memory.
     - The texture must have a border size of zero.
     - The stride (in bytes) between two lines of source data must not
       exceed 65535.
     - For non-rectangle textures, the width and height of the destination
       mipmap level must not exceed 2048, nor be below 2; also, the
       destination mipmap level must not be 2x2 (for 16-bit textures) or
       2x2, 4x2, or 2x4 (for 8-bit textures).
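
     The checkable glTex[Sub]Image2D restrictions above can be sketched
     as a single predicate.  The function name and parameters are
     hypothetical, and the sketch cannot test the first restriction
     (whether the texture fits in video memory), which only the driver
     knows.  It describes the Release 40 rules as listed here, not any
     later driver.

```c
#include <assert.h>

/* Nonzero if an upload meets the listed restrictions.  width and height
 * are those of the destination mipmap level; stride_bytes is the
 * distance between two lines of source data. */
int pdr_tex_upload_ok(int width, int height, int border,
                      long stride_bytes, int bits_per_texel,
                      int is_rectangle)
{
    assert(width > 0 && height > 0);

    if (border != 0)
        return 0;
    if (stride_bytes > 65535)
        return 0;
    if (is_rectangle)
        return 1;   /* the size rules apply to non-rectangle textures */

    if (width < 2 || height < 2 || width > 2048 || height > 2048)
        return 0;
    if (bits_per_texel == 16 && width == 2 && height == 2)
        return 0;
    if (bits_per_texel == 8 &&
        ((width == 2 && height == 2) ||
         (width == 4 && height == 2) ||
         (width == 2 && height == 4)))
        return 0;
    return 1;
}
```

     A level that fails this test is still uploaded correctly; it simply
     takes the unaccelerated path.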

     Future software releases may increase the number of accelerated
     commands and the number of accelerated data formats for each command.
     Note also that although all of the formats and commands listed are
     guaranteed to be accelerated, there may be limitations in the actual
     implementation not as strict as those stated here; for example, some
     data formats not listed here may turn out to be accelerated.
     However, it is highly recommended that you stick to the formats and
     commands listed in this section.  In cases where actual restrictions
     are less strict, future implementations may very well enforce the
     listed restriction.

     It is also possible that some of these restrictions may become _more_
     strict on future chips; though at present no such additional
     restrictions are known to be likely.  Such restrictions would likely
     take the form of more stringent pitch or alignment restrictions, if
     they proved to be necessary.

     In practice, you should expect that several of these restrictions
     will be more lenient in a future release.

 Revision History

     November 7, 2002 - Updated implementation details section with most
     up-to-date rules on PDR usage.  Lifted rule that texture downloads
     must be 2046 pixels in size or smaller.  Removed support for 8-bit
     texture downloads.  Increased max TexSubImage pitch to 65535 from
     8191.