 Name

     NV_fragment_program

 Name Strings

     GL_NV_fragment_program

 Contact

     Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
     Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

 Notice

     Copyright NVIDIA Corporation, 2001-2002.

 IP Status

     NVIDIA Proprietary.

 Status

     Implemented in CineFX (NV30) Emulation driver, August 2002.
     Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

 Version

     Last Modified Date:  2005/05/24
     NVIDIA Revision:     73

 Number

     282

 Dependencies

     Written based on the wording of the OpenGL 1.2.1 specification and
     requires OpenGL 1.2.1.

     Requires support for the ARB_multitexture extension with at least
     two texture units.

     NV_vertex_program affects the definition of this extension.  The only
     dependency is that both extensions use the same mechanisms for defining
     and binding programs.

     NV_texture_shader trivially affects the definition of this extension.

     NV_texture_rectangle trivially affects the definition of this extension.

     ARB_texture_cube_map trivially affects the definition of this extension.

     EXT_fog_coord trivially affects the definition of this extension.

     NV_depth_clamp affects the definition of this extension.

     ARB_depth_texture and SGIX_depth_texture affect the definition of this
     extension.

     NV_float_buffer affects the definition of this extension.

     ARB_vertex_program affects the definition of this extension.

     ARB_fragment_program affects the definition of this extension.

 Overview

     OpenGL mandates a certain set of configurable per-fragment computations
     defining texture lookup, texture environment, color sum, and fog
     operations.  Each of these areas provides a useful but limited set of fixed
     operations.  For example, unextended OpenGL 1.2.1 provides only four
     texture environment modes, color sum, and three fog modes.  Many OpenGL
     extensions have either improved existing functionality or introduced new
     configurable fragment operations.  While these extensions have enabled new
     and interesting rendering effects, the set of effects is limited by the
     set of special modes introduced by the extension.  This lack of
     flexibility is in contrast to the high level of programmability of
     general-purpose CPUs and other (frequently software-based) shading
     languages.  The purpose of this extension is to expose to the OpenGL
     application writer an unprecedented degree of programmability in the
     computation of final fragment colors and depth values.

     This extension provides a mechanism for defining fragment program
     instruction sequences for application-defined fragment programs.  When in
     fragment program mode, a program is executed each time a fragment is
     produced by rasterization.  The inputs for the program are the attributes
     (position, colors, texture coordinates) associated with the fragment and a
     set of constant registers.  A fragment program can perform mathematical
     computations and texture lookups using arbitrary texture coordinates.  The
     results of a fragment program are new color and depth values for the
     fragment.

     This extension defines a programming model including a 4-component vector
     instruction set, 16- and 32-bit floating-point data types, and a
     relatively large set of temporary registers.  The programming model also
     includes a condition code vector which can be used to mask register writes
     at run-time or kill fragments altogether.  The syntax, program
     instructions, and general semantics are similar to those in the
     NV_vertex_program and NV_vertex_program2 extensions, which provide for the
     execution of an arbitrary program each time the GL receives a vertex.

     The fragment program execution environment is designed for efficient
     hardware implementation and to support a wide variety of programs.  By
     design, the entire set of existing fragment programs defined by existing
     OpenGL per-fragment computation extensions can be implemented using the
     extension's programming model.

     The fragment program execution environment accesses textures via
     arbitrarily computed texture coordinates.  As such, there is no necessary
     correspondence between the texture coordinates and texture maps previously
     lumped into a single "texture unit".  This extension separates the notion
     of "texture coordinate sets" and "texture image units" (texture maps and
     associated parameters), allowing implementations with a different number
     of each.  The initial implementation of this extension will support 8
     texture coordinate sets and 16 texture image units.

 Issues

     What limitations exist in this extension?

         RESOLVED:  Very few.  Programs can not exceed a maximum program length
         (which is no less than 1024 instructions), and can use no more than
         32-64 temporary registers.  Programs can not access more than one
         fragment attribute or program parameter (constant) per instruction,
         but can work around this restriction using temporaries.  The number of
         textures that can be used by a program is limited to the number of
         texture image units provided by the implementation (16 in the initial
         implementation of this extension).

         These limits are fairly high.  Additionally, there is no limit on the
         total number of texture lookups that can be performed by a program.
         There is no limit on the length of a texture dependency chain -- one
         can write a program that performs over 1000 consecutive dependent
         texture lookups.  There are no restrictions on dependencies between
         texture mapping instructions and arithmetic instructions.  Texture
         lookups can be performed using arbitrarily computed texture
         coordinates.  Applications can carry out their calculations with full
         32-bit single precision, although two lower-precision modes are also
         available.

     How does texture mapping work with fragment programs?

         RESOLVED:  This extension provides three instructions used to perform
         texture lookups.

         The "TEX" instruction performs a lookup with the (s,t,r) values taken
         from an interpolated texture coordinate, an arbitrarily computed
         vector, or even a program constant.  The "TXP" instruction performs a
         similar lookup, except that it uses the fourth component of the source
         vector to perform a perspective divide, using (s/q, t/q, r/q).  In
         both cases, the GL will automatically compute partial derivatives used
         for filter and LOD selection.

         The "TXD" instruction operates like "TEX", except that it allows the
         program to explicitly specify two additional vectors containing the
         partial derivatives of the texture coordinate with respect to x and y
         window coordinates.

         All three instructions write a filtered texel value to a temporary or
         output register.  Other than the computation of texture coordinates
         and partial derivatives, texture lookups are not performed any
         differently in fragment program mode.  In particular, any applicable
         LOD biases, wrap modes, minification and magnification filters, and
         anisotropic filtering controls are still applied in fragment program
         mode.

         The results of the texture lookup are available to be used arbitrarily
         by subsequent fragment program instructions.  Fragment programs are
         allowed to access any texture map arbitrarily many times.

     Can fragment programs be used to compute depth values?

         RESOLVED:  Yes.  A fragment program can perform arbitrary
         computations to compute a final value for the fragment, which it
         should write to the "z" component of the o[DEPR] register.  The "z"
         value written should be in the range [0,1], regardless of the size of
         the depth buffer.

         To assist in the computation of the final Z value, a fragment program
         can access the interpolated depth of the fragment (prior to any
         displacement) by reading the "z" component of the f[WPOS] attribute
         register.

     How should near and far plane clipping work in fragment program mode if
     the current fragment program computes a depth value?

         RESOLVED:  Geometric clipping to the near and far clip plane should be
         disabled.  Clipping should be done based on the depth values computed
         per-fragment.  The rationale is that per-fragment depth displacement
         operations may effectively move portions of a primitive initially
         outside the clip volume inside, and vice versa.

         Note that under the NV_depth_clamp extension, geometric clipping to
         the near and far clip planes is also disabled, and the fragment depth
         values are clamped to the depth range.  If depth clamp mode is enabled
         when using a fragment program that computes a depth value, the
         computed depth value will be clamped to the depth range.

     Should fragment programs be allowed to use multiple precisions for
     operands and operations?

         RESOLVED:  Yes.  Low-precision operands are generally adequate for
         representing colors.  Allowing low-precision registers also allows for
         a larger number of temporary registers (at lower precision).
         Low-precision operations also provide the opportunity for a higher
         level of performance.

         Applications are free to use only high-precision operations or mix
         high- and low-precision operations as necessary.

     What levels of precision are supported in arithmetic operations?

         RESOLVED:  Arithmetic operations can be performed at three different
         precisions.  32-bit floating point precision (fp32) uses the IEEE
         single-precision standard with a sign bit, 8 exponent bits, and 23
         mantissa bits.  16-bit floating-point precision (fp16) uses a similar
         floating-point representation, but with 5 exponent bits and 10
         mantissa bits.  Additionally, many arithmetic operations can also be
         carried out at 12-bit fixed point precision (fx12), where values in
         the range [-2,+2) are represented as signed values with 10 fraction
         bits.
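
         As an illustration of the fp16 layout described above, the following
         C sketch decodes a 16-bit pattern into a float.  The function names
         and the exponent bias of 15 are assumptions made for this example;
         this extension does not define any such helper.

```c
#include <math.h>    /* INFINITY and NAN macros only */
#include <stdint.h>

/* Multiply v by 2^e with repeated scaling (keeps the sketch free of libm). */
float scale_pow2(float v, int e)
{
    while (e > 0) { v *= 2.0f; e--; }
    while (e < 0) { v *= 0.5f; e++; }
    return v;
}

/* Decode a pattern laid out as 1 sign bit, 5 exponent bits, and 10 mantissa
 * bits.  An exponent bias of 15 is assumed. */
float fp16_to_float(uint16_t h)
{
    int   sign = (h >> 15) & 0x1;
    int   exp  = (h >> 10) & 0x1F;
    int   man  = h & 0x3FF;
    float v;

    if (exp == 0)
        v = scale_pow2((float)man, -24);                 /* zero or denorm */
    else if (exp == 31)
        v = man ? NAN : INFINITY;                        /* NaN or infinity */
    else
        v = scale_pow2(1024.0f + (float)man, exp - 25);  /* (1 + man/1024) * 2^(exp-15) */

    return sign ? -v : v;
}
```

         Under these assumptions, the pattern 0x3C00 decodes to 1.0, 0x0400 is
         the smallest normalized value (2^-14), and 0x7BFF is the largest
         finite value (65504).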

     How should the precision with which operations are carried out be
     specified?  Should we infer the precision from the types of the operands
     or result vectors?  Or should it be an attribute of the instruction?

         RESOLVED:  Applications can optionally specify the precision of
         individual instructions by adding a suffix of "R", "H", and "X" to
         instruction names to select fp32, fp16, and fx12 precision,
         respectively.

         By default, instructions will be carried out using the precision of
         the destination register.  Always inferring the precision from the
         operands has a number of issues.  First, there are a number of
         operations (e.g., TEX/TXP/TXD) where result type has little to no
         correspondence to the type of the operands.  In these cases, precision
         suffixes are not supported.  Second, one could have instructions
         automatically cast operands and compute results using the type of the
         highest precision operand or result.  This behavior would be
         problematic since all fragment attribute registers and program
         parameters are kept at full precision, but full precision may not be
         needed by the operation.

         The choice of precision level allows programs to trade off precision
         for potentially higher performance.  Giving the program explicit
         control over the precision also allows it to dictate precision
         explicitly and eliminate any uncertainty over type casting.

     For instructions whose specified precision is different than the precision
     of the operands or the result registers, how are the operations performed?
     How are the condition codes updated?

         RESOLVED:  Operations are performed with operands and results at the
         precision specified by the instruction.  After the operation is
         complete, the result is converted to the precision of the destination
         register, after which the condition code is generated.

         In an alternate approach, the condition code could be generated from
         the result.  However, in some cases, the register contents would not
         match the condition code.  In such cases, it may not be reliable to
         use the condition code to prevent division by zero or other special
         cases.
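
         The difference between the two approaches can be seen in a small C
         sketch.  It models one component written to an fx12 destination; the
         names, the three-way condition enum, and round-to-nearest conversion
         are assumptions of the example, and unordered (NaN) results are
         ignored.

```c
/* Condition outcomes for a single component in this sketch. */
enum cc { CC_LT, CC_EQ, CC_GT };

/* Convert to fx12: clamp to [-2, 2) and keep 10 fraction bits.
 * Round-to-nearest is assumed here. */
float to_fx12(float x)
{
    const float lo = -2.0f;
    const float hi = 2047.0f / 1024.0f;
    int n;

    if (x < lo) x = lo;
    if (x > hi) x = hi;
    n = (int)(x * 1024.0f + (x >= 0.0f ? 0.5f : -0.5f));
    return (float)n / 1024.0f;
}

enum cc classify(float v)
{
    return v < 0.0f ? CC_LT : (v > 0.0f ? CC_GT : CC_EQ);
}

/* Behavior described above: convert to the destination precision, then test. */
enum cc cc_after_conversion(float result)
{
    return classify(to_fx12(result));
}

/* Alternate approach: test the result before it is converted. */
enum cc cc_before_conversion(float result)
{
    return classify(result);
}
```

         For a result of 0.0003, the stored fx12 value is zero.  Testing after
         conversion yields EQ, which matches the register contents; testing
         before conversion would yield GT even though the register holds zero.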

     How does this extension interact with the ARB_multisample extension?  In
     the ARB_multisample extension, each fragment has multiple depth values.
     In this extension, a single interpolated depth value may be modified by a
     fragment program.

         RESOLVED:  The depth values for the extra samples are generated by
         computing partials of the computed depth value and using these
         partials to derive the depth values for each of the extra samples.
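
         One plausible reading of this resolution is a first-order
         extrapolation from the fragment center, sketched below in C.  The
         formula, the function name, and the use of pixel-relative sample
         offsets are assumptions; the text above does not give an exact
         equation.

```c
/* Extrapolate the program-computed depth z (taken at the fragment center) to a
 * sample located at offset (dx, dy) in pixels, using the partial derivatives
 * dz/dx and dz/dy of the computed depth. */
float sample_depth(float z, float dzdx, float dzdy, float dx, float dy)
{
    return z + dzdx * dx + dzdy * dy;
}
```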

     How does this extension interact with polygon offset?  Both extensions
     modify fragment depth values.

         RESOLVED:  As in the base OpenGL spec, the depth offset generated by
         polygon offset is added during polygon rasterization.  The depth value
         provided to programs in f[WPOS].z already includes polygon offset, if
         enabled.  If the depth value is replaced by a fragment program, the
         polygon offset value will NOT be recomputed and added back after
         program execution.

         This is probably not desirable for fragment programs that modify depth
         values since the partials used to generate the offset may not match
         the partials of the computed depth value.  Polygon offset for filled
         polygons can be approximated in a fragment program using the depth
         partials obtained by the DDX and DDY instructions.  This will not work
         properly for line- and point-mode polygons, since the partials used
         for offset are computed over the polygon, while the partials resulting
         from the DDX and DDY instructions are computed along the line (or are
         zero for point-mode polygons).  In addition, separate treatment of
         points, line segments, and polygons is not possible in a fragment
         program.

     Should depth component replacement be a property of the fragment program
     or a separate enable?

         RESOLVED:  It should be a program property.  Using the output register
         notation simplifies matters:  depth components are replaced if and
         only if the DEPR register is written to.  This alleviates the
         application and driver burden of maintaining separate state.

     How does this extension affect the handling of q texture coordinates in
     the OpenGL spec?

         RESOLVED:  Fragment programs are allowed to access an associated q
         texture coordinate, so this attribute must be produced by
         rasterization.  In unextended OpenGL 1.2, the q coordinate is
         eliminated in the rasterization portions of the spec after dividing
         each of s, t, and r by it.  This extension updates the specification
         to pass q coordinates through at least to conventional texture
         mapping.  When fragment program mode is disabled, q coordinates will
         be eliminated there in an identical manner.  This modification has the
         added benefit of simplifying the equations used for attribute
         interpolation.

     How should clip w coordinates be handled by this extension?

         RESOLVED:  Fragment programs are allowed to access the reciprocal of
         the clip w coordinate, so this attribute must be produced by
         rasterization.  The OpenGL 1.2 spec doesn't explicitly enumerate the
         attributes associated with the fragment, but we add treatment of the w
         clip coordinate in the appropriate locations.

         The reciprocal of the clip w coordinate in traditional graphics
         hardware is produced by screen-space linear interpolation of the
         reciprocals of the clip w coordinates of the vertices.  However, this
         spec says the clip w coordinate is produced by perspective-correct
         interpolation of the (non-reciprocated) clip w vertex coordinates.
         These two formulations turn out to be equivalent, and the latter is
         more convenient since the core OpenGL spec already contains formulas
         for perspective-correct interpolation of vertex attributes.
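
         The equivalence can be checked numerically.  The C sketch below
         evaluates both formulations along a line segment with screen-space
         weights a and b (a + b = 1); the function names are illustrative
         only.

```c
/* Screen-space linear interpolation of 1/w between two vertices. */
float recip_w_linear(float wa, float wb, float a, float b)
{
    return a * (1.0f / wa) + b * (1.0f / wb);
}

/* Perspective-correct interpolation of an attribute f,
 *     f = (a*fa/wa + b*fb/wb) / (a/wa + b/wb),
 * applied to f = w itself. */
float w_perspective(float wa, float wb, float a, float b)
{
    float num = a * (wa / wa) + b * (wb / wb);  /* reduces to a + b = 1 */
    float den = a / wa + b / wb;
    return num / den;
}
```

         Because the numerator reduces to one, the reciprocal of
         w_perspective() equals recip_w_linear() for the same inputs, up to
         floating-point rounding.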

     What is produced by the TEX/TXP/TXD instructions if the requested texture
     image is inconsistent?

         RESOLVED:  The result vector is specified to be (0,0,0,0).  This
         behavior is consistent with the NV_texture_shader extension.  Note
         that like in NV_texture_shader, these instructions ignore the standard
         hierarchy of texture enables and programs can access textures that are
         not specifically "enabled".

     Should a minimum precision be specified for certain fragment attribute
     registers (in particular COL0, COL1) that may not be generated with full
     fp32 precision?

         RESOLVED:  No.  It is expected that the precision of COL0/COL1 should
         generally be at least as high as that of the frame buffer.

     Fragment color components (f[COL0] and f[COL1]) are generally
     low-precision fixed-point values in the range [0,1].  Is it possible to
     pass unclamped or high-precision color components to fragment programs?

         RESOLVED:  Yes, although you can't exactly call them "colors".
         High-precision per-vertex color values can be written into any unused
         texture coordinate set, either via a MultiTexCoord call or using a
         vertex program.  These "texture coordinates" will be interpolated
         during rasterization, and can be used arbitrarily by a fragment
         program.

         In particular, there is no requirement that per-fragment attributes
         called "texture coordinates" be used for texture mapping.

     Should this specification guarantee that temporary registers are
     initialized to zero?

         RESOLVED:  Yes.  This will allow for the modular construction of
         programs that accumulate results in registers.  For example,
         per-fragment lighting may use MAD instructions to accumulate color
         contributions at each light.  Without zero-initialization, the program
         would require an explicit MOV instruction to load 0 or the use of the
         MUL instruction for the first light.

     Should this specification support Unicode program strings?

         RESOLVED:  Not necessary.

     Programs defined by NV_vertex_program begin with "!!VP1.0".  Should
     fragment programs have a similar identifier?

         RESOLVED:  Yes, "!!FP1.0", identifying the first revision of this
         fragment program language.

     Should per-fragment attributes have equivalent integer names in the
     program language, as per-vertex attributes do in NV_vertex_program?

         RESOLVED:  No.  In NV_vertex_program, "generic" vertex attributes
         could be specified directly by an application using only an attribute
         number.  Those numbers may have no necessary correlation with the
         conventional attribute names, although conventional vertex attributes
         are mapped to attribute numbers.  However, conventional attributes are
         the only outputs of vertex programs and of rasterization.  Therefore,
         there is no need for a similar input-by-number functionality for
         fragment programs.

     Should we provide the ability to issue instructions that do not update
     temporary or output registers?

         RESOLVED:  Yes.  Programs may issue instructions whose only purpose is
         to update the condition code register, and requiring such instructions
         to write to a temporary may require the use of an additional temporary
         and/or defeat possible program optimizations.  We accomplish this by
         adding two write-only temporary pseudo-registers ("RC" and "HC") that
         can be specified as destination registers.

     Do the packing and unpacking instructions in this extension make any
     sense?

         RESOLVED:  Yes.  They are useful for packing and unpacking multiple
         components in a single channel of a floating-point frame buffer.  For
         example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities
         or 8 16-bit quantities, all of which could be used in later
         rasterization passes.  See the NV_float_buffer extension for more
         information.
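
         As a rough illustration of the idea, the C sketch below packs four
         [0,1] components into one 32-bit word as 8-bit unsigned values and
         unpacks them again.  The function names, the rounding, and the bit
         layout (first component in the least significant byte) are
         assumptions of the example rather than a statement of the
         instruction semantics.

```c
#include <stdint.h>

/* Convert a component to an 8-bit unsigned value: clamp to [0,1], then
 * scale by 255 and round to nearest. */
uint32_t to_ub(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint32_t)(c * 255.0f + 0.5f);
}

/* Pack four components into a single 32-bit word. */
uint32_t pack4ub(float x, float y, float z, float w)
{
    return to_ub(x) | (to_ub(y) << 8) | (to_ub(z) << 16) | (to_ub(w) << 24);
}

/* Recover four [0,1] components from a packed word. */
void unpack4ub(uint32_t p, float out[4])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (float)((p >> (8 * i)) & 0xFFu) / 255.0f;
}
```

         Four such words, one per 32-bit channel of a 128-bit "RGBA" frame
         buffer, account for the 16 8-bit quantities mentioned above.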

     Should we provide a method for specifying an fp16 depth component output
     value?

         RESOLVED:  No.  There is no good reason for supporting half-precision
         Z outputs.  Even with 16-bit Z buffers, the 10-bit mantissa of the
         half-precision float is rather limiting.  There would effectively be
         only 11 good bits in the back half of the Z buffer.

     Should RequestResidentProgramsNV (or a new equivalent function) take a
     target?  Dealing with working sets of different program types is a bit
     messy.  Should we document some limitation if we get programs of different
     types?

         RESOLVED:  In retrospect, it may have been a good idea to attach a
         target to this command, but there isn't a good reason to mess with
         something that already works for vertex programs.  The driver is
         responsible for ensuring consistent results when the program types
         specified are mixed.

     What happens on data type conversions where the original value is not
     exactly representable in the new data type, either due to overflow or
     insufficient precision in the destination type?

         RESOLVED:  In case of overflow, the original value is clamped to
         +/-INF (fp16 or fp32) or to the nearest representable value (fx12).
         In case of imprecision, the conversion either rounds or truncates to
         the nearest representable value.
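
         For the fx12 case, both rules can be sketched in C as follows.  The
         function name and the mode flag are hypothetical, and truncation is
         taken to mean truncation toward zero, which is an assumption; NaN
         inputs are not handled.

```c
/* fx12 holds values in [-2, 2) as signed numbers with 10 fraction bits, that
 * is, integer multiples of 1/1024 from -2048/1024 to 2047/1024. */
float float_to_fx12(float x, int round_to_nearest)
{
    const float lo = -2.0f;
    const float hi = 2047.0f / 1024.0f;
    float scaled;

    /* Overflow: clamp to the nearest representable value. */
    if (x < lo) return lo;
    if (x > hi) return hi;

    /* Imprecision: round to nearest, or truncate toward zero (assumed). */
    scaled = x * 1024.0f;
    if (round_to_nearest)
        scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    return (float)(int)scaled / 1024.0f;
}
```

         With this sketch, 3.0 converts to 2047/1024 and -5.0 converts to
         -2.0, while 0.0009 becomes 1/1024 when rounding and 0 when
         truncating.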

     Should this extension support IEEE-style denorms?  For 32-bit IEEE
     floating point, denorms are numbers smaller in absolute value than 2^-126.
     For 16-bit floats used by this extension, denorms are numbers smaller in
     absolute value than 2^-14.

         RESOLVED:  For 32-bit data types, hardware support for denorms was
         considered too expensive relative to the benefit provided.
         Computational results that would otherwise produce denorms are flushed
         to zero.  For 16-bit data types, hardware denorm support will be
         present.  The expense of hardware denorm support is lower and the
         potential precision benefit is greater for 16-bit data types.
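
         The 32-bit flush-to-zero behavior can be modeled in C as shown below.
         The function name is hypothetical, and returning +0 for small
         negative results is an assumption, since the sign of the flushed
         zero is not stated above.

```c
#include <float.h>

/* Flush any result smaller in magnitude than the smallest normalized fp32
 * value (FLT_MIN, 2^-126) to zero; other values pass through unchanged. */
float flush_denorm_fp32(float x)
{
    if (x > -FLT_MIN && x < FLT_MIN)
        return 0.0f;
    return x;
}
```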

     OpenGL provides a hierarchy of texture enables.  The texture lookup
     operations in NV_texture_shader effectively override the texture enable
     hierarchy and select a specific texture to enable.  What should be done by
     this extension?

         RESOLVED:  This extension will build upon NV_texture_shader and reduce
         the driver overhead of validating the texture enables.  Texture
         lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,
         3D", which would indicate to use texture coordinate set number 2 to do
         a lookup in the texture object bound to the TEXTURE_3D target in
         texture image unit 2.

         Each texture unit can have only one "active" target.  Programs are not
         allowed to reference different texture targets in the same texture
         image unit.  In the example above, any other texture instructions
         using texture image unit 2 must specify the 3D texture target.
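
         A program loader could enforce this rule with a check along the
         following lines.  This is only a sketch: the types, names, and the
         limit of 16 image units (taken from the initial implementation) are
         assumptions, not part of any driver interface.

```c
#include <stddef.h>

enum tex_target {
    TARGET_NONE = 0, TARGET_1D, TARGET_2D, TARGET_3D, TARGET_CUBE, TARGET_RECT
};

/* One texture instruction's image unit and target. */
struct tex_ref {
    int             unit;
    enum tex_target target;
};

#define MAX_IMAGE_UNITS 16

/* Returns 1 if every image unit is referenced with a single target, and 0 if
 * any unit is out of range or is referenced with two different targets. */
int targets_consistent(const struct tex_ref *refs, size_t count)
{
    enum tex_target active[MAX_IMAGE_UNITS] = { TARGET_NONE };
    size_t i;

    for (i = 0; i < count; i++) {
        int u = refs[i].unit;
        if (u < 0 || u >= MAX_IMAGE_UNITS)
            return 0;
        if (active[u] == TARGET_NONE)
            active[u] = refs[i].target;      /* first use fixes the target */
        else if (active[u] != refs[i].target)
            return 0;                        /* conflicting target */
    }
    return 1;
}
```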

     What is the interaction with NV_register_combiners?

         RESOLVED:  Register combiners are not available when fragment programs
         are enabled.

         Previous versions of this specification supported the notion of
         combiner programs, where the result of fragment program execution was
         a set of four "texture lookup" values that fed the register combiners.

     For convenience, should we include pseudo-instructions not present in the
     hardware instruction set that are trivially implementable?  For example,
     absolute value and subtract instructions could fall in this category.  An
     "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB
     R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

         RESOLVED:  In general, yes.  A SUB instruction is provided for
         convenience.  This extension does not provide a separate ABS
         instruction because it supports absolute value operations of each
         operand.

     Should there be a '+' in the <optionalSign> portion of the grammar?  There
     isn't one in the GL_NV_vertex_program spec.

         RESOLVED:  Yes, for orthogonality/readability.  A '+' obviously adds
         no functionality.  In NV_vertex_program, an <optionalSign> of "-" was
         always a negation operator.  However, in fragment programs, it can
         also be used as a sign for a constant value.

     Can the same fragment attribute register, program parameter register, or
     constants be used for multiple operands in the same instruction?  If so,
     can it be used with different swizzle patterns?

         RESOLVED:  Yes and yes.

     This extension allows different limits for the number of texture
     coordinate sets and the number of texture image units (i.e., texture maps
     and associated data).  The state in ActiveTextureARB affects both
     coordinate sets (TexGen, matrix operations) and image units (TexParameter,
     TexEnv).  How should we deal with this?

         RESOLVED:  Continue to use ActiveTextureARB and emit an
         INVALID_OPERATION if the active texture refers to an unsupported
         coordinate set/image unit.  Other options included creating dummy
         (unusable) state for unsupported coordinate sets/image units and
         continue to use ActiveTextureARB normally, or creating separate state
         and state-setting commands for coordinate sets and image units.
         Separate state is the cleanest solution, but would add more calls and
         potentially cause more programmer confusion.  Dummy state would avoid
         additional error checks, but the demands of dummy state could grow if
         the number of texture image units and texture coordinate sets
         increases.
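
         The chosen behavior amounts to a range check that depends on the
         kind of state being accessed.  The C sketch below uses the limits of
         the initial implementation (8 coordinate sets, 16 image units); the
         enum and function names are hypothetical.

```c
#define GL_NO_ERROR           0
#define GL_INVALID_OPERATION  0x0502

/* Limits of the initial implementation described in the Overview. */
#define NUM_TEXTURE_COORD_SETS   8
#define NUM_TEXTURE_IMAGE_UNITS 16

enum state_kind {
    COORD_SET_STATE,   /* e.g. TexGen, texture matrix operations */
    IMAGE_UNIT_STATE   /* e.g. TexParameter, TexEnv */
};

/* active_unit is the zero-based index selected by ActiveTextureARB. */
unsigned int check_active_texture(int active_unit, enum state_kind kind)
{
    int limit = (kind == COORD_SET_STATE) ? NUM_TEXTURE_COORD_SETS
                                          : NUM_TEXTURE_IMAGE_UNITS;

    return (active_unit >= 0 && active_unit < limit) ? GL_NO_ERROR
                                                     : GL_INVALID_OPERATION;
}
```

         With unit 10 active, a command that touches image-unit state is
         accepted, while a command that touches coordinate-set state
         produces INVALID_OPERATION.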

         The current OpenGL spec is vague as to what state is affected by the
         active texture selector and has no distinction between
         coordinate-related and image-related state.  The state tables could
         use a good clean-up in this area.

     The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"
     is R0*R1+(1-R0)*R2.  There are conflicting precedents here.  The
     definition here matches the "lrp" instruction in the DirectX 8.0 pixel
     shader language.  However, an equivalent RenderMan lerp operation would
     yield a result of (1-R0)*R1+R0*R2.  Which ordering should be implemented?

         RESOLVED:  NVIDIA hardware implements the former operand ordering, and
         there is no good reason to specify a different ordering.  To convert a
         "LRP" using the latter ordering to NV_fragment_program, swap the third
         and fourth arguments.
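
         The two orderings, and the effect of the swap, can be compared per
         component with a short C sketch.  The function names are
         illustrative only.

```c
/* LRP as defined by this extension: a*b + (1-a)*c. */
float lrp_nv(float a, float b, float c)
{
    return a * b + (1.0f - a) * c;
}

/* The RenderMan-style ordering for the same operands: (1-a)*b + a*c. */
float lerp_rm(float a, float b, float c)
{
    return (1.0f - a) * b + a * c;
}
```

         For any inputs, lerp_rm(a, b, c) gives the same result as
         lrp_nv(a, c, b), which is the operand swap described above.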

     Should this extension provide tracking of matrices or any other state,
     similar to that provided in NV_vertex_program?

         RESOLVED:  No.

     Should this extension provide global program parameters -- values shared
     between multiple fragment programs?

         RESOLVED:  No.

     Should this extension provide program parameters specific to a program?
     If so, how?

         RESOLVED:  Yes.  These parameters will be called "local parameters".
         This extension will provide both named and numbered local parameters.
         Local parameters can be managed by the driver and eliminate the need
         for applications to manage a global name space.

         Named local parameters work much like standard variable names in most
         programming languages.  They are created using the "DECLARE"
         instruction within the fragment program itself.  For example:

             DECLARE color = {1,0,0,1};

         Named local parameters are used simply by referencing the variable
         name.  They do not require the array syntax like the global parameters
         in the NV_vertex_program extension.  They can be updated using the
         commands ProgramNamedParameter4[f,fv]NV.

         Numbered local parameters are not declared.  They are used by simply
         referencing an element of an array called "p".  For example,

             MOV R0, p[12];

         loads the value of numbered local parameter 12 into register R0.
         Numbered local parameters can be updated using the commands
         ProgramLocalParameter4[d,dv,f,fv]ARB.

         The numbered local parameter APIs were added to this extension late in
         its development, and are provided for compatibility with the
         ARB_vertex_program extension, and what will likely be supported in
         ARB_fragment_program as well.  Providing this mechanism allows
         programs to use the same mechanisms to set local parameters in both
         extensions.

     Why are the APIs for setting named and numbered local parameters
     different?

         RESOLVED:  The named parameter API was created prior to
         ARB_vertex_program (and the possible future ARB_fragment_program) and
         uses conventions borrowed from NV_vertex_program.  A slightly
         different API was chosen during the ARB standardization process; see
         the ARB_vertex_program specification for more details.

         The named parameter API takes a program ID and a parameter name, and
         sets the parameter for the program with the specified ID.  The
         specified program does not need to be bound (via BindProgramNV) in
         order to modify the values of its named parameters.  The numbered
         parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a
         parameter number and modifies the corresponding numbered parameter of
         the currently bound program.

     What should be the initial value of uninitialized local parameters?

         RESOLVED:  (0,0,0,0).  This choice is somewhat arbitrary, but matches
         previous extensions (e.g., NV_vertex_program).

     Should this extension support program parameter arrays?

         RESOLVED:  No hardware support is present.  Note that from the point
         of view of a fragment program, a texture map can be used as a 1-, 2-,
         or 3-dimensional array of constants.

     Should this extension support constants in fragment programs?  If
     so, how?

         RESOLVED:  Yes.  Scalar or vector constants can be defined inline
         (e.g., "1.0" or "{1,2,3,4}").  In addition, named constants are
         supported using the "DEFINE" instruction, which allows programmers to
         change the values of constants used in multiple instructions simply by
         changing the value assigned to the named constant.

         Note that because this extension uses program strings, the
         floating-point value of any constants generated on the fly must be
         printed to the program string.  An alternate method that avoids the
         need to print constants is to declare a named local program parameter
         and initialize it with the ProgramNamedParameter4[f,fv]() calls.
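
         As a minimal sketch of the first approach, the C helper below prints
         a run-time value into a "DEFINE" statement that an application could
         append to its program string.  The helper name format_define and its
         signature are hypothetical and not part of this extension; "%.9G" is
         chosen because nine significant digits are enough to round-trip a
         32-bit float.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the extension): writes
 * "DEFINE <name> = <value>;\n" into buf.
 * Returns the number of characters written, excluding the terminating
 * NUL, or -1 if buf is too small.  Non-finite values are not handled. */
int format_define(char *buf, size_t size, const char *name, float value)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, size, "DEFINE %s = %.9G;\n", name, (double)value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

         The returned count can be accumulated to obtain the explicit program
         length that LoadProgramNV expects.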

     Should named constants be allowed to be redefined?

         RESOLVED:  No.  If you want to redefine the values of constants, you
         can create an equivalent named program parameter by changing the
         "DEFINE" keyword to "DECLARE".

     Should functions used to update or query named local parameters take a
     zero-terminated string (as with most strings in the C programming
     language), or should they require an explicit string length?  If the
     former, should we create a version of LoadProgramNV that does not require
     a string length?

         RESOLVED:  Stick with explicit string length.  Strings that are
         defined as constants can have the length computed at compile-time.
         Strings read from files will have the length known in advance.
         Programs that build strings at run time also likely keep the length
         up-to-date.  Passing an explicit length saves time, since the driver
         doesn't have to do a strlen().
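
         For a name stored as a string constant, the length can be obtained
         at compile time, as in the following sketch.  The macro LITERAL_LEN,
         the array kParamName, and the function param_name_len are
         illustrative names only; the assert merely confirms that the
         compile-time length agrees with strlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a string literal or char array, excluding the terminating
 * NUL, computed at compile time.  Not valid for plain char pointers. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Illustrative parameter name, as it might appear in a DECLARE statement. */
static const char kParamName[] = "fog_color1";

size_t param_name_len(void)
{
    assert(LITERAL_LEN(kParamName) == strlen(kParamName));
    return LITERAL_LEN(kParamName);
}
```

         The resulting value would be passed as the <len> argument of
         ProgramNamedParameter4fNV, together with the name itself.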

     What is the deal with the alpha of the secondary color?

         RESOLVED:  In unextended OpenGL 1.2, the alpha component of the
         secondary color is forced to 0.0.  In the EXT_secondary_color
         extension, the alpha of the per-vertex secondary colors is defined to
         be 0.0.  NV_vertex_program allows vertex programs to produce a
         per-vertex alpha component, but it is forced to zero for the purposes
         of the color sum.  In the NV_register_combiners extension, the alpha
         component of the secondary color is undefined.  What a mess.

         In this extension, the alpha of the secondary color is well-defined
         and can be used normally.  When in vertex program mode

     Why are fragment program instructions involving f[FOGC] or f[TEX0] through
     f[TEX7] automatically carried out at full precision?

         RESOLVED:  This is an artifact of the method by which these
         interpolants are generated on NVIDIA graphics hardware.  If such instructions
         absolutely must be carried out at lower precision, the requirement can
         be met by first loading the interpolants into a temporary register.

     With a different number of texture coordinate sets and texture image
     units, how many copies of each kind of texture state are there?

         RESOLVED:  The intention is that texture state be broken into three
         groups.  (1) There are MAX_TEXTURE_COORDS_NV copies of texture
         coordinate set state, which includes current texture coordinates,
         TexGen state, and texture matrices.  (2) There are
         MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which
         includes texture maps, texture parameters, and LOD bias parameters.  (3)
         There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit
         state (e.g., texture enables, TexEnv blending state), all of which are
         unused when in fragment program mode.

         It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum
         of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --
         implementations may choose not to extend fixed-function OpenGL texture
         mapping modes beyond a certain point.

     The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end
     up with programs >64KB.  This will overflow the limits of the GLX Render
     protocol, resulting in the need to use the RenderLarge path.  This is
     also an issue with vertex programs.

         RESOLVED:  Yes, it is.

     Should textures used by fragment programs be declared?  For example,
     "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all
     accesses to texture unit 3.  The dimension could be dropped from the TEX
     family of instructions, and some of the compile-time error checking could
     be dropped.

         RESOLVED:  Maybe it should be, but for better or worse, it isn't.

     It is not all that uncommon to have negative q values with projective
     texture mapping, but results are undefined if any q values are negative in
     this specification.  Why?

         RESOLVED:  This restriction carries on a similar one in the initial
         OpenGL specification.  The motivation for this restriction is that
         when interpolating, it is possible for a fragment to have an
         interpolated q coordinate at or near 0.0.  Since the texture
         coordinates used for projective texture mapping are s/q, t/q, and r/q,
         this will result in a divide-by-zero error or significant numerical
         instability.  Results will be inaccurate for such fragments.

         Other than the numerical stability issue above, NVIDIA hardware should
         have no problems with negative q coordinates.

     Should programs that replace depth have their own special program type,
     such as "!!FPD1.0" and "!!FPDC1.0"?

         RESOLVED:  No.  If a program has an instruction that writes to
         o[DEPR], the final fragment depth value is taken from o[DEPR].z.
         Otherwise, the fragment's original depth value is used.

     What fx12 value should NaN map to?

         RESOLVED:  For the lack of any better choice, 0.0.

     How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for
     arithmetic and comparison operations?

         RESOLVED:  The special cases for all floating-point operations are
         designed to match the IEEE specification for floating-point numbers as
         closely as possible.  The results produced by special cases should be
         enumerated in the sections of this spec describing the operations.
         There are some cases where the implemented fragment program behavior
         does not match IEEE conventions, and these cases should be noted in
         this specification.

     How can condition codes be used to mask out register writes?  How about
     killing fragments?  What other things can you do?

         RESOLVED:  The following example computes a component-wise |R1-R2|:

           SUBC R0, R1, R2;      # "C" suffix means update condition code
           MOV  R0 (LT), -R0;    # Conditional write mask in parentheses

         The first instruction computes a component-wise difference between R1
         and R2, storing R1-R2 in register R0.  The "C" suffix in the
         instruction means to update the condition code based on the sign of
         the result vector components.  The second instruction inverts the sign
         of the components of R0.  However, the "(LT)" portion says that the
         destination register should be updated only if the corresponding
         condition code component is LT (negative).  This means that only those
         components of R0 that are negative are negated, leaving the
         component-wise absolute value of R1-R2 in R0.
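
         The effect of the two-instruction sequence above can be emulated on
         the CPU as a rough sketch.  The type CondCode and the functions
         cc_of and abs_diff are hypothetical names used only for this
         illustration; they model the condition code update of SUBC and the
         (LT) conditional write mask of MOV, nothing more.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;

/* Condition code produced by one result component. */
static CondCode cc_of(float v)
{
    if (isnan(v)) return CC_UN;
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;
}

/* Emulates:  SUBC R0, R1, R2;   MOV R0 (LT), -R0;  */
void abs_diff(const float r1[4], const float r2[4], float r0[4])
{
    CondCode cc[4];
    assert(r1 != NULL && r2 != NULL && r0 != NULL);

    for (int i = 0; i < 4; i++) {      /* SUBC: subtract, update CC */
        r0[i] = r1[i] - r2[i];
        cc[i] = cc_of(r0[i]);
    }
    for (int i = 0; i < 4; i++) {      /* MOV (LT): write only where CC is LT */
        if (cc[i] == CC_LT)
            r0[i] = -r0[i];
    }
}
```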

         To kill a fragment if the red (x) component of a texture lookup
         returns zero:

           TEXC R0, f[TEX0], TEX0, 2D;
           KIL EQ.x;

         To kill based on the green (y) component, use "EQ.y" instead.  To kill
         if any of the four components is zero, use "EQ.xyzw" or just "EQ".

         Fragment programs do not support boolean expressions directly.  These
         can generally be achieved using conditional write masks.

         To evaluate the expression "(R0.x == 0) && (R1.x == 0)":

           MOVC RC.x, R0.x;
           MOVC RC.x (EQ), R1.x;

         To evaluate the expression "(R0.x == 0) || (R1.x == 0)":

           MOVC RC.x, R0.x;
           MOVC RC.x (NE), R1.x;

         In both cases, the x component of the condition code will contain "EQ"
         if and only if the condition is TRUE.

     How can fragment programs be used to implement non-standard texture
     filtering modes?

         RESOLVED:  As one example, consider a case where you want to do linear
         filtering in a 2D texture map, but only horizontally.  To achieve
         this, first set the texture filtering mode to NEAREST.  For a 16 x n
         texture, you might do something like:

           DEFINE halfTexel = { 0.03125, 0 };   # 1/32 (1/2 a texel)
           ADD R2, f[TEX0], -halfTexel;         # coords of left sample
           ADD R1, f[TEX0], +halfTexel;         # coords of right sample
           TEX R0, R2, TEX0, 2D;                # lookup left sample
           TEX R1, R1, TEX0, 2D;                # lookup right sample
           MUL R2.x, R2.x, 16;                  # scale X coords to texels
           FRC R2.x, R2.x;                      # get fraction, filter weight
           LRP R0, R2.x, R1, R0;                # blend samples based on weight

         There are plenty of other interesting things that can be done.
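
         To check the arithmetic of the horizontal filter, the instruction
         sequence can be mirrored on the CPU for a single row of a 16-texel
         texture.  This is only a sketch:  the function names are
         hypothetical, the texture is reduced to one row of floats, and
         coordinates outside [0,1) are assumed to clamp to the edge texels.

```c
#include <assert.h>
#include <stddef.h>

#define TEX_WIDTH 16

/* Largest integer not greater than x (avoids a libm dependency). */
static int floor_to_int(float x)
{
    int i = (int)x;
    return ((float)i > x) ? i - 1 : i;
}

/* NEAREST lookup in one row; edge clamping is an assumption. */
static float fetch_nearest(const float row[TEX_WIDTH], float s)
{
    int i = floor_to_int(s * TEX_WIDTH);
    if (i < 0) i = 0;
    if (i > TEX_WIDTH - 1) i = TEX_WIDTH - 1;
    return row[i];
}

float filter_horizontal(const float row[TEX_WIDTH], float s)
{
    assert(row != NULL);
    const float half_texel = 0.5f / TEX_WIDTH;      /* DEFINE halfTexel */
    float left_s  = s - half_texel;                 /* ADD R2 */
    float right_s = s + half_texel;                 /* ADD R1 */
    float left    = fetch_nearest(row, left_s);     /* TEX R0 */
    float right   = fetch_nearest(row, right_s);    /* TEX R1 */
    float x       = left_s * TEX_WIDTH;             /* MUL R2.x */
    float w       = x - (float)floor_to_int(x);     /* FRC R2.x */
    return w * right + (1.0f - w) * left;           /* LRP R0 */
}
```

         At a texel center the weight is zero and the left sample is returned
         unchanged; on the boundary between two texels the weight is 0.5 and
         the two samples are averaged.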

     Should this specification provide more examples?

         RESOLVED:  Yes, it should.

     Is the OpenGL ARB working on a multi-vendor standard for fragment
     programmability?  Will there be an ARB_fragment_program extension?  If so,
     how will this extension interact with the ARB standard?

         RESOLVED:  Yes, as of July 2002, there was a multi-vendor working
         group and a draft specification.  The ARB extension is expected to
         have several features not present in this extension, such as state
         tracking and global parameters (called "program environment
         parameters").  It will also likely lack certain features found in this
         extension.

     Why does the HEMI mapping apply to the third component of signed HILO
     textures, but not to unsigned HILO textures?

         RESOLVED:  This behavior matches the behavior of NV_texture_shader
         (e.g., the DOT_PRODUCT_NV mode).  The HEMI mapping will construct the
         third component of a unit vector whose first two components are
         encoded in the HILO texture.


 New Procedures and Functions

     void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                    float x, float y, float z, float w);
     void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                    double x, double y, double z, double w);
     void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                     const float v[]);
     void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                     const double v[]);
     void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,
                                       float *params);
     void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,
                                       double *params);

     void ProgramLocalParameter4dARB(enum target, uint index,
                                     double x, double y, double z, double w);
     void ProgramLocalParameter4dvARB(enum target, uint index,
                                      const double *params);
     void ProgramLocalParameter4fARB(enum target, uint index,
                                     float x, float y, float z, float w);
     void ProgramLocalParameter4fvARB(enum target, uint index,
                                      const float *params);
     void GetProgramLocalParameterdvARB(enum target, uint index,
                                        double *params);
     void GetProgramLocalParameterfvARB(enum target, uint index,
                                        float *params);


 New Tokens

     Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the
     <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev,
     and by the <target> parameter of BindProgramNV, LoadProgramNV,
     ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB,
     ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB,
     GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:

         FRAGMENT_PROGRAM_NV                            0x8870

     Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
     and GetDoublev:

         MAX_TEXTURE_COORDS_NV                          0x8871
         MAX_TEXTURE_IMAGE_UNITS_NV                     0x8872
         FRAGMENT_PROGRAM_BINDING_NV                    0x8873
         MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV       0x8868

     Accepted by the <name> parameter of GetString:

         PROGRAM_ERROR_STRING_NV                        0x8874


 Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)

     Modify Section 2.11, Clipping (p.39)

     (replace the first paragraph of the section, p. 39)  Primitives are clipped
     to the clip volume.  In clip coordinates, the view volume is defined by

         -w_c <= x_c <= w_c,
         -w_c <= y_c <= w_c, and
         -w_c <= z_c <= w_c.

     Clipping to the near and far clip planes is ignored if fragment program
     mode (section 3.11) or texture shaders (see NV_texture_shader
     specification) are enabled, if the current fragment program or texture
     shader computes per-fragment depth values.  In this case, the view volume
     is defined by:

         -w_c <= x_c <= w_c and
         -w_c <= y_c <= w_c.


 Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)

     Modify Chapter 3 introduction (p. 57)

     (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization
     process.  The color value assigned to a fragment is initially determined
     by the rasterization operations (Sections 3.3 through 3.7) and modified by
     either the execution of the texturing, color sum, and fog operations as
     defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined
     in Section 3.11.  The final depth value is initially determined by the
     rasterization operations and may be modified by a fragment program.

     note:  Antialiasing Application is renumbered from Section 3.11 to Section
     3.12.

     Modify Figure 3.1 (p.58)

                              Primitive Assembly
                                       |
               +-----------+-----------+-----------+-----------+
               |           |           |           |           |
               |           |           |        Pixel          |
             Point       Line       Polygon     Rectangle   Bitmap
            Raster-     Raster-     Raster-     Raster-     Raster-
            ization     ization     ization     ization     ization
               |           |           |           |           |
               +-----------+-----------+-----------+-----------+
                                       |
                                       |
                     +-----------------+-----------------+
                     |                 |                 |
               Conventional         Texture          Fragment
               Texture Fetch        Shaders          Programs
                     |                 |                 |
                     |  +--------------+                 |
                     |  |                                |
         TEXTURE_    o  o                                |
         SHADER_NV                                       |
         enable      o                                   |
                     |                                   |
                     +-------------+                     |
                     |             |                     |
                Conventional   Register                  |
                   TexEnv      Combiners                 |
                     |             |                     |
                 Color Sum         |                     |
                     |             |                     |
                    Fog            |                     |
                     |             |                     |
                     |  +----------+                     |
                     |  |                                |
         REGISTER_   o  o                                |
         COMBINERS_                                      |
         NV enable   o                                   |
                     |                                   |
                     +-----------------+  +--------------+
                                       |  |
                            FRAGMENT_  o  o
                            PROGRAM_
                            NV enable  o
                                       |
                                       |
                                    Coverage
                                   Application
                                       |
                                       v
                             to fragment processing


     Modify Section 3.3, Points (p.61)

     All fragments produced in rasterizing a non-antialiased point are assigned
     the same associated data, which are those of the vertex corresponding to
     the point.  (delete reference to divide by q).

     If antialiasing is enabled, then ...  The data associated with each
     fragment are otherwise the data associated with the point being
     rasterized.  (delete reference to divide by q)

     Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)

     (Note that t=0 at p_a and t=1 at p_b).  The value of an associated datum f
     for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color
     index (in color index mode), the s, t, r, or q texture coordinate, or the
     clip w coordinate (the depth value, window z, must be found using equation
     3.3, below), is found as

       f = (1-t) * f_a / w_a + t * f_b / w_b                     (3.2)
           ---------------------------------
                 (1-t) / w_a + t / w_b

     where f_a and f_b are the data associated with the starting and ending
     endpoints of the segment, respectively; w_a and w_b are the clip
     w coordinates of the starting and ending endpoints of the segment,
     respectively.  Note that linear interpolation would use

       f = (1-t) * f_a + t * f_b.                                (3.3)

     ... A GL implementation may choose to approximate equation 3.2 with 3.3,
     but this will normally lead to unacceptable distortion effects when
     interpolating texture coordinates or clip w coordinates.
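
     The difference between equations 3.2 and 3.3 can be seen in a small
     numerical sketch.  The function names below are illustrative only, and
     double precision is used purely for convenience; an implementation is
     free to evaluate these equations differently.

```c
#include <assert.h>

/* Equation 3.2: perspective-correct interpolation along a segment. */
double interp_perspective(double t, double f_a, double f_b,
                          double w_a, double w_b)
{
    assert(w_a != 0.0 && w_b != 0.0);
    double num = (1.0 - t) * f_a / w_a + t * f_b / w_b;
    double den = (1.0 - t) / w_a + t / w_b;
    return num / den;
}

/* Equation 3.3: plain linear interpolation. */
double interp_linear(double t, double f_a, double f_b)
{
    return (1.0 - t) * f_a + t * f_b;
}
```

     With f_a = 0, f_b = 5, w_a = 1, w_b = 0.25, and t = 0.5, equation 3.2
     yields 4 while equation 3.3 yields 2.5.  The two agree whenever
     w_a = w_b.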

     Modify Section 3.5.1, Basic Polygon Rasterization (p.71)

     Denote a datum at p_a, p_b, or p_c ... is given by

       f = a * f_a / w_a + b * f_b / w_b + c * f_c / w_c         (3.4)
           ---------------------------------------------
                   a / w_a + b / w_b + c / w_c

     where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c,
     respectively.  a, b, and c are the barycentric coordinates of the fragment
     for which the data are produced. a, b, and c must correspond precisely to
     the exact coordinates ... at the fragment's center.

     Just as with line segment rasterization, equation 3.4 may be approximated
     by

       f = a * f_a + b * f_b + c * f_c;                          (3.5)

     this may yield ... for texture coordinates or clip w coordinates.

     Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)

     A fragment arising from a group ... are given by those associated with the
     current raster position.  (delete reference to divide by q)

     Modify Section 3.7, Bitmaps (p.111)

     Otherwise, a rectangular array ... The associated data for each fragment
     are those associated with the current raster position.  (delete reference
     to divide by q)  Once the fragments have been produced ...

     Modify Section 3.8, Texturing (p.112)

     ... an image at the location indicated by a fragment's texture coordinates
     to modify the fragment's primary RGBA color.  Texturing does not affect the
     secondary color.

     Texturing is specified only for RGBA mode; its use in color index mode is
     undefined.

     Except when in fragment program mode (Section 3.11), the (s,t,r) texture
     coordinates used for texturing are the values s/q, t/q, and r/q,
     respectively, where s, t, r, and q are the texture coordinates associated
     with the fragment.  When in fragment program mode, the (s,t,r) texture
     coordinates are specified by the program.  If q is less than or equal to
     zero, the results of texturing are undefined.

     Add new Section 3.11, Fragment Programs (p.140)

     Fragment program mode is enabled and disabled with the Enable and Disable
     commands using the symbolic constant FRAGMENT_PROGRAM_NV.  When fragment
     program mode is enabled, standard and extended texturing, color sum, and
     fog application stages are ignored and a general purpose program is
     executed instead.

     A fragment program is a sequence of instructions that execute on a
     per-fragment basis.  In fragment program mode, the currently bound
     fragment program is executed as each fragment is generated by the
     rasterization operations.  Fragment programs execute a finite fixed
     sequence of instructions with no branching or looping, and operate
     independently from the processing of other fragments.  Fragment programs
     are used to compute new color values to be associated with each fragment,
     and can optionally compute a new depth value for each fragment as well.

     Fragment program mode is not available in color index mode and is
     considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV.  When
     fragment program mode is enabled, texture shaders and register combiners
     (NV_texture_shader and NV_register_combiners extensions) are disabled,
     regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.

     Section 3.11.1, Fragment Program Registers

     Fragment programs operate on a set of program registers.  Each program
     register is a 4-component vector, whose components are referred to as "x",
     "y", "z", and "w" respectively.  The components of a fragment register are
     always referred to in this manner, regardless of the meaning of their
     contents.

     The four components of each fragment program register have one of two
     different representations:  32-bit floating-point (fp32) or 16-bit
     floating-point (fp16).  More details on these representations can be found
     in Section 3.11.4.1.

     There are several different classes of program registers.  Attribute
     registers (Table X.1) correspond to the fragment's associated data
     produced by rasterization.  Temporary registers (Table X.2) hold
     intermediate results generated by the fragment program.  Output registers
     (Table X.3) hold the final results of a fragment program.  The single
     condition code register is used to mask writes to other registers or to
     determine if a fragment should be discarded.


     Section 3.11.1.1, Fragment Program Attribute Registers

     The fragment program attribute registers (Table X.1) hold the location of
     the fragment and the data associated with the fragment produced by
     rasterization.

     Fragment Attribute                                    Component
     Register Name    Description                          Interpretation
     --------------   -----------------------------------  --------------
        f[WPOS]       Position of the fragment center.     (x,y,z,1/w)
        f[COL0]       Interpolated primary color           (r,g,b,a)
        f[COL1]       Interpolated secondary color         (r,g,b,a)
        f[FOGC]       Interpolated fog distance/coord      (z,0,0,0)
        f[TEX0]       Texture coordinate (unit 0)          (s,t,r,q)
        f[TEX1]       Texture coordinate (unit 1)          (s,t,r,q)
        f[TEX2]       Texture coordinate (unit 2)          (s,t,r,q)
        f[TEX3]       Texture coordinate (unit 3)          (s,t,r,q)
        f[TEX4]       Texture coordinate (unit 4)          (s,t,r,q)
        f[TEX5]       Texture coordinate (unit 5)          (s,t,r,q)
        f[TEX6]       Texture coordinate (unit 6)          (s,t,r,q)
        f[TEX7]       Texture coordinate (unit 7)          (s,t,r,q)

     Table X.1:  Fragment Attribute Registers.  The component interpretation
     column describes the mapping of attribute values to register components.
     For example, the "x" component of f[COL0] holds the red color component,
     and the "x" component of f[TEX0] holds the "s" texture coordinate for
     texture unit 0.  The entries "0" and "1" indicate that the attribute
     register components hold the constants 0 and 1, respectively.

     f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment
     center, relative to the lower left corner of the window.  f[WPOS].z
     holds the associated z window coordinate, normally in the range [0,1].
     f[WPOS].w holds the reciprocal of the associated clip w coordinate.

     f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors
     of the fragment, respectively.

     f[FOGC] holds the associated eye distance or fog coordinate normally used
     for fog computations.

     f[TEX0] through f[TEX7] hold the associated texture coordinates for
     texture coordinate sets 0 through 7, respectively.

     All attribute register components are treated as 32-bit floats.  However,
     the components of primary and secondary colors (f[COL0] and f[COL1]) may
     be generated with reduced precision.

     The contents of the fragment attribute registers may not be modified by a
     fragment program.  In addition, each fragment program instruction can use
     at most one unique attribute register.


     Section 3.11.1.2, Fragment Program Temporary Registers

     The fragment temporary registers (Table X.2) hold intermediate values used
     during the execution of a fragment program.  There are 96 temporary
     register names, but not all can be used simultaneously.

     Fragment Temporary
     Register Name       Description
     ------------------  -----------------------------------------------------
         R0-R31          Four 32-bit (fp32) floating point values (s.e8.m23)
         H0-H63          Four 16-bit (fp16) floating point values (s.e5.m10)

     Table X.2:  Fragment Temporary Registers.

     In addition to the normal temporary registers, there are two temporary
     pseudo-registers, "RC" and "HC".  RC and HC are treated as unnumbered,
     write-only temporary registers.  The components of RC have an fp32 data
     type; the components of HC have an fp16 data type.  The sole purpose of
     these registers is to permit instructions to modify the condition code
     register (section 3.11.1.4) without overwriting the values in any
     temporary register.

     Fragment program instructions can read and write temporary registers.
     There is no restriction on the number of temporary registers that can be
     accessed by any given instruction.

     All temporary registers are initialized to (0,0,0,0) each time a fragment
     program executes.


     Section 3.11.1.3, Fragment Program Output Registers

     The fragment program output registers hold the final results of the
     fragment program.  The possible final results of a fragment program are a
     high- or low-precision RGBA fragment color, and a fragment depth value.

        Output
     Register Name      Description
     -------------      -------------------------------------------------------
        o[COLR]         Final RGBA fragment color, fp32 format
        o[COLH]         Final RGBA fragment color, fp16 format
        o[DEPR]         Final fragment depth value, fp32 format

     Table X.3:  Fragment Program Output Registers.

     o[COLR] and o[COLH] specify the color of a fragment.  These two registers
     are identical, except for the associated data type of the components.  The
     R, G, B, and A components of the fragment color are taken from the x, y,
     z, and w components, respectively, of o[COLR] or o[COLH].  A fragment
     program will fail to load if it writes to both o[COLR] and o[COLH].

     o[DEPR] can be used to replace the associated depth value of a fragment.
     The new depth value is taken from the z component of o[DEPR].  If a
     fragment program does not write to o[DEPR], the associated depth value is
     unmodified.

     A fragment program will fail to load if it does not write to at least one
     output register.
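
     The two load-time rules on output registers stated above can be
     summarized in a short sketch.  The flag names and the function
     outputs_loadable are hypothetical; they assume a loader that has
     already recorded which output registers a program writes.

```c
#include <assert.h>

/* Hypothetical flags recording which output registers a program writes. */
enum {
    WRITES_COLR = 1 << 0,
    WRITES_COLH = 1 << 1,
    WRITES_DEPR = 1 << 2
};

/* Returns 1 if the set of written outputs satisfies both rules,
 * 0 if the program would fail to load. */
int outputs_loadable(unsigned written)
{
    const unsigned all = WRITES_COLR | WRITES_COLH | WRITES_DEPR;
    assert((written & ~all) == 0);

    if (written == 0)
        return 0;               /* writes no output register at all */
    if ((written & WRITES_COLR) && (written & WRITES_COLH))
        return 0;               /* writes both o[COLR] and o[COLH] */
    return 1;
}
```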

     The fragment program output registers may not be read by a fragment
     program, but may be written to multiple times.

     The values of all fragment program output registers are initially
     undefined.


     Section 3.11.1.4, Fragment Program Condition Code Register

     The condition code register (CC) is a single four-component vector.  Each
     component of this register is one of four enumerated values:  GT (greater
     than), EQ (equal), LT (less than), or UN (unordered).  The condition code
     register can be used to mask writes to fragment data register components
     or to terminate processing of a fragment altogether (via the KIL
     instruction).

     Most fragment program instructions can optionally update the condition
     code register.  When a fragment program instruction updates the condition
     code register, a condition code component is set to LT if the
     corresponding component of the result vector is less than zero, EQ if it
     is equal to zero, GT if it is greater than zero, and UN if it is NaN (not
     a number).
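
     The update rule above can be written out as a small sketch.  The type
     CondCode and the function names are illustrative, not part of this
     extension.  Note that under C's floating-point comparison rules, -0.0
     compares equal to zero and is therefore classified as EQ here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;

/* Condition code for a single result component. */
CondCode cc_from_result(float v)
{
    if (isnan(v)) return CC_UN;   /* NaN is unordered */
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;                 /* +0.0 and -0.0 */
}

/* Updates all four condition code components from a result vector. */
void update_condition_code(const float result[4], CondCode cc[4])
{
    assert(result != NULL && cc != NULL);
    for (int i = 0; i < 4; i++)
        cc[i] = cc_from_result(result[i]);
}
```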

     The condition code register is initialized to a vector of EQ values each
     time a fragment program executes.


     Section 3.11.2, Fragment Program Parameters

     In addition to using the registers defined in Section 3.11.1, fragment
     programs may also use fragment program parameters in their computation.
     Fragment program parameters are constant during the execution of fragment
     programs, but some parameters may be modified outside the execution of a
     fragment program.

     There are five different types of program parameters:  embedded scalar
     constants, embedded vector constants, named constants, named local
     parameters, and numbered local parameters.

     Embedded scalar constants are written as standard floating-point numbers
     with an optional sign designator ("+" or "-") and optional scientific
     notation (e.g., "E+06", meaning "times 10^6").

     Embedded vector constants are written as a comma-separated array of one to
     four scalar constants, surrounded by braces (like a C/C++ array
     initializer).  Vector constants are always treated as 4-component vectors:
     constants with fewer than four components are expanded to 4-components by
     filling missing y and z components with 0.0 and missing w components with
     1.0.  Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
     "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to
     "{5,6,7,1}".
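
     The expansion rule can be sketched in a few lines of C.  The function
     name expand_vector_constant is hypothetical and stands in for whatever
     a program parser would do internally.

```c
#include <assert.h>
#include <stddef.h>

/* Expands a vector constant with n components (1 to 4) to four components:
 * missing y and z become 0.0, a missing w becomes 1.0. */
void expand_vector_constant(const float *in, int n, float out[4])
{
    assert(in != NULL && out != NULL && n >= 1 && n <= 4);

    out[0] = 0.0f;
    out[1] = 0.0f;
    out[2] = 0.0f;
    out[3] = 1.0f;
    for (int i = 0; i < n; i++)
        out[i] = in[i];
}
```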

     Named constants allow fragment program instructions to define scalar or
     vector constants that can be referenced by name.  Named constants are
     created using the DEFINE instruction:

         DEFINE pi = 3.1415926535;
         DEFINE color = {0.2, 0.5, 0.8, 1.0};

     The DEFINE instruction associates a constant name with a scalar or vector
     constant value.  Subsequent fragment program instructions that use the
     constant name are equivalent to those using the corresponding constant
     value.

     Named local parameters are similar to named vector constants, but their
     values can be modified after the program is loaded.  Local parameters are
     created using the DECLARE instruction:

         DECLARE fog_color1;
         DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};

     The DECLARE instruction creates a 4-component vector associated with the
     local parameter name.  Subsequent fragment program instructions
     referencing the local parameter name are processed as though the current
     value of the local parameter vector were specified instead of the
     parameter name.  A DECLARE instruction can optionally specify an initial
     value for the local parameter, which can be either a scalar or vector
     constant.  Scalar constants are expanded to 4-component vectors by
     replicating the scalar value in each component.  The initial value of
     local parameters not initialized by the program is (0,0,0,0).
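
     As a rough model of how a DECLARE initializer could be turned into the
     stored 4-component value, consider the Python sketch below.  The function
     name and the use of None for "no initializer" are assumptions made for
     illustration only.

```python
def declare_initial_value(initializer=None):
    """Return the initial 4-component value of a DECLAREd local parameter."""
    if initializer is None:
        # No initial value given by the program.
        return (0.0, 0.0, 0.0, 0.0)
    if isinstance(initializer, (int, float)):
        # Scalar constants are replicated into every component.
        return (float(initializer),) * 4
    # Vector constants follow the usual expansion: y, z -> 0.0 and w -> 1.0.
    components = tuple(float(c) for c in initializer)
    if not 1 <= len(components) <= 4:
        raise ValueError("a vector constant has one to four components")
    return components + (0.0, 0.0, 0.0, 1.0)[len(components):]


print(declare_initial_value())                      # DECLARE fog_color1;
print(declare_initial_value([0.3, 0.6, 0.9, 0.1]))  # DECLARE fog_color2 = {...};
print(declare_initial_value(0.5))                   # scalar initializer
```

     Note the difference between a scalar initializer such as 0.5, which is
     replicated, and the one-component vector {0.5}, which is padded.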

     A named local parameter for a specific program can be updated using the
     calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section
     5.7).  Named local parameters are accessible only by the program in which
     they are defined.  Modifying a local parameter affects only the
     associated program and does not affect local parameters with the same name
     that are found in any other fragment program.

     Numbered local parameters are similar to named local parameters, except
     that they are referred to by number and are not declared in fragment
     programs.  Each fragment program object has an array of four-component
     floating-point vectors that can be used by the program.  The number of
     vectors is given by the implementation-dependent constant
     MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
     Numbered local parameters are accessed by a fragment program as members
     of an array called "p".  For example, the instruction

         MOV R0, p[31];

     copies the contents of numbered local parameter 31 into temporary register
     R0.

     Constant and local parameter names can be arbitrary strings consisting of
     letters (upper or lower-case), digits, underscores ("_"), and dollar
     signs ("$").  Keywords defined in the grammar (including instruction
     names) cannot be used as constant or local parameter names, nor can
     strings that start with digits, or strings that specify valid temporary
     register or texture numbers (e.g., "R0"-"R31", "H0"-"H63",
     "TEX0"-"TEX15").  A fragment program will fail to load if a DEFINE or
     DECLARE instruction specifies an invalid constant or local parameter
     name.
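
     A name check along these lines might look like the Python sketch below.
     It is not a complete validator: the set of grammar keywords is left to
     the caller through the hypothetical "keywords" argument, and the function
     name is likewise an assumption.

```python
import re

# Letters, "_" and "$" may start a name; digits may follow.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Names that collide with temporary registers or texture numbers.
_RESERVED = (
    {f"R{i}" for i in range(32)}
    | {f"H{i}" for i in range(64)}
    | {f"TEX{i}" for i in range(16)}
)


def is_valid_parameter_name(name, keywords=frozenset()):
    """Return True if name is usable in a DEFINE or DECLARE instruction."""
    if _IDENTIFIER.fullmatch(name) is None:
        return False
    return name not in _RESERVED and name not in keywords


print(is_valid_parameter_name("fog_color1"))               # True
print(is_valid_parameter_name("1st_color"))                # False
print(is_valid_parameter_name("H63"))                      # False
print(is_valid_parameter_name("ADD", keywords={"ADD"}))    # False
```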

     A fragment program will fail to load if an instruction contains a named
     parameter not specified in a previous DEFINE or DECLARE instruction.  A
     fragment program will also fail to load if a DEFINE or DECLARE instruction
     attempts to re-define a named parameter specified in a previous DEFINE or
     DECLARE instruction.

     The contents of the fragment program parameters may not be modified by a
     fragment program.  In addition, each fragment program instruction can
     normally use at most one unique program parameter.  The only exception to
     this rule is if all program parameter references specify named or embedded
     constants that taken together contain no more than four unique scalar
     values.  For such instructions, the GL will automatically generate an
     equivalent instruction that references a single merged vector constant.
     This merging allows programs to specify instructions like the following:

         Instruction              Equivalent Instruction
         ---------------------    ---------------------------------------
         MAD R0, R1, 2, -1;       MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
         ADD R0, {1,2,3,4}, 4;    ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;

     Before counting the number of unique values, any named constants are first
     converted to the equivalent embedded constants.  When generating a
     combined vector constant, the GL does not perform swizzling, component
     selection, negation, or absolute value operations.  The following
     instructions are invalid, as they contain more than four unique scalar
     values.

         Invalid Instructions
         -----------------------------------
         ADD R0, {1,2,3,4}, -4;
         ADD R0, {1,2,3,4}, |-4|;
         ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};
         ADD R0, {1,2,3,4}, {4,5,6,7}.x;
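
     The merging rule can be modeled with the Python sketch below.  It is an
     illustration only: the ordering of values inside the merged vector and
     the 0.0 padding follow the examples in the table above and are otherwise
     assumptions, as is the function name merge_constants.

```python
COMPONENTS = "xyzw"


def merge_constants(operands):
    """Merge the constant operands of one instruction into a single vector.

    Each operand is a tuple of the scalar values it mentions: one value for a
    scalar constant, four for a fully expanded vector constant.  Operand-level
    negation, absolute value, and swizzles are ignored, as in the text.
    """
    merged = []
    for values in operands:
        for value in values:
            if value not in merged:
                merged.append(value)
    if len(merged) > 4:
        raise ValueError("more than four unique scalar values")
    # One selector per operand, addressing the merged vector.
    selectors = ["".join(COMPONENTS[merged.index(v)] for v in values)
                 for values in operands]
    merged.extend([0.0] * (4 - len(merged)))  # padding value is an assumption
    return tuple(merged), selectors


# MAD R0, R1, 2, -1;
print(merge_constants([(2.0,), (-1.0,)]))
# ADD R0, {1,2,3,4}, 4;
print(merge_constants([(1.0, 2.0, 3.0, 4.0), (4.0,)]))
```

     Passing the operands of "ADD R0, {1,2,3,4}, -4;" raises ValueError,
     because -4 is a fifth unique value.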


     Section 3.11.3, Fragment Program Specification

     Fragment programs are specified as an array of ubytes.  The array is a
     string of ASCII characters encoding the program.  The command
     LoadProgramNV loads a fragment program when the target parameter is
     FRAGMENT_PROGRAM_NV.  The command BindProgramNV enables a fragment program
     for execution.

     At program load time, the program is parsed into a set of tokens possibly
     separated by white space.  Spaces, tabs, newlines, carriage returns, and
     comments are considered whitespace.  Comments begin with the character "#"
     and are terminated by a newline, a carriage return, or the end of the
     program array.  Fragment programs are case-sensitive -- upper and lower
     case letters are treated differently.  The proper choice of case can be
     inferred from the grammar.
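
     The comment rule can be illustrated with a small Python sketch.  It only
     removes comments and splits on whitespace; it does not tokenize according
     to the grammar, and Python's split() also treats a few characters (such
     as form feed) as whitespace that the text above does not list.  The
     function name is hypothetical.

```python
import re

# A comment runs from "#" up to (not including) a newline, a carriage
# return, or the end of the program string.
_COMMENT = re.compile(r"#[^\r\n]*")


def strip_comments(program_text):
    """Remove all comments from a fragment program string."""
    return _COMMENT.sub("", program_text)


source = (
    "!!FP1.0\n"
    "# copy the primary color\n"
    "MOV o[COLR], f[COL0];  # trailing comment\r"
    "END"
)
print(strip_comments(source).split())
# ['!!FP1.0', 'MOV', 'o[COLR],', 'f[COL0];', 'END']
```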

     The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
     sequences for fragment programs.  The set of valid tokens can be inferred
     from the grammar.  The token "" represents an empty string and is used to
     indicate optional rules.  A program is invalid if it contains any
     undefined tokens or characters.

     <program>              ::= <progPrefix> <instructionSequence> "END"

     <progPrefix>           ::= "!!FP1.0"

     <instructionSequence>  ::= <instructionSequence> <instructionStatement>
                              | <instructionStatement>

     <instructionStatement> ::= <instruction> ";"
                              | <constantDefinition> ";"
                              | <localDeclaration> ";"

     <instruction>          ::= <VECTORop-instruction>
                              | <SCALARop-instruction>
                              | <BINSCop-instruction>
                              | <BINop-instruction>
                              | <TRIop-instruction>
                              | <KILop-instruction>
                              | <TEXop-instruction>
                              | <TXDop-instruction>

     <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> ","
                                <vectorSrc>

     <VECTORop>             ::= "DDX"   | "DDX_SAT"
                              | "DDXR"  | "DDXR_SAT"
                              | "DDXH"  | "DDXH_SAT"
                              | "DDXC"  | "DDXC_SAT"
                              | "DDXRC" | "DDXRC_SAT"
                              | "DDXHC" | "DDXHC_SAT"
                              | "DDY"   | "DDY_SAT"
                              | "DDYR"  | "DDYR_SAT"
                              | "DDYH"  | "DDYH_SAT"
                              | "DDYC"  | "DDYC_SAT"
                              | "DDYRC" | "DDYRC_SAT"
                              | "DDYHC" | "DDYHC_SAT"
                              | "FLR"   | "FLR_SAT"
                              | "FLRR"  | "FLRR_SAT"
                              | "FLRH"  | "FLRH_SAT"
                              | "FLRX"  | "FLRX_SAT"
                              | "FLRC"  | "FLRC_SAT"
                              | "FLRRC" | "FLRRC_SAT"
                              | "FLRHC" | "FLRHC_SAT"
                              | "FLRXC" | "FLRXC_SAT"
                              | "FRC"   | "FRC_SAT"
                              | "FRCR"  | "FRCR_SAT"
                              | "FRCH"  | "FRCH_SAT"
                              | "FRCX"  | "FRCX_SAT"
                              | "FRCC"  | "FRCC_SAT"
                              | "FRCRC" | "FRCRC_SAT"
                              | "FRCHC" | "FRCHC_SAT"
                              | "FRCXC" | "FRCXC_SAT"
                              | "LIT"   | "LIT_SAT"
                              | "LITR"  | "LITR_SAT"
                              | "LITH"  | "LITH_SAT"
                              | "LITC"  | "LITC_SAT"
                              | "LITRC" | "LITRC_SAT"
                              | "LITHC" | "LITHC_SAT"
                              | "MOV"   | "MOV_SAT"
                              | "MOVR"  | "MOVR_SAT"
                              | "MOVH"  | "MOVH_SAT"
                              | "MOVX"  | "MOVX_SAT"
                              | "MOVC"  | "MOVC_SAT"
                              | "MOVRC" | "MOVRC_SAT"
                              | "MOVHC" | "MOVHC_SAT"
                              | "MOVXC" | "MOVXC_SAT"
                              | "PK2H"
                              | "PK2US"
                              | "PK4B"
                              | "PK4UB"

     <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> ","
                                <scalarSrc>

     <SCALARop>             ::= "COS"     | "COS_SAT"
                              | "COSR"    | "COSR_SAT"
                              | "COSH"    | "COSH_SAT"
                              | "COSC"    | "COSC_SAT"
                              | "COSRC"   | "COSRC_SAT"
                              | "COSHC"   | "COSHC_SAT"
                              | "EX2"     | "EX2_SAT"
                              | "EX2R"    | "EX2R_SAT"
                              | "EX2H"    | "EX2H_SAT"
                              | "EX2C"    | "EX2C_SAT"
                              | "EX2RC"   | "EX2RC_SAT"
                              | "EX2HC"   | "EX2HC_SAT"
                              | "LG2"     | "LG2_SAT"
                              | "LG2R"    | "LG2R_SAT"
                              | "LG2H"    | "LG2H_SAT"
                              | "LG2C"    | "LG2C_SAT"
                              | "LG2RC"   | "LG2RC_SAT"
                              | "LG2HC"   | "LG2HC_SAT"
                              | "RCP"     | "RCP_SAT"
                              | "RCPR"    | "RCPR_SAT"
                              | "RCPH"    | "RCPH_SAT"
                              | "RCPC"    | "RCPC_SAT"
                              | "RCPRC"   | "RCPRC_SAT"
                              | "RCPHC"   | "RCPHC_SAT"
                              | "RSQ"     | "RSQ_SAT"
                              | "RSQR"    | "RSQR_SAT"
                              | "RSQH"    | "RSQH_SAT"
                              | "RSQC"    | "RSQC_SAT"
                              | "RSQRC"   | "RSQRC_SAT"
                              | "RSQHC"   | "RSQHC_SAT"
                              | "SIN"     | "SIN_SAT"
                              | "SINR"    | "SINR_SAT"
                              | "SINH"    | "SINH_SAT"
                              | "SINC"    | "SINC_SAT"
                              | "SINRC"   | "SINRC_SAT"
                              | "SINHC"   | "SINHC_SAT"
                              | "UP2H"    | "UP2H_SAT"
                              | "UP2HC"   | "UP2HC_SAT"
                              | "UP2US"   | "UP2US_SAT"
                              | "UP2USC"  | "UP2USC_SAT"
                              | "UP4B"    | "UP4B_SAT"
                              | "UP4BC"   | "UP4BC_SAT"
                              | "UP4UB"   | "UP4UB_SAT"
                              | "UP4UBC"  | "UP4UBC_SAT"

     <BINSCop-instruction>  ::= <BINSCop> <maskedDstReg> ","
                                <scalarSrc> "," <scalarSrc>

     <BINSCop>              ::= "POW"   | "POW_SAT"
                              | "POWR"  | "POWR_SAT"
                              | "POWH"  | "POWH_SAT"
                              | "POWC"  | "POWC_SAT"
                              | "POWRC" | "POWRC_SAT"
                              | "POWHC" | "POWHC_SAT"

     <BINop-instruction>    ::= <BINop> <maskedDstReg> ","
                                <vectorSrc> "," <vectorSrc>

     <BINop>                ::= "ADD"   | "ADD_SAT"
                              | "ADDR"  | "ADDR_SAT"
                              | "ADDH"  | "ADDH_SAT"
                              | "ADDX"  | "ADDX_SAT"
                              | "ADDC"  | "ADDC_SAT"
                              | "ADDRC" | "ADDRC_SAT"
                              | "ADDHC" | "ADDHC_SAT"
                              | "ADDXC" | "ADDXC_SAT"
                              | "DP3"   | "DP3_SAT"
                              | "DP3R"  | "DP3R_SAT"
                              | "DP3H"  | "DP3H_SAT"
                              | "DP3X"  | "DP3X_SAT"
                              | "DP3C"  | "DP3C_SAT"
                              | "DP3RC" | "DP3RC_SAT"
                              | "DP3HC" | "DP3HC_SAT"
                              | "DP3XC" | "DP3XC_SAT"
                              | "DP4"   | "DP4_SAT"
                              | "DP4R"  | "DP4R_SAT"
                              | "DP4H"  | "DP4H_SAT"
                              | "DP4X"  | "DP4X_SAT"
                              | "DP4C"  | "DP4C_SAT"
                              | "DP4RC" | "DP4RC_SAT"
                              | "DP4HC" | "DP4HC_SAT"
                              | "DP4XC" | "DP4XC_SAT"
                              | "DST"   | "DST_SAT"
                              | "DSTR"  | "DSTR_SAT"
                              | "DSTH"  | "DSTH_SAT"
                              | "DSTC"  | "DSTC_SAT"
                              | "DSTRC" | "DSTRC_SAT"
                              | "DSTHC" | "DSTHC_SAT"
                              | "MAX"   | "MAX_SAT"
                              | "MAXR"  | "MAXR_SAT"
                              | "MAXH"  | "MAXH_SAT"
                              | "MAXX"  | "MAXX_SAT"
                              | "MAXC"  | "MAXC_SAT"
                              | "MAXRC" | "MAXRC_SAT"
                              | "MAXHC" | "MAXHC_SAT"
                              | "MAXXC" | "MAXXC_SAT"
                              | "MIN"   | "MIN_SAT"
                              | "MINR"  | "MINR_SAT"
                              | "MINH"  | "MINH_SAT"
                              | "MINX"  | "MINX_SAT"
                              | "MINC"  | "MINC_SAT"
                              | "MINRC" | "MINRC_SAT"
                              | "MINHC" | "MINHC_SAT"
                              | "MINXC" | "MINXC_SAT"
                              | "MUL"   | "MUL_SAT"
                              | "MULR"  | "MULR_SAT"
                              | "MULH"  | "MULH_SAT"
                              | "MULX"  | "MULX_SAT"
                              | "MULC"  | "MULC_SAT"
                              | "MULRC" | "MULRC_SAT"
                              | "MULHC" | "MULHC_SAT"
                              | "MULXC" | "MULXC_SAT"
                              | "RFL"   | "RFL_SAT"
                              | "RFLR"  | "RFLR_SAT"
                              | "RFLH"  | "RFLH_SAT"
                              | "RFLC"  | "RFLC_SAT"
                              | "RFLRC" | "RFLRC_SAT"
                              | "RFLHC" | "RFLHC_SAT"
                              | "SEQ"   | "SEQ_SAT"
                              | "SEQR"  | "SEQR_SAT"
                              | "SEQH"  | "SEQH_SAT"
                              | "SEQX"  | "SEQX_SAT"
                              | "SEQC"  | "SEQC_SAT"
                              | "SEQRC" | "SEQRC_SAT"
                              | "SEQHC" | "SEQHC_SAT"
                              | "SEQXC" | "SEQXC_SAT"
                              | "SFL"   | "SFL_SAT"
                              | "SFLR"  | "SFLR_SAT"
                              | "SFLH"  | "SFLH_SAT"
                              | "SFLX"  | "SFLX_SAT"
                              | "SFLC"  | "SFLC_SAT"
                              | "SFLRC" | "SFLRC_SAT"
                              | "SFLHC" | "SFLHC_SAT"
                              | "SFLXC" | "SFLXC_SAT"
                              | "SGE"   | "SGE_SAT"
                              | "SGER"  | "SGER_SAT"
                              | "SGEH"  | "SGEH_SAT"
                              | "SGEX"  | "SGEX_SAT"
                              | "SGEC"  | "SGEC_SAT"
                              | "SGERC" | "SGERC_SAT"
                              | "SGEHC" | "SGEHC_SAT"
                              | "SGEXC" | "SGEXC_SAT"
                              | "SGT"   | "SGT_SAT"
                              | "SGTR"  | "SGTR_SAT"
                              | "SGTH"  | "SGTH_SAT"
                              | "SGTX"  | "SGTX_SAT"
                              | "SGTC"  | "SGTC_SAT"
                              | "SGTRC" | "SGTRC_SAT"
                              | "SGTHC" | "SGTHC_SAT"
                              | "SGTXC" | "SGTXC_SAT"
                              | "SLE"   | "SLE_SAT"
                              | "SLER"  | "SLER_SAT"
                              | "SLEH"  | "SLEH_SAT"
                              | "SLEX"  | "SLEX_SAT"
                              | "SLEC"  | "SLEC_SAT"
                              | "SLERC" | "SLERC_SAT"
                              | "SLEHC" | "SLEHC_SAT"
                              | "SLEXC" | "SLEXC_SAT"
                              | "SLT"   | "SLT_SAT"
                              | "SLTR"  | "SLTR_SAT"
                              | "SLTH"  | "SLTH_SAT"
                              | "SLTX"  | "SLTX_SAT"
                              | "SLTC"  | "SLTC_SAT"
                              | "SLTRC" | "SLTRC_SAT"
                              | "SLTHC" | "SLTHC_SAT"
                              | "SLTXC" | "SLTXC_SAT"
                              | "SNE"   | "SNE_SAT"
                              | "SNER"  | "SNER_SAT"
                              | "SNEH"  | "SNEH_SAT"
                              | "SNEX"  | "SNEX_SAT"
                              | "SNEC"  | "SNEC_SAT"
                              | "SNERC" | "SNERC_SAT"
                              | "SNEHC" | "SNEHC_SAT"
                              | "SNEXC" | "SNEXC_SAT"
                              | "STR"   | "STR_SAT"
                              | "STRR"  | "STRR_SAT"
                              | "STRH"  | "STRH_SAT"
                              | "STRX"  | "STRX_SAT"
                              | "STRC"  | "STRC_SAT"
                              | "STRRC" | "STRRC_SAT"
                              | "STRHC" | "STRHC_SAT"
                              | "STRXC" | "STRXC_SAT"
                              | "SUB"   | "SUB_SAT"
                              | "SUBR"  | "SUBR_SAT"
                              | "SUBH"  | "SUBH_SAT"
                              | "SUBX"  | "SUBX_SAT"
                              | "SUBC"  | "SUBC_SAT"
                              | "SUBRC" | "SUBRC_SAT"
                              | "SUBHC" | "SUBHC_SAT"
                              | "SUBXC" | "SUBXC_SAT"

     <TRIop-instruction>    ::= <TRIop> <maskedDstReg> ","
                                <vectorSrc> "," <vectorSrc> ","
                                <vectorSrc>

     <TRIop>                ::= "MAD"   | "MAD_SAT"
                              | "MADR"  | "MADR_SAT"
                              | "MADH"  | "MADH_SAT"
                              | "MADX"  | "MADX_SAT"
                              | "MADC"  | "MADC_SAT"
                              | "MADRC" | "MADRC_SAT"
                              | "MADHC" | "MADHC_SAT"
                              | "MADXC" | "MADXC_SAT"
                              | "LRP"   | "LRP_SAT"
                              | "LRPR"  | "LRPR_SAT"
                              | "LRPH"  | "LRPH_SAT"
                              | "LRPX"  | "LRPX_SAT"
                              | "LRPC"  | "LRPC_SAT"
                              | "LRPRC" | "LRPRC_SAT"
                              | "LRPHC" | "LRPHC_SAT"
                              | "LRPXC" | "LRPXC_SAT"
                              | "X2D"   | "X2D_SAT"
                              | "X2DR"  | "X2DR_SAT"
                              | "X2DH"  | "X2DH_SAT"
                              | "X2DC"  | "X2DC_SAT"
                              | "X2DRC" | "X2DRC_SAT"
                              | "X2DHC" | "X2DHC_SAT"

     <KILop-instruction>    ::= <KILop> <ccMask>

     <KILop>                ::= "KIL"

     <TEXop-instruction>    ::= <TEXop> <maskedDstReg> ","
                                <vectorSrc> "," <texImageId>

     <TEXop>                ::= "TEX"  | "TEX_SAT"
                              | "TEXC" | "TEXC_SAT"
                              | "TXP"  | "TXP_SAT"
                              | "TXPC" | "TXPC_SAT"

     <TXDop-instruction>    ::= <TXDop> <maskedDstReg> ","
                                <vectorSrc> "," <vectorSrc> ","
                                <vectorSrc> "," <texImageId>

     <TXDop>                ::= "TXD"  | "TXD_SAT"
                              | "TXDC" | "TXDC_SAT"

     <scalarSrc>            ::= <absScalarSrc>
                              | <baseScalarSrc>

     <absScalarSrc>         ::= <negate> "|" <baseScalarSrc> "|"

     <baseScalarSrc>        ::= <signedScalarConstant>
                              | <negate> <namedScalarConstant>
                              | <negate> <vectorConstant> <scalarSuffix>
                              | <negate> <namedLocalParameter> <scalarSuffix>
                              | <negate> <numberedLocal> <scalarSuffix>
                              | <negate> <srcRegister> <scalarSuffix>

     <vectorSrc>            ::= <absVectorSrc>
                              | <baseVectorSrc>

     <absVectorSrc>         ::= <negate> "|" <baseVectorSrc> "|"

     <baseVectorSrc>        ::= <signedScalarConstant>
                              | <negate> <namedScalarConstant>
                              | <negate> <vectorConstant> <scalarSuffix>
                              | <negate> <vectorConstant> <swizzleSuffix>
                              | <negate> <namedLocalParameter> <scalarSuffix>
                              | <negate> <namedLocalParameter> <swizzleSuffix>
                              | <negate> <numberedLocal> <scalarSuffix>
                              | <negate> <numberedLocal> <swizzleSuffix>
                              | <negate> <srcRegister> <scalarSuffix>
                              | <negate> <srcRegister> <swizzleSuffix>

     <maskedDstReg>         ::= <dstRegister> <optionalWriteMask>
                                <optionalCCMask>

     <dstRegister>          ::= <fragTempReg>
                              | <fragOutputReg>
                              | "RC"
                              | "HC"

     <optionalCCMask>       ::= "(" <ccMask> ")"
                              | ""

     <ccMask>               ::= <ccMaskRule> <swizzleSuffix>
                              | <ccMaskRule> <scalarSuffix>

     <ccMaskRule>           ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" |
                                "TR" | "FL"

     <optionalWriteMask>    ::= ""
                              | "." "x"
                              | "."     "y"
                              | "." "x" "y"
                              | "."         "z"
                              | "." "x"     "z"
                              | "."     "y" "z"
                              | "." "x" "y" "z"
                              | "."             "w"
                              | "." "x"         "w"
                              | "."     "y"     "w"
                              | "." "x" "y"     "w"
                              | "."         "z" "w"
                              | "." "x"     "z" "w"
                              | "."     "y" "z" "w"
                              | "." "x" "y" "z" "w"

     <srcRegister>          ::= <fragAttribReg>
                              | <fragTempReg>

     <fragAttribReg>        ::= "f" "[" <fragAttribRegId> "]"

     <fragAttribRegId>      ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0"
                              | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5"
                              | "TEX6" | "TEX7"

     <fragTempReg>          ::= <fragF32Reg>
                              | <fragF16Reg>

     <fragF32Reg>           ::= "R0"  | "R1"  | "R2"  | "R3"
                              | "R4"  | "R5"  | "R6"  | "R7"
                              | "R8"  | "R9"  | "R10" | "R11"
                              | "R12" | "R13" | "R14" | "R15"
                              | "R16" | "R17" | "R18" | "R19"
                              | "R20" | "R21" | "R22" | "R23"
                              | "R24" | "R25" | "R26" | "R27"
                              | "R28" | "R29" | "R30" | "R31"

     <fragF16Reg>           ::= "H0"  | "H1"  | "H2"  | "H3"
                              | "H4"  | "H5"  | "H6"  | "H7"
                              | "H8"  | "H9"  | "H10" | "H11"
                              | "H12" | "H13" | "H14" | "H15"
                              | "H16" | "H17" | "H18" | "H19"
                              | "H20" | "H21" | "H22" | "H23"
                              | "H24" | "H25" | "H26" | "H27"
                              | "H28" | "H29" | "H30" | "H31"
                              | "H32" | "H33" | "H34" | "H35"
                              | "H36" | "H37" | "H38" | "H39"
                              | "H40" | "H41" | "H42" | "H43"
                              | "H44" | "H45" | "H46" | "H47"
                              | "H48" | "H49" | "H50" | "H51"
                              | "H52" | "H53" | "H54" | "H55"
                              | "H56" | "H57" | "H58" | "H59"
                              | "H60" | "H61" | "H62" | "H63"

     <fragOutputReg>        ::= "o" "[" <fragOutputRegName> "]"

     <fragOutputRegName>    ::= "COLR" | "COLH" | "DEPR"

     <numberedLocal>        ::= "p" "[" <localNumber> "]"

     <localNumber>          ::= <integer> from 0 to
                                MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1

     <scalarSuffix>         ::= "." <component>

     <swizzleSuffix>        ::= ""
                              | "." <component> <component>
                                    <component> <component>

     <component>            ::= "x" | "y" | "z" | "w"

     <texImageId>           ::= <texImageUnit> "," <texImageTarget>

     <texImageUnit>         ::= "TEX0"  | "TEX1"  | "TEX2"  | "TEX3"
                              | "TEX4"  | "TEX5"  | "TEX6"  | "TEX7"
                              | "TEX8"  | "TEX9"  | "TEX10" | "TEX11"
                              | "TEX12" | "TEX13" | "TEX14" | "TEX15"

     <texImageTarget>       ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"

     <constantDefinition>   ::= "DEFINE" <namedVectorConstant> "="
                                <vectorConstant>
                              | "DEFINE" <namedScalarConstant> "="
                                <scalarConstant>

     <localDeclaration>     ::= "DECLARE" <namedLocalParameter>
                                <optionalLocalValue>

     <optionalLocalValue>   ::= ""
                              | "=" <vectorConstant>
                              | "=" <scalarConstant>

     <vectorConstant>       ::= "{" <vectorConstantList> "}"
                              | <namedVectorConstant>

     <vectorConstantList>   ::= <scalarConstant>
                              | <scalarConstant> "," <scalarConstant>
                              | <scalarConstant> "," <scalarConstant> ","
                                <scalarConstant>
                              | <scalarConstant> "," <scalarConstant> ","
                                <scalarConstant> "," <scalarConstant>

     <scalarConstant>       ::= <signedScalarConstant>
                              | <namedScalarConstant>

     <signedScalarConstant> ::= <optionalSign> <floatConstant>

     <namedScalarConstant>  ::= <identifier>    ((name of a scalar constant
                                                  in a DEFINE instruction))

     <namedVectorConstant>  ::= <identifier>    ((name of a vector constant
                                                  in a DEFINE instruction))

     <namedLocalParameter>  ::= <identifier>    ((name of a local parameter
                                                  in a DECLARE instruction))

     <negate>               ::= "-" | "+" | ""

     <optionalSign>         ::= "-" | "+" | ""

     <identifier>           ::= see text below

     <floatConstant>        ::= see text below


     The <identifier> rule matches a sequence of one or more letters ("A"
     through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
     the first character must be a letter.  The underscore ("_") and dollar
     sign ("$") count as letters.  Upper and lower case letters are different
     (names are case-sensitive).

     The <floatConstant> rule matches a floating-point constant consisting
     of an integer part, a decimal point, a fraction part, an "e" or
     "E", and an optionally signed integer exponent.  The integer and
     fraction parts both consist of a sequence of one or more digits ("0"
     through "9").  Either the integer part or the fraction part (not
     both) may be missing; either the decimal point or the "e" (or "E")
     and the exponent (not both) may be missing.
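
     Read literally, the <floatConstant> rule corresponds to the regular
     expression in the Python sketch below.  Note that the literal rule rejects
     a plain integer such as "2", although examples elsewhere in this
     specification (e.g., "{2}") use that form; the allow_plain_integer flag
     is an assumption added here to cover that case and is not taken from the
     text.

```python
import re

# Either a decimal point with digits on at least one side and an optional
# exponent, or digits followed by a mandatory exponent.
_FLOAT = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+[eE][+-]?[0-9]+"
)
_INTEGER = re.compile(r"[0-9]+")


def is_float_constant(text, allow_plain_integer=False):
    """Check text against the <floatConstant> rule (the sign is separate)."""
    if _FLOAT.fullmatch(text) is not None:
        return True
    return allow_plain_integer and _INTEGER.fullmatch(text) is not None


for sample in ["3.1415926535", "1.", ".5", "1E+06", "2", ".", "1e"]:
    print(sample, is_float_constant(sample))
```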

     A fragment program fails to load if it contains more than the maximum
     number of executable instructions.  If ARB_fragment_program is supported,
     this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the
     FRAGMENT_PROGRAM_ARB target.  Otherwise, the limit is 1024.  Executable
     instructions are those matching the <instruction> rule in the grammar, and
     do not include DEFINE or DECLARE instructions.

     A fragment program fails to load if its total temporary and output
     register count exceeds 64.  Each fp32 temporary or output register used by
     the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each
     fp16 temporary or output register used by the program (H0-H63 and o[COLH])
     counts as a single register.
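
     The register accounting can be sketched in Python as follows.  The
     function names and the representation of registers as strings are
     assumptions for illustration; a real implementation would work on the
     parsed program.

```python
FP32_REGISTERS = {f"R{i}" for i in range(32)} | {"o[COLR]", "o[DEPR]"}
FP16_REGISTERS = {f"H{i}" for i in range(64)} | {"o[COLH]"}
REGISTER_BUDGET = 64


def register_count(used):
    """Return the weighted count of the distinct registers a program uses."""
    total = 0
    for name in set(used):
        if name in FP32_REGISTERS:
            total += 2  # fp32 temporaries and outputs count twice
        elif name in FP16_REGISTERS:
            total += 1  # fp16 temporaries and outputs count once
        else:
            raise ValueError(f"not a temporary or output register: {name}")
    return total


def fits_register_budget(used):
    """Return True if the program stays within the 64-register limit."""
    return register_count(used) <= REGISTER_BUDGET


print(register_count(["R0", "H0", "o[COLR]"]))              # 5
print(fits_register_budget([f"R{i}" for i in range(32)]))   # True
```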

     A fragment program fails to load if any instruction sources more than one
     unique fragment attribute register.  Instructions sourcing the same
     attribute register multiple times are acceptable.

     A fragment program fails to load if any instruction sources more than one
     unique program parameter register.  Instructions sourcing the same program
     parameter multiple times are acceptable.

     A fragment program fails to load if multiple texture lookup instructions
     reference different targets for the same texture image unit.
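
     One way to picture this check is the Python sketch below, which takes
     the (texture image unit, target) pairs of all texture lookup instructions
     in a program.  The function name and input representation are
     hypothetical.

```python
def texture_targets_consistent(lookups):
    """Return True if every texture image unit is used with a single target."""
    seen = {}
    for unit, target in lookups:
        # Remember the first target seen for each unit; any later lookup
        # on that unit must name the same target.
        if seen.setdefault(unit, target) != target:
            return False
    return True


print(texture_targets_consistent([("TEX0", "2D"), ("TEX1", "CUBE"), ("TEX0", "2D")]))
print(texture_targets_consistent([("TEX0", "2D"), ("TEX0", "3D")]))
```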

     A fragment program fails to load if it writes to both the o[COLR] and
     o[COLH] output registers.

     The error INVALID_OPERATION is generated by LoadProgramNV if a fragment
     program fails to load because it is not syntactically correct or for one
     of the semantic restrictions listed above.

     The error INVALID_OPERATION is generated by LoadProgramNV if a program is
     loaded for id when id is currently loaded with a program of a different
     target.

     A successfully loaded fragment program is parsed into a sequence of
     instructions.  Each instruction is identified by its tokenized name.  The
     operation of these instructions when executed is defined in Sections
     3.11.4 and 3.11.5.


     Section 3.11.4, Fragment Program Operation

     There are forty-five fragment program instructions.  Fragment program
     instructions may have up to sixteen variants, including a suffix of "R",
     "H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix
     of "C" to allow an update of the condition code register (section
     3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to
     the range [0,1] (section 3.11.4.4).  For example, the sixteen forms of the
     "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
     "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
     "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".

     Some mathematical instructions that support precision suffixes, typically
     those that involve complicated floating-point computations, do not support
     the "X" precision suffix.
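
     As an illustration only, the suffix combinations described above can be
     enumerated mechanically.  The helper name instruction_forms is
     hypothetical and not part of this extension; the sketch simply
     concatenates the base name, an optional precision suffix, an optional
     "C", and an optional "_SAT".

```python
from itertools import product

def instruction_forms(base, precisions=("", "R", "H", "X")):
    """List opcode spellings: base + precision + optional C + optional _SAT."""
    return [base + p + c + s
            for s, c, p in product(("", "_SAT"), ("", "C"), precisions)]

forms = instruction_forms("ADD")
print(len(forms))   # 16
print(forms[:4])    # ['ADD', 'ADDR', 'ADDH', 'ADDX']

# An instruction without the "X" suffix, such as COS, has fewer forms.
print(len(instruction_forms("COS", ("", "R", "H"))))   # 12
```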

     The fragment program instructions and their respective input and output
     parameters are summarized in Table X.4.

       Instruction          Inputs  Output   Description
       -----------------    ------  ------   --------------------------------
       ADD[RHX][C][_SAT]    v,v     v        add
       COS[RH ][C][_SAT]    s       ssss     cosine
       DDX[RH ][C][_SAT]    v       v        derivative relative to x
       DDY[RH ][C][_SAT]    v       v        derivative relative to y
       DP3[RHX][C][_SAT]    v,v     ssss     3-component dot product
       DP4[RHX][C][_SAT]    v,v     ssss     4-component dot product
       DST[RH ][C][_SAT]    v,v     v        distance vector
       EX2[RH ][C][_SAT]    s       ssss     exponential base 2
       FLR[RHX][C][_SAT]    v       v        floor
       FRC[RHX][C][_SAT]    v       v        fraction
       KIL                  none    none     conditionally discard fragment
       LG2[RH ][C][_SAT]    s       ssss     logarithm base 2
       LIT[RH ][C][_SAT]    v       v        compute light coefficients
       LRP[RHX][C][_SAT]    v,v,v   v        linear interpolation
       MAD[RHX][C][_SAT]    v,v,v   v        multiply and add
       MAX[RHX][C][_SAT]    v,v     v        maximum
       MIN[RHX][C][_SAT]    v,v     v        minimum
       MOV[RHX][C][_SAT]    v       v        move
       MUL[RHX][C][_SAT]    v,v     v        multiply
       PK2H                 v       ssss     pack two 16-bit floats
       PK2US                v       ssss     pack two unsigned 16-bit scalars
       PK4B                 v       ssss     pack four signed 8-bit scalars
       PK4UB                v       ssss     pack four unsigned 8-bit scalars
       POW[RH ][C][_SAT]    s,s     ssss     exponentiation (x^y)
       RCP[RH ][C][_SAT]    s       ssss     reciprocal
       RFL[RH ][C][_SAT]    v,v     v        reflection vector
       RSQ[RH ][C][_SAT]    s       ssss     reciprocal square root
       SEQ[RHX][C][_SAT]    v,v     v        set on equal
       SFL[RHX][C][_SAT]    v,v     v        set on false
       SGE[RHX][C][_SAT]    v,v     v        set on greater than or equal
       SGT[RHX][C][_SAT]    v,v     v        set on greater than
       SIN[RH ][C][_SAT]    s       ssss     sine
       SLE[RHX][C][_SAT]    v,v     v        set on less than or equal
       SLT[RHX][C][_SAT]    v,v     v        set on less than
       SNE[RHX][C][_SAT]    v,v     v        set on not equal
       STR[RHX][C][_SAT]    v,v     v        set on true
       SUB[RHX][C][_SAT]    v,v     v        subtract
       TEX[C][_SAT]         v       v        texture lookup
       TXD[C][_SAT]         v,v,v   v        texture lookup w/partials
       TXP[C][_SAT]         v       v        projective texture lookup
       UP2H[C][_SAT]        s       v        unpack two 16-bit floats
       UP2US[C][_SAT]       s       v        unpack two unsigned 16-bit scalars
       UP4B[C][_SAT]        s       v        unpack four signed 8-bit scalars
       UP4UB[C][_SAT]       s       v        unpack four unsigned 8-bit scalars
       X2D[RH ][C][_SAT]    v,v,v   v        2D coordinate transformation

     Table X.4:  Summary of fragment program instructions.  "[RHX]" indicates
     an optional arithmetic precision suffix.  "[C]" indicates an optional
     condition code update suffix.  "[_SAT]" indicates an optional clamp of
     result vector components to [0,1].  "v" indicates a 4-component vector
     input or output, "s" indicates a scalar input, and "ssss" indicates a
     scalar output replicated across a 4-component vector.


     Section 3.11.4.1:  Fragment Program Storage Precision

     Registers in fragment programs are stored in two different representations:
     16-bit floating-point (fp16) and 32-bit floating-point (fp32).  There is
     an additional 12-bit fixed-point representation (fx12) used only as an
     internal representation for instructions with the "X" precision qualifier.

     In the 32-bit float (fp32) representation, each component is represented
     in floating-point with eight exponent and twenty-three mantissa bits, as
     in the standard IEEE single-precision format.  If S represents the sign (0
     or 1), E represents the exponent in the range [0,255], and M represents
     the mantissa in the range [0,2^23-1], then an fp32 float is decoded as:

        (-1)^S * 0.0,                           if E == 0,
        (-1)^S * 2^(E-127) * (1 + M/2^23),      if 0 < E < 255,
        (-1)^S * INF,                           if E == 255 and M == 0,
        NaN,                                    if E == 255 and M != 0.

     INF (Infinity) is a special representation indicating numerical overflow.
     NaN (Not a Number) is a special representation indicating the result of
     illegal arithmetic operations, such as computing the square root or
     logarithm of a negative number.  Note that all normal fp32 values, zero,
     and INF have an associated sign.  -0.0 and +0.0 are considered equivalent
     for the purposes of comparisons.

     This representation is identical to the IEEE single-precision
     floating-point standard, except that no special representation is provided
     for denorms -- numbers in the range (-2^-126, +2^-126).  All such numbers
     are flushed to zero.
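
     The fp32 decoding rules can be sketched as follows.  This is an
     illustrative model, not driver code; the function name decode_fp32 is
     an assumption, and the input is taken to be the raw 32-bit pattern as
     an integer.

```python
import math

def decode_fp32(bits):
    """Decode a 32-bit pattern using the rules above (denorms flush to zero)."""
    s = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 0:
        return sign * 0.0          # zero and denorm encodings both give +/-0.0
    if e == 255:
        return sign * math.inf if m == 0 else math.nan
    return sign * 2.0 ** (e - 127) * (1.0 + m / 2.0 ** 23)

print(decode_fp32(0x3F800000))   # 1.0
print(decode_fp32(0x00000001))   # 0.0 (an IEEE denorm, flushed to zero)
```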

     In a 16-bit float (fp16) register, each component is represented
     similarly, except with only five exponent and ten mantissa bits.  If S
     represents the sign (0 or 1), E represents the exponent in the range
     [0,31], and M represents the mantissa in the range [0,2^10-1], then an
     fp16 float is decoded as:

        (-1)^S * 0.0,                           if E == 0 and M == 0,
        (-1)^S * 2^-14 * M/2^10,                if E == 0 and M != 0,
        (-1)^S * 2^(E-15) * (1 + M/2^10),       if 0 < E < 31,
        (-1)^S * INF,                           if E == 31 and M == 0, or
        NaN,                                    if E == 31 and M != 0.

     One important difference is that the fp16 representation, unlike fp32,
     supports denorms to maximize the limited precision of the 16-bit floating
     point encodings.
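
     A matching sketch for fp16, again illustrative only (decode_fp16 is a
     hypothetical name), shows that denorm encodings decode to small
     non-zero values instead of being flushed.

```python
import math

def decode_fp16(bits):
    """Decode a 16-bit pattern using the fp16 rules above (denorms kept)."""
    s = (bits >> 15) & 0x1
    e = (bits >> 10) & 0x1F
    m = bits & 0x3FF
    sign = -1.0 if s else 1.0
    if e == 0:
        # Covers both zero (m == 0) and denorms (m != 0).
        return sign * 2.0 ** -14 * (m / 1024.0)
    if e == 31:
        return sign * math.inf if m == 0 else math.nan
    return sign * 2.0 ** (e - 15) * (1.0 + m / 1024.0)

print(decode_fp16(0x3C00))   # 1.0
print(decode_fp16(0x7BFF))   # 65504.0, the largest finite fp16 value
print(decode_fp16(0x0001))   # 5.960464477539063e-08, the smallest denorm
```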

     In the 12-bit fixed-point (fx12) format, numbers are represented as signed
     12-bit two's complement integers with 10 fraction bits.  The range of
     representable values is [-2048/1024, +2047/1024].

     Section 3.11.4.2:  Fragment Program Operation Precision

     Fragment program instructions frequently perform mathematical operations.
     Such operations may be performed at one of three different precisions.
     Fragment programs can specify the precision of each instruction by using
     the precision suffix.  If an instruction has a suffix of "R", calculations
     are carried out with 32-bit floating point operands and results.  If an
     instruction has a suffix of "H", calculations are carried out using 16-bit
     floating point operands and results.  If an instruction has a suffix of
     "X", calculations are carried out using 12-bit fixed point operands and
     results.  For example, the instruction "MULR" performs a 32-bit
     floating-point multiply, "MULH" performs a 16-bit floating-point multiply,
     and "MULX" performs a 12-bit fixed-point multiply.  If no precision suffix
     is specified, calculations are carried out using the precision of the
     temporary register receiving the result.

     Fragment program instructions may source registers or constants whose
     precisions differ from the precision specified with the instruction.
     Instructions may also generate intermediate results with a different
     precision than that of the destination register.  In these cases, the
     values sourced are converted to the precision specified by the
     instruction.

     When converting to fx12 format, -INF and any values less than -2048/1024
     become -2048/1024.  +INF, and any values greater than +2047/1024 become
     +2047/1024.  NaN becomes 0.
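
     The fx12 conversion rules can be modelled as below.  The clamping and
     NaN handling follow the text; the rounding of in-range values to the
     nearest 1/1024 step is an assumption of this sketch, since the text
     does not specify a rounding mode.  The name to_fx12 is hypothetical.

```python
import math

FX12_MIN = -2048   # raw integer for -2048/1024
FX12_MAX = 2047    # raw integer for +2047/1024

def to_fx12(value):
    """Convert a float to fx12 and return the value actually represented."""
    if math.isnan(value):
        return 0.0
    if value <= FX12_MIN / 1024.0:      # also covers -INF
        return FX12_MIN / 1024.0
    if value >= FX12_MAX / 1024.0:      # also covers +INF
        return FX12_MAX / 1024.0
    # Assumption: round to the nearest representable step.
    return round(value * 1024.0) / 1024.0

print(to_fx12(0.5))        # 0.5
print(to_fx12(3.0))        # 1.9990234375, i.e. 2047/1024
print(to_fx12(-math.inf))  # -2.0
```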

     When converting to fp16 format, any values less than or equal to -2^16 are
     converted to -INF.  Any values greater than or equal to +2^16 are
     converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
     other values that are not exactly representable in fp16 format are
     converted to one of the two nearest representable values.

     When converting to fp32 format, any values less than or equal to -2^128
     are converted to -INF.  Any values greater than or equal to +2^128 are
     converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
     other values that are not exactly representable in fp32 format are
     converted to one of the two nearest representable values.

     Fragment program instructions using the fragment attribute registers
     f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32
     precision, regardless of the precision specified by the instruction.

     Section 3.11.4.3:  Fragment Program Operands

     Except for KIL, fragment program instructions operate on either vector or
     scalar operands, indicated in the grammar (see section 3.11.3) by the
     rules <vectorSrc> and <scalarSrc> respectively.

     The basic set of scalar operands is defined by the grammar rule
     <baseScalarSrc>.  Scalar operands can be scalar constants (embedded or
     named), or single components of vector constants, local parameters, or
     registers allowed by the <srcRegister> rule.  A vector component is
     selected by the <scalarSuffix> rule, where the characters "x", "y", "z",
     and "w" select the x, y, z, and w components, respectively, of the vector.

     The basic set of vector operands is defined by the grammar rule
     <baseVectorSrc>.  Vector operands can include vector constants, local
     parameters, or registers allowed by the <srcRegister> rule.

     Basic vector operands can be swizzled according to the <swizzleSuffix>
     rule.  In its most general form, the <swizzleSuffix> rule matches the
     pattern ".????" where each question mark is one of "x", "y", "z", or "w".
     For such patterns, the x, y, z, and w components of the operand are taken
     from the vector components named by the first, second, third, and fourth
     character of the pattern, respectively.  For example, if the swizzle
     suffix is ".yzzx" and the specified source contains {2,8,9,0}, the
     swizzled operand used by the instruction is {8,9,9,2}.  If the
     <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".

     Operands can optionally be negated according to the <negate> rule in
     <baseScalarSrc> or <baseVectorSrc>.  If the <negate> matches "-", each
     value is negated.

     The absolute value of operands can be taken if the <vectorSrc> or
     <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>.  In this case,
     the absolute value of each component is taken.  In addition, if the
     <negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result
     is then negated.

     Instructions requiring vector operands can also use scalar operands in the
     case where the <vectorSrc> rule matches <scalarSrc>.  In such cases, a
     4-component vector is produced by replicating the scalar.

     After operands are loaded, they are converted to a data type corresponding
     to the operation precision specified in the fragment program instruction.

     The following pseudo-code spells out the operand generation process.
     "SrcT" and "InstT" refer to the data types of the specified register or
     constant and the instruction, respectively.  "VecSrcT" and "VecInstT"
     refer to 4-component vectors of the corresponding type.  "absolute" is
     TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,
     and FALSE otherwise.  "negateBase" is TRUE if the <negate> rule in
     <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
     "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or
     <absVectorSrc> matches "-" and FALSE otherwise.  The ".c***", ".*c**",
     ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained
     by the swizzle operation.  TypeConvert() is assumed to convert a scalar of
     type SrcT to a scalar of type InstT using the type conversion process
     specified above.

       VecInstT VectorLoad(VecSrcT source)
       {
           VecSrcT srcVal;
           VecInstT convertedVal;

           srcVal.x = source.c***;
           srcVal.y = source.*c**;
           srcVal.z = source.**c*;
           srcVal.w = source.***c;
           if (negateBase) {
              srcVal.x = -srcVal.x;
              srcVal.y = -srcVal.y;
              srcVal.z = -srcVal.z;
              srcVal.w = -srcVal.w;
           }
           if (absolute) {
              srcVal.x = abs(srcVal.x);
              srcVal.y = abs(srcVal.y);
              srcVal.z = abs(srcVal.z);
              srcVal.w = abs(srcVal.w);
           }
           if (negateAbs) {
              srcVal.x = -srcVal.x;
              srcVal.y = -srcVal.y;
              srcVal.z = -srcVal.z;
              srcVal.w = -srcVal.w;
           }

           convertedVal.x = TypeConvert(srcVal.x);
           convertedVal.y = TypeConvert(srcVal.y);
           convertedVal.z = TypeConvert(srcVal.z);
           convertedVal.w = TypeConvert(srcVal.w);
           return convertedVal;
       }

       InstT ScalarLoad(VecSrcT source)
       {
           SrcT srcVal;
           InstT convertedVal;

           srcVal = source.c***;
           if (negateBase) {
             srcVal = -srcVal;
           }
           if (absolute) {
              srcVal = abs(srcVal);
           }
           if (negateAbs) {
             srcVal = -srcVal;
           }

           convertedVal = TypeConvert(srcVal);
           return convertedVal;
       }
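
     For experimentation, the operand generation steps can be written as a
     small runnable model.  The function name load_operand and its keyword
     arguments are assumptions of this sketch.  It handles only the general
     four-letter swizzle form and omits the TypeConvert() step.

```python
def load_operand(vec, suffix="", negate_base=False, absolute=False,
                 negate_abs=False):
    """Swizzle, negate, take the absolute value, then negate again."""
    pattern = suffix.lstrip(".") or "xyzw"      # "" behaves like ".xyzw"
    if len(pattern) != 4 or any(c not in "xyzw" for c in pattern):
        raise ValueError("only four-letter swizzles are handled here")
    out = [vec["xyzw".index(c)] for c in pattern]
    if negate_base:
        out = [-v for v in out]
    if absolute:
        out = [abs(v) for v in out]
    if negate_abs:
        out = [-v for v in out]
    return tuple(out)

print(load_operand((2, 8, 9, 0), ".yzzx"))
# (8, 9, 9, 2)
print(load_operand((1, -2, 3, -4), absolute=True, negate_abs=True))
# (-1, -2, -3, -4)
```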


     Section 3.11.4.4, Fragment Program Destination Register Update

     Each fragment program instruction, except for KIL, writes a 4-component
     result vector to a single temporary or output register.

     The four components of the result vector are first optionally clamped to
     the range [0,1].  The components will be clamped if and only if the result
     clamp suffix "_SAT" is present in the instruction name.  The instruction
     "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent
     instruction "ADD" will not.

     Since the instruction may be carried out at a different precision than the
     destination register, the components of the results vector are then
     converted to the data type corresponding to the destination register.

     Writes to individual components of the destination register are controlled
     by two sets of enables: individual component write masks specified as part
     of the instruction and the optional condition code mask.

     The component write mask is specified by the <optionalWriteMask> rule
     found in the <maskedDstReg> rule.  If the optional mask is "", all
     components are enabled.  Otherwise, the optional mask names the individual
     components to enable.  The characters "x", "y", "z", and "w" match the x,
     y, z, and w components respectively.  For example, an optional mask of
     ".xzw" indicates that the x, z, and w components should be enabled for
     writing but the y component should not.  The grammar requires that the
     destination register mask components must be listed in "xyzw" order.

     The optional condition code mask is specified by the <optionalCCMask> rule
     found in the <maskedDstReg> rule.  If <optionalCCMask> matches "", all
     components are enabled.  Otherwise, the condition code register is loaded
     and swizzled according to the swizzling specified by <swizzleSuffix>.
     Each component of the swizzled condition code is tested according to the
     rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
     "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
     condition code field evaluates to equal, not equal, less than, greater
     than or equal, less than or equal, or greater than, respectively.
     Comparisons involving condition codes of "UN" (unordered) evaluate to true
     for "NE" and false otherwise.  For example, if the condition code is
     (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
     operation will load (EQ,LT,GT,GT) and the mask will thus enable
     writes on the y, z, and w components.  In addition, "TR" always enables
     writes and "FL" always disables writes, regardless of the condition code.
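
     The condition code mask example above can be checked with a short
     model.  The names cc_passes and cc_write_enables are hypothetical, and
     the swizzle is given as a plain four-letter string.

```python
def cc_passes(field, rule):
    """Test one condition code field ("LT", "EQ", "GT" or "UN") against a rule."""
    tests = {
        "EQ": field == "EQ",
        "NE": field != "EQ",            # true for "UN" as well
        "LT": field == "LT",
        "GE": field in ("GT", "EQ"),
        "LE": field in ("LT", "EQ"),
        "GT": field == "GT",
        "TR": True,
        "FL": False,
    }
    return tests[rule]

def cc_write_enables(cc, rule, swizzle="xyzw"):
    """Swizzle the condition code, then test each component."""
    swizzled = [cc["xyzw".index(c)] for c in swizzle]
    return [cc_passes(field, rule) for field in swizzled]

print(cc_write_enables(("GT", "LT", "EQ", "GT"), "NE", "zyxw"))
# [False, True, True, True]
```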

     Each component of the destination register is updated with the result of
     the fragment program if and only if the component is enabled for writes by
     both the component write mask and the optional condition code mask.
     Otherwise, the component of the destination register remains unchanged.

     A fragment program instruction can also optionally update the condition
     code register.  The condition code is updated if the condition code
     register update suffix "C" is present in the instruction name.  The
     instruction "ADDC" will update the condition code; the otherwise
     equivalent instruction "ADD" will not.  If condition code updates are
     enabled, each component of the destination register enabled for writes is
     compared to zero.  The corresponding component of the condition code is
     set to "LT", "EQ", or "GT", if the written component is less than, equal
     to, or greater than zero, respectively.  Condition code components are set
     to "UN" if the written component is NaN.  Note that values of -0.0 and
     +0.0 both evaluate to "EQ".  If a component of the destination register is
     not enabled for writes, the corresponding condition code component is
     unchanged.

     In the following example code,

         # R1=(-2, 0, 2, NaN)              R0                  CC
         MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
         MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
         MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)

     the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
     code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and
     "z" components of R0 and the condition code are updated, so R0 ends up
     with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In
     the third instruction, the condition code mask disables writes to the x
     component (its condition code field is "EQ"), so R0 ends up with
     (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
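
     The three-instruction example can be replayed with a minimal model of
     MOVC.  This is a sketch under stated assumptions: the helper names are
     hypothetical, the initial contents of R0 and the condition code are
     arbitrary, and type conversion and "_SAT" clamping are left out.

```python
import math

def cc_passes(field, rule):
    """Test one condition code field; rule None means no condition code mask."""
    if rule is None or rule == "TR":
        return True
    if rule == "FL":
        return False
    allowed = {"EQ": ("EQ",), "NE": ("LT", "GT", "UN"), "LT": ("LT",),
               "GE": ("GT", "EQ"), "LE": ("LT", "EQ"), "GT": ("GT",)}
    return field in allowed[rule]

def generate_cc(value):
    if math.isnan(value):
        return "UN"
    return "LT" if value < 0 else "EQ" if value == 0 else "GT"

def movc(dst, cc, src, swizzle="xyzw", mask="xyzw", rule=None):
    """Model MOVC dst.mask (rule), src.swizzle; returns the new (dst, cc)."""
    value = [src["xyzw".index(c)] for c in swizzle]
    new_dst, new_cc = list(dst), list(cc)
    for i, name in enumerate("xyzw"):
        if name in mask and cc_passes(cc[i], rule):
            new_dst[i] = value[i]
            new_cc[i] = generate_cc(value[i])
    return new_dst, new_cc

r1 = (-2.0, 0.0, 2.0, math.nan)
r0, cc = [0.0] * 4, ["EQ"] * 4       # arbitrary starting state (an assumption)
r0, cc = movc(r0, cc, r1)                               # MOVC R0, R1;
r0, cc = movc(r0, cc, r1, swizzle="yzwx", mask="xyz")   # MOVC R0.xyz, R1.yzwx;
r0, cc = movc(r0, cc, r1, swizzle="zywx", rule="NE")    # MOVC R0 (NE), R1.zywx;
print(r0, cc)   # [0.0, 0.0, nan, -2.0] ['EQ', 'EQ', 'UN', 'LT']
```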

     The following pseudocode illustrates the process of writing a result
     vector to the destination register.  In the example, "ccMaskRule" refers
     to the condition code mask rule given by <ccMaskRule> (or "" if no rule is
     specified), "instrMask" refers to the component write mask given by the
     <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are
     enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.
     "destination" and "cc" refer to the register selected by <dstRegister> and
     the condition code, respectively.

       boolean TestCC(CondCode field) {
           switch (ccMaskRule) {
           case "EQ":  return (field == "EQ");
           case "NE":  return (field != "EQ");
           case "LT":  return (field == "LT");
           case "GE":  return (field == "GT" || field == "EQ");
           case "LE":  return (field == "LT" || field == "EQ");
           case "GT":  return (field == "GT");
           case "TR":  return TRUE;
           case "FL":  return FALSE;
           case "":    return TRUE;
           }
       }

       enum GenerateCC(DstT value) {
         if (value == NaN) {
           return UN;
         } else if (value < 0) {
           return LT;
         } else if (value == 0) {
           return EQ;
         } else {
           return GT;
         }
       }

       void UpdateDestination(VecDstT destination, VecInstT result)
       {
           // Working copies of the converted result, the destination
           // register, and the condition code.
           VecDstT resultDst;
           VecDstT merged;
           VecCC   mergedCC;

           // Clamp the result vector components to [0,1], if requested.
           if (clamp01) {
               if (result.x < 0)      result.x = 0;
               else if (result.x > 1) result.x = 1;
               if (result.y < 0)      result.y = 0;
               else if (result.y > 1) result.y = 1;
               if (result.z < 0)      result.z = 0;
               else if (result.z > 1) result.z = 1;
               if (result.w < 0)      result.w = 0;
               else if (result.w > 1) result.w = 1;
           }

           // Convert the result to the type of the destination register.
           resultDst.x = TypeConvert(result.x);
           resultDst.y = TypeConvert(result.y);
           resultDst.z = TypeConvert(result.z);
           resultDst.w = TypeConvert(result.w);

           // Merge the converted result into the destination register, under
           // control of the compile- and run-time write masks.
           merged = destination;
           mergedCC = cc;
           if (instrMask.x && TestCC(cc.c***)) {
               merged.x = resultDst.x;
               if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
           }
           if (instrMask.y && TestCC(cc.*c**)) {
               merged.y = resultDst.y;
               if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
           }
           if (instrMask.z && TestCC(cc.**c*)) {
               merged.z = resultDst.z;
               if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
           }
           if (instrMask.w && TestCC(cc.***c)) {
               merged.w = resultDst.w;
               if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
           }

           // Write out the new destination register and condition code.
           destination = merged;
           cc = mergedCC;
       }

     Section 3.11.5, Fragment Program Instruction Set

     The following sections describe the instruction set available to fragment
     programs.


     Section 3.11.5.1,  ADD:  Add

     The ADD instruction performs a component-wise add of the two operands to
     yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x + tmp1.x;
       result.y = tmp0.y + tmp1.y;
       result.z = tmp0.z + tmp1.z;
       result.w = tmp0.w + tmp1.w;

     The following special-case rules apply to addition:

       1. "A+B" is always equivalent to "B+A".
       2. NaN + <x> = NaN, for all <x>.
       3. +INF + <x> = +INF, for all <x> except NaN and -INF.
       4. -INF + <x> = -INF, for all <x> except NaN and +INF.
       5. +INF + -INF = NaN.
       6. -0.0 + <x> = <x>, for all <x>.
       7. +0.0 + <x> = <x>, for all <x> except -0.0.
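
     Several of these rules coincide with ordinary IEEE 754 arithmetic and
     can be observed with Python floats.  This is only an illustration:
     Python uses 64-bit doubles, not the fp32, fp16, or fx12 formats of this
     extension, and the helper is_neg_zero is hypothetical.

```python
import math

inf, nan = math.inf, math.nan

def is_neg_zero(v):
    """True only for -0.0; needed because -0.0 == +0.0 compares equal."""
    return v == 0.0 and math.copysign(1.0, v) < 0.0

print(math.isnan(nan + 1.0))       # True  (rule 2)
print(inf + 1.0e30 == inf)         # True  (rule 3)
print(math.isnan(inf + -inf))      # True  (rule 5)
print(is_neg_zero(-0.0 + -0.0))    # True  (rule 6 with x = -0.0)
print(is_neg_zero(0.0 + -0.0))     # False (rule 7 exception: result is +0.0)
```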


     Section 3.11.5.2,  COS:  Cosine

     The COS instruction approximates the cosine of the angle specified by the
     scalar operand and replicates the approximation to all four components of
     the result vector.  The angle is specified in radians and does not have to
     be in the range [0,2*PI].

       tmp = ScalarLoad(op0);
       result.x = ApproxCosine(tmp);
       result.y = ApproxCosine(tmp);
       result.z = ApproxCosine(tmp);
       result.w = ApproxCosine(tmp);

     The approximation function ApproxCosine is accurate to at least 22 bits
     with an angle in the range [0,2*PI].

       | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

     The error in the approximation will typically increase with the absolute
     value of the angle when the angle falls outside the range [0,2*PI].

     The following special-case rules apply to cosine approximation:

       1. ApproxCosine(NaN) = NaN.
       2. ApproxCosine(+/-INF) = NaN.
       3. ApproxCosine(+/-0.0) = +1.0.


     Section 3.11.5.3,  DDX:  Derivative Relative to X

     The DDX instruction computes approximate partial derivatives of the four
     components of the single operand with respect to the X window coordinate
     to yield a result vector.  The partial derivative is evaluated at the
     center of the pixel.

       f = VectorLoad(op0);
       result = ComputePartialX(f);

     Note that the partial derivatives obtained by this instruction are
     approximate, and derivative-of-derivative instruction sequences may not
     yield accurate second derivatives.

     For components with partial derivatives that overflow (including +/-INF
     inputs), the resulting partials may be encoded as large floating-point
     numbers instead of +/-INF.


     Section 3.11.5.4,  DDY:  Derivative Relative to Y

     The DDY instruction computes approximate partial derivatives of the four
     components of the single operand with respect to the Y window coordinate
     to yield a result vector.  The partial derivative is evaluated at the
     center of the pixel.

       f = VectorLoad(op0);
       result = ComputePartialY(f);

     Note that the partial derivatives obtained by this instruction are
     approximate, and derivative-of-derivative instruction sequences may not
     yield accurate second derivatives.

     For components with partial derivatives that overflow (including +/-INF
     inputs), the resulting partials may be encoded as large floating-point
     numbers instead of +/-INF.


     Section 3.11.5.5,  DP3:  3-Component Dot Product

     The DP3 instruction computes a three component dot product of the two
     operands (using the x, y, and z components) and replicates the dot product
     to all four components of the result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z);
       result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z);
       result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z);
       result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z);


     Section 3.11.5.6,  DP4:  4-Component Dot Product

     The DP4 instruction computes a four component dot product of the two
     operands and replicates the dot product to all four components of the
     result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
       result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
       result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
       result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                  (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
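
     Both dot product instructions follow the same pattern: compute one
     scalar and replicate it to every result component.  A compact model,
     with the hypothetical helper dot_replicated taking the component count
     as a parameter, might look like this:

```python
def dot_replicated(op0, op1, n):
    """Dot product over the first n components, replicated to x, y, z, w."""
    dot = sum(a * b for a, b in zip(op0[:n], op1[:n]))
    return (dot,) * 4

a = (1.0, 2.0, 3.0, 4.0)
b = (5.0, 6.0, 7.0, 8.0)
print(dot_replicated(a, b, 3))   # (38.0, 38.0, 38.0, 38.0), as DP3 would
print(dot_replicated(a, b, 4))   # (70.0, 70.0, 70.0, 70.0), as DP4 would
```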


     Section 3.11.5.7,  DST:  Distance Vector

     The DST instruction computes a distance vector from two specially-
     formatted operands.  The first operand should be of the form [NA, d^2,
     d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
     where NA values are not relevant to the calculation and d is a vector
     length.  If both vectors satisfy these conditions, the result vector will
     be of the form [1.0, d, d^2, 1/d].

     The exact behavior is specified in the following pseudo-code:

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = 1.0;
       result.y = tmp0.y * tmp1.y;
       result.z = tmp0.z;
       result.w = tmp1.w;

     Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
     (using the same vector for both operands) and 1/d can be obtained from d^2
     using the RSQ instruction.

     This distance vector is useful for per-fragment light attenuation
     calculations:  a DP3 operation involving the distance vector and an
     attenuation constants vector will yield the attenuation factor.
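
     A worked example may help.  The sketch below models the three-component
     dot product, RSQ, and DST in plain Python floats for the vector
     (3, 4, 0), whose length d is 5.  The attenuation constants are made-up
     values chosen for illustration, and 1/sqrt() stands in for the RSQ
     approximation.

```python
import math

def dp3(a, b):
    return sum(x * y for x, y in zip(a[:3], b[:3]))

def dst(op0, op1):
    """DST: (1.0, op0.y * op1.y, op0.z, op1.w)."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

v = (3.0, 4.0, 0.0)
d2 = dp3(v, v)                       # d^2 = 25.0
inv_d = 1.0 / math.sqrt(d2)          # 1/d, standing in for RSQ
dist = dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))
print(dist)                          # approximately (1.0, 5.0, 25.0, 0.2)

k = (1.0, 0.1, 0.01)                 # hypothetical constant, linear, quadratic terms
falloff_sum = dp3(dist, k)           # 1*1.0 + d*0.1 + d^2*0.01
print(falloff_sum)                   # approximately 1.75
```

     In the fixed-function OpenGL lighting model, the attenuation applied to
     a light is the reciprocal of a sum of this form, so a program following
     that model would apply RCP to the dot product.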


     Section 3.11.5.8,  EX2:  Exponential Base 2

     The EX2 instruction approximates 2 raised to the power of the scalar
     operand and replicates it to all four components of the result
     vector.

       tmp = ScalarLoad(op0);
       result.x = Approx2ToX(tmp);
       result.y = Approx2ToX(tmp);
       result.z = Approx2ToX(tmp);
       result.w = Approx2ToX(tmp);

     The approximation function is accurate to at least 22 bits:

       | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,

     and, in general,

       | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).

     The following special-case rules apply to exponential approximation:

       1. Approx2ToX(NaN) = NaN.
       2. Approx2ToX(-INF) = +0.0.
       3. Approx2ToX(+INF) = +INF.
       4. Approx2ToX(+/-0.0) = +1.0.


     Section 3.11.5.9,  FLR:  Floor

     The FLR instruction performs a component-wise floor operation on the
     operand to generate a result vector.  The floor of a value is defined as
     the largest integer less than or equal to the value.  The floor of 2.3 is
     2.0; the floor of -3.6 is -4.0.

       tmp = VectorLoad(op0);
       result.x = floor(tmp.x);
       result.y = floor(tmp.y);
       result.z = floor(tmp.z);
       result.w = floor(tmp.w);

     The following special-case rules apply to floor computation:

       1. floor(NaN) = NaN.
       2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
          sign of the result is equal to the sign of the operand.


     Section 3.11.5.10,  FRC:  Fraction

     The FRC instruction extracts the fractional portion of each component of
     the operand to generate a result vector.  The fractional portion of a
     component is defined as the result after subtracting off the floor of the
     component (see FLR), and is always in the range [0.00, 1.00).

     For negative values, the fractional portion is NOT the number written to
     the right of the decimal point -- the fractional portion of -1.7 is not
     0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
     from -1.7.

       tmp = VectorLoad(op0);
       result.x = tmp.x - floor(tmp.x);
       result.y = tmp.y - floor(tmp.y);
       result.z = tmp.z - floor(tmp.z);
       result.w = tmp.w - floor(tmp.w);

     The following special-case rules, which can be derived from the rules for
     FLR and ADD, apply to fraction computation:

       1. fraction(NaN) = NaN.
       2. fraction(+/-INF) = NaN.
       3. fraction(+/-0.0) = +0.0.
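
     The fraction rules can be tried out with the following sketch.  The
     name frc is hypothetical.  Python's math.floor() returns an integer and
     rejects infinities, so the special cases are handled explicitly and the
     sign of the operand is copied onto the floor to mimic floor(-0.0) =
     -0.0.

```python
import math

def frc(x):
    """Fractional part x - floor(x), following the special-case rules above."""
    if math.isnan(x) or math.isinf(x):
        return math.nan                      # rules 1 and 2
    # copysign keeps floor(-0.0) as -0.0, so -0.0 - (-0.0) gives +0.0 (rule 3).
    return x - math.copysign(math.floor(x), x)

print(frc(2.25))    # 0.25
print(frc(-1.7))    # approximately 0.3, not 0.7
print(frc(-0.0))    # 0.0
```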


     Section 3.11.5.11,  KIL:  Conditionally Discard Fragment

     The KIL instruction is unlike any other instruction in the instruction
     set.  This instruction evaluates components of a swizzled condition code
     using a test expression identical to that used to evaluate condition code
     write masks (Section 3.11.4.4).  If any condition code component evaluates
     to TRUE, the fragment is discarded.  Otherwise, the instruction has no
     effect.  The condition code components are specified, swizzled, and
     evaluated in the same manner as the condition code write mask.

       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
           // Discard the fragment.
       } else {
           // Do nothing.
       }

     If the fragment is discarded, it is treated as though it were not produced
     by rasterization.  In particular, none of the per-fragment operations
     (such as stencil tests, blends, stencil, depth, or color buffer writes)
     are performed on the fragment.


     Section 3.11.5.12,  LG2:  Logarithm Base 2

     The LG2 instruction approximates the base 2 logarithm of the scalar
     operand and replicates it to all four components of the result vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxLog2(tmp);
       result.y = ApproxLog2(tmp);
       result.z = ApproxLog2(tmp);
       result.w = ApproxLog2(tmp);

     The approximation function is accurate to at least 22 bits:

       | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.

     Note that for large values of x, there are not enough bits in the
     floating-point storage format to represent a result that precisely.

     The following special-case rules apply to logarithm approximation:

       1. ApproxLog2(NaN) = NaN.
       2. ApproxLog2(+INF) = +INF.
       3. ApproxLog2(+/-0.0) = -INF.
       4. ApproxLog2(x) = NaN, -INF < x < -0.0.
       5. ApproxLog2(-INF) = NaN.
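
     The special cases above can be sketched in Python as follows.  The
     function name approx_log2 is hypothetical, and math.log2 is only a
     double-precision stand-in for the hardware approximation; the sketch
     illustrates the special-case values, not the error bound.

```python
import math

def approx_log2(x):
    """Base 2 logarithm that returns the special-case values listed above."""
    if math.isnan(x):
        return math.nan        # rule 1
    if x == 0.0:
        return -math.inf       # rule 3 (+0.0 and -0.0 compare equal)
    if x < 0.0:
        return math.nan        # rules 4 and 5
    if math.isinf(x):
        return math.inf        # rule 2
    return math.log2(x)
```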


     Section 3.11.5.13,  LIT:  Compute Light Coefficients

     The LIT instruction accelerates per-fragment lighting by computing
     lighting coefficients for ambient, diffuse, and specular light
     contributions.  The "x" component of the operand is assumed to hold a
     diffuse dot product (n dot VP_pli, as in the vertex lighting equations in
     Section 2.13.1).  The "y" component of the operand is assumed to hold a
     specular dot product (n dot h_i).  The "w" component of the operand is
     assumed to hold the specular exponent of the material (s_rm).

     The "x" component of the result vector receives the value that should be
     multiplied by the ambient light/material product (always 1.0).  The "y"
     component of the result vector receives the value that should be
     multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
     component of the result vector receives the value that should be
     multiplied by the specular light/material product (f_i * (n dot h_i) ^
     s_rm).  The "w" component of the result is the constant 1.0.

     Negative diffuse and specular dot products are clamped to 0.0, as is done
     in the standard per-vertex lighting operations.  In addition, if the
     diffuse dot product is zero or negative, the specular coefficient is
     forced to zero.

       tmp = VectorLoad(op0);
       if (tmp.x < 0) tmp.x = 0;
       if (tmp.y < 0) tmp.y = 0;
       result.x = 1.0;
       result.y = tmp.x;
       result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
       result.w = 1.0;

     The exponentiation approximation used to compute result.z is identical to
     that used in the POW instruction, including errors and the processing of
     any special cases.
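
     As an illustration, the LIT computation can be sketched in Python.  The
     names lit and approx_power are hypothetical.  The approx_power helper is
     a simplified stand-in for the POW approximation that handles only
     finite, non-negative bases of modest size; it is an assumption of this
     sketch, not the extension's definition.

```python
import math

def approx_power(a, b):
    """Simplified stand-in for POW; only finite a >= 0 is handled."""
    if a == 0.0:
        if b == 0.0:
            return math.nan                    # 0^0 is NaN in this extension
        return 0.0 if b > 0.0 else math.inf
    return 2.0 ** (b * math.log2(a))

def lit(op):
    """Return (ambient, diffuse, specular, 1.0) coefficients for operand op."""
    x, y, _, w = op              # the z component of the operand is unused
    if x < 0.0:
        x = 0.0                  # clamp the diffuse dot product
    if y < 0.0:
        y = 0.0                  # clamp the specular dot product
    specular = approx_power(y, w) if x > 0.0 else 0.0
    return (1.0, x, specular, 1.0)
```

     For example, an operand of (0.5, 0.5, 0.0, 2.0) yields a diffuse
     coefficient of 0.5 and a specular coefficient of 0.25.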


     Section 3.11.5.14,  LRP:  Linear Interpolation

     The LRP instruction performs a component-wise linear interpolation to
     yield a result vector.  It interpolates between the components of the
     second and third operands, using the first operand as a weight.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
       result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
       result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
       result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
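
     A small worked example may help fix the operand order: a weight of 1.0
     selects the second operand and a weight of 0.0 selects the third.  The
     Python function below is a hypothetical sketch of the same arithmetic.

```python
def lrp(weight, a, b):
    """Component-wise weight * a + (1 - weight) * b."""
    return tuple(w * p + (1.0 - w) * q for w, p, q in zip(weight, a, b))
```

     With weights (0.0, 0.25, 0.5, 1.0), a second operand of 8.0 in every
     component, and a third operand of 4.0 in every component, the result is
     (4.0, 5.0, 6.0, 8.0).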


     Section 3.11.5.15,  MAD:  Multiply and Add

     The MAD instruction performs a component-wise multiply of the first two
     operands, and then does a component-wise add of the product to the third
     operand to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x * tmp1.x + tmp2.x;
       result.y = tmp0.y * tmp1.y + tmp2.y;
       result.z = tmp0.z * tmp1.z + tmp2.z;
       result.w = tmp0.w * tmp1.w + tmp2.w;


     Section 3.11.5.16,  MAX:  Maximum

     The MAX instruction computes component-wise maximums of the values in the
     two operands to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = max(tmp0.x, tmp1.x);
       result.y = max(tmp0.y, tmp1.y);
       result.z = max(tmp0.z, tmp1.z);
       result.w = max(tmp0.w, tmp1.w);

     The following special cases apply to the maximum operation:

       1. max(A,B) is always equivalent to max(B,A).
       2. max(NaN, <x>) == NaN, for all <x>.
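
     The two rules above matter when emulating MAX on a host, because many
     library maximum functions do not propagate NaN symmetrically.  Python's
     built-in max(), for instance, depends on argument order when one
     argument is NaN.  A hypothetical sketch that follows the rules above:

```python
import math

def max_nan(a, b):
    """Maximum of two scalars that returns NaN if either input is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def max_vec(v0, v1):
    """Component-wise maximum of two four-component vectors."""
    return tuple(max_nan(a, b) for a, b in zip(v0, v1))
```

     The same structure, with the comparison reversed, would apply to MIN.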


     Section 3.11.5.17,  MIN:  Minimum

     The MIN instruction computes component-wise minimums of the values in the
     two operands to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = min(tmp0.x, tmp1.x);
       result.y = min(tmp0.y, tmp1.y);
       result.z = min(tmp0.z, tmp1.z);
       result.w = min(tmp0.w, tmp1.w);

     The following special cases apply to the minimum operation:

       1. min(A,B) is always equivalent to min(B,A).
       2. min(NaN, <x>) == NaN, for all <x>.


     Section 3.11.5.18,  MOV:  Move

     The MOV instruction copies the value of the operand to yield a result
     vector.

       result = VectorLoad(op0);


     Section 3.11.5.19,  MUL:  Multiply

     The MUL instruction performs a component-wise multiply of the two operands
     to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x * tmp1.x;
       result.y = tmp0.y * tmp1.y;
       result.z = tmp0.z * tmp1.z;
       result.w = tmp0.w * tmp1.w;

     The following special-case rules apply to multiplication:

       1. "A*B" is always equivalent to "B*A".
       2. NaN * <x> = NaN, for all <x>.
       3. +/-0.0 * +/-INF = NaN.
       4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN.  The
          sign of the result is positive if the signs of the two operands match
          and negative otherwise.
       5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN.  The
          sign of the result is positive if the signs of the two operands match
          and negative otherwise.
       6. +1.0 * <x> = <x>, for all <x>.


     Section 3.11.5.20,  PK2H:  Pack Two 16-bit Floats

     The PK2H instruction converts the "x" and "y" components of the single
     operand into 16-bit floating-point format, packs the bit representation of
     these two floats into a 32-bit value, and replicates that value to all
     four components of the result vector.  The PK2H instruction can be
     reversed by the UP2H instruction below.

       tmp0 = VectorLoad(op0);
       /* result obtained by combining raw bits of tmp0.x, tmp0.y */
       result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

     The result must be written to a register with 32-bit components (an "R"
     register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
     any other register type is specified.
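
     The packing step can be sketched on a host in Python.  This sketch
     assumes that the 16-bit floating-point format matches IEEE 754 binary16,
     which is what the struct module's 'e' format code implements; the
     function names are hypothetical.  It returns the single 32-bit pattern
     that PK2H would replicate to all four components.

```python
import struct

def half_bits(value):
    """16-bit pattern of value encoded as IEEE 754 binary16."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def pk2h(x, y):
    """Pack x into the low 16 bits and y into the high 16 bits."""
    return half_bits(x) | (half_bits(y) << 16)
```

     For example, pk2h(1.0, -2.0) gives 0xC0003C00.  Note that struct.pack
     raises OverflowError for values too large for the 16-bit format, which
     is a property of this sketch and not of the instruction.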


     Section 3.11.5.21,  PK2US:  Pack Two Unsigned 16-bit Scalars

     The PK2US instruction converts the "x" and "y" components of the single
     operand into a packed pair of 16-bit unsigned scalars.  The scalars are
     represented in a bit pattern where all '0' bits correspond to 0.0 and all
     '1' bits correspond to 1.0.  The bit representations of the two converted
     components are packed into a 32-bit value, and that value is replicated to
     all four components of the result vector.  The PK2US instruction can be
     reversed by the UP2US instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < 0.0) tmp0.x = 0.0;
       if (tmp0.x > 1.0) tmp0.x = 1.0;
       if (tmp0.y < 0.0) tmp0.y = 0.0;
       if (tmp0.y > 1.0) tmp0.y = 1.0;
       us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
       us.y = round(65535.0 * tmp0.y);
       /* result obtained by combining raw bits of us. */
       result.x = ((us.x) | (us.y << 16));
       result.y = ((us.x) | (us.y << 16));
       result.z = ((us.x) | (us.y << 16));
       result.w = ((us.x) | (us.y << 16));

     The result must be written to a register with 32-bit components (an "R"
     register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
     any other register type is specified.
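
     A host-side sketch of the PK2US conversion in Python follows.  The names
     are hypothetical, inputs are assumed to be finite, and round() in the
     pseudocode is taken to mean round-half-up, which is an assumption since
     the rounding rule is not stated here.

```python
import math

def clamp01(v):
    """Clamp v to the range [0.0, 1.0]."""
    return min(max(v, 0.0), 1.0)

def pk2us(x, y):
    """Pack x into the low 16 bits and y into the high 16 bits."""
    # floor(v + 0.5) implements the assumed round-half-up behavior.
    us_x = int(math.floor(65535.0 * clamp01(x) + 0.5))
    us_y = int(math.floor(65535.0 * clamp01(y) + 0.5))
    return us_x | (us_y << 16)
```

     Out-of-range inputs are clamped first, so pk2us(-1.0, 2.0) produces
     0xFFFF0000.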


     Section 3.11.5.22,  PK4B:  Pack Four Signed 8-bit Scalars

     The PK4B instruction converts the four components of the single operand
     into 8-bit signed quantities.  The signed quantities are represented in a
     bit pattern where all '0' bits correspond to -128/127 and all '1' bits
     correspond to +127/127.  The bit representations of the four converted
     components are packed into a 32-bit value, and that value is replicated to
     all four components of the result vector.  The PK4B instruction can be
     reversed by the UP4B instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < -128/127) tmp0.x = -128/127;
       if (tmp0.y < -128/127) tmp0.y = -128/127;
       if (tmp0.z < -128/127) tmp0.z = -128/127;
       if (tmp0.w < -128/127) tmp0.w = -128/127;
       if (tmp0.x > +127/127) tmp0.x = +127/127;
       if (tmp0.y > +127/127) tmp0.y = +127/127;
       if (tmp0.z > +127/127) tmp0.z = +127/127;
       if (tmp0.w > +127/127) tmp0.w = +127/127;
       ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
       ub.y = round(127.0 * tmp0.y + 128.0);
       ub.z = round(127.0 * tmp0.z + 128.0);
       ub.w = round(127.0 * tmp0.w + 128.0);
       /* result obtained by combining raw bits of ub. */
       result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

     The result must be written to a register with 32-bit components (an "R"
     register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
     any other register type is specified.
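
     The biased encoding used by PK4B can be sketched in Python.  The names
     are hypothetical, inputs are assumed to be finite, and round-half-up is
     assumed for round().  Note that -128/127 in the pseudocode denotes a real
     quotient (about -1.0079), not an integer division.

```python
import math

LOW, HIGH = -128.0 / 127.0, 1.0      # representable range of the encoding

def pk4b(x, y, z, w):
    """Pack four values into one 32-bit pattern, x in the low byte."""
    packed = 0
    for i, v in enumerate((x, y, z, w)):
        v = min(max(v, LOW), HIGH)                      # clamp to the range
        ub = int(math.floor(127.0 * v + 128.0 + 0.5))   # biased byte, 0..255
        packed |= ub << (8 * i)
    return packed
```

     A value of 0.0 maps to the byte 128, so pk4b(0.0, 0.0, 0.0, 0.0) is
     0x80808080.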


     Section 3.11.5.23,  PK4UB:  Pack Four Unsigned 8-bit Scalars

     The PK4UB instruction converts the four components of the single operand
     into a packed grouping of 8-bit unsigned scalars.  The scalars are
     represented in a bit pattern where all '0' bits correspond to 0.0 and all
     '1' bits correspond to 1.0.  The bit representations of the four
     converted components are packed into a 32-bit value, and that value is
     replicated to all four components of the result vector.  The PK4UB
     instruction can be reversed by the UP4UB instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < 0.0) tmp0.x = 0.0;
       if (tmp0.x > 1.0) tmp0.x = 1.0;
       if (tmp0.y < 0.0) tmp0.y = 0.0;
       if (tmp0.y > 1.0) tmp0.y = 1.0;
       if (tmp0.z < 0.0) tmp0.z = 0.0;
       if (tmp0.z > 1.0) tmp0.z = 1.0;
       if (tmp0.w < 0.0) tmp0.w = 0.0;
       if (tmp0.w > 1.0) tmp0.w = 1.0;
       ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
       ub.y = round(255.0 * tmp0.y);
       ub.z = round(255.0 * tmp0.z);
       ub.w = round(255.0 * tmp0.w);
       /* result obtained by combining raw bits of ub. */
       result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

     The result must be written to a register with 32-bit components (an "R"
     register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
     any other register type is specified.


     Section 3.11.5.24,  POW:  Exponentiation

     The POW instruction approximates the value of the first scalar operand
     raised to the power of the second scalar operand and replicates it to all
     four components of the result vector.

       tmp0 = ScalarLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = ApproxPower(tmp0, tmp1);
       result.y = ApproxPower(tmp0, tmp1);
       result.z = ApproxPower(tmp0, tmp1);
       result.w = ApproxPower(tmp0, tmp1);

     The exponentiation approximation function is defined in terms of the base
     2 exponentiation and logarithm approximation operations in the EX2 and LG2
     instructions, including errors and the processing of any special cases.
     In particular,

       ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).

     The following special-case rules, which can be derived from the rules in
     the LG2, MUL, and EX2 instructions, apply to exponentiation:

       1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
       2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.
       3. ApproxPower(+/-0.0, +/-0.0) = NaN.
       4. ApproxPower(+INF, +/-0.0) = NaN.
       5. ApproxPower(+1.0, +/-INF) = NaN.
       6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.
       7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.
       8. ApproxPower(+1.0, <x>)   = +1.0, if -INF < x < +INF.
       9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.
       10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
       11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.
       12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.
       13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                    +INF, if x > +1.0.
       14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                    +0.0, if x > +1.0.

     Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and
     0*(-INF) = NaN.  In many other applications, including the standard C
     pow() function, 0^0 is defined as 1.0.  This behavior can be emulated
     using additional instructions in much the same way that the pow()
     function is implemented on many CPUs.

     Note that a logarithm is involved even if the exponent is an integer.
     This means that any exponentiation with a negative base will produce NaN.
     In contrast, it is possible in a "normal" mathematical formulation to
     raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
     (-0.5)^-2 == 4).
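
     The composition ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)) and its
     consequences for 0^0 and negative bases can be checked with a short
     Python sketch.  The function names are hypothetical, and
     double-precision library routines stand in for the hardware
     approximations; the sketch reproduces the special-case behavior, not
     the error bounds.

```python
import math

def approx_log2(x):
    """LG2 stand-in that returns special-case values instead of raising."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    if x == 0.0:
        return -math.inf
    return math.log2(x)            # math.log2(inf) is inf

def approx_exp2(x):
    """EX2 stand-in; a result too large to represent becomes +INF."""
    try:
        return 2.0 ** x
    except OverflowError:
        return math.inf

def approx_power(a, b):
    """POW as the composition of the two stand-ins above."""
    return approx_exp2(b * approx_log2(a))
```

     Here approx_power(0.0, 0.0) is NaN because 0.0 * -INF is NaN, and
     approx_power(-3.0, 2.0) is NaN because the logarithm of a negative base
     is NaN.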


     Section 3.11.5.25,  RCP:  Reciprocal

     The RCP instruction approximates the reciprocal of the scalar operand and
     replicates it to all four components of the result vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxReciprocal(tmp);
       result.y = ApproxReciprocal(tmp);
       result.z = ApproxReciprocal(tmp);
       result.w = ApproxReciprocal(tmp);

     The approximation function is accurate to at least 22 bits:

       | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.

     The following special-case rules apply to reciprocation:

       1. ApproxReciprocal(NaN) = NaN.
       2. ApproxReciprocal(+INF) = +0.0.
       3. ApproxReciprocal(-INF) = -0.0.
       4. ApproxReciprocal(+0.0) = +INF.
       5. ApproxReciprocal(-0.0) = -INF.
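
     Host languages differ in how they treat division by zero, so an
     emulation of the rules above may need an explicit case.  In Python,
     for example, 1.0 / 0.0 raises an exception.  A hypothetical sketch:

```python
import math

def approx_reciprocal(x):
    """Reciprocal that returns the special-case values listed above."""
    if x == 0.0:
        return math.copysign(math.inf, x)   # rules 4 and 5: keep the sign
    return 1.0 / x                          # NaN and +/-INF need no special code
```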


     Section 3.11.5.26,  RFL:  Reflection Vector

     The RFL instruction computes the reflection of the second vector operand
     (the "direction" vector) about the vector specified by the first vector
     operand (the "axis" vector).  Both operands are treated as 3D vectors (the
     w components are ignored).  The result vector is another 3D vector (the
     "reflected direction" vector).  The length of the result vector, ignoring
     rounding errors, should equal that of the second operand.

       axis = VectorLoad(op0);
       direction = VectorLoad(op1);
       tmp.w = (axis.x * axis.x + axis.y * axis.y +
                axis.z * axis.z);
       tmp.x = (axis.x * direction.x + axis.y * direction.y +
                axis.z * direction.z);
       tmp.x = 2.0 * tmp.x;
       tmp.x = tmp.x / tmp.w;
       result.x = tmp.x * axis.x - direction.x;
       result.y = tmp.x * axis.y - direction.y;
       result.z = tmp.x * axis.z - direction.z;

     A fragment program will fail to load if the w component of the result is
     enabled in the component write mask (see the <optionalWriteMask> rule in
     the grammar).
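
     The reflection arithmetic can be sketched in Python as follows.  The
     function name rfl is hypothetical, and the sketch assumes a non-zero
     axis vector (a zero axis would raise ZeroDivisionError here).

```python
def rfl(axis, direction):
    """Reflect direction about axis; only x, y, and z are used."""
    ax, ay, az = axis[:3]
    dx, dy, dz = direction[:3]
    # 2 * (axis . direction) / (axis . axis); the axis need not be unit length.
    scale = 2.0 * (ax * dx + ay * dy + az * dz) / (ax * ax + ay * ay + az * az)
    return (scale * ax - dx, scale * ay - dy, scale * az - dz)
```

     Reflecting (1, 0, 1) about the z axis gives (-1, 0, 1).  Because of the
     division by the squared length of the axis, scaling the axis vector does
     not change the result.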


     Section 3.11.5.27,  RSQ:  Reciprocal Square Root

     The RSQ instruction approximates the reciprocal of the square root of the
     scalar operand and replicates it to all four components of the result
     vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxRSQRT(tmp);
       result.y = ApproxRSQRT(tmp);
       result.z = ApproxRSQRT(tmp);
       result.w = ApproxRSQRT(tmp);

     The approximation function is accurate to at least 22 bits:

       | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

     The following special-case rules apply to reciprocal square roots:

       1. ApproxRSQRT(NaN) = NaN.
       2. ApproxRSQRT(+INF) = +0.0.
       3. ApproxRSQRT(-INF) = NaN.
       4. ApproxRSQRT(+0.0) = +INF.
       5. ApproxRSQRT(-0.0) = -INF.
       6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.


     Section 3.11.5.28,  SEQ:  Set on Equal To

     The SEQ instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is equal to that of the second, and 0.0
     otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SEQ:

       1. (<x> == <y>) and (<y> == <x>) always produce the same result.
       2. (NaN == <x>) is FALSE for all <x>, including NaN.
       3. (+INF == +INF) and (-INF == -INF) are TRUE.
       4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
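
     The "Set on" instructions share one pattern: a component-wise comparison
     whose boolean outcome is converted to 1.0 or 0.0.  The Python sketch
     below illustrates this for SEQ; the names set_on and seq are
     hypothetical.  Python floats follow IEEE comparison rules, so the
     special cases above fall out of the == operator.

```python
import operator

def set_on(compare, v0, v1):
    """Return 1.0 where compare(a, b) holds and 0.0 elsewhere."""
    return tuple(1.0 if compare(a, b) else 0.0 for a, b in zip(v0, v1))

def seq(v0, v1):
    """Set on Equal To."""
    return set_on(operator.eq, v0, v1)
```

     Passing operator.ge, operator.gt, operator.le, operator.lt, or
     operator.ne to set_on gives the corresponding sketch for the other
     comparison instructions.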


     Section 3.11.5.29,  SFL:  Set on False

     The SFL instruction is a degenerate case of the other "Set on"
     instructions that sets all components of the result vector to
     0.0.

       result.x = 0.0;
       result.y = 0.0;
       result.z = 0.0;
       result.w = 0.0;


     Section 3.11.5.30,  SGE:  Set on Greater Than or Equal

     The SGE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is greater than or equal to that of the
     second, and 0.0 otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SGE:

       1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
       2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
       3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.


     Section 3.11.5.31,  SGT:  Set on Greater Than

     The SGT instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is greater than that of the second, and
     0.0 otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SGT:

       1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
       2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.


     Section 3.11.5.32,  SIN:  Sine

     The SIN instruction approximates the sine of the angle specified by the
     scalar operand and replicates it to all four components of the result
     vector.  The angle is specified in radians and does not have to be in the
     range [0,2*PI].

       tmp = ScalarLoad(op0);
       result.x = ApproxSine(tmp);
       result.y = ApproxSine(tmp);
       result.z = ApproxSine(tmp);
       result.w = ApproxSine(tmp);

     The approximation function is accurate to at least 22 bits with an angle
     in the range [0,2*PI].

       | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

     The error in the approximation will typically increase with the absolute
     value of the angle when the angle falls outside the range [0,2*PI].

     The following special-case rules apply to sine approximation:

       1. ApproxSine(NaN) = NaN.
       2. ApproxSine(+/-INF) = NaN.
       3. ApproxSine(+/-0.0) = +/-0.0.  The sign of the result is equal to the
          sign of the single operand.


     Section 3.11.5.33,  SLE:  Set on Less Than or Equal

     The SLE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is less than or equal to that of the
     second, and 0.0 otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SLE:

       1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
       2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
       3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.


     Section 3.11.5.34,  SLT:  Set on Less Than

     The SLT instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is less than that of the second, and 0.0
     otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SLT:

       1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
       2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.


     Section 3.11.5.35,  SNE:  Set on Not Equal

     The SNE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector is 1.0 if the corresponding
     component of the first operand is not equal to that of the second, and 0.0
     otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
       result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
       result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
       result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;

     The following special-case rules apply to SNE:

       1. (<x> != <y>) and (<y> != <x>) always produce the same result.
       2. (NaN != <x>) is TRUE for all <x>, including NaN.
       3. (+INF != +INF) and (-INF != -INF) are FALSE.
       4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.


     Section 3.11.5.36,  STR:  Set on True

     The STR instruction is a degenerate case of the other "Set on"
     instructions that sets all components of the result vector to 1.0.

       result.x = 1.0;
       result.y = 1.0;
       result.z = 1.0;
       result.w = 1.0;


     Section 3.11.5.37,  SUB:  Subtract

     The SUB instruction performs a component-wise subtraction of the second
     operand from the first to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x - tmp1.x;
       result.y = tmp0.y - tmp1.y;
       result.z = tmp0.z - tmp1.z;
       result.w = tmp0.w - tmp1.w;

     The SUB instruction is completely equivalent to an identical ADD
     instruction in which the negate operator on the second operand is
     reversed:

       1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
       2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
       3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
       4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".


     Section 3.11.5.38,  TEX: Texture Lookup

     The TEX instruction performs a filtered texture lookup using the texture
     target given by <texImageTarget> belonging to the texture image unit given
     by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
     and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
     TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

     The (s,t,r) texture coordinates used for the lookup are the x, y, and z
     components of the single operand.

     The texture lookup is performed as specified in Section 3.8.  The LOD
     calculations in Section 3.8.5 are performed using an implementation
     dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
     The mapping of filtered texture components to the components of the result
     vector is dependent on the base internal format of the texture and is
     specified in Table X.5.

                                  Result Vector Components
       Base Internal Format        X      Y      Z      W
       --------------------      -----  -----  -----  -----
       ALPHA                      0.0    0.0    0.0    At
       LUMINANCE                  Lt     Lt     Lt     1.0
       LUMINANCE_ALPHA            Lt     Lt     Lt     At
       INTENSITY                  It     It     It     It
       RGB                        Rt     Gt     Bt     1.0
       RGBA                       Rt     Gt     Bt     At
       HILO_NV (signed)           HIt    LOt    HEMI   1.0
       HILO_NV (unsigned)         HIt    LOt    1.0    1.0
       DSDT_NV                    DSt    DTt    0.0    1.0
       DSDT_MAG_NV                DSt    DTt    MAGt   1.0
       DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It
       FLOAT_R_NV                 Rt     0.0    0.0    1.0
       FLOAT_RG_NV                Rt     Gt     0.0    1.0
       FLOAT_RGB_NV               Rt     Gt     Bt     1.0
       FLOAT_RGBA_NV              Rt     Gt     Bt     At

       Table X.5:  Mapping of filtered texel components to result vector
       components for the TEX instruction.  0.0 and 1.0 indicate that the
       corresponding constant value is written to the result vector.
       DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,
       as specified in the texture's depth texture mode.

       For HILO_NV textures with signed components, "HEMI" is defined as
       sqrt(MAX(0, 1-(HIt^2+LOt^2))).
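
     A few rows of Table X.5 can be expressed as a lookup, sketched here in
     Python.  The function names, the dictionary keys used for the filtered
     components, and the format strings are hypothetical conventions of this
     sketch; only a subset of the table is covered.

```python
import math

def hemi(hi, lo):
    """Third component reconstructed for signed HILO_NV textures."""
    return math.sqrt(max(0.0, 1.0 - (hi * hi + lo * lo)))

def tex_result(base_format, t):
    """Map filtered texel components in dict t to an (x, y, z, w) result."""
    if base_format == "ALPHA":
        return (0.0, 0.0, 0.0, t["A"])
    if base_format == "LUMINANCE":
        return (t["L"], t["L"], t["L"], 1.0)
    if base_format == "LUMINANCE_ALPHA":
        return (t["L"], t["L"], t["L"], t["A"])
    if base_format == "INTENSITY":
        return (t["I"],) * 4
    if base_format == "RGB":
        return (t["R"], t["G"], t["B"], 1.0)
    if base_format == "RGBA":
        return (t["R"], t["G"], t["B"], t["A"])
    if base_format == "HILO_NV (signed)":
        return (t["HI"], t["LO"], hemi(t["HI"], t["LO"]), 1.0)
    raise ValueError("format not covered by this sketch: " + base_format)
```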

     This instruction specifies a particular texture target, ignoring the
     standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
     TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
     OpenGL.  If the specified texture target has a consistent set of images, a
     lookup is performed.  Otherwise, the result of the instruction is the
     vector (0,0,0,0).

     Although this instruction allows the selection of any texture target, a
     fragment program cannot use more than one texture target for any given
     texture image unit.


     Section 3.11.5.39,  TXD: Texture Lookup with Derivatives

     The TXD instruction performs a filtered texture lookup using the texture
     target given by <texImageTarget> belonging to the texture image unit given
     by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
     and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
     TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

     The (s,t,r) texture coordinates used for the lookup are the x, y, and z
     components of the first operand.  The partial derivatives in the X
     direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z
     components of the second operand.  The partial derivatives in the Y
     direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z
     components of the third operand.

     The texture lookup is performed as specified in Section 3.8.  The LOD
     calculations in Section 3.8.5 are performed using the specified partial
     derivatives.  The mapping of filtered texture components to the components
     of the result vector is dependent on the base internal format of the
     texture and is specified in Table X.5.

     This instruction specifies a particular texture target, ignoring the
     standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
     TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
     OpenGL.  If the specified texture target has a consistent set of images, a
     lookup is performed.  Otherwise, the result of the instruction is the
     vector (0,0,0,0).

     Although this instruction allows the selection of any texture target, a
     fragment program cannot use more than one texture target for any given
     texture image unit.


     Section 3.11.5.40,  TXP: Projective Texture Lookup

     The TXP instruction performs a filtered texture lookup using the texture
     target given by <texImageTarget> belonging to the texture image unit given
     by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
     and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
     TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

     For cube map textures, the (s,t,r) texture coordinates used for the lookup
     are given by x, y, and z, respectively.  For all other textures, the
     (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and
     z/w, respectively, where x, y, z, and w are the corresponding components
     of the operand.
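
     The coordinate selection described above can be sketched in Python.  The
     function name txp_coords is hypothetical, the target is given as the
     <texImageTarget> string, and a non-zero w is assumed for the projective
     case.

```python
def txp_coords(target, operand):
    """Return the (s, t, r) coordinates TXP would use for the lookup."""
    x, y, z, w = operand
    if target == "CUBE":
        return (x, y, z)               # cube maps: no projective divide
    return (x / w, y / w, z / w)       # all other targets: divide by w
```

     For the operand (2.0, 4.0, 6.0, 2.0), a "2D" target uses
     (1.0, 2.0, 3.0), while a "CUBE" target uses (2.0, 4.0, 6.0).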

     The texture lookup is performed as specified in Section 3.8.  The LOD
     calculations in Section 3.8.5 are performed using an implementation
     dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
     The mapping of filtered texture components to the components of the result
     vector is dependent on the base internal format of the texture and is
     specified in Table X.5.

     This instruction specifies a particular texture target, ignoring the
     standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
     TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
     OpenGL.  If the specified texture target has a consistent set of images, a
     lookup is performed.  Otherwise, the result of the instruction is the
     vector (0,0,0,0).

     Although this instruction allows the selection of any texture target, a
     fragment program cannot use more than one texture target for any given
     texture image unit.


     Section 3.11.5.41,  UP2H:  Unpack Two 16-Bit Floats

     The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
     scalar operand.  The first 16-bit float (stored in the 16 least
     significant bits) is written into the "x" and "z" components of the result
     vector; the second is written into the "y" and "w" components of the
     result vector.

     This operation undoes the type conversion and packing performed by the
     PK2H instruction.

       tmp = ScalarLoad(op0);
       result.x = (fp16) (RawBits(tmp) & 0xFFFF);
       result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
       result.z = (fp16) (RawBits(tmp) & 0xFFFF);
       result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);

     Since the source operand must be a 32-bit scalar, a fragment program will
     fail to load if the operand is not obtained from a register with 32-bit
     components or from a program parameter.
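
     The unpacking step can be sketched on a host in Python.  As an
     assumption, the 16-bit floating-point format is taken to match IEEE 754
     binary16 (the struct module's 'e' format code); the function names are
     hypothetical, and the operand is given as a 32-bit integer bit pattern.

```python
import struct

def half_from_bits(bits16):
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack('<e', struct.pack('<H', bits16))[0]

def up2h(packed):
    """Unpack the low half into x and z, and the high half into y and w."""
    lo = half_from_bits(packed & 0xFFFF)
    hi = half_from_bits((packed >> 16) & 0xFFFF)
    return (lo, hi, lo, hi)
```

     For example, up2h(0xC0003C00) gives (1.0, -2.0, 1.0, -2.0).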


     Section 3.11.5.42,  UP2US:  Unpack Two Unsigned 16-Bit Scalars

     The UP2US instruction unpacks two 16-bit unsigned values packed together
     in a 32-bit scalar operand.  The unsigned quantities are encoded where a
     bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
     bits corresponds to 1.0.  The "x" and "z" components of the result vector
     are obtained from the 16 least significant bits of the operand; the "y"
     and "w" components are obtained from the 16 most significant bits.

     This operation undoes the type conversion and packing performed by the
     PK2US instruction.

       tmp = ScalarLoad(op0);
       result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
       result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
       result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
       result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

     Since the source operand must be a 32-bit scalar, a fragment program will
     fail to load if the operand is not obtained from a register with 32-bit
     components or from a program parameter.
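
     As an illustration only, the UP2US pseudocode above might be emulated in
     C as follows.  The function name up2us is hypothetical, and the raw
     32-bit pattern is assumed to be passed in directly as an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulate UP2US: each 16-bit field maps 0x0000 to 0.0 and 0xFFFF to 1.0. */
void up2us(uint32_t raw, float result[4])
{
    assert(result != NULL);
    float lo = (float)((raw & 0xFFFFu) / 65535.0);
    float hi = (float)(((raw >> 16) & 0xFFFFu) / 65535.0);
    result[0] = lo;   /* x: 16 least significant bits */
    result[1] = hi;   /* y: 16 most significant bits  */
    result[2] = lo;   /* z repeats x */
    result[3] = hi;   /* w repeats y */
}
```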


     Section 3.11.5.43,  UP4B:  Unpack Four Signed 8-Bit Values

     The UP4B instruction unpacks four 8-bit signed values packed together in a
     32-bit scalar operand.  The signed quantities are encoded where a bit
     pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'
     bits corresponds to +127/127.  The "x" component of the result vector is
     the converted value corresponding to the 8 least significant bits of the
     operand; the "w" component corresponds to the 8 most significant bits.

     This operation undoes the type conversion and packing performed by the
     PK4B instruction.

       tmp = ScalarLoad(op0);
       result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
       result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
       result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
       result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

     Since the source operand must be a 32-bit scalar, a fragment program will
     fail to load if the operand is not obtained from a register with 32-bit
     components or from a program parameter.
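
     The biased encoding can be easier to follow in executable form.  The C
     sketch below is illustrative only; the function name up4b is
     hypothetical and the raw 32-bit pattern is assumed to be supplied as an
     integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulate UP4B: each byte is biased by 128 and scaled by 1/127. */
void up4b(uint32_t raw, float result[4])
{
    assert(result != NULL);
    for (int i = 0; i < 4; i++) {
        int byte = (int)((raw >> (8 * i)) & 0xFFu);
        result[i] = (float)((byte - 128) / 127.0);
    }
}
```

     Under this encoding a byte of 0x80 unpacks to exactly 0.0 and 0xFF to
     exactly 1.0, while 0x00 unpacks to -128/127, which is slightly below
     -1.0.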


     Section 3.11.5.44,  UP4UB:  Unpack Four Unsigned 8-Bit Scalars

     The UP4UB instruction unpacks four 8-bit unsigned values packed together
     in a 32-bit scalar operand.  The unsigned quantities are encoded where a
     bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
     bits corresponds to 1.0.  The "x" component of the result vector is
     obtained from the 8 least significant bits of the operand; the "w"
     component is obtained from the 8 most significant bits.

     This operation undoes the type conversion and packing performed by the
     PK4UB instruction.

       tmp = ScalarLoad(op0);
       result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
       result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
       result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
       result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

     Since the source operand must be a 32-bit scalar, a fragment program will
     fail to load if the operand is not obtained from a register with 32-bit
     components or from a program parameter.
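
     A typical use of UP4UB is expanding a packed 8-bit-per-channel color.
     The C sketch below is illustrative only; up4ub is a hypothetical name,
     and storing red in the least significant byte is an assumption of the
     example, not a requirement of the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulate UP4UB: byte i of the operand becomes component i in [0,1]. */
void up4ub(uint32_t raw, float result[4])
{
    assert(result != NULL);
    for (int i = 0; i < 4; i++) {
        result[i] = (float)(((raw >> (8 * i)) & 0xFFu) / 255.0);
    }
}
```

     With red in the low byte, the pattern 0xFF0000FF unpacks to opaque red,
     (1.0, 0.0, 0.0, 1.0).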


     Section 3.11.5.45,  X2D:  2D Coordinate Transformation

     The X2D instruction multiplies the 2D offset vector specified by the "x"
     and "y" components of the second vector operand by the 2x2 matrix
     specified by the four components of the third vector operand, and adds the
     transformed offset vector to the 2D vector specified by the "x" and "y"
     components of the first vector operand.  The first component of the sum is
     written to the "x" and "z" components of the result; the second component
     is written to the "y" and "w" components of the result.

     The X2D instruction can be used to displace texture coordinates in the
     same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader
     extension.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
       result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
       result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
       result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
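
     The following C sketch, which is not part of this extension, restates
     the X2D pseudocode with plain float arrays.  The function name x2d is
     hypothetical.  The third operand is read as a row-major 2x2 matrix
     (x, y) / (z, w), matching the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>

/* Emulate X2D: result.xy = op0.xy + M * op1.xy, replicated into zw,
 * where M = [ op2.x op2.y ; op2.z op2.w ]. */
void x2d(const float op0[4], const float op1[4], const float op2[4],
         float result[4])
{
    assert(op0 != NULL && op1 != NULL && op2 != NULL && result != NULL);
    float sx = op0[0] + op1[0] * op2[0] + op1[1] * op2[1];
    float sy = op0[1] + op1[0] * op2[2] + op1[1] * op2[3];
    result[0] = sx;
    result[1] = sy;
    result[2] = sx;
    result[3] = sy;
}
```

     With an identity matrix (1, 0, 0, 1) the instruction reduces to a plain
     2D add of the offset to the base coordinate.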


     Section 3.11.6, Fragment Program Outputs

     Upon completion of fragment program execution, the output registers are
     used to replace the fragment's associated data.

     The RGBA color of the fragment is taken from the color output register
     used by the program (COLR or COLH).  The R, G, B, and A color components
     are extracted from the "x", "y", "z", and "w" components, respectively, of
     the output register and are clamped to the range [0,1].

     If the DEPR output register is written by the fragment program, the depth
     value of the fragment is taken from the z component of the DEPR output
     register.  If depth clamping is enabled, the depth value is clamped to the
     range [min(n,f), max(n,f)], where n and f are the near and far depth range
     values.  If depth clamping is disabled, the fragment is discarded if its
     depth value is outside the range [min(n,f), max(n,f)].
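
     The depth rule above can be sketched in C as follows.  This is an
     illustration only; the function name resolve_fragment_depth and its
     parameters are hypothetical, and depth values are modeled as doubles
     rather than as window-space fixed-point values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Apply the DEPR output rule.  Returns false if the fragment is discarded;
 * otherwise stores the final depth in *out_depth and returns true. */
bool resolve_fragment_depth(double depth, double n, double f,
                            bool depth_clamp_enabled, double *out_depth)
{
    assert(out_depth != NULL);
    double lo = n < f ? n : f;   /* min(n,f) */
    double hi = n < f ? f : n;   /* max(n,f) */

    if (depth_clamp_enabled) {
        if (depth < lo) depth = lo;
        if (depth > hi) depth = hi;
    } else if (depth < lo || depth > hi) {
        return false;            /* outside the depth range: discard */
    }
    *out_depth = depth;
    return true;
}
```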


     Section 3.11.7, Required Fragment Program State

     The state required for managing fragment programs consists of:

       a bit indicating whether or not fragment program mode is enabled;

       an unsigned integer naming the currently bound fragment program;

       and the state that must be maintained to indicate which integers are
       currently in use as fragment program names.

     Fragment program mode is initially disabled.  The initial state of all 128
     fragment program parameter registers is (0,0,0,0).  The initial currently
     bound fragment program is zero.

     Each fragment program object consists of:

       an enumerant giving the program target (FRAGMENT_PROGRAM_NV);

       a boolean indicating whether the program is resident;

       an array of type ubyte containing the program string;

       an integer representing the length of the program string array;

       one four-component floating-point vector for each named local
       parameter in the program;

       and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component
       floating-point vectors to hold numbered local parameters, each initially
       set to (0,0,0,0).

     Initially, no program objects exist.

     Additionally, the state required during the execution of a fragment
     program consists of:  twelve 4-component floating-point fragment attribute
     registers, thirty-two 128-bit physical temporary registers, and a single
     4-component condition code, whose components have one of four values (LT,
     EQ, GT, or UN).

     Each time a fragment program is executed, the fragment attribute registers
     are initialized with the fragment's location and associated data, all
     temporary register components are initialized to zero, and all condition
     code components are initialized to EQ.


     Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).
     No changes to the text of the section.


 Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
 Operations and the Framebuffer)

     None

 Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)

     Add new section 5.7, Programs (after "Flush and Finish")

     Programs are specified as an array of ubytes used to control the operation
     of portions of the GL.  The array is a string of ASCII characters encoding
     the program.

     The command

       LoadProgramNV(enum target, uint id, sizei len, const ubyte *program);

     loads a program.  The target parameter specifies the type of program
     loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or
     FRAGMENT_PROGRAM_NV.  VERTEX_PROGRAM_NV specifies a program to be executed
     in vertex program mode as each vertex is specified.
     VERTEX_STATE_PROGRAM_NV specifies a program to be run manually to update
     vertex state.  FRAGMENT_PROGRAM_NV specifies a program to be executed in
     fragment program mode as each fragment is rasterized.

     Multiple programs can be loaded with different names.  id names the
     program to load.  The name space for programs is the set of positive
     integers (zero is reserved).  The error INVALID_VALUE is generated by
     LoadProgramNV if a program is loaded with an id of zero.  The error
     INVALID_OPERATION is generated by LoadProgramNV if a program is loaded
     for an id that is currently loaded with a program of a different program
     target.  program is a pointer to an array of ubytes that represents the
     program being loaded.  The length of the array in ubytes is indicated by
     len.

     At program load time, the program is parsed into a set of tokens possibly
     separated by white space.  Spaces, tabs, newlines, carriage returns, and
     comments are considered whitespace.  Comments begin with the character "#"
     and are terminated by a newline, a carriage return, or the end of the
     program array.  Tokens are processed in a case-sensitive manner:  upper
     and lower-case letters are not considered equivalent.
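
     As an illustration of the whitespace and comment rules above, a loader
     might skip to the next token as sketched below in C.  The function name
     skip_whitespace is hypothetical and the sketch does not describe any
     particular implementation's parser.

```c
#include <assert.h>
#include <stddef.h>

/* Starting at offset pos, skip spaces, tabs, newlines, carriage returns,
 * and '#' comments.  Returns the offset of the next token character, or
 * len if only whitespace remains. */
size_t skip_whitespace(const unsigned char *program, size_t len, size_t pos)
{
    assert(program != NULL && pos <= len);
    while (pos < len) {
        unsigned char c = program[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the end
             * of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                pos++;
        } else {
            break;
        }
    }
    return pos;
}
```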

     Each program target has a corresponding Backus-Naur Form (BNF) grammar
     specifying the syntactically valid sequences for programs of the specified
     type.  The set of valid tokens can be inferred from the grammar.  The
     token "" represents an empty string and is used to indicate optional
     rules.  A program is invalid if it contains any undefined tokens or
     characters.

     The error INVALID_OPERATION is generated by LoadProgramNV if a program
     fails to load because it is not syntactically correct or fails to satisfy
     all of the semantic restrictions corresponding to the program target.

     A successfully loaded program is parsed into a sequence of instructions.
     Each instruction is identified by its tokenized name.  The operation of
     these instructions is specific to the program target and is defined
     elsewhere.

     A successfully loaded program replaces the program previously assigned to
     the name specified by id.  If the OUT_OF_MEMORY error is generated by
     LoadProgramNV, no change is made to the previous contents of the named
     program.

     Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
     into the program string most recently passed to LoadProgramNV indicating
     the position of the first error, if any, in the program.  If the program
     fails to load because of a semantic restriction that cannot be determined
     until the program is fully scanned, the error position will be len, the
     length of the program.  If the program loads successfully, the value of
     PROGRAM_ERROR_POSITION_NV is assigned the value negative one.
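
     An application will often want to report the error position as a line
     and column.  The C sketch below is illustrative only and makes no GL
     calls; it assumes the application has already queried
     PROGRAM_ERROR_POSITION_NV into error_pos.  The function name
     locate_error is hypothetical, and treating only the newline character
     as a line separator is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Translate an error offset into a 1-based line and column.  Returns 0 if
 * error_pos is negative (no error) or beyond len, and 1 otherwise.  An
 * offset equal to len refers to the end of the program string. */
int locate_error(const unsigned char *program, size_t len, long error_pos,
                 int *line, int *column)
{
    assert(program != NULL && line != NULL && column != NULL);
    if (error_pos < 0 || (size_t)error_pos > len)
        return 0;

    *line = 1;
    *column = 1;
    for (size_t i = 0; i < (size_t)error_pos; i++) {
        if (program[i] == '\n') {
            (*line)++;
            *column = 1;
        } else {
            (*column)++;
        }
    }
    return 1;
}
```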

     For targets whose programs are executed automatically (e.g., vertex and
     fragment programs), there must be a current program.  The current vertex
     program is executed automatically in vertex program mode as vertices are
     specified.  The current fragment program is executed automatically in
     fragment program mode as fragments are generated by rasterization.
     Current programs for a program target are updated by

       BindProgramNV(enum target, uint id);

     where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV.  The error
     INVALID_OPERATION is generated by BindProgramNV if id names a program that
     has a type different than target (for example, if id names a vertex state
     program as described in section 2.14.4).

     Binding to a nonexistent program id does not generate an error.  In
     particular, binding to program id zero does not generate an error.
     However, because program zero cannot be loaded, program zero is always
     nonexistent.  If a program id is successfully loaded with a new vertex
     program and id is also the currently bound vertex program, the new program
     is considered the currently bound vertex program.

     The INVALID_OPERATION error is generated when both vertex program mode is
     enabled and Begin is called (or when a command that performs an implicit
     Begin is called) if the current vertex program is nonexistent or not
     valid.  A vertex program may not be valid for reasons explained in section
     2.14.5.

     The INVALID_OPERATION error is generated when both fragment program mode
     is enabled and Begin, another GL command that performs an implicit Begin,
     or any other GL command that generates fragments is called, if the current
     fragment program is nonexistent or not valid.  A fragment program may be
     invalid for reasons explained in Section 3.11.3.

     Programs are deleted by calling

       void DeleteProgramsNV(sizei n, const uint *ids);

     ids contains n names of programs to be deleted.  After a program is
     deleted, it becomes nonexistent, and its name is again unused.  If a
     program that is currently bound is deleted, it is as though BindProgramNV
     has been executed with the same target as the deleted program and program
     zero.  Unused names in ids are silently ignored, as is the value zero.

     The command

       void GenProgramsNV(sizei n, uint *ids);

     returns n currently unused program names in ids.  These names are marked
     as used, for the purposes of GenProgramsNV only, but they become existent
     programs only when they are first loaded using LoadProgramNV.

     An implementation may choose to establish a working set of programs on
     which binding and/or manual execution are performed with higher
     performance.  A program that is currently part of this working set is said
     to be resident.

     The command

       boolean AreProgramsResidentNV(sizei n, const uint *ids,
                                     boolean *residences);

     returns TRUE if all of the n programs named in ids are resident, or if the
     implementation does not distinguish a working set.  If at least one of the
     programs named in ids is not resident, then FALSE is returned, and the
     residence of each program is returned in residences.  Otherwise the
     contents of residences are not changed.  If any of the names in ids are
     nonexistent or zero, FALSE is returned, the error INVALID_VALUE is
     generated, and the contents of residences are indeterminate.  The
     residence status of a single named program can also be queried by calling
     GetProgramivNV (Section 6.1.13) with id set to the name of the program and
     pname set to PROGRAM_RESIDENT_NV.

     AreProgramsResidentNV indicates only whether a program is currently
     resident, not whether it could be made resident.  An implementation
     may choose to make a program resident only on first use, for example.  The
     client may guide the GL implementation in determining which programs
     should be resident by requesting a set of programs to make resident.

     The command

       void RequestResidentProgramsNV(sizei n, const uint *ids);

     requests that the n programs named in ids should be made resident.
     While all the programs are not guaranteed to become resident,
     the implementation should make a best effort to make as many of
     the programs resident as possible.  As a result of making the
     requested programs resident, program names not among the requested
     programs may become non-resident.  Higher priority for residency
     should be given to programs listed earlier in the ids array.
     RequestResidentProgramsNV silently ignores attempts to make resident
     nonexistent program names or zero.  AreProgramsResidentNV can be
     called after RequestResidentProgramsNV to determine which programs
     actually became resident.

     The commands

       void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                      float x, float y, float z, float w);
       void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                      double x, double y, double z, double w);
       void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                       const float v[]);
       void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                       const double v[]);

     specify a new value for the named program local parameter <name> belonging
     to the fragment program specified by <id>.  <name> is a pointer to an
     array of ubytes holding the parameter name.  <len> specifies the number of
     ubytes in the array given by <name>.  The new x, y, z, and w components of
     the named local parameter are given by x, y, z, and w, respectively, for
     ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0],
     v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and
     ProgramNamedParameter4dvNV.  The error INVALID_OPERATION is generated if
     <id> specifies a nonexistent program or a program whose type does not
     support named local parameters.  The error INVALID_VALUE is generated
     if <name> does not specify the name of a local parameter in the program
     corresponding to <id>.  The error INVALID_VALUE is also generated if <len>
     is zero.

     The commands

       void ProgramLocalParameter4fARB(enum target, uint index,
                                       float x, float y, float z, float w);
       void ProgramLocalParameter4fvARB(enum target, uint index,
                                        const float *params);
       void ProgramLocalParameter4dARB(enum target, uint index,
                                       double x, double y, double z, double w);
       void ProgramLocalParameter4dvARB(enum target, uint index,
                                        const double *params);

     update the values of the numbered program local parameter <index>
     belonging to the program object currently bound to <target>.  For
     ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four
     components of the parameter are updated with the values of <x>, <y>, <z>,
     and <w>, respectively.  For ProgramLocalParameter4fvARB and
     ProgramLocalParameter4dvARB, the four components of the parameter are
     updated with the array of four values pointed to by <params>.  The error
     INVALID_VALUE is generated if <index> is greater than or equal to the
     number of numbered program local parameters supported by <target>.


 Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and
 State Requests)

     Modify Section 6.1.11, Pointer and String Queries (p. 206)

     (modify last paragraph, p. 206) ... The possible values for <name> are
     VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.

     (add after last paragraph of section, p. 207) Queries of
     PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent
     program load error string.  If the last call to LoadProgramNV failed to
     load a program, the returned string describes a reason that the program
     failed to load.  Otherwise, a pointer to an empty string (containing only
     a terminator) is returned.

     Rename and modify Section 6.1.13, Vertex and Fragment Program Queries
     (from GL_NV_vertex_program).  Portions of this section pertaining to
     fragment programs are copied verbatim.

     (insert after discussion of GetProgramParameter[fd]vNV)

     The commands

       void GetProgramNamedParameterfvNV(uint id, sizei len,
                                         const ubyte *name, float *params);
       void GetProgramNamedParameterdvNV(uint id, sizei len,
                                         const ubyte *name, double *params);

     obtain the current program named local parameter value for the parameter
     named <name> belonging to the program given by <id>.  <name> is a pointer
     to an array of ubytes holding the parameter name.  <len> specifies the
     number of ubytes in the array given by <name>.  The error
     INVALID_OPERATION is generated if <id> specifies a nonexistent program or
     a program whose type does not support named local parameters.  The error
     INVALID_VALUE is generated if <name> does not specify the name of a local
     parameter in the program corresponding to <id>.  The error INVALID_VALUE
     is also generated if <len> is zero.  Each named program local parameter is
     an array of four values.

     The commands

       void GetProgramLocalParameterdvARB(enum target, uint index,
                                          double *params);
       void GetProgramLocalParameterfvARB(enum target, uint index,
                                          float *params);

     obtain the current value for the numbered program local parameter <index>
     belonging to the program object currently bound to <target>, and places
     the information in the array <params>.  The error INVALID_ENUM is
     generated if <target> specifies a nonexistent program target or a program
     target that does not support numbered program local parameters.  The error
     INVALID_VALUE is generated if <index> is greater than or equal to the
     implementation-dependent number of supported numbered program local
     parameters for the program target.

     When the program target type is FRAGMENT_PROGRAM_NV, each numbered program
     local parameter returned is an array of four values.  ...

     The command

       void GetProgramivNV(uint id, enum pname, int *params);

     obtains program state named by pname for the program named id in the array
     params.  pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or
     PROGRAM_RESIDENT_NV.  The error INVALID_OPERATION is generated if the
     program named id does not exist.

     The command

       void GetProgramStringNV(uint id, enum pname,
                               ubyte *program);

     obtains the program string for program id.  pname must be
     PROGRAM_STRING_NV.  n ubytes are returned into the array program
     where n is the length of the program in ubytes.  GetProgramivNV with
     PROGRAM_LENGTH_NV can be used to query the length of a program's
     string.  The INVALID_OPERATION error is generated if the program
     named id does not exist.

     ...

     The command

       boolean IsProgramNV(uint id);

     returns TRUE if id is the name of a program object.  If id is zero or
     is a non-zero value that is not the name of a program object, or if an
     error condition occurs, IsProgramNV returns FALSE.  A name returned by
     GenProgramsNV but not yet loaded with a program is not the name of a
     program object.


 Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)

     Modify Section F.2.3 (Changes to Section 2.6), p.240

     (modify last paragraph on p.240) ... Multiple sets of texture coordinates
     may be used to specify how multiple texture images are mapped onto a
     primitive.  The number of texture coordinate sets supported is
     implementation dependent, but must be at least 1.  The number of texture
     coordinate sets supported may be queried with the state
     MAX_TEXTURE_COORDS_NV.

     Modify Section F.2.4 (Changes to Section 2.7), p.241

     (modify the last paragraph on p.241, carrying over to p.243)
     Implementations may support more than one set of texture coordinates.  The
     commands

         void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)
         void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)

     take the coordinate set to be modified as the <texture> parameter.
     <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that
     texture coordinate set i is to be modified.  The constants obey
     TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
     the implementation dependent number of texture coordinate sets defined by
     MAX_TEXTURE_COORDS_NV).


     Modify Section F.2.5 (Changes to Section 2.8), p.243

     (modify first and second paragraphs of section) ... The client may specify
     up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store
     vertex coordinates...

     In implementations which support more than one texture coordinate set, the
     command

         void ClientActiveTextureARB(enum texture)

     is used to select the vertex array client state parameters to be modified
     by the TexCoordPointer command and the array affected by EnableClientState
     and DisableClientState with the parameter TEXTURE_COORD_ARRAY.  This
     command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB.  Each texture
     coordinate set has a client state vector which is selected when this
     command is invoked.  This state vector also includes the vertex array
     state.  This command also selects the texture coordinate set state used
     for queries of client state.

     (modify first paragraph on p.244) If the number of supported texture
     coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...


     Modify Section F.2.6 (Changes to Section 2.10.2), p.244

     (modify first paragraph)  For each texture coordinate set, a 4x4 matrix is
     applied to the corresponding texture coordinates...

     (replace second and third paragraphs) The command

       void ActiveTextureARB(enum texture);

     specifies the active texture unit selector, ACTIVE_TEXTURE_ARB.  Each
     texture unit contains up to two distinct sub-units:  a texture coordinate
     processing unit (consisting of a texture matrix stack and texture
     coordinate generation state) and a texture image unit (consisting of all
     the texture state defined in Section 3.8).  In implementations with a
     different number of supported texture coordinate sets and texture image
     units, some texture units may consist of only one of the two sub-units.

     The active texture unit selector specifies the texture unit accessed by
     commands involving texture coordinate processing.  Such commands include
     those accessing the current matrix stack (if MATRIX_MODE is TEXTURE),
     TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate
     generation enum is selected), as well as queries of the current texture
     coordinates and current raster texture coordinates.  If the texture unit
     number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater
     than or equal to the implementation dependent constant
     MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any
     such command.

     The active texture unit selector also selects the texture unit accessed by
     commands involving texture image processing (Section 3.8).  Such commands
     include all variants of TexEnv, TexParameter, and TexImage commands,
     BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and
     queries of all such state.  If the texture unit number corresponding to
     the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the
     implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error
     INVALID_OPERATION is generated by any such command.

     ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture>
     is specified.  <texture> is a symbolic constant of the form TEXTUREi_ARB,
     indicating that texture unit i is to be modified.  The constants obey
     TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
     the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).
     For compatibility with old OpenGL specifications, the implementation
     dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of
     conventional texture units supported by the implementation.  Its value
     must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and
     MAX_TEXTURE_IMAGE_UNITS_NV.
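
     The relationship between the two limits can be sketched in C as follows.
     This is an illustration only and makes no GL calls.  The names
     classify_texture_unit and TextureUnitInfo are hypothetical; the value
     0x84C0 for TEXTURE0_ARB is the one assigned by ARB_multitexture, and the
     two limits are assumed to have been queried by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TEXTURE0_ARB 0x84C0u

typedef struct {
    bool has_coord_unit;   /* texture matrix stack and TexGen state exist */
    bool has_image_unit;   /* texture image state (Section 3.8) exists    */
} TextureUnitInfo;

/* Returns false where ActiveTextureARB would generate INVALID_ENUM.
 * Otherwise reports which sub-units the selected texture unit contains;
 * commands touching a missing sub-unit generate INVALID_OPERATION. */
bool classify_texture_unit(unsigned int texture, unsigned int max_coords,
                           unsigned int max_image_units,
                           TextureUnitInfo *info)
{
    unsigned int k = max_coords > max_image_units ? max_coords
                                                  : max_image_units;
    unsigned int i;

    assert(info != NULL);
    if (texture < TEXTURE0_ARB || texture - TEXTURE0_ARB >= k)
        return false;

    i = texture - TEXTURE0_ARB;
    info->has_coord_unit = i < max_coords;
    info->has_image_unit = i < max_image_units;
    return true;
}
```

     For example, with 8 texture coordinate sets and 16 texture image units,
     unit 10 can be selected and used with BindTexture, but TexGen on that
     unit would generate INVALID_OPERATION.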

     Modify Section F.2.12 (Changes to Section 3.8.10), p.249

     (modify next-to-last paragraph) Texturing is enabled and disabled
     individually for each texture unit.  If texturing is disabled for one of
     the units, then the fragment resulting from the previous unit is passed
     unaltered to the following unit.  Individual texture units beyond those
     specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always
     treated as disabled.

     Modify Section F.2.15 (Changes to Section 6.1.2), p.251

     (add to end of paragraph) Queries of texture state variables corresponding
     to texture coordinate processing unit (namely, TexGen state and enables,
     and matrices) will produce an INVALID_OPERATION error if the value of
     ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV.  All
     other texture state queries will result in an INVALID_OPERATION error if
     the value of ACTIVE_TEXTURE_ARB is greater than or equal to
     MAX_TEXTURE_IMAGE_UNITS_NV.

 Additions to the AGL/GLX/WGL Specifications

     Program objects are shared between AGL/GLX/WGL rendering contexts if
     and only if the rendering contexts share display lists.  No change
     is made to the AGL/GLX/WGL API.

 Dependencies on GL_NV_vertex_program

     If NV_vertex_program is supported, the description of LoadProgramNV in
     Section 2.14.1.7 (up to the BNF description of vertex programs) is
     deleted, as it is replaced by the contents of Section 5.7 in this
     specification.  The general error descriptions in Section 2.14.1.7 common
     to Section 5.7 (like INVALID_OPERATION if the program fails to compile)
     should also be deleted.  Section 2.14.1.8 should also be deleted.  Section
     6.1.13 is modified by this specification as described above.

 Dependencies on NV_texture_shader

     If NV_texture_shader is not supported, the comment about texture shaders
     being disabled in fragment program mode is not applicable.

 Dependencies on NV_texture_rectangle

     If NV_texture_rectangle is not supported, the references to "RECT" in the
     <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable.

 Dependencies on ARB_texture_cube_map

     If ARB_texture_cube_map is not supported, the references to "CUBE" in the
     <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.

 Dependencies on EXT_fog_coord

     If EXT_fog_coord is not supported, references to "fog coordinate" in the
     definition of the "FOGC" fragment attribute register should be removed.

 Dependencies on NV_depth_clamp

     If NV_depth_clamp is not supported, section 3.11.6 is modified to remove
     discussion of the depth clamp enable and instead indicate that fragments
     with depth values outside [min(n,f), max(n,f)] are always discarded.

 Dependencies on ARB_depth_texture and SGIX_depth_texture

     If ARB_depth_texture is not supported, but SGIX_depth_texture is
     supported, the discussion of Table X.5 is modified to indicate that
     DEPTH_COMPONENT textures are treated as LUMINANCE.

     If neither extension is supported, the discussion of DEPTH_COMPONENT
     textures in Table X.5 should be removed.

 Dependencies on NV_float_buffer

     If NV_float_buffer is not supported, references to FLOAT_R_NV,
     FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in
     Table X.5 should be removed.

 Dependencies on ARB_vertex_program

     This extension has no explicit dependencies on ARB_vertex_program, but
     the APIs for setting and querying numbered local parameters
     (ProgramLocalParameter*ARB and GetProgramLocalParameter*ARB) were taken
     directly from that extension.

 Dependencies on ARB_fragment_program

     If ARB_fragment_program is not supported, the maximum number of executable
     instructions in any !!FP1.0 program is 1024.  If ARB_fragment_program is
     supported, the maximum number of executable instructions for an !!FP1.0
     program is at least 1024, but can be larger.  The limit can be queried by
     calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and
     <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.


 GLX Protocol

     Most of the GLX protocol needed to implement this extension is described
     in the GL_NV_vertex_program extension specification and will not be
     repeated here.

     The following two rendering commands are potentially large, and hence can
     be sent in a glXRender or glXRenderLarge request.

         ProgramNamedParameter4fvNV
             2           28+len+p        rendering command length
             2           4218            rendering command opcode
             4           CARD32          id
             4           CARD32          len
             4           FLOAT32         params[0]
             4           FLOAT32         params[1]
             4           FLOAT32         params[2]
             4           FLOAT32         params[3]
             len         LISTofCARD8     name
             p                           unused, p=pad(len)

          If the command is encoded in a glXRenderLarge request, the command
          opcode and command length fields above are expanded to 4 bytes each:

             4           32+len+p        rendering command length
             4           4218            rendering command opcode


         ProgramNamedParameter4dvNV
             2           44+len+p        rendering command length
             2           4219            rendering command opcode
             4           CARD32          id
             4           CARD32          len
             8           FLOAT64         params[0]
             8           FLOAT64         params[1]
             8           FLOAT64         params[2]
             8           FLOAT64         params[3]
             len         LISTofCARD8     name
             p                           unused, p=pad(len)

          If the command is encoded in a glXRenderLarge request, the command
          opcode and command length fields above are expanded to 4 bytes each:

             4           48+len+p        rendering command length
             4           4219            rendering command opcode
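
     As an illustration of the layouts above, the following C sketch packs a
     ProgramNamedParameter4fvNV command in its glXRender form.  The names
     glx_pad and encode_named_param4fv are hypothetical, fields are written in
     host byte order (an assumption of this sketch), and the caller is assumed
     to supply a buffer of at least 28+len+p bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* pad(len): number of bytes needed to round len up to a multiple of 4. */
static size_t glx_pad(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Hypothetical encoder for the glXRender form of ProgramNamedParameter4fvNV.
 * Returns the number of bytes written, which equals 28+len+p. */
static size_t encode_named_param4fv(uint8_t *out, uint32_t id,
                                    const char *name, uint32_t len,
                                    const float params[4])
{
    size_t p = glx_pad(len);
    uint16_t cmd_len = (uint16_t)(28 + len + p);  /* rendering command length */
    uint16_t opcode = 4218;                       /* rendering command opcode */
    size_t off = 0;

    memcpy(out + off, &cmd_len, 2); off += 2;
    memcpy(out + off, &opcode, 2);  off += 2;
    memcpy(out + off, &id, 4);      off += 4;
    memcpy(out + off, &len, 4);     off += 4;
    memcpy(out + off, params, 16);  off += 16;    /* params[0..3] as FLOAT32 */
    memcpy(out + off, name, len);   off += len;   /* name, not NUL-terminated */
    memset(out + off, 0, p);        off += p;     /* unused padding bytes */
    return off;
}
```

     For a five-byte name such as "color", p is 3 and the command occupies
     36 bytes.  The 4dv variant differs only in the opcode (4219) and in
     carrying four FLOAT64 values, which gives the base length of 44.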


     The remaining two commands are non-rendering commands.  These commands are
     sent separately (i.e., not as part of a glXRender or glXRenderLarge
     request), using the glXVendorPrivateWithReply request:

         GetProgramNamedParameterfvNV
             1           CARD8           opcode (X assigned)
             1           17              GLX opcode (glXVendorPrivateWithReply)
             2           4+(len+p)/4     request length
             4           1310            vendor specific opcode
             4           GLX_CONTEXT_TAG context tag
             4           INT32           len
             len         LISTofCARD8     name
             p                           unused, p=pad(len)
           =>

           If the command succeeds, 4 floats are sent in the reply:

             1           1               reply
             1                           unused
             2           CARD16          sequence number
             4           4               reply length
             24                          unused
             16          LISTofFLOAT32   params

           Otherwise, an empty reply is sent, indicating that a GL error
           occurred:

             1           1               reply
             1                           unused
             2           CARD16          sequence number
             4           0               reply length
             24                          unused


         GetProgramNamedParameterdvNV
             1           CARD8           opcode (X assigned)
             1           17              GLX opcode (glXVendorPrivateWithReply)
             2           4+(len+p)/4     request length
             4           1311            vendor specific opcode
             4           GLX_CONTEXT_TAG context tag
             4           INT32           len
             len         LISTofCARD8     name
             p                           unused, p=pad(len)
           =>

           If the command succeeds, 4 doubles are sent in the reply:

             1           1               reply
             1                           unused
             2           CARD16          sequence number
             4           8               reply length
             24                          unused
             32          LISTofFLOAT64   params

           Otherwise, an empty reply is sent, indicating that a GL error
           occurred:

             1           1               reply
             1                           unused
             2           CARD16          sequence number
             4           0               reply length
             24                          unused
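
     The request and reply sizes above follow from simple arithmetic, sketched
     here in C.  The function names are hypothetical.  Lengths in the request
     and reply headers are expressed in 4-byte units; the fixed part of the
     request is 16 bytes (4 units) and the fixed part of every reply is
     32 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Request length field for GetProgramNamedParameter{fv,dv}NV, in 4-byte
 * units: 4 units of fixed header plus the padded name. */
static uint16_t named_param_request_length(size_t name_len)
{
    size_t p = (4 - (name_len & 3)) & 3;          /* p = pad(len) */
    return (uint16_t)(4 + (name_len + p) / 4);
}

/* Total reply size in bytes: 32-byte fixed part plus 4 bytes per unit of
 * the reply length field. */
static size_t named_param_reply_bytes(uint32_t reply_length)
{
    return 32 + 4 * (size_t)reply_length;
}

/* Nonzero if the reply length matches a successful query: 4 units for the
 * fv form (16 bytes of FLOAT32) or 8 units for the dv form (32 bytes of
 * FLOAT64).  A reply length of 0 indicates that a GL error occurred. */
static int named_param_reply_ok(uint32_t reply_length, int is_double)
{
    return reply_length == (is_double ? 8u : 4u);
}
```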


 Errors

     INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels,
     or a command that performs an implicit Begin if FRAGMENT_PROGRAM_NV is
     enabled and the currently bound fragment program does not exist.

     INVALID_OPERATION is generated by ProgramNamedParameter4fNV,
     ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
     ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
     GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a
     program whose type does not support local parameters.

     INVALID_VALUE is generated by ProgramNamedParameter4fNV,
     ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
     ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
     GetProgramNamedParameterdvNV if <len> is zero.

     INVALID_VALUE is generated by ProgramNamedParameter4fNV,
     ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
     ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
     GetProgramNamedParameterdvNV if <name> does not specify the name of a
     local parameter in the program corresponding to <id>.

     INVALID_OPERATION is generated by any command accessing texture coordinate
     processing state if the texture unit number corresponding to the current
     value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
     dependent constant MAX_TEXTURE_COORDS_NV.

     INVALID_OPERATION is generated by any command accessing texture image
     processing state if the texture unit number corresponding to the current
     value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
     dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.
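
     The two texture unit checks above can be modeled as follows.  This is a
     sketch, not driver code: TexStateKind and check_active_texture are
     hypothetical names, and <active_unit> is assumed to be the value of
     ACTIVE_TEXTURE_ARB minus TEXTURE0_ARB.

```c
/* GL error codes, as defined by OpenGL. */
#define SKETCH_NO_ERROR           0
#define SKETCH_INVALID_OPERATION  0x0502

/* Which class of per-unit state a command accesses (hypothetical). */
typedef enum {
    TEX_STATE_COORD,   /* texture coordinate processing state */
    TEX_STATE_IMAGE    /* texture image processing state */
} TexStateKind;

/* Returns the error a command would generate for the active texture unit,
 * given the implementation's MAX_TEXTURE_COORDS_NV and
 * MAX_TEXTURE_IMAGE_UNITS_NV values. */
static unsigned int check_active_texture(unsigned int active_unit,
                                         TexStateKind kind,
                                         unsigned int max_texture_coords,
                                         unsigned int max_texture_image_units)
{
    unsigned int limit = (kind == TEX_STATE_COORD) ? max_texture_coords
                                                   : max_texture_image_units;
    return (active_unit >= limit) ? SKETCH_INVALID_OPERATION
                                  : SKETCH_NO_ERROR;
}
```

     For example, on an implementation reporting 8 texture coordinate sets
     and 16 texture image units (values chosen for illustration), a command
     accessing texture coordinate state while unit 10 is active generates
     INVALID_OPERATION, whereas a command accessing texture image state on
     the same unit does not.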


     (The following are error descriptions copied from GL_NV_vertex_program
      that apply to this extension as well.  These modifications do not affect
      the behavior of that extension.)

     INVALID_VALUE is generated by LoadProgramNV if id is zero.

     INVALID_OPERATION is generated by LoadProgramNV if the program
     corresponding to id is currently loaded but has a program type different
     from that given by target.

     INVALID_OPERATION is generated by LoadProgramNV if the program specified
     is syntactically incorrect for the program type specified by target.  The
     value of PROGRAM_ERROR_POSITION_NV is still updated when this error is
     generated.

     INVALID_OPERATION is generated by LoadProgramNV if the program specified
     fails to conform to any of the semantic restrictions imposed on programs
     of the type specified by target.  The value of PROGRAM_ERROR_POSITION_NV
     is still updated when this error is generated.

     INVALID_OPERATION is generated by BindProgramNV if target does not match
     the type of the program named by id.

     INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried
     programs are zero or do not exist.

     INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if
     the program named id does not exist.


 New State

 Get Value                          Type  Get Command              Initial Value  Description         Section   Attribute
 ---------------------------------  ----  -----------------------  -------------  ------------------  --------  ------------
 FRAGMENT_PROGRAM_NV                B     IsEnabled                FALSE          fragment program    3.11      enable
                                                                                  mode enable
 FRAGMENT_PROGRAM_BINDING_NV        Z+    GetIntegerv              0              bound fragment      5.7       -
                                                                                  program

 Table X.6.  New State Introduced by NV_fragment_program.


 Get Value                  Type    Get Command          Initial Value  Description         Section   Attribute
 -------------------------  ------  ------------------   -------------  ------------------  --------  ---------
 PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv          -1             program error       5.7       -
                                                                        position
 PROGRAM_TARGET_NV          Z2      GetProgramivNV       0              program target      6.1.13    -
 PROGRAM_LENGTH_NV          Z+      GetProgramivNV       0              program length      6.1.13    -
 PROGRAM_RESIDENT_NV        Z2      GetProgramivNV       FALSE          program residency   6.1.13    -
 PROGRAM_STRING_NV          ubxn    GetProgramStringNV   ""             program string      6.1.13    -
 -                          nxR4    GetProgramNamed-     (0,0,0,0)      named program local 5.7       -
                                    ParameterNV                         parameter value
 -                          64+xR4  GetProgramLocal-     (0,0,0,0)      numbered program    5.7       -
                                    ParameterARB                        local parameter

 Table X.7.  Program Object State common to NV_vertex_program and NV_fragment_program.


 Get Value    Type    Get Command   Initial Value  Description               Section   Attribute
 ---------    ------  -----------   -------------  -----------------------   --------  ---------
 -            12xR4   -             fragment data  fragment attribute
                                                   registers                 3.11.1.1  -
 -            16xR4   -             (0,0,0,0)      fp32 temporary registers  3.11.1.2  -
 -            32xR4   -             (0,0,0,0)      fp16 temporary registers  3.11.1.2  -
              (Z_4)4  -             (EQ,EQ,EQ,EQ)  condition code register   3.11.1.4  -
                                                   address register

 Table X.8.  Fragment Program Per-Fragment Execution State.


 New Implementation Dependent State

                                                  Minimum
 Get Value                   Type   Get Command    Value       Description    Section  Attribute
 ---------                   ----   -----------   -------  -----------------  -------  ---------
 MAX_TEXTURE_COORDS_NV       Z+     GetIntegerv      2     number of texture  2.6      -
                                                           coordinate sets
                                                           supported
 MAX_TEXTURE_IMAGE_UNITS_NV  Z+     GetIntegerv      2     number of texture  2.10.2   -
                                                           image units
                                                           supported
 MAX_FRAGMENT_PROGRAM_       Z+     GetIntegerv     64     number of numbered 3.11.7   -
   LOCAL_PARAMETERS_NV                                     local parameters
                                                           supported


 Revision History

     Rev.    Date    Author   Changes
     ----  -------- --------  --------------------------------------------
      73   05/23/05  pbrown   Fixed cut-and-paste error in the dependency
                              section where it said "NV_texture_rectangle"
                              instead of "ARB_texture_cube_map".

      72   05/16/04  pbrown   Documented that it's not possible to get results
                              from LG2 that are any more precise than what is
                              available in the fp32 storage format.

      71   04/23/04  pbrown   Fixed incorrect example.

      70   03/20/03  pbrown   Made the instruction count limit for !!FP1.0
                              programs queryable instead of a hard-wired value
                              of 1024.  The limit can be queried using
                              ARB_fragment_program mechanisms, and remains 1024
                              if ARB_fragment_program is unsupported.

      69   02/01/03  pbrown   Removed support for combiner fragment programs
                              (!!FCP1.0).

      68   01/08/03  pbrown   Correct spec language providing examples of NaNs,
                              such as sqrt(-1) or log(-1).  Division by zero
                              produces an infinity, not a NaN.

      67   12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"
                              instruction. The condition code test is not
                              parenthesized in KIL.

      66   10/31/02  pbrown   Cleaned up special cases of POW, including the
                              fact that "POW dst, 0, 0" produces NaN in this
                              spec, not 1.0.

      65   10/28/02  pbrown   Documented that signed HILO textures will have
                              the hemisphere remapping applied, but unsigned
                              textures will not.

      64   09/17/02  pbrown   Minor typo fixes.

      63   08/14/02  pbrown   Clarified the value of the "other" components
                              of f[FOGC].

      62   07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.
                              Simplified the implementation of the temporary
                              and output register limit for combiner
                              programs by counting all four o[TEXn] registers
                              against the limit, whether or not they are
                              written.

      61   07/19/02  pbrown   Renamed ProgramLocalParameter*NV to
                              ProgramNamedParameter*NV to eliminate naming
                              conflicts with ARB_vertex_program (and presumably
                              ARB_fragment_program).

                              Added support for numbered program local
                              parameters for compatibility with the ARB vertex
                              program extension (and upcoming ARB fragment
                              program extension), so it's possible to set local
                              parameters the same way in both extensions.

                              Eliminated the language describing "register
                              slots" and how the "H" and "R" registers overlap.
                              Instead, registers are guaranteed not to overlap,
                              and a semantic limit is added on the number of
                              temporaries and output registers that can be used
                              by a program.

                              Eliminated the requirement that non-combiner
                              programs actually write a color value; the only
                              requirement is that one output register be
                              written.  When using fragment programs that use
                              depth replacement, there may not be a need to
                              compute color if color writes are currently
                              disabled.

                              Cleaned up the issues section.  Added several
                              examples of fragment program operation.

                              Cleaned up GLX protocol.

      59   07/07/02  pbrown   Minor clarifications of texture lookup handling.
                              Documented that DDX and DDY may not always
                              produce infinities.

      58   06/27/02  pbrown   Added clarification that instructions can use the
                              same attribute or parameter register more than
                              once.  Added support for "X" precision on the
                              "set on" instructions.  Removed "X" precision
                              support from DST.

      57   06/27/02  pbrown   Added missing table entries covering the use of
                              floating-point textures.

      56   06/27/02  pbrown   Modified the spec to indicate that depth textures
                              are treated as alpha, luminance, or intensity
                              according to the depth texture mode in ARB_shadow.

      55   06/26/02  pbrown   Corrected the aliased register number and
                              "read-only" mappings for o[DEPR] in combiner
                              programs.

      54   06/05/02  pbrown   Fixed the spec to indicate that near and far
                              frustum clipping is disabled for depth
                              replacement programs.  Fixed the spec to indicate
                              that the register combiners enable is overridden
                              for fragment programs (enabled for combiner
                              programs, disabled for color programs).

      53   05/20/02  pbrown   Miscellaneous bug fixes for wording and
                              special-case handling errors.

      52   05/16/02  pbrown   Added "_SAT" suffix to clamp result vector
                              components to [0,1].  Fixed special case rules
                              for MUL instruction and the "UN" condition code.

      50   04/19/02  pbrown   Added "$" as a legal character in an identifier
                              name.  Added example for fixed and conditional
                              write masks and condition code updates.

      49   04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to
                              return more detailed information on program load
                              failures.

      48   04/02/02  pbrown   Added missing enum value for the
                              FRAGMENT_PROGRAM_BINDING_NV query.

      47   03/15/02  pbrown   Fixed various typos, and an incorrect description
                              of the MAX operation.

      45   01/31/02  pbrown   Renamed the packing and unpacking opcodes to more
                              closely match OpenGL data type naming conventions
                              (PK2 becomes PK2H, PK16 becomes PK2US, PK4
                              becomes PK4B, PKB becomes PK4UB).  Renamed "BEM"
                              instruction to "X2D" to reflect the fact that it
                              does a 2D coordinate transformation (not just a
                              bump mapping operation).  Added PK4UBG and UP4UBG
                              instructions to support sRGB gamma correction
                              when packing and unpacking components.

      44   01/18/02  pbrown   Double the number of available temporaries (16 to
                              32 fp32 vectors).  Add BEM (texture coordinate
                              offset), PKB/UPB (unsigned byte packing), and
                              PK16/UP16 (unsigned short packing) instructions.

      43   01/04/02  pbrown   Documented special cases for comparisons,
                              including the handling of NaN in the SNE
                              instruction. Added automatic generation of a
                              third normal component for HILO textures.
                              Documented the restriction that RFL can't write
                              to the w component of the result.  Trivial fix of
                              the special-cases for RCP.  Fixed minor typo on
                              the TEX instruction.

      40   11/26/01  pbrown   Eliminated "X" precision specifier on those
                              instructions that do complicated math or don't
                              otherwise need it (e.g., "SGE").  Fixed special
                              case math on LG2 instruction.  Eliminated
                              incorrectly specified exponent clamping on LIT
                              instruction.  Fixed description and special-case
                              math on LIT/POW instructions.  Specified that
                              combiner program outputs are clamped to [-1,+1],
                              not [+0,+1].

      39   11/16/01  pbrown   Added semantic restriction that PK2/PK4 must
                              write to a 32-bit register.  Cleaned up the
                              converse restrictions on UP2/UP4, making sure to
                              allow UP2/UP4 from a program parameter.  Fix
                              section numberings and a few typos.

      36   11/07/01  pbrown   Cleaned up explanation of the "negative q is
                              undefined" texture mapping spec restriction.
                              Fixed a nit on the number of condition code
                              values (now 4 with UN - unordered).

      35   10/29/01  pbrown   Add a SUB instruction for programmer
                              convenience. Moved unresolved issue list back to
                              the "Issues" section.  Fix several minor wording
                              issues.  Clarify register combiners/texture
                              shader/fragment program flow control diagram.

      32   10/19/01  pbrown   Document the fragment program restriction that
                              instructions involving f[FOGC] and f[TEX0-TEX7]
                              are always carried out at fp32 precision.

      31   10/19/01  pbrown   Fixed incorrect description of encoding of fp16
                              denorms.

      30   10/12/01  pbrown   Documented (0,0,0,0) local parameter
                              initialization.  Disallow multiple defines of the
                              same token.  Allow tokens that look like a
                              possible register or texture name, but have
                              numbers that are too big (e.g., "TEX24", "R37").
                              Fixed up several grammar bugs.  Documented that
                              LG2 and RSQ now do not automatically take
                              absolute values, plus new math special cases.