 Name

     NV_gpu_program4

 Name Strings

     GL_NV_gpu_program4

 Contact

     Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

 Status

     Shipping for GeForce 8 Series (November 2006)

 Version

     Last Modified Date:         09/11/2014
     NVIDIA Revision:            11

 Number

     322

 Dependencies

     This extension is written against the OpenGL 2.0 specification.

     OpenGL 2.0 is not required, but we expect all implementations of this
     extension will also support OpenGL 2.0.

     This extension is also written against the ARB_vertex_program
     specification, which provides the basic mechanisms for the assembly
     programming model used by this extension.

     This extension serves as the basis for the NV_fragment_program4,
     NV_geometry_program4, and NV_vertex_program4 extensions, which all build
     on this extension to support fragment, geometry, and vertex programs,
     respectively.  If "GL_NV_gpu_program4" is found in the extension string,
     all of these extensions are supported.
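
     As a non-normative illustration of the check described above, the
     following C sketch tests whether a name appears as a complete,
     space-separated entry in an extension string such as the one returned
     by GetString(EXTENSIONS).  The helper name has_extension is
     hypothetical and not part of any GL API.  Whole-token matching is used
     because a plain substring search would also match a longer name that
     merely begins with "GL_NV_gpu_program4".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GL entry point).
 * Returns 1 if `name` appears as a complete space-separated token in
 * `ext_string`, and 0 otherwise. */
int has_extension(const char *ext_string, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_string;

    /* An empty name or a name containing a space can never be a token. */
    if (name_len == 0 || strchr(name, ' ') != NULL)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_string) || (p[-1] == ' ');
        char after = p[name_len];
        int ends_token = (after == ' ') || (after == '\0');

        if (starts_token && ends_token)
            return 1;
        p += name_len;   /* keep searching after this partial match */
    }
    return 0;
}
```

     An application might call has_extension(ext, "GL_NV_gpu_program4")
     before assuming that the fragment, geometry, and vertex program
     extensions built on this one are available.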

     NV_parameter_buffer_object affects the definition of this extension.

     ARB_texture_rectangle trivially affects the definition of this extension.

     EXT_gpu_program_parameters trivially affects the definition of this
     extension.

     EXT_texture_integer trivially affects the definition of this extension.

     EXT_texture_array trivially affects the definition of this extension.

     EXT_texture_buffer_object trivially affects the definition of this
     extension.

     NV_primitive_restart trivially affects the definition of this extension.

 Overview

     This specification documents the common instruction set and basic
     functionality provided by NVIDIA's 4th generation of assembly instruction
     sets supporting programmable graphics pipeline stages.

     The instruction set builds upon the basic framework provided by the
     ARB_vertex_program and ARB_fragment_program extensions to expose
     considerably more capable hardware.  In addition to new capabilities for
     vertex and fragment programs, this extension provides a new program type
     (geometry programs) further described in the NV_geometry_program4
     specification.

     NV_gpu_program4 provides a unified instruction set -- all instruction set
     features are available for all program types, except for a small number of
     features that make sense only for a specific program type.  It provides
     fully capable signed and unsigned integer data types, along with a set of
     arithmetic, logical, and data type conversion instructions capable of
     operating on integers.  It also provides a uniform set of structured
     branching constructs (if tests, loops, and subroutines) that fully support
     run-time condition testing.

     This extension provides several new texture mapping capabilities.  Shadow
     cube maps are supported, where cube map faces can encode depth values.
     Texture lookup instructions can include an immediate texel offset, which
     can assist in advanced filtering.  New instructions are provided to fetch
     a single texel by address in a texture map (TXF) and query the size of a
     specified texture level (TXQ).

     By and large, vertex and fragment programs written to ARB_vertex_program
     and ARB_fragment_program can be ported directly by simply changing the
     program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
     "!!NVfp4.0", and then modifying the code to take advantage of the expanded
     feature set.  There are a small number of areas where this extension is
     not a functional superset of previous vertex program extensions; these
     areas are documented in this specification.
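
     The header change described above can be sketched in C as follows.
     This is a minimal, non-normative example that assumes the program
     string begins exactly with one of the two ARB headers; the function
     name port_program_header is hypothetical.  Only the header is
     rewritten.  The sketch makes no attempt to detect the areas where this
     extension is not a functional superset of earlier extensions, so a
     ported program may still require manual changes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst`, replacing a leading
 * "!!ARBvp1.0" or "!!ARBfp1.0" header with "!!NVvp4.0" or "!!NVfp4.0".
 * Returns 1 on success, 0 if the header is not recognized or `dst` is
 * too small.  The program body is copied unchanged. */
int port_program_header(const char *src, char *dst, size_t dst_size)
{
    const char *new_header;
    size_t old_len = strlen("!!ARBvp1.0");  /* both ARB headers share it */
    int written;

    if (strncmp(src, "!!ARBvp1.0", old_len) == 0)
        new_header = "!!NVvp4.0";
    else if (strncmp(src, "!!ARBfp1.0", old_len) == 0)
        new_header = "!!NVfp4.0";
    else
        return 0;

    /* snprintf reports the length it needed, so truncation is detectable. */
    written = snprintf(dst, dst_size, "%s%s", new_header, src + old_len);
    return written >= 0 && (size_t)written < dst_size;
}
```

     The resulting string would then be loaded with ProgramStringARB in the
     same way as the original program.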


 New Procedures and Functions

     void ProgramLocalParameterI4iNV(enum target, uint index,
                                     int x, int y, int z, int w);
     void ProgramLocalParameterI4ivNV(enum target, uint index,
                                      const int *params);
     void ProgramLocalParametersI4ivNV(enum target, uint index,
                                       sizei count, const int *params);
     void ProgramLocalParameterI4uiNV(enum target, uint index,
                                      uint x, uint y, uint z, uint w);
     void ProgramLocalParameterI4uivNV(enum target, uint index,
                                       const uint *params);
     void ProgramLocalParametersI4uivNV(enum target, uint index,
                                        sizei count, const uint *params);

     void ProgramEnvParameterI4iNV(enum target, uint index,
                                   int x, int y, int z, int w);
     void ProgramEnvParameterI4ivNV(enum target, uint index,
                                    const int *params);
     void ProgramEnvParametersI4ivNV(enum target, uint index,
                                     sizei count, const int *params);
     void ProgramEnvParameterI4uiNV(enum target, uint index,
                                    uint x, uint y, uint z, uint w);
     void ProgramEnvParameterI4uivNV(enum target, uint index,
                                     const uint *params);
     void ProgramEnvParametersI4uivNV(enum target, uint index,
                                      sizei count, const uint *params);

     void GetProgramLocalParameterIivNV(enum target, uint index,
                                        int *params);
     void GetProgramLocalParameterIuivNV(enum target, uint index,
                                         uint *params);
     void GetProgramEnvParameterIivNV(enum target, uint index,
                                      int *params);
     void GetProgramEnvParameterIuivNV(enum target, uint index,
                                       uint *params);

 New Tokens


     Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
     GetFloatv, and GetDoublev:

         MIN_PROGRAM_TEXEL_OFFSET_EXT                    0x8904
         MAX_PROGRAM_TEXEL_OFFSET_EXT                    0x8905

     (note:  these tokens are shared with the EXT_gpu_shader4 extension.)

     Accepted by the <pname> parameter of GetProgramivARB:

         PROGRAM_ATTRIB_COMPONENTS_NV                    0x8906
         PROGRAM_RESULT_COMPONENTS_NV                    0x8907
         MAX_PROGRAM_ATTRIB_COMPONENTS_NV                0x8908
         MAX_PROGRAM_RESULT_COMPONENTS_NV                0x8909
         MAX_PROGRAM_GENERIC_ATTRIBS_NV                  0x8DA5
         MAX_PROGRAM_GENERIC_RESULTS_NV                  0x8DA6

 Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)

     (Modify "Section 2.14.1" of the ARB_vertex_program specification,
     describing program parameters.)

     Each program object has an associated array of program local parameters.
     Program local parameters are four-component vectors whose components can
     hold floating-point, signed integer, or unsigned integer values.  The data
     type of each local parameter is established when the parameter's values
     are assigned.  If a program attempts to read a local parameter using a
     data type other than the one used when the parameter is set, the values
     returned are undefined.  ... The commands

       void ProgramLocalParameter4fARB(enum target, uint index,
                                       float x, float y, float z, float w);
       void ProgramLocalParameter4fvARB(enum target, uint index,
                                        const float *params);
       void ProgramLocalParameter4dARB(enum target, uint index,
                                       double x, double y, double z, double w);
       void ProgramLocalParameter4dvARB(enum target, uint index,
                                        const double *params);

       void ProgramLocalParameterI4iNV(enum target, uint index,
                                       int x, int y, int z, int w);
       void ProgramLocalParameterI4ivNV(enum target, uint index,
                                        const int *params);
       void ProgramLocalParameterI4uiNV(enum target, uint index,
                                        uint x, uint y, uint z, uint w);
       void ProgramLocalParameterI4uivNV(enum target, uint index,
                                         const uint *params);

     update the values of the program local parameter numbered <index>
     belonging to the program object currently bound to <target>.  For the
     non-vector versions of these commands, the four components of the
     parameter are updated with the values of <x>, <y>, <z>, and <w>,
     respectively.  For the vector versions, the components of the parameter
     are updated with the array of four values pointed to by <params>.  The
     error INVALID_VALUE is generated if <index> is greater than or equal to
     the number of program local parameters supported by <target>.

     The commands

       void ProgramLocalParameters4fvNV(enum target, uint index,
                                        sizei count, const float *params);
       void ProgramLocalParametersI4ivNV(enum target, uint index,
                                         sizei count, const int *params);
       void ProgramLocalParametersI4uivNV(enum target, uint index,
                                          sizei count, const uint *params);

     update the values of the program local parameters numbered <index> through
     <index> + <count> - 1 with the array of 4 * <count> values pointed to by
     <params>.  The error INVALID_VALUE is generated if the sum of <index> and
     <count> is greater than the number of program local parameters supported
     by <target>.

     When a program local parameter is updated, the data type of its components
     is assigned according to the data type of the provided values.  If values
     provided are of type "float" or "double", the components of the parameter
     are floating-point.  If the values provided are of type "int", the
     components of the parameter are signed integers.  If the values provided
     are of type "uint", the components of the parameter are unsigned integers.
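
     The update and typing rules above can be illustrated with a small C
     model.  This is a sketch only and not GL code: the names prefixed
     with model_ and the limit MODEL_NUM_LOCAL_PARAMS are assumptions made
     for the example, and <count> is modeled as an unsigned value for
     simplicity.  The model records the data type given to each parameter
     by its most recent update, applies the two INVALID_VALUE range checks,
     and reports whether a read using a given data type is well defined.

```c
#include <string.h>

/* Hypothetical model of a program local parameter array; not GL code. */
#define MODEL_NUM_LOCAL_PARAMS 4u   /* assumed limit for this sketch */

enum model_type  { MODEL_FLOAT, MODEL_INT, MODEL_UINT };
enum model_error { MODEL_NO_ERROR, MODEL_INVALID_VALUE };

struct model_param {
    enum model_type type;           /* set by the most recent update */
    union {
        float        f[4];
        int          i[4];
        unsigned int u[4];
    } value;
};

/* Models ProgramLocalParameter4fvARB: <index> must be below the limit. */
enum model_error model_set_4fv(struct model_param *params,
                               unsigned int index, const float *v)
{
    if (index >= MODEL_NUM_LOCAL_PARAMS)
        return MODEL_INVALID_VALUE;
    params[index].type = MODEL_FLOAT;
    memcpy(params[index].value.f, v, sizeof params[index].value.f);
    return MODEL_NO_ERROR;
}

/* Models ProgramLocalParametersI4ivNV: updates parameters <index>
 * through <index> + <count> - 1 from 4 * <count> values. */
enum model_error model_set_many_I4iv(struct model_param *params,
                                     unsigned int index, unsigned int count,
                                     const int *v)
{
    unsigned int n;

    /* Same test as index + count > limit, written to avoid overflow. */
    if (count > MODEL_NUM_LOCAL_PARAMS ||
        index > MODEL_NUM_LOCAL_PARAMS - count)
        return MODEL_INVALID_VALUE;

    for (n = 0; n < count; n++) {
        params[index + n].type = MODEL_INT;
        memcpy(params[index + n].value.i, v + 4 * n,
               sizeof params[index + n].value.i);
    }
    return MODEL_NO_ERROR;
}

/* A read is well defined only if it uses the type of the last update. */
int model_read_is_defined(const struct model_param *params,
                          unsigned int index, enum model_type read_as)
{
    return index < MODEL_NUM_LOCAL_PARAMS && params[index].type == read_as;
}
```

     In this model, a parameter last written through the 4fv path is
     flagged as not well defined when read as a signed integer, which
     mirrors the "undefined" result described above.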

     Additionally, each program target has an associated array of program
     environment parameters.  Unlike program local parameters, program
     environment parameters are shared by all program objects of a given
     target.  Program environment parameters are four-component vectors whose
     components can hold floating-point, signed integer, or unsigned integer
     values.  The data type of each environment parameter is established when
     the parameter's values are assigned.  If a program attempts to read an
     environment parameter using a data type other than the one used when the
     parameter is set, the values returned are undefined.  ... The commands

       void ProgramEnvParameter4fARB(enum target, uint index,
                                     float x, float y, float z, float w);
       void ProgramEnvParameter4fvARB(enum target, uint index,
                                      const float *params);
       void ProgramEnvParameter4dARB(enum target, uint index,
                                     double x, double y, double z, double w);
       void ProgramEnvParameter4dvARB(enum target, uint index,
                                      const double *params);
       void ProgramEnvParameterI4iNV(enum target, uint index,
                                     int x, int y, int z, int w);
       void ProgramEnvParameterI4ivNV(enum target, uint index,
                                      const int *params);
       void ProgramEnvParameterI4uiNV(enum target, uint index,
                                      uint x, uint y, uint z, uint w);
       void ProgramEnvParameterI4uivNV(enum target, uint index,
                                       const uint *params);

     update the values of the program environment parameter numbered <index>
     for the given program target <target>.  For the non-vector versions of
     these commands, the four components of the parameter are updated with the
     values of <x>, <y>, <z>, and <w>, respectively.  For the vector versions,
     the four components of the parameter are updated with the array of four
     values pointed to by <params>.  The error INVALID_VALUE is generated if
     <index> is greater than or equal to the number of program environment
     parameters supported by <target>.

     The commands

       void ProgramEnvParameters4fvNV(enum target, uint index,
                                      sizei count, const float *params);
       void ProgramEnvParametersI4ivNV(enum target, uint index,
                                       sizei count, const int *params);
       void ProgramEnvParametersI4uivNV(enum target, uint index,
                                        sizei count, const uint *params);

     update the values of the program environment parameters numbered <index>
     through <index> + <count> - 1 with the array of 4 * <count> values pointed
     to by <params>.  The error INVALID_VALUE is generated if the sum of
     <index> and <count> is greater than the number of program environment
     parameters supported by <target>.

     When a program environment parameter is updated, the data type of its
     components is assigned according to the data type of the provided values.
     If values provided are of type "float" or "double", the components of the
     parameter are floating-point.  If the values provided are of type "int",
     the components of the parameter are signed integers.  If the values
     provided are of type "uint", the components of the parameter are unsigned
     integers.
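
     The difference in scope between the two kinds of parameters can be
     pictured with the following C sketch.  It is a simplified model under
     stated assumptions, not a description of any implementation: the
     structure and function names are hypothetical, only floating-point
     values are modeled, and MODEL_NUM_PARAMS is an arbitrary limit chosen
     for the example.  Environment parameters are stored once per target
     and reached through it, while local parameters are stored in each
     program object.

```c
#include <stddef.h>

#define MODEL_NUM_PARAMS 4u   /* assumed limit for this sketch */

struct model_vec4 { float v[4]; };

/* One per program target: environment parameters are stored here. */
struct model_target {
    struct model_vec4 env[MODEL_NUM_PARAMS];
};

/* One per program object: local parameters are stored here. */
struct model_program {
    struct model_target *target;              /* target of this object */
    struct model_vec4    local[MODEL_NUM_PARAMS];
};

/* Environment parameter as seen by a program: shared via the target.
 * Returns NULL if `index` is out of range. */
const float *model_env(const struct model_program *prog, unsigned int index)
{
    return index < MODEL_NUM_PARAMS ? prog->target->env[index].v : NULL;
}

/* Local parameter as seen by a program: private to the program object.
 * Returns NULL if `index` is out of range. */
const float *model_local(const struct model_program *prog, unsigned int index)
{
    return index < MODEL_NUM_PARAMS ? prog->local[index].v : NULL;
}
```

     With two model_program objects that point at the same model_target, a
     change to an environment parameter is visible through both objects,
     whereas a change to a local parameter is visible through only one.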

     ...


     Insert New Section 2.X between Sections 2.Y and 2.Z:

     Section 2.X, GPU Programs

     The GL provides a number of different program targets that allow an
     application to either replace certain fixed-function pipeline stages with
     a fully programmable model or use a program to control aspects of the GL
     pipeline that previously had only hard-wired behavior.

     A common base instruction set is available for all program types,
     providing both integer and floating-point operations.  Structured
     branching operations and subroutine calls are available.  Texture
     mapping (loading data from external images) is supported for all
     program types.  The main differences between the different program
     types are the set of available inputs and outputs, which are specific
     to each program type, and a few instructions that are meaningful for
     only a subset of program types.


     Section 2.X.2, Program Grammar

     GPU program strings are specified as an array of ASCII characters
     containing the program text.  When a GPU program is loaded by a call to
     ProgramStringARB, the program string is parsed into a set of tokens
     possibly separated by whitespace.  Spaces, tabs, newlines, carriage
     returns, and comments are considered whitespace.  Comments begin with the
     character "#" and are terminated by a newline, a carriage return, or the
     end of the program array.
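
     The whitespace and comment rules above can be sketched in C as
     follows.  This is a minimal, non-normative example; the function name
     skip_whitespace and its interface are assumptions made for
     illustration, and the sketch does not perform any other part of
     tokenization.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first character at or
 * after `pos` that is not whitespace under the rules above, or `len` if
 * only whitespace remains.  `len` is the length of the program array. */
size_t skip_whitespace(const char *text, size_t len, size_t pos)
{
    while (pos < len) {
        char c = text[pos];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the
             * end of the program array. */
            while (pos < len && text[pos] != '\n' && text[pos] != '\r')
                pos++;
        } else {
            break;
        }
    }
    return pos;
}
```

     A tokenizer built on this helper would call it before reading each
     token, so that comments never reach the grammar.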

     The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
     sequences for GPU programs.  The set of valid tokens can be inferred
     from the grammar.  A line containing "/* empty */" represents an empty
     string and is used to indicate optional rules.  A program is invalid if it
     contains any tokens or characters not defined in this specification.
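
     As an illustration of how a recursive rule with an "/* empty */"
     alternative is read, the C sketch below matches the <opModifiers> rule
     from the grammar that follows.  Because the rule is either empty or
     one <opModifierItem> followed by <opModifiers>, it accepts zero or
     more items, which the sketch expresses as a loop.  The function names
     are hypothetical.  The sketch assumes that no whitespace appears
     inside the suffix, and it checks syntax only; it does not decide which
     modifiers are meaningful for a particular instruction.

```c
#include <string.h>

/* The <opModifier> keywords listed in the grammar. */
static const char *const op_modifiers[] = {
    "F", "U", "S", "CC", "CC0", "CC1", "SAT", "SSAT", "NTC", "S24", "U24",
    "HI"
};

/* Returns 1 if the `len` characters at `s` spell an <opModifier>. */
int is_op_modifier(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof op_modifiers / sizeof op_modifiers[0]; i++) {
        if (strlen(op_modifiers[i]) == len &&
            memcmp(op_modifiers[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/* Returns the number of modifiers in a suffix such as ".S.CC0", or -1 if
 * the suffix does not match <opModifiers>.  An empty suffix matches the
 * "empty" alternative and yields 0. */
int count_op_modifiers(const char *suffix)
{
    int count = 0;

    while (*suffix != '\0') {
        size_t len;

        if (*suffix != '.')             /* each item starts with "." */
            return -1;
        suffix++;
        len = strcspn(suffix, ".");     /* keyword runs to next "." or end */
        if (!is_op_modifier(suffix, len))
            return -1;
        suffix += len;
        count++;
    }
    return count;
}
```

     For the opcode text "MAD.S.CC0", the suffix ".S.CC0" would be passed
     to count_op_modifiers after the opcode name "MAD" has been removed.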

     Note that this extension is not a standalone extension and a small number
     of grammar rules are left to be defined in the extensions defining the
     specific vertex, fragment, and geometry program types.


     <program>               ::= <optionSequence> <declSequence>
                                 <statementSequence> "END"

     <optionSequence>        ::= <option> <optionSequence>
                               | /* empty */

     <option>                ::= "OPTION" <identifier> ";"

     <declSequence>          ::= /* empty */

     <statementSequence>     ::= <statement> <statementSequence>
                               | /* empty */

     <statement>             ::= <instruction> ";"
                               | <namingStatement> ";"
                               | <instLabel> ":"

     <instruction>           ::= <ALUInstruction>
                               | <TexInstruction>
                               | <FlowInstruction>

     <ALUInstruction>        ::= <VECTORop_instruction>
                               | <SCALARop_instruction>
                               | <BINSCop_instruction>
                               | <BINop_instruction>
                               | <VECSCAop_instruction>
                               | <TRIop_instruction>
                               | <SWZop_instruction>

     <TexInstruction>        ::= <TEXop_instruction>
                               | <TXDop_instruction>

     <FlowInstruction>       ::= <BRAop_instruction>
                               | <FLOWCCop_instruction>
                               | <IFop_instruction>
                               | <REPop_instruction>
                               | <ENDFLOWop_instruction>

     <VECTORop_instruction>  ::= <VECTORop> <opModifiers> <instResult> ","
                                 <instOperandV>

     <VECTORop>              ::= "ABS"
                               | "CEIL"
                               | "FLR"
                               | "FRC"
                               | "I2F"
                               | "LIT"
                               | "MOV"
                               | "NOT"
                               | "NRM"
                               | "PK2H"
                               | "PK2US"
                               | "PK4B"
                               | "PK4UB"
                               | "ROUND"
                               | "SSG"
                               | "TRUNC"

     <SCALARop_instruction>  ::= <SCALARop> <opModifiers> <instResult> ","
                                 <instOperandS>

     <SCALARop>              ::= "COS"
                               | "EX2"
                               | "LG2"
                               | "RCC"
                               | "RCP"
                               | "RSQ"
                               | "SCS"
                               | "SIN"
                               | "UP2H"
                               | "UP2US"
                               | "UP4B"
                               | "UP4UB"

     <BINSCop_instruction>   ::= <BINSCop> <opModifiers> <instResult> ","
                                 <instOperandS> "," <instOperandS>

     <BINSCop>               ::= "POW"

     <VECSCAop_instruction>  ::= <VECSCAop> <opModifiers> <instResult> ","
                                 <instOperandV> "," <instOperandS>

     <VECSCAop>              ::= "DIV"
                               | "SHL"
                               | "SHR"
                               | "MOD"

     <BINop_instruction>     ::= <BINop> <opModifiers> <instResult> ","
                                 <instOperandV> "," <instOperandV>

     <BINop>                 ::= "ADD"
                               | "AND"
                               | "DP3"
                               | "DP4"
                               | "DPH"
                               | "DST"
                               | "MAX"
                               | "MIN"
                               | "MUL"
                               | "OR"
                               | "RFL"
                               | "SEQ"
                               | "SFL"
                               | "SGE"
                               | "SGT"
                               | "SLE"
                               | "SLT"
                               | "SNE"
                               | "STR"
                               | "SUB"
                               | "XPD"
                               | "DP2"
                               | "XOR"

     <TRIop_instruction>     ::= <TRIop> <opModifiers> <instResult> ","
                                 <instOperandV> "," <instOperandV> ","
                                 <instOperandV>

     <TRIop>                 ::= "CMP"
                               | "DP2A"
                               | "LRP"
                               | "MAD"
                               | "SAD"
                               | "X2D"

     <SWZop_instruction>     ::= <SWZop> <opModifiers> <instResult> ","
                                 <instOperandVNS> "," <extendedSwizzle>

     <SWZop>                 ::= "SWZ"

     <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                 <instOperandV> "," <texAccess>

     <TEXop>                 ::= "TEX"
                               | "TXB"
                               | "TXF"
                               | "TXL"
                               | "TXP"
                               | "TXQ"

     <TXDop_instruction>     ::= <TXDop> <opModifiers> <instResult> ","
                                 <instOperandV> "," <instOperandV> ","
                                 <instOperandV> "," <texAccess>

     <TXDop>                 ::= "TXD"

     <BRAop_instruction>     ::= <BRAop> <opModifiers> <instTarget>
                                 <optBranchCond>

     <BRAop>                 ::= "CAL"

     <FLOWCCop_instruction>  ::= <FLOWCCop> <opModifiers> <optBranchCond>

     <FLOWCCop>              ::= "RET"
                               | "BRK"
                               | "CONT"

     <IFop_instruction>      ::= <IFop> <opModifiers> <ccTest>

     <IFop>                  ::= "IF"

     <REPop_instruction>     ::= <REPop> <opModifiers> <instOperandV>
                               | <REPop> <opModifiers>

     <REPop>                 ::= "REP"

     <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>

     <ENDFLOWop>             ::= "ELSE"
                               | "ENDIF"
                               | "ENDREP"

     <opModifiers>           ::= <opModifierItem> <opModifiers>
                               | /* empty */

     <opModifierItem>        ::= "." <opModifier>

     <opModifier>            ::= "F"
                               | "U"
                               | "S"
                               | "CC"
                               | "CC0"
                               | "CC1"
                               | "SAT"
                               | "SSAT"
                               | "NTC"
                               | "S24"
                               | "U24"
                               | "HI"

     <texAccess>             ::= <texImageUnit> "," <texTarget>
                               | <texImageUnit> "," <texTarget> "," <texOffset>

     <texImageUnit>          ::= "texture" <optArrayMemAbs>

     <texTarget>             ::= "1D"
                               | "2D"
                               | "3D"
                               | "CUBE"
                               | "RECT"
                               | "SHADOW1D"
                               | "SHADOW2D"
                               | "SHADOWRECT"
                               | "ARRAY1D"
                               | "ARRAY2D"
                               | "SHADOWCUBE"
                               | "SHADOWARRAY1D"
                               | "SHADOWARRAY2D"

     <texOffset>             ::= "(" <texOffsetComp> ")"
                               | "(" <texOffsetComp> "," <texOffsetComp> ")"
                               | "(" <texOffsetComp> "," <texOffsetComp> ","
                                 <texOffsetComp> ")"

     <texOffsetComp>         ::= <optSign> <int>

     <optBranchCond>         ::= /* empty */
                               | <ccMask>

     <instOperandV>          ::= <instOperandAbsV>
                               | <instOperandBaseV>

     <instOperandAbsV>       ::= <operandAbsNeg> "|" <instOperandBaseV> "|"

     <instOperandBaseV>      ::= <operandNeg> <attribUseV>
                               | <operandNeg> <tempUseV>
                               | <operandNeg> <paramUseV>
                               | <operandNeg> <bufferUseV>

     <instOperandS>          ::= <instOperandAbsS>
                               | <instOperandBaseS>

     <instOperandAbsS>       ::= <operandAbsNeg> "|" <instOperandBaseS> "|"

     <instOperandBaseS>      ::= <operandNeg> <attribUseS>
                               | <operandNeg> <tempUseS>
                               | <operandNeg> <paramUseS>
                               | <operandNeg> <bufferUseS>

     <instOperandVNS>        ::= <attribUseVNS>
                               | <tempUseVNS>
                               | <paramUseVNS>
                               | <bufferUseVNS>

     <operandAbsNeg>         ::= <optSign>

     <operandNeg>            ::= <optSign>

     <instResult>            ::= <instResultCC>
                               | <instResultBase>

     <instResultCC>          ::= <instResultBase> <ccMask>

     <instResultBase>        ::= <tempUseW>
                               | <resultUseW>

     <namingStatement>       ::= <varMods> <ATTRIB_statement>
                               | <varMods> <PARAM_statement>
                               | <varMods> <TEMP_statement>
                               | <varMods> <OUTPUT_statement>
                               | <varMods> <BUFFER_statement>
                               | <ALIAS_statement>

     <ATTRIB_statement>      ::= "ATTRIB" <establishName> "=" <attribUseD>

     <PARAM_statement>       ::= <PARAM_singleStmt>
                               | <PARAM_multipleStmt>

     <PARAM_singleStmt>      ::= "PARAM" <establishName> <paramSingleInit>

     <PARAM_multipleStmt>    ::= "PARAM" <establishName> <optArraySize>
                                 <paramMultipleInit>

     <paramSingleInit>       ::= "=" <paramUseDB>

     <paramMultipleInit>     ::= "=" "{" <paramMultInitList> "}"

     <paramMultInitList>     ::= <paramUseDM>
                               | <paramUseDM> "," <paramMultInitList>

     <TEMP_statement>        ::= "TEMP" <varNameList>

     <OUTPUT_statement>      ::= "OUTPUT" <establishName> "=" <resultUseD>

     <varMods>               ::= <varModifier> <varMods>
                               | /* empty */

     <varModifier>           ::= "SHORT"
                               | "LONG"
                               | "INT"
                               | "UINT"
                               | "FLOAT"

     <ALIAS_statement>       ::= "ALIAS" <establishName> "=" <establishedName>

     <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                 <bufferSingleInit>
                               | <bufferDeclType> <establishName>
                                 <optArraySize> <bufferMultInit>

     <bufferDeclType>        ::= "BUFFER"
                               | "BUFFER4"

     <bufferSingleInit>      ::= "=" <bufferUseDB>

     <bufferMultInit>        ::= "=" "{" <bufferMultInitList> "}"

     <bufferMultInitList>    ::= <bufferUseDM>
                               | <bufferUseDM> "," <bufferMultInitList>

     <varNameList>           ::= <establishName>
                               | <establishName> "," <varNameList>

     <attribUseV>            ::= <attribBasic> <swizzleSuffix>
                               | <attribVarName> <swizzleSuffix>
                               | <attribVarName> <arrayMem> <swizzleSuffix>
                               | <attribColor> <swizzleSuffix>
                               | <attribColor> "." <colorType> <swizzleSuffix>

     <attribUseS>            ::= <attribBasic> <scalarSuffix>
                               | <attribVarName> <scalarSuffix>
                               | <attribVarName> <arrayMem> <scalarSuffix>
                               | <attribColor> <scalarSuffix>
                               | <attribColor> "." <colorType> <scalarSuffix>

     <attribUseVNS>          ::= <attribBasic>
                               | <attribVarName>
                               | <attribVarName> <arrayMem>
                               | <attribColor>
                               | <attribColor> "." <colorType>

     <attribUseD>            ::= <attribBasic>
                               | <attribColor>
                               | <attribColor> "." <colorType>
                               | <attribMulti>

     <paramUseV>             ::= <paramVarName> <optArrayMem> <swizzleSuffix>
                               | <stateSingleItem> <swizzleSuffix>
                               | <programSingleItem> <swizzleSuffix>
                               | <constantVector> <swizzleSuffix>
                               | <constantScalar>

     <paramUseS>             ::= <paramVarName> <optArrayMem> <scalarSuffix>
                               | <stateSingleItem> <scalarSuffix>
                               | <programSingleItem> <scalarSuffix>
                               | <constantVector> <scalarSuffix>
                               | <constantScalar>

     <paramUseVNS>           ::= <paramVarName> <optArrayMem>
                               | <stateSingleItem>
                               | <programSingleItem>
                               | <constantVector>
                               | <constantScalar>

     <paramUseDB>            ::= <stateSingleItem>
                               | <programSingleItem>
                               | <constantVector>
                               | <signedConstantScalar>

     <paramUseDM>            ::= <stateMultipleItem>
                               | <programMultipleItem>
                               | <constantVector>
                               | <signedConstantScalar>

     <stateMultipleItem>     ::= <stateSingleItem>
                               | "state" "." <stateMatrixRows>

     <stateSingleItem>       ::= "state" "." <stateMaterialItem>
                               | "state" "." <stateLightItem>
                               | "state" "." <stateLightModelItem>
                               | "state" "." <stateLightProdItem>
                               | "state" "." <stateFogItem>
                               | "state" "." <stateMatrixRow>
                               | "state" "." <stateTexGenItem>
                               | "state" "." <stateClipPlaneItem>
                               | "state" "." <statePointItem>
                               | "state" "." <stateTexEnvItem>
                               | "state" "." <stateDepthItem>

     <stateMaterialItem>     ::= "material" "." <stateMatProperty>
                               | "material" "." <faceType> "."
                                 <stateMatProperty>

     <stateMatProperty>      ::= "ambient"
                               | "diffuse"
                               | "specular"
                               | "emission"
                               | "shininess"

     <stateLightItem>        ::= "light" <arrayMemAbs> "." <stateLightProperty>

     <stateLightProperty>    ::= "ambient"
                               | "diffuse"
                               | "specular"
                               | "position"
                               | "attenuation"
                               | "spot" "." <stateSpotProperty>
                               | "half"

     <stateSpotProperty>     ::= "direction"

     <stateLightModelItem>   ::= "lightmodel" "." <stateLModProperty>

     <stateLModProperty>     ::= "ambient"
                               | "scenecolor"
                               | <faceType> "." "scenecolor"

     <stateLightProdItem>    ::= "lightprod" <arrayMemAbs> "."
                                 <stateLProdProperty>
                               | "lightprod" <arrayMemAbs> "." <faceType> "."
                                 <stateLProdProperty>

     <stateLProdProperty>    ::= "ambient"
                               | "diffuse"
                               | "specular"

     <stateFogItem>          ::= "fog" "." <stateFogProperty>

     <stateFogProperty>      ::= "color"
                               | "params"

     <stateMatrixRows>       ::= <stateMatrixItem>
                               | <stateMatrixItem> "." <stateMatModifier>
                               | <stateMatrixItem> "." "row" <arrayRange>
                               | <stateMatrixItem> "." <stateMatModifier> "."
                                 "row" <arrayRange>

     <stateMatrixRow>        ::= <stateMatrixItem> "." "row" <arrayMemAbs>
                               | <stateMatrixItem> "." <stateMatModifier> "."
                                 "row" <arrayMemAbs>

     <stateMatrixItem>       ::= "matrix" "." <stateMatrixName>

     <stateMatModifier>      ::= "inverse"
                               | "transpose"
                               | "invtrans"

     <stateMatrixName>       ::= "modelview" <optArrayMemAbs>
                               | "projection"
                               | "mvp"
                               | "texture" <optArrayMemAbs>
                               | "program" <arrayMemAbs>

     <stateTexGenItem>       ::= "texgen" <optArrayMemAbs> "."
                                 <stateTexGenType> "." <stateTexGenCoord>

     <stateTexGenType>       ::= "eye"
                               | "object"

     <stateTexGenCoord>      ::= "s"
                               | "t"
                               | "r"
                               | "q"

     <stateClipPlaneItem>    ::= "clip" <arrayMemAbs> "." "plane"

     <statePointItem>        ::= "point" "." <statePointProperty>

     <statePointProperty>    ::= "size"
                               | "attenuation"

     <stateTexEnvItem>       ::= "texenv" <optArrayMemAbs> "."
                                 <stateTexEnvProperty>

     <stateTexEnvProperty>   ::= "color"

     <stateDepthItem>        ::= "depth" "." <stateDepthProperty>

     <stateDepthProperty>    ::= "range"

     <programSingleItem>     ::= <progEnvParam>
                               | <progLocalParam>

     <programMultipleItem>   ::= <progEnvParams>
                               | <progLocalParams>

     <progEnvParams>         ::= "program" "." "env" <arrayMemAbs>
                               | "program" "." "env" <arrayRange>

     <progEnvParam>          ::= "program" "." "env" <arrayMemAbs>

     <progLocalParams>       ::= "program" "." "local" <arrayMemAbs>
                               | "program" "." "local" <arrayRange>

     <progLocalParam>        ::= "program" "." "local" <arrayMemAbs>

     <constantVector>        ::= "{" <constantVectorList> "}"

     <constantVectorList>    ::= <signedConstantScalar>
                               | <signedConstantScalar> ","
                                 <signedConstantScalar>
                               | <signedConstantScalar> ","
                                 <signedConstantScalar> ","
                                 <signedConstantScalar>
                               | <signedConstantScalar> ","
                                 <signedConstantScalar> ","
                                 <signedConstantScalar> ","
                                 <signedConstantScalar>

     <signedConstantScalar>  ::= <optSign> <constantScalar>

     <constantScalar>        ::= <floatConstant>
                               | <intConstant>

     <floatConstant>         ::= <float>

     <intConstant>           ::= <int>

     <tempUseV>              ::= <tempVarName> <swizzleSuffix>

     <tempUseS>              ::= <tempVarName> <scalarSuffix>

     <tempUseVNS>            ::= <tempVarName>

     <tempUseW>              ::= <tempVarName> <optWriteMask>

     <resultUseW>            ::= <resultBasic> <optWriteMask>
                               | <resultVarName> <optWriteMask>

     <resultUseD>            ::= <resultBasic>

     <bufferUseV>            ::= <bufferVarName> <optArrayMem> <swizzleSuffix>

     <bufferUseS>            ::= <bufferVarName> <optArrayMem> <scalarSuffix>

     <bufferUseVNS>          ::= <bufferVarName> <optArrayMem>

     <bufferUseDB>           ::= <bufferBinding> <arrayMemAbs>

     <bufferUseDM>           ::= <bufferBinding> <arrayMemAbs>
                               | <bufferBinding> <arrayRange>
                               | <bufferBinding>

     <bufferBinding>         ::= "program" "." "buffer" <arrayMemAbs>

     <optArraySize>          ::= "[" "]"
                               | "[" <int> "]"

     <optArrayMem>           ::= /* empty */
                               | <arrayMem>

     <arrayMem>              ::= <arrayMemAbs>
                               | <arrayMemRel>

     <optArrayMemAbs>        ::= /* empty */
                               | <arrayMemAbs>

     <arrayMemAbs>           ::= "[" <int> "]"

     <arrayMemRel>           ::= "[" <arrayMemReg> <arrayMemOffset> "]"

     <arrayMemReg>           ::= <addrUseS>

     <arrayMemOffset>        ::= /* empty */
                               | "+" <int>
                               | "-" <int>

     <arrayRange>            ::= "[" <int> ".." <int> "]"

     <addrUseS>              ::= <addrVarName> <scalarSuffix>

     <ccMask>                ::= "(" <ccTest> ")"

     <ccTest>                ::= <ccMaskRule> <swizzleSuffix>

     <ccMaskRule>            ::= "EQ"
                               | "GE"
                               | "GT"
                               | "LE"
                               | "LT"
                               | "NE"
                               | "TR"
                               | "FL"
                               | "EQ0"
                               | "GE0"
                               | "GT0"
                               | "LE0"
                               | "LT0"
                               | "NE0"
                               | "TR0"
                               | "FL0"
                               | "EQ1"
                               | "GE1"
                               | "GT1"
                               | "LE1"
                               | "LT1"
                               | "NE1"
                               | "TR1"
                               | "FL1"
                               | "NAN"
                               | "NAN0"
                               | "NAN1"
                               | "LEG"
                               | "LEG0"
                               | "LEG1"
                               | "CF"
                               | "CF0"
                               | "CF1"
                               | "NCF"
                               | "NCF0"
                               | "NCF1"
                               | "OF"
                               | "OF0"
                               | "OF1"
                               | "NOF"
                               | "NOF0"
                               | "NOF1"
                               | "AB"
                               | "AB0"
                               | "AB1"
                               | "BLE"
                               | "BLE0"
                               | "BLE1"
                               | "SF"
                               | "SF0"
                               | "SF1"
                               | "NSF"
                               | "NSF0"
                               | "NSF1"

     <optWriteMask>          ::= /* empty */
                               | <xyzwMask>
                               | <rgbaMask>

     <xyzwMask>              ::= "." "x"
                               | "." "y"
                               | "." "xy"
                               | "." "z"
                               | "." "xz"
                               | "." "yz"
                               | "." "xyz"
                               | "." "w"
                               | "." "xw"
                               | "." "yw"
                               | "." "xyw"
                               | "." "zw"
                               | "." "xzw"
                               | "." "yzw"
                               | "." "xyzw"

     <rgbaMask>              ::= "." "r"
                               | "." "g"
                               | "." "rg"
                               | "." "b"
                               | "." "rb"
                               | "." "gb"
                               | "." "rgb"
                               | "." "a"
                               | "." "ra"
                               | "." "ga"
                               | "." "rga"
                               | "." "ba"
                               | "." "rba"
                               | "." "gba"
                               | "." "rgba"

     <swizzleSuffix>         ::= /* empty */
                               | "." <component>
                               | "." <xyzwSwizzle>
                               | "." <rgbaSwizzle>

     <extendedSwizzle>       ::= <extSwizComp> "," <extSwizComp> ","
                                 <extSwizComp> "," <extSwizComp>

     <extSwizComp>           ::= <optSign> <xyzwExtSwizSel>
                               | <optSign> <rgbaExtSwizSel>

     <xyzwExtSwizSel>        ::= "0"
                               | "1"
                               | <xyzwComponent>

     <rgbaExtSwizSel>        ::= <rgbaComponent>

     <scalarSuffix>          ::= "." <component>

     <component>             ::= <xyzwComponent>
                               | <rgbaComponent>

     <xyzwComponent>         ::= "x"
                               | "y"
                               | "z"
                               | "w"

     <rgbaComponent>         ::= "r"
                               | "g"
                               | "b"
                               | "a"

     <optSign>               ::= /* empty */
                               | "-"
                               | "+"

     <faceType>              ::= "front"
                               | "back"

     <colorType>             ::= "primary"
                               | "secondary"

     <instLabel>             ::= <identifier>

     <instTarget>            ::= <identifier>

     <establishedName>       ::= <identifier>

     <establishName>         ::= <identifier>


     The <int> rule matches an integer constant.  The integer consists of a
     sequence of one or more digits ("0" through "9"), or a sequence in
     hexadecimal form beginning with "0x" followed by a sequence of one or more
     hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").

     The <float> rule matches a floating-point constant consisting of an
     integer part, a decimal point, a fraction part, an "e" or "E", and an
     optionally signed integer exponent.  The integer and fraction parts both
     consist of a sequence of one or more digits ("0" through "9").  Either the
     integer part or the fraction part (not both) may be missing; either the
     decimal point or the "e" (or "E") and the exponent (not both) may be
     missing.  Most grammar rules that allow floating-point values also allow
     integers matching the <int> rule.
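
     As an informal illustration (not part of the grammar), the <int> and
     <float> rules above can be approximated with regular expressions.  The
     names INT_RE, FLOAT_RE, is_int, and is_float are hypothetical, and the
     sketch assumes that the "optionally signed" exponent accepts either "+"
     or "-".

```python
import re

# Hypothetical patterns transcribing the prose rules for <int> and <float>.
INT_RE = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")
FLOAT_RE = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"  # decimal point present
    r"|[0-9]+[eE][+-]?[0-9]+"                            # no point: exponent required
)

def is_int(text):
    """True if the whole string matches the <int> rule as sketched here."""
    return INT_RE.fullmatch(text) is not None

def is_float(text):
    """True if the whole string matches the <float> rule as sketched here."""
    return FLOAT_RE.fullmatch(text) is not None
```

     Under these assumptions, "1.", ".5", and "1e5" are accepted as <float>,
     while "." and "1" are not; "1" matches only <int>.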

     The <identifier> rule matches a sequence of one or more letters ("A"
     through "Z", "a" through "z"), digits ("0" through "9"), underscores
     ("_"), or dollar signs ("$"); the first character must not be a digit.
     Upper and lower case letters are considered different (names are
     case-sensitive).  The following strings are reserved keywords and may not
     be used as identifiers:  "fragment" (for fragment programs only), "vertex"
     (for vertex and geometry programs), "primitive" (for fragment and geometry
     programs), "program", "result", "state", and "texture".
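
     The identifier rule and the reserved keyword list can be sketched as
     follows.  This is an informal illustration only; the names IDENT_RE,
     COMMON_RESERVED, PER_TYPE_RESERVED, and is_valid_identifier, and the
     program type strings "vertex", "geometry", and "fragment", are
     hypothetical.

```python
import re

# Letters, digits, "_" and "$"; the first character may not be a digit.
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Keywords reserved for every program type, plus the per-type additions.
COMMON_RESERVED = {"program", "result", "state", "texture"}
PER_TYPE_RESERVED = {
    "vertex": {"vertex"},
    "geometry": {"vertex", "primitive"},
    "fragment": {"fragment", "primitive"},
}

def is_valid_identifier(name, program_type):
    """Check the character rule, then the (case-sensitive) keyword list."""
    if IDENT_RE.fullmatch(name) is None:
        return False
    return name not in COMMON_RESERVED | PER_TYPE_RESERVED[program_type]
```

     Because names are case-sensitive, "State" is accepted by this sketch even
     though "state" is reserved.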

     The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
     <bufferVarName> rules match identifiers that have been previously
     established as names of temporary, program parameter, attribute, result,
     and program parameter buffer variables, respectively.

     The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings
     consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
     or "r", "g", "b", "a" (<rgbaSwizzle>).
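
     A minimal sketch of the swizzle rules, assuming a hypothetical helper
     named is_swizzle.  Note that the two character sets may not be mixed
     within one swizzle.

```python
import re

# Exactly four characters, drawn entirely from one of the two sets.
SWIZZLE_RE = re.compile(r"[xyzw]{4}|[rgba]{4}")

def is_swizzle(text):
    """True if text matches <xyzwSwizzle> or <rgbaSwizzle> as sketched here."""
    return SWIZZLE_RE.fullmatch(text) is not None
```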

     The error INVALID_OPERATION is generated if a program fails to load
     because it is not syntactically correct or for one of the semantic
     restrictions described in the following sections.

     A successfully loaded program is parsed into a sequence of instructions.
     Each instruction is identified by its tokenized name.  The operation of
     these instructions when executed is defined in section 2.X.4.  A
     successfully loaded program string replaces the program string previously
     loaded into the specified program object.  If the OUT_OF_MEMORY error is
     generated by ProgramStringARB, no change is made to the previous contents
     of the current program object.


     Section 2.X.3, Program Variables

     Programs may operate on a number of different variables during their
     execution.  The following sections define the different classes of
     variables that can be declared and used by a program.

     Some variable classes require variable bindings.  Variable classes with
     bindings refer to state that is either generated or consumed outside the
     program.  Examples of variable bindings include a vertex's normal, the
     position of a vertex computed by a vertex program, an interpolated texture
     coordinate, and the diffuse color of light 1.  Variables that are used
     only during program execution do not have bindings.

     Variables may be declared explicitly according to the <namingStatement>
     grammar rule.  Explicit variable declarations allow a program to establish
     a variable name that can be used to refer to a specified resource in
     subsequent instructions.  Variables may be declared anywhere in the
     program string, but must be declared prior to use.  A program will fail to
     load if it declares the same variable name more than once, or if it refers
     to a variable name that has not been previously declared in the program
     string.

     Variables may also be declared implicitly, simply by using a variable
     binding as an operand in a program instruction.  Such uses are considered
     to automatically create a nameless variable using the specified binding.
     Only variables from classes with bindings can be declared implicitly.


     Section 2.X.3.1, Program Variable Types

     Explicit variable declarations may include one or more modifiers that
     specify additional information about the variable, such as the size and
     data type of the components of the variable.  Variable modifiers are
     specified according to the <varModifier> grammar rule.

     By default, variables are considered typeless.  They can be used in
     instructions that read or write the variable as floating-point values,
     signed integers, or unsigned integers.  If a variable is written using one
     data type but then read using a different one, the results of the
     operation are undefined.  Variables with bindings are considered to be
     read or written when their values are produced or consumed; the data type
     used by the GL is specified in the description of each binding.

     Explicitly declared variables may optionally have one data type modifier,
     which can be used to detect data type mismatch errors.  Type modifiers of
     "INT", "UINT", and "FLOAT" indicate that the components of the variable
     are stored as signed integers, unsigned integers, or floating-point
     values, respectively.  A program will fail to load if it attempts to read
     or write a variable using a data type other than the one indicated by the
     data type modifier.  Variables without a data type modifier can be read or
     written using any data type.

     Explicitly declared variables may optionally have one storage size
     modifier.  Variables declared as "SHORT" will be represented using at
     least 16 bits per component.  "SHORT" floating-point values will have at
     least 5 bits of exponent and 10 bits of mantissa.  Variables declared as
     "LONG" will be represented with at least 32 bits per component.  "LONG"
     floating-point values will have at least 8 bits of exponent and 23 bits of
     mantissa.  If no size modifier is provided, the GL will automatically
     select component sizes.  Implementations are not required to support more
     than one component size, so "SHORT", "LONG", and the default could all
     refer to the same component size.  The "LONG" modifier is supported only
     for declarations of temporary variables ("TEMP").  The "SHORT" modifier is
     supported only for declarations of temporary variables and result
     variables ("OUTPUT").

     Each variable declaration can include at most one data type and one
     storage size modifier.  A program will fail to load if it specifies
     multiple data type or multiple storage size modifiers in a single variable
     declaration.
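
     The modifier restrictions above can be summarized by the following
     informal sketch.  The function check_modifiers and the declaration kind
     strings are hypothetical; treating an unsupported "SHORT" or "LONG" use
     as a load failure is an assumption of this sketch, and the fragment
     program interpolation modifiers are not modeled.

```python
DATA_TYPES = {"INT", "UINT", "FLOAT"}

# Declaration kinds on which each storage size modifier is supported.
SIZE_ALLOWED_IN = {"SHORT": {"TEMP", "OUTPUT"}, "LONG": {"TEMP"}}

def check_modifiers(modifiers, decl_kind):
    """Return None if the modifier list is acceptable, else a reason string."""
    types = [m for m in modifiers if m in DATA_TYPES]
    sizes = [m for m in modifiers if m in SIZE_ALLOWED_IN]
    if len(types) > 1:
        return "multiple data type modifiers"
    if len(sizes) > 1:
        return "multiple storage size modifiers"
    if sizes and decl_kind not in SIZE_ALLOWED_IN[sizes[0]]:
        return sizes[0] + " is not supported for " + decl_kind
    return None
```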

     (NOTE:  Fragment programs also support the modifiers "FLAT", "CENTROID",
     and "NOPERSPECTIVE", which control how per-fragment attribute values are
     produced.  These modifiers are described in detail in the
     NV_fragment_program4 specification.)

     Explicitly declared variables of all types may be declared as arrays.  An
     array variable has one or more members, numbered 0 through <n>-1, where
     <n> is the number of entries in the array.  The total number of entries in
     the array can be declared using the <optArraySize> grammar rule.  For
     variable classes without bindings, an array size must be specified in the
     program, and must be a positive integer.  For variable classes with
     bindings, a declared size is optional, and is taken from the number of
     bindings assigned in the declaration if omitted.  A program will fail to
     load if the declared size of an array variable does not match the number
     of assigned bindings.
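
     The array sizing rules can be sketched as below.  The helper
     resolve_array_size is hypothetical: declared_size is None when the
     declaration uses "[]", and num_bindings is None for variable classes
     without bindings.

```python
def resolve_array_size(declared_size, num_bindings):
    """Return the array size, or raise ValueError where a load would fail."""
    if num_bindings is None:
        # Classes without bindings: a positive size is mandatory.
        if declared_size is None or declared_size <= 0:
            raise ValueError("a positive array size must be specified")
        return declared_size
    if declared_size is None:
        # Classes with bindings: the size defaults to the binding count.
        return num_bindings
    if declared_size != num_bindings:
        raise ValueError("declared size does not match number of bindings")
    return declared_size
```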

     When a variable is declared as an array, instructions that use the
     variable must specify an array member to access according to the
     <arrayMem> grammar rule.  A program will fail to load if it contains an
     instruction that accesses an array variable without specifying an array
     member or an instruction that specifies an array member for a non-array
     variable.


     Section 2.X.3.2, Program Attribute Variables

     Program attribute variables represent per-vertex or per-fragment inputs to
     the program.  All attribute variables have associated bindings, and are
     read-only during program execution.  Attribute variables may be declared
     explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
     an attribute binding in an instruction.

     The set of available attribute bindings depends on the program type, and
     is enumerated in the specifications for each program type.

     The set of bindings allowed for attribute array variables is limited to
     attribute state grouped in arrays (e.g., texture coordinates, generic
     vertex attributes).  Additionally, all bindings assigned to the array must
     be of the same binding type and must increase consecutively.  Examples of
     valid and invalid binding lists include:

       vertex.attrib[1], vertex.attrib[2]      # valid, 2-entry array
       vertex.texcoord[0..3]                   # valid, 4-entry array
       vertex.attrib[1], vertex.attrib[3]      # invalid, skipped attrib 2
       vertex.attrib[2], vertex.attrib[1]      # invalid, wrong order
       vertex.attrib[1], vertex.texcoord[2]    # invalid, different types

     Additionally, attribute bindings may be used in no more than one array
     variable accessed with relative addressing.
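
     The "same binding type, consecutively increasing" rule for attribute
     array bindings can be sketched as follows.  This is an informal
     illustration; the helpers expand_binding and is_valid_attrib_array are
     hypothetical, and the sketch only understands bindings written as
     "name[i]" or "name[i..j]".

```python
import re

BINDING_RE = re.compile(r"([A-Za-z.]+)\[([0-9]+)(?:\.\.([0-9]+))?\]")

def expand_binding(binding):
    """Expand "name[i]" or "name[i..j]" into a list of (name, index) pairs."""
    m = BINDING_RE.fullmatch(binding)
    if m is None:
        raise ValueError("unrecognized binding: " + binding)
    name, lo = m.group(1), int(m.group(2))
    hi = lo if m.group(3) is None else int(m.group(3))
    return [(name, i) for i in range(lo, hi + 1)]

def is_valid_attrib_array(bindings):
    """True if all entries share one binding type and increase by one."""
    entries = [e for b in bindings for e in expand_binding(b)]
    if not entries:
        return False
    first_name, first_index = entries[0]
    return all(name == first_name and index == first_index + k
               for k, (name, index) in enumerate(entries))
```

     Applied to the five binding lists above, this sketch accepts the first
     two and rejects the remaining three.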

     Implementations may have a limit on the total number of attribute binding
     components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
     Programs that use more attribute binding components than this limit will
     fail to load.  The method of counting used attribute binding components is
     implementation-dependent, but must satisfy the following properties:

       * If an attribute binding is not referenced in a program, or is
         referenced only in declarations of attribute variables that are not
         used, none of its components are counted.

       * An attribute binding component may be counted as used only if there
         exists an instruction operand where

           - the component is enabled for read by the swizzle pattern (Section
             2.X.4.2), and

           - the attribute binding is

               - referenced directly by the operand,

               - bound to a declared variable referenced by the operand, or

               - bound to a declared array variable where another binding in
                 the array satisfies one of the two previous conditions.

         Implementations are not required to optimize out unused elements of an
         attribute array or components that are used in only some elements of
         an array.  The last of these rules is intended to cover the case where
         the same attribute binding is used in multiple variables.

         For example, an operand whose swizzle pattern selects only the x
         component may result in the x component of an attribute binding being
         counted, but may never result in the counting of the y, z, or w
         components of any attribute binding.

       * Implementations are not required to determine that components read by
         an instruction are actually unused due to:

           - instruction write masks (for example, a component-wise ADD
             operation that only writes the "x" component doesn't have to read
             the "y", "z", and "w" components of its operands) or

           - any other properties of the instruction (for example, the DP3
             instruction, which computes a 3-component dot product, doesn't
             have to read the "w" component of its operands).
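
     The swizzle-based part of the counting rules can be illustrated with a
     small sketch.  The helper components_read is hypothetical; it maps the
     text following the "." of an operand's swizzle suffix to the set of
     components enabled for read, and it assumes that an omitted suffix
     enables all four components.

```python
# Hypothetical helper: map a swizzle suffix to the xyzw components it reads.
RGBA_TO_XYZW = {"r": "x", "g": "y", "b": "z", "a": "w"}

def components_read(swizzle_suffix):
    """swizzle_suffix is the text after the ".", or "" when none is given."""
    if swizzle_suffix == "":
        return {"x", "y", "z", "w"}  # assumed: no suffix reads everything
    # Normalize rgba names to xyzw so both spellings compare equal.
    return {RGBA_TO_XYZW.get(c, c) for c in swizzle_suffix}
```

     For an operand using the suffix "x", only the x component is in the
     returned set, so only the x component of the referenced attribute binding
     may be counted for that operand.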


     Section 2.X.3.3, Program Parameters

     Program parameter variables are used as constants during program
     execution.  All program parameter variables have associated bindings and
     are read-only during program execution.  Program parameters retain their
     values across program invocations, although their values may change
     between invocations due to GL state changes.  Program parameter variables
     may be declared explicitly via the <PARAM_statement> grammar rule, or
     implicitly by using a parameter binding in an instruction.  Except where
     otherwise specified, program parameter bindings always specify
     floating-point values.

     When declaring program parameter array variables, all bindings are
     supported and can be assigned to array members in any order.  The only
     restriction is that no parameter binding may be used more than once in
     array variables accessed using relative addressing.  A program will fail
     to load if any program parameter binding is used more than once in a
     single array accessed using relative addressing or used at least once in
     two or more arrays accessed using relative addressing.
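
     A minimal sketch of the reuse restriction, assuming a hypothetical
     representation in which each parameter array is a list of binding strings
     and the set of arrays accessed with relative addressing is already known.

```python
from collections import Counter

def reused_bindings(arrays, relative_arrays):
    """Return bindings appearing more than once among the arrays that are
    accessed with relative addressing; a non-empty result means the program
    would fail to load.

    arrays:          dict mapping array name -> list of binding strings
    relative_arrays: names of arrays accessed with relative addressing
    """
    counts = Counter(b for name in relative_arrays for b in arrays[name])
    return sorted(b for b, n in counts.items() if n > 1)
```

     Arrays that are only accessed with absolute addressing are ignored by
     this check, so they may share bindings freely.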


     Constant Bindings

     If a program parameter binding matches the <constantScalar> or
     <signedConstantScalar> grammar rules, the corresponding program parameter
     variable is bound to the vector (X,X,X,X), where X is the value of the
     specified constant.

     If a program parameter binding matches <constantVector>, the corresponding
     program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
     Z, and W are the values corresponding to the first, second, third, and
     fourth match of <signedConstantScalar>.  If fewer than four constants are
     specified, each of Y, Z, and W that has no corresponding constant assumes
     the value 0, 0, or 1, respectively.
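
     The expansion of scalar and vector constants can be sketched as follows.
     The helper constant_binding is hypothetical; a bare number stands for a
     scalar constant and a list stands for a <constantVector>.

```python
def constant_binding(values):
    """Return the 4-component vector bound for a constant.

    values: a single number (scalar constant) or a list of 1 to 4 numbers
            (the contents of a "{...}" vector constant).
    """
    if not isinstance(values, (list, tuple)):
        return (values,) * 4               # scalar: (X, X, X, X)
    if not 1 <= len(values) <= 4:
        raise ValueError("a constant vector has 1 to 4 components")
    defaults = (0, 0, 0, 1)                # Y, Z, W default to 0, 0, 1
    return tuple(values) + defaults[len(values):]
```

     Note the difference between a scalar and a one-element vector in this
     sketch: the scalar 5 yields (5,5,5,5), while {5} yields (5,0,0,1).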

     Constant bindings can be interpreted as having signed integer, unsigned
     integer, or floating-point values, depending on how they are used in the
     program text.  For constants in variable declarations, the components of
     the constant are interpreted according to the variable's component data
     type modifier.  If no data type modifier is specified in a declaration,
     constants are interpreted as floating-point values.  For constant bindings
     used directly in an instruction, the components of the constant are
     interpreted according to the required data type of the operand.  A program
     will fail to load if it specifies a floating-point constant value
     (matching the <floatConstant> grammar rule) that should be interpreted as
     a signed or unsigned integer, or a negative integer constant value that
     should be interpreted as an unsigned integer.

     If the value used to specify a floating-point constant cannot be exactly
     represented, the nearest floating-point value will be used.  If the value
     used to specify an integer constant is too large to be represented, the
     program will fail to load.
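
     The data type interpretation rules for constants can be sketched as
     below.  The helper interpret_constant is hypothetical; token_kind records
     whether the constant matched <floatConstant> or <intConstant>, and value
     already includes any sign.  The range check for integers that are too
     large is not modeled, since the representable range is not stated here.

```python
def interpret_constant(token_kind, value, data_type):
    """Return the constant as used, or raise ValueError where a load fails.

    token_kind: "float" or "int" (which grammar rule the text matched)
    data_type:  "FLOAT", "INT", or "UINT" (how the constant is interpreted)
    """
    if data_type == "FLOAT":
        return float(value)                # integers are allowed as floats
    if token_kind == "float":
        raise ValueError("floating-point constant used as an integer")
    if data_type == "UINT" and value < 0:
        raise ValueError("negative constant used as an unsigned integer")
    return int(value)
```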


     Program Environment/Local Parameter Bindings

       Binding                    Components  Underlying State
       -------------------------  ----------  -------------------------------
       program.env[a]             (x,y,z,w)   program environment parameter a
       program.local[a]           (x,y,z,w)   program local parameter a
       program.env[a..b]          (x,y,z,w)   program environment parameters
                                              a through b
       program.local[a..b]        (x,y,z,w)   program local parameters
                                              a through b

       Table X.1:  Program Environment/Local Parameter Bindings.  <a> and <b>
       indicate parameter numbers, where <a> must be less than or equal to <b>.

     If a program parameter binding matches "program.env[a]" or
     "program.local[a]", the four components of the program parameter variable
     are filled with the four components of program environment parameter <a>
     or program local parameter <a> respectively.

     Additionally, for program parameter array bindings, "program.env[a..b]"
     and "program.local[a..b]" are equivalent to specifying program environment
     or local parameters <a> through <b> in order, respectively.  A program
     using any of these bindings will fail to load if <a> is greater than <b>.
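
     The range forms can be read as shorthand for a list of single-parameter
     bindings.  A minimal sketch, assuming a hypothetical helper named
     expand_param_range:

```python
def expand_param_range(kind, a, b=None):
    """Expand program.<kind>[a] or program.<kind>[a..b] into single bindings.

    kind: "env" or "local".  Raises ValueError where a load would fail.
    """
    if kind not in ("env", "local"):
        raise ValueError("kind must be 'env' or 'local'")
    if b is None:
        b = a                              # single-parameter form
    if a > b:
        raise ValueError("<a> must be less than or equal to <b>")
    return ["program.%s[%d]" % (kind, i) for i in range(a, b + 1)]
```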

     Program environment and local parameters are typeless, and may be
     specified as signed integer, unsigned integer, or floating-point
     variables.  If a program environment parameter is read using a data type
     other than the one used to specify it, an undefined value is returned.


     Material Property Bindings

       Binding                        Components  Underlying State
       -----------------------------  ----------  ----------------------------
       state.material.ambient         (r,g,b,a)   front ambient material color
       state.material.diffuse         (r,g,b,a)   front diffuse material color
       state.material.specular        (r,g,b,a)   front specular material color
       state.material.emission        (r,g,b,a)   front emissive material color
       state.material.shininess       (s,0,0,1)   front material shininess
       state.material.front.ambient   (r,g,b,a)   front ambient material color
       state.material.front.diffuse   (r,g,b,a)   front diffuse material color
       state.material.front.specular  (r,g,b,a)   front specular material color
       state.material.front.emission  (r,g,b,a)   front emissive material color
       state.material.front.shininess (s,0,0,1)   front material shininess
       state.material.back.ambient    (r,g,b,a)   back ambient material color
       state.material.back.diffuse    (r,g,b,a)   back diffuse material color
       state.material.back.specular   (r,g,b,a)   back specular material color
       state.material.back.emission   (r,g,b,a)   back emissive material color
       state.material.back.shininess  (s,0,0,1)   back material shininess

       Table X.3:  Material Property Bindings.  If a material face is not
       specified in the binding, the front property is used.

     If a program parameter binding matches any of the material properties
     listed in Table X.3, the program parameter variable is filled according to
     the table.  For ambient, diffuse, specular, or emissive colors, the "x",
     "y", "z", and "w" components are filled with the "r", "g", "b", and "a"
     components, respectively, of the corresponding material color.  For
     material shininess, the "x" component is filled with the material's
     specular exponent, and the "y", "z", and "w" components are filled with
     the floating-point constants 0, 0, and 1, respectively.  Bindings
     containing ".back" refer to the back material; all other bindings refer to
     the front material.
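
     The fill rules of Table X.3 can be sketched as follows.  The helper
     material_binding and the nested dictionary used to hold material state
     are hypothetical and serve only to illustrate the face selection and the
     (s,0,0,1) layout used for shininess.

```python
def material_binding(binding, materials):
    """Return the 4-component value for a state.material.* binding.

    materials: dict mapping "front"/"back" -> dict of property -> value,
               where colors are (r, g, b, a) and shininess is a number.
    """
    parts = binding.split(".")
    if parts[:2] != ["state", "material"] or len(parts) not in (3, 4):
        raise ValueError("not a material binding: " + binding)
    face = parts[2] if len(parts) == 4 else "front"   # default face: front
    prop = parts[-1]
    value = materials[face][prop]
    if prop == "shininess":
        return (value, 0.0, 0.0, 1.0)
    return tuple(value)
```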

     Material properties can be changed inside a Begin/End pair, either
     directly by calling Material, or indirectly through color material.
     However, such property changes are not guaranteed to update program
     parameter bindings until the following End command.  Program parameter
     variables bound to material properties changed inside a Begin/End pair are
     undefined until the following End command.


     Light Property Bindings

       Binding                        Components  Underlying State
       -----------------------------  ----------  ----------------------------
       state.light[n].ambient         (r,g,b,a)   light n ambient color
       state.light[n].diffuse         (r,g,b,a)   light n diffuse color
       state.light[n].specular        (r,g,b,a)   light n specular color
       state.light[n].position        (x,y,z,w)   light n position
       state.light[n].attenuation     (a,b,c,e)   light n attenuation constants
                                                  and spot light exponent
       state.light[n].spot.direction  (x,y,z,c)   light n spot direction and
                                                  cutoff angle cosine
       state.light[n].half            (x,y,z,1)   light n infinite half-angle
       state.lightmodel.ambient       (r,g,b,a)   light model ambient color
       state.lightmodel.scenecolor    (r,g,b,a)   light model front scene color
       state.lightmodel.              (r,g,b,a)   light model front scene color
                front.scenecolor
       state.lightmodel.              (r,g,b,a)   light model back scene color
                back.scenecolor
       state.lightprod[n].ambient     (r,g,b,a)   light n / front material
                                                  ambient color product
       state.lightprod[n].diffuse     (r,g,b,a)   light n / front material
                                                  diffuse color product
       state.lightprod[n].specular    (r,g,b,a)   light n / front material
                                                  specular color product
       state.lightprod[n].            (r,g,b,a)   light n / front material
               front.ambient                      ambient color product
       state.lightprod[n].            (r,g,b,a)   light n / front material
               front.diffuse                      diffuse color product
       state.lightprod[n].            (r,g,b,a)   light n / front material
               front.specular                     specular color product
       state.lightprod[n].            (r,g,b,a)   light n / back material
               back.ambient                       ambient color product
       state.lightprod[n].            (r,g,b,a)   light n / back material
               back.diffuse                       diffuse color product
       state.lightprod[n].            (r,g,b,a)   light n / back material
               back.specular                      specular color product

       Table X.4: Light Property Bindings.  <n> indicates a light number.

     If a program parameter binding matches "state.light[n].ambient",
     "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
     and "w" components of the program parameter variable are filled with the
     "r", "g", "b", and "a" components, respectively, of the corresponding
     light color.

     If a program parameter binding matches "state.light[n].position", the "x",
     "y", "z", and "w" components of the program parameter variable are filled
     with the "x", "y", "z", and "w" components, respectively, of the light
     position.

     If a program parameter binding matches "state.light[n].attenuation", the
     "x", "y", and "z" components of the program parameter variable are filled
     with the constant, linear, and quadratic attenuation parameters of the
     specified light, respectively (section 2.13.1).  The "w" component of the
     program parameter variable is filled with the spot light exponent of the
     specified light.

     If a program parameter binding matches "state.light[n].spot.direction",
     the "x", "y", and "z" components of the program parameter variable are
     filled with the "x", "y", and "z" components of the spot light direction
     of the specified light, respectively (section 2.13.1).  The "w" component
     of the program parameter variable is filled with the cosine of the spot
     light cutoff angle of the specified light.

     If a program parameter binding matches "state.light[n].half", the "x",
     "y", and "z" components of the program parameter variable are filled with
     the x, y, and z components, respectively, of the normalized infinite
     half-angle vector

       h_inf = || P + (0, 0, 1) ||.

     The "w" component is filled with 1.0.  In the computation of h_inf, P
     consists of the x, y, and z coordinates of the normalized vector from the
     eye position P_e to the eye-space light position P_pli (section 2.13.1).
     h_inf is defined to correspond to the normalized half-angle vector when
     using an infinite light (w coordinate of the position is zero) and an
     infinite viewer (v_bs is FALSE).  For local lights or a local viewer,
     h_inf is well-defined but does not match the normalized half-angle vector,
     which will vary depending on the vertex position.
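
     As a rough illustration of the h_inf computation above, the following
     Python sketch normalizes P, adds (0, 0, 1), and normalizes the sum.  The
     function names are hypothetical and not part of this extension.  The
     degenerate case where P + (0, 0, 1) has zero length is not addressed by
     the text above; the sketch simply rejects it.

```python
import math

def normalize3(v):
    """Return the (x, y, z) tuple v scaled to unit length."""
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)

def light_half_binding(p):
    """Sketch of the value bound to state.light[n].half.

    p holds the x, y, and z coordinates of the vector from the eye
    position to the eye-space light position; it is normalized first.
    """
    px, py, pz = normalize3(p)
    hx, hy, hz = normalize3((px, py, pz + 1.0))
    # The "w" component is always 1.0.
    return (hx, hy, hz, 1.0)
```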

     If a program parameter binding matches "state.lightmodel.ambient", the
     "x", "y", "z", and "w" components of the program parameter variable are
     filled with the "r", "g", "b", and "a" components of the light model
     ambient color, respectively.

     If a program parameter binding matches "state.lightmodel.scenecolor" or
     "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
     the program parameter variable are filled with the "r", "g", and "b"
     components respectively of the "front scene color"

       c_scene = a_cs * a_cm + e_cm,

     where a_cs is the light model ambient color, a_cm is the front ambient
     material color, and e_cm is the front emissive material color.  The "w"
     component of the program parameter variable is filled with the alpha
     component of the front diffuse material color.  If a program parameter
     binding matches "state.lightmodel.back.scenecolor", a similar back scene
     color, computed using back-facing material properties, is used.  The front
     and back scene colors match the values that would be assigned to vertices
     using conventional lighting if all lights were disabled.
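
     The scene color formula above can be sketched in Python as follows.  The
     function and parameter names are hypothetical; colors are assumed to be
     (r, g, b, a) tuples, and d_cm stands for the diffuse material color of
     the same face.

```python
def scene_color_binding(a_cs, a_cm, e_cm, d_cm):
    """Sketch of state.lightmodel.scenecolor for one face.

    a_cs: light model ambient color
    a_cm: ambient material color
    e_cm: emissive material color
    d_cm: diffuse material color (only its alpha is used)
    """
    # x, y, z: c_scene = a_cs * a_cm + e_cm, per color component.
    rgb = tuple(a_cs[i] * a_cm[i] + e_cm[i] for i in range(3))
    # w: alpha component of the diffuse material color.
    return rgb + (d_cm[3],)
```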

     If a program parameter binding matches anything beginning with
     "state.lightprod[n]", the "x", "y", and "z" components of the program
     parameter variable are filled with the "r", "g", and "b" components,
     respectively, of the corresponding light product.  The three light product
     components are the products of the corresponding color components of the
     specified material property and the light color of the specified light
     (see Table X.4).  The "w" component of the program parameter variable is
     filled with the alpha component of the specified material property.
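
     A minimal Python sketch of a light product, assuming colors are given as
     (r, g, b, a) tuples; the function name is hypothetical.

```python
def light_product_binding(light_color, material_color):
    """Sketch of a state.lightprod[n] binding.

    light_color:    ambient, diffuse, or specular color of light n
    material_color: the matching front or back material property
    """
    # x, y, z: component-wise product of the r, g, b components.
    rgb = tuple(light_color[i] * material_color[i] for i in range(3))
    # w: alpha of the material property, not a product.
    return rgb + (material_color[3],)
```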

     Light products depend on material properties, which can be changed inside
     a Begin/End pair.  Such property changes are not guaranteed to take effect
     until the following End command.  Program parameter variables bound to
     light products whose corresponding material property changes inside a
     Begin/End pair are undefined until the following End command.


     Texture Coordinate Generation Property Bindings

       Binding                    Components  Underlying State
       -------------------------  ----------  ----------------------------
       state.texgen[n].eye.s      (a,b,c,d)   TexGen eye linear plane
                                              coefficients, s coord, unit n
       state.texgen[n].eye.t      (a,b,c,d)   TexGen eye linear plane
                                              coefficients, t coord, unit n
       state.texgen[n].eye.r      (a,b,c,d)   TexGen eye linear plane
                                              coefficients, r coord, unit n
       state.texgen[n].eye.q      (a,b,c,d)   TexGen eye linear plane
                                              coefficients, q coord, unit n
       state.texgen[n].object.s   (a,b,c,d)   TexGen object linear plane
                                              coefficients, s coord, unit n
       state.texgen[n].object.t   (a,b,c,d)   TexGen object linear plane
                                              coefficients, t coord, unit n
       state.texgen[n].object.r   (a,b,c,d)   TexGen object linear plane
                                              coefficients, r coord, unit n
       state.texgen[n].object.q   (a,b,c,d)   TexGen object linear plane
                                              coefficients, q coord, unit n

       Table X.5:  Texture Coordinate Generation Property Bindings.  "[n]" is
       optional -- texture unit <n> is used if specified; texture unit 0 is
       used otherwise.

     If a program parameter binding matches a set of TexGen plane coefficients,
     the "x", "y", "z", and "w" components of the program parameter variable
     are filled with the coefficients p1, p2, p3, and p4, respectively, for
     object linear coefficients, and the coefficients p1', p2', p3', and p4',
     respectively, for eye linear coefficients (section 2.10.4).


     Fog Property Bindings

       Binding                        Components  Underlying State
       -----------------------------  ----------  ----------------------------
       state.fog.color                (r,g,b,a)   RGB fog color (section 3.10)
       state.fog.params               (d,s,e,r)   fog density, linear start
                                                  and end, and 1/(end-start)
                                                  (section 3.10)

       Table X.6:  Fog Property Bindings

     If a program parameter binding matches "state.fog.color", the "x", "y",
     "z", and "w" components of the program parameter variable are filled with
     the "r", "g", "b", and "a" components, respectively, of the fog color
     (section 3.10).

     If a program parameter binding matches "state.fog.params", the "x", "y",
     and "z" components of the program parameter variable are filled with the
     fog density, linear fog start, and linear fog end parameters (section
     3.10), respectively.  The "w" component is filled with 1/(end-start),
     where end and start are the linear fog end and start parameters,
     respectively.
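
     The layout of "state.fog.params" can be sketched in Python as below.  The
     function names are hypothetical.  The case where end equals start is not
     addressed by the text above, and the sketch does not handle it.

```python
def fog_params_binding(density, start, end):
    """Sketch of state.fog.params: (density, start, end, 1/(end-start))."""
    return (density, start, end, 1.0 / (end - start))

def unclamped_linear_fog_factor(params, c):
    """Linear fog factor (end - c) / (end - start), before any clamping.

    Uses the precomputed "w" component so no division is needed.
    """
    _, _, end, inv_range = params
    return (end - c) * inv_range
```

     Precomputing 1/(end-start) lets a program evaluate the linear fog factor
     with a subtract and a multiply rather than a divide.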


     Clip Plane Property Bindings

       Binding                        Components  Underlying State
       -----------------------------  ----------  ----------------------------
       state.clip[n].plane            (a,b,c,d)   clip plane n coefficients

       Table X.7:  Clip Plane Property Bindings.  <n> specifies the clip plane
       number, and is required.

     If a program parameter binding matches "state.clip[n].plane", the "x",
     "y", "z", and "w" components of the program parameter variable are filled
     with the coefficients p1', p2', p3', and p4', respectively, of clip plane
     <n> (section 2.11).


     Point Property Bindings

       Binding                        Components  Underlying State
       -----------------------------  ----------  ----------------------------
       state.point.size               (s,n,x,f)   point size, min and max size
                                                  clamps, and fade threshold
                                                  (section 3.3)
       state.point.attenuation        (a,b,c,1)   point size attenuation consts

       Table X.8:  Point Property Bindings

     If a program parameter binding matches "state.point.size", the "x", "y",
     "z", and "w" components of the program parameter variable are filled with
     the point size, minimum point size, maximum point size, and fade
     threshold, respectively (section 3.3).

     If a program parameter binding matches "state.point.attenuation", the "x",
     "y", and "z" components of the program parameter variable are filled with
     the constant, linear, and quadratic point size attenuation parameters (a,
     b, and c), respectively (section 3.3).  The "w" component is filled with
     1.0.


     Texture Environment Property Bindings

       Binding                    Components  Underlying State
       -------------------------  ----------  ----------------------------
       state.texenv[n].color      (r,g,b,a)   texture environment n color

       Table X.9:  Texture Environment Property Bindings.  "[n]" is optional --
       texture unit <n> is used if specified; texture unit 0 is used otherwise.

     If a program parameter binding matches "state.texenv[n].color", the "x",
     "y", "z", and "w" components of the program parameter variable are filled
     with the "r", "g", "b", and "a" components, respectively, of the
     corresponding texture environment color.  Note that only "legacy" texture
     units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
     Texture image units and texture coordinate sets do not have associated
     texture environment state.


     Depth Property Bindings

       Binding                      Components  Underlying State
       ---------------------------  ----------  ----------------------------
       state.depth.range            (n,f,d,1)   Depth range near, far, and
                                                (far-near) (section 2.10.1)

       Table X.10:  Depth Property Bindings

     If a program parameter binding matches "state.depth.range", the "x" and
     "y" components of the program parameter variable are filled with the
     mappings of near and far clipping planes to window coordinates,
     respectively.  The "z" component is filled with the difference of the
     mappings of near and far clipping planes, far minus near.  The "w"
     component is filled with 1.0.


     Matrix Property Bindings

       Binding                               Underlying State
       ------------------------------------  ---------------------------
       * state.matrix.modelview[n]           modelview matrix n
         state.matrix.projection             projection matrix
         state.matrix.mvp                    modelview-projection matrix
       * state.matrix.texture[n]             texture matrix n
         state.matrix.program[n]             program matrix n

       Table X.11:  Base Matrix Property Bindings.  The "[n]" syntax indicates
       a specific matrix number.  For modelview and texture matrices, a matrix
       number is optional, and matrix zero will be used if the matrix number is
       omitted.  These base bindings may further be modified by an
       inverse/transpose selector and a row selector.

     If the beginning of a program parameter binding matches any of the matrix
     binding names listed in Table X.11, the binding corresponds to a 4x4
     matrix.  If the parameter binding is followed by ".inverse", ".transpose",
     or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
     or transpose of the inverse, respectively, of the matrix specified in
     Table X.11 is selected.  Otherwise, the matrix specified in Table X.11 is
     selected.  If the specified matrix is poorly-conditioned (singular or
     nearly so), its inverse matrix is undefined.  The binding name
     "state.matrix.mvp" refers to the product of modelview matrix zero and the
     projection matrix, defined as

        MVP = P * M0,

     where P is the projection matrix and M0 is modelview matrix zero.
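
     As an illustration of the multiplication order, here is a small Python
     sketch.  It assumes, purely for the example, that matrices are stored as
     lists of four rows; the function names are hypothetical.

```python
def mat_mul(a, b):
    """Return the 4x4 matrix product a * b (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mvp_matrix(projection, modelview0):
    """Sketch of state.matrix.mvp: MVP = P * M0, in that order."""
    return mat_mul(projection, modelview0)

# Example inputs: a projection that scales x by 2, and a modelview
# matrix that translates by (1, 2, 3).
P = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
M0 = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]
MVP = mvp_matrix(P, M0)
```

     With these inputs the first row of P * M0 is (2, 0, 0, 2), while the
     first row of M0 * P would be (2, 0, 0, 1), so the order matters.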

     If the selected matrix is followed by ".row[<a>]" (matching the
     <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
     the program parameter variable are filled with the four entries of row <a>
     of the selected matrix.  In the example,

       PARAM m0 = state.matrix.modelview[1].row[0];
       PARAM m1 = state.matrix.projection.transpose.row[3];

     the variable "m0" is set to the first row (row 0) of modelview matrix 1
     and "m1" is set to the last row (row 3) of the transpose of the projection
     matrix.

     For program parameter array bindings, multiple rows of the selected matrix
     can be bound via the <stateMatrixRows> grammar rule.  If the selected
     matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
     to specifying matrix rows <a> through <b>, in order.  A program will fail
     to load if <a> is greater than <b>.  If no row selection is specified
     (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
     In the example,

       PARAM m2[] = { state.matrix.program[0].row[1..2] };
       PARAM m3[] = { state.matrix.program[0].transpose };

     the array "m2" has two entries, containing rows 1 and 2 of program matrix
     zero, and "m3" has four entries, containing all four rows of the transpose
     of program matrix zero.
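
     The row selection rules above can be sketched in Python as follows.  The
     function names are hypothetical, matrices are assumed to be lists of four
     rows, and only the ".transpose" modifier is modeled; ".inverse" and
     ".invtrans" are left out to keep the sketch short.

```python
def transpose(m):
    """Return the transpose of a 4x4 matrix stored as a list of rows."""
    return [[m[j][i] for j in range(4)] for i in range(4)]

def bind_matrix_rows(matrix, modifier=None, first=0, last=3):
    """Sketch of ".row[<first>..<last>]" applied to a matrix binding.

    With the default arguments, rows 0 through 3 are bound in order,
    as when no row selection is specified.
    """
    if first > last:
        # A program fails to load if <a> is greater than <b>.
        raise ValueError("row range start exceeds row range end")
    if first < 0 or last > 3:
        raise ValueError("a 4x4 matrix only has rows 0 through 3")
    if modifier is None:
        selected = matrix
    elif modifier == "transpose":
        selected = transpose(matrix)
    else:
        raise NotImplementedError("modifier not modeled in this sketch")
    return [tuple(selected[r]) for r in range(first, last + 1)]
```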


     Section 2.X.3.4, Program Temporaries

     Program temporary variables are used to hold temporary results during
     program execution.  Temporaries do not persist between program
     invocations, and are undefined at the beginning of each program
     invocation.

     Temporary variables are declared explicitly using the <TEMP_statement>
     grammar rule.  Each such statement can declare one or more temporaries.
     Temporaries cannot be declared implicitly.  Temporaries can be declared
     using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT")
     modifier.

     Temporary variables may be declared as arrays.  Temporary variables
     declared as arrays may be stored in slower memory than those not declared
     as arrays, and it is recommended to use non-array variables unless array
     functionality is required.


     Section 2.X.3.5, Program Results

     Program result variables represent the per-vertex or per-fragment results
     of the program.  All result variables have associated bindings, are
     write-only during program execution, and are undefined at the beginning of
     each program invocation.  Any vertex or fragment attributes corresponding
     to unwritten result variables will be undefined in subsequent stages of
     the pipeline.  Result variables may be declared explicitly via the
     <OUTPUT_statement> grammar rule, or implicitly by using a result binding
     in an instruction.

     The set of available result bindings depends on the program type, and is
     enumerated in the specifications for each program type.

     Result variables may generally be declared as arrays, but the set of
     bindings allowed for arrays is limited to state grouped in arrays (e.g.,
     texture coordinates, clip distances, colors).  Additionally, all bindings
     assigned to the array must be of the same binding type and must increase
     consecutively.  Examples of valid and invalid binding lists for vertex
     programs include:

       result.clip[1], result.clip[2]          # valid, 2-entry array
       result.texcoord[0..3]                   # valid, 4-entry array
       result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
       result.texcoord[2], result.texcoord[1]  # invalid, wrong order
       result.texcoord[1], result.clip[2]      # invalid, different types
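
     The two array rules (same binding type, consecutively increasing) can be
     checked with a short Python sketch.  The function names and the regular
     expression are hypothetical and cover only the "result.<name>[i]" and
     "result.<name>[i..j]" forms used in the examples above.  Treating an
     empty range such as "[3..1]" as invalid is an assumption of the sketch.

```python
import re

# Matches "result.<name>[i]" or "result.<name>[i..j]".
_BINDING = re.compile(r"^result\.(\w+)\[(\d+)(?:\.\.(\d+))?\]$")

def expand_binding(text):
    """Expand one binding string into a list of (name, index) pairs."""
    m = _BINDING.match(text.strip())
    if m is None:
        raise ValueError("unrecognized binding: " + text)
    name = m.group(1)
    first = int(m.group(2))
    last = first if m.group(3) is None else int(m.group(3))
    return [(name, i) for i in range(first, last + 1)]

def is_valid_result_array(bindings):
    """Return True if the binding list may initialize a result array."""
    entries = []
    for text in bindings:
        expanded = expand_binding(text)
        if not expanded:
            return False  # empty range; assumed invalid in this sketch
        entries.extend(expanded)
    if not entries:
        return False
    name0, index0 = entries[0]
    # Every entry must share the binding type and follow its predecessor.
    return all(name == name0 and index == index0 + offset
               for offset, (name, index) in enumerate(entries))
```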

     Additionally, result bindings may be used in no more than one array
     addressed with relative addressing.

     Implementations may have a limit on the total number of result binding
     components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
     Programs that require more result binding components than this limit will
     fail to load.  The method of counting used result binding components is
     implementation-dependent, but must satisfy the following properties:

       * If a result binding is not referenced in a program, or is referenced
         only in declarations of result variables that are not used, none of
         its components are counted.

       * A result binding component may be counted as used only if there exists
         an instruction operand where

           - the component is enabled in the write mask (Section 2.X.4.3), and

           - the result binding is either

               - referenced directly by the operand,

               - bound to a declared variable referenced by the operand, or

               - bound to a declared array variable where another binding in
                 the array satisfies one of the two previous conditions.

         Implementations are not required to optimize out unused elements of a
         result array or components that are used in only some elements of an
         array.  The last of these rules is intended to cover the case where
         the same result binding is used in multiple variables.

         For example, an instruction whose write mask selects only the x
         component may result in the x component of a result binding being
         counted, but may never result in the counting of the y, z, or w
         components of any result binding.


     Section 2.X.3.6, Program Parameter Buffers

     Program parameter buffers are arrays consisting of single-component
     typeless values or four-component typeless vectors stored in a buffer
     object.  The GL provides an implementation-dependent number of buffer
     object binding points for each program target, to which buffer objects can
     be attached.  Program parameter buffer variables can be changed either by
     updating the contents of bound buffer objects, or simply by changing the
     buffer object attached to a binding point.

     Program parameter buffer variables are used as constants during program
     execution.  All program parameter buffer variables have an associated
     binding and are read-only during program execution.  Program parameter
     buffers retain their values across program invocations, although their
     values may change as buffer object bindings or contents change.  Program
     parameter buffer variables must be declared explicitly via the
     <BUFFER_statement> grammar rule.  Program parameter buffer bindings
     cannot be used directly in executable instructions.

     Program parameter buffer variables are treated as an array of
     single-component values if the <bufferDeclType> grammar rule matches
     "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
     A program will fail to load if a variable declared as "BUFFER" and another
     variable declared as "BUFFER4" use the same buffer binding point.

     Program parameter buffer variables may be declared as arrays, but all
     bindings assigned to the array must use the same binding point and must
     increase consecutively.

       Binding                        Components  Underlying State
       -----------------------------  ----------  -----------------------------
       program.buffer[a][b]           (x,x,x,x)   program parameter buffer a,
                                                    element b
       program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a,
                                                    elements b through c
       program.buffer[a]              (x,x,x,x)   program parameter buffer a,
                                                    all elements

       Table X.12: Program Parameter Buffer Bindings.  <a> indicates a buffer
       number, <b> and <c> indicate individual elements.

     If a program parameter buffer binding matches "program.buffer[a][b]", the
     program parameter variable is filled with element <b> of the buffer
     object bound to binding point <a>.  Each element of the bound buffer
     object is treated as one or four words of data that can hold integer or
     floating-point values.  When a single-component binding is evaluated, the
     selected word is broadcast to all four components of the variable.  When a
     four-component binding is evaluated, the four components of the buffer
     element are loaded into the variable.  If no buffer object is bound to
     binding point <a>, or the bound buffer object is not large enough to hold
     an element <b>, the values used are undefined.  The binding point <a> must
     be a nonnegative integer constant.
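
     The difference between single-component and four-component bindings can
     be sketched in Python.  The function name is hypothetical, and the sketch
     assumes that four-component elements are tightly packed, so that element
     <b> starts at word 4*b.  Undefined results are modeled as None.

```python
def fetch_buffer_element(words, b, four_component):
    """Sketch of evaluating program.buffer[a][b].

    words:          flat list of the words in the buffer object bound
                    to binding point <a>
    b:              element index
    four_component: True for "BUFFER4", False for "BUFFER"
    """
    if b < 0:
        raise ValueError("element index must be nonnegative")
    if four_component:
        base = 4 * b  # assumption: elements are tightly packed
        if base + 4 > len(words):
            return None  # buffer too small: values are undefined
        return tuple(words[base:base + 4])
    if b >= len(words):
        return None  # buffer too small: values are undefined
    # A single word is broadcast to all four components.
    return (words[b],) * 4
```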

     For program parameter buffer array declarations, "program.buffer[a][b..c]"
     is equivalent to specifying elements <b> through <c> of the buffer object
     bound to binding point <a> in order.

     For program parameter buffer array declarations, "program.buffer[a]" is
     equivalent to specifying the entire buffer -- elements 0 through <n>-1,
     where <n> is either the size of the array (if declared) or the
     implementation-dependent maximum parameter buffer object size limit (if no
     size is declared).


     Section 2.X.3.7, Program Condition Code Registers

     The program condition code registers are four-component vectors.  Each
     component of this register is a collection of single-bit flags, including
     a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry
     flag (CF).  There are two condition code registers (CC0 and CC1), whose
     values are undefined at the beginning of program execution.

     Most program instructions can optionally update one of the condition code
     registers, by designating the condition code to update in the instruction.
     When a condition code component is updated, the four flags of each
     component of the condition code are set according to the corresponding
     component of the instruction result.  Full details on the condition code
     updates and tests can be found in Section 2.X.4.3.

     The value of these four flags can be combined in various condition code
     tests, which can be used to mask writes to destination variables and to
     perform conditional branches or other condition operations.


     Section 2.X.3.8, Program Aliases

     Programs can create aliases by matching the <ALIAS_statement> grammar
     rule.  Aliases allow programs to use multiple variable names to refer to a
     single underlying variable.  For example, the statement

       ALIAS var1 = var0

     establishes a variable name of "var1".  Subsequent references to "var1" in
     the program text are treated as references to "var0".  The left hand side
     of an ALIAS statement must be a new variable name, and the right hand side
     must be an established variable name.

     Aliases are not considered variable declarations, so do not count against
     the limits on the number of variable declarations allowed in the program
     text.
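
     One way to picture the alias rules is a name table in which an alias maps
     to the same underlying variable as its target.  The Python sketch below
     is only an illustration; the class and method names are hypothetical and
     say nothing about how an implementation stores names.

```python
class NameTable:
    """Maps established names to their underlying variable."""

    def __init__(self):
        self._names = {}

    def declare(self, name):
        """Establish a new variable."""
        if name in self._names:
            raise ValueError(name + " is already established")
        self._names[name] = name

    def alias(self, new_name, established):
        """ALIAS new_name = established;"""
        if new_name in self._names:
            raise ValueError("left hand side must be a new name")
        if established not in self._names:
            raise ValueError("right hand side must be established")
        self._names[new_name] = self._names[established]

    def resolve(self, name):
        """Return the underlying variable a name refers to."""
        return self._names[name]

    def declaration_count(self):
        """Count variable declarations; aliases are not counted."""
        return sum(1 for k, v in self._names.items() if k == v)
```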


     Section 2.X.3.9, Program Resource Limits

     (see ARB_vertex_program specification, incorporates all the different
     limits on instruction counts, temporaries, attribute bindings, program
     parameters, and so on)


     Section 2.X.4, Program Execution Environment

     The set of instructions supported for GPU programs is given in Table X.13
     below and is described in detail in Section 2.X.8.  An instruction can use
     up to three operands when it executes, and most instructions can write a
     single result vector.  Instructions may also specify one or more
     modifiers, according to the <opModifiers> grammar rule.  Instruction
     modifiers affect how the specified operation is performed.

     GPU programs may operate on signed integer, unsigned integer, or
     floating-point values; some instructions are capable of operating on any
     of the three types.  However, the data type of the operands and the result
     are always determined based solely on the instruction and its modifiers.
     If any of the variables used in the instruction are typeless, they will be
     interpreted according to the data type derived from the instruction.  If
     any variables with a conflicting data type are used in the instruction,
     the program will fail to load unless the "NTC" (no type checking)
     instruction modifier is specified.
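
     The type checking rule can be sketched in Python as follows.  The
     function name and the type labels are hypothetical.  The sketch treats
     any mismatch between a typed variable and the instruction's data type as
     a conflict, which is a simplifying assumption; typeless variables are
     modeled as None.

```python
def operand_types_load(instruction_type, variable_types, ntc=False):
    """Return True if the instruction passes type checking.

    instruction_type: data type derived from the instruction and its
                      modifiers, e.g. "float", "signed", "unsigned"
    variable_types:   declared type of each variable used, or None
                      for a typeless variable
    ntc:              True if the "NTC" modifier is specified
    """
    if ntc:
        return True  # no type checking requested
    for declared in variable_types:
        if declared is None:
            continue  # typeless: interpreted as instruction_type
        if declared != instruction_type:
            return False  # conflicting type: program fails to load
    return True
```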

                   Modifiers
       Instruction F I C S H D  Out Inputs    Description
       ----------- - - - - - -  --- --------  --------------------------------
       ABS         X X X X X F  v   v         absolute value
       ADD         X X X X X F  v   v,v       add
       AND         - X X - - S  v   v,v       bitwise and
       BRK         - - - - - -  -   c         break out of loop instruction
       CAL         - - - - - -  -   c         subroutine call
       CEIL        X X X X X F  v   vf        ceiling
       CMP         X X X X X F  v   v,v,v     compare
       CONT        - - - - - -  -   c         continue with next loop iteration
       COS         X - X X X F  s   s         cosine with reduction to [-PI,PI]
       DIV         X X X X X F  v   v,s       divide vector components by scalar
       DP2         X - X X X F  s   v,v       2-component dot product
       DP2A        X - X X X F  s   v,v,v     2-comp. dot product w/scalar add
       DP3         X - X X X F  s   v,v       3-component dot product
       DP4         X - X X X F  s   v,v       4-component dot product
       DPH         X - X X X F  s   v,v       homogeneous dot product
       DST         X - X X X F  v   v,v       distance vector
       ELSE        - - - - - -  -   -         start if test else block
       ENDIF       - - - - - -  -   -         end if test block
       ENDREP      - - - - - -  -   -         end of repeat block
       EX2         X - X X X F  s   s         exponential base 2
       FLR         X X X X X F  v   vf        floor
       FRC         X - X X X F  v   v         fraction
       I2F         - X X - - S  vf  v         integer to float
       IF          - - - - - -  -   c         start of if test block
       KIL         X X - - X F  -   vc        kill fragment
       LG2         X - X X X F  s   s         logarithm base 2
       LIT         X - X X X F  v   v         compute lighting coefficients
       LRP         X - X X X F  v   v,v,v     linear interpolation
       MAD         X X X X X F  v   v,v,v     multiply and add
       MAX         X X X X X F  v   v,v       maximum
       MIN         X X X X X F  v   v,v       minimum
       MOD         - X X - - S  v   v,s       modulus vector components by scalar
       MOV         X X X X X F  v   v         move
       MUL         X X X X X F  v   v,v       multiply
       NOT         - X X - - S  v   v         bitwise not
       NRM         X - X X X F  v   v         normalize 3-component vector
       OR          - X X - - S  v   v,v       bitwise or
       PK2H        X X - - - F  s   vf        pack two 16-bit floats
       PK2US       X X - - - F  s   vf        pack two floats as unsigned 16-bit
       PK4B        X X - - - F  s   vf        pack four floats as signed 8-bit
       PK4UB       X X - - - F  s   vf        pack four floats as unsigned 8-bit
       POW         X - X X X F  s   s,s       exponentiate
       RCC         X - X X X F  s   s         reciprocal (clamped)
       RCP         X - X X X F  s   s         reciprocal
       REP         X X - - X F  -   v         start of repeat block
       RET         - - - - - -  -   c         subroutine return
       RFL         X - X X X F  v   v,v       reflection vector
       ROUND       X X X X X F  v   vf        round to nearest integer
       RSQ         X - X X X F  s   s         reciprocal square root
       SAD         - X X - - S  vu  v,v,vu    sum of absolute differences
       SCS         X - X X X F  v   s         sine/cosine without reduction
       SEQ         X X X X X F  v   v,v       set on equal
       SFL         X X X X X F  v   v,v       set on false
       SGE         X X X X X F  v   v,v       set on greater than or equal
       SGT         X X X X X F  v   v,v       set on greater than
       SHL         - X X - - S  v   v,s       shift left
       SHR         - X X - - S  v   v,s       shift right
       SIN         X - X X X F  s   s         sine with reduction to [-PI,PI]
       SLE         X X X X X F  v   v,v       set on less than or equal
       SLT         X X X X X F  v   v,v       set on less than
       SNE         X X X X X F  v   v,v       set on not equal
       SSG         X - X X X F  v   v         set sign
       STR         X X X X X F  v   v,v       set on true
       SUB         X X X X X F  v   v,v       subtract
       SWZ         X - X X X F  v   v         extended swizzle
       TEX         X X X X - F  v   vf        texture sample
       TRUNC       X X X X X F  v   vf        truncate (round toward zero)
       TXB         X X X X - F  v   vf        texture sample with bias
       TXD         X X X X - F  v   vf,vf,vf  texture sample w/partials
       TXF         X X X X - F  v   vs        texel fetch
       TXL         X X X X - F  v   vf        texture sample w/LOD
       TXP         X X X X - F  v   vf        texture sample w/projection
       TXQ         - - - - - S  vs  vs        texture info query
       UP2H        X X X X - F  vf  s         unpack two 16-bit floats
       UP2US       X X X X - F  vf  s         unpack two unsigned 16-bit ints
       UP4B        X X X X - F  vf  s         unpack four signed 8-bit ints
       UP4UB       X X X X - F  vf  s         unpack four unsigned 8-bit ints
       X2D         X - X X X F  v   v,v,v     2D coordinate transformation
       XOR         - X X - - S  v   v,v       exclusive or
       XPD         X - X X X F  v   v,v       cross product

       Table X.13:  Summary of NV_gpu_program4 instructions.  The "Modifiers"
       columns specify the set of modifiers allowed for the instruction:

         F = floating-point data type modifiers
         I = signed and unsigned integer data type modifiers
         C = condition code update modifiers
         S = clamping (saturation) modifiers
         H = half-precision float data type suffix
         D = default data type modifier (F, U, or S)

       The input and output columns describe the formats of the operands and
       results of the instruction.

         v:  4-component vector (data type is inherited from operation)
         vf: 4-component vector (data type is always floating-point)
         vs: 4-component vector (data type is always signed integer)
         vu: 4-component vector (data type is always unsigned integer)
         s:  scalar (replicated if written to a vector destination;
                     data type is inherited from operation)
         c:  condition code test result (e.g., "EQ", "GT1.x")
         vc: 4-component vector or condition code test


     Section 2.X.4.1, Program Instruction Modifiers

     There are several types of instruction modifiers available.  A data type
     modifier specifies that an instruction should operate on signed integer,
     unsigned integer, or floating-point data, when multiple data types are
     supported.  A clamping modifier applies to instructions with
     floating-point results, and specifies the range to which the results
     should be clamped.  A condition code update modifier specifies that the
     instruction should update one of the condition code variables.  Several
     other special modifiers are also provided.

     Instruction modifiers may be specified as stand-alone modifiers or as
     suffixes concatenated with the opcode name.  A program will fail to load
     if it contains an instruction that

       * specifies more than one modifier of any given type,

       * specifies a clamping modifier on an instruction that does not produce
         floating-point results, or

       * specifies a modifier that is not supported by the instruction (see
         Table X.13 and the instruction description).

     Stand-alone instruction modifiers are specified according to the
     <opModifiers> grammar rule using a ".<modifier>" syntax.  Multiple
     modifiers, separated by periods, may be specified.  The set of supported
     modifiers is described in Table X.14.

       Modifier  Description
       --------  -----------------------------------------------
       F         Floating-point operation
       U         Fixed-point operation, unsigned operands
       S         Fixed-point operation, signed operands
       CC        Update condition code register zero
       CC0       Update condition code register zero
       CC1       Update condition code register one
       SAT       Floating-point results clamped to [0,1]
       SSAT      Floating-point results clamped to [-1,1]
       NTC       Disable type-checking on operands/results
       S24       Signed multiply (24-bit operands)
       U24       Unsigned multiply (24-bit operands)
       HI        Multiplies two 32-bit integer operands, returns
                   the 32 MSBs of the product

       Table X.14, Instruction Modifiers.

     "F", "U", and "S" modifiers are data type modifiers and specify that the
     instruction should operate on floating-point, unsigned integer, or
     signed integer values, respectively.  For example, "ADD.F", "ADD.U", and
     "ADD.S" specify component-wise addition of floating-point, unsigned
     integer, or signed integer vectors, respectively.  These modifiers specify
     a data type, but do not specify a precision at which the operation is
     performed.  Floating-point operations will be carried out with an internal
     precision no less than that used to represent the largest operand.
     Fixed-point operations will be carried out using at least as many bits as
     used to represent the largest operand.  Operands represented with fewer
     bits than used to perform the instruction will be promoted to a larger
     data type.  Signed integer operands will be sign-extended, where the most
     significant bits are filled with ones if the operand is negative and zero
     otherwise.  Unsigned integer operands will be zero-extended, where the
     most significant bits are always filled with zeroes.  For some
     instructions, the data type of some operands or of the result is fixed; in
     these cases, the data type modifier specifies the data type of the
     remaining values.

     "CC", "CC0", and "CC1" are condition code update modifiers that specify
     that one of the condition code registers should be updated based on the
     result of the instruction, as described in section 2.X.4.3.  "CC" and
     "CC0" specify that the condition code register CC0 be updated; "CC1"
     specifies an update to CC1.  If no condition code update modifier is
     provided, the condition code registers will not be affected.

     "SAT" and "SSAT" are clamping modifiers that specify that the
     floating-point components of the instruction result should be clamped to
     [0,1] or [-1,1], respectively, before updating the condition code and the
     destination variable.  If no clamping suffix is specified, unclamped
     results will be used for condition code updates (if any) and destination
     variable writes.  Clamping modifiers are not supported on instructions
     that do not produce floating-point results.

     "NTC" (no type checking) disables data type checking on the instruction,
     and allows instructions to use operands or result variables whose data
     types are inconsistent with the expected data types of the instruction.

     "S24", "U24", and "HI" are special modifiers that are allowed only for the
     MUL instruction, and are described in detail where MUL is documented.  No
     more than one such modifier may be provided for any instruction.

     If an instruction supports data type modifiers, but none is provided, a
     default data type will be chosen based on the instruction, as specified in
     Table X.13 and the instruction set description (Section 2.X.8).  If
     condition code update or clamping modifiers are not specified, the
     corresponding operation will not be performed.

     Additionally, each instruction name may have one or more suffixes,
     concatenated onto the base instruction name, that operate as instruction
     modifiers.  For conciseness, these suffixes are not spelled out in the
     grammar -- the base opcode name is used as a placeholder for the opcode
     and all of its possible suffixes.  Instruction suffixes are provided
     mainly for compatibility with prior GPU program instruction sets (e.g.,
     NV_vertex_program3, NV_fragment_program2, and predecessors).  The set of
     allowable suffixes, and their equivalent stand-alone modifiers, are listed
     in Table X.15.

       Suffix  Modifier     Description
       ------  ----------   ---------------------------------------------------
       R       F            Floating-point operation, 32-bit precision
       H       F(*)         Floating-point operation, at least 16-bit precision
       C       CC0          Update condition code register zero
       C0      CC0          Update condition code register zero
       C1      CC1          Update condition code register one
       _SAT    SAT          Floating-point results clamped to [0,1]
       _SSAT   SSAT         Floating-point results clamped to [-1,1]

       Table X.15,  Instruction Suffixes.

     The "R" and "H" suffixes specify floating-point operations and are
     equivalent to the "F" data type modifier.  They additionally specify a
     minimum precision for the operations.  Instructions with an "R" precision
     modifier will be carried out at no less than IEEE single-precision
     floating-point (8 bits of exponent, 23 bits of mantissa).  Instructions
     with an "H" precision modifier will be carried out at no less than 16-bit
     floating-point precision (5 bits of exponent, 10 bits of mantissa).

     An instruction may have multiple suffixes, but they must appear in order,
     with data type suffixes first, followed by condition code update suffixes,
     followed by clamping suffixes.  For example, "ADDR" carries out an add at
     32-bit precision.  "ADDH_SAT" carries out an add at 16-bit precision (or
     better) and clamps the results to [0,1].  "ADDRC1_SSAT" carries out an add
     at 32-bit floating-point precision, clamps the results to [-1,1], and
     updates condition code one based on the clamped result.
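
     As a non-normative illustration, the suffix ordering rule and the
     mapping in Table X.15 can be modeled with the Python sketch below.  The
     helper name expand_suffixes is hypothetical, and the caller is assumed
     to know the base opcode name.  The sketch does not check whether a
     given instruction actually supports each modifier, and it discards the
     minimum precision implied by "R" and "H".

```python
import re

# Data type suffix, then condition code suffix, then clamping suffix.
_SUFFIX_RE = re.compile(r"^(R|H)?(C0|C1|C)?(_SAT|_SSAT)?$")
_CC = {"C": "CC0", "C0": "CC0", "C1": "CC1"}


def expand_suffixes(base, opcode):
    """Return the stand-alone modifiers equivalent to the suffixes on opcode.

    Raises ValueError if the suffixes are unknown or appear out of order.
    """
    if not opcode.startswith(base):
        raise ValueError("opcode does not start with base name")
    m = _SUFFIX_RE.match(opcode[len(base):])
    if m is None:
        raise ValueError("bad or misordered suffixes: " + opcode)
    dtype, cc, clamp = m.groups()
    mods = []
    if dtype:
        mods.append("F")          # "R" and "H" both imply floating-point
    if cc:
        mods.append(_CC[cc])
    if clamp:
        mods.append(clamp[1:])    # drop the leading underscore
    return mods
```

     Under these assumptions, expand_suffixes("ADD", "ADDRC1_SSAT") yields
     the modifier list F, CC1, SSAT.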


     Section 2.X.4.2, Program Operands

     Most program instructions operate on one or more scalar or vector
     operands.  Each operand specifies an operand variable, which is either the
     name of a previously declared variable or an implicit variable declaration
     created by using a variable binding in the instruction.  Attribute,
     parameter, or parameter buffer variables can be declared implicitly by
     using a valid binding name in an operand.  Instruction operands are
     specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
     grammar rules.

     If the operand variable is not an array, its contents are loaded directly.
     If the operand variable is an array, a single element of the array is
     loaded according to the <arrayMem> grammar rule.  The elements of an array
     are numbered from 0 to <n>-1, where <n> is the number of entries in the
     array.  Array members can be accessed using either absolute or relative
     addressing.

     Absolute array addressing is used when the <arrayMemAbs> grammar rule is
     matched; the array member to load is specified by the matching integer.
     Out-of-bounds absolute array accesses are not allowed.  If the specified
     member number is greater than or equal to the size of the array, the
     program will fail to load.

     Relative array addressing is used when the <arrayMemRel> grammar rule is
     matched.  This grammar rule allows the program to specify a scalar integer
     operand and an optional constant offset, according to the <arrayMemReg>
     and <arrayMemOffset> grammar rules.  When performing relative addressing,
     the GL evaluates the specified integer scalar operand (according to the
     rules specified in this section) and adds the constant offset.  The array
     member loaded is given by this sum.  The constant offset is considered
     zero if an offset is omitted.  If the sum is negative or is greater than
     or equal to the size of the array, the results of the access are
     undefined, but may not lead to program or GL termination.  The set of
     constant offsets supported for relative addressing is limited to values
     in the range [0,<n>-1], where <n> is the size of the array.  A program
     will fail to load if it specifies an offset outside that range.  If
     offsets outside that range are required, they can be applied by using an
     integer ADD instruction writing to a temporary variable.
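
     The two checks involved in relative addressing (a load-time check on
     the constant offset and a run-time computation of the member number)
     can be sketched in Python as follows.  This is illustrative only; the
     function names are hypothetical, and returning None for an undefined
     access is a choice of this sketch.  Any member number outside
     [0,<n>-1] is treated as undefined here, since array elements are
     numbered from 0 to <n>-1.

```python
def check_const_offset(offset, n):
    """Load-time check: the constant offset must lie in [0, n-1]."""
    return 0 <= offset <= n - 1


def relative_index(reg_value, offset, n):
    """Run-time member selection for array[reg + offset].

    Returns the member number, or None where the access is undefined.
    """
    index = reg_value + offset
    return index if 0 <= index < n else None
```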

     After the operand is loaded, its components can be rearranged according to
     the <swizzleSuffix> grammar rule, or it can be converted to a scalar
     operand according to the <scalarSuffix> grammar rule.

     The <swizzleSuffix> grammar rule rearranges the components of a loaded
     vector to produce another vector.  If the <swizzleSuffix> rule matches the
     <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
     is used, where each question mark is replaced with one of "x", "y", "z",
     "w", "r", "g", "b", or "a".  For such patterns, the x, y, z, and w
     components of the operand are taken from the vector components named by
     the first, second, third, and fourth character of the pattern,
     respectively.  Swizzle components of "r", "g", "b", and "a" are equivalent
     to "x", "y", "z", and "w", respectively.  For example, if the swizzle
     suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
     the result is the vector {8,9,9,2}.  If the <swizzleSuffix> matches the
     <component> grammar rule, a pattern of the form ".?" is used.  For this
     pattern, all four components of the operand are taken from the single
     component identified by the pattern.  If the swizzle suffix is omitted,
     components are not rearranged and swizzling has no effect, as though
     ".xyzw" were specified.

     The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
     selectors with "r", "g", "b", or "a" selectors.  A program will fail to
     load if it contains a swizzle suffix with selectors from both of these
     sets.
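
     The swizzle rules above can be illustrated with a short Python sketch.
     It is not part of the extension; apply_swizzle is a hypothetical
     helper that operates on a four-element list and raises ValueError
     where a real program would fail to load.

```python
_XYZW = "xyzw"
_RGBA = "rgba"


def apply_swizzle(vec, suffix=""):
    """Apply a suffix such as ".yzzx", ".gbbr", or ".x" to a 4-vector."""
    sel = suffix[1:] if suffix.startswith(".") else suffix
    if sel == "":
        return list(vec)              # same as ".xyzw"
    if all(c in _XYZW for c in sel):
        names = _XYZW
    elif all(c in _RGBA for c in sel):
        names = _RGBA
    else:
        raise ValueError("mixed or unknown selectors: " + suffix)
    if len(sel) == 1:
        sel = sel * 4                 # ".?" replicates a single component
    if len(sel) != 4:
        raise ValueError("swizzle must name one or four components")
    return [vec[names.index(c)] for c in sel]
```

     With the source vector {2,8,9,0} from the example above, both ".yzzx"
     and ".gbbr" produce {8,9,9,2}.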

     The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
     a single component.  The <scalarSuffix> rule is similar to the swizzle
     selector, except that only a single component is selected.  If the scalar
     suffix is ".y" and the specified source contains {2,8,9,0}, the value is
     the scalar value 8.

     Next, a component-wise negate operation is performed on the operand if the
     <operandNeg> grammar rule matches "-".  Negation is not performed if the
     operand has no sign prefix, or is prefixed with "+".  For unsigned integer
     operands, negation is performed as a two's complement operation.

     Next, a component-wise absolute value operation is performed on the
     operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
     matched, by surrounding the operand with two "|" characters.  The result
     is optionally negated if the <operandAbsNeg> grammar rule matches "-".
     For unsigned integer operands, the absolute value operation has no effect.
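
     The order of the negate and absolute value steps can be sketched for a
     single component as follows.  This Python model is illustrative only:
     modify_operand is a hypothetical name, unsigned operands are assumed
     to be 32 bits wide, and signed overflow is not modeled.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned integer operands


def modify_operand(value, dtype, neg=False, absolute=False, abs_neg=False):
    """Apply operand modifiers in order: negate, absolute value, negate.

    dtype is "F", "S", or "U".  abs_neg only applies when absolute is set.
    """
    def negate(v):
        # Unsigned negation is a two's complement of the bit pattern.
        return (-v) & MASK32 if dtype == "U" else -v

    if neg:
        value = negate(value)
    if absolute:
        if dtype != "U":          # no effect on unsigned integers
            value = abs(value)
        if abs_neg:
            value = negate(value)
    return value
```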


     Section 2.X.4.3, Program Destination Variable Update

     Most program instructions perform computations that produce a result,
     which will be written to a variable.  Each instruction that computes a
     result specifies a destination variable, which is either the name of a
     previously declared variable or an implicit variable declaration created
     by using a variable binding in the instruction.  Result variables can be
     declared implicitly by using a valid program result binding name in the
     result portion of the instruction.  Instruction results are specified
     according to the <instResult> grammar rule.

     The destination variable may be a single member of an array.  In this
     case, a single array member is specified using the <arrayMem> grammar
     rule, and the array member to update is computed in the exact same manner
     as done for operand loads.  If the array member is computed at run time,
     and is negative or greater than or equal to the size of the array, the
     results of the destination variable update are undefined and could result
     in overwriting other program variables.

     The results of the operation may be obtained at a different precision than
     that used to store the destination variable.  If so, the results are
     converted to match the size of the destination variable.  For
     floating-point values, the results are rounded to the nearest
     floating-point value that can be represented in the destination variable.
     If a result component is larger in magnitude than the largest
     representable floating-point value in the data type of the destination
     variable, an infinity encoding (+/-INF) is used.  Signed or unsigned
     integer values are sign-extended or zero-extended, respectively, if the
     destination variable has more bits than the result, and have their most
     significant bits discarded if the destination variable has fewer bits.

     Writes to individual components of a vector destination variable can be
     controlled at compile time by individual component write masks specified
     in the instruction.  The component write mask is specified by the
     <optWriteMask> grammar rule, and is a string of up to four characters,
     naming the components to enable for writing.  If no write mask is
     specified, all components are enabled for writing.  The characters "x",
     "y", "z", and "w" match the x, y, z, and w components respectively.  For
     example, a write mask of ".xzw" indicates that the x, z, and w
     components should be enabled for writing but the y component should not be
     written.  The grammar requires that the destination register mask
     components must be listed in "xyzw" order.  Additionally, write mask
     components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
     "w", respectively.  The grammar does not allow mixing "x", "y", "z", or
     "w" components with "r", "g", "b", and "a" ones.

     Writes to individual components of a vector destination variable, or to a
     scalar destination variable, can also be controlled at run time using
     condition code write masks.  The condition code write mask is specified by
     the <ccMask> grammar rule.  If a mask is specified, a condition code
     variable is loaded according to the <ccMaskRule> grammar rule and tested
     as described in Table X.16 to produce a four-component vector of TRUE/FALSE
     values.

          mask rule         test name                condition
          ---------------   ----------------------   -----------------
          EQ,  EQ0,  EQ1    equal                    !SF && ZF
          GE,  GE0,  GE1    greater than or equal    !(SF ^ OF)
          GT,  GT0,  GT1    greater than             (!SF ^ OF) && !ZF
          LE,  LE0,  LE1    less than or equal       SF ^ (ZF || OF)
          LT,  LT0,  LT1    less than                (SF && !ZF) ^ OF
          NE,  NE0,  NE1    not equal                SF || !ZF
          FL,  FL0,  FL1    false                    always false
          TR,  TR0,  TR1    true                     always true

          NAN, NAN0, NAN1   not a number             SF && ZF
          LEG, LEG0, LEG1   less, equal, or greater  !SF || !ZF
                              (anything but a NaN)

          CF,  CF0,  CF1    carry flag               CF
          NCF, NCF0, NCF1   no carry flag            !CF
          OF,  OF0,  OF1    overflow flag            OF
          NOF, NOF0, NOF1   no overflow flag         !OF
          SF,  SF0,  SF1    sign flag                SF
          NSF, NSF0, NSF1   no sign flag             !SF
          AB,  AB0,  AB1    above                    CF && !ZF
          BLE, BLE0, BLE1   below or equal           !CF || ZF

       Table X.16, Condition Code Tests.  The allowed rules are specified in
       the "mask rule" column.  If "0" or "1" is appended to the rule name
       (e.g., "EQ1"), the corresponding condition code register (CC1 in this
       example) is loaded, otherwise CC0 is loaded.  After loading, each
       component is tested, using the expression listed in the "condition"
       column.

     After the condition code tests are performed, the four-component result
     can be swizzled according to the <swizzleSuffix> grammar rule.  Individual
     components of the destination variable are written only if the
     corresponding component of the swizzled condition code test result is
     TRUE.  If both a (compile-time) component write mask and a condition code
     write mask are specified, destination variable components are written only
     if the corresponding component is enabled in both masks.
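
     The tests of Table X.16 and the combination of the two write masks can
     be modeled with the Python sketch below.  It is illustrative only.
     The representation is an assumption of the sketch:  each condition
     code register is a list of four dictionaries holding boolean "SF",
     "ZF", "OF", and "CF" flags, a swizzle is a tuple of component indices,
     and cc_write_enable is a hypothetical helper name.

```python
# One entry per test in Table X.16; each takes the flags of one component.
CC_TESTS = {
    "EQ":  lambda f: not f["SF"] and f["ZF"],
    "GE":  lambda f: not (f["SF"] ^ f["OF"]),
    "GT":  lambda f: ((not f["SF"]) ^ f["OF"]) and not f["ZF"],
    "LE":  lambda f: f["SF"] ^ (f["ZF"] or f["OF"]),
    "LT":  lambda f: (f["SF"] and not f["ZF"]) ^ f["OF"],
    "NE":  lambda f: f["SF"] or not f["ZF"],
    "FL":  lambda f: False,
    "TR":  lambda f: True,
    "NAN": lambda f: f["SF"] and f["ZF"],
    "LEG": lambda f: not f["SF"] or not f["ZF"],
    "CF":  lambda f: f["CF"],
    "NCF": lambda f: not f["CF"],
    "OF":  lambda f: f["OF"],
    "NOF": lambda f: not f["OF"],
    "SF":  lambda f: f["SF"],
    "NSF": lambda f: not f["SF"],
    "AB":  lambda f: f["CF"] and not f["ZF"],
    "BLE": lambda f: not f["CF"] or f["ZF"],
}


def cc_write_enable(cc_regs, rule, swizzle=(0, 1, 2, 3),
                    write_mask=(True,) * 4):
    """Return per-component write enables for a rule such as "LT" or "EQ1".

    cc_regs is [CC0, CC1]; write_mask is the compile-time component mask.
    """
    reg = cc_regs[1] if rule.endswith("1") else cc_regs[0]
    test = CC_TESTS[rule.rstrip("01")]
    tested = [bool(test(f)) for f in reg]
    swizzled = [tested[i] for i in swizzle]
    return [s and m for s, m in zip(swizzled, write_mask)]
```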

     A program instruction can also optionally update one of the two condition
     code registers if the "CC", "CC0", or "CC1" instruction modifier is
     specified.  These instruction modifiers update condition code register
     CC0, CC0, or CC1, respectively.  The instructions "ADD.CC" or "ADD.CC0"
     will perform an add and update condition code zero, "ADD.CC1" will add and
     update condition code one, and "ADD" will simply perform the add without a
     condition code update.  The components of the selected condition code
     register are updated if and only if the corresponding components of the
     destination variable are enabled by both write masks.  For the purposes of
     condition code update, a scalar destination variable is treated as a
     vector where the scalar result is written to "x" (if enabled in the write
     mask), and writes to the "y", "z", and "w" components are disabled.

     When condition code components are written, the condition code flags are
     updated based on the corresponding component of the result.  If a
     component of the destination register is not enabled for writes, the
     corresponding condition code component is also unchanged.

     For floating-point results, the sign flag (SF) is set if the result is
     less than zero or is a NaN (not a number) value.  The zero flag (ZF) is
     set if the result is equal to zero or is a NaN.
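
     A minimal Python sketch of the floating-point rule, for illustration
     only (float_cc_flags is a hypothetical name, and only the SF and ZF
     flags are modeled):

```python
import math


def float_cc_flags(x):
    """Return the SF and ZF flags for one floating-point result component."""
    nan = math.isnan(x)
    return {"SF": nan or x < 0.0, "ZF": nan or x == 0.0}
```

     A NaN result sets both flags, which is the combination that the "NAN"
     test in Table X.16 checks for.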

     For signed and unsigned integer results, the sign flag (SF) is set if the
     most significant bit of the value written to the result variable is set
     and the zero flag (ZF) is set if the result written is zero.  For
     instructions other than those performing an integer add or subtract (ADD,
     MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.

     For integer add or subtract operations, the overflow and carry flags are
     computed by doing both signed and unsigned adds/subtracts as follows:

       The overflow flag (OF) is set by interpreting the two operands as signed
       integers and performing a signed add or subtract.  If the result is
       representable as a signed integer (i.e., doesn't overflow), the overflow
       flag is cleared; otherwise, it is set.

       The carry flag (CF) is set by interpreting the two operands as unsigned
       integers and performing an unsigned add or subtract.  If the result of
       an add is representable as an unsigned integer (i.e., doesn't overflow),
       the carry flag is cleared; otherwise, it is set.  If the result of a
       subtract is greater than or equal to zero, the carry flag is set;
       otherwise, it is cleared.

     For the purposes of condition code setting, negation modifiers turn add
     operations into subtracts and vice versa.  If the operation is equivalent
     to an add with both operands negated (-A-B), the carry and overflow flags
     are both undefined.
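
     The flag rules for integer adds and subtracts can be sketched in
     Python as follows.  The sketch is illustrative only:  it assumes
     32-bit operands passed as bit patterns, int_addsub_flags is a
     hypothetical name, and the undefined (-A-B) case is not modeled.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit integer operands


def _signed(v):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v


def int_addsub_flags(a, b, subtract=False):
    """Return (result, flags) for a 32-bit integer add or subtract."""
    a &= MASK32
    b &= MASK32
    if subtract:
        wide_u = a - b
        wide_s = _signed(a) - _signed(b)
        cf = wide_u >= 0              # subtract: set if result is >= 0
    else:
        wide_u = a + b
        wide_s = _signed(a) + _signed(b)
        cf = wide_u > MASK32          # add: set on unsigned overflow
    result = wide_u & MASK32
    flags = {
        "SF": bool(result & 0x80000000),
        "ZF": result == 0,
        "OF": not (-(1 << 31) <= wide_s <= (1 << 31) - 1),
        "CF": cf,
    }
    return result, flags
```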


     Section 2.X.4.4, Program Texture Access

     Certain program instructions may access texture images, as described in
     section 3.8.  The coordinates, level-of-detail, and partial derivatives
     used for performing the texture lookup are derived from values provided in
     the program as described in the various sub-sections of Section 2.X.8.
     These descriptions use the function

       result_t_vec
         TextureSample(float_vec coord, float lod, float_vec ddx,
                       float_vec ddy, int_vec offset);

     which obtains a filtered texel value <tau> as described in Section 3.8.8
     and returns a 4-component vector (R,G,B,A) according to the format
     conversions specified in Table 3.21.  The result vector is interpreted as
     floating-point, signed integer, or unsigned integer, according to the data
     type modifier of the instruction.  If the internal format of the texture
     does not match the instruction's data type modifier, the results of the
     texture lookup are undefined.

     (Note:  For unextended OpenGL 2.0, all supported texture internal formats
     store integer values but return floating-point results in the range [0,1]
     on a texture lookup.  The ARB_texture_float extension introduces
     floating-point internal formats where components are both stored and
     returned as floating-point values.  The EXT_texture_integer extension
     introduces formats that both store and return either signed or unsigned
     integer values.)

     <coord> is a four-component floating-point vector from which the (s,t,r)
     texture coordinates used for the texture access, the layer used for array
     textures, and the reference value used for depth comparisons (section
     3.8.14) are extracted according to Table X.17.  If the texture is a cube
     map, (s,t,r) is projected to one of the six cube faces to produce a new
     (s,t) vector according to Section 3.8.6.  For array textures, the layer
     used is derived by rounding the extracted floating-point component to the
     nearest integer and clamping the result to the range [0,<n>-1], where <n>
     is the number of layers in the texture.
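
     The layer selection for array textures can be sketched in Python as
     follows.  This is illustrative only; select_layer is a hypothetical
     name, and rounding ties upward is an assumption of the sketch, since
     the text above does not say how ties are resolved.

```python
import math


def select_layer(coord_component, num_layers):
    """Round to the nearest integer, then clamp to [0, num_layers-1]."""
    nearest = math.floor(coord_component + 0.5)  # tie rule is an assumption
    return max(0, min(num_layers - 1, nearest))
```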

     <lod> specifies the level of detail parameter and replaces the value
     computed in equation 3.18.  <ddx> and <ddy> specify partial derivatives
     (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
     coordinates, and may be used to derive footprint shapes for anisotropic
     texture filtering.

     <offset> is a constant 3-component signed integer vector specified
     according to the <texOffset> grammar rule, which is added to the computed
     <u>, <v>, and <w> texel locations prior to sampling.  One, two, or three
     components may be specified in the instruction; if fewer than three are
     specified, the remaining offset components are zero.  A limited range of
     offset values is supported; the minimum and maximum <texOffset> values
     are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
     MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:

       * if the texture target specified in the instruction is 1D, ARRAY1D,
         SHADOW1D, or SHADOWARRAY1D, and the second or third component of the
         offset vector is non-zero,

       * if the texture target specified in the instruction is 2D, RECT,
         ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
         component of the offset vector is non-zero,

       * if the texture target is CUBE or SHADOWCUBE, and any component of the
         offset vector is non-zero -- texel offsets are not supported for cube
         map or buffer textures, or

       * if any component of the offset vector is less than
         MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
         MAX_PROGRAM_TEXEL_OFFSET_EXT.
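
     The load-time checks listed above can be sketched in Python as
     follows.  The sketch is illustrative only.  The limits -8 and 7 are
     placeholders chosen for the example; the real limits are
     implementation-dependent.  The name texel_offset_ok and the table
     _OFFSET_DIMS are hypothetical.

```python
# Placeholder limits; real values are implementation-dependent and given by
# MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT.
MIN_OFFSET, MAX_OFFSET = -8, 7

# Number of leading offset components that may be non-zero for each target.
_OFFSET_DIMS = {
    "1D": 1, "ARRAY1D": 1, "SHADOW1D": 1, "SHADOWARRAY1D": 1,
    "2D": 2, "RECT": 2, "ARRAY2D": 2,
    "SHADOW2D": 2, "SHADOWRECT": 2, "SHADOWARRAY2D": 2,
    "3D": 3,
    "CUBE": 0, "SHADOWCUBE": 0, "BUFFER": 0,
}


def texel_offset_ok(target, offset):
    """Return True if this offset would pass the load-time checks."""
    comps = (list(offset) + [0, 0, 0])[:3]   # missing components are zero
    if any(c < MIN_OFFSET or c > MAX_OFFSET for c in comps):
        return False
    return all(c == 0 for c in comps[_OFFSET_DIMS[target]:])
```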

     (NOTE:  Texel offsets are a new feature provided by this extension and are
     described in more detail in edits to Section 3.8 below.)

     The texture used by TextureSample() is one of the textures bound to the
     texture image unit whose number is specified in the instruction according
     to the <texImageUnit> grammar rule.  The texture target accessed is
     specified according to the <texTarget> grammar rule and Table X.17.
     Fixed-function texture enables are always ignored when determining the
     texture to access in a program.

                                                      coordinates used
       texTarget          Texture Type               s t r  layer  shadow
       ----------------   ---------------------      -----  -----  ------
       1D                 TEXTURE_1D                 x - -    -      -
       2D                 TEXTURE_2D                 x y -    -      -
       3D                 TEXTURE_3D                 x y z    -      -
       CUBE               TEXTURE_CUBE_MAP           x y z    -      -
       RECT               TEXTURE_RECTANGLE_ARB      x y -    -      -
       ARRAY1D            TEXTURE_1D_ARRAY_EXT       x - -    y      -
       ARRAY2D            TEXTURE_2D_ARRAY_EXT       x y -    z      -
       SHADOW1D           TEXTURE_1D                 x - -    -      z
       SHADOW2D           TEXTURE_2D                 x y -    -      z
       SHADOWRECT         TEXTURE_RECTANGLE_ARB      x y -    -      z
       SHADOWCUBE         TEXTURE_CUBE_MAP           x y z    -      w
       SHADOWARRAY1D      TEXTURE_1D_ARRAY_EXT       x - -    y      z
       SHADOWARRAY2D      TEXTURE_2D_ARRAY_EXT       x y -    z      w
       BUFFER             TEXTURE_BUFFER_EXT           <not supported>

       Table X.17:  Texture types accessed for each <texTarget> value, and
       coordinate mappings.  The "SHADOW" and "ARRAY" targets are special
       pseudo-targets described below.  The "coordinates used" columns indicate
       the input values used for each coordinate of the texture lookup, the
       layer selector for array textures, and the reference value for texture
       comparisons.  Buffer textures are not supported by normal texture lookup
       functions, but are supported by TXF and TXQ, described below.

     Texture targets with "SHADOW" are used to access textures with a
     DEPTH_COMPONENT base internal format using depth comparisons (Section
     3.8.14).  Results of a texture access are undefined:

       * if a "SHADOW" target is used, and the corresponding texture has a base
         internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE
         of NONE, or

       * if a non-"SHADOW" target is used, and the corresponding texture has a
         base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
         other than NONE.

     If the texture being accessed is not complete (or cube complete for
     cubemap textures), no texture access is performed and the result is
     undefined.

     A program will fail to load if it attempts to sample from multiple texture
     targets (including the SHADOW pseudo-targets) on the same texture image
     unit.  For example, a program containing any two of the following
     instructions will fail to load:

       TEX out, coord, texture[0], 1D;
       TEX out, coord, texture[0], 2D;
       TEX out, coord, texture[0], ARRAY2D;
       TEX out, coord, texture[0], SHADOW2D;
       TEX out, coord, texture[0], 3D;
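
     A loader could detect this condition along the lines of the Python
     sketch below.  It is illustrative only; find_target_conflict is a
     hypothetical name, and the program is assumed to have been reduced to
     a sequence of (image unit, target) pairs, one per texture instruction.

```python
def find_target_conflict(accesses):
    """Return the first (unit, first_target, other_target) conflict, or None.

    accesses is an iterable of (image_unit, target) pairs.
    """
    seen = {}
    for unit, target in accesses:
        first = seen.setdefault(unit, target)
        if first != target:
            return (unit, first, target)
    return None
```

     Because "SHADOW" pseudo-targets count as distinct targets, the pair
     (0, "2D") followed by (0, "SHADOW2D") is reported as a conflict.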

     Additionally, multiple texture targets for a single texture image unit may
     not be used at the same time by the GL.  The error INVALID_OPERATION is
     generated by Begin, RasterPos, or any command that performs an implicit
     Begin if an enabled program accesses one texture target for a texture unit
     while another enabled program or fixed-function fragment processing
     accesses a different texture target for the same texture image unit.

     Some texture instructions use standard methods to compute partial
     derivatives and/or the level-of-detail used to perform texture accesses.
     For fragment programs, the functions

       float_vec ComputePartialsX(float_vec coord);
       float_vec ComputePartialsY(float_vec coord);

     compute approximate component-wise partial derivatives of the
     floating-point vector <coord> relative to the X and Y coordinates,
     respectively.  For vertex and geometry programs, these functions always
     return (0,0,0,0).  The function

       float ComputeLOD(float_vec ddx, float_vec ddy);

     maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
     ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
     equation 3.18.

     The TXF instruction provides the ability to extract a single texel from a
     specified texture image using the function

       result_t_vec TexelFetch(int_vec coord, int_vec offset);

     The extracted texel is converted to an (R,G,B,A) vector according to Table
     3.21.  The result vector is interpreted as floating-point, signed integer,
     or unsigned integer, according to the data type modifier of the
     instruction.  If the internal format of the texture is not compatible with
     the instruction's data type modifier, the extracted texel value is
     undefined.

     <coord> is a four-component signed integer vector used to identify the
     single texel accessed.  The (i,j,k) coordinates of the texel and the layer
     used for array textures are extracted according to Table X.18.  The level
     of detail accessed is obtained by adding the w component of <coord> to the
     base level (level_base).  <offset> is a constant 3-component signed
     integer vector added to the texel coordinates prior to the texel fetch as
     described above.  In addition to the restrictions described above,
     non-zero offset components are also not supported for BUFFER targets.

     The texture used by TexelFetch() is specified by the image unit and target
     parameters provided in the instruction, as for TextureSample() above.
     Single texel fetches cannot perform depth comparisons or access cubemaps.
     If a program contains a TXF instruction specifying one of the "SHADOW" or
     "CUBE" targets, it will fail to load.

                                       coordinates used
       texTarget          supported      i j k  layer  lod
       ----------------   ---------      -----  -----  ---
       1D                    yes         x - -    -     w
       2D                    yes         x y -    -     w
       3D                    yes         x y z    -     w
       CUBE                  no          - - -    -     -
       RECT                  yes         x y -    -     w
       ARRAY1D               yes         x - -    y     w
       ARRAY2D               yes         x y -    z     w
       SHADOW1D              no          - - -    -     -
       SHADOW2D              no          - - -    -     -
       SHADOWRECT            no          - - -    -     -
       SHADOWCUBE            no          - - -    -     -
       SHADOWARRAY1D         no          - - -    -     -
       SHADOWARRAY2D         no          - - -    -     -
       BUFFER                yes         x - -    -     -

       Table X.18, Mappings of texel fetch coordinates to texel location.

     Single-texel fetches do not support LOD clamping or any texture wrap mode,
     and require a mipmapped minification filter to access any level of detail
     other than the base level.  The results of the texel fetch are undefined:

       * if the computed LOD is less than the texture's base level (level_base)
         or greater than the maximum level (level_max),

       * if the computed LOD is not the texture's base level and the texture's
         minification filter is NEAREST or LINEAR,

       * if the layer specified for array textures is negative or greater than
         the number of layers in the array texture,

       * if the (i,j,k) coordinates refer to a border texel outside
         the defined extents of the specified LOD, where

          i < -b_s, j < -b_s, k < -b_s,
          i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,

         where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,
         height, depth, and border size of the image, as in equations 3.15,
         3.16, and 3.17, or

       * if the texture being accessed is not complete (or cube complete for
         cubemaps).
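
     The conditions in the list above can be collected into a single
     predicate.  The following Python sketch is illustrative only:  the
     function name and parameters are hypothetical, <sizes> is assumed to hold
     the size parameters (w_s, h_s, d_s) for the dimensions actually used by
     the target, and the layer test mirrors the wording of the list literally.

```python
def texel_fetch_defined(coords, sizes, border, lod, level_base, level_max,
                        mipmapped_min_filter, complete,
                        layer=None, num_layers=None):
    """Return True when none of the listed 'undefined' conditions applies.

    coords / sizes cover only the dimensions used by the target, e.g.
    (i, j) and (w_s, h_s) for a 2D texture.  All names are hypothetical.
    """
    # LOD outside [level_base, level_max].
    if lod < level_base or lod > level_max:
        return False
    # Non-base LOD requires a mipmapped minification filter.
    if lod != level_base and not mipmapped_min_filter:
        return False
    # Array layer check, following the wording of the list above.
    if layer is not None and (layer < 0 or layer > num_layers):
        return False
    # Each coordinate c must satisfy -b_s <= c < size - b_s.
    for c, size in zip(coords, sizes):
        if c < -border or c >= size - border:
            return False
    # Incomplete textures always give undefined results.
    return complete
```

     For example, with a borderless 4x4 image at the base level, coordinate
     (0,0) passes the checks while (4,0) does not.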


     Section 2.X.5, Program Flow Control

     In addition to basic arithmetic, logical, and texture instructions, a
     number of flow control instructions are provided, which are described in
     detail in Section 2.X.8.  Programs can contain several types of
     instruction blocks:  IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and
     subroutine blocks.  IF/ELSE/ENDIF blocks are a set of instructions
     beginning with an "IF" instruction, ending with an "ENDIF" instruction,
     and possibly containing an optional "ELSE" instruction.  REP/ENDREP blocks
     are a set of instructions beginning with a "REP" instruction and ending
     with an "ENDREP" instruction.  Subroutine blocks begin with an instruction
     label identifying the name of the subroutine and end just before the
     next instruction label or the end of the program.  Examples include the
     following:

         MOVC CC, R0;
         IF GT.x;
           MOV R0, R1;     # executes if R0.x > 0
         ELSE;
           MOV R0, R2;     # executes if R0.x <= 0
         ENDIF;

         REP repCount;
         ADD R0, R0, R1;
         ENDREP;

       square:             # subroutine to compute R0^2
         MUL R0, R0, R0;
         RET;
       main:
         MOV R0, 9.0;
         CAL square;       # compute 9.0^2 in R0

     IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
     inside subroutines.  In all cases, each instruction block must be
     terminated with the appropriate instruction (ENDIF for IF, ENDREP for
     REP).  Nested instruction blocks must be wholly contained within a block
     -- if a REP instruction is found between an IF and ELSE instruction, the
     corresponding ENDREP must also be present between the IF and ELSE.
     Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
     or inside other subroutines.  A program will fail to load if any
     instruction block is terminated by an incorrect instruction, is not
     terminated before the block containing it, or contains an instruction
     label.

     IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
     to execute.  If the condition is true, all instructions between the IF and
     ELSE are executed.  If the condition is false, all instructions between
     the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If
     the ELSE is omitted, all instructions between the IF and ENDIF are
     executed if the condition is true, or skipped if the condition is false.
     A limited amount of nesting is supported -- a program will fail to load if
     an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
     IF/ELSE/ENDIF blocks.

     REP/ENDREP blocks are used to execute a sequence of instructions multiple
     times.  The REP instruction includes an optional scalar operand to specify
     a loop count indicating the number of times the block of instructions
     should be repeated.  If the loop count is omitted, the contents of a
     REP/ENDREP block will be repeated indefinitely until the loop is
     explicitly terminated.  A limited amount of nesting is supported -- a
     program will fail to load if a REP instruction is nested inside
     MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.

     Within a REP/ENDREP block, the CONT instruction can be used to terminate
     the current iteration of the loop by effectively jumping to the ENDREP
     instruction.  The BRK instruction can be used to terminate the entire loop
     by effectively jumping to the instruction immediately following the ENDREP
     instruction.  If CONT and BRK instructions are found inside multiply
     nested REP/ENDREP blocks, they apply to the innermost block.  A program
     will fail to load if it includes a CONT or BRK instruction that is not
     contained inside a REP/ENDREP block.

     A REP/ENDREP block without a specified loop count can result in an
     infinite loop.  To prevent obvious infinite loops, a program will fail to
     load if it contains a REP/ENDREP block that contains neither a BRK
     instruction at the current nesting level nor a RET instruction at any
     nesting level.

     Subroutines are supported via the CAL and RET instructions.  A subroutine
     block is identified by an instruction label, which can be any valid
     identifier according to the <instLabel> grammar rule.  The CAL
     instruction identifies a subroutine name to call according to the
     <instTarget> grammar rule.
     Instruction labels used in CAL instructions do not need to be defined in
     the program text that precedes the instruction, but a program will fail to
     load if it includes a CAL instruction that references an instruction label
     that is not defined anywhere in the program.  When a CAL instruction is
     executed, it transfers control to the instruction immediately following
     the specified instruction label.  Subsequent instructions in that
     subroutine are executed until a RET instruction is executed, or until
     program execution reaches another instruction label or the end of the
     program text.  After the subroutine finishes, execution continues with the
     instruction immediately following the CAL instruction.  When a RET
     instruction is issued, it will break out of any IF/ELSE/ENDIF or
     REP/ENDREP blocks that contain it.

     Subroutines may call other subroutines before completing, up to an
     implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.
     Subroutines may call any subroutine in the program, including themselves,
     as long as the call depth limit is obeyed.  The results of issuing a CAL
     instruction while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed
     are undefined, and may include program termination.

     Several flow control instructions include condition code tests.  The IF
     instruction requires a condition test to determine what instructions are
     executed.  The CONT, BRK, CAL, and RET instructions have an optional
     condition code test; if the test fails, the instructions are not executed.
     Condition code tests are specified by the <ccTest> grammar rule.  The test
     is evaluated like the condition code write mask (section 2.X.4.3), and
     passes if and only if any of the four components passes.
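
     The "any component passes" rule can be illustrated with a small Python
     sketch.  This is a simplified model, not the actual condition code
     register:  each condition code component is represented here by the
     numeric value last written to it and compared against zero, and the
     helper name cc_test is hypothetical.

```python
import operator

# Simplified model (an assumption): a condition code component is stood in
# for by the value last written to it, and each rule compares it with zero.
RULES = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}

def cc_test(cc, rule, swizzle="xyzw"):
    """Return True if the test passes for any of the four swizzled components."""
    if len(swizzle) == 1:
        swizzle = swizzle * 4          # "GT.x" tests x in all four slots
    selected = [cc["xyzw".index(s)] for s in swizzle]
    return any(RULES[rule](value, 0) for value in selected)
```

     Under this model, "IF GT.x" after "MOVC CC, R0" corresponds to
     cc_test(R0, "GT", "x"), which depends only on R0.x, whereas
     cc_test(R0, "GT") passes when any component of R0 is positive.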

     If an instruction label named "main" is specified, GPU program execution
     begins with the instruction immediately following that label.  Otherwise,
     it begins with the first instruction of the program.  Instructions are
     executed in sequence until either a RET instruction is issued in the main
     subroutine or the end of the program text is reached.


     Section 2.X.6, Program Options

     Programs may specify a number of options to indicate that one or more
     extended language features are used by the program.  All program options
     used by the program must be declared at the beginning of the program
     string.  Each program option specified in a program string will modify the
     syntactic or semantic rules used to interpret the program and the execution
     environment used to execute the program.  Features in program options
     not declared by the program are ignored, even if the option is otherwise
     supported by the GL.  Each option declaration consists of two tokens: the
     keyword "OPTION" and an identifier.

     The set of available options depends on the program type, and is
     enumerated in the specifications for each program type.  Some program
     types may not provide any options.


     Section 2.X.7, Program Declarations

     Programs may include a number of declaration statements to specify
     characteristics of the program.  Each declaration statement is followed by
     one or more arguments, separated by commas.

     The set of available declarations depends on the program type, and is
     enumerated in the specifications for each program type.  Some program
     types may not provide declarations.


     Section 2.X.8, Program Instruction Set

     The following sections enumerate the set of instructions supported for GPU
     programs.

     Some instructions allow the use of one of the three basic data type
     modifiers (floating point, signed integer, and unsigned integer).  Unless
     otherwise mentioned:

       * the result and all of the operands will be interpreted according to
         the specified data type, and

       * if no data type modifier is specified, the instruction will operate as
         though a floating-point modifier ("F") were specified.

     Some instructions will override one or both of these rules.


     Section 2.X.8.Z, ABS:  Absolute Value

     The ABS instruction performs a component-wise absolute value operation on
     the single operand to yield a result vector.

       tmp = VectorLoad(op0);
       result.x = abs(tmp.x);
       result.y = abs(tmp.y);
       result.z = abs(tmp.z);
       result.w = abs(tmp.w);

     ABS supports all three data type modifiers.  Taking the absolute value of
     an unsigned integer is not a useful operation, but is not illegal.


     Section 2.X.8.Z, ADD:  Add

     The ADD instruction performs a component-wise add of the two operands to
     yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x + tmp1.x;
       result.y = tmp0.y + tmp1.y;
       result.z = tmp0.z + tmp1.z;
       result.w = tmp0.w + tmp1.w;

     ADD supports all three data type modifiers.


     Section 2.X.8.Z, AND:  Bitwise AND

     The AND instruction performs a bitwise AND operation on the components of
     the two source vectors to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x & tmp1.x;
       result.y = tmp0.y & tmp1.y;
       result.z = tmp0.z & tmp1.z;
       result.w = tmp0.w & tmp1.w;

     AND supports only signed and unsigned integer data type modifiers.  If no
     type modifier is specified, both operands and the result are treated as
     signed integers.


     Section 2.X.8.Z, BRK:  Break out of Loop Instruction

     The BRK instruction conditionally transfers control to the instruction
     immediately following the next ENDREP instruction.  A BRK instruction has
     no effect if the condition code test evaluates to FALSE.

     The following pseudocode describes the operation of the instruction:

       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
         continue execution at instruction following the next ENDREP;
       }


     Section 2.X.8.Z, CAL:  Subroutine Call

     The CAL instruction conditionally transfers control to the instruction
     following the label specified in the instruction.  It also pushes a
     reference to the instruction immediately following the CAL instruction
     onto the call stack, where execution will continue after executing the
     matching RET instruction.  The following pseudocode describes the
     operation of the instruction:

       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
         if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
           // undefined results
         } else {
           callStack[callStackDepth] = nextInstruction;
           callStackDepth++;
         }
         // continue execution at instruction following <instTarget>
       } else {
         // do nothing
       }

     In the pseudocode, <instTarget> is the label specified in the instruction
     matching the <branchLabel> grammar rule, <callStackDepth> is the current
     depth of the call stack, <callStack> is an array holding the call stack,
     and <nextInstruction> is a reference to the instruction immediately
     following the CAL instruction in the program string.

     If the call stack overflows, the results of the CAL instruction are
     undefined, and can result in immediate program termination.

     An instruction label signifies the beginning of a new subroutine.
     Subroutines may not nest or overlap.  If a CAL instruction is executed and
     subsequent program execution reaches an instruction label before a
     corresponding RET instruction is executed, the subroutine call returns
     immediately, as though an unconditional RET instruction were inserted
     immediately before the instruction label.
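
     The call stack behavior, including the implicit return at an instruction
     label, can be modeled with a short Python sketch.  The tuple encoding of
     instructions, the run() helper, and the call depth value are all
     hypothetical, condition code tests are omitted, and a label reached by
     the main subroutine is assumed to end the program.

```python
MAX_CALL_DEPTH = 4  # stand-in for MAX_PROGRAM_CALL_DEPTH_NV (value assumed)

def run(program, regs):
    """Execute a list of ("label", name), ("cal", name), ("ret",), ("op", fn)."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    # Start after "main" if present, otherwise at the first instruction.
    pc = labels.get("main", -1) + 1
    call_stack = []
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "cal":
            if len(call_stack) >= MAX_CALL_DEPTH:
                raise RuntimeError("call stack overflow (results undefined)")
            call_stack.append(pc + 1)      # return address
            pc = labels[ins[1]] + 1        # instruction after the label
        elif kind in ("ret", "label"):     # a label acts as an implicit RET
            if not call_stack:
                break                      # return from main ends the program
            pc = call_stack.pop()
        else:                              # ordinary instruction
            ins[1](regs)
            pc += 1
    return regs

def mul_r0(regs):                          # MUL R0, R0, R0;
    regs["R0"] = regs["R0"] * regs["R0"]

def mov_r0_9(regs):                        # MOV R0, 9.0;
    regs["R0"] = 9.0

program = [
    ("label", "square"), ("op", mul_r0), ("ret",),
    ("label", "main"), ("op", mov_r0_9), ("cal", "square"),
]
```

     Running run(program, {"R0": 0.0}) leaves 81.0 in R0.  Removing the
     explicit ("ret",) entry gives the same result in this model, because the
     "main" label terminates the "square" subroutine.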

     (Note:  On previous vertex program extensions -- NV_vertex_program2 and
     NV_vertex_program3 -- instruction labels were also used as targets for
     branch (BRA) instructions.  This unstructured branching functionality has
     been replaced with the structured branching constructs found in this
     instruction set.)


     Section 2.X.8.Z, CEIL:  Ceiling

     The CEIL instruction loads a single vector operand and performs a
     component-wise ceiling operation to generate a result vector.

       tmp = VectorLoad(op0);
       result.x = ceil(tmp.x);
       result.y = ceil(tmp.y);
       result.z = ceil(tmp.z);
       result.w = ceil(tmp.w);

     The ceiling operation returns the nearest integer greater than or equal to
     the operand.  For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
     ceil(+3.7) = +4.0.

     CEIL supports all three data type modifiers.  The single operand is always
     treated as a floating-point vector, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  If a value is not exactly
     representable using the data type of the result (e.g., an overflow or
     writing a negative value to an unsigned integer), the result is undefined.
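
     A minimal Python sketch of how the data type modifier affects the result
     of CEIL follows.  The function name, the "F"/"S"/"U" modifier strings,
     and the 32-bit integer ranges are assumptions of this sketch; cases the
     text leaves undefined are reported by raising an exception.

```python
import math

def ceil_op(operand, modifier="F"):
    """Component-wise ceiling; the operand is always read as floating-point."""
    out = []
    for x in operand:
        c = math.ceil(x)                       # exact integer ceiling
        if modifier == "F":
            out.append(float(c))
        elif modifier == "S":
            if not -2**31 <= c < 2**31:        # assumed 32-bit signed range
                raise ValueError("result undefined: not representable")
            out.append(c)
        elif modifier == "U":
            if not 0 <= c < 2**32:             # assumed 32-bit unsigned range
                raise ValueError("result undefined: not representable")
            out.append(c)
        else:
            raise ValueError("unknown data type modifier")
    return tuple(out)
```

     With the examples from the text, ceil_op((-1.7, 1.0, 3.7, 0.0)) returns
     (-1.0, 1.0, 4.0, 0.0), and the same operand with the "U" modifier is
     rejected because -1 cannot be written to an unsigned integer.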


     Section 2.X.8.Z, CMP:  Compare

     The CMP instruction performs a component-wise comparison of the first
     operand against zero, and copies the values of the second or third
     operands based on the results of the compare.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
       result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
       result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
       result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;

     CMP supports all three data type modifiers.  CMP with an unsigned data
     type modifier is not a useful operation, but is not illegal.


     Section 2.X.8.Z, CONT:  Continue with Next Loop Iteration

     The CONT instruction conditionally transfers control to the next ENDREP
     instruction.  A CONT instruction has no effect if the condition code test
     evaluates to FALSE.

     The following pseudocode describes the operation of the instruction:

       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
         continue execution at the next ENDREP;
       }


     Section 2.X.8.Z, COS:  Cosine with Reduction to [-PI,PI]

     The COS instruction approximates the trigonometric cosine of the angle
     specified by the scalar operand and replicates it to all four components
     of the result vector.  The angle is specified in radians and does not have
     to be in the range [-PI,PI].

       tmp = ScalarLoad(op0);
       result.x = ApproxCosine(tmp);
       result.y = ApproxCosine(tmp);
       result.z = ApproxCosine(tmp);
       result.w = ApproxCosine(tmp);

     COS supports only floating-point data type modifiers.


     Section 2.X.8.Z, DDX:  Partial Derivative Relative to X

     The DDX instruction computes approximate partial derivatives of a vector
     operand with respect to the X window coordinate, and is only available to
     fragment programs.  See the NV_fragment_program4 specification for more
     details.


     Section 2.X.8.Z, DDY:  Partial Derivative Relative to Y

     The DDY instruction computes approximate partial derivatives of a vector
     operand with respect to the Y window coordinate, and is only available to
     fragment programs.  See the NV_fragment_program4 specification for more
     details.


     Section 2.X.8.Z, DIV:  Divide Vector Components by Scalar

     The DIV instruction performs a component-wise divide of the first vector
     operand by the second scalar operand to produce a 4-component result
     vector.

       tmp0 = VectorLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = tmp0.x / tmp1;
       result.y = tmp0.y / tmp1;
       result.z = tmp0.z / tmp1;
       result.w = tmp0.w / tmp1;

     DIV supports all three data type modifiers.  For floating-point division,
     this instruction is not guaranteed to produce results identical to a
     RCP/MUL instruction sequence.

     The results of a signed or unsigned integer division by zero are
     undefined.


     Section 2.X.8.Z, DP2:  2-Component Dot Product

     The DP2 instruction computes a two-component dot product of the two
     operands (using the first two components) and replicates the dot product
     to all four components of the result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
       result.x = dot;
       result.y = dot;
       result.z = dot;
       result.w = dot;

     DP2 supports only floating-point data type modifiers.


     Section 2.X.8.Z, DP2A:  2-Component Dot Product with Scalar Add

     The DP2A instruction computes a two-component dot product of the two
     operands (using the first two components), adds the x component of the
     third operand, and replicates the result to all four components of the
     result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
       result.x = dot;
       result.y = dot;
       result.z = dot;
       result.w = dot;

     DP2A supports only floating-point data type modifiers.


     Section 2.X.8.Z, DP3:  3-Component Dot Product

     The DP3 instruction computes a three-component dot product of the two
     operands (using the x, y, and z components) and replicates the dot product
     to all four components of the result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
             (tmp0.z * tmp1.z);
       result.x = dot;
       result.y = dot;
       result.z = dot;
       result.w = dot;

     DP3 supports only floating-point data type modifiers.


     Section 2.X.8.Z, DP4:  4-Component Dot Product

     The DP4 instruction computes a four-component dot product of the two
     operands and replicates the dot product to all four components of the
     result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
             (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
       result.x = dot;
       result.y = dot;
       result.z = dot;
       result.w = dot;

     DP4 supports only floating-point data type modifiers.


     Section 2.X.8.Z, DPH:  Homogeneous Dot Product

     The DPH instruction computes a three-component dot product of the two
     operands (using the x, y, and z components), adds the w component of the
     second operand, and replicates the sum to all four components of the
     result vector.  This is equivalent to a four-component dot product where
     the w component of the first operand is forced to 1.0.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
             (tmp0.z * tmp1.z) + tmp1.w;
       result.x = dot;
       result.y = dot;
       result.z = dot;
       result.w = dot;

     DPH supports only floating-point data type modifiers.
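
     The stated equivalence between DPH and a four-component dot product can
     be checked with a short Python sketch.  The helper names dp4 and dph are
     hypothetical and simply transcribe the pseudocode.

```python
def dp4(op0, op1):
    """Four-component dot product, replicated to all four components."""
    dot = sum(a * b for a, b in zip(op0, op1))
    return (dot, dot, dot, dot)

def dph(op0, op1):
    """Homogeneous dot product: op0.w is ignored, op1.w is added."""
    dot = op0[0] * op1[0] + op0[1] * op1[1] + op0[2] * op1[2] + op1[3]
    return (dot, dot, dot, dot)
```

     For any operands, dph(a, b) matches dp4((a.x, a.y, a.z, 1.0), b); the w
     component of the first operand has no influence on the result.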


     Section 2.X.8.Z, DST:  Distance Vector

     The DST instruction computes a distance vector from two specially-
     formatted operands.  The first operand should be of the form [NA, d^2,
     d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
     where NA values are not relevant to the calculation and d is a vector
     length.  If both vectors satisfy these conditions, the result vector will
     be of the form [1.0, d, d^2, 1/d].

     The exact behavior is specified in the following pseudo-code:

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = 1.0;
       result.y = tmp0.y * tmp1.y;
       result.z = tmp0.z;
       result.w = tmp1.w;

     Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
     (using the same vector for both operands) and 1/d can be obtained from d^2
     using the RSQ instruction.

     This distance vector is useful for per-vertex light attenuation
     calculations:  a DP3 operation using the distance vector and an
     attenuation constants vector as operands will yield the attenuation
     factor.
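
     The sequence described above (DP3, RSQ, DST, DP3) can be traced in a
     short Python sketch.  The helper names and the attenuation constants
     (k0, k1, k2) are hypothetical, and exact arithmetic stands in for the
     approximate RSQ instruction.  Note that the final DP3 evaluates
     k0 + k1*d + k2*d^2; in the fixed-function lighting equations the
     attenuation term is the reciprocal of that sum.

```python
import math

def dp3(a, b):
    """Three-component dot product (scalar, for brevity)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dst(op0, op1):
    """DST as in the pseudocode: [1.0, op0.y * op1.y, op0.z, op1.w]."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

def distance_vector(v):
    d2 = dp3(v, v)                      # d^2 via DP3 of v with itself
    inv_d = 1.0 / math.sqrt(d2)         # 1/d, as RSQ would approximate
    # Components marked NA in the text are filled with 0.0 here.
    return dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))

def attenuation_sum(v, k):
    """k0 + k1*d + k2*d^2 for attenuation constants k = (k0, k1, k2)."""
    return dp3(distance_vector(v), k)
```

     For v = (3, 4, 0), the distance vector is approximately
     [1.0, 5.0, 25.0, 0.2], and with constants (1.0, 0.5, 0.1) the DP3 gives
     approximately 6.0.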

     DST supports only floating-point data type modifiers.


     Section 2.X.8.Z, ELSE:  Start of If Test Else Block

     The ELSE instruction signifies the end of the "execute if true" portion of
     an IF/ELSE/ENDIF block and the beginning of the "execute if false"
     portion.

     If the condition evaluated at the IF statement was TRUE, when a program
     reaches the ELSE statement, it has completed the entire "execute if true"
     portion of the IF/ELSE/ENDIF block.  Execution will continue at the
     corresponding ENDIF instruction.

     If the condition evaluated at the IF statement was FALSE, program
     execution would skip over the entire "execute if true" portion of the
     IF/ELSE/ENDIF block, including the ELSE instruction.


     Section 2.X.8.Z, EMIT:  Emit Vertex

     The EMIT instruction emits a new vertex to be added to the current output
     primitive generated by a geometry program, and is only available to
     geometry programs.  See the NV_geometry_program4 specification for more
     details.


     Section 2.X.8.Z, ENDIF:  End of If Test Block

     The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has
     no other effect on program execution.


     Section 2.X.8.Z, ENDPRIM:  End of Primitive

     A geometry program can emit multiple primitives in a single invocation.
     The ENDPRIM instruction is used in a geometry program to signify the end
     of the current primitive and the beginning of a new primitive of the same
     type.  It is only available to geometry programs.  See the
     NV_geometry_program4 specification for more details.


     Section 2.X.8.Z, ENDREP:  End of Repeat Block

     The ENDREP instruction specifies the end of a REP block.

     When used in conjunction with a REP instruction with a loop count,
     ENDREP decrements the loop counter.  If the decremented loop counter is
     greater than zero, ENDREP transfers control to the instruction immediately
     after the corresponding REP instruction.  If the loop counter is less than
     or equal to zero, execution continues at the instruction following the
     ENDREP instruction.  When used in conjunction with a REP instruction
     without loop count, ENDREP always transfers control to the instruction
     immediately after the REP instruction.

       if (REP instruction includes a loop count) {
         LoopCount--;
         if (LoopCount > 0) {
           continue execution at instruction following corresponding REP
             instruction;
         }
       } else {
         continue execution at instruction following corresponding REP
           instruction;
       }
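
     The interaction of ENDREP with BRK and CONT can be sketched in Python as
     follows.  The run_rep helper and the convention that the loop body
     returns "BRK", "CONT", or None are hypothetical; the sketch also assumes
     that the loop body is entered at least once, since the handling of the
     loop count at the REP instruction itself is not modeled here.

```python
def run_rep(body, loop_count=None):
    """Run body until ENDREP stops looping; return how often body was entered.

    body(iteration) returns "BRK" to leave the loop, "CONT" to jump to
    ENDREP, or None to fall through to ENDREP normally.
    """
    iteration = 0
    while True:
        action = body(iteration)
        iteration += 1
        if action == "BRK":
            break                    # resume after the ENDREP instruction
        # ENDREP: reached by falling through or via CONT.
        if loop_count is not None:
            loop_count -= 1
            if loop_count <= 0:
                break                # counter exhausted, continue after ENDREP
        # Otherwise transfer control back to just after the REP.
    return iteration
```

     In this model a loop count of 3 enters the body three times, CONT still
     consumes one count because it lands on ENDREP, and a loop without a count
     ends only when the body returns "BRK".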


     Section 2.X.8.Z, EX2:  Exponential Base 2

     The EX2 instruction approximates 2 raised to the power of the scalar
     operand and replicates the approximation to all four components of the
     result vector.

       tmp = ScalarLoad(op0);
       result.x = Approx2ToX(tmp);
       result.y = Approx2ToX(tmp);
       result.z = Approx2ToX(tmp);
       result.w = Approx2ToX(tmp);

     EX2 supports only floating-point data type modifiers.


     Section 2.X.8.Z, FLR:  Floor

     The FLR instruction loads a single vector operand and performs a
     component-wise floor operation to generate a result vector.

       tmp = VectorLoad(op0);
       result.x = floor(tmp.x);
       result.y = floor(tmp.y);
       result.z = floor(tmp.z);
       result.w = floor(tmp.w);

     The floor operation returns the nearest integer less than or equal to the
     operand.  For example floor(-1.7) = -2.0, floor(+1.0) = +1.0, and
     floor(+3.7) = +3.0.

     FLR supports all three data type modifiers.  The single operand is always
     treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  If a value is not exactly
     representable using the data type of the result (e.g., an overflow or
     writing a negative value to an unsigned integer), the result is undefined.


     Section 2.X.8.Z, FRC:  Fraction

     The FRC instruction extracts the fractional portion of each component of
     the operand to generate a result vector.  The fractional portion of a
     component is defined as the result after subtracting off the floor of the
     component (see FLR), and is always in the range [0.0, 1.0).

     For negative values, the fractional portion is NOT the number written to
     the right of the decimal point -- the fractional portion of -1.7 is not
     0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
     from -1.7.

       tmp = VectorLoad(op0);
       result.x = fraction(tmp.x);
       result.y = fraction(tmp.y);
       result.z = fraction(tmp.z);
       result.w = fraction(tmp.w);

     FRC supports only floating-point data type modifiers.
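
     The definition of the fractional portion as "operand minus floor" can be
     illustrated with a short Python sketch.  The function name frc is
     hypothetical, and results are subject to ordinary floating-point rounding.

```python
import math

def frc(operand):
    """Component-wise fraction: x - floor(x)."""
    return tuple(x - math.floor(x) for x in operand)
```

     For instance, the first component of frc((-1.7, 2.25, 3.0, 0.5)) is
     approximately 0.3 (not 0.7), and the remaining components are 0.25, 0.0,
     and 0.5.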


     Section 2.X.8.Z, I2F:  Integer to Float

     The I2F instruction converts the components of an integer vector operand
     to floating-point to produce a floating-point result vector.

       tmp = VectorLoad(op0);
       result.x = (float) tmp.x;
       result.y = (float) tmp.y;
       result.z = (float) tmp.z;
       result.w = (float) tmp.w;

     I2F supports only signed and unsigned integer data type modifiers.  The
     single operand is interpreted according to the data type modifier.  If no
     data type modifier is specified, the operand is treated as a signed
     integer vector.  The result is always written as a float.


     Section 2.X.8.Z, IF:  Start of If Test Block

     The IF instruction performs a condition code test to determine what
     instructions inside an IF/ELSE/ENDIF block are executed.  If the test
     passes, execution continues at the instruction immediately following the
     IF instruction.  If the test fails, IF transfers control to the
     instruction immediately following the corresponding ELSE instruction (if
     present) or the ENDIF instruction (if no ELSE is present).

     Implementations may have a limited ability to nest IF blocks in any
     subroutine.  If the number of IF/ENDIF blocks nested inside each other is
     MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to load.

       // Evaluate the condition.  If the condition is true, continue at the
       // next instruction.  Otherwise, continue at the instruction following
       // the corresponding ELSE (if present) or ENDIF.
       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
         continue execution at the next instruction;
       } else if (IF block contains an ELSE statement) {
         continue execution at instruction following corresponding ELSE;
       } else {
         continue execution at instruction following corresponding ENDIF;
       }

     (Note:  Unlike the NV_fragment_program2 extension, there is no run-time
     limit on the maximum overall depth of IF/ENDIF nesting.  As long as each
     individual subroutine of the program obeys the static nesting limits,
     there will be no run-time errors in the program.  With the
     NV_fragment_program2 extension, a program could terminate abnormally if it
     called a subroutine inside a very deeply nested set of IF/ENDIF blocks and
     the called subroutine also contained deeply nested IF/ENDIF blocks.  Such
     an error could occur even if neither subroutine exceeded static limits.)


     Section 2.X.8.Z, KIL:  Kill Fragment

     The KIL instruction conditionally kills a fragment, and is only available
     to fragment programs.  See the NV_fragment_program4 specification for more
     details.


     Section 2.X.8.Z, LG2:  Logarithm Base 2

     The LG2 instruction approximates the base 2 logarithm of the scalar
     operand and replicates it to all four components of the result vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxLog2(tmp);
       result.y = ApproxLog2(tmp);
       result.z = ApproxLog2(tmp);
       result.w = ApproxLog2(tmp);

     If the scalar operand is zero or negative, the result is undefined.

     LG2 supports only floating-point data type modifiers.


     Section 2.X.8.Z, LIT:  Compute Lighting Coefficients

     The LIT instruction accelerates lighting computations by computing
     lighting coefficients for ambient, diffuse, and specular light
     contributions.  The "x" component of the single operand is assumed to hold
     a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
     in Section 2.13.1).  The "y" component of the operand is assumed to hold a
     specular dot product (n dot h_i).  The "w" component of the operand is
     assumed to hold the specular exponent of the material (s_rm), and is
     clamped to the range (-128, +128) exclusive.

     The "x" component of the result vector receives the value that should be
     multiplied by the ambient light/material product (always 1.0).  The "y"
     component of the result vector receives the value that should be
     multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
     component of the result vector receives the value that should be
     multiplied by the specular light/material product (f_i * (n dot h_i) ^
     s_rm).  The "w" component of the result is the constant 1.0.

     Negative diffuse and specular dot products are clamped to 0.0, as is done
     in the standard per-vertex lighting operations.  In addition, if the
     diffuse dot product is zero or negative, the specular coefficient is
     forced to zero.

       tmp = VectorLoad(op0);
       if (tmp.x < 0) tmp.x = 0;
       if (tmp.y < 0) tmp.y = 0;
       if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
       else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
       result.x = 1.0;
       result.y = tmp.x;
       result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
       result.w = 1.0;

     Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.

     LIT supports only floating-point data type modifiers.


     Section 2.X.8.Z, LRP:  Linear Interpolation

     The LRP instruction performs a component-wise linear interpolation between
     the second and third operands using the first operand as the blend factor.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
       result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
       result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
       result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;

     LRP supports only floating-point data type modifiers.
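
     As a rough illustration of the per-component arithmetic, the following C
     sketch applies the LRP formula to a single component.  The function name
     lrp_component is hypothetical and is not part of this extension.

```c
/* Hypothetical helper: one component of LRP.
 * blend is the op0 component, a the op1 component, b the op2 component. */
float lrp_component(float blend, float a, float b)
{
    /* blend == 1 selects a, blend == 0 selects b. */
    return blend * a + (1.0f - blend) * b;
}
```

     For example, a blend factor of 0.25 applied to 10.0 and 20.0 weights the
     second operand by 0.25 and the third by 0.75.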


     Section 2.X.8.Z, MAD:  Multiply and Add

     The MAD instruction performs a component-wise multiply of the first two
     operands, and then does a component-wise add of the product to the third
     operand to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x * tmp1.x + tmp2.x;
       result.y = tmp0.y * tmp1.y + tmp2.y;
       result.z = tmp0.z * tmp1.z + tmp2.z;
       result.w = tmp0.w * tmp1.w + tmp2.w;

     The multiplication and addition operations in this instruction are subject
     to the same rules as described for the MUL and ADD instructions.

     MAD supports all three data type modifiers.


     Section 2.X.8.Z, MAX:  Maximum

     The MAX instruction computes component-wise maximums of the values in the
     two operands to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
       result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
       result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
       result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;

     MAX supports all three data type modifiers.


     Section 2.X.8.Z, MIN:  Minimum

     The MIN instruction computes component-wise minimums of the values in the
     two operands to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
       result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
       result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
       result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;

     MIN supports all three data type modifiers.


     Section 2.X.8.Z, MOD:  Modulus

     The MOD instruction performs a component-wise modulus operation on the first
     vector operand by the second scalar operand to produce a 4-component result
     vector.

       tmp0 = VectorLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = tmp0.x % tmp1;
       result.y = tmp0.y % tmp1;
       result.z = tmp0.z % tmp1;
       result.w = tmp0.w % tmp1;

     MOD supports both signed and unsigned integer data type modifiers.  If no
     data type modifier is specified, both operands and the result are treated
     as signed integers.

     A result component is undefined if the corresponding component of the
     first operand is negative or if the second operand is less than or equal
     to zero.


     Section 2.X.8.Z, MOV:  Move

     The MOV instruction copies the value of the operand to yield a result
     vector.

       result = VectorLoad(op0);

     MOV supports all three data type modifiers.


     Section 2.X.8.Z, MUL:  Multiply

     The MUL instruction performs a component-wise multiply of the two operands
     to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x * tmp1.x;
       result.y = tmp0.y * tmp1.y;
       result.z = tmp0.z * tmp1.z;
       result.w = tmp0.w * tmp1.w;

     MUL supports all three data type modifiers.  The MUL instruction
     additionally supports three special modifiers.

     The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
     multiplies of 24-bit quantities, respectively.  The results of such
     multiplies are undefined if either operand is outside the range
     [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24.  If "S24" or "U24" is
     specified, the data type is implied and normal data type modifiers may not
     be provided.

     The "HI" modifier specifies a 32-bit integer multiply that returns the 32
     most significant bits of the 64-bit product.  Integer multiplies without
     the "HI" modifier normally return the least significant bits of the
     product.  If "HI" is specified, one of the "S" or "U" integer data type
     modifiers must also be specified.

     Note that if condition code updates are performed on integer multiplies,
     the overflow or carry flags are always cleared, even if the product
     overflowed.  If it is necessary to determine if the results of an integer
     multiply overflowed, the MUL.HI instruction may be used.
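
     The overflow check suggested above can be sketched in C for the unsigned
     case.  This is only an illustrative model under the assumption of 32-bit
     operands; the function names are hypothetical and do not appear in this
     extension.

```c
#include <stdint.h>

/* Hypothetical model of MUL.U.HI: upper 32 bits of the 64-bit product. */
uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Hypothetical model of plain MUL.U: lower 32 bits of the product. */
uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* An unsigned product overflowed 32 bits exactly when its high word
 * is nonzero. */
int mul_overflowed_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0u;
}
```

     A signed check would instead compare the high word against the sign
     extension of the low word; that variant is omitted here.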


     Section 2.X.8.Z, NOT:  Bitwise Not

     The NOT instruction performs a component-wise bitwise NOT operation on the
     source vector to produce a result vector.

       tmp = VectorLoad(op0);
       result.x = ~tmp.x;
       result.y = ~tmp.y;
       result.z = ~tmp.z;
       result.w = ~tmp.w;

     NOT supports only integer data type modifiers.  If no type modifier is
     specified, the operand and the result are treated as signed integers.


     Section 2.X.8.Z, NRM:  Normalize 3-Component Vector

     The NRM instruction normalizes the vector given by the x, y, and z
     components of the vector operand to produce the x, y, and z components of
     the result vector.  The w component of the result is undefined.

       tmp = VectorLoad(op0);
       scale = ApproxRSQRT(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
       result.x = tmp.x * scale;
       result.y = tmp.y * scale;
       result.z = tmp.z * scale;
       result.w = undefined;

     NRM supports only floating-point data type modifiers.


     Section 2.X.8.Z, OR:  Bitwise Or

     The OR instruction performs a bitwise OR operation on the components of
     the two source vectors to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x | tmp1.x;
       result.y = tmp0.y | tmp1.y;
       result.z = tmp0.z | tmp1.z;
       result.w = tmp0.w | tmp1.w;

     OR supports only integer data type modifiers.  If no type modifier is
     specified, both operands and the result are treated as signed integers.


     Section 2.X.8.Z, PK2H:  Pack Two 16-bit Floats

     The PK2H instruction converts the "x" and "y" components of the single
     floating-point vector operand into 16-bit floating-point format, packs the
     bit representation of these two floats into a 32-bit unsigned integer, and
     replicates that value to all four components of the result vector.  The
     PK2H instruction can be reversed by the UP2H instruction below.

       tmp0 = VectorLoad(op0);
       /* result obtained by combining raw bits of tmp0.x, tmp0.y */
       result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
       result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

     PK2H supports all three data type modifiers.  The single operand is always
     treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  For integer results, the bits can be
     interpreted as described above.  For floating-point result variables, the
     packed results do not constitute a meaningful floating-point variable and
     should only be used to feed future unpack instructions.

     A program will fail to load if it contains a PK2H instruction that writes
     its results to a variable declared as "SHORT".


     Section 2.X.8.Z, PK2US:  Pack Two Floats as Unsigned 16-bit

     The PK2US instruction converts the "x" and "y" components of the single
     floating-point vector operand into a packed pair of 16-bit unsigned
     scalars.  The scalars are represented in a bit pattern where all '0' bits
     correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
     representations of the two converted components are packed into a 32-bit
     unsigned integer, and that value is replicated to all four components of
     the result vector.  The PK2US instruction can be reversed by the UP2US
     instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < 0.0) tmp0.x = 0.0;
       if (tmp0.x > 1.0) tmp0.x = 1.0;
       if (tmp0.y < 0.0) tmp0.y = 0.0;
       if (tmp0.y > 1.0) tmp0.y = 1.0;
       us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
       us.y = round(65535.0 * tmp0.y);
       /* result obtained by combining raw bits of us. */
       result.x = ((us.x) | (us.y << 16));
       result.y = ((us.x) | (us.y << 16));
       result.z = ((us.x) | (us.y << 16));
       result.w = ((us.x) | (us.y << 16));

     PK2US supports all three data type modifiers.  The single operand is
     always treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  For integer result variables, the
     bits can be interpreted as described above.  For floating-point result
     variables, the packed results do not constitute a meaningful
     floating-point variable and should only be used to feed future unpack
     instructions.

     A program will fail to load if it contains a PK2US instruction that writes
     its results to a variable declared as "SHORT".
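
     The PK2US pseudocode can be modeled in C roughly as follows.  This is a
     sketch, not a definitive implementation:  the helper names are
     hypothetical, and it assumes that round() breaks ties toward the even
     integer, as described for the ROUND instruction.

```c
#include <stdint.h>

/* Round a non-negative value to the nearest integer, ties to even.
 * Assumes 0 <= v <= 65535, so the truncation below is exact. */
uint32_t round_to_even_u(float v)
{
    uint32_t i = (uint32_t)v;          /* truncate toward zero */
    float frac = v - (float)i;         /* fractional part in [0, 1) */
    if (frac > 0.5f || (frac == 0.5f && (i & 1u)))
        i++;
    return i;
}

/* Clamp to [0, 1], matching the first four pseudocode statements. */
float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Hypothetical model of PK2US: x lands in bits 0..15, y in bits 16..31. */
uint32_t pk2us(float x, float y)
{
    uint32_t ux = round_to_even_u(65535.0f * clamp01(x));
    uint32_t uy = round_to_even_u(65535.0f * clamp01(y));
    return ux | (uy << 16);
}
```

     Out-of-range inputs are clamped before conversion, so pk2us(-3.0, 7.0)
     yields the same bits as pk2us(0.0, 1.0).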


     Section 2.X.8.Z, PK4B:  Pack Four Floats as Signed 8-bit

     The PK4B instruction converts the four components of the single
     floating-point vector operand into 8-bit signed quantities.  The signed
     quantities are represented in a bit pattern where all '0' bits correspond
     to -128/127 and all '1' bits correspond to +127/127.  The bit
     representations of the four converted components are packed into a 32-bit
     unsigned integer, and that value is replicated to all four components of
     the result vector.  The PK4B instruction can be reversed by the UP4B
     instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < -128/127) tmp0.x = -128/127;
       if (tmp0.y < -128/127) tmp0.y = -128/127;
       if (tmp0.z < -128/127) tmp0.z = -128/127;
       if (tmp0.w < -128/127) tmp0.w = -128/127;
       if (tmp0.x > +127/127) tmp0.x = +127/127;
       if (tmp0.y > +127/127) tmp0.y = +127/127;
       if (tmp0.z > +127/127) tmp0.z = +127/127;
       if (tmp0.w > +127/127) tmp0.w = +127/127;
       ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
       ub.y = round(127.0 * tmp0.y + 128.0);
       ub.z = round(127.0 * tmp0.z + 128.0);
       ub.w = round(127.0 * tmp0.w + 128.0);
       /* result obtained by combining raw bits of ub. */
       result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

     PK4B supports all three data type modifiers.  The single operand is always
     treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  For integer result variables, the
     bits can be interpreted as described above.  For floating-point result
     variables, the packed results do not constitute a meaningful
     floating-point variable and should only be used to feed future unpack
     instructions.  A program will fail to load if it contains a PK4B
     instruction that writes its results to a variable declared as "SHORT".
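
     The biased encoding used by PK4B can be modeled in C roughly as follows.
     This is a sketch under stated assumptions:  the helper names are
     hypothetical, and round() is assumed to break ties toward the even
     integer, as described for the ROUND instruction.

```c
#include <stdint.h>

/* Round to the nearest integer, ties to even; values at or below zero
 * map to 0.  Assumes v <= 255. */
uint32_t round_to_even_u(float v)
{
    uint32_t i;
    float frac;
    if (v <= 0.0f) return 0u;
    i = (uint32_t)v;                   /* truncate toward zero */
    frac = v - (float)i;               /* fractional part in [0, 1) */
    if (frac > 0.5f || (frac == 0.5f && (i & 1u)))
        i++;
    return i;
}

/* Clamp to [-128/127, +127/127] and convert to a biased 8-bit code. */
uint32_t to_biased_byte(float v)
{
    const float lo = -128.0f / 127.0f;
    if (v < lo)   v = lo;
    if (v > 1.0f) v = 1.0f;
    return round_to_even_u(127.0f * v + 128.0f);
}

/* Hypothetical model of PK4B: x in bits 0..7, y in bits 8..15,
 * z in bits 16..23, w in bits 24..31. */
uint32_t pk4b(float x, float y, float z, float w)
{
    return to_biased_byte(x) | (to_biased_byte(y) << 8) |
           (to_biased_byte(z) << 16) | (to_biased_byte(w) << 24);
}
```

     Note that an input of 0.0 maps to the code 128 (0x80) rather than to
     zero, because of the +128.0 bias.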


     Section 2.X.8.Z, PK4UB:  Pack Four Floats as Unsigned 8-bit

     The PK4UB instruction converts the four components of the single
     floating-point vector operand into a packed grouping of 8-bit unsigned
     scalars.  The scalars are represented in a bit pattern where all '0' bits
     correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
     representations of the four converted components are packed into a 32-bit
     unsigned integer, and that value is replicated to all four components of
     the result vector.  The PK4UB instruction can be reversed by the UP4UB
     instruction below.

       tmp0 = VectorLoad(op0);
       if (tmp0.x < 0.0) tmp0.x = 0.0;
       if (tmp0.x > 1.0) tmp0.x = 1.0;
       if (tmp0.y < 0.0) tmp0.y = 0.0;
       if (tmp0.y > 1.0) tmp0.y = 1.0;
       if (tmp0.z < 0.0) tmp0.z = 0.0;
       if (tmp0.z > 1.0) tmp0.z = 1.0;
       if (tmp0.w < 0.0) tmp0.w = 0.0;
       if (tmp0.w > 1.0) tmp0.w = 1.0;
       ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
       ub.y = round(255.0 * tmp0.y);
       ub.z = round(255.0 * tmp0.z);
       ub.w = round(255.0 * tmp0.w);
       /* result obtained by combining raw bits of ub. */
       result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
       result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

     PK4UB supports all three data type modifiers.  The single operand is
     always treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  For integer result variables, the
     bits can be interpreted as described above.  For floating-point result
     variables, the packed results do not constitute a meaningful
     floating-point variable and should only be used to feed future unpack
     instructions.

     A program will fail to load if it contains a PK4UB instruction that writes
     its results to a variable declared as "SHORT".


     Section 2.X.8.Z, POW:  Exponentiate

     The POW instruction approximates the value of the first scalar operand
     raised to the power of the second scalar operand and replicates it to all
     four components of the result vector.

       tmp0 = ScalarLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = ApproxPower(tmp0, tmp1);
       result.y = ApproxPower(tmp0, tmp1);
       result.z = ApproxPower(tmp0, tmp1);
       result.w = ApproxPower(tmp0, tmp1);

     The exponentiation approximation function may be implemented using the
     base 2 exponentiation and logarithm approximation operations in the EX2
     and LG2 instructions.  In particular,

       ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).

     Note that a logarithm may be involved even for cases where the exponent is
     an integer.  This means that it may not be possible to exponentiate
     correctly with a negative base.  In contrast, it is possible in a
     "normal" mathematical formulation to raise negative numbers to integral
     powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4).

     POW supports only floating-point data type modifiers.


     Section 2.X.8.Z, RCC:  Reciprocal (Clamped)

     The RCC instruction approximates the reciprocal of the scalar operand,
     clamps the result to one of two ranges, and replicates the clamped result
     to all four components of the result vector.

     If the approximated reciprocal is greater than 0.0, the result is clamped
     to the range [2^-64, 2^+64].  If the approximate reciprocal is not greater
     than zero, the result is clamped to the range [-2^+64, -2^-64].

       tmp = ScalarLoad(op0);
       result.x = ClampApproxReciprocal(tmp);
       result.y = ClampApproxReciprocal(tmp);
       result.z = ClampApproxReciprocal(tmp);
       result.w = ClampApproxReciprocal(tmp);

     RCC supports only floating-point data type modifiers.
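
     The two clamping ranges can be modeled in C roughly as follows.  This
     sketch substitutes an exact divide for the hardware approximation, and
     the function name clamp_reciprocal is hypothetical.

```c
/* Hypothetical model of ClampApproxReciprocal, using an exact divide
 * in place of the hardware approximation. */
float clamp_reciprocal(float v)
{
    const float big   = 18446744073709551616.0f;   /* 2^+64 */
    const float small = 1.0f / big;                /* 2^-64 */
    float r = 1.0f / v;

    if (r > 0.0f) {
        /* Positive reciprocals are clamped to [2^-64, 2^+64]. */
        if (r < small) r = small;
        if (r > big)   r = big;
    } else {
        /* Reciprocals not greater than zero use [-2^+64, -2^-64]. */
        if (r > -small) r = -small;
        if (r < -big)   r = -big;
    }
    return r;
}
```

     In this model a very small positive operand such as 1e-30 produces
     exactly 2^+64, and a very large positive operand such as 1e+30 produces
     exactly 2^-64.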


     Section 2.X.8.Z, RCP:  Reciprocal

     The RCP instruction approximates the reciprocal of the scalar operand and
     replicates it to all four components of the result vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxReciprocal(tmp);
       result.y = ApproxReciprocal(tmp);
       result.z = ApproxReciprocal(tmp);
       result.w = ApproxReciprocal(tmp);

     RCP supports only floating-point data type modifiers.


     Section 2.X.8.Z, REP:  Start of Repeat Block

     The REP instruction begins a REP/ENDREP block.  The REP instruction
     supports an optional operand whose x component specifies the initial value
     for the loop count.  The loop count indicates the number of times the
     instructions between the REP and corresponding ENDREP instruction will be
     executed.  If the initial value of the loop count is not positive, the
     entire block is skipped and execution continues at the instruction
     following the corresponding ENDREP instruction.  If the loop count is
     specified as a floating-point value, it is converted to the largest
     integer less than or equal to the specified value (i.e., taking its
     floor).

     If no operand is provided to REP, the loop count is ignored and the
     corresponding ENDREP instruction unconditionally transfers control to the
     instruction immediately following the REP instruction.  The only way to
     exit such a loop is with the BRK instruction.  To prevent obvious infinite
     loops, a program that includes a REP/ENDREP block with no loop count will
     fail to compile unless it contains either a BRK instruction at the current
     nesting level or a RET instruction at any nesting level.

     Implementations may have a limited ability to nest REP/ENDREP blocks.  If
     the number of REP/ENDREP blocks nested inside each other is
     MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.

       // Set up loop information for the new nesting level.
       tmp = VectorLoad(op0);
       LoopCount = floor(tmp.x);
       if (LoopCount <= 0) {
         continue execution at the instruction following the
         corresponding ENDREP;
       }

     REP supports all three data type modifiers.  The single operand is
     interpreted according to the data type modifier.
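
     The handling of a floating-point loop count can be modeled in C roughly
     as follows.  This is a sketch only:  the function name is hypothetical,
     and the decrement-and-retest step attributed to ENDREP is an assumption
     chosen to match the statement that the loop count gives the number of
     times the block is executed.

```c
/* Hypothetical model of a REP/ENDREP block with a floating-point loop
 * count.  Returns how many times the loop body would execute. */
int rep_iterations(float count)
{
    int loop_count;
    int executed = 0;

    /* floor(count), computed without <math.h>; assumes count fits in
     * an int. */
    loop_count = (int)count;
    if ((float)loop_count > count)
        loop_count--;

    /* A non-positive count skips the block entirely. */
    while (loop_count > 0) {
        executed++;        /* the loop body would run here */
        loop_count--;      /* assumed ENDREP behavior: decrement, retest */
    }
    return executed;
}
```

     A count of 3.7 therefore runs the block three times, while a count of
     0.9 skips it.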

     (Note:  Unlike the NV_fragment_program2 extension, REP blocks in this
     extension support fully general looping; the specified loop count can be
     computed in the program itself.  Additionally, there is no run-time limit
     on the maximum overall depth of REP/ENDREP nesting.  As long as each
     individual subroutine of the program obeys the static nesting limits,
     there will be no run-time errors in the program.  With the
     NV_fragment_program2 extension, a program could terminate abnormally if it
     called a subroutine inside a deeply nested set of REP/ENDREP blocks and
     the called subroutine also contained deeply nested REP/ENDREP blocks.
     Such an error could occur even if neither subroutine exceeded static
     limits.)


     Section 2.X.8.Z, RET:  Subroutine Return

     The RET instruction conditionally returns from a subroutine initiated by a
     CAL instruction by popping an instruction reference off the top of the
     call stack and transferring control to the referenced instruction.  The
     following pseudocode describes the operation of the instruction:

       if (TestCC(cc.c***) || TestCC(cc.*c**) ||
           TestCC(cc.**c*) || TestCC(cc.***c)) {
         if (callStackDepth <= 0) {
           // terminate program
         } else {
           callStackDepth--;
           instruction = callStack[callStackDepth];
         }

         // continue execution at <instruction>
       } else {
         // do nothing
       }

     In the pseudocode, <callStackDepth> is the depth of the call stack,
     <callStack> is an array holding the call stack, and <instruction> is a
     reference to an instruction previously pushed onto the call stack.

     If the call stack is empty when RET executes, the program terminates
     normally.
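
     The pseudocode above can be modeled in C roughly as follows.  All names
     here are hypothetical, the stack size is illustrative rather than a
     limit defined by this extension, and the four-way condition code test is
     reduced to a single flag.

```c
#define CALL_STACK_MAX 8   /* illustrative size only */

struct call_state {
    int call_stack[CALL_STACK_MAX];  /* saved instruction indices */
    int depth;                       /* number of valid entries */
};

/* Hypothetical model of RET.  Returns the index of the next instruction
 * to execute, or -1 if the program terminates.  cc_passed stands in for
 * the condition code test; next is the instruction following RET. */
int do_ret(struct call_state *s, int cc_passed, int next)
{
    if (!cc_passed)
        return next;                 /* condition failed: do nothing */
    if (s->depth <= 0)
        return -1;                   /* empty call stack: terminate */
    s->depth--;
    return s->call_stack[s->depth];  /* resume at the saved instruction */
}
```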


     Section 2.X.8.Z, RFL:  Reflection Vector

     The RFL instruction computes the reflection of the second vector operand
     (the "direction" vector) about the vector specified by the first vector
     operand (the "axis" vector).  Both operands are treated as 3D vectors (the
     w components are ignored).  The result vector is another 3D vector (the
     "reflected direction" vector).  The length of the result vector, ignoring
     rounding errors, should equal that of the second operand.

       axis = VectorLoad(op0);
       direction = VectorLoad(op1);
       tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
       tmp.x = (axis.x * direction.x + axis.y * direction.y +
                axis.z * direction.z);
       tmp.x = 2.0 * tmp.x;
       tmp.x = tmp.x / tmp.w;
       result.x = tmp.x * axis.x - direction.x;
       result.y = tmp.x * axis.y - direction.y;
       result.z = tmp.x * axis.z - direction.z;
       result.w = undefined;

     RFL supports only floating-point data type modifiers.
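
     The computation above can be modeled in C roughly as follows.  The
     function name rfl and the use of 3-element arrays are assumptions made
     for illustration.

```c
/* Hypothetical model of RFL on 3-component arrays; out[] receives the
 * x, y, and z components of the result. */
void rfl(const float axis[3], const float dir[3], float out[3])
{
    float len2 = axis[0] * axis[0] + axis[1] * axis[1] + axis[2] * axis[2];
    float dot  = axis[0] * dir[0]  + axis[1] * dir[1]  + axis[2] * dir[2];
    float k    = 2.0f * dot / len2;  /* not meaningful for a zero axis */
    int i;

    for (i = 0; i < 3; i++)
        out[i] = k * axis[i] - dir[i];
}
```

     Because the dot product is divided by the squared length of the axis,
     the axis does not need to be normalized in this model:  reflecting
     (1, 1, 0) about (0, 2, 0) gives (-1, 1, 0).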


     Section 2.X.8.Z, ROUND:  Round to Nearest Integer

     The ROUND instruction loads a single vector operand and performs a
     component-wise round operation to generate a result vector.

       tmp = VectorLoad(op0);
       result.x = round(tmp.x);
       result.y = round(tmp.y);
       result.z = round(tmp.z);
       result.w = round(tmp.w);

     The round operation returns the nearest integer to the operand.  If the
     fractional portion of the operand is 0.5, round() selects the nearest even
     integer.  For example, round(-1.7) = -2.0, round(+1.0) = +1.0, and
     round(+3.7) = +4.0.

     ROUND supports all three data type modifiers.  The single operand is
     always treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  If a value is not exactly
     representable using the data type of the result (e.g., an overflow or
     writing a negative value to an unsigned integer), the result is undefined.
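
     The tie-breaking rule for round() can be modeled in C roughly as
     follows.  The function name is hypothetical, and the sketch assumes the
     operand is small enough in magnitude to be truncated into a long.

```c
/* Hypothetical model of round(): nearest integer, with ties going to
 * the even neighbor.  Assumes |v| fits in a long. */
float round_half_even(float v)
{
    long t = (long)v;              /* truncate toward zero */
    float frac = v - (float)t;     /* remainder, same sign as v */

    if (frac > 0.5f || (frac == 0.5f && (t % 2 != 0)))
        t += 1;
    else if (frac < -0.5f || (frac == -0.5f && (t % 2 != 0)))
        t -= 1;
    return (float)t;
}
```

     Under this model, 2.5 rounds to 2.0 and 3.5 rounds to 4.0, since both
     ties resolve to the even neighbor.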


     Section 2.X.8.Z, RSQ:  Reciprocal Square Root

     The RSQ instruction approximates the reciprocal of the square root of the
     scalar operand and replicates it to all four components of the result
     vector.

       tmp = ScalarLoad(op0);
       result.x = ApproxRSQRT(tmp);
       result.y = ApproxRSQRT(tmp);
       result.z = ApproxRSQRT(tmp);
       result.w = ApproxRSQRT(tmp);

     If the operand is less than or equal to zero, the results of the
     instruction are undefined.

     RSQ supports only floating-point data type modifiers.

     Note that this instruction differs from the RSQ instruction in
     ARB_vertex_program in that it does not implicitly take the absolute value
     of its operand.  The |abs| operator can be used to achieve equivalent
     semantics.


     Section 2.X.8.Z, SAD:  Sum of Absolute Differences

     The SAD instruction performs a component-wise difference of the first two
     integer operands (subtracting the second from the first), and then does a
     component-wise add of the absolute value of the difference to the third
     unsigned integer operand to yield an unsigned integer result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
       result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
       result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
       result.w = abs(tmp0.w - tmp1.w) + tmp2.w;

     SAD supports signed and unsigned integer data type modifiers.  The first
     two operands are interpreted according to the data type modifier.  The
     third operand and the result are always unsigned integers.
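
     The signed and unsigned forms can be modeled in C on a single component
     roughly as follows.  The function names are hypothetical, and the final
     addition is assumed to wrap modulo 2^32, which this section does not
     state explicitly.

```c
#include <stdint.h>

/* Hypothetical model of SAD.S on one component: signed inputs,
 * unsigned accumulator and result. */
uint32_t sad_s32(int32_t a, int32_t b, uint32_t acc)
{
    int64_t d = (int64_t)a - (int64_t)b;   /* widen to avoid overflow */
    if (d < 0) d = -d;
    return (uint32_t)d + acc;              /* assumed to wrap mod 2^32 */
}

/* Hypothetical model of SAD.U on one component. */
uint32_t sad_u32(uint32_t a, uint32_t b, uint32_t acc)
{
    uint32_t d = (a > b) ? (a - b) : (b - a);
    return d + acc;
}
```

     The unsigned result type matters for the signed form:  the absolute
     difference of the most negative and most positive 32-bit values does not
     fit in a signed 32-bit integer, but does fit in an unsigned one.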


     Section 2.X.8.Z, SCS:  Sine/Cosine without Reduction

     The SCS instruction approximates the trigonometric sine and cosine of the
     angle specified by the scalar operand and places the cosine in the x
     component and the sine in the y component of the result vector.  The z and
     w components of the result vector are undefined.  The angle is specified
     in radians and must be in the range [-PI,PI].

       tmp = ScalarLoad(op0);
       result.x = ApproxCosine(tmp);
       result.y = ApproxSine(tmp);
       result.z = undefined;
       result.w = undefined;

     If the scalar operand is not in the range [-PI,PI], the result vector is
     undefined.

     SCS supports only floating-point data type modifiers.


     Section 2.X.8.Z, SEQ:  Set on Equal

     The SEQ instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     equal to that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;

     SEQ supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.
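
     The TRUE and FALSE encodings for each data type can be modeled in C
     roughly as follows.  The function names are hypothetical.  The select_u
     helper shows one consequence of the all-ones integer TRUE value, namely
     that it can be used directly as a bit mask; that usage is an
     illustration and is not prescribed by this extension.

```c
#include <stdint.h>

/* Hypothetical per-component models of SEQ for each data type. */
float    seq_f(float a, float b)       { return (a == b) ? 1.0f : 0.0f; }
int32_t  seq_s(int32_t a, int32_t b)   { return (a == b) ? -1 : 0; }
uint32_t seq_u(uint32_t a, uint32_t b) { return (a == b) ? 0xFFFFFFFFu : 0u; }

/* Because the integer TRUE value has every bit set, it can serve as a
 * mask that picks if_true or if_false without a branch. */
uint32_t select_u(uint32_t mask, uint32_t if_true, uint32_t if_false)
{
    return (mask & if_true) | (~mask & if_false);
}
```

     The signed TRUE value (-1) and the unsigned TRUE value share the same
     bit pattern in a two's complement representation.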


     Section 2.X.8.Z, SFL:  Set on False

     The SFL instruction is a degenerate case of the other "Set on"
     instructions that sets all components of the result vector to a FALSE
     value (described below).

       result.x = FALSE;
       result.y = FALSE;
       result.z = FALSE;
       result.w = FALSE;

     SFL supports all data type modifiers.  For floating-point data types, the
     FALSE value is 0.0.  For signed and unsigned integer data types, the FALSE
     value is zero.


     Section 2.X.8.Z, SGE:  Set on Greater Than or Equal

     The SGE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     greater than or equal to that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;

     SGE supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.


     Section 2.X.8.Z, SGT:  Set on Greater Than

     The SGT instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     greater than that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;

     SGT supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.


     Section 2.X.8.Z, SHL:  Shift Left

     The SHL instruction performs a component-wise left shift of the bits of
     the first operand by the value of the second scalar operand to produce a
     result vector.  The bits vacated during the shift operation are filled
     with zeroes.

       tmp0 = VectorLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = tmp0.x << tmp1;
       result.y = tmp0.y << tmp1;
       result.z = tmp0.z << tmp1;
       result.w = tmp0.w << tmp1;

     The results of a shift operation ("<<") are undefined if the value of the
     second operand is negative, or greater than or equal to the number of bits
     in the first operand.

     SHL supports both signed and unsigned integer data type modifiers.  If no
     modifier is provided, the operands and the result are treated as signed
     integers.


     Section 2.X.8.Z, SHR:  Shift Right

     The SHR instruction performs a component-wise right shift of the bits of
     the first operand by the value of the second scalar operand to produce a
     result vector.  The bits vacated during the shift operation are filled
     with zeroes if the operand is non-negative and ones otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = ScalarLoad(op1);
       result.x = tmp0.x >> tmp1;
       result.y = tmp0.y >> tmp1;
       result.z = tmp0.z >> tmp1;
       result.w = tmp0.w >> tmp1;

     The results of a shift operation (">>") are undefined if the value of the
     second operand is negative, or greater than or equal to the number of bits
     in the first operand.

     SHR supports both signed and unsigned integer data type modifiers.  If no
     modifiers are provided, the operands and the result are treated as signed
     integers.
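
     The difference between the signed and unsigned forms can be modeled in C
     roughly as follows.  The function names are hypothetical, and callers
     are assumed to keep the shift count in the range [0, 31], since results
     outside that range are undefined.

```c
#include <stdint.h>

/* Hypothetical model of SHR.U: vacated bits are filled with zeroes. */
uint32_t shr_u32(uint32_t x, uint32_t n)
{
    return x >> n;                   /* caller keeps n in [0, 31] */
}

/* Hypothetical model of SHR.S: vacated bits copy the sign.
 * Written with unsigned arithmetic because right-shifting a negative
 * signed value is implementation-defined in C. */
int32_t shr_s32(int32_t x, uint32_t n)
{
    uint32_t ux = (uint32_t)x;
    uint32_t r  = (x < 0) ? ~(~ux >> n) : (ux >> n);
    return (int32_t)r;               /* assumes two's complement */
}
```

     For the same bit pattern 0xFFFFFFF8, the unsigned form shifted by one
     gives 0x7FFFFFFC, while the signed form treats the operand as -8 and
     gives -4.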


     Section 2.X.8.Z, SIN:  Sine with Reduction to [-PI,PI]

     The SIN instruction approximates the trigonometric sine of the angle
     specified by the scalar operand and replicates it to all four components
     of the result vector.  The angle is specified in radians and does not have
     to be in the range [-PI,PI].

       tmp = ScalarLoad(op0);
       result.x = ApproxSine(tmp);
       result.y = ApproxSine(tmp);
       result.z = ApproxSine(tmp);
       result.w = ApproxSine(tmp);

     SIN supports only floating-point data type modifiers.


     Section 2.X.8.Z, SLE:  Set on Less Than or Equal

     The SLE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     less than or equal to that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;

     SLE supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.
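
     The TRUE and FALSE encodings above can be illustrated with a minimal C
     sketch.  The function names are hypothetical and model a single
     component only; 32-bit integer components are assumed.  Note that the
     signed TRUE value (-1) and the unsigned TRUE value (all bits ones) share
     the same bit pattern.

```c
#include <stdint.h>

/* Hypothetical per-component helpers, one per data type modifier. */

/* Floating-point: TRUE is 1.0, FALSE is 0.0. */
float sle_float(float a, float b)
{
    return (a <= b) ? 1.0f : 0.0f;
}

/* Signed integer: TRUE is -1, FALSE is 0. */
int32_t sle_signed(int32_t a, int32_t b)
{
    return (a <= b) ? -1 : 0;
}

/* Unsigned integer: TRUE is the maximum value (all bits ones), FALSE is 0. */
uint32_t sle_unsigned(uint32_t a, uint32_t b)
{
    return (a <= b) ? UINT32_MAX : 0u;
}
```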


     Section 2.X.8.Z, SLT:  Set on Less Than

     The SLT instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     less than that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;

     SLT supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.


     Section 2.X.8.Z, SNE:  Set on Not Equal

     The SNE instruction performs a component-wise comparison of the two
     operands.  Each component of the result vector returns a TRUE value
     (described below) if the corresponding component of the first operand is
     not equal to that of the second, and a FALSE value otherwise.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;
       result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;
       result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;
       result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;

     SNE supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
     types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
     integer data types, the TRUE value is the maximum integer value (all bits
     are ones) and the FALSE value is zero.


     Section 2.X.8.Z, SSG:  Set Sign

     The SSG instruction generates a result vector containing the signs of
     each component of the single vector operand.  Each component of the
     result vector is 1.0 if the corresponding component of the operand
     is greater than zero, 0.0 if the corresponding component of the
     operand is equal to zero, and -1.0 if the corresponding component
     of the operand is less than zero.

       tmp = VectorLoad(op0);
       result.x = SetSign(tmp.x);
       result.y = SetSign(tmp.y);
       result.z = SetSign(tmp.z);
       result.w = SetSign(tmp.w);

     SSG supports only floating-point data type modifiers.
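
     A minimal C sketch of the SetSign() helper used in the pseudocode above,
     following the three cases in the description.  The name set_sign is
     illustrative.  The text above does not say what happens for NaN
     operands; this sketch happens to return 0.0 for them, which is an
     assumption and not specified behavior.

```c
/* Returns 1.0 for positive inputs, -1.0 for negative inputs, and 0.0 for
 * zero.  NaN falls through to the final return (an assumption of this
 * sketch, not something the description above defines). */
float set_sign(float v)
{
    if (v > 0.0f) {
        return 1.0f;
    }
    if (v < 0.0f) {
        return -1.0f;
    }
    return 0.0f;
}
```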


     Section 2.X.8.Z, STR:  Set on True

     The STR instruction is a degenerate case of the other "Set on"
     instructions that sets all components of the result vector to a TRUE value
     (described below).

       result.x = TRUE;
       result.y = TRUE;
       result.z = TRUE;
       result.w = TRUE;

     STR supports all data type modifiers.  For floating-point data types, the
     TRUE value is 1.0.  For signed integer data types, the TRUE value is -1.
     For unsigned integer data types, the TRUE value is the maximum integer
     value (all bits are ones).


     Section 2.X.8.Z, SUB:  Subtract

     The SUB instruction performs a component-wise subtraction of the second
     operand from the first to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x - tmp1.x;
       result.y = tmp0.y - tmp1.y;
       result.z = tmp0.z - tmp1.z;
       result.w = tmp0.w - tmp1.w;

     SUB supports all three data type modifiers.


     Section 2.X.8.Z, SWZ:  Extended Swizzle

     The SWZ instruction loads the single vector operand, and performs a
     swizzle operation more powerful than that provided for loading normal
     vector operands to yield a result vector.

     After the operand is loaded, the "x", "y", "z", and "w" components of the
     result vector are selected by the first, second, third, and fourth matches
     of the <extSwizComp> pattern in the <extendedSwizzle> rule.

     A result component can be selected from any of the four components of the
     operand or the constants 0.0 and 1.0.  The result component can also be
     optionally negated.  The following pseudocode describes the component
     selection method.  "operand" refers to the vector operand, "select" is an
     enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the
     <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.
     "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>
     matches "-".

       float ExtSwizComponent(floatVec operand, enum select, boolean negate)
       {
           float result;
           switch (select) {
             case ZERO:  result = 0.0; break;
             case ONE:   result = 1.0; break;
             case X:     result = operand.x; break;
             case Y:     result = operand.y; break;
             case Z:     result = operand.z; break;
             case W:     result = operand.w; break;
           }
           if (negate) {
             result = -result;
           }
           return result;
       }

     The entire extended swizzle operation is then defined using the following
     pseudocode:

       tmp = VectorLoad(op0);
       result.x = ExtSwizComponent(tmp, xSelect, xNegate);
       result.y = ExtSwizComponent(tmp, ySelect, yNegate);
       result.z = ExtSwizComponent(tmp, zSelect, zNegate);
       result.w = ExtSwizComponent(tmp, wSelect, wNegate);

     "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",
     "wSelect", and "wNegate" correspond to the "select" and "negate" values
     above for the four <extSwizComp> matches.

     Since this instruction allows for component selection and negation for
     each individual component, the grammar does not allow the use of the
     normal swizzle and negation operations allowed for vector operands in
     other instructions.

     SWZ supports only floating-point data type modifiers.
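
     The extended swizzle pseudocode above can be sketched in C as follows.
     The Vec4 and Select types and the function names are hypothetical; the
     per-component selectors and negate flags are passed as arrays in x, y,
     z, w order.

```c
typedef struct { float x, y, z, w; } Vec4;
typedef enum { SEL_ZERO, SEL_ONE, SEL_X, SEL_Y, SEL_Z, SEL_W } Select;

/* Selects one result component: a constant or an operand component,
 * optionally negated. */
float ext_swiz_component(Vec4 operand, Select select, int negate)
{
    float result = 0.0f;
    switch (select) {
    case SEL_ZERO: result = 0.0f;      break;
    case SEL_ONE:  result = 1.0f;      break;
    case SEL_X:    result = operand.x; break;
    case SEL_Y:    result = operand.y; break;
    case SEL_Z:    result = operand.z; break;
    case SEL_W:    result = operand.w; break;
    }
    return negate ? -result : result;
}

/* Applies the four selectors to build the full result vector. */
Vec4 ext_swizzle(Vec4 operand, const Select select[4], const int negate[4])
{
    Vec4 result;
    result.x = ext_swiz_component(operand, select[0], negate[0]);
    result.y = ext_swiz_component(operand, select[1], negate[1]);
    result.z = ext_swiz_component(operand, select[2], negate[2]);
    result.w = ext_swiz_component(operand, select[3], negate[3]);
    return result;
}
```

     For example, with an operand of (1, 2, 3, 4), the selectors "-y", "0",
     "1", and "-x" would yield (-2, 0, 1, -1).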


     Section 2.X.8.Z, TEX:  Texture Sample

     The TEX instruction takes the four components of a single floating-point
     source vector and performs a filtered texture access as described in
     Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
     floating-point result vector.  Partial derivatives and the level of detail
     are computed automatically.

       tmp = VectorLoad(op0);
       ddx = ComputePartialsX(tmp);
       ddy = ComputePartialsY(tmp);
       lambda = ComputeLOD(ddx, ddy);
       result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);

     TEX supports all three data type modifiers.  The single operand is always
     treated as a floating-point vector; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TRUNC:  Truncate (Round Toward Zero)

     The TRUNC instruction loads a single vector operand and performs a
     component-wise truncate operation to generate a result vector.

       tmp = VectorLoad(op0);
       result.x = trunc(tmp.x);
       result.y = trunc(tmp.y);
       result.z = trunc(tmp.z);
       result.w = trunc(tmp.w);

     The truncate operation returns the integer nearest to the operand whose
     magnitude is no larger than that of the operand (i.e., it rounds toward
     zero).  For example, trunc(-1.7) = -1.0, trunc(+1.0) = +1.0, and
     trunc(+3.7) = +3.0.

     TRUNC supports all three data type modifiers.  The single operand is
     always treated as a floating-point value, but the result is written as a
     floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier.  If a value is not exactly
     representable using the data type of the result (e.g., an overflow or
     writing a negative value to an unsigned integer), the result is undefined.
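
     As an illustration, the conversion of a floating-point value to an
     integer type in C also discards the fractional part (rounds toward
     zero), and is likewise undefined when the truncated value is not
     representable in the destination type.  The sketch below models a
     single component written as a signed integer; the function name is
     hypothetical.

```c
#include <stdint.h>

/* Truncates toward zero and writes the result as a signed integer.  As
 * with TRUNC, the result is undefined if the truncated value does not fit
 * in the destination type. */
int32_t trunc_to_signed(float v)
{
    return (int32_t)v;
}
```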


     Section 2.X.8.Z, TXB:  Texture Sample with Bias

     The TXB instruction takes the four components of a single floating-point
     source vector and performs a filtered texture access as described in
     Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
     floating-point result vector.  Partial derivatives and the level of detail
     are computed automatically, but the fourth component of the source vector
     is added to the computed LOD prior to sampling.

       tmp = VectorLoad(op0);
       ddx = ComputePartialsX(tmp);
       ddy = ComputePartialsY(tmp);
       lambda = ComputeLOD(ddx, ddy);
       result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);

     The single source vector in the TXB instruction does not have enough
     coordinates to specify a lookup into a two-dimensional array texture or
     cube map texture with both an LOD bias and an explicit reference value for
     depth comparison.  A program will fail to load if it contains a TXB
     instruction with a target of SHADOWCUBE or SHADOWARRAY2D.

     TXB supports all three data type modifiers.  The single operand is always
     treated as a floating-point vector; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TXD:  Texture Sample with Partials

     The TXD instruction takes the four components of the first floating-point
     source vector and performs a filtered texture access as described in
     Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
     floating-point result vector.  The partial derivatives of the texture
     coordinates with respect to X and Y are specified by the second and third
     floating-point source vectors.  The level of detail is computed
     automatically using the provided partial derivatives.

     Note that for cube map texture targets, the provided partial derivatives
     are in the coordinate system used before texture coordinates are projected
     onto the appropriate cube face.  The partial derivatives of the
     post-projection texture coordinates, which are used for level-of-detail
     and anisotropic filtering calculations, are derived from the original
     coordinates and partial derivatives in an implementation-dependent manner.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       lambda = ComputeLOD(tmp1, tmp2);
       result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);

     TXD supports all three data type modifiers.  All three operands are always
     treated as floating-point vectors; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TXF:  Texel Fetch

     The TXF instruction takes the four components of a single signed integer
     source vector and performs a single texel fetch as described in Section
     2.X.4.4.  The first three components provide the <i>, <j>, and <k> values
     for the texel fetch, and the fourth component is used to determine the LOD
     to access.  The returned (R,G,B,A) value is written to the floating-point
     result vector.  Partial derivatives are irrelevant for single texel
     fetches.

       tmp = VectorLoad(op0);
       result = TexelFetch(tmp, texelOffset);

     TXF supports all three data type modifiers.  The single vector operand is
     treated as a signed integer vector; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TXL:  Texture Sample with LOD

     The TXL instruction takes the four components of a single floating-point
     source vector and performs a filtered texture access as described in
     Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
     floating-point result vector.  The level of detail is taken from the
     fourth component of the source vector.

     Partial derivatives are not computed by the TXL instruction and
     anisotropic filtering is not performed.

       tmp = VectorLoad(op0);
       ddx = (0,0,0);
       ddy = (0,0,0);
       result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);

     The single source vector in the TXL instruction does not have enough
     coordinates to specify a lookup into a 2D array or cube map texture with
     both an explicit LOD and a reference value for depth comparison.  A
     program will fail to load if it contains a TXL instruction with a target
     of SHADOWCUBE or SHADOWARRAY2D.

     TXL supports all three data type modifiers.  The single vector operand is
     treated as a floating-point vector; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TXP:  Texture Sample with Projection

     The TXP instruction divides the first three components of its single
     floating-point source vector by its fourth component, maps the results to
     s, t, and r, and performs a filtered texture access as described in
     Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
     floating-point result vector.  Partial derivatives and the level of detail
     are computed automatically.

       tmp0 = VectorLoad(op0);
       tmp0.x = tmp0.x / tmp0.w;
       tmp0.y = tmp0.y / tmp0.w;
       tmp0.z = tmp0.z / tmp0.w;
       ddx = ComputePartialsX(tmp0);
       ddy = ComputePartialsY(tmp0);
       lambda = ComputeLOD(ddx, ddy);
       result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset);

     The single source vector in the TXP instruction does not have enough
     coordinates to specify a lookup into a 2D array or cube map texture with
     both a Q coordinate and an explicit reference value for depth comparison.
     A program will fail to load if it contains a TXP instruction with a target
     of SHADOWCUBE or SHADOWARRAY2D.

     TXP supports all three data type modifiers.  The single vector operand is
     treated as a floating-point vector; the results are interpreted according
     to the data type modifier.


     Section 2.X.8.Z, TXQ:  Texture Size Query

     The TXQ instruction takes the first component of the single integer vector
     operand, adds the number of the base level of the specified texture to
     determine a texture image level, and returns an integer result vector
     containing the size of the image at that level of the texture.

     For one-dimensional and one-dimensional array textures, the "x" component
     of the result vector is filled with the width of the image(s).  For
     two-dimensional, rectangle, cube map, and two-dimensional array textures,
     the "x" and "y" components are filled with the width and height of the
     image(s).  For three-dimensional textures, the "x", "y", and "z"
     components are filled with the width, height, and depth of the image.
     Additionally, the number of layers in an array texture is returned in the
     "y" component of the result for one-dimensional array textures or the "z"
     component for two-dimensional array textures.  All other components of the
     result vector are undefined.  For the purposes of this instruction, the
     width, height, and depth of a texture do NOT include any border.

       tmp0 = VectorLoad(op0);
       tmp0.x = tmp0.x + texture[op1].target[op2].base_level;
       result.x = texture[op1].target[op2].level[tmp0.x].width;
       result.y = texture[op1].target[op2].level[tmp0.x].height;
       result.z = texture[op1].target[op2].level[tmp0.x].depth;

     If the level computed by adding the operand to the base level of the
     texture is less than the base level number or greater than the maximum
     level number, the results are undefined.

     TXQ supports no data type modifiers; the scalar operand and the result
     vector are both interpreted as signed integers.


     Section 2.X.8.Z, UP2H:  Unpack Two 16-bit Floats

     The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
     scalar operand.  The first 16-bit float (stored in the 16 least
     significant bits) is written into the "x" and "z" components of the result
     vector; the second is written into the "y" and "w" components of the
     result vector.

     This operation undoes the type conversion and packing performed by
     the PK2H instruction.

       tmp = ScalarLoad(op0);
       result.x = (fp16) (RawBits(tmp) & 0xFFFF);
       result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
       result.z = (fp16) (RawBits(tmp) & 0xFFFF);
       result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);

     UP2H supports all three data type modifiers.  The single operand is read
     as a floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier; the 32 least significant bits of the
     encoding are used for unpacking.  For floating-point operand variables, it
     is expected (but not required) that the operand was produced by a previous
     pack instruction.  The result is always written as a floating-point
     vector.

     A program will fail to load if it contains a UP2H instruction whose
     operand is a variable declared as "SHORT".
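
     The "(fp16)" reinterpretation in the pseudocode above can be sketched in
     C.  This sketch assumes the 16-bit float uses a 1-bit sign, 5-bit
     exponent (bias 15), and 10-bit mantissa layout, and that the C "float"
     type is an IEEE 754 32-bit float; both are assumptions of the example.
     The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Converts a 16-bit float bit pattern (assumed 1/5/10 layout) to float. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exponent == 0) {
        if (mantissa == 0) {
            bits = sign;                              /* signed zero */
        } else {
            /* Denormal: shift until the implicit leading one appears. */
            int shift = -1;
            do {
                mantissa <<= 1;
                shift++;
            } while ((mantissa & 0x400u) == 0);
            bits = sign | ((uint32_t)(112 - shift) << 23)
                        | ((mantissa & 0x3FFu) << 13);
        }
    } else if (exponent == 31) {
        bits = sign | 0x7F800000u | (mantissa << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* UP2H: low half goes to x and z, high half goes to y and w. */
void up2h(uint32_t raw_bits, float result[4])
{
    float lo = half_bits_to_float((uint16_t)(raw_bits & 0xFFFFu));
    float hi = half_bits_to_float((uint16_t)((raw_bits >> 16) & 0xFFFFu));
    result[0] = lo;
    result[1] = hi;
    result[2] = lo;
    result[3] = hi;
}
```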


     Section 2.X.8.Z, UP2US:  Unpack Two Unsigned 16-bit Integers

     The UP2US instruction unpacks two 16-bit unsigned values packed
     together in a 32-bit scalar operand.  The unsigned quantities are
     encoded where a bit pattern of all '0' bits corresponds to 0.0 and
     a pattern of all '1' bits corresponds to 1.0.  The "x" and "z"
     components of the result vector are obtained from the 16 least
     significant bits of the operand; the "y" and "w" components are
     obtained from the 16 most significant bits.

     This operation undoes the type conversion and packing performed by
     the PK2US instruction.

       tmp = ScalarLoad(op0);
       result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
       result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
       result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
       result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

     UP2US supports all three data type modifiers.  The single operand is read
     as a floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier; the 32 least significant bits of the
     encoding are used for unpacking.  For floating-point operand variables, it
     is expected (but not required) that the operand was produced by a previous
     pack instruction.  The result is always written as a floating-point
     vector.

     A GPU program will fail to load if it contains a UP2US instruction
     whose operand is a variable declared as "SHORT".
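
     A minimal C sketch of the UP2US pseudocode above, taking the raw 32-bit
     encoding of the operand as input.  The function name and the array-based
     result layout (x, y, z, w) are assumptions of the example.

```c
#include <stdint.h>

/* Unpacks two unsigned 16-bit values and maps 0..65535 onto [0.0, 1.0]. */
void up2us(uint32_t raw_bits, float result[4])
{
    float lo = (float)(raw_bits & 0xFFFFu) / 65535.0f;
    float hi = (float)((raw_bits >> 16) & 0xFFFFu) / 65535.0f;
    result[0] = lo;   /* x: 16 least significant bits */
    result[1] = hi;   /* y: 16 most significant bits  */
    result[2] = lo;   /* z: same as x */
    result[3] = hi;   /* w: same as y */
}
```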


     Section 2.X.8.Z, UP4B:  Unpack Four Signed 8-bit Integers

     The UP4B instruction unpacks four 8-bit signed values packed together
     in a 32-bit scalar operand.  The signed quantities are encoded where
     a bit pattern of all '0' bits corresponds to -128/127 and a pattern
     of all '1' bits corresponds to +127/127.  The "x" component of the
     result vector is the converted value corresponding to the 8 least
     significant bits of the operand; the "w" component corresponds to
     the 8 most significant bits.

     This operation undoes the type conversion and packing performed by
     the PK4B instruction.

       tmp = ScalarLoad(op0);
       result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
       result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
       result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
       result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

     UP4B supports all three data type modifiers.  The single operand is read
     as a floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier; the 32 least significant bits of the
     encoding are used for unpacking.  For floating-point operand variables, it
     is expected (but not required) that the operand was produced by a previous
     pack instruction.  The result is always written as a floating-point
     vector.

     A program will fail to load if it contains a UP4B instruction whose
     operand is a variable declared as "SHORT".
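
     A minimal C sketch of the UP4B pseudocode above, taking the raw 32-bit
     encoding of the operand as input.  The function name is hypothetical.
     Each byte is converted to a signed type before the bias of 128 is
     subtracted, so that the subtraction does not wrap around as it would in
     unsigned arithmetic.

```c
#include <stdint.h>

/* Unpacks four biased 8-bit values: byte 0 maps to -128/127 and byte 255
 * maps to +127/127. */
void up4b(uint32_t raw_bits, float result[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        int32_t byte = (int32_t)((raw_bits >> (8 * i)) & 0xFFu);
        result[i] = (float)(byte - 128) / 127.0f;
    }
}
```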


     Section 2.X.8.Z, UP4UB:  Unpack Four Unsigned 8-bit Integers

     The UP4UB instruction unpacks four 8-bit unsigned values packed
     together in a 32-bit scalar operand.  The unsigned quantities are
     encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
     pattern of all '1' bits corresponds to 1.0.  The "x" component of the
     result vector is obtained from the 8 least significant bits of the
     operand; the "w" component is obtained from the 8 most significant
     bits.

     This operation undoes the type conversion and packing performed by
     the PK4UB instruction.

       tmp = ScalarLoad(op0);
       result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
       result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
       result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
       result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

     UP4UB supports all three data type modifiers.  The single operand is read
     as a floating-point value, a signed integer, or an unsigned integer, as
     specified by the data type modifier; the 32 least significant bits of the
     encoding are used for unpacking.  For floating-point operand variables, it
     is expected (but not required) that the operand was produced by a previous
     pack instruction.  The result is always written as a floating-point
     vector.

     A program will fail to load if it contains a UP4UB instruction whose
     operand is a variable declared as "SHORT".


     Section 2.X.8.Z, X2D:  2D Coordinate Transformation

     The X2D instruction multiplies the 2D offset vector specified by the
     "x" and "y" components of the second vector operand by the 2x2 matrix
     specified by the four components of the third vector operand, and adds
     the transformed offset vector to the 2D vector specified by the "x"
     and "y" components of the first vector operand.  The first component
     of the sum is written to the "x" and "z" components of the result;
     the second component is written to the "y" and "w" components of
     the result.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       tmp2 = VectorLoad(op2);
       result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
       result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
       result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
       result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;

     X2D supports only floating-point data type modifiers.
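
     A minimal C sketch of the X2D pseudocode above.  The third operand holds
     the 2x2 matrix in row-major order (x, y on the first row; z, w on the
     second), as the pseudocode implies.  The function name and the use of
     four-element arrays in x, y, z, w order are assumptions of the example.

```c
/* result.xy = op0.xy + M * op1.xy, with M taken from the four components
 * of op2; the 2D result is replicated into z and w. */
void x2d(const float op0[4], const float op1[4], const float op2[4],
         float result[4])
{
    float sx = op0[0] + op1[0] * op2[0] + op1[1] * op2[1];
    float sy = op0[1] + op1[0] * op2[2] + op1[1] * op2[3];
    result[0] = sx;
    result[1] = sy;
    result[2] = sx;
    result[3] = sy;
}
```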


     Section 2.X.8.Z, XOR:  Exclusive Or

     The XOR instruction performs a bitwise XOR operation on the components of
     the two source vectors to yield a result vector.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.x ^ tmp1.x;
       result.y = tmp0.y ^ tmp1.y;
       result.z = tmp0.z ^ tmp1.z;
       result.w = tmp0.w ^ tmp1.w;

     XOR supports only integer data type modifiers.  If no type modifier is
     specified, both operands and the result are treated as signed integers.


     Section 2.X.8.Z, XPD:  Cross Product

     The XPD instruction computes the cross product using the first three
     components of its two vector operands to generate the x, y, and z
     components of the result vector.  The w component of the result vector is
     undefined.

       tmp0 = VectorLoad(op0);
       tmp1 = VectorLoad(op1);
       result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;
       result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;
       result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;
       result.w = undefined;

     XPD supports only floating-point data type modifiers.
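
     A minimal C sketch of the XPD pseudocode above.  Because the w component
     of the result is undefined, this sketch computes only x, y, and z; the
     function name and the three-element array layout are assumptions of the
     example.

```c
/* Cross product of the first three components of a and b. */
void xpd(const float a[3], const float b[3], float result[3])
{
    result[0] = a[1] * b[2] - a[2] * b[1];
    result[1] = a[2] * b[0] - a[0] * b[2];
    result[2] = a[0] * b[1] - a[1] * b[0];
}
```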


 Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)

     Modify Section 3.8.1, Texture Image Specification, p. 150

     (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture
     targets that can be used with DEPTH_COMPONENT textures) Textures with a
     base internal format of DEPTH_COMPONENT are supported by texture image
     specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,
     TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,
     TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,
     PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,
     PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT.  Using this
     format in conjunction with any other target will result in an
     INVALID_OPERATION error.


     Delete Section 3.8.7, Texture Wrap Modes.  (The language in this section
     is folded into updates to the following section, and is no longer needed
     here.)


     Modify Section 3.8.8, Texture Minification:

     (replace the last paragraph, p. 171):  Let s(x,y) be the function that
     associates an s texture coordinate with each set of window coordinates
     (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.
     Let

       u(x,y) = w_t * s(x,y) + offsetu_shader,
       v(x,y) = h_t * t(x,y) + offsetv_shader, and
       w(x,y) = d_t * r(x,y) + offsetw_shader,

     where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17
     with w_s, h_s, and d_s equal to the width, height, and depth of the image
     array whose level is level_base.  (offsetu_shader, offsetv_shader,
     offsetw_shader) is the texel offset specified in the vertex, geometry, or
     fragment program instruction used to perform the access.  For
     fixed-function texture accesses, all three shader offsets are taken to be
     zero.  For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0;
     for two-dimensional textures, define w(x,y) == 0.

     After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the
     corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT.  Let

       u'(x,y) = clamp(u(x,y), 0, w_t),      if TEXTURE_WRAP_S is CLAMP
                 clamp(u(x,y), -w_t, w_t),   if TEXTURE_WRAP_S is
                                               MIRROR_CLAMP_EXT, or
                 u(x,y),                     otherwise
       v'(x,y) = clamp(v(x,y), 0, h_t),      if TEXTURE_WRAP_T is CLAMP
                 clamp(v(x,y), -h_t, h_t),   if TEXTURE_WRAP_T is
                                               MIRROR_CLAMP_EXT, or
                 v(x,y),                     otherwise
       w'(x,y) = clamp(w(x,y), 0, d_t),      if TEXTURE_WRAP_R is CLAMP
                 clamp(w(x,y), -d_t, d_t),   if TEXTURE_WRAP_R is
                                               MIRROR_CLAMP_EXT, or
                 w(x,y),                     otherwise,

     where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a>
     is greater than <c>, and <a> otherwise.

     (start a new paragraph with "For a polygon, rho is given at a fragment
     with window coordinates...", and then continue with the original spec
     text.)

     (replace text starting with the last paragraph on p. 172, continuing to
     the end of p. 174)

     When lambda indicates minification, the value assigned to
     TEXTURE_MIN_FILTER is used to determine how the texture value for a
     fragment is selected.

     When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
     level_base that is nearest (in Manhattan distance) to that specified by
     (s,t,r) is obtained.  Let i, j, and k be integers such that:

       i = apply_wrap(floor(u'(x,y))),
       j = apply_wrap(floor(v'(x,y))), and
       k = apply_wrap(floor(w'(x,y))),

     where the coordinate returned by apply_wrap() is as defined by Table X.19.
     The values of i, j, and k are then modified according to the texture wrap
     modes, as described in Table 3.19, to produce new values (i', j', and k').
     For a three-dimensional texture, the texel at location (i,j,k) becomes the
     texture value.  For a two-dimensional texture, k is irrelevant, and the
     texel at location (i,j) becomes the texture value.  For a one-dimensional
     texture, j and k are irrelevant, and the texel at location i becomes the
     texture value.

       Wrap mode                   Result
       --------------------------  ------------------------------------------
       CLAMP_TO_EDGE               clamp(coord, 0, size-1)
       CLAMP_TO_BORDER             clamp(coord, -1, size)
       CLAMP                       { clamp(coord, 0, size-1),
                                   {         for NEAREST filtering
                                   { clamp(coord, -1, size),
                                   {         for LINEAR filtering
       REPEAT                      mod(coord, size)
       MIRROR_CLAMP_TO_EDGE_EXT    clamp(mirror(coord), 0, size-1)
       MIRROR_CLAMP_TO_BORDER_EXT  clamp(mirror(coord), 0, size)
       MIRROR_CLAMP_EXT            { clamp(mirror(coord), 0, size-1),
                                   {         for NEAREST filtering
                                   { clamp(mirror(coord), 0, size),
                                   {         for LINEAR filtering
       MIRRORED_REPEAT             (size-1) - mirror(mod(coord, 2*size)-size)

       Table X.19:  Texel location wrap mode application.  mod(<a>,<b>) is
       defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to
       return <a> if <a> is greater than or equal to zero or -(1+<a>)
       otherwise.  The values of "wrap mode" and size are TEXTURE_WRAP_S and
       w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k
       coordinates, respectively.  The coordinate clamp for the CLAMP and
       MIRROR_CLAMP_EXT wrap modes depends on the filtering mode (NEAREST or
       LINEAR).
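
     The mod() and mirror() definitions in the table caption, and four of the
     table rows, can be sketched in C for integer texel coordinates.  The
     enum, the function names, and the restriction to the four modes that do
     not depend on the filtering mode are choices made for this example.

```c
/* mod(a,b) = a - b*floor(a/b), for integer a and positive b. */
int wrap_mod(int a, int b)
{
    int m = a % b;            /* C remainder takes the sign of a */
    return (m < 0) ? m + b : m;
}

/* mirror(a) = a if a >= 0, and -(1+a) otherwise. */
int wrap_mirror(int a)
{
    return (a >= 0) ? a : -(1 + a);
}

int wrap_clamp(int a, int lo, int hi)
{
    return (a < lo) ? lo : (a > hi) ? hi : a;
}

typedef enum {
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP_TO_BORDER,
    WRAP_REPEAT,
    WRAP_MIRRORED_REPEAT
} WrapMode;

/* Applies one wrap mode to a texel coordinate for an image of the given
 * size (in texels, excluding any border). */
int apply_wrap(WrapMode mode, int coord, int size)
{
    switch (mode) {
    case WRAP_CLAMP_TO_EDGE:
        return wrap_clamp(coord, 0, size - 1);
    case WRAP_CLAMP_TO_BORDER:
        return wrap_clamp(coord, -1, size);
    case WRAP_REPEAT:
        return wrap_mod(coord, size);
    case WRAP_MIRRORED_REPEAT:
        return (size - 1) - wrap_mirror(wrap_mod(coord, 2 * size) - size);
    }
    return coord;
}
```

     With a size of 4, MIRRORED_REPEAT maps the coordinates 3, 4, 5 to the
     texels 3, 3, 2, and maps -1 and -2 to the texels 0 and 1.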

     If the selected (i,j,k), (i,j), or i location refers to a border texel
     that satisfies any of the following conditions:

       i < -b_s,
       j < -b_s,
       k < -b_s,
       i >= w_t + b_s,
       j >= h_t + b_s, or
       k >= d_t + b_s,

     then the border values defined by TEXTURE_BORDER_COLOR are used in place
     of the non-existent texel. If the texture contains color components, the
     values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match
     the texture's internal format in a manner consistent with table 3.15. If
     the texture contains depth components, the first component of
     TEXTURE_BORDER_COLOR is interpreted as a depth value.

     When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image
     array of level level_base is selected.  Let:

       i_0   = apply_wrap(floor(u' - 0.5)),
       j_0   = apply_wrap(floor(v' - 0.5)),
       k_0   = apply_wrap(floor(w' - 0.5)),
       i_1   = apply_wrap(floor(u' - 0.5) + 1),
       j_1   = apply_wrap(floor(v' - 0.5) + 1),
       k_1   = apply_wrap(floor(w' - 0.5) + 1),
       alpha = frac(u' - 0.5),
       beta  = frac(v' - 0.5),
       gamma = frac(w' - 0.5),

     where frac(<x>) denotes the fractional part of <x>.
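
     For a single coordinate, the selection of the two texel indices and the
     interpolation weight can be sketched in C as follows.  The sketch omits
     the apply_wrap() step, assumes frac(x) means x - floor(x), and uses a
     small hypothetical floor helper valid only for values that fit in an
     "int"; the type and function names are illustrative.

```c
typedef struct {
    int   i0;      /* floor(u' - 0.5), before apply_wrap() */
    int   i1;      /* floor(u' - 0.5) + 1, before apply_wrap() */
    float alpha;   /* frac(u' - 0.5), the weight of texel i1 */
} LinearTap;

/* Floor for values representable as int. */
int floor_to_int(float v)
{
    int i = (int)v;                 /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

/* Computes the unwrapped texel pair and weight for one coordinate. */
LinearTap linear_tap(float u_prime)
{
    LinearTap tap;
    float shifted = u_prime - 0.5f;
    int base = floor_to_int(shifted);
    tap.i0 = base;
    tap.i1 = base + 1;
    tap.alpha = shifted - (float)base;
    return tap;
}
```

     For example, u' = 2.75 selects texels 2 and 3 with alpha = 0.25, and
     u' = 0.25 selects texels -1 and 0 with alpha = 0.75; the index -1 would
     then be resolved by the wrap mode.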

     For a three-dimensional texture, the texture value tau is found as...

     (replace last paragraph, p.174) For any texel in the equation above that
     refers to a border texel outside the defined range of the image, the texel
     value is taken from the texture border color as with NEAREST filtering.


     Modify Section 3.8.14, Texture Comparison Modes (p. 185)

     (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is
     used for depth comparisons on cubemap textures)

     Let D_t be the depth texture value, in the range [0, 1].  For
     fixed-function texture lookups, let R be the interpolated <r> texture
     coordinate, clamped to the range [0, 1].  For texture lookups generated by
     a program instruction, let R be the reference value for depth comparisons
     provided in the instruction, also clamped to [0, 1].  Then the effective
     texture value L_t, I_t, or A_t is computed as follows:


 Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment
 Operations and the Frame Buffer)

     None.


 Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)

     None.


 Additions to Chapter 6 of the OpenGL 1.5 Specification (State and
 State Requests)

     Modify Section 6.1.12 of the ARB_vertex_program specification.

     (Add new integer program parameter queries, plus language that program
     environment or local parameter query results are undefined if the query
     specifies a data type incompatible with the data type of the parameter
     being queried.)

     The commands

       void GetProgramEnvParameterdvARB(enum target, uint index,
                                        double *params);
       void GetProgramEnvParameterfvARB(enum target, uint index,
                                        float *params);
       void GetProgramEnvParameterIivNV(enum target, uint index,
                                        int *params);
       void GetProgramEnvParameterIuivNV(enum target, uint index,
                                         uint *params);

     obtain the current value for the program environment parameter numbered
     <index> for the given program target <target>, and place the information
     in the array <params>.  The values returned are undefined if the data type
     of the components of the parameter is not compatible with the data type of
     <params>.  Floating-point components are compatible with "double" or
     "float"; signed and unsigned integer components are compatible with "int"
     and "uint", respectively.  The error INVALID_ENUM is generated if <target>
     specifies a nonexistent program target or a program target that does not
     support program environment parameters.  The error INVALID_VALUE is
     generated if <index> is greater than or equal to the
     implementation-dependent number of supported program environment
     parameters for the program target.

     ...

     The commands

       void GetProgramLocalParameterdvARB(enum target, uint index,
                                          double *params);
       void GetProgramLocalParameterfvARB(enum target, uint index,
                                          float *params);
       void GetProgramLocalParameterIivNV(enum target, uint index,
                                          int *params);
       void GetProgramLocalParameterIuivNV(enum target, uint index,
                                           uint *params);

     obtain the current value for the program local parameter numbered <index>
     belonging to the program object currently bound to <target>, and place
     the information in the array <params>.  The values returned are undefined
     if the data type of the components of the parameter is not compatible with
     the data type of <params>.  Floating-point components are compatible with
     "double" or "float"; signed and unsigned integer components are compatible
     with "int" and "uint", respectively.  The error INVALID_ENUM is generated
     if <target> specifies a nonexistent program target or a program target
     that does not support program local parameters.  The error INVALID_VALUE
     is generated if <index> is greater than or equal to the
     implementation-dependent number of supported program local parameters for
     the program target.
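
     The query rules above (INVALID_VALUE for an out-of-range <index>, and
     undefined results when the query type does not match the parameter's
     data type) can be modeled with a small C sketch.  This is a toy model
     under stated assumptions, not driver code:  the limit of 256 parameters,
     the per-parameter type tag, and all model_* names are hypothetical.  An
     implementation is not required to track the data type at all.

```c
#include <stdint.h>
#include <string.h>

enum { MODEL_NO_ERROR = 0, MODEL_INVALID_VALUE = 0x0501 };
enum { MODEL_MAX_ENV_PARAMS = 256 };   /* assumed limit for this sketch */

typedef enum { COMP_FLOAT, COMP_INT, COMP_UINT } CompType;

typedef struct {
    CompType type;     /* data type used when the parameter was last set */
    uint32_t bits[4];  /* raw 32-bit component storage (typeless) */
} ModelParam;

/* Zero-initialized: every parameter starts as floating-point zeros. */
static ModelParam model_env[MODEL_MAX_ENV_PARAMS];

/* Models ProgramEnvParameterI4ivNV for one program target. */
static int model_set_env_I4iv(uint32_t index, const int32_t v[4])
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;
    model_env[index].type = COMP_INT;
    memcpy(model_env[index].bits, v, sizeof model_env[index].bits);
    return MODEL_NO_ERROR;
}

/* Models ProgramEnvParameter4fvARB (assumes a 32-bit float). */
static int model_set_env_4fv(uint32_t index, const float v[4])
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;
    model_env[index].type = COMP_FLOAT;
    memcpy(model_env[index].bits, v, sizeof model_env[index].bits);
    return MODEL_NO_ERROR;
}

/* Models GetProgramEnvParameterIivNV.  *defined is set to 1 only when the
 * stored components are signed integers, i.e. when the spec says the
 * returned values are defined. */
static int model_get_env_Iiv(uint32_t index, int32_t out[4], int *defined)
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;
    memcpy(out, model_env[index].bits, sizeof model_env[index].bits);
    *defined = (model_env[index].type == COMP_INT);
    return MODEL_NO_ERROR;
}
```

     In this model, a value written with model_set_env_4fv and read back with
     model_get_env_Iiv still returns raw bits, but flags them as not defined,
     which mirrors the "values returned are undefined" wording above.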

     ...

     The command

       void GetProgramivARB(enum target, enum pname, int *params);

     obtains program state for the program target <target>, writing ...

     (add new paragraphs describing the new supported queries)

     If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or
     PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
     holding the number of active attribute or result variable components,
     respectively, used by the program object currently bound to <target>.

     If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
     MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
     holding the maximum number of active attribute or result variable
     components, respectively, supported for programs of type <target>.


 Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)

     None.


 Additions to the AGL/GLX/WGL Specifications

     None.


 GLX Protocol

     The following new rendering commands are sent to the server as part
     of a glXRender request.

     ProgramLocalParameterI4ivNV

         2           28               rendering command length
         2           4303             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           INT32            params[0]
         4           INT32            params[1]
         4           INT32            params[2]
         4           INT32            params[3]

     ProgramLocalParameterI4uivNV

         2           28               rendering command length
         2           4305             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           params[0]
         4           CARD32           params[1]
         4           CARD32           params[2]
         4           CARD32           params[3]

     ProgramEnvParameterI4ivNV

         2           28               rendering command length
         2           4307             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           INT32            params[0]
         4           INT32            params[1]
         4           INT32            params[2]
         4           INT32            params[3]

     ProgramEnvParameterI4uivNV

         2           28               rendering command length
         2           4309             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           params[0]
         4           CARD32           params[1]
         4           CARD32           params[2]
         4           CARD32           params[3]

     The following new rendering commands are added.  These can be sent as a
     glXRender or glXRenderLarge request.

     ProgramLocalParametersI4ivNV

         2           16+count*4*4     rendering command length
         2           4304             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           count
         4*count*4   LISTofINT32      params

     If the command is encoded in a glXRenderLarge request, the
     command opcode and command length fields above are expanded to
     4 bytes each:

         4           20+count*4*4     rendering command length
         4           4304             rendering command opcode

     ProgramLocalParametersI4uivNV

         2           16+count*4*4     rendering command length
         2           4306             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           count
         4*count*4   LISTofCARD32     params

     If the command is encoded in a glXRenderLarge request, the
     command opcode and command length fields above are expanded to
     4 bytes each:

         4           20+count*4*4     rendering command length
         4           4306             rendering command opcode

     ProgramEnvParametersI4ivNV

         2           16+count*4*4     rendering command length
         2           4308             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           count
         4*count*4   LISTofINT32      params

     If the command is encoded in a glXRenderLarge request, the
     command opcode and command length fields above are expanded to
     4 bytes each:

         4           20+count*4*4     rendering command length
         4           4308             rendering command opcode

     ProgramEnvParametersI4uivNV

         2           16+count*4*4     rendering command length
         2           4310             rendering command opcode
         4           ENUM             target
         4           CARD32           index
         4           CARD32           count
         4*count*4   LISTofCARD32     params

     If the command is encoded in a glXRenderLarge request, the
     command opcode and command length fields above are expanded to
     4 bytes each:

         4           20+count*4*4     rendering command length
         4           4310             rendering command opcode
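
     The command length fields in the tables above follow directly from the
     field sizes.  The C sketch below restates that arithmetic; the function
     names are hypothetical and the code is illustrative only.

```c
#include <stdint.h>

/* Single-vector commands (e.g. ProgramLocalParameterI4ivNV):
 * 2 (length) + 2 (opcode) + 4 (target) + 4 (index) + 4*4 (params) = 28. */
static uint32_t single_param_length(void)
{
    return 2u + 2u + 4u + 4u + 4u * 4u;
}

/* Multi-vector commands in a glXRender request:
 * 2 (length) + 2 (opcode) + 4 (target) + 4 (index) + 4 (count) = 16,
 * followed by <count> vectors of four 4-byte components. */
static uint32_t multi_param_render_length(uint32_t count)
{
    return 16u + count * 4u * 4u;
}

/* In a glXRenderLarge request the length and opcode fields grow from
 * 2 bytes to 4 bytes each, adding 4 bytes to the fixed part. */
static uint32_t multi_param_render_large_length(uint32_t count)
{
    return 20u + count * 4u * 4u;
}
```

     For <count> = 3, these give 64 bytes for glXRender and 68 bytes for
     glXRenderLarge.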

     The remaining commands are non-rendering commands.  These commands
     are sent separately (i.e., not as part of a glXRender or
     glXRenderLarge request), using the glXVendorPrivateWithReply
     request:

     GetProgramLocalParameterIivNV
         1           CARD8            opcode (X assigned)
         1           17               GLX opcode (X_GLXVendorPrivateWithReply)
         2           5                request length
         4           1365             vendor specific opcode
         4           GLX_CONTEXT_TAG  context tag
         4           ENUM             target
         4           CARD32           index
       =>
         1           1                reply
         1           CARD8            unused
         2           CARD16           sequence number
         4           4                reply length
         24          CARD32           unused
         16          INT32            params

     GetProgramLocalParameterIuivNV
         1           CARD8            opcode (X assigned)
         1           17               GLX opcode (X_GLXVendorPrivateWithReply)
         2           5                request length
         4           1366             vendor specific opcode
         4           GLX_CONTEXT_TAG  context tag
         4           ENUM             target
         4           CARD32           index
       =>
         1           1                reply
         1           CARD8            unused
         2           CARD16           sequence number
         4           4                reply length
         24          CARD32           unused
         16          CARD32           params

     GetProgramEnvParameterIivNV
         1           CARD8            opcode (X assigned)
         1           17               GLX opcode (X_GLXVendorPrivateWithReply)
         2           5                request length
         4           1367             vendor specific opcode
         4           GLX_CONTEXT_TAG  context tag
         4           ENUM             target
         4           CARD32           index
       =>
         1           1                reply
         1           CARD8            unused
         2           CARD16           sequence number
         4           4                reply length
         24          CARD32           unused
         16          INT32            params

     GetProgramEnvParameterIuivNV
         1           CARD8            opcode (X assigned)
         1           17               GLX opcode (X_GLXVendorPrivateWithReply)
         2           5                request length
         4           1368             vendor specific opcode
         4           GLX_CONTEXT_TAG  context tag
         4           ENUM             target
         4           CARD32           index
       =>
         1           1                reply
         1           CARD8            unused
         2           CARD16           sequence number
         4           4                reply length
         24          CARD32           unused
         16          CARD32           params
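
     The request and reply layouts above can be written as C structures to
     check the length fields.  This is a sketch only:  the structure names are
     hypothetical, and it assumes a compiler that inserts no padding for these
     naturally aligned fields.  X protocol request lengths count 4-byte units
     including the header, and reply lengths count 4-byte units beyond the
     first 32 bytes.

```c
#include <stdint.h>

/* Request for GetProgram{Local,Env}ParameterI{i,ui}vNV. */
typedef struct {
    uint8_t  opcode;          /* X-assigned major opcode */
    uint8_t  glx_opcode;      /* 17 = X_GLXVendorPrivateWithReply */
    uint16_t request_length;  /* 5, in 4-byte units */
    uint32_t vendor_opcode;   /* 1365..1368 depending on the command */
    uint32_t context_tag;
    uint32_t target;
    uint32_t index;
} GetParamRequest;

/* Reply for the signed-integer queries; the unsigned variants differ
 * only in the type of params. */
typedef struct {
    uint8_t  reply;            /* 1 */
    uint8_t  unused0;
    uint16_t sequence_number;
    uint32_t reply_length;     /* 4, in 4-byte units past the first 32 bytes */
    uint32_t unused1[6];       /* 24 unused bytes */
    int32_t  params[4];        /* 16 bytes of returned components */
} GetParamIivReply;

/* 5 units * 4 bytes = 20-byte request. */
_Static_assert(sizeof(GetParamRequest) == 5 * 4,
               "request must be 5 four-byte units");

/* 32-byte fixed part + 4 units * 4 bytes = 48-byte reply. */
_Static_assert(sizeof(GetParamIivReply) == 32 + 4 * 4,
               "reply must be 32 bytes plus 4 four-byte units");
```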

 Errors

     The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,
     ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,
     ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,
     ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV,
     ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
     GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
     GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
     number of program local parameters supported by <target>.

     The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,
     ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,
     ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,
     ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,
     ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
     GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
     GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
     number of program environment parameters supported by <target>.

     The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,
     ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum
     of <index> and <count> is greater than the number of program local
     parameters supported by <target>.

     The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,
     ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of
     <index> and <count> is greater than the number of program environment
     parameters supported by <target>.
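
     The two kinds of range checks described in this section can be sketched
     in C as follows.  The function names and the MODEL_* constants are
     hypothetical, and <count> is treated as unsigned here for simplicity;
     this is an illustration of the conditions, not an implementation.

```c
#include <stdint.h>

enum { MODEL_NO_ERROR = 0, MODEL_INVALID_VALUE = 0x0501 };

/* Single-parameter commands and queries: <index> must be less than the
 * number of parameters supported by <target>. */
static int check_param_index(uint32_t index, uint32_t max_params)
{
    return (index >= max_params) ? MODEL_INVALID_VALUE : MODEL_NO_ERROR;
}

/* Multi-parameter commands: <index> + <count> must not exceed the number
 * of parameters supported by <target>.  The sum is formed in 64 bits so
 * that large inputs cannot wrap around. */
static int check_param_range(uint32_t index, uint32_t count,
                             uint32_t max_params)
{
    uint64_t end = (uint64_t)index + (uint64_t)count;
    return (end > max_params) ? MODEL_INVALID_VALUE : MODEL_NO_ERROR;
}
```

     Note the asymmetry:  with 256 supported parameters, <index> = 256 is an
     error for a single-parameter command, while <index> = 250 with
     <count> = 6 is accepted because the last parameter written is 255.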


 Dependencies on NV_parameter_buffer_object

     If NV_parameter_buffer_object is not supported, references to program
     parameter buffer variables and bindings should be removed.


 Dependencies on ARB_texture_rectangle

     If ARB_texture_rectangle is not supported, references to rectangle
     textures and the RECT and SHADOWRECT texture target identifiers should be
     removed.


 Dependencies on EXT_gpu_program_parameters

     If EXT_gpu_program_parameters is not supported, references to the
     Program{Local,Env}Parameters4fvNV commands, which set multiple program
     local or environment parameters in a single call, should be removed.
     These prototypes were included in this spec for completeness only.


 Dependencies on EXT_texture_integer

     If EXT_texture_integer is not supported, references to texture lookups
     returning integer values in Section 2.X.4.4 (Texture Access) should be
     removed, and all texture formats are considered to produce floating-point
     values.


 Dependencies on EXT_texture_array

     If EXT_texture_array is not supported, references to array textures in
     Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as
     should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and
     "SHADOWARRAY2D" tokens.


 Dependencies on EXT_texture_buffer_object

     If EXT_texture_buffer_object is not supported, references to buffer
     textures in Section 2.X.4.4 (Texture Access) and elsewhere should be
     removed, as should all references to the "BUFFER" tokens.


 Dependencies on NV_primitive_restart

     If NV_primitive_restart is supported, index values causing a primitive
     restart are not considered as specifying an End command, followed by
     another Begin.  Primitive restart is therefore not guaranteed to
     immediately update bindings for material properties changed inside a
     Begin/End.  The spec language says they "are not guaranteed to update
     program parameter bindings until the following End command."


 New State

                                                          Initial
     Get Value                     Type  Get Command       Value  Description             Sec     Attrib
     ----------------------------  ----  ---------------  ------- ----------------------  ------  ------
     PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
                                                                  used for attributes
     PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
                                                                  used for results

     Table X.20.  New Program Object State.  Program object queries return
     attributes of the program object currently bound to the program target
     <target>.


 New Implementation Dependent State

                                                              Minimum
     Get Value                         Type  Get Command       Value   Description           Sec.   Attrib
     --------------------------------  ----  ---------------  -------  --------------------- ------ ------
     MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        -8     minimum texel offset  2.X.4.4  -
                                                                       allowed in lookup
     MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        +7     maximum texel offset  2.X.4.4  -
                                                                       allowed in lookup
     MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
                                                                       components allowed
                                                                       for attributes
     MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
                                                                       components allowed
                                                                       for results
     MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
                                                                       attribute vectors
                                                                       supported
     MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
                                                                       result vectors
                                                                       supported
     MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
                                                                       call stack depth
     MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB     48    maximum program       2.X.5    -
                                                                       if nesting
     MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
                                                                       loop nesting

     Table X.21:  New Implementation-Dependent Values Introduced by
     NV_gpu_program4.  (*) means that the required minimum is program
     type-specific.  There are separate limits for each program type.


 Issues

     (1) How does this extension differ from previous NV_vertex_program and
     NV_fragment_program extensions?

       RESOLVED:

         - This extension provides a uniform set of instructions and bindings.
           Unlike previous extensions, the set of instructions and bindings
           available is generally the same for all program types.  The only
           exceptions are a small number of instructions and bindings that make
           sense for one specific program type.

         - This extension supports integer data types and provides a
           full-fledged integer instruction set.

         - This extension supports array variables of all types, including
           temporaries.  Array variables can be accessed directly or indirectly
           (using integer temporaries as indices).

         - This extension provides a uniform set of structured branching
           constructs (if tests, loops, subroutines) that fully support
           run-time condition testing.  Previous versions of NV_vertex_program
           provided unstructured branching.  Previous versions of
           NV_fragment_program provided structured branching constructs, but
           the support was more limited -- for example, looping constructs
           couldn't specify loop counts with values computed at run time.

         - This extension supports geometry programs, which are described in
           more detail in the NV_geometry_program4 extension.

         - This extension provides the ability to specify and use cubemap
           textures with a DEPTH_COMPONENT internal format.  Shadow mapping is
           supported; the Q texture coordinate is used as the reference value
           for comparisons.

     (2) Is this extension backward-compatible with previous NV_vertex_program
     and NV_fragment_program extensions?  If not, what support has been
     removed?

       RESOLVED:  This extension is largely, but not completely,
       backward-compatible.  Functionality removed includes:

         - Unstructured branching:  NV_vertex_program2 included a general
           branch instruction "BRA" that could be used to jump to an arbitrary
           instruction.  The "CAL" instruction could "call" an arbitrary
           instruction, entering code that was not necessarily structured as
           simple subroutine blocks.  Arbitrary unstructured branching can be
           difficult to implement efficiently on highly parallel GPU
           architectures, while basic structured branching is not nearly as
           difficult.

           This extension retains the "CAL" instruction but treats each block
           of code between instruction labels as a separate subroutine.  The
           "BRA" instruction and arbitrary branching have been removed.  The
           structured branching constructs in this extension are sufficient to
           implement almost all of the looping/branching support in high-level
           languages ("goto" being the most obvious exception).

         - Address registers:  NV_vertex_program added the notion of address
           registers, which were effectively under-powered integer temporaries.
           The set of instructions used to manipulate address registers was
           severely limited.  NV_vertex_program[23] extended the original
           scalars to vectors and added a few more instructions to manipulate
           address registers.  Fragment programs had no address registers until
           NV_fragment_program2 added the loop counter, which was very similar
           in functionality to vertex program address registers, but even more
           limited.  This extension adds true integer temporaries, which can
           accomplish everything old address registers could do, and much more.
           Address register support was removed to simplify the API.

         - NV_fragment_program2 LOOP construct:  NV_fragment_program2 added a
           LOOP instruction, which let you repeat a block of code <N> times,
           with a parallel loop counter that started at <A> and stepped by <B>
           on each iteration.  This construct was significantly limited in
           several ways -- the loop count had to be constant, and you could
           only access the innermost loop counter in a nested loop.  This
           extension discards LOOP support and retains the simpler "REP"
           construct to implement loops.  If desired, a loop counter can be
           implemented by manipulating an integer temporary.  The "BRK"
           instruction (conditional break) is retained, and a "CONT"
           instruction (conditional continue) is added.  Additionally, the loop
           count need not be a constant.

         - NV_vertex_program and ARB_vertex_program EXP and LOG instructions:
           NV_vertex_program provided EXP and LOG instructions that computed a
           rough approximation of 2^x or log_2(x) and provided some additional
           values that could help refine the approximation.  Those opcodes were
           carried forward into ARB_vertex_program.  Both ARB_vertex_program
           and NV_vertex_program2 provided EX2 and LG2 instructions that
           computed a better approximation.  All fragment program extensions
           also provided EX2 and LG2, but did not bother to include EXP and
           LOG.  On the hardware targeted by this extension, there is no
           advantage to using EXP and LOG, so these opcodes have been removed
           for simplicity.

         - NV_vertex_program3 and NV_fragment_program2 provide the ability to
           do indirect addressing of inputs/outputs when using bindings in
           instructions -- for example:

             MOV R0, vertex.attrib[A0.x+2];      # vertex
             MOV result.texcoord[A0.y], R1;      # vertex
             MOV R2, fragment.texcoord[A0.x];    # fragment

           This extension provides indexing capability, but using named array
           variables instead.

             ATTRIB attribs[] = { vertex.attrib[2..5] };
             MOV R0, attribs[A0.x];
             OUTPUT outcoords[] = { result.texcoord[0..3] };
             MOV outcoords[A0.y], R1;
             ATTRIB texcoords[] = { fragment.texcoord[0..2] };
             MOV R2, texcoords[A0.x];

           This approach makes the set of attribute and result bindings more
           regular.  Additionally, it helps the assembler determine which
           vertex/fragment attributes are actually needed -- when the assembler
           sees constructs like "fragment.texcoord[A0.x]", it must treat *all*
           texture coordinates as live unless it can determine the range of
           values used for indexing.  The named array variable approach
           explicitly identifies which attributes are needed when indexing is
           used.

       Functionality altered includes:

         - The RSQ instruction in the original NV_vertex_program and
           ARB_vertex_program extensions implicitly took the absolute value of
           its operand.  Since the ARB extensions don't have numerics
           guarantees, computing the reciprocal square root of a negative value
           was not meaningful.  To allow for the possibility of taking the
           reciprocal square root of a negative value (which should yield NaN
           -- "not a number"), the RSQ instruction in this extension no
           longer implicitly takes the absolute value of its operand.
           Equivalent functionality can be achieved using the explicit |abs|
           absolute value operator on the operand to RSQ.

         - The results of texture lookups accessing inconsistent textures are
           now undefined, instead of producing a fixed constant vector.


     (3) What should this set of extensions be called?

       RESOLVED:  NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,
       and NV_geometry_program4.  Only NV_gpu_program4 will appear in the
       extension string; the other three specifications exist simply to define
       vertex, fragment, and geometry program-specific features.

       The "gpu_program" name was chosen due to the common instruction set
       intended to run on GPUs.  On previous chip generations, the vertex and
       fragment instruction sets were similar, but there were enough
       differences to package them separately.

       The choice of "4" indicates that this is the fourth generation of
       programmable hardware from NVIDIA.  The GeForce3 and GeForce4 series
       supported NV_vertex_program.  The GeForce FX series supported
       NV_vertex_program2 and added fragment programmability with
       NV_fragment_program.  Around this time, the OpenGL Architecture Review
       Board (ARB) approved ARB_vertex_program and ARB_fragment_program
       extensions, and NVIDIA added NV_vertex_program2_option and
       NV_fragment_program_option extensions exposing GeForce FX features using
       the ARB extensions' instruction set.  The GeForce6 and GeForce7 series
       brought the NV_vertex_program3 and NV_fragment_program2 extensions,
       which extend the ARB extensions further.  This extension adds geometry
       programs, and brings the "version number" for each of these extensions
       up to "4".


     (4) This extension adds integer data type support in programmable
     shaders that were previously float-centric.  Should applications be able
     to pass integer values directly to the shaders, and if so, how does it
     work?

       RESOLVED:  The diagram at the bottom of this issue depicts data flows in
       the GL, as extended by this and related extensions.

       This extension generalizes some state to be "typeless", instead of being
       strongly typed (and almost invariably floating-point) as in the core
       specification.  We introduce a new set of functions to specify GL state
       as signed or unsigned integer values, instead of floating point values.
       These functions include:

         * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as
           integers.  This extension does not create "integer" versions for
           fixed-function attribute functions (e.g., glColor, glTexCoord),
           which remain fully floating-point.

         * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and
           local parameters as integers.

         * TexImage*() with EXT_texture_integer internal formats -- Specify
           texture images as containing integer data whose values are not
           converted to floating-point values.

         * NV_parameter_buffer_object functions -- Bind (typeless) buffer
           object data stores for use as program parameters.  These buffer
           objects can be loaded with either integer or floating-point data.

         * EXT_texture_buffer_object functions -- Bind (typeless) buffer object
           data stores for use as textures.  These buffer objects can be loaded
           with either integer or floating-point data.

       Each type of program (using NV_gpu_program4 and related extensions) can
       read attributes using any data type (float, signed integer, unsigned
       integer) and write result values used by subsequent stages using any
       data type.

       Finally, there are several new places where integer data can be
       consumed by the GL:

         * NV_transform_feedback -- Stream transformed vertex attribute
           components to a (typeless) buffer object.  The transformed
           attributes can be written as signed or unsigned integers in vertex
           and geometry programs.

         * EXT_texture_integer internal formats and framebuffer objects --
           Provide support for rendering to integer texture formats, where
           final fragment values are treated as signed or unsigned integers,
           rather than floating-point values.

       The diagram below represents a substantial portion of the GL pipeline.
       Each line connecting blocks represents an interface where data is
       "produced" from the GL state or by fixed-function or programmable
       pipeline stages and "consumed" by another pipeline stage.  Each producer
       and consumer is labeled with a data type.  For producers, the
       "(typeless)" designation generally means that the state and/or output
       can be written as floating-point values or as signed or unsigned
       integers.  "(float)" means that the outputs are always written as
       floating-point.  The same distinction applies to consumers --
       "(typeless)" means that the consumer is capable of reading inputs using
       any data type, and "(float)" means that consumer always reads inputs as
       floating-point values.

       To get sane results, applications must ensure that each value passed
       between pipeline stages is produced and consumed using the same data
       type.  If a value is written in one stage as a floating-point value, it
       must be read as a floating-point value as well.  If such a value is read
       as a signed or unsigned integer, its value is considered undefined.  In
       practice, the raw bits used to represent the floating-point value (IEEE
       single-precision floating-point encoding in the initial implementation
       of this spec) will be treated as an integer.
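
       As an illustration only (this C code is not part of the extension),
       the "raw bits" behavior described above can be sketched as follows.
       The function name is hypothetical, and the sketch assumes that the
       host's "float" uses the 32-bit IEEE single-precision encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float, reinterpreted as an unsigned
 * integer.  Assumption: float is a 32-bit IEEE single-precision value. */
uint32_t float_bits_as_uint(float value)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);   /* copy the bits, no conversion */
    return bits;
}
```

       Under that assumption, a value written as the float 1.0 and read as an
       unsigned integer shows up as 0x3F800000, not as 1.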

       Type matching between stages is not enforced by the GL, because the
       overhead of doing so would be substantial.  Such overhead would include:

         * matching the inputs and outputs of each pipeline stage
           (fixed-function or programmable) every time the program
           configuration or fixed-function state changes,

         * tracking the data type of each generic vertex attribute and checking
           it against the vertex program's inputs,

         * tracking the data type of each program parameter and checking it
           against the manner in which the parameters were used in programs,

         * matching color buffers against fragment program outputs.

       Such error checking is certainly valuable, but the additional CPU
       overhead cost is substantial.  Given that current CPUs often have a hard
       time keeping up with high-end GPUs, adding more overhead is a step in
       the wrong direction.  We expect developer tools, such as instrumented
       drivers, to be able to provide type checking on most interfaces.

       The diagram below depicts assembly programmability.  Using vertex,
       geometry, and fragment shaders provided by the OpenGL Shading Language
       (GLSL) isn't substantially different from the assembly interface, except
       that the interfaces between programmable pipeline stages are more
       tightly coupled in GLSL (vertex, geometry, and fragment shaders are
       linked together into a single program object), and that shader variables
       are more strongly typed in GLSL than in the assembly interface.

       In the figure below, the first programmable stage is vertex program
       execution.  All inputs read by the vertex program must be specified in
       the GL vertex APIs (immediate mode or vertex arrays) using a data type
       matching the data type read by the shader.  Additionally,
       vertex programs (and all other program types) can read program
       parameters, parameter buffers, and textures.  In all cases the
       parameter, buffer, or texture data must be accessed in the shader using
       the same data type used to specify the data.  If vertex programs are
       disabled, fixed-function vertex processing is used.  Fixed-function
       vertex processing is fully floating-point, and all the conventional
       vertex attributes and state used by fixed-function are floating-point
       values.

       After vertex processing, an optional geometry program can be executed,
       which reads attributes written by vertex programs (or fixed-function) and
       writes out new vertex attributes.  The vertex attributes it reads must
       have been written by the vertex program (or fixed-function) using a
       matching data type.

       After geometry program execution, vertex attributes can optionally be
       written out to buffer objects using the NV_transform_feedback extension.
       The vertex attributes are written by the GL to the buffer objects using
       the same data type used to write the attribute in the geometry program
       (or vertex program if geometry programs are disabled).

       Then, rasterization generates fragments based on transformed vertices.
       Most attributes written by vertex or geometry programs can be read by
       fragment programs, after the rasterization hardware "interpolates" them.
       This extension allows fragment programs to control how each attribute is
       interpolated.  If an attribute is flat-shaded, it will be taken from the
       output attribute of the provoking vertex of the primitive using the same
       data type.  If an attribute is smooth-shaded, the per-vertex attributes
       will be interpreted as floating-point values, and interpolation yields a
       floating-point result.  One necessary consequence of this is that any
       integer per-fragment attributes must be flat-shaded.  To prevent some
       interpolation type errors, assembly and GLSL fragment shaders will not
       compile if they declare an integer fragment attribute that is not flat
       shaded.  [NOTE:  While point primitives generally have constant
       attributes, any integer attributes must still be flat-shaded; point
       rasterization may perform (degenerate) floating-point interpolation.]

       Fragment programs must read attributes using data types matching the
       outputs of the interpolation or flat-shading operations.  They may write
       one or more color outputs using any data type, but the data type used
       must match the corresponding framebuffer attachments.  Outputs directed
       at signed or unsigned integer textures (EXT_texture_integer) must be
       written using the appropriate integer data type; all other outputs must
       be written as floating-point values.  Note that some of the
       fixed-function per-fragment operations (e.g., blending, alpha test) are
       specified as floating-point operations and are skipped when directed at
       signed or unsigned integer color buffers.


                                      generic               conventional
                                      vertex                  vertex
                                     attributes              attributes
                                        | (typeless)             | (float)
                                        |                        |
                                        |                        |
                                        | +----------------------+
          program                       | |                      |
         parameters ----+               | |                      |
         (typeless)     |               | | (typeless)           | (float)
                        |               V V                      V
          constant      +-+----------> vertex              fixed-function
          buffers   ----+ |(typeless)  program                 vertex
         (typeless)     | |              |                       |
                        | |              | (typeless)            | (float)
          textures  ----+ |              V                       |
         (typeless)       |              |<----------------------+
             |            |              |
             |            |              +---------------+
             |            |              |               |
             |            |              | (typeless)    |
             |            |              V               |
             |            +---------> geometry           |
             |            |(typeless) program            |
             |            |              |               |
             |            |              | (typeless)    |
             |            |              V               |
             |            |              |<--------------+
             |            |              |
             |            |              |
             |            |              +-----------------+
             |            |              |                 |(typeless)
             |            |              |                 v
             |            |              |             transform
             |            |              |             feedback
             |            |              |              buffers
             |            |              |
             |            |              |
             |            |              +-----------------------+
             |            |              |                       |
             |            |              | (float)               | (typeless)
             |            |              V                       V
             |            |         interpolated               flat
             |            |          attributes             attributes
             |            |              |                       |
             |            |              | (float)               | (typeless)
             |            |              V                       |
             |            |              |<----------------------+
             |            |              |
             |            |              +-----------------------+
             |            |              |                       |
             |            |              | (typeless)            | (float)
             |            |(typeless)    V                       V
             |            +---------> fragment     +------> fixed-function
             |                        program      |(float)   fragment
             |                           |         |             |
             +--------------------------/|/--------+             |
                                         |                       |
                                         | (typeless)            | (float)
                                         V                       |
                                         |<----------------------+
                                         |
                                         +-----------------------+------ ....
                                         |                       |
                                         | (typeless)            | (typeless)
                                         V                       V
                                       color                   color
                                     attachment              attachment
                                         0                       1


     (5) Instructions can operate on signed integer, unsigned integer, and
     floating-point values.  Some operations make sense on all three data
     types.  How is this supported, and what type checking support is provided
     by the assembler?

       RESOLVED:  One important property of the instruction set is that the
       data type for all operands and the result is fully specified by the
       instructions themselves.  For instructions (such as ADD) that make sense
       for both integer and floating-point values, an optional data type
       modifier is provided to indicate which type of operation should be
       performed.  For example, "ADD.S", "ADD.U", and "ADD.F", add signed
       integers, unsigned integers, or floating-point values, respectively.  If
       no data type modifier is provided, ".F" is assumed if the instruction
       can apply to floating-point values and ".S" is assumed otherwise.

       To help identify errors where the wrong data type is used -- for
       example, adding integer values in an ADD instruction that omits a data
       type modifier and thus defaults to "ADD.F" -- variables may be declared
       with optional data type modifiers.  In the following code:

         INT TEMP a;
         UINT TEMP b;
         FLOAT TEMP c;
         TEMP d;

       "a", "b", "c", and "d" are declared as temporary variables holding
       signed integer, unsigned integer, floating-point, and typeless values.
       Since each instruction fully specifies the data type of each operand and
       its result, these data types can be checked against the data type
       assigned to the variables operated on.  If the types don't match, and
       the variable is not typeless, an error is reported.  The opcode modifier
       ".NTC" can be used to ignore such errors on a per-opcode basis, if
       required.
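
       The checking rule above can be summarized with a small C sketch.  It
       is illustrative only:  the enum and function names are hypothetical,
       and nothing here describes how any actual assembler is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the extension defines no C API for the assembler. */
enum data_type { TYPE_NONE, TYPE_FLOAT, TYPE_INT, TYPE_UINT };

/* Return true if a variable declared with var_type may be used as an
 * operand or result of an instruction operating on instr_type.
 * ntc is true if the opcode carries the ".NTC" modifier. */
bool operand_type_ok(enum data_type instr_type, enum data_type var_type,
                     bool ntc)
{
    assert(instr_type != TYPE_NONE);  /* instructions always specify a type */

    if (ntc)
        return true;                  /* ".NTC" ignores type mismatches */
    if (var_type == TYPE_NONE)
        return true;                  /* typeless variables match anything */
    return var_type == instr_type;
}
```

       With this rule, an "ADD.F" reading an "INT TEMP" is rejected, while the
       same instruction reading a plain "TEMP", or carrying ".NTC", is
       accepted.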

       Note that when bindings are used directly in instructions, they are
       always considered typeless for simplicity.  Some fixed-function bindings
       have an obvious data type, but other bindings (e.g., program parameters)
       can hold either integer or floating-point values, depending on how they
       were specified.

       Variable data types are optional.  Typeless variables are provided
       because some programs may want to reuse the same variable in several
       places with different data types.

     (6) Should both signed (INT) and unsigned integer (UINT) data types be
     provided?

       RESOLVED:  Yes.  Signed and unsigned integer operations are supported.
       Providing both "INT" and "UINT" variable modifiers distinguishes between
       signed and unsigned values for type checking purposes, to ensure that
       unsigned values aren't read as signed values and vice versa.

       This specification says if a value is read as a signed integer, but was
       written as an unsigned integer, the value returned is undefined.
       However, signed and unsigned integers are interchangeable in practice,
       except for very large unsigned integers (which can't be represented as
       signed values of the equivalent size) or negative signed integers.

       If programs know that they won't generate negative or very large values,
       signed and unsigned integers can be used interchangeably.  To avoid type
       errors in the assembler in this case, typeless variables can be used.
       Or the ".NTC" modifier can be used when appropriate.
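
       As a rough C illustration (not part of the extension), the set of
       32-bit encodings that mean the same thing as signed and as unsigned
       integers can be tested as follows.  The function name is hypothetical,
       and two's complement 32-bit integers are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the 32-bit pattern denotes the same number whether it is
 * read as a signed (two's complement) or as an unsigned integer.  That is
 * the case exactly when the most significant bit is clear. */
bool same_value_signed_and_unsigned(uint32_t bits)
{
    return (bits & 0x80000000u) == 0;
}
```

       Encodings with the top bit set are either negative signed values or
       unsigned values of 2^31 and above, which are the two exceptions named
       above.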

     (7) Integer and floating-point constants are supported in the instruction
     set.  Integer constants might be interpreted to mean either "real integer"
     values or floating-point values.  How are they supported?

       RESOLVED:  When an obvious floating-point constant is specified (e.g.,
       "3.0"), the developer's intent is clear.  If you try to use a
       floating-point value in an instruction that wants an integer operand, or
       a declaration of an integer parameter variable, the program will fail to
       load.  An integer constant used in an instruction isn't quite as clear.
       But its meaning can be easily inferred because the operand types of
       instructions are well-known at compile time.  An integer multiply
       involving the constant "2" will interpret the "2" as an integer.  A
       floating-point multiply involving the same constant "2" will interpret
       it as a floating-point value.

       The only real problem is for a parameter declaration that is typeless.
       For typed variables, the intent is clear:

         INT PARAM two = 2;               # use integer 2
         FLOAT PARAM twoPt0 = 2;          # use floating-point 2.0

       For typeless variables, there's no context to go on:

         PARAM two = 2;                   # 2?  2.0?

       This extension is intended to be largely upward-compatible with
       ARB_vertex_program, ARB_fragment_program, and the other extensions built
       on top of them.  In all of these, the previous declaration is legal and
       means "2.0".  For compatibility, we choose to interpret integer
       constants in this case as floating-point values.  The assembler in the
       NVIDIA implementation will issue a warning if this case ever occurs.

       This extension does not provide decoration of integer constant values --
       we considered adding suffixed integers such as "2U" to mean "2, and
       don't even think about converting me to a float!".  We expect that it
       will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate
       effectively.

     (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?

       RESOLVED:  Yes.

     (9) Should we provide data type modifiers with explicit component sizes?
     For example, "INT8", "FLOAT16", or "INT32".  If so, should we provide a
     mechanism to query the size (in bits) of a variable, or of different
     variable types/qualifiers?

       RESOLVED:  No.

     (10) Should this extension provide better support for array variables?

       RESOLVED:  Yes; array variables of all types are allowed.

       In ARB_vertex_program, program parameter (constant) variables could be
       addressed as arrays.  Temporary variables, vertex attributes, and vertex
       results could not be declared as arrays.

       In NV_vertex_program3 and NV_fragment_program2, relative addressing was
       supported in program bindings:

         MOV R0, vertex.attrib[A0.x];            # vertex
         MOV result.texcoord[A0.x], R0;          # vertex
         MOV R0, fragment.texcoord[A0.x];        # fragment -- inside LOOP

       Explicitly declared attribute or result arrays were not supported, and
       temporaries could also not be arrays.

       This extension allows users to declare attribute, result, and temporary
       arrays such as:

         ATTRIB attribs[] = { vertex.attrib[7..11] };
         TEMP scratch[10];
         RESULT texcoords[] = { result.texcoord[0..3] };

       Additionally, the relative addressing mechanisms provided by
       NV_vertex_program3 and NV_fragment_program2 are NOT supported in this
       extension -- instead, declared array variables are the only way to get
       relative addressing.  Using declared arrays allows the assembler to
       identify which attributes will actually be used.  An expression like
       "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are
       referenced, and the assembler must be conservative in this case and
       assume that they all are.

     (11) Is relative addressing of temporaries allowed?

       RESOLVED:  Yes.  However, arrays of temporaries may end up being stored
       in off-chip memory, and may be slower to access than non-array
       temporaries.

     (12) Should this extension add bindings to pass generic attributes between
     vertex, geometry, and fragment programs, or are texture coordinates
     sufficient?

       RESOLVED:  While texture coordinates have been used in the past, generic
       attributes should be provided.

       The assembler provides a large set of bindings and automatically
       eliminates generic attributes or components that are unused.  At each
       interface between programs, there is an implementation-dependent limit
       on the number of attribute components that can be passed.

       There are several reasons that this approach was chosen.  First, if the
       number of attributes that can be passed between program stages exceeds
       the number of existing texture coordinate sets supported when specifying
       vertices, a second implementation-dependent number of texture coordinates
       would need to be exposed to cover the number supported between stages.
       Second, the mechanisms described above reduce or eliminate the need to
       pack attributes into four component vectors.  Third, "texture
       coordinates" that have been historically used for texture lookups don't
       need to be used to pass values that aren't used this way.

     (13) The structured branching support in NV_fragment_program2 provides a
     REP instruction that says to repeat a block of code <N> times, as well as
     a LOOP instruction that does the same, but also provides a special loop
     counter variable.  What sort of looping mechanism should we provide here?

       RESOLVED:  Provide only the REP instruction.  The functionality provided
       by the LOOP instruction can be easily achieved by using an integer
       temporary as the loop index.  This avoids two annoyances of the old LOOP
       models:  (a) the loop index (A0.x) is a special variable name, while all
       other variables are declared normally and (b) instructions can only
       access the loop index of the innermost loop -- loop indices at higher
       nesting levels are not accessible.

       One other option was considered -- a "LOOPV" instruction (LOOP with a
       variable), where the program would specify a variable name and component
       to hold the loop index, instead of using the implicit variable name "A0.x".
       In the end, it was decided that using an integer temporary as a loop
       counter was sufficient.

     (14) The structured branching support in NV_fragment_program2 provides a
     REP instruction that requires a loop count.  Some looping constructs may
     not have a definite loop count, such as a "while" statement in C.  Should
     this construct be supported, and if so, how?

       RESOLVED:  The REP instruction is extended to make the loop count
       optional.  If no loop count is provided, the REP instruction specifies a
       loop that can only be exited using the BRK (break) or RET instructions.
       To avoid obvious infinite loops, an error will be reported if a
       REP/ENDREP block contains no BRK instruction at the current nesting
       level and no RET instruction at any nesting level.

       To implement a loop like "while (value < 7.0) ...", code such as the
       following can be used:

         TEMP cc;                        # dummy variable
         REP;
           SLT.CC cc.x, value.x, 7.0;    # compare value.x to 7.0, set CC0
           BRK NE.x;                     # break out if not true
           ...
           ...                           # presumably update value!
           ...
         ENDREP;
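
       The "obvious infinite loop" rule above can be sketched in C as a scan
       over a flattened opcode list.  This is an illustrative sketch only:
       the opcode enum and function name are hypothetical, and the check is
       meant to be applied only to REP instructions that omit the loop count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified opcode list; all other opcodes are OP_OTHER. */
enum opcode { OP_REP, OP_ENDREP, OP_BRK, OP_RET, OP_OTHER };

/* ops[start] must be the REP that opens the block.  Return true if the
 * block contains a BRK at its own nesting level or a RET at any nesting
 * level, i.e. if the loop has some way to exit. */
bool rep_has_exit(const enum opcode *ops, size_t count, size_t start)
{
    size_t depth = 0;       /* number of nested REP blocks entered */
    bool found = false;

    assert(start < count && ops[start] == OP_REP);

    for (size_t i = start + 1; i < count; i++) {
        switch (ops[i]) {
        case OP_REP:
            depth++;
            break;
        case OP_ENDREP:
            if (depth == 0)
                return found;        /* matching ENDREP reached */
            depth--;
            break;
        case OP_BRK:
            if (depth == 0)
                found = true;        /* BRK exits this loop */
            break;
        case OP_RET:
            found = true;            /* RET counts at any nesting level */
            break;
        default:
            break;
        }
    }
    return found;   /* no matching ENDREP; a real assembler would reject this */
}
```

       A BRK inside a nested REP/ENDREP block only exits the inner loop, so it
       does not satisfy the rule for the outer one.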

     (15) The structured branching support in NV_fragment_program2 provides a
     BRK instruction that operates like C's "break" statement.  Should we
     provide something similar to C's "continue" statement, which skips to the
     next iteration of the loop?

       RESOLVED:  Yes, a new CONT opcode is provided for this purpose.

     (16) Can the BRK or CONT instructions break out of multiple levels of
     nested loops at once?

       RESOLVED:  No.  BRK and CONT only exit the current nesting level.  To
       break out of multiple levels of nested loops, multiple BRK/CONT
       instructions are required.

     (17) For REP instructions, is the loop counter reloaded on each iteration
     of the loop?

       RESOLVED:  No.  The loop counter is loaded once at the top of the loop,
       compared to zero at the top of the loop, and decremented when each loop
       iteration completes.  A program may overwrite the variable used to
       specify the initial value of the loop counter inside the loop without
       affecting the number of times the loop body is executed.
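
       In C terms, the behavior described above resembles the following
       sketch.  It is illustrative only:  the function name is hypothetical,
       and it assumes that a loop count less than or equal to zero skips the
       loop body entirely.

```c
#include <assert.h>

/* Model of "REP count; ... ENDREP;".  The loop body overwrites the variable
 * that supplied the count, to show that doing so has no effect on the number
 * of iterations.  Returns the number of times the body was executed. */
int rep_iterations(int *count_var)
{
    int counter = *count_var;   /* loaded once, at the top of the loop */
    int executed = 0;

    while (counter > 0) {       /* compared to zero at the top */
        *count_var = 100;       /* loop body overwrites the source variable */
        executed++;
        counter--;              /* decremented when the iteration completes */
    }
    return executed;
}
```

       Starting from a count of 5, the body runs five times even though the
       source variable holds 100 after the first iteration.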

     (18) How are floating-point values represented in this extension?  What
     about floating-point arithmetic operations?

       RESOLVED:  In the initial hardware implementation of this extension,
       floating-point values are represented using the standard 32-bit IEEE
       single-precision encoding, consisting of a sign bit, 8 exponent bits,
       and 23 mantissa bits.  Special encodings for NaN (not a number), +/-INF
       (infinity), and positive and negative zero are supported.  Denorms
       (values less than 2^-126, which have an exponent encoding of "0" and no
       implied leading one) are supported, but may be flushed to zero,
       preserving the sign bit of the original value.  Arithmetic operations
       are carried out at single-precision using normal IEEE floating-point
       rules, including special rules for generating infinities, NaNs, and
       zeros of each sign.
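
       The optional flush-to-zero behavior for denorms can be sketched in C on
       raw 32-bit encodings.  This is an illustration, not part of the
       extension; the function name is hypothetical and the bit layout is the
       one given above (sign bit, 8 exponent bits, 23 mantissa bits).

```c
#include <assert.h>
#include <stdint.h>

/* Given the 32-bit encoding of a single-precision value, return the
 * encoding after flushing denorms to zero.  A denorm (or zero) has an
 * exponent field of 0; flushing keeps only the sign bit. */
uint32_t flush_denorm_bits(uint32_t bits)
{
    const uint32_t exponent_mask = 0x7F800000u;
    const uint32_t sign_mask     = 0x80000000u;

    if ((bits & exponent_mask) == 0)
        return bits & sign_mask;    /* +0 or -0, sign preserved */
    return bits;                    /* normal, infinity, or NaN: unchanged */
}
```

       For example, the smallest negative denorm (0x80000001) becomes negative
       zero (0x80000000), while the smallest normal value (0x00800000, 2^-126)
       is left alone.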

       Floating-point temporaries declared as "SHORT" may be, but are not
       necessarily, stored as 16-bit "fp16" values (sign bit, five exponent
       bits, ten mantissa bits), as specified in the NV_float_buffer and
       ARB_half_float_pixel extensions.

     (19) Should we provide a method to declare how fragment attributes are
     interpolated?  It is possible to have flat-shaded attributes,
     perspective-corrected attributes, and centroid-sampled attributes.

       RESOLVED:  Yes.  Fragment program attribute variable declarations may
       specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.

       These modifiers are documented in detail in the NV_fragment_program4
       specification.

     (20) Should vertex and primitive identifiers be supported?  If so, how?

       RESOLVED:  A vertex identifier is available as "vertex.id" in a vertex
       program.  The vertex ID is equal to the value effectively passed to
       ArrayElement when the vertex is specified, and is defined only if vertex
       arrays are used with buffer objects (VBOs).

       A primitive identifier is available as "primitive.id" in a geometry or
       fragment program.  The primitive ID is equal to the number of primitives
       processed since the last implicit or explicit call to glBegin().

       See the NV_vertex_program4 spec for more information on vertex IDs, and
       the NV_geometry_program4 or NV_fragment_program4 specs for more
       information on primitive IDs.

     (21) For integer opcodes, should a bitwise inversion operator "~" be
     provided, analogous to the existing negation operator?

       RESOLVED:  No.  If this operator were provided, it might allow a program
       to evaluate the expression "a&(~b)" using a single instruction:

         AND.U a, a, ~b;

       Instead, it is necessary to do something like:

         UINT TEMP t;
         NOT.U t, b;
         AND.U a, a, t;

       If necessary, this functionality could be added in a subsequent
       extension.

     (22) What happens if you negate or take the absolute value of the
     biggest-magnitude negative integer?

       RESOLVED:  Signed integers are represented using two's complement
       representation.  For 32-bit integers, the largest possible value is
       2^31-1; the smallest possible value is -2^31.  There is no way to
       represent 2^31, which is what these operators "should" return.  The
       value returned in this case is the original value of -2^31.
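
       The wraparound can be reproduced in C by working on the 32-bit
       encodings with unsigned arithmetic, since negating the most negative
       "int" directly is undefined behavior in C.  The sketch below is
       illustrative only and its function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation of a 32-bit encoding (modulo 2^32). */
uint32_t negate_s32_bits(uint32_t bits)
{
    return (uint32_t)(0u - bits);
}

/* Absolute value of a 32-bit two's complement encoding:  negate the value
 * if its sign bit is set, otherwise return it unchanged. */
uint32_t abs_s32_bits(uint32_t bits)
{
    return (bits & 0x80000000u) ? negate_s32_bits(bits) : bits;
}
```

       Both functions map 0x80000000 (-2^31) back to 0x80000000, matching the
       resolution above.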

     (23) How do condition codes work?  How are they different from those
     provided in previous NVIDIA extensions?

       RESOLVED:  There are two condition codes -- CC0 and CC1 -- each of which
       is a four-component vector.  The condition codes are set based on the
       result of an instruction that specifies a condition code update
       modifier.  Examples include:

         ADD.S.CC  R0, R1, R2;       # add signed integers R1 and R2, update
                                     #   CC0 based on the result, write the
                                     #   final value to R0
         ADD.F.CC1 R3, R4, R5;       # add floats R4 and R5, update CC1 based
                                     #   on the result, write the final value
                                     #   to R3
         ADD.U.CC0 R6.xy, R7, R8;    # add unsigned integers R7 and R8, update
                                     #   CC0 (x and y components) based on the
                                     #   result, write the final value to R6
                                     #   (x and y components)

       Condition codes can be used for conditional writes, conditional
       branches, or other operations.  The condition codes aren't used
       directly, but are instead used with a condition code test such as "LT"
       (less than) or "EQ" (equal to).  Examples include:

         MOV R0 (GT.x), R1;          # move R1 to R0 only if the x component of
                                     #   CC0 indicates a result of ">0"
         MOV R2 (NE1), R3;           # component-wise move of R3 to R2 if the
                                     #   corresponding component of CC1
                                     #   indicates a result of "!=0"
         IF LE0.xyxy;                # execute the block of code if the x or
           ...                       #   y components of CC0 indicate a result
         ENDIF;                      #   of "<=0"
         REP;
           ...
           BRK EQ1.xyzx;             # break out of loop if the x, y, or z
         ENDREP;                     #   components of CC1 indicate a result of
                                     #   "==0".

       Previous NVIDIA extensions provide eight tests, which are still
       supported here.  The tests "EQ" (equal), "GE" (greater/equal), "GT"
       (greater than), "LE" (less/equal), "LT" (less than), and "NE" (not
       equal) can be used to determine the relation of the result used to set
       the condition code with zero.  The tests "TR" (true) and "FL" (false)
       are special tests that always evaluate to true or false, respectively.

       For floating-point results, a NaN (not a number) encoding causes the
       "NE" condition to evaluate to TRUE and all other conditions to evaluate
       to FALSE.  IEEE encodings for "negative" and "positive" zero are both
       treated as equal to zero.

       Condition codes are implemented as a set of flags, which are set
       depending on the type of operation, as described in the spec.

       For instructions that return floating-point or signed integer values,
       the normal condition code tests reliably indicate the relationship of
       the result to zero.  For instructions that return unsigned values, the
       condition codes are a bit more complicated.  For example, the sign flag
       is set if the most significant bit of the result written is set.  As a
       result, very large unsigned integer values (e.g., 0x80000000 -
       0xFFFFFFFF) are effectively treated as negative values.  Condition code
       tests should be used with care with unsigned results -- to test if an
       unsigned integer is ">0", use a sequence like:

         MOV.U.CC R0, R1;            # move R1 to R0, set condition code
         IF NE;                      # test if the result is "!=0", a very
           ...                       #   large value might fail "GT"!
         ENDIF;
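
       The reason "GT" is unreliable for unsigned results can be seen in a
       simplified C model.  This sketch is an assumption-laden illustration:
       it models only a zero flag and a sign flag, uses hypothetical names,
       and assumes "GT" passes when neither flag is set.  The normative flag
       definitions are in the body of the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified condition code component:  zero and sign flags only. */
struct cc_flags {
    bool zero;   /* result was zero */
    bool sign;   /* most significant bit of the result was set */
};

/* Flags produced by an instruction writing a 32-bit unsigned result. */
struct cc_flags cc_from_result(uint32_t result)
{
    struct cc_flags cc;

    cc.zero = (result == 0);
    cc.sign = (result & 0x80000000u) != 0;
    return cc;
}

/* "GT" (assumed): passes if the result looks positive. */
bool cc_test_gt(struct cc_flags cc) { return !cc.sign && !cc.zero; }

/* "NE": passes if the result is non-zero. */
bool cc_test_ne(struct cc_flags cc) { return !cc.zero; }
```

       In this model the unsigned value 0x80000000 fails "GT" because its top
       bit sets the sign flag, but it passes "NE".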

       This extension provides a number of additional condition code tests
       useful for different floating-point or integer operations:

         * NAN (not a number) is true if a floating-point result is a NaN.  LEG
           (less, equal to, or greater) is the opposite of NAN.

         * CF (carry flag) is true if an unsigned add overflows, or if an
           unsigned subtract produces a non-negative value.  NCF (no carry
           flag) is the opposite of CF.

         * OF (overflow flag) is true if a signed add or subtract overflows.
           NOF (no overflow flag) is the opposite of OF.

         * SF (sign flag) is true if the sign flag is set.  NSF (no sign flag)
           is the opposite of SF.

         * AB (above) is true if an unsigned subtract produces a positive
           result.  BLE (below or equal) is the opposite of AB, and is true if
           an unsigned subtract produces a negative result or zero.  Note that
           CF can be used to test if the result is greater than or equal to
           zero, and NCF can be used to test if the result is less than zero.
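
       The carry and overflow conditions listed above can be computed in C as
       shown in the following sketch.  It is illustrative only; the function
       names are hypothetical, and 32-bit two's complement integers are
       assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CF after an unsigned add:  true if a + b does not fit in 32 bits. */
bool carry_on_add(uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)(a + b);   /* wraps modulo 2^32 */
    return sum < a;
}

/* CF after an unsigned subtract:  true if a - b is non-negative. */
bool carry_on_sub(uint32_t a, uint32_t b)
{
    return a >= b;
}

/* OF after a signed add of two 32-bit encodings:  true if both operands
 * have the same sign and the wrapped sum has the opposite sign. */
bool overflow_on_add(uint32_t a, uint32_t b)
{
    uint32_t sum = (uint32_t)(a + b);
    return ((~(a ^ b)) & (a ^ sum) & 0x80000000u) != 0;
}
```

       For instance, adding 1 to 0xFFFFFFFF sets CF but not OF (as signed
       values, -1 + 1 = 0), while adding 1 to 0x7FFFFFFF sets OF but not CF.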

     (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work
     with integer values and/or condition codes?

       RESOLVED:  "Set on" instructions comparing signed and unsigned values
       return zero if the condition is false, and an integer with all bits set
       if the condition is true.  If the result is signed, it is interpreted as
       -1.  If the result is unsigned, it is interpreted as the largest unsigned
       value (0xFFFFFFFF for 32-bit integers).  This is different from the
       floating-point "set on", which is defined to return 1.0.

       This specific result encoding was chosen so that bitwise operators (NOT,
       AND, OR, XOR) can be used to evaluate boolean expressions.
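
       A C sketch of the idea follows.  It is illustrative only; the
       functions are hypothetical stand-ins for the unsigned forms of SLT and
       SGE, and a bitwise AND combines their results into a range test.

```c
#include <assert.h>
#include <stdint.h>

/* Integer "set on less than":  all bits set if a < b, zero otherwise. */
uint32_t set_on_less_than_u(uint32_t a, uint32_t b)
{
    return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* Integer "set on greater or equal":  all bits set if a >= b. */
uint32_t set_on_greater_equal_u(uint32_t a, uint32_t b)
{
    return (a >= b) ? 0xFFFFFFFFu : 0u;
}

/* Evaluate (lo <= x) && (x < hi) with a bitwise AND of two masks. */
uint32_t in_range_u(uint32_t x, uint32_t lo, uint32_t hi)
{
    return set_on_greater_equal_u(x, lo) & set_on_less_than_u(x, hi);
}
```

       Because TRUE is all ones and FALSE is all zeros, AND, OR, XOR, and NOT
       on these masks behave like the corresponding logical operators.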

       When performing condition code tests on the results of an integer "set
       on" instruction, keep in mind that a TRUE result has the most
       significant bit set and will be interpreted as a negative value.  To
       test if a condition is true, use "NE" (!=0).  A condition code test of
       "GT" will always fail if the condition code was written by an integer
       "set on" instruction.

     (25) What new texture functionality is provided?

       RESOLVED:  Several new features are provided.

       First, the TXF (texel fetch) instruction allows programs to access a
       texture map like a normal array.  Integer coordinates identifying an
       individual texel and LOD are provided, and the corresponding texture
       data is returned without filtering of any type.

       Second, the TXQ (texture size query) instruction allows programs to
       query the size of a specified level of detail of a texture.  This
       feature allows programs to perform computations dependent on the size of
       the texture without having to pass the size as a program parameter or
       via some other mechanism.

       Third, applications may specify a constant texel offset in a texture
       instruction that moves the texture sample point by the specified number
       of texels.  This offset can be used to perform custom texture filtering,
       and is also independent of the size of the texture LOD -- the same
       offsets are applied, regardless of the mipmap level.

       Fourth, shadow mapping is supported for cube map textures.  The first
       three coordinates are the normal (s,t,r) coordinates for a cube map
       texture lookup, and the fourth component is a depth reference value that
       can be compared to the depth value stored in the texture.

     (26) What "consistency" requirements are in effect for textures accessed
     via the TXF (texel fetch) instruction?

       UNRESOLVED:  The texture must be usable for regular texture mapping
       operations -- if texture sizes or formats are inconsistent and a
       mipmapped min filter is used, the results are undefined.

     (27) How does the TXF instruction work with bordered textures?

       RESOLVED:  The entire image can be accessed, including the border
       texels.  For a 64x64 2D texture plus border (66x66 overall), the lower
       left border texel is accessed using the coordinates (-1,-1); the upper
       right border texel is accessed using the coordinates (64,64).
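
       The coordinate convention above can be sketched in Python as a mapping
       from TXF-style coordinates to a position in the full bordered image.
       The row-major storage layout, the function name, and the IndexError on
       out-of-range coordinates are assumptions made for illustration; they
       are not specified by this extension.

```python
def bordered_texel_index(x, y, width, height, border=1):
    """Map texel coordinates to a row-major index in the bordered image.

    Interior texels use 0..width-1 and 0..height-1; border texels use
    -border and width/height (for border == 1).
    """
    total_width = width + 2 * border
    in_x = -border <= x < width + border
    in_y = -border <= y < height + border
    if not (in_x and in_y):
        raise IndexError("texel coordinate outside the bordered image")
    return (y + border) * total_width + (x + border)
```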

     (28) What should TXQ (texture size query) return for "irrelevant" texture
     sizes (e.g., height of a 1D texture)?  Should it return any other
     information at the same time?

       RESOLVED:  This specification leaves all "extra" components undefined.

     (29) How do texture offsets interact with cubemap textures?

       RESOLVED:  They are not supported in this extension.

     (30) How do texture offsets interact with mipmapped textures?

       RESOLVED:  The texture offsets are added after the (s,t,r) coordinates
       have been divided by q (if applicable) and converted to (u,v,w)
       coordinates by multiplying by the size of the selected texture level.
       The offsets are added to the (u,v,w) coordinates, and always move the
       sample point by an integral number of texel coordinates.  If multiple
       mipmaps are accessed, the sample point in each mipmap level is moved by
       an identical offset.  The applied offsets are independent of the
       selected mipmap level.
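
       A minimal Python sketch of this ordering for a single coordinate
       follows.  It assumes a power-of-two base size that halves at each
       mipmap level; the function and parameter names are hypothetical.

```python
def offset_sample_point(s, q, offset, base_size, level):
    """Return the unnormalized u coordinate after applying a texel offset.

    The projective divide and the scale by the level size happen first;
    the integer offset is added last, in texel units.
    """
    level_size = max(base_size >> level, 1)  # assumed mipmap size rule
    u = (s / q) * level_size
    return u + offset
```

       With a 64-texel base level, an offset of 2 moves the sample point by
       two texels at both level 0 (size 64) and level 1 (size 32), even
       though that is a different distance in normalized coordinates.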

     (31) How do shadow cube maps work?

       UNRESOLVED:  An application can define a cube map texture with a
       DEPTH_COMPONENT internal format, and then render a scene using the cube
       map faces as the depth buffer(s).  When rendering, the projection should
       be set up using the "center" of the cubemap as the eye, and using a
       normal projection matrix.  When applying the shadow map, the fragment
       program should read the (x,y,z) eye coordinates, compute the length of
       the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate to
       [0,1] space using the same parameters used to derive Z in the projection
       matrix.  A 4-component vector consisting of x, y, z, and this computed
       depth value should be passed to the texture lookup, and normal shadow
       mapping operations will be performed.

       This issue should include the math needed to do this computation and
       sample code.
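
       One possible form of that computation is sketched below in Python.  It
       is not taken from this specification:  it assumes each cube face was
       rendered with a standard OpenGL perspective projection with near and
       far planes "near" and "far", and the default [0,1] depth range.  The
       function name is hypothetical, and a zero-length input vector is not
       handled.

```python
def shadow_cube_coord(x, y, z, near, far):
    """Build the 4-component lookup vector (x, y, z, reference depth).

    The major-axis length stands in for the eye-space distance along the
    view direction of whichever cube face is selected.
    """
    major = max(abs(x), abs(y), abs(z))
    # Normalized device Z for an eye-space depth of "major" under a
    # standard perspective projection (assumption, see lead-in).
    z_ndc = (far + near) / (far - near) \
        - (2.0 * far * near) / ((far - near) * major)
    depth = 0.5 * z_ndc + 0.5  # map [-1,1] to [0,1]
    return (x, y, z, depth)
```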

     (32) Integer multiplies can overflow by a lot.  Should there be some way
     to return the high part of both unsigned and signed integer multiplies?

       RESOLVED:  Yes.  The ".HI" modifier is provided to return the 32 MSBs
       of a 32x32 integer multiply.  The instruction sequence:

         INT TEMP R0, R1, R2, R3;
         MUL.S    R0, R2, R3;
         MUL.S.HI R1, R2, R3;

       will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs of
       the 64-bit result in R0 and the 32 MSBs in R1.
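
       The split into low and high words can be emulated in Python, whose
       integers are arbitrary precision.  This is a sketch for checking
       expected values on the host; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul_s(a, b):
    """Low 32 bits of a signed 32x32 multiply (like MUL.S)."""
    return (a * b) & MASK32


def mul_s_hi(a, b):
    """High 32 bits of a signed 32x32 multiply (like MUL.S.HI)."""
    # Python's >> is an arithmetic shift, so negative products keep
    # their two's complement high word.
    return ((a * b) >> 32) & MASK32


def combine_signed64(hi, lo):
    """Rebuild the signed 64-bit product from its two 32-bit words."""
    value = (hi << 32) | lo
    return value - (1 << 64) if value & (1 << 63) else value
```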

     (33) Should there be any other special multiplication modifiers?

       RESOLVED:  Yes.  The ".S24" and ".U24" modifiers allow for signed and
       unsigned integer multiplies where both operands are guaranteed to fit in
       the least significant 24 bits.  On some architectures supporting this
       extension, ".S24" and ".U24" integer multiplies may be faster than
       general-purpose ".S" and ".U" multiplies.  If either value doesn't fit
       in 24 bits, the results of the operation are undefined --
       implementations may, but are not required to, ignore the MSBs of the
       operands if ".S24" or ".U24" is specified.

     (34) This extension provides subroutines, but doesn't provide a stack to
     push and pop parameters.  How do we deal with this?  NV_vertex_program3
     supported PUSHA/POPA instructions to push and pop address registers.

       RESOLVED:  No explicit stack is required.  A program can implement a
       stack by allocating a temporary array plus a single integer temporary to
       use as the stack "pointer".  For example:

         TEMP stack[256];                # 256 4-component vectors
         INT TEMP sp;                    # sp.x == stack pointer
         INT TEMP cc;                    # condition code results

         function:
           SGE.S.CC cc.x, sp.x, 256;     # compute stackPointer >= 256
           RET NE.x;                     # return if TRUE
           MOV stack[sp.x], R0;          # push R0 onto the stack
           ADD.S sp.x, sp.x, 1;
           ...
           SUB.S sp.x, sp.x, 1;          # pop R0 off the stack
           MOV R0, stack[sp.x];
           RET;

     (35) Should we provide new vector semantics for previously-defined opcodes
     (e.g., LG2 computes a component-wise logarithm)?

       RESOLVED:  Not in this extension.  The instructions we define here are
       compatible with the vector or scalar nature of previously defined
       opcodes.  This simplifies the implementation of an assembler that needs
       to support both old and new instruction sets.

     (36) Should it really be undefined to read from a register storing data of
     one type with an instruction of the other type (e.g., to read the bits of
     a floating-point number as an unsigned integer)?

       RESOLVED:  The spec describes undefined results for simplicity.  In
       practice, mixing data types can be done, where signed integers are
       represented as two's complement integers and floating-point numbers are
       represented using IEEE single-precision representation.  For example:

         TEMP R0, R1;                    # typeless
         MOV.U R0, 0x3F800000;           # R0 = 1.0
         MOV.U R1, 0xBF800000;           # R1 = -1.0
         MUL.F R0, R0, R1;               # R0 = -1 * 1 = -1 (0xBF800000)
         XOR.U R0, R0, R1;               # R0 = 0xBF800000 ^ 0xBF800000 = 0
         NOT.U R0, R0;                   # R0 = 0xFFFFFFFF
         I2F.S R0, R0;                   # R0 = -1.0 (0xFFFFFFFF = -1 signed)
         SEQ.F R0, R0, R1;               # R0 = 1.0 (-1.0 == -1.0)
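
       The same sequence can be traced on the host with Python's struct
       module, which reinterprets bits between IEEE single-precision and
       32-bit integer form.  This is an emulation sketch; the helper names
       are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def float_to_bits(value):
    """Return the IEEE single-precision bit pattern of a float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def trace_sequence():
    """Replay the example above, returning each intermediate R0 value."""
    r0, r1 = 0x3F800000, 0xBF800000                           # MOV.U x2
    mul = float_to_bits(bits_to_float(r0) * bits_to_float(r1))  # MUL.F
    xor = mul ^ r1                                            # XOR.U
    inv = ~xor & MASK32                                       # NOT.U
    signed = inv - (1 << 32) if inv & 0x80000000 else inv
    i2f = float_to_bits(float(signed))                        # I2F.S
    seq = 1.0 if bits_to_float(i2f) == bits_to_float(r1) else 0.0  # SEQ.F
    return [mul, xor, inv, i2f, seq]
```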

     (37) Buffer objects can be sourced as program parameters using the
     NV_parameter_buffer_object extension.  How are they accessed in a program?

       RESOLVED:  The instruction set and existing program environment and
       local parameter bindings operate largely on four-component vectors.
       However, NV_parameter_buffer_object exposes the ability to reach into
       buffers consisting of user-generated data or data written to the buffer
       object by the GPU.  Such data sets may not consist entirely of
       four-component floating-point vectors, so a four-component vector API
       may be unnatural.  An application might need to reformat its data set to
       deal with this issue.  Or it might generate odd code to compensate for
       mis-alignment -- for example, reading an array of 3-component vectors by
       doing two four-component vector accesses and then rotating based on
       alignment.  Neither approach is particularly satisfying.

       Instead, this extension takes the approach of treating parameter buffers
       as arrays of scalar words.  When an individual buffer element is read,
       the single word is replicated to produce a four-component vector.  To
       access an array of 3-component vectors, code like the following can be
       used:

         PARAM buffer[] = { program.buffer[0] };
         INT TEMP index;
         TEMP R0;
         ...
         MUL.S index, index, 3;          # to read "vec3" #X, compute 3*X
         MOV R0.x, buffer[index.x+0];
         MOV R0.y, buffer[index.x+1];
         MOV R0.z, buffer[index.x+2];
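
       The scalar-word model can be emulated on the host as follows.  This
       Python sketch treats the buffer as a flat list of words and mimics the
       replication of a single word across four components; the function
       names are hypothetical.

```python
def read_word(buffer, index):
    """Read one scalar word, replicated to a four-component vector."""
    word = buffer[index]
    return (word, word, word, word)


def read_vec3(buffer, n):
    """Read the n-th tightly packed 3-component vector from the buffer."""
    base = 3 * n                          # like MUL.S index, index, 3
    x = read_word(buffer, base + 0)[0]    # write mask .x keeps component x
    y = read_word(buffer, base + 1)[1]    # write mask .y keeps component y
    z = read_word(buffer, base + 2)[2]    # write mask .z keeps component z
    return (x, y, z)
```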

     (38) Should recursion be allowed?  If so, how is the total amount of
     recursion limited?

       RESOLVED:  Recursion is allowed, and a call stack is provided by the
       implementation.  The size of the call stack is limited to the
       implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
       call stack is full, the results of further CAL instructions are
       undefined.  In the initial implementation of this extension, such
       instructions will have no effect.

       Note that no stack is provided to hold local registers; a program may
       implement its own via a temporary array and integer stack "pointer".

     (39) Variables are all four-component vectors in previous extensions.
     Should scalar or small-vector variables be provided?

       RESOLVED:  It would be a useful feature, but it was left out for
       simplicity.  In practice, a variable where only the X component is used
       will be equivalent to a scalar.

     (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple
     components of data into a single component.  The bit packing is
     well-defined.  Should we require specific data types (e.g., unsigned
     integer) to hold packed values?

       RESOLVED:  No.  Previous instruction sets only allowed programs to write
       packed values to a floating-point variable (the only data type
       provided).  We will allow packed results to be written to a variable of
       any data type.  Integer instructions can be used to manipulate bits of
       packed data in place.

     (41) What happens when converting integers to floats or vice versa if
     there is insufficient precision or range to represent the result?

       RESOLVED:  For integer-to-float conversions, the nearest representable
       floating-point value is used, and the least significant bits of the
       original integer value are lost.  For float-to-integer conversions,
       out-of-range values are clamped to the nearest representable integer.
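
       Both behaviors can be illustrated with a small Python sketch for the
       signed 32-bit case.  Rounding toward zero for in-range values is an
       assumption of this sketch (the rounding applied depends on the
       conversion instruction used), NaN inputs are not handled, and the
       function names are hypothetical.

```python
import struct

INT32_MAX = 2147483647
INT32_MIN = -2147483648


def int_to_float32(value):
    """Round an integer to the nearest single-precision float."""
    # Packing as "<f" rounds the double to single precision.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def float_to_int32(value):
    """Convert a float to a signed 32-bit integer, clamping on overflow."""
    if value >= 2147483647.0:
        return INT32_MAX
    if value <= -2147483648.0:
        return INT32_MIN
    return int(value)  # truncation toward zero (assumed rounding mode)
```

       For example, 16777217 (2^24 + 1) has no exact single-precision
       representation, so its least significant bit is lost and the result
       is 16777216.0.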

     (42) Why are some of the grammar rules so bizarre (e.g., attribUseD,
     attribUseV, attribUseS, attribUseVNS)?

       RESOLVED:  This grammar is based upon the original ARB_vertex_program
       grammar, which has a number of "interesting" characteristics.  For
       example, some of the bindings provided by ARB_vertex_program naturally
       require some amount of lookahead.  For instance, a vertex program can
       write an output color using any of the following:

         MOV result.color, 0;            # primary color
         MOV result.color.primary, 0;    # primary color again
         MOV result.color.secondary, 0;  # secondary color this time

       The pieces of the color binding are separated by "." tokens.  However,
       writemasks are also supported, which also use "." before the write
       mask.  So, we could also have something like:

         MOV result.color.xyz, 0;        # primary color with W masked off

       In this form, a parser needs to look at both the "." and the "xyz" to
       determine that the binding being used is "result.color" (and not
       "result.color.secondary").

       Additionally, some checks that should probably be semantic errors (e.g.,
       allowing different swizzle or scalar operand selectors per instruction,
       or disallowing both in the case of SWZ) were specified in the original
       grammar.

       ARB_fragment_program and subsequent NVIDIA instruction sets built upon
       this, and the grammar for this extension was rewritten in the current
       form so it could be validated more easily.
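
       The lookahead problem described in this issue can be sketched with a
       toy Python parser for the "result.color" binding.  It is a heavily
       simplified illustration, not the grammar of this extension:  it only
       knows the "primary" and "secondary" selectors, accepts any trailing
       token as a write mask, and uses a hypothetical function name.

```python
COLOR_SELECTORS = {"primary", "secondary"}


def parse_color_result(text):
    """Split a "result.color..." operand into (binding, write mask)."""
    parts = text.split(".")
    if parts[:2] != ["result", "color"]:
        raise ValueError("not a result.color binding")
    rest = parts[2:]
    binding = "result.color"
    # Lookahead: the token after the "." decides whether it extends the
    # binding name or begins a write mask.
    if rest and rest[0] in COLOR_SELECTORS:
        binding += "." + rest[0]
        rest = rest[1:]
    if len(rest) > 1:
        raise ValueError("unexpected tokens after write mask")
    mask = rest[0] if rest else None
    return binding, mask
```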

     (43) This is an NV extension (NV_gpu_program4).  Why does the
     MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?

       RESOLVED:  This token is shared between this extension and the
       comparable high-level GLSL programmability extension (EXT_gpu_shader4).
       Rather than provide a duplicate set of token names, we simply use the
       EXT version here.

     (44) For the purposes of determining the number of attribute and result
          components, how are "scalar" attributes counted?  For example, only
          the x component of the "pointsize" per-vertex output is actually
          relevant.

       RESOLVED:  Implementations are allowed to count all inputs and outputs
       as full four-component vectors.  To avoid this, apply appropriate write
       masks or swizzles.

       For example, writing to "result.pointsize" may count as four components.
       Consistently writing to "result.pointsize.x" may only count as one.
       Similarly, reading a fragment's fog coordinate as "fragment.fogcoord"
       may count as four components; "fragment.fogcoord.x" will only count as
       one.

 Revision History

     Rev.    Date    Author    Changes
     ----  --------  --------  --------------------------------------------
     11    09/11/14  pbrown    Fix cut-and-paste error in PK2US section.

     10    12/14/09  mgodse    Added GLX protocol.

      9    10/29/09  pbrown    Add language for previously undocumented errors
                               when using "SHORT" and "LONG" modifiers on
                               variable declarations.  They're allowed only on
                               "TEMP" statements, except that "SHORT" is
                               allowed for "OUTPUT" as well.

      8    08/11/08  jbreton   Clarified that when a MOD instruction is
                               performed on negative operands the result is
                               undefined.

      7    07/29/08  pbrown    Discovered additional issues with texture wrap
                               handling, replaced with logic that applies wrap
                               modes per sample.  Add a few instruction
                               pseudo-code lines explicitly identifying
                               undefined components.

      6    05/02/08  pbrown    Fix the prototype for the internal TexelFetch()
                               function used in the spec language; texel
                               coordinates are signed integers.

      5    02/22/08  pbrown    Clarified that when counting attribute/result
                               components, irrelevant/undefined components
                               can still count against the limits.

      4    02/04/08  pbrown    Fix errors in texture wrap mode handling.
                               Added a missing clamp to avoid sampling border
                               in REPEAT mode.  Fixed incorrectly specified
                               weights for LINEAR filtering.

      3    02/09/07  pbrown    Updated status section (now released).

      2    10/19/06  pbrown    Change the token suffix for maximum texel offset
                               values from NV to EXT, since it is shared with
                               EXT_gpu_shader4.  Clarify what happens on a
                               negate of an unsigned value.  Fix typo in data
                               type modifier description.  Add missing
                               description of the "BUFFER4" declaration
                               keyword.

      1              pbrown    Internal spec development.