| Name |
| |
| NV_gpu_program4 |
| |
| Name Strings |
| |
| GL_NV_gpu_program4 |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| |
| Status |
| |
| Shipping for GeForce 8 Series (November 2006) |
| |
| Version |
| |
| Last Modified Date: 09/11/2014 |
| NVIDIA Revision: 11 |
| |
| Number |
| |
| 322 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 2.0 specification. |
| |
| OpenGL 2.0 is not required, but we expect all implementations of this |
| extension will also support OpenGL 2.0. |
| |
| This extension is also written against the ARB_vertex_program |
| specification, which provides the basic mechanisms for the assembly |
| programming model used by this extension. |
| |
| This extension serves as the basis for the NV_fragment_program4, |
| NV_geometry_program4, and NV_vertex_program4 extensions, which all build |
| on this extension to support fragment, geometry, and vertex programs, |
| respectively. If "GL_NV_gpu_program4" is found in the extension string, |
| all of these extensions are supported. |
| |
| NV_parameter_buffer_object affects the definition of this extension. |
| |
| ARB_texture_rectangle trivially affects the definition of this extension. |
| |
| EXT_gpu_program_parameters trivially affects the definition of this |
| extension. |
| |
| EXT_texture_integer trivially affects the definition of this extension. |
| |
| EXT_texture_array trivially affects the definition of this extension. |
| |
| EXT_texture_buffer_object trivially affects the definition of this |
| extension. |
| |
| NV_primitive_restart trivially affects the definition of this extension. |
| |
| Overview |
| |
| This specification documents the common instruction set and basic |
| functionality provided by NVIDIA's 4th generation of assembly instruction |
| sets supporting programmable graphics pipeline stages. |
| |
| The instruction set builds upon the basic framework provided by the |
| ARB_vertex_program and ARB_fragment_program extensions to expose |
| considerably more capable hardware. In addition to new capabilities for |
| vertex and fragment programs, this extension provides a new program type |
| (geometry programs) further described in the NV_geometry_program4 |
| specification. |
| |
| NV_gpu_program4 provides a unified instruction set -- all instruction set |
| features are available for all program types, except for a small number of |
| features that make sense only for a specific program type. It provides |
| fully capable signed and unsigned integer data types, along with a set of |
| arithmetic, logical, and data type conversion instructions capable of |
| operating on integers. It also provides a uniform set of structured |
| branching constructs (if tests, loops, and subroutines) that fully support |
| run-time condition testing. |
| |
| This extension provides several new texture mapping capabilities. Shadow |
| cube maps are supported, where cube map faces can encode depth values. |
| Texture lookup instructions can include an immediate texel offset, which |
| can assist in advanced filtering. New instructions are provided to fetch |
| a single texel by address in a texture map (TXF) and query the size of a |
| specified texture level (TXQ). |
| |
| By and large, vertex and fragment programs written to ARB_vertex_program |
| and ARB_fragment_program can be ported directly by simply changing the |
| program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or |
| "!!NVfp4.0", and then modifying the code to take advantage of the expanded |
| feature set. There are a small number of areas where this extension is |
| not a functional superset of previous vertex program extensions, which are |
| documented in this specification. |
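| |
| As a rough illustration of the header change described above, the |
| following C sketch rewrites the header of an existing program string. |
| The function name port_header is hypothetical and is not part of any GL |
| API. The sketch assumes the header sits at the very start of the string |
| and makes no attempt to detect the areas where this extension is not a |
| functional superset of earlier ones. |
| |
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, replacing an ARB program
 * header with the matching NV_gpu_program4 header.
 * Returns 0 on success, -1 if no known header is found or `dst` is too
 * small to hold the result. */
static int port_header(const char *src, char *dst, size_t dst_size)
{
    static const char *const from[] = { "!!ARBvp1.0", "!!ARBfp1.0" };
    static const char *const to[]   = { "!!NVvp4.0",  "!!NVfp4.0"  };

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < 2; ++i) {
        size_t n = strlen(from[i]);
        if (strncmp(src, from[i], n) == 0) {
            /* Emit the new header followed by the untouched program body. */
            int written = snprintf(dst, dst_size, "%s%s", to[i], src + n);
            return (written >= 0 && (size_t)written < dst_size) ? 0 : -1;
        }
    }
    return -1;
}
```
| |
| The resulting string would then be loaded with ProgramStringARB in the |
| usual way. A program that relies on behavior outside the common subset |
| still needs manual review after the header is changed. |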
| |
| |
| New Procedures and Functions |
| |
| void ProgramLocalParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramLocalParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramLocalParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramLocalParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramLocalParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| void ProgramLocalParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| void ProgramEnvParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramEnvParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramEnvParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramEnvParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramEnvParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| void ProgramEnvParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| void GetProgramLocalParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramLocalParameterIuivNV(enum target, uint index, |
| uint *params); |
| void GetProgramEnvParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramEnvParameterIuivNV(enum target, uint index, |
| uint *params); |
| |
| New Tokens |
| |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, |
| GetFloatv, and GetDoublev: |
| |
| MIN_PROGRAM_TEXEL_OFFSET_EXT 0x8904 |
| MAX_PROGRAM_TEXEL_OFFSET_EXT 0x8905 |
| |
| (note: these tokens are shared with the EXT_gpu_shader4 extension.) |
| |
| Accepted by the <pname> parameter of GetProgramivARB: |
| |
| PROGRAM_ATTRIB_COMPONENTS_NV 0x8906 |
| PROGRAM_RESULT_COMPONENTS_NV 0x8907 |
| MAX_PROGRAM_ATTRIB_COMPONENTS_NV 0x8908 |
| MAX_PROGRAM_RESULT_COMPONENTS_NV 0x8909 |
| MAX_PROGRAM_GENERIC_ATTRIBS_NV 0x8DA5 |
| MAX_PROGRAM_GENERIC_RESULTS_NV 0x8DA6 |
| |
| Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation) |
| |
| (Modify "Section 2.14.1" of the ARB_vertex_program specification, |
| describing program parameters.) |
| |
| Each program object has an associated array of program local parameters. |
| Program local parameters are four-component vectors whose components can |
| hold floating-point, signed integer, or unsigned integer values. The data |
| type of each local parameter is established when the parameter's values |
| are assigned. If a program attempts to read a local parameter using a |
| data type other than the one used when the parameter is set, the values |
| returned are undefined. ... The commands |
| |
| void ProgramLocalParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramLocalParameter4fvARB(enum target, uint index, |
| const float *params); |
| void ProgramLocalParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramLocalParameter4dvARB(enum target, uint index, |
| const double *params); |
| |
| void ProgramLocalParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramLocalParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramLocalParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramLocalParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| |
| update the values of the program local parameter numbered <index> |
| belonging to the program object currently bound to <target>. For the |
| non-vector versions of these commands, the four components of the |
| parameter are updated with the values of <x>, <y>, <z>, and <w>, |
| respectively. For the vector versions, the components of the parameter |
| are updated with the array of four values pointed to by <params>. The |
| error INVALID_VALUE is generated if <index> is greater than or equal to |
| the number of program local parameters supported by <target>. |
| |
| The commands |
| |
| void ProgramLocalParameters4fvNV(enum target, uint index, |
| sizei count, const float *params); |
| void ProgramLocalParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramLocalParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| update the values of the program local parameters numbered <index> through |
| <index> + <count> - 1 with the array of 4 * <count> values pointed to by |
| <params>. The error INVALID_VALUE is generated if the sum of <index> and |
| <count> is greater than the number of program local parameters supported |
| by <target>. |
| |
| When a program local parameter is updated, the data type of its components |
| is assigned according to the data type of the provided values. If values |
| provided are of type "float" or "double", the components of the parameter |
| are floating-point. If the values provided are of type "int", the |
| components of the parameter are signed integers. If the values provided |
| are of type "uint", the components of the parameter are unsigned integers. |
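| |
| The range check and the per-parameter data type described above can be |
| modeled in a few lines of C. This is only a sketch of the rules as |
| stated, not driver code: the names ProgramParam, range_ok, and |
| set_params_int are hypothetical, the limit of 8 parameters is an |
| arbitrary assumption, and <count> is treated as unsigned for simplicity. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one four-component parameter whose component type
 * is set by the most recent update. */
typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } ParamType;

typedef struct {
    ParamType type;
    union {
        float        f[4];
        int          i[4];
        unsigned int u[4];
    } v;
} ProgramParam;

enum { MODEL_MAX_PARAMS = 8 };   /* assumed limit for this sketch only */

/* Mirrors the INVALID_VALUE rule for the multi-parameter commands: the
 * update is legal only if index + count does not exceed the limit.
 * Written without computing index + count to avoid overflow. */
static int range_ok(unsigned int index, unsigned int count, unsigned int max)
{
    return index <= max && count <= max - index;
}

/* Analogue of ProgramLocalParametersI4ivNV operating on the model.
 * Returns 0 on success, -1 where the GL would generate INVALID_VALUE. */
static int set_params_int(ProgramParam *params, unsigned int index,
                          unsigned int count, const int *values)
{
    assert(params != NULL && values != NULL);
    if (!range_ok(index, count, MODEL_MAX_PARAMS))
        return -1;
    for (unsigned int k = 0; k < count; ++k) {
        /* Values of type "int" make the components signed integers. */
        params[index + k].type = PARAM_INT;
        memcpy(params[index + k].v.i, values + 4 * k, 4 * sizeof(int));
    }
    return 0;
}
```
| |
| In this model, a program that later reads such a parameter as |
| floating-point would correspond to the undefined case in the text, since |
| the stored type tag is PARAM_INT. |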
| |
| Additionally, each program target has an associated array of program |
| environment parameters. Unlike program local parameters, program |
| environment parameters are shared by all program objects of a given |
| target. Program environment parameters are four-component vectors whose |
| components can hold floating-point, signed integer, or unsigned integer |
| values. The data type of each environment parameter is established when |
| the parameter's values are assigned. If a program attempts to read an |
| environment parameter using a data type other than the one used when the |
| parameter is set, the values returned are undefined. ... The commands |
| |
| void ProgramEnvParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramEnvParameter4fvARB(enum target, uint index, |
| const float *params); |
| void ProgramEnvParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramEnvParameter4dvARB(enum target, uint index, |
| const double *params); |
| void ProgramEnvParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramEnvParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramEnvParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramEnvParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| |
| update the values of the program environment parameter numbered <index> |
| for the given program target <target>. For the non-vector versions of |
| these commands, the four components of the parameter are updated with the |
| values of <x>, <y>, <z>, and <w>, respectively. For the vector versions, |
| the four components of the parameter are updated with the array of four |
| values pointed to by <params>. The error INVALID_VALUE is generated if |
| <index> is greater than or equal to the number of program environment |
| parameters supported by <target>. |
| |
| The commands |
| |
| void ProgramEnvParameters4fvNV(enum target, uint index, |
| sizei count, const float *params); |
| void ProgramEnvParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramEnvParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| update the values of the program environment parameters numbered <index> |
| through <index> + <count> - 1 with the array of 4 * <count> values pointed |
| to by <params>. The error INVALID_VALUE is generated if the sum of |
| <index> and <count> is greater than the number of program environment |
| parameters supported by <target>. |
| |
| When a program environment parameter is updated, the data type of its |
| components is assigned according to the data type of the provided values. |
| If values provided are of type "float" or "double", the components of the |
| parameter are floating-point. If the values provided are of type "int", |
| the components of the parameter are signed integers. If the values |
| provided are of type "uint", the components of the parameter are unsigned |
| integers. |
| |
| ... |
| |
| |
| Insert New Section 2.X between Sections 2.Y and 2.Z: |
| |
| Section 2.X, GPU Programs |
| |
| The GL provides a number of different program targets that allow an |
| application to either replace certain fixed-function pipeline stages with |
| a fully programmable model or use a program to control aspects of the GL |
| pipeline that previously had only hard-wired behavior. |
| |
| A common base instruction set is available for all program types, |
| providing both integer and floating-point operations. Structured |
| branching operations and subroutine calls are available. Texture |
| mapping (loading data from external images) is supported for all |
| program types. The main differences between the different program |
| types are the set of available inputs and outputs, which are specific to |
| each program type, and a few instructions that are meaningful for only a |
| subset of program types. |
| |
| |
| |
| Section 2.X.2, Program Grammar |
| |
| GPU program strings are specified as an array of ASCII characters |
| containing the program text. When a GPU program is loaded by a call to |
| ProgramStringARB, the program string is parsed into a set of tokens |
| possibly separated by whitespace. Spaces, tabs, newlines, carriage |
| returns, and comments are considered whitespace. Comments begin with the |
| character "#" and are terminated by a newline, a carriage return, or the |
| end of the program array. |
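| |
| The whitespace and comment rules above could be handled by a scanner |
| along the following lines. This is a minimal sketch; the function name |
| skip_whitespace is hypothetical, and the program is assumed to be given |
| as a character array with an explicit length. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first character at or after `pos` that is
 * not whitespace in the sense described above, or `len` if none. */
static size_t skip_whitespace(const char *text, size_t len, size_t pos)
{
    assert(text != NULL);
    while (pos < len) {
        char c = text[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            ++pos;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the
             * end of the program array. */
            while (pos < len && text[pos] != '\n' && text[pos] != '\r')
                ++pos;
        } else {
            break;
        }
    }
    return pos;
}
```
| |
| A tokenizer built on this sketch would call skip_whitespace before |
| reading each token. |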
| |
| The Backus-Naur Form (BNF) grammar below specifies the syntactically valid |
| sequences for GPU programs. The set of valid tokens can be inferred |
| from the grammar. A line containing "/* empty */" represents an empty |
| string and is used to indicate optional rules. A program is invalid if it |
| contains any tokens or characters not defined in this specification. |
| |
| Note that this extension is not a standalone extension and a small number |
| of grammar rules are left to be defined in the extensions defining the |
| specific vertex, fragment, and geometry program types. |
| |
| |
| <program> ::= <optionSequence> <declSequence> |
| <statementSequence> "END" |
| |
| <optionSequence> ::= <option> <optionSequence> |
| | /* empty */ |
| |
| <option> ::= "OPTION" <identifier> ";" |
| |
| <declSequence> ::= /* empty */ |
| |
| <statementSequence> ::= <statement> <statementSequence> |
| | /* empty */ |
| |
| <statement> ::= <instruction> ";" |
| | <namingStatement> ";" |
| | <instLabel> ":" |
| |
| <instruction> ::= <ALUInstruction> |
| | <TexInstruction> |
| | <FlowInstruction> |
| |
| <ALUInstruction> ::= <VECTORop_instruction> |
| | <SCALARop_instruction> |
| | <BINSCop_instruction> |
| | <BINop_instruction> |
| | <VECSCAop_instruction> |
| | <TRIop_instruction> |
| | <SWZop_instruction> |
| |
| <TexInstruction> ::= <TEXop_instruction> |
| | <TXDop_instruction> |
| |
| <FlowInstruction> ::= <BRAop_instruction> |
| | <FLOWCCop_instruction> |
| | <IFop_instruction> |
| | <REPop_instruction> |
| | <ENDFLOWop_instruction> |
| |
| <VECTORop_instruction> ::= <VECTORop> <opModifiers> <instResult> "," |
| <instOperandV> |
| |
| <VECTORop> ::= "ABS" |
| | "CEIL" |
| | "FLR" |
| | "FRC" |
| | "I2F" |
| | "LIT" |
| | "MOV" |
| | "NOT" |
| | "NRM" |
| | "PK2H" |
| | "PK2US" |
| | "PK4B" |
| | "PK4UB" |
| | "ROUND" |
| | "SSG" |
| | "TRUNC" |
| |
| <SCALARop_instruction> ::= <SCALARop> <opModifiers> <instResult> "," |
| <instOperandS> |
| |
| <SCALARop> ::= "COS" |
| | "EX2" |
| | "LG2" |
| | "RCC" |
| | "RCP" |
| | "RSQ" |
| | "SCS" |
| | "SIN" |
| | "UP2H" |
| | "UP2US" |
| | "UP4B" |
| | "UP4UB" |
| |
| <BINSCop_instruction> ::= <BINSCop> <opModifiers> <instResult> "," |
| <instOperandS> "," <instOperandS> |
| |
| <BINSCop> ::= "POW" |
| |
| <VECSCAop_instruction> ::= <VECSCAop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandS> |
| |
| <VECSCAop> ::= "DIV" |
| | "SHL" |
| | "SHR" |
| | "MOD" |
| |
| <BINop_instruction> ::= <BINop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> |
| |
| <BINop> ::= "ADD" |
| | "AND" |
| | "DP3" |
| | "DP4" |
| | "DPH" |
| | "DST" |
| | "MAX" |
| | "MIN" |
| | "MUL" |
| | "OR" |
| | "RFL" |
| | "SEQ" |
| | "SFL" |
| | "SGE" |
| | "SGT" |
| | "SLE" |
| | "SLT" |
| | "SNE" |
| | "STR" |
| | "SUB" |
| | "XPD" |
| | "DP2" |
| | "XOR" |
| |
| <TRIop_instruction> ::= <TRIop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> "," |
| <instOperandV> |
| |
| <TRIop> ::= "CMP" |
| | "DP2A" |
| | "LRP" |
| | "MAD" |
| | "SAD" |
| | "X2D" |
| |
| <SWZop_instruction> ::= <SWZop> <opModifiers> <instResult> "," |
| <instOperandVNS> "," <extendedSwizzle> |
| |
| <SWZop> ::= "SWZ" |
| |
| <TEXop_instruction> ::= <TEXop> <opModifiers> <instResult> "," |
| <instOperandV> "," <texAccess> |
| |
| <TEXop> ::= "TEX" |
| | "TXB" |
| | "TXF" |
| | "TXL" |
| | "TXP" |
| | "TXQ" |
| |
| <TXDop_instruction> ::= <TXDop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> "," |
| <instOperandV> "," <texAccess> |
| |
| <TXDop> ::= "TXD" |
| |
| <BRAop_instruction> ::= <BRAop> <opModifiers> <instTarget> |
| <optBranchCond> |
| |
| <BRAop> ::= "CAL" |
| |
| <FLOWCCop_instruction> ::= <FLOWCCop> <opModifiers> <optBranchCond> |
| |
| <FLOWCCop> ::= "RET" |
| | "BRK" |
| | "CONT" |
| |
| <IFop_instruction> ::= <IFop> <opModifiers> <ccTest> |
| |
| <IFop> ::= "IF" |
| |
| <REPop_instruction> ::= <REPop> <opModifiers> <instOperandV> |
| | <REPop> <opModifiers> |
| |
| <REPop> ::= "REP" |
| |
| <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers> |
| |
| <ENDFLOWop> ::= "ELSE" |
| | "ENDIF" |
| | "ENDREP" |
| |
| <opModifiers> ::= <opModifierItem> <opModifiers> |
| | /* empty */ |
| |
| <opModifierItem> ::= "." <opModifier> |
| |
| <opModifier> ::= "F" |
| | "U" |
| | "S" |
| | "CC" |
| | "CC0" |
| | "CC1" |
| | "SAT" |
| | "SSAT" |
| | "NTC" |
| | "S24" |
| | "U24" |
| | "HI" |
| |
| <texAccess> ::= <texImageUnit> "," <texTarget> |
| | <texImageUnit> "," <texTarget> "," <texOffset> |
| |
| <texImageUnit> ::= "texture" <optArrayMemAbs> |
| |
| <texTarget> ::= "1D" |
| | "2D" |
| | "3D" |
| | "CUBE" |
| | "RECT" |
| | "SHADOW1D" |
| | "SHADOW2D" |
| | "SHADOWRECT" |
| | "ARRAY1D" |
| | "ARRAY2D" |
| | "SHADOWCUBE" |
| | "SHADOWARRAY1D" |
| | "SHADOWARRAY2D" |
| |
| <texOffset> ::= "(" <texOffsetComp> ")" |
| | "(" <texOffsetComp> "," <texOffsetComp> ")" |
| | "(" <texOffsetComp> "," <texOffsetComp> "," |
| <texOffsetComp> ")" |
| |
| <texOffsetComp> ::= <optSign> <int> |
| |
| <optBranchCond> ::= /* empty */ |
| | <ccMask> |
| |
| <instOperandV> ::= <instOperandAbsV> |
| | <instOperandBaseV> |
| |
| <instOperandAbsV> ::= <operandAbsNeg> "|" <instOperandBaseV> "|" |
| |
| <instOperandBaseV> ::= <operandNeg> <attribUseV> |
| | <operandNeg> <tempUseV> |
| | <operandNeg> <paramUseV> |
| | <operandNeg> <bufferUseV> |
| |
| <instOperandS> ::= <instOperandAbsS> |
| | <instOperandBaseS> |
| |
| <instOperandAbsS> ::= <operandAbsNeg> "|" <instOperandBaseS> "|" |
| |
| <instOperandBaseS> ::= <operandNeg> <attribUseS> |
| | <operandNeg> <tempUseS> |
| | <operandNeg> <paramUseS> |
| | <operandNeg> <bufferUseS> |
| |
| <instOperandVNS> ::= <attribUseVNS> |
| | <tempUseVNS> |
| | <paramUseVNS> |
| | <bufferUseVNS> |
| |
| <operandAbsNeg> ::= <optSign> |
| |
| <operandNeg> ::= <optSign> |
| |
| <instResult> ::= <instResultCC> |
| | <instResultBase> |
| |
| <instResultCC> ::= <instResultBase> <ccMask> |
| |
| <instResultBase> ::= <tempUseW> |
| | <resultUseW> |
| |
| <namingStatement> ::= <varMods> <ATTRIB_statement> |
| | <varMods> <PARAM_statement> |
| | <varMods> <TEMP_statement> |
| | <varMods> <OUTPUT_statement> |
| | <varMods> <BUFFER_statement> |
| | <ALIAS_statement> |
| |
| <ATTRIB_statement> ::= "ATTRIB" <establishName> "=" <attribUseD> |
| |
| <PARAM_statement> ::= <PARAM_singleStmt> |
| | <PARAM_multipleStmt> |
| |
| <PARAM_singleStmt> ::= "PARAM" <establishName> <paramSingleInit> |
| |
| <PARAM_multipleStmt> ::= "PARAM" <establishName> <optArraySize> |
| <paramMultipleInit> |
| |
| <paramSingleInit> ::= "=" <paramUseDB> |
| |
| <paramMultipleInit> ::= "=" "{" <paramMultInitList> "}" |
| |
| <paramMultInitList> ::= <paramUseDM> |
| | <paramUseDM> "," <paramMultInitList> |
| |
| <TEMP_statement> ::= "TEMP" <varNameList> |
| |
| <OUTPUT_statement> ::= "OUTPUT" <establishName> "=" <resultUseD> |
| |
| <varMods> ::= <varModifier> <varMods> |
| | /* empty */ |
| |
| <varModifier> ::= "SHORT" |
| | "LONG" |
| | "INT" |
| | "UINT" |
| | "FLOAT" |
| |
| <ALIAS_statement> ::= "ALIAS" <establishName> "=" <establishedName> |
| |
| <BUFFER_statement> ::= <bufferDeclType> <establishName> |
| <bufferSingleInit> |
| | <bufferDeclType> <establishName> |
| <optArraySize> <bufferMultInit> |
| |
| <bufferDeclType> ::= "BUFFER" |
| | "BUFFER4" |
| |
| <bufferSingleInit> ::= "=" <bufferUseDB> |
| |
| <bufferMultInit> ::= "=" "{" <bufferMultInitList> "}" |
| |
| <bufferMultInitList> ::= <bufferUseDM> |
| | <bufferUseDM> "," <bufferMultInitList> |
| |
| <varNameList> ::= <establishName> |
| | <establishName> "," <varNameList> |
| |
| <attribUseV> ::= <attribBasic> <swizzleSuffix> |
| | <attribVarName> <swizzleSuffix> |
| | <attribVarName> <arrayMem> <swizzleSuffix> |
| | <attribColor> <swizzleSuffix> |
| | <attribColor> "." <colorType> <swizzleSuffix> |
| |
| <attribUseS> ::= <attribBasic> <scalarSuffix> |
| | <attribVarName> <scalarSuffix> |
| | <attribVarName> <arrayMem> <scalarSuffix> |
| | <attribColor> <scalarSuffix> |
| | <attribColor> "." <colorType> <scalarSuffix> |
| |
| <attribUseVNS> ::= <attribBasic> |
| | <attribVarName> |
| | <attribVarName> <arrayMem> |
| | <attribColor> |
| | <attribColor> "." <colorType> |
| |
| <attribUseD> ::= <attribBasic> |
| | <attribColor> |
| | <attribColor> "." <colorType> |
| | <attribMulti> |
| |
| <paramUseV> ::= <paramVarName> <optArrayMem> <swizzleSuffix> |
| | <stateSingleItem> <swizzleSuffix> |
| | <programSingleItem> <swizzleSuffix> |
| | <constantVector> <swizzleSuffix> |
| | <constantScalar> |
| |
| <paramUseS> ::= <paramVarName> <optArrayMem> <scalarSuffix> |
| | <stateSingleItem> <scalarSuffix> |
| | <programSingleItem> <scalarSuffix> |
| | <constantVector> <scalarSuffix> |
| | <constantScalar> |
| |
| <paramUseVNS> ::= <paramVarName> <optArrayMem> |
| | <stateSingleItem> |
| | <programSingleItem> |
| | <constantVector> |
| | <constantScalar> |
| |
| <paramUseDB> ::= <stateSingleItem> |
| | <programSingleItem> |
| | <constantVector> |
| | <signedConstantScalar> |
| |
| <paramUseDM> ::= <stateMultipleItem> |
| | <programMultipleItem> |
| | <constantVector> |
| | <signedConstantScalar> |
| |
| <stateMultipleItem> ::= <stateSingleItem> |
| | "state" "." <stateMatrixRows> |
| |
| <stateSingleItem> ::= "state" "." <stateMaterialItem> |
| | "state" "." <stateLightItem> |
| | "state" "." <stateLightModelItem> |
| | "state" "." <stateLightProdItem> |
| | "state" "." <stateFogItem> |
| | "state" "." <stateMatrixRow> |
| | "state" "." <stateTexGenItem> |
| | "state" "." <stateClipPlaneItem> |
| | "state" "." <statePointItem> |
| | "state" "." <stateTexEnvItem> |
| | "state" "." <stateDepthItem> |
| |
| <stateMaterialItem> ::= "material" "." <stateMatProperty> |
| | "material" "." <faceType> "." |
| <stateMatProperty> |
| |
| <stateMatProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| | "emission" |
| | "shininess" |
| |
| <stateLightItem> ::= "light" <arrayMemAbs> "." <stateLightProperty> |
| |
| <stateLightProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| | "position" |
| | "attenuation" |
| | "spot" "." <stateSpotProperty> |
| | "half" |
| |
| <stateSpotProperty> ::= "direction" |
| |
| <stateLightModelItem> ::= "lightmodel" "." <stateLModProperty> |
| |
| <stateLModProperty> ::= "ambient" |
| | "scenecolor" |
| | <faceType> "." "scenecolor" |
| |
| <stateLightProdItem> ::= "lightprod" <arrayMemAbs> "." |
| <stateLProdProperty> |
| | "lightprod" <arrayMemAbs> "." <faceType> "." |
| <stateLProdProperty> |
| |
| <stateLProdProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| |
| <stateFogItem> ::= "fog" "." <stateFogProperty> |
| |
| <stateFogProperty> ::= "color" |
| | "params" |
| |
| <stateMatrixRows> ::= <stateMatrixItem> |
| | <stateMatrixItem> "." <stateMatModifier> |
| | <stateMatrixItem> "." "row" <arrayRange> |
| | <stateMatrixItem> "." <stateMatModifier> "." |
| "row" <arrayRange> |
| |
| <stateMatrixRow> ::= <stateMatrixItem> "." "row" <arrayMemAbs> |
| | <stateMatrixItem> "." <stateMatModifier> "." |
| "row" <arrayMemAbs> |
| |
| <stateMatrixItem> ::= "matrix" "." <stateMatrixName> |
| |
| <stateMatModifier> ::= "inverse" |
| | "transpose" |
| | "invtrans" |
| |
| <stateMatrixName> ::= "modelview" <optArrayMemAbs> |
| | "projection" |
| | "mvp" |
| | "texture" <optArrayMemAbs> |
| | "program" <arrayMemAbs> |
| |
| <stateTexGenItem> ::= "texgen" <optArrayMemAbs> "." |
| <stateTexGenType> "." <stateTexGenCoord> |
| |
| <stateTexGenType> ::= "eye" |
| | "object" |
| |
| <stateTexGenCoord> ::= "s" |
| | "t" |
| | "r" |
| | "q" |
| |
| <stateClipPlaneItem> ::= "clip" <arrayMemAbs> "." "plane" |
| |
| <statePointItem> ::= "point" "." <statePointProperty> |
| |
| <statePointProperty> ::= "size" |
| | "attenuation" |
| |
| <stateTexEnvItem> ::= "texenv" <optArrayMemAbs> "." |
| <stateTexEnvProperty> |
| |
| <stateTexEnvProperty> ::= "color" |
| |
| <stateDepthItem> ::= "depth" "." <stateDepthProperty> |
| |
| <stateDepthProperty> ::= "range" |
| |
| <programSingleItem> ::= <progEnvParam> |
| | <progLocalParam> |
| |
| <programMultipleItem> ::= <progEnvParams> |
| | <progLocalParams> |
| |
| <progEnvParams> ::= "program" "." "env" <arrayMemAbs> |
| | "program" "." "env" <arrayRange> |
| |
| <progEnvParam> ::= "program" "." "env" <arrayMemAbs> |
| |
| <progLocalParams> ::= "program" "." "local" <arrayMemAbs> |
| | "program" "." "local" <arrayRange> |
| |
| <progLocalParam> ::= "program" "." "local" <arrayMemAbs> |
| |
| <constantVector> ::= "{" <constantVectorList> "}" |
| |
| <constantVectorList> ::= <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> |
| |
| <signedConstantScalar> ::= <optSign> <constantScalar> |
| |
| <constantScalar> ::= <floatConstant> |
| | <intConstant> |
| |
| <floatConstant> ::= <float> |
| |
| <intConstant> ::= <int> |
| |
| <tempUseV> ::= <tempVarName> <swizzleSuffix> |
| |
| <tempUseS> ::= <tempVarName> <scalarSuffix> |
| |
| <tempUseVNS> ::= <tempVarName> |
| |
| <tempUseW> ::= <tempVarName> <optWriteMask> |
| |
| <resultUseW> ::= <resultBasic> <optWriteMask> |
| | <resultVarName> <optWriteMask> |
| |
| <resultUseD> ::= <resultBasic> |
| |
| <bufferUseV> ::= <bufferVarName> <optArrayMem> <swizzleSuffix> |
| |
| <bufferUseS> ::= <bufferVarName> <optArrayMem> <scalarSuffix> |
| |
| <bufferUseVNS> ::= <bufferVarName> <optArrayMem> |
| |
| <bufferUseDB> ::= <bufferBinding> <arrayMemAbs> |
| |
| <bufferUseDM> ::= <bufferBinding> <arrayMemAbs> |
| | <bufferBinding> <arrayRange> |
| | <bufferBinding> |
| |
| <bufferBinding> ::= "program" "." "buffer" <arrayMemAbs> |
| |
| <optArraySize> ::= "[" "]" |
| | "[" <int> "]" |
| |
| <optArrayMem> ::= /* empty */ |
| | <arrayMem> |
| |
| <arrayMem> ::= <arrayMemAbs> |
| | <arrayMemRel> |
| |
| <optArrayMemAbs> ::= /* empty */ |
| | <arrayMemAbs> |
| |
| <arrayMemAbs> ::= "[" <int> "]" |
| |
| <arrayMemRel> ::= "[" <arrayMemReg> <arrayMemOffset> "]" |
| |
| <arrayMemReg> ::= <addrUseS> |
| |
| <arrayMemOffset> ::= /* empty */ |
| | "+" <int> |
| | "-" <int> |
| |
| <arrayRange> ::= "[" <int> ".." <int> "]" |
| |
| <addrUseS> ::= <addrVarName> <scalarSuffix> |
| |
| <ccMask> ::= "(" <ccTest> ")" |
| |
| <ccTest> ::= <ccMaskRule> <swizzleSuffix> |
| |
| <ccMaskRule> ::= "EQ" |
| | "GE" |
| | "GT" |
| | "LE" |
| | "LT" |
| | "NE" |
| | "TR" |
| | "FL" |
| | "EQ0" |
| | "GE0" |
| | "GT0" |
| | "LE0" |
| | "LT0" |
| | "NE0" |
| | "TR0" |
| | "FL0" |
| | "EQ1" |
| | "GE1" |
| | "GT1" |
| | "LE1" |
| | "LT1" |
| | "NE1" |
| | "TR1" |
| | "FL1" |
| | "NAN" |
| | "NAN0" |
| | "NAN1" |
| | "LEG" |
| | "LEG0" |
| | "LEG1" |
| | "CF" |
| | "CF0" |
| | "CF1" |
| | "NCF" |
| | "NCF0" |
| | "NCF1" |
| | "OF" |
| | "OF0" |
| | "OF1" |
| | "NOF" |
| | "NOF0" |
| | "NOF1" |
| | "AB" |
| | "AB0" |
| | "AB1" |
| | "BLE" |
| | "BLE0" |
| | "BLE1" |
| | "SF" |
| | "SF0" |
| | "SF1" |
| | "NSF" |
| | "NSF0" |
| | "NSF1" |
| |
| <optWriteMask> ::= /* empty */ |
| | <xyzwMask> |
| | <rgbaMask> |
| |
| <xyzwMask> ::= "." "x" |
| | "." "y" |
| | "." "xy" |
| | "." "z" |
| | "." "xz" |
| | "." "yz" |
| | "." "xyz" |
| | "." "w" |
| | "." "xw" |
| | "." "yw" |
| | "." "xyw" |
| | "." "zw" |
| | "." "xzw" |
| | "." "yzw" |
| | "." "xyzw" |
| |
| <rgbaMask> ::= "." "r" |
| | "." "g" |
| | "." "rg" |
| | "." "b" |
| | "." "rb" |
| | "." "gb" |
| | "." "rgb" |
| | "." "a" |
| | "." "ra" |
| | "." "ga" |
| | "." "rga" |
| | "." "ba" |
| | "." "rba" |
| | "." "gba" |
| | "." "rgba" |
| |
| <swizzleSuffix> ::= /* empty */ |
| | "." <component> |
| | "." <xyzwSwizzle> |
| | "." <rgbaSwizzle> |
| |
| <extendedSwizzle> ::= <extSwizComp> "," <extSwizComp> "," |
| <extSwizComp> "," <extSwizComp> |
| |
| <extSwizComp> ::= <optSign> <xyzwExtSwizSel> |
| | <optSign> <rgbaExtSwizSel> |
| |
| <xyzwExtSwizSel> ::= "0" |
| | "1" |
| | <xyzwComponent> |
| |
| <rgbaExtSwizSel> ::= <rgbaComponent> |
| |
| <scalarSuffix> ::= "." <component> |
| |
| <component> ::= <xyzwComponent> |
| | <rgbaComponent> |
| |
| <xyzwComponent> ::= "x" |
| | "y" |
| | "z" |
| | "w" |
| |
| <rgbaComponent> ::= "r" |
| | "g" |
| | "b" |
| | "a" |
| |
| <optSign> ::= /* empty */ |
| | "-" |
| | "+" |
| |
| <faceType> ::= "front" |
| | "back" |
| |
| <colorType> ::= "primary" |
| | "secondary" |
| |
| <instLabel> ::= <identifier> |
| |
| <instTarget> ::= <identifier> |
| |
| <establishedName> ::= <identifier> |
| |
| <establishName> ::= <identifier> |
| |
| |
| The <int> rule matches an integer constant. The integer consists of a |
| sequence of one or more digits ("0" through "9"), or a sequence in |
| hexadecimal form beginning with "0x" followed by a sequence of one or more |
| hexadecimal digits ("0" through "9", "a" through "f", "A" through "F"). |
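| |
| One way to read the <int> rule is as a longest-prefix match, sketched |
| below in C. The function name match_int is hypothetical. The sketch |
| assumes that only a lowercase "0x" prefix introduces the hexadecimal |
| form, since that is the only prefix the text mentions, and that "0x" |
| without a following hexadecimal digit matches just the decimal "0". |
| |
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of `s` matching the <int> rule, or 0 if
 * `s` does not start with an integer constant. */
static size_t match_int(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    /* Hexadecimal form: "0x" followed by one or more hex digits. */
    if (s[0] == '0' && s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        n = 2;
        while (isxdigit((unsigned char)s[n]))
            ++n;
        return n;
    }
    /* Decimal form: one or more digits. */
    while (isdigit((unsigned char)s[n]))
        ++n;
    return n;
}
```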
| |
| The <float> rule matches a floating-point constant consisting of an |
| integer part, a decimal point, a fraction part, an "e" or "E", and an |
| optionally signed integer exponent. The integer and fraction parts both |
| consist of a sequence of one or more digits ("0" through "9"). Either the |
| integer part or the fraction part (not both) may be missing; either the |
| decimal point or the "e" (or "E") and the exponent (not both) may be |
| missing. Most grammar rules that allow floating-point values also allow |
| integers matching the <int> rule. |
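| |
| The <float> rule can be sketched as a prefix matcher in C. The function |
| name match_float is hypothetical, and the sketch reflects one reading of |
| the paragraph above: a bare digit sequence is rejected because it has |
| neither a decimal point nor an exponent, and an "e" that is not followed |
| by a valid exponent is left unconsumed. |
| |
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of `s` matching the <float> rule, or 0. */
static size_t match_float(const char *s)
{
    size_t n = 0, int_digits = 0, frac_digits = 0;
    int has_point = 0, has_exp = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[n])) { ++n; ++int_digits; }
    if (s[n] == '.') {
        has_point = 1;
        ++n;
        while (isdigit((unsigned char)s[n])) { ++n; ++frac_digits; }
    }
    /* The integer and fraction parts may not both be missing. */
    if (int_digits + frac_digits == 0)
        return 0;
    if (s[n] == 'e' || s[n] == 'E') {
        size_t m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            ++m;
        if (isdigit((unsigned char)s[m])) {
            while (isdigit((unsigned char)s[m]))
                ++m;
            has_exp = 1;
            n = m;
        }
    }
    /* The decimal point and the exponent may not both be missing;
     * plain digits are an <int>, not a <float>. */
    if (!has_point && !has_exp)
        return 0;
    return n;
}
```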
| |
| The <identifier> rule matches a sequence of one or more letters ("A" |
| through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"), |
| or dollar signs ("$"); the first character must not be a digit. Upper |
| and lower case letters are considered different (names are |
| case-sensitive). The following strings are reserved keywords and may not |
| be used as identifiers: "fragment" (for fragment programs only), "vertex" |
| (for vertex and geometry programs), "primitive" (for fragment and geometry |
| programs), "program", "result", "state", and "texture". |
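| |
| The identifier rule and the reserved keywords listed above can be |
| checked as in the following C sketch. The names ProgType, is_reserved, |
| and is_valid_identifier are hypothetical. The sketch covers only the |
| keywords named in this paragraph and assumes ASCII input. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { PROG_VERTEX, PROG_FRAGMENT, PROG_GEOMETRY } ProgType;

static int is_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Nonzero if `s` is a reserved keyword for the given program type.
 * Comparison is case-sensitive, so "State" is not reserved. */
static int is_reserved(const char *s, ProgType type)
{
    static const char *const always[] = {
        "program", "result", "state", "texture"
    };
    for (size_t i = 0; i < sizeof always / sizeof always[0]; ++i)
        if (strcmp(s, always[i]) == 0)
            return 1;
    if (strcmp(s, "fragment") == 0)
        return type == PROG_FRAGMENT;
    if (strcmp(s, "vertex") == 0)
        return type == PROG_VERTEX || type == PROG_GEOMETRY;
    if (strcmp(s, "primitive") == 0)
        return type == PROG_FRAGMENT || type == PROG_GEOMETRY;
    return 0;
}

/* Nonzero if `s` is a usable identifier for a program of type `type`. */
static int is_valid_identifier(const char *s, ProgType type)
{
    assert(s != NULL);
    if (s[0] == '\0' || is_digit(s[0]))
        return 0;
    for (size_t i = 0; s[i] != '\0'; ++i)
        if (!is_letter(s[i]) && !is_digit(s[i]) &&
            s[i] != '_' && s[i] != '$')
            return 0;
    return !is_reserved(s, type);
}
```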
| |
| The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and |
| <bufferVarName> rules match identifiers that have been previously established |
| as names of temporary, program parameter, attribute, result, and program |
| parameter buffer variables, respectively. |
| |
| The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings |
| consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>) |
| or "r", "g", "b", "a" (<rgbaSwizzle>). |
| |
| The error INVALID_OPERATION is generated if a program fails to load |
| because it is not syntactically correct or for one of the semantic |
| restrictions described in the following sections. |
| |
| A successfully loaded program is parsed into a sequence of instructions. |
| Each instruction is identified by its tokenized name. The operation of |
| these instructions when executed is defined in section 2.X.4. A |
| successfully loaded program string replaces the program string previously |
| loaded into the specified program object. If the OUT_OF_MEMORY error is |
| generated by ProgramStringARB, no change is made to the previous contents |
| of the current program object. |
| |
| |
| Section 2.X.3, Program Variables |
| |
| Programs may operate on a number of different variables during their |
| execution. The following sections define the different classes of |
| variables that can be declared and used by a program. |
| |
| Some variable classes require variable bindings. Variable classes with |
| bindings refer to state that is either generated or consumed outside the |
| program. Examples of variable bindings include a vertex's normal, the |
| position of a vertex computed by a vertex program, an interpolated texture |
| coordinate, and the diffuse color of light 1. Variables that are used |
| only during program execution do not have bindings. |
| |
| Variables may be declared explicitly according to the <namingStatement> |
| grammar rule. Explicit variable declarations allow a program to establish |
| a variable name that can be used to refer to a specified resource in |
| subsequent instructions. Variables may be declared anywhere in the |
| program string, but must be declared prior to use. A program will fail to |
| load if it declares the same variable name more than once, or if it refers |
| to a variable name that has not been previously declared in the program |
| string. |
| |
| Variables may also be declared implicitly, simply by using a variable |
| binding as an operand in a program instruction. Such uses are considered |
| to automatically create a nameless variable using the specified binding. |
| Only variables from classes with bindings can be declared implicitly. |
| |
| |
| Section 2.X.3.1, Program Variable Types |
| |
| Explicit variable declarations may include one or more modifiers that |
| specify additional information about the variable, such as the size and |
| data type of the components of the variable. Variable modifiers are |
| specified according to the <varModifier> grammar rule. |
| |
| By default, variables are considered typeless. They can be used in |
| instructions that read or write the variable as floating-point values, |
| signed integers, or unsigned integers. If a variable is written using one |
| data type but then read using a different one, the results of the |
| operation are undefined. Variables with bindings are considered to be |
| read or written when their values are produced or consumed; the data type |
| used by the GL is specified in the description of each binding. |
| |
| Explicitly declared variables may optionally have one data type modifier, |
| which can be used to detect data type mismatch errors. Type modifiers of |
| "INT", "UINT", and "FLOAT" indicate that the components of the variable |
| are stored as signed integers, unsigned integers, or floating-point |
| values, respectively. A program will fail to load if it attempts to read |
| or write a variable using a data type other than the one indicated by the |
| data type modifier. Variables without a data type modifier can be read or |
| written using any data type. |
| |
| Explicitly declared variables may optionally have one storage size |
| modifier. Variables declared as "SHORT" will be represented using at least |
| 16 bits per component. "SHORT" floating-point values will have at least 5 |
| bits of exponent and 10 bits of mantissa. Variables declared as "LONG" |
| will be represented with at least 32 bits per component. "LONG" |
| floating-point values will have at least 8 bits of exponent and 23 bits of |
| mantissa. If no size modifier is provided, the GL will automatically |
| select component sizes. Implementations are not required to support more |
| than one component size, so "SHORT", "LONG", and the default could all |
| refer to the same component size. The "LONG" modifier is supported only |
| for declarations of temporary variables ("TEMP"). The "SHORT" modifier is |
| supported only for declarations of temporary variables and result |
| variables ("OUTPUT"). |
| |
| Each variable declaration can include at most one data type and one |
| storage size modifier. A program will fail to load if it specifies |
| multiple data type or multiple storage size modifiers in a single variable |
| declaration. |
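| |
| The modifier restrictions above can be sketched as a small check. This |
| Python example is illustrative only: check_modifiers is a hypothetical |
| helper, and treating an unsupported "SHORT" or "LONG" as a load failure |
| is an assumption of the sketch. Other modifiers (such as the fragment |
| program interpolation modifiers) are ignored here. |

```python
DATA_TYPES = {"INT", "UINT", "FLOAT"}
STORAGE_SIZES = {"SHORT", "LONG"}

def check_modifiers(statement, modifiers):
    """Return True if the modifier list is acceptable for the declaration.

    statement is the declaration keyword, e.g. "TEMP", "OUTPUT", "PARAM".
    """
    types = [m for m in modifiers if m in DATA_TYPES]
    sizes = [m for m in modifiers if m in STORAGE_SIZES]
    if len(types) > 1 or len(sizes) > 1:
        return False  # multiple data type or storage size modifiers
    if "LONG" in sizes and statement != "TEMP":
        return False  # "LONG" only on temporaries
    if "SHORT" in sizes and statement not in ("TEMP", "OUTPUT"):
        return False  # "SHORT" only on temporaries and results
    return True
```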
| |
| (NOTE: Fragment programs also support the modifiers "FLAT", "CENTROID", |
| and "NOPERSPECTIVE", which control how per-fragment attribute values are |
| produced. These modifiers are described in detail in the |
| NV_fragment_program4 specification.) |
| |
| Explicitly declared variables of all types may be declared as arrays. An |
| array variable has one or more members, numbered 0 through <n>-1, where |
| <n> is the number of entries in the array. The total number of entries in |
| the array can be declared using the <optArraySize> grammar rule. For |
| variable classes without bindings, an array size must be specified in the |
| program, and must be a positive integer. For variable classes with |
| bindings, a declared size is optional, and is taken from the number of |
| bindings assigned in the declaration if omitted. A program will fail to |
| load if the declared size of an array variable does not match the number |
| of assigned bindings. |
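| |
| The array sizing rules can be summarized by the following Python sketch. |
| resolve_array_size is a hypothetical helper; returning None stands in |
| for "the program fails to load". |

```python
def resolve_array_size(declared_size, bindings):
    """Return the array size, or None if the program would fail to load.

    declared_size: int, or None if the size was omitted in the declaration.
    bindings: list of assigned bindings, or None for classes without bindings.
    """
    if bindings is None:
        # No bindings (e.g., temporaries): a positive size is mandatory.
        if declared_size is None or declared_size <= 0:
            return None
        return declared_size
    if declared_size is None:
        # Size omitted: taken from the number of assigned bindings.
        return len(bindings)
    # Size given: it must match the number of assigned bindings.
    return declared_size if declared_size == len(bindings) else None
```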
| |
| When a variable is declared as an array, instructions that use the |
| variable must specify an array member to access according to the |
| <arrayMem> grammar rule. A program will fail to load if it contains an |
| instruction that accesses an array variable without specifying an array |
| member or an instruction that specifies an array member for a non-array |
| variable. |
| |
| |
| Section 2.X.3.2, Program Attribute Variables |
| |
| Program attribute variables represent per-vertex or per-fragment inputs to |
| the program. All attribute variables have associated bindings, and are |
| read-only during program execution. Attribute variables may be declared |
| explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using |
| an attribute binding in an instruction. |
| |
| The set of available attribute bindings depends on the program type, and |
| is enumerated in the specifications for each program type. |
| |
| The set of bindings allowed for attribute array variables is limited to |
| attribute state grouped in arrays (e.g., texture coordinates, generic |
| vertex attributes). Additionally, all bindings assigned to the array must |
| be of the same binding type and must increase consecutively. Examples of |
| valid and invalid binding lists include: |
| |
| vertex.attrib[1], vertex.attrib[2] # valid, 2-entry array |
| vertex.texcoord[0..3] # valid, 4-entry array |
| vertex.attrib[1], vertex.attrib[3] # invalid, skipped attrib 2 |
| vertex.attrib[2], vertex.attrib[1] # invalid, wrong order |
| vertex.attrib[1], vertex.texcoord[2] # invalid, different types |
| |
| Additionally, attribute bindings may be used in no more than one array |
| variable accessed with relative addressing. |
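| |
| The "same binding type, consecutively increasing" rule for array binding |
| lists can be sketched as follows. This Python helper is hypothetical: it |
| assumes ranges such as "[0..3]" have already been expanded into |
| (binding type, index) pairs, and it does not check whether the binding |
| type is one that is grouped in arrays or whether a binding is reused in |
| another relatively addressed array. |

```python
def is_valid_array_binding_list(bindings):
    """bindings: list of (binding_type, index) pairs, ranges already expanded."""
    if not bindings:
        return False  # an array has one or more members
    first_type, first_index = bindings[0]
    for offset, (binding_type, index) in enumerate(bindings):
        # Every entry must share the first entry's type and follow it directly.
        if binding_type != first_type or index != first_index + offset:
            return False
    return True
```

| Applied to the examples listed above, the first two binding lists are |
| accepted and the remaining three are rejected. |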
| |
| Implementations may have a limit on the total number of attribute binding |
| components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV). |
| Programs that use more attribute binding components than this limit will |
| fail to load. The method of counting used attribute binding components is |
| implementation-dependent, but must satisfy the following properties: |
| |
| * If an attribute binding is not referenced in a program, or is |
| referenced only in declarations of attribute variables that are not |
| used, none of its components are counted. |
| |
| * An attribute binding component may be counted as used only if there |
| exists an instruction operand where |
| |
| - the component is enabled for read by the swizzle pattern (Section |
| 2.X.4.2), and |
| |
| - the attribute binding is |
| |
| - referenced directly by the operand, |
| |
| - bound to a declared variable referenced by the operand, or |
| |
| - bound to a declared array variable where another binding in |
| the array satisfies one of the two previous conditions. |
| |
| Implementations are not required to optimize out unused elements of an |
| attribute array or components that are used in only some elements of |
| an array. The last of these rules is intended to cover the case where |
| the same attribute binding is used in multiple variables. |
| |
| For example, an operand whose swizzle pattern selects only the x |
| component may result in the x component of an attribute binding being |
| counted, but may never result in the counting of the y, z, or w |
| components of any attribute binding. |
| |
| * Implementations are not required to determine that components read by |
| an instruction are actually unused due to: |
| |
| - instruction write masks (for example, a component-wise ADD |
| operation that only writes the "x" component doesn't have to read |
| the "y", "z", and "w" components of its operands) or |
| |
| - any other properties of the instruction (for example, the DP3 |
| instruction, which computes a 3-component dot product, doesn't have |
| to read the "w" component of its operands). |
| |
| |
| Section 2.X.3.3, Program Parameters |
| |
| Program parameter variables are used as constants during program |
| execution. All program parameter variables have associated bindings and |
| are read-only during program execution. Program parameters retain their |
| values across program invocations, although their values may change |
| between invocations due to GL state changes. Program parameter variables |
| may be declared explicitly via the <PARAM_statement> grammar rule, or |
| implicitly by using a parameter binding in an instruction. Except where |
| otherwise specified, program parameter bindings always specify |
| floating-point values. |
| |
| When declaring program parameter array variables, all bindings are |
| supported and can be assigned to array members in any order. The only |
| restriction is that no parameter binding may be used more than once in |
| array variables accessed using relative addressing. A program will fail |
| to load if any program parameter binding is used more than once in a |
| single array accessed using relative addressing or used at least once in |
| two or more arrays accessed using relative addressing. |
| |
| |
| Constant Bindings |
| |
| If a program parameter binding matches the <constantScalar> or |
| <signedConstantScalar> grammar rules, the corresponding program parameter |
| variable is bound to the vector (X,X,X,X), where X is the value of the |
| specified constant. |
| |
| If a program parameter binding matches <constantVector>, the corresponding |
| program parameter variable is bound to the vector (X,Y,Z,W), where X, Y, |
| Z, and W are the values corresponding to the first, second, third, and |
| fourth match of <signedConstantScalar>. If fewer than four constants are |
| specified, Y, Z, and W assume the values 0, 0, and 1, if their respective |
| constants are not specified. |
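| |
| A short Python sketch of how scalar and vector constants expand to four |
| components. expand_constant is a hypothetical helper used only for |
| illustration; a Python number stands for a scalar constant and a list |
| stands for a <constantVector>. |

```python
def expand_constant(values):
    """values: a single number (scalar), or a list of one to four numbers."""
    if isinstance(values, (int, float)):
        return (values, values, values, values)  # scalar: (X,X,X,X)
    if not 1 <= len(values) <= 4:
        raise ValueError("a constant vector has one to four components")
    defaults = (0, 0, 0, 1)  # unspecified Y, Z, W take 0, 0, 1
    return tuple(values) + defaults[len(values):]
```

| Note the difference between a scalar and a one-element vector: 5 |
| expands to (5,5,5,5), while {5} expands to (5,0,0,1). |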
| |
| Constant bindings can be interpreted as having signed integer, unsigned |
| integer, or floating-point values, depending on how they are used in the |
| program text. For constants in variable declarations, the components of |
| the constant are interpreted according to the variable's component data |
| type modifier. If no data type modifier is specified in a declaration, |
| constants are interpreted as floating-point values. For constant bindings |
| used directly in an instruction, the components of the constant are |
| interpreted according to the required data type of the operand. A program |
| will fail to load if it specifies a floating-point constant value |
| (matching the <floatConstant> grammar rule) that should be interpreted as |
| a signed or unsigned integer, or a negative integer constant value that |
| should be interpreted as an unsigned integer. |
| |
| If the value used to specify a floating-point constant can not be exactly |
| represented, the nearest floating-point value will be used. If the value |
| used to specify an integer constant is too large to be represented, the |
| program will fail to load. |
| |
| |
| Program Environment/Local Parameter Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ------------------------------- |
| program.env[a] (x,y,z,w) program environment parameter a |
| program.local[a] (x,y,z,w) program local parameter a |
| program.env[a..b] (x,y,z,w) program environment parameters |
| a through b |
| program.local[a..b] (x,y,z,w) program local parameters |
| a through b |
| |
| Table X.1: Program Environment/Local Parameter Bindings. <a> and <b> |
| indicate parameter numbers, where <a> must be less than or equal to <b>. |
| |
| If a program parameter binding matches "program.env[a]" or |
| "program.local[a]", the four components of the program parameter variable |
| are filled with the four components of program environment parameter <a> |
| or program local parameter <a> respectively. |
| |
| Additionally, for program parameter array bindings, "program.env[a..b]" |
| and "program.local[a..b]" are equivalent to specifying program environment |
| or local parameters <a> through <b> in order, respectively. A program |
| using any of these bindings will fail to load if <a> is greater than <b>. |
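| |
| For illustration, the range form can be expanded into individual |
| bindings as in the Python sketch below. expand_param_range is a |
| hypothetical helper; raising ValueError stands in for a program load |
| failure. |

```python
def expand_param_range(kind, a, b=None):
    """Expand program.<kind>[a] or program.<kind>[a..b] into single bindings."""
    if kind not in ("env", "local"):
        raise ValueError("kind must be 'env' or 'local'")
    if b is None:
        b = a  # single-parameter form
    if a > b:
        raise ValueError("program fails to load: <a> is greater than <b>")
    return [f"program.{kind}[{i}]" for i in range(a, b + 1)]
```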
| |
| Program environment and local parameters are typeless, and may be |
| specified as signed integer, unsigned integer, or floating-point |
| variables. If a program environment or local parameter is read using a |
| data type other than the one used to specify it, an undefined value is |
| returned. |
| |
| |
| Material Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.material.ambient (r,g,b,a) front ambient material color |
| state.material.diffuse (r,g,b,a) front diffuse material color |
| state.material.specular (r,g,b,a) front specular material color |
| state.material.emission (r,g,b,a) front emissive material color |
| state.material.shininess (s,0,0,1) front material shininess |
| state.material.front.ambient (r,g,b,a) front ambient material color |
| state.material.front.diffuse (r,g,b,a) front diffuse material color |
| state.material.front.specular (r,g,b,a) front specular material color |
| state.material.front.emission (r,g,b,a) front emissive material color |
| state.material.front.shininess (s,0,0,1) front material shininess |
| state.material.back.ambient (r,g,b,a) back ambient material color |
| state.material.back.diffuse (r,g,b,a) back diffuse material color |
| state.material.back.specular (r,g,b,a) back specular material color |
| state.material.back.emission (r,g,b,a) back emissive material color |
| state.material.back.shininess (s,0,0,1) back material shininess |
| |
| Table X.3: Material Property Bindings. If a material face is not |
| specified in the binding, the front property is used. |
| |
| If a program parameter binding matches any of the material properties |
| listed in Table X.3, the program parameter variable is filled according to |
| the table. For ambient, diffuse, specular, or emissive colors, the "x", |
| "y", "z", and "w" components are filled with the "r", "g", "b", and "a" |
| components, respectively, of the corresponding material color. For |
| material shininess, the "x" component is filled with the material's |
| specular exponent, and the "y", "z", and "w" components are filled with |
| the floating-point constants 0, 0, and 1, respectively. Bindings |
| containing ".back" refer to the back material; all other bindings refer to |
| the front material. |
| |
| Material properties can be changed inside a Begin/End pair, either |
| directly by calling Material, or indirectly through color material. |
| However, such property changes are not guaranteed to update program |
| parameter bindings until the following End command. Program parameter |
| variables bound to material properties changed inside a Begin/End pair are |
| undefined until the following End command. |
| |
| |
| Light Property Bindings |
| |
| Binding                            Components  Underlying State |
| ---------------------------------  ----------  ---------------------------- |
| state.light[n].ambient             (r,g,b,a)   light n ambient color |
| state.light[n].diffuse             (r,g,b,a)   light n diffuse color |
| state.light[n].specular            (r,g,b,a)   light n specular color |
| state.light[n].position            (x,y,z,w)   light n position |
| state.light[n].attenuation         (a,b,c,e)   light n attenuation constants |
|                                                and spot light exponent |
| state.light[n].spot.direction      (x,y,z,c)   light n spot direction and |
|                                                cutoff angle cosine |
| state.light[n].half                (x,y,z,1)   light n infinite half-angle |
| state.lightmodel.ambient           (r,g,b,a)   light model ambient color |
| state.lightmodel.scenecolor        (r,g,b,a)   light model front scene color |
| state.lightmodel.front.scenecolor  (r,g,b,a)   light model front scene color |
| state.lightmodel.back.scenecolor   (r,g,b,a)   light model back scene color |
| state.lightprod[n].ambient         (r,g,b,a)   light n / front material |
|                                                ambient color product |
| state.lightprod[n].diffuse         (r,g,b,a)   light n / front material |
|                                                diffuse color product |
| state.lightprod[n].specular        (r,g,b,a)   light n / front material |
|                                                specular color product |
| state.lightprod[n].front.ambient   (r,g,b,a)   light n / front material |
|                                                ambient color product |
| state.lightprod[n].front.diffuse   (r,g,b,a)   light n / front material |
|                                                diffuse color product |
| state.lightprod[n].front.specular  (r,g,b,a)   light n / front material |
|                                                specular color product |
| state.lightprod[n].back.ambient    (r,g,b,a)   light n / back material |
|                                                ambient color product |
| state.lightprod[n].back.diffuse    (r,g,b,a)   light n / back material |
|                                                diffuse color product |
| state.lightprod[n].back.specular   (r,g,b,a)   light n / back material |
|                                                specular color product |
| |
| Table X.4: Light Property Bindings. <n> indicates a light number. |
| |
| If a program parameter binding matches "state.light[n].ambient", |
| "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z", |
| and "w" components of the program parameter variable are filled with the |
| "r", "g", "b", and "a" components, respectively, of the corresponding |
| light color. |
| |
| If a program parameter binding matches "state.light[n].position", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the "x", "y", "z", and "w" components, respectively, of the light |
| position. |
| |
| If a program parameter binding matches "state.light[n].attenuation", the |
| "x", "y", and "z" components of the program parameter variable are filled |
| with the constant, linear, and quadratic attenuation parameters of the |
| specified light, respectively (section 2.13.1). The "w" component of the |
| program parameter variable is filled with the spot light exponent of the |
| specified light. |
| |
| If a program parameter binding matches "state.light[n].spot.direction", |
| the "x", "y", and "z" components of the program parameter variable are |
| filled with the "x", "y", and "z" components of the spot light direction |
| of the specified light, respectively (section 2.13.1). The "w" component |
| of the program parameter variable is filled with the cosine of the spot |
| light cutoff angle of the specified light. |
| |
| If a program parameter binding matches "state.light[n].half", the "x", |
| "y", and "z" components of the program parameter variable are filled with |
| the x, y, and z components, respectively, of the normalized infinite |
| half-angle vector |
| |
| h_inf = || P + (0, 0, 1) ||. |
| |
| The "w" component is filled with 1.0. In the computation of h_inf, P |
| consists of the x, y, and z coordinates of the normalized vector from the |
| eye position P_e to the eye-space light position P_pli (section 2.13.1). |
| h_inf is defined to correspond to the normalized half-angle vector when |
| using an infinite light (w coordinate of the position is zero) and an |
| infinite viewer (v_bs is FALSE). For local lights or a local viewer, |
| h_inf is well-defined but does not match the normalized half-angle vector, |
| which will vary depending on the vertex position. |
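| |
| The computation of h_inf can be sketched numerically as follows, reading |
| the "|| ... ||" notation above as vector normalization. This Python |
| example is illustrative only; the function names are hypothetical, and |
| the argument is assumed to be any non-zero vector pointing from the eye |
| toward the light (for an infinite light, the x, y, and z coordinates of |
| the eye-space light position). |

```python
import math

def normalize3(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def infinite_half_angle(light_dir):
    """Return the (x, y, z, w) value bound to state.light[n].half."""
    p = normalize3(light_dir)                  # P in the text above
    h = normalize3((p[0], p[1], p[2] + 1.0))   # normalize P + (0, 0, 1)
    return (h[0], h[1], h[2], 1.0)
```

| The sketch does not handle the degenerate case where P is (0, 0, -1), |
| for which P + (0, 0, 1) has zero length. |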
| |
| If a program parameter binding matches "state.lightmodel.ambient", the |
| "x", "y", "z", and "w" components of the program parameter variable are |
| filled with the "r", "g", "b", and "a" components of the light model |
| ambient color, respectively. |
| |
| If a program parameter binding matches "state.lightmodel.scenecolor" or |
| "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of |
| the program parameter variable are filled with the "r", "g", and "b" |
| components respectively of the "front scene color" |
| |
| c_scene = a_cs * a_cm + e_cm, |
| |
| where a_cs is the light model ambient color, a_cm is the front ambient |
| material color, and e_cm is the front emissive material color. The "w" |
| component of the program parameter variable is filled with the alpha |
| component of the front diffuse material color. If a program parameter |
| binding matches "state.lightmodel.back.scenecolor", a similar back scene |
| color, computed using back-facing material properties, is used. The front |
| and back scene colors match the values that would be assigned to vertices |
| using conventional lighting if all lights were disabled. |
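| |
| As a worked illustration of the scene color formula, the Python sketch |
| below computes the value bound to "state.lightmodel.scenecolor" from |
| (r,g,b,a) tuples. The function and argument names are hypothetical. |

```python
def scene_color(lm_ambient, mat_ambient, mat_emission, mat_diffuse):
    """All arguments are (r, g, b, a) tuples; returns the (x, y, z, w) value."""
    # c_scene = a_cs * a_cm + e_cm, evaluated per color component.
    rgb = tuple(a_cs * a_cm + e_cm
                for a_cs, a_cm, e_cm in zip(lm_ambient[:3], mat_ambient[:3],
                                            mat_emission[:3]))
    # The "w" component is the alpha of the diffuse material color.
    return rgb + (mat_diffuse[3],)
```

| Passing the back material colors instead of the front ones gives the |
| value for "state.lightmodel.back.scenecolor". |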
| |
| If a program parameter binding matches anything beginning with |
| "state.lightprod[n]", the "x", "y", and "z" components of the program |
| parameter variable are filled with the "r", "g", and "b" components, |
| respectively, of the corresponding light product. The three light product |
| components are the products of the corresponding color components of the |
| specified material property and the light color of the specified light |
| (see Table X.4). The "w" component of the program parameter variable is |
| filled with the alpha component of the specified material property. |
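| |
| A light product can be sketched the same way. In this illustrative |
| Python example, light_product is a hypothetical helper taking the light |
| color and the material color as (r,g,b,a) tuples. |

```python
def light_product(light_color, material_color):
    """Return the (x, y, z, w) value bound to a state.lightprod[n] binding."""
    # Component-wise product of the light and material RGB colors.
    rgb = tuple(l * m for l, m in zip(light_color[:3], material_color[:3]))
    # The "w" component comes from the material alpha, not from the light.
    return rgb + (material_color[3],)
```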
| |
| Light products depend on material properties, which can be changed inside |
| a Begin/End pair. Such property changes are not guaranteed to take effect |
| until the following End command. Program parameter variables bound to |
| light products whose corresponding material property changes inside a |
| Begin/End pair are undefined until the following End command. |
| |
| |
| Texture Coordinate Generation Property Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ---------------------------- |
| state.texgen[n].eye.s (a,b,c,d) TexGen eye linear plane |
| coefficients, s coord, unit n |
| state.texgen[n].eye.t (a,b,c,d) TexGen eye linear plane |
| coefficients, t coord, unit n |
| state.texgen[n].eye.r (a,b,c,d) TexGen eye linear plane |
| coefficients, r coord, unit n |
| state.texgen[n].eye.q (a,b,c,d) TexGen eye linear plane |
| coefficients, q coord, unit n |
| state.texgen[n].object.s (a,b,c,d) TexGen object linear plane |
| coefficients, s coord, unit n |
| state.texgen[n].object.t (a,b,c,d) TexGen object linear plane |
| coefficients, t coord, unit n |
| state.texgen[n].object.r (a,b,c,d) TexGen object linear plane |
| coefficients, r coord, unit n |
| state.texgen[n].object.q (a,b,c,d) TexGen object linear plane |
| coefficients, q coord, unit n |
| |
| Table X.5: Texture Coordinate Generation Property Bindings. "[n]" is |
| optional -- texture unit <n> is used if specified; texture unit 0 is |
| used otherwise. |
| |
| If a program parameter binding matches a set of TexGen plane coefficients, |
| the "x", "y", "z", and "w" components of the program parameter variable |
| are filled with the coefficients p1, p2, p3, and p4, respectively, for |
| object linear coefficients, and the coefficients p1', p2', p3', and p4', |
| respectively, for eye linear coefficients (section 2.10.4). |
| |
| |
| Fog Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.fog.color (r,g,b,a) RGB fog color (section 3.10) |
| state.fog.params (d,s,e,r) fog density, linear start |
| and end, and 1/(end-start) |
| (section 3.10) |
| |
| Table X.6: Fog Property Bindings |
| |
| If a program parameter binding matches "state.fog.color", the "x", "y", |
| "z", and "w" components of the program parameter variable are filled with |
| the "r", "g", "b", and "a" components, respectively, of the fog color |
| (section 3.10). |
| |
| If a program parameter binding matches "state.fog.params", the "x", "y", |
| and "z" components of the program parameter variable are filled with the |
| fog density, linear fog start, and linear fog end parameters (section |
| 3.10), respectively. The "w" component is filled with 1/(end-start), |
| where end and start are the linear fog end and start parameters, |
| respectively. |
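| |
| The precomputed 1/(end-start) term lets a program evaluate the linear |
| fog factor with a subtract and a multiply instead of a divide. The |
| Python sketch below illustrates this; both function names are |
| hypothetical, and the sketch assumes that end and start differ. |

```python
def fog_params(density, start, end):
    """Return the (x, y, z, w) value bound to state.fog.params."""
    return (density, start, end, 1.0 / (end - start))

def linear_fog_factor(params, c):
    """Linear fog factor (end - c) / (end - start), clamped to [0, 1]."""
    f = (params[2] - c) * params[3]
    return min(max(f, 0.0), 1.0)
```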
| |
| |
| Clip Plane Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.clip[n].plane (a,b,c,d) clip plane n coefficients |
| |
| Table X.7: Clip Plane Property Bindings. <n> specifies the clip plane |
| number, and is required. |
| |
| If a program parameter binding matches "state.clip[n].plane", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the coefficients p1', p2', p3', and p4', respectively, of clip plane |
| <n> (section 2.11). |
| |
| |
| Point Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.point.size (s,n,x,f) point size, min and max size |
| clamps, and fade threshold |
| (section 3.3) |
| state.point.attenuation (a,b,c,1) point size attenuation consts |
| |
| Table X.8: Point Property Bindings |
| |
| If a program parameter binding matches "state.point.size", the "x", "y", |
| "z", and "w" components of the program parameter variable are filled with |
| the point size, minimum point size, maximum point size, and fade |
| threshold, respectively (section 3.3). |
| |
| If a program parameter binding matches "state.point.attenuation", the "x", |
| "y", and "z" components of the program parameter variable are filled with |
| the constant, linear, and quadratic point size attenuation parameters (a, |
| b, and c), respectively (section 3.3). The "w" component is filled with |
| 1.0. |
| |
| |
| Texture Environment Property Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ---------------------------- |
| state.texenv[n].color (r,g,b,a) texture environment n color |
| |
| Table X.9: Texture Environment Property Bindings. "[n]" is optional -- |
| texture unit <n> is used if specified; texture unit 0 is used otherwise. |
| |
| If a program parameter binding matches "state.texenv[n].color", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the "r", "g", "b", and "a" components, respectively, of the |
| corresponding texture environment color. Note that only "legacy" texture |
| units, as queried by MAX_TEXTURE_UNITS, include texture environment state. |
| Texture image units and texture coordinate sets do not have associated |
| texture environment state. |
| |
| |
| Depth Property Bindings |
| |
| Binding Components Underlying State |
| --------------------------- ---------- ---------------------------- |
| state.depth.range (n,f,d,1) Depth range near, far, and |
| (far-near) (section 2.10.1) |
| |
| Table X.10: Depth Property Bindings |
| |
| If a program parameter binding matches "state.depth.range", the "x" and |
| "y" components of the program parameter variable are filled with the |
| mappings of near and far clipping planes to window coordinates, |
| respectively. The "z" component is filled with the difference of the |
| mappings of near and far clipping planes, far minus near. The "w" |
| component is filled with 1.0. |
| |
| |
| Matrix Property Bindings |
| |
| Binding Underlying State |
| ------------------------------------ --------------------------- |
| * state.matrix.modelview[n] modelview matrix n |
| state.matrix.projection projection matrix |
| state.matrix.mvp modelview-projection matrix |
| * state.matrix.texture[n] texture matrix n |
| state.matrix.program[n] program matrix n |
| |
| Table X.11: Base Matrix Property Bindings. The "[n]" syntax indicates |
| a specific matrix number. For modelview and texture matrices, a matrix |
| number is optional, and matrix zero will be used if the matrix number is |
| omitted. These base bindings may further be modified by an |
| inverse/transpose selector and a row selector. |
| |
| If the beginning of a program parameter binding matches any of the matrix |
| binding names listed in Table X.11, the binding corresponds to a 4x4 |
| matrix. If the parameter binding is followed by ".inverse", ".transpose", |
| or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose, |
| or transpose of the inverse, respectively, of the matrix specified in |
| Table X.11 is selected. Otherwise, the matrix specified in Table X.11 is |
| selected. If the specified matrix is poorly-conditioned (singular or |
| nearly so), its inverse matrix is undefined. The binding name |
| "state.matrix.mvp" refers to the product of modelview matrix zero and the |
| projection matrix, defined as |
| |
| MVP = P * M0, |
| |
| where P is the projection matrix and M0 is modelview matrix zero. |
| |
| If the selected matrix is followed by ".row[<a>]" (matching the |
| <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of |
| the program parameter variable are filled with the four entries of row <a> |
| of the selected matrix. In the example, |
| |
| PARAM m0 = state.matrix.modelview[1].row[0]; |
| PARAM m1 = state.matrix.projection.transpose.row[3]; |
| |
| the variable "m0" is set to the first row (row 0) of modelview matrix 1 |
| and "m1" is set to the last row (row 3) of the transpose of the projection |
| matrix. |
| |
| For program parameter array bindings, multiple rows of the selected matrix |
| can be bound via the <stateMatrixRows> grammar rule. If the selected |
| matrix binding is followed by ".row[<a>..<b>]", the result is equivalent |
| to specifying matrix rows <a> through <b>, in order. A program will fail |
| to load if <a> is greater than <b>. If no row selection is specified |
| (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order. |
| In the example, |
| |
| PARAM m2[] = { state.matrix.program[0].row[1..2] }; |
| PARAM m3[] = { state.matrix.program[0].transpose }; |
| |
| the array "m2" has two entries, containing rows 1 and 2 of program matrix |
| zero, and "m3" has four entries, containing all four rows of the transpose |
| of program matrix zero. |
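| |
| The row selection rules can be modeled with the Python sketch below, |
| which treats a matrix as a list of four rows. It is illustrative only: |
| the function names are hypothetical, and only the ".transpose" modifier |
| is modeled (".inverse" and ".invtrans" are omitted). matmul shows the |
| MVP = P * M0 product for matrices stored this way. |

```python
def transpose(m):
    """Return the transpose of a 4x4 matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Return the 4x4 product a * b, e.g. matmul(P, M0) for the MVP matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def bind_rows(matrix, modifier=None, first=0, last=3):
    """Return rows first..last of the selected matrix as a list of tuples.

    With the default first/last, all four rows are bound in order.
    """
    if modifier == "transpose":
        matrix = transpose(matrix)
    elif modifier is not None:
        raise NotImplementedError("only '.transpose' is modeled in this sketch")
    if first > last:
        raise ValueError("program fails to load: <a> is greater than <b>")
    return [tuple(row) for row in matrix[first:last + 1]]
```

| With this model, bind_rows(m, first=1, last=2) corresponds to the "m2" |
| declaration above and bind_rows(m, "transpose") to the "m3" declaration. |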
| |
| |
| Section 2.X.3.4, Program Temporaries |
| |
| Program temporary variables are used to hold temporary results during |
| program execution. Temporaries do not persist between program |
| invocations, and are undefined at the beginning of each program |
| invocation. |
| |
| Temporary variables are declared explicitly using the <TEMP_statement> |
| grammar rule. Each such statement can declare one or more temporaries. |
| Temporaries can not be declared implicitly. Temporaries can be declared |
| using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT") |
| modifier. |
| |
| Temporary variables may be declared as arrays. Temporary variables |
| declared as arrays may be stored in slower memory than those not declared |
| as arrays, and it is recommended to use non-array variables unless array |
| functionality is required. |
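| |
| The following sketch is illustrative only; the variable names are |
| hypothetical and the statements have not been validated against a |
| particular implementation. |
| |
| TEMP pos, tmp; # two temporaries, no size or type modifier |
| INT TEMP count; # temporary declared with the "INT" type modifier |
| SHORT TEMP h0; # temporary declared with the "SHORT" size modifier |
| TEMP stack[4]; # array of four temporaries; may use slower memory |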
| |
| |
| Section 2.X.3.5, Program Results |
| |
| Program result variables represent the per-vertex or per-fragment results |
| of the program. All result variables have associated bindings, are |
| write-only during program execution, and are undefined at the beginning of |
| each program invocation. Any vertex or fragment attributes corresponding |
| to unwritten result variables will be undefined in subsequent stages of |
| the pipeline. Result variables may be declared explicitly via the |
| <OUTPUT_statement> grammar rule, or implicitly by using a result binding |
| in an instruction. |
| |
| The set of available result bindings depends on the program type, and is |
| enumerated in the specifications for each program type. |
| |
| Result variables may generally be declared as arrays, but the set of |
| bindings allowed for arrays is limited to state grouped in arrays (e.g., |
| texture coordinates, clip distances, colors). Additionally, all bindings |
| assigned to the array must be of the same binding type and must increase |
| consecutively. Examples of valid and invalid binding lists for vertex |
| programs include: |
| |
| result.clip[1], result.clip[2] # valid, 2-entry array |
| result.texcoord[0..3] # valid, 4-entry array |
| result.texcoord[1], result.texcoord[3] # invalid, skipped texcoord 2 |
| result.texcoord[2], result.texcoord[1] # invalid, wrong order |
| result.texcoord[1], result.clip[2] # invalid, different types |
| |
| Additionally, result bindings may be used in no more than one array |
| addressed with relative addressing. |
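| |
| As an illustrative sketch for a vertex program (the names "tc" and "i" |
| are hypothetical, and "i.x" is assumed to hold a value from 0 to 3 |
| computed earlier in the program): |
| |
| OUTPUT tc[] = { result.texcoord[0..3] }; # valid, 4-entry array |
| INT TEMP i; |
| MOV tc[i.x], vertex.texcoord[0]; # element selected by i.x |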
| |
| Implementations may have a limit on the total number of result binding |
| components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV). |
| Programs that require more result binding components than this limit will |
| fail to load. The method of counting used result binding components is |
| implementation-dependent, but must satisfy the following properties: |
| |
| * If a result binding is not referenced in a program, or is referenced |
| only in declarations of result variables that are not used, none of |
| its components are counted. |
| |
| * A result binding component may be counted as used only if there exists |
| an instruction operand where |
| |
| - the component is enabled in the write mask (Section 2.X.4.3), and |
| |
| - the result binding is either |
| |
| - referenced directly by the operand, |
| |
| - bound to a declared variable referenced by the operand, or |
| |
| - bound to a declared array variable where another binding in |
| the array satisfies one of the two previous conditions. |
| |
| Implementations are not required to optimize out unused elements of a |
| result array or components that are used in only some elements of an |
| array. The last of these rules is intended to cover the case where |
| the same result binding is used in multiple variables. |
| |
| For example, an instruction whose write mask selects only the x |
| component may result in the x component of a result binding being |
| counted, but may never result in the counting of the y, z, or w |
| components of any result binding. |
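| |
| As an illustrative sketch of the write mask rule (the temporary "tmp" is |
| hypothetical): |
| |
| MOV result.texcoord[0].x, tmp; # may count only x of texcoord 0 |
| MOV result.texcoord[1].xy, tmp; # may count only x and y of texcoord 1 |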
| |
| |
| Section 2.X.3.6, Program Parameter Buffers |
| |
| Program parameter buffers are arrays consisting of single-component |
| typeless values or four-component typeless vectors stored in a buffer |
| object. The GL provides an implementation-dependent number of buffer |
| object binding points for each program target, to which buffer objects can |
| be attached. Program parameter buffer variables can be changed either by |
| updating the contents of bound buffer objects, or simply by changing the |
| buffer object attached to a binding point. |
| |
| Program parameter buffer variables are used as constants during program |
| execution. All program parameter buffer variables have an associated |
| binding and are read-only during program execution. Program parameter |
| buffers retain their values across program invocations, although their |
| values may change as buffer object bindings or contents change. Program |
| parameter buffer variables must be declared explicitly via the |
| <BUFFER_statement> grammar rule. Program parameter buffer bindings can |
| not be used directly in executable instructions. |
| |
| Program parameter buffer variables are treated as an array of |
| single-component values if the <bufferDeclType> grammar rule matches |
| "BUFFER" or as an array of four-component vectors if it matches "BUFFER4". |
| A program will fail to load if a variable declared as "BUFFER" and another |
| variable declared as "BUFFER4" use the same buffer binding point. |
| |
| Program parameter buffer variables may be declared as arrays, but all |
| bindings assigned to the array must use the same binding point and must |
| increase consecutively. |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ----------------------------- |
| program.buffer[a][b] (x,x,x,x) program parameter buffer a, |
| element b |
| program.buffer[a][b..c] (x,x,x,x) program parameter buffer a, |
| elements b through c |
| program.buffer[a] (x,x,x,x) program parameter buffer a, |
| all elements |
| |
| Table X.12: Program Parameter Buffer Bindings. <a> indicates a buffer |
| number, <b> and <c> indicate individual elements. |
| |
| If a program parameter buffer binding matches "program.buffer[a][b]", the |
| program parameter variable is filled with element <b> of the buffer |
| object bound to binding point <a>. Each element of the bound buffer |
| object is treated as one or four words of data that can hold integer or |
| floating-point values. When a single-component binding is evaluated, the |
| selected word is broadcast to all four components of the variable. When a |
| four-component binding is evaluated, the four components of the buffer |
| element are loaded into the variable. If no buffer object is bound to |
| binding point <a>, or the bound buffer object is not large enough to hold |
| an element <b>, the values used are undefined. The binding point <a> must |
| be a nonnegative integer constant. |
| |
| For program parameter buffer array declarations, "program.buffer[a][b..c]" |
| is equivalent to specifying elements <b> through <c> of the buffer object |
| bound to binding point <a> in order. |
| |
| For program parameter buffer array declarations, "program.buffer[a]" is |
| equivalent to specifying the entire buffer -- elements 0 through <n>-1, |
| where <n> is either the size of the array (if declared) or the |
| implementation-dependent maximum parameter buffer object size limit (if no |
| size is declared). |
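| |
| The following sketch is illustrative only; the variable names and |
| binding point numbers are hypothetical. Separate binding points are used |
| because "BUFFER" and "BUFFER4" variables may not share one. |
| |
| BUFFER scale = program.buffer[0][2]; # one word, broadcast to x,y,z,w |
| BUFFER4 bone[] = { program.buffer[1][0..3] }; # elements 0 through 3 |
| BUFFER4 all[8] = { program.buffer[2] }; # elements 0 through 7 |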
| |
| |
| Section 2.X.3.7, Program Condition Code Registers |
| |
| The program condition code registers are four-component vectors. Each |
| component of this register is a collection of single-bit flags, including |
| a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry |
| flag (CF). There are two condition code registers (CC0 and CC1), whose |
| values are undefined at the beginning of program execution. |
| |
| Most program instructions can optionally update one of the condition code |
| registers, by designating the condition code to update in the instruction. |
| When a condition code component is updated, the four flags of each |
| component of the condition code are set according to the corresponding |
| component of the instruction result. Full details on the condition code |
| updates and tests can be found in Section 2.X.4.3. |
| |
| The value of these four flags can be combined in various condition code |
| tests, which can be used to mask writes to destination variables and to |
| perform conditional branches or other condition operations. |
| |
| |
| Section 2.X.3.8, Program Aliases |
| |
| Programs can create aliases by matching the <ALIAS_statement> grammar |
| rule. Aliases allow programs to use multiple variable names to refer to a |
| single underlying variable. For example, the statement |
| |
| ALIAS var1 = var0 |
| |
| establishes a variable name of "var1". Subsequent references to "var1" in |
| the program text are treated as references to "var0". The left hand side |
| of an ALIAS statement must be a new variable name, and the right hand side |
| must be an established variable name. |
| |
| Aliases are not considered variable declarations, so do not count against |
| the limits on the number of variable declarations allowed in the program |
| text. |
| |
| |
| Section 2.X.3.9, Program Resource Limits |
| |
| (see ARB_vertex_program specification, incorporates all the different |
| limits on instruction counts, temporaries, attribute bindings, program |
| parameters, and so on) |
| |
| |
| Section 2.X.4, Program Execution Environment |
| |
| The set of instructions supported for GPU programs is given in Table X.13 |
| below and is described in detail in Section 2.X.8. An instruction can use |
| up to three operands when it executes, and most instructions can write a |
| single result vector. Instructions may also specify one or more |
| modifiers, according to the <opModifiers> grammar rule. Instruction |
| modifiers affect how the specified operation is performed. |
| |
| GPU programs may operate on signed integer, unsigned integer, or |
| floating-point values; some instructions are capable of operating on any |
| of the three types. However, the data type of the operands and the result |
| are always determined based solely on the instruction and its modifiers. |
| If any of the variables used in the instruction are typeless, they will be |
| interpreted according to the data type derived from the instruction. If |
| any variables with a conflicting data type are used in the instruction, |
| the program will fail to load unless the "NTC" (no type checking) |
| instruction modifier is specified. |
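| |
| As an illustrative sketch of these type rules (variable names are |
| hypothetical, and a temporary declared without a type modifier is |
| assumed to be typeless; the second ADD is shown only to illustrate a |
| load failure): |
| |
| INT TEMP i; |
| FLOAT TEMP f; |
| TEMP t; |
| ADD.F f, f, t; # typeless "t" is interpreted as floating-point |
| ADD.F f, f, i; # "i" conflicts with "F": program fails to load |
| ADD.F.NTC f, f, i; # type checking disabled: conflict is accepted |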
| |
| Modifiers |
| Instruction F I C S H D Out Inputs Description |
| ----------- - - - - - - --- -------- -------------------------------- |
| ABS X X X X X F v v absolute value |
| ADD X X X X X F v v,v add |
| AND - X X - - S v v,v bitwise and |
| BRK - - - - - - - c break out of loop instruction |
| CAL - - - - - - - c subroutine call |
| CEIL X X X X X F v vf ceiling |
| CMP X X X X X F v v,v,v compare |
| CONT - - - - - - - c continue with next loop iteration |
| COS X - X X X F s s cosine with reduction to [-PI,PI] |
| DIV X X X X X F v v,s divide vector components by scalar |
| DP2 X - X X X F s v,v 2-component dot product |
| DP2A X - X X X F s v,v,v 2-comp. dot product w/scalar add |
| DP3 X - X X X F s v,v 3-component dot product |
| DP4 X - X X X F s v,v 4-component dot product |
| DPH X - X X X F s v,v homogeneous dot product |
| DST X - X X X F v v,v distance vector |
| ELSE - - - - - - - - start if test else block |
| ENDIF - - - - - - - - end if test block |
| ENDREP - - - - - - - - end of repeat block |
| EX2 X - X X X F s s exponential base 2 |
| FLR X X X X X F v vf floor |
| FRC X - X X X F v v fraction |
| I2F - X X - - S vf v integer to float |
| IF - - - - - - - c start of if test block |
| KIL X X - - X F - vc kill fragment |
| LG2 X - X X X F s s logarithm base 2 |
| LIT X - X X X F v v compute lighting coefficients |
| LRP X - X X X F v v,v,v linear interpolation |
| MAD X X X X X F v v,v,v multiply and add |
| MAX X X X X X F v v,v maximum |
| MIN X X X X X F v v,v minimum |
| MOD - X X - - S v v,s modulus vector components by scalar |
| MOV X X X X X F v v move |
| MUL X X X X X F v v,v multiply |
| NOT - X X - - S v v bitwise not |
| NRM X - X X X F v v normalize 3-component vector |
| OR - X X - - S v v,v bitwise or |
| PK2H X X - - - F s vf pack two 16-bit floats |
| PK2US X X - - - F s vf pack two floats as unsigned 16-bit |
| PK4B X X - - - F s vf pack four floats as signed 8-bit |
| PK4UB X X - - - F s vf pack four floats as unsigned 8-bit |
| POW X - X X X F s s,s exponentiate |
| RCC X - X X X F s s reciprocal (clamped) |
| RCP X - X X X F s s reciprocal |
| REP X X - - X F - v start of repeat block |
| RET - - - - - - - c subroutine return |
| RFL X - X X X F v v,v reflection vector |
| ROUND X X X X X F v vf round to nearest integer |
| RSQ X - X X X F s s reciprocal square root |
| SAD - X X - - S vu v,v,vu sum of absolute differences |
| SCS X - X X X F v s sine/cosine without reduction |
| SEQ X X X X X F v v,v set on equal |
| SFL X X X X X F v v,v set on false |
| SGE X X X X X F v v,v set on greater than or equal |
| SGT X X X X X F v v,v set on greater than |
| SHL - X X - - S v v,s shift left |
| SHR - X X - - S v v,s shift right |
| SIN X - X X X F s s sine with reduction to [-PI,PI] |
| SLE X X X X X F v v,v set on less than or equal |
| SLT X X X X X F v v,v set on less than |
| SNE X X X X X F v v,v set on not equal |
| SSG X - X X X F v v set sign |
| STR X X X X X F v v,v set on true |
| SUB X X X X X F v v,v subtract |
| SWZ X - X X X F v v extended swizzle |
| TEX X X X X - F v vf texture sample |
| TRUNC X X X X X F v vf truncate (round toward zero) |
| TXB X X X X - F v vf texture sample with bias |
| TXD X X X X - F v vf,vf,vf texture sample w/partials |
| TXF X X X X - F v vs texel fetch |
| TXL X X X X - F v vf texture sample w/LOD |
| TXP X X X X - F v vf texture sample w/projection |
| TXQ - - - - - S vs vs texture info query |
| UP2H X X X X - F vf s unpack two 16-bit floats |
| UP2US X X X X - F vf s unpack two unsigned 16-bit ints |
| UP4B X X X X - F vf s unpack four signed 8-bit ints |
| UP4UB X X X X - F vf s unpack four unsigned 8-bit ints |
| X2D X - X X X F v v,v,v 2D coordinate transformation |
| XOR - X X - - S v v,v exclusive or |
| XPD X - X X X F v v,v cross product |
| |
| Table X.13: Summary of NV_gpu_program4 instructions. The "Modifiers" |
| columns specify the set of modifiers allowed for the instruction: |
| |
| F = floating-point data type modifiers |
| I = signed and unsigned integer data type modifiers |
| C = condition code update modifiers |
| S = clamping (saturation) modifiers |
| H = half-precision float data type suffix |
| D = default data type modifier (F, U, or S) |
| |
| The input and output columns describe the formats of the operands and |
| results of the instruction. |
| |
| v: 4-component vector (data type is inherited from operation) |
| vf: 4-component vector (data type is always floating-point) |
| vs: 4-component vector (data type is always signed integer) |
| vu: 4-component vector (data type is always unsigned integer) |
| s: scalar (replicated if written to a vector destination; |
| data type is inherited from operation) |
| c: condition code test result (e.g., "EQ", "GT1.x") |
| vc: 4-component vector or condition code test |
| |
| |
| Section 2.X.4.1, Program Instruction Modifiers |
| |
| There are several types of instruction modifiers available. A data type |
| modifier specifies that an instruction should operate on signed integer, |
| unsigned integer, or floating-point data, when multiple data types are |
| supported. A clamping modifier applies to instructions with |
| floating-point results, and specifies the range to which the results |
| should be clamped. A condition code update modifier specifies that the |
| instruction should update one of the condition code variables. Several |
| other special modifiers are also provided. |
| |
| Instruction modifiers may be specified as stand-alone modifiers or as |
| suffixes concatenated with the opcode name. A program will fail to load |
| if it contains an instruction that |
| |
| * specifies more than one modifier of any given type, |
| |
| * specifies a clamping modifier on an instruction, unless it produces |
| floating-point results, or |
| |
| * specifies a modifier that is not supported by the instruction (see |
| Table X.13 and the instruction description). |
| |
| Stand-alone instruction modifiers are specified according to the |
| <opModifiers> grammar rule using a ".<modifier>" syntax. Multiple |
| modifiers, separated by periods, may be specified. The set of supported |
| modifiers is described in Table X.14. |
| |
| Modifier Description |
| -------- ----------------------------------------------- |
| F Floating-point operation |
| U Fixed-point operation, unsigned operands |
| S Fixed-point operation, signed operands |
| CC Update condition code register zero |
| CC0 Update condition code register zero |
| CC1 Update condition code register one |
| SAT Floating-point results clamped to [0,1] |
| SSAT Floating-point results clamped to [-1,1] |
| NTC Disable type-checking on operands/results |
| S24 Signed multiply (24-bit operands) |
| U24 Unsigned multiply (24-bit operands) |
| HI Multiplies two 32-bit integer operands, returns |
| the 32 MSBs of the product |
| |
| Table X.14, Instruction Modifiers. |
| |
| "F", "U", and "S" modifiers are data type modifiers and specify that the |
| instruction should operate on floating-point, unsigned integer, or |
| signed integer values, respectively. For example, "ADD.F", "ADD.U", and |
| "ADD.S" specify component-wise addition of floating-point, unsigned |
| integer, or signed integer vectors, respectively. These modifiers specify |
| a data type, but do not specify a precision at which the operation is |
| performed. Floating-point operations will be carried out with an internal |
| precision no less than that used to represent the largest operand. |
| Fixed-point operations will be carried out using at least as many bits as |
| used to represent the largest operand. Operands represented with fewer |
| bits than used to perform the instruction will be promoted to a larger |
| data type. Signed integer operands will be sign-extended, where the most |
| significant bits are filled with ones if the operand is negative and zero |
| otherwise. Unsigned integer operands will be zero-extended, where the |
| most significant bits are always filled with zeroes. For some |
| instructions, the data type of some operands or the result are fixed; in |
| these cases, the data type modifier specifies the data type of the |
| remaining values. |
| |
| "CC", "CC0", and "CC1" are condition code update modifiers that specify |
| that one of the condition code registers should be updated based on the |
| result of the instruction, as described in Section 2.X.4.3. "CC" and |
| "CC0" specify that the condition code register CC0 be updated; "CC1" |
| specifies an update to CC1. If no condition code update modifier is |
| provided, the condition code registers will not be affected. |
| |
| "SAT" and "SSAT" are clamping modifiers that specify that the |
| floating-point components of the instruction result should be clamped to |
| [0,1] or [-1,1], respectively, before updating the condition code and the |
| destination variable. If no clamping suffix is specified, unclamped |
| results will be used for condition code updates (if any) and destination |
| variable writes. Clamping modifiers are not supported on instructions |
| that do not produce floating-point results. |
| |
| "NTC" (no type checking) disables data type checking on the instruction, |
| and allows instructions to use operands or result variables whose data |
| types are inconsistent with the expected data types of the instruction. |
| |
| "S24", "U24", and "HI" are special modifiers that are allowed only for the |
| MUL instruction, and are described in detail where MUL is documented. No |
| more than one such modifier may be provided for any instruction. |
| |
| If an instruction supports data type modifiers, but none is provided, a |
| default data type will be chosen based on the instruction, as specified in |
| Table X.13 and the instruction set description (Section 2.X.8). If |
| condition code update or clamping modifiers are not specified, the |
| corresponding operation will not be performed. |
| |
| Additionally, each instruction name may have one or more suffixes, |
| concatenated onto the base instruction name, that operate as instruction |
| modifiers. For conciseness, these suffixes are not spelled out in the |
| grammar -- the base opcode name is used as a placeholder for the opcode |
| and all of its possible suffixes. Instruction suffixes are provided |
| mainly for compatibility with prior GPU program instruction sets (e.g., |
| NV_vertex_program3, NV_fragment_program2, and predecessors). The set of |
| allowable suffixes, and their equivalent stand-alone modifiers, are listed |
| in Table X.15. |
| |
| Suffix Modifier Description |
| ------ ---------- --------------------------------------------------- |
| R F Floating-point operation, 32-bit precision |
| H F(*) Floating-point operation, at least 16-bit precision |
| C CC0 Update condition code register zero |
| C0 CC0 Update condition code register zero |
| C1 CC1 Update condition code register one |
| _SAT SAT Floating-point results clamped to [0,1] |
| _SSAT SSAT Floating-point results clamped to [-1,1] |
| |
| Table X.15, Instruction Suffixes. |
| |
| The "R" and "H" suffixes specify floating-point operations and are |
| equivalent to the "F" data type modifier. They additionally specify a |
| minimum precision for the operations. Instructions with an "R" precision |
| modifier will be carried out at no less than IEEE single-precision |
| floating-point (8 bits of exponent, 23 bits of mantissa). Instructions |
| with an "H" precision modifier will be carried out at no less than 16-bit |
| floating-point precision (5 bits of exponent, 10 bits of mantissa). |
| |
| An instruction may have multiple suffixes, but they must appear in order, |
| with data type suffixes first, followed by condition code update suffixes, |
| followed by clamping suffixes. For example, "ADDR" carries out an add at |
| 32-bit precision. "ADDH_SAT" carries out an add at 16-bit precision (or |
| better) and clamps the results to [0,1]. "ADDRC1_SSAT" carries out an add |
| at 32-bit floating-point precision, clamps the results to [-1,1], and |
| updates condition code one based on the clamped result. |
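| |
| As an illustrative sketch (variable names are hypothetical), the suffix |
| forms below correspond to the stand-alone modifiers of Table X.15 shown |
| in the comments; the "R" suffix additionally requests 32-bit precision: |
| |
| ADDR r, a, b; # data type as in: ADD.F r, a, b; |
| ADDC1 r, a, b; # update as in: ADD.CC1 r, a, b; |
| ADD_SAT r, a, b; # clamp as in: ADD.SAT r, a, b; |
| ADDRC1_SSAT r, a, b; # combines the F, CC1, and SSAT modifiers |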
| |
| |
| Section 2.X.4.2, Program Operands |
| |
| Most program instructions operate on one or more scalar or vector |
| operands. Each operand specifies an operand variable, which is either the |
| name of a previously declared variable or an implicit variable declaration |
| created by using a variable binding in the instruction. Attribute, |
| parameter, or parameter buffer variables can be declared implicitly by |
| using a valid binding name in an operand. Instruction operands are |
| specified by the <instOperandV>, <instOperandS>, or <instOperandVNS> |
| grammar rules. |
| |
| If the operand variable is not an array, its contents are loaded directly. |
| If the operand variable is an array, a single element of the array is |
| loaded according to the <arrayMem> grammar rule. The elements of an array |
| are numbered from 0 to <n>-1, where <n> is the number of entries in the |
| array. Array members can be accessed using either absolute or relative |
| addressing. |
| |
| Absolute array addressing is used when the <arrayMemAbs> grammar rule is |
| matched; the array member to load is specified by the matching integer. |
| Out-of-bounds absolute array accesses are not allowed. If the specified |
| member number is greater than or equal to the size of the array, the |
| program will fail to load. |
| |
| Relative array addressing is used when the <arrayMemRel> grammar rule is |
| matched. This grammar rule allows the program to specify a scalar integer |
| operand and an optional constant offset, according to the <arrayMemReg> |
| and <arrayMemOffset> grammar rules. When performing relative addressing, |
| the GL evaluates the specified integer scalar operand (according to the |
| rules specified in this section) and adds the constant offset. The array |
| member loaded is given by this sum. The constant offset is considered |
| zero if an offset is omitted. If the sum is negative or is greater than |
| or equal to the size of the array, the results of the access are |
| undefined, but may not lead to program or GL termination. The set of |
| constant offsets supported for relative addressing is limited to values |
| in the range [0,<n>-1], where <n> is the size of the array. A program |
| will fail to load if it specifies an offset outside that range. If |
| offsets outside that range are required, they can be applied by using an |
| integer ADD instruction writing to a temporary variable. |
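| |
| As an illustrative sketch of both addressing modes (variable names are |
| hypothetical, and "idx.x" is assumed to have been computed earlier in the |
| program): |
| |
| PARAM c[8] = { program.env[0..7] }; |
| INT TEMP idx, idx2; |
| TEMP r; |
| MOV r, c[3]; # absolute addressing, member 3 |
| MOV r, c[idx.x]; # relative addressing, offset of zero |
| MOV r, c[idx.x + 2]; # relative addressing, constant offset 2 |
| ADD.S idx2.x, idx.x, 12; # offset outside [0,7] applied with an ADD |
| MOV r, c[idx2.x]; |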
| |
| After the operand is loaded, its components can be rearranged according to |
| the <swizzleSuffix> grammar rule, or it can be converted to a scalar |
| operand according to the <scalarSuffix> grammar rule. |
| |
| The <swizzleSuffix> grammar rule rearranges the components of a loaded |
| vector to produce another vector. If the <swizzleSuffix> rule matches the |
| <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????" |
| is used, where each question mark is replaced with one of "x", "y", "z", |
| "w", "r", "g", "b", or "a". For such patterns, the x, y, z, and w |
| components of the operand are taken from the vector components named by |
| the first, second, third, and fourth character of the pattern, |
| respectively. Swizzle components of "r", "g", "b", and "a" are equivalent |
| to "x", "y", "z", and "w", respectively. For example, if the swizzle |
| suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0}, |
| the result is the vector {8,9,9,2}. If the <swizzleSuffix> matches the |
| <component> grammar rule, a pattern of the form ".?" is used. For this |
| pattern, all four components of the operand are taken from the single |
| component identified by the pattern. If the swizzle suffix is omitted, |
| components are not rearranged and swizzling has no effect, as though |
| ".xyzw" were specified. |
| |
| The swizzle suffix rules do not allow mixing "x", "y", "z", or "w" |
| selectors with "r", "g", "b", or "a" selectors. A program will fail to |
| load if it contains a swizzle suffix with selectors from both of these |
| sets. |
| |
| The <scalarSuffix> grammar rule converts a vector to a scalar by selecting |
| a single component. The <scalarSuffix> rule is similar to the swizzle |
| selector, except that only a single component is selected. If the scalar |
| suffix is ".y" and the specified source contains {2,8,9,0}, the value is |
| the scalar value 8. |
| |
| Next, a component-wise negate operation is performed on the operand if the |
| <operandNeg> grammar rule matches "-". Negation is not performed if the |
| operand has no sign prefix, or is prefixed with "+". For unsigned integer |
| operands, the negate operation performs a two's complement operation. |
| |
| Next, a component-wise absolute value operation is performed on the |
| operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is |
| matched, by surrounding the operand with two "|" characters. The result |
| is optionally negated if the <operandAbsNeg> grammar rule matches "-". |
| For unsigned integer operands, the absolute value operation has no effect. |
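| |
| As an illustrative sketch of the operand steps above (variable names are |
| hypothetical; "a" is assumed to be a floating-point variable containing |
| {2,-8,9,1}): |
| |
| MOV r, a.yzzx; # swizzle: r = {-8, 9, 9, 2} |
| MOV r, a.y; # single component: r = {-8,-8,-8,-8} |
| MOV r, -a; # negate: r = {-2, 8,-9,-1} |
| MOV r, |a|; # absolute value: r = { 2, 8, 9, 1} |
| MOV r, -|a|; # negated abs: r = {-2,-8,-9,-1} |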
| |
| |
| Section 2.X.4.3, Program Destination Variable Update |
| |
| Most program instructions perform computations that produce a result, |
| which will be written to a variable. Each instruction that computes a |
| result specifies a destination variable, which is either the name of a |
| previously declared variable or an implicit variable declaration created |
| by using a variable binding in the instruction. Result variables can be |
| declared implicitly by using a valid program result binding name in the |
| result portion of the instruction. Instruction results are specified |
| according to the <instResult> grammar rule. |
| |
| The destination variable may be a single member of an array. In this |
| case, a single array member is specified using the <arrayMem> grammar |
| rule, and the array member to update is computed in the exact same manner |
| as done for operand loads. If the array member is computed at run time, |
| and is negative or greater than or equal to the size of the array, the |
| results of the destination variable update are undefined and could result |
| in overwriting other program variables. |
| |
| The results of the operation may be obtained at a different precision than |
| that used to store the destination variable. If so, the results are |
| converted to match the size of the destination variable. For |
| floating-point values, the results are rounded to the nearest |
| floating-point value that can be represented in the destination variable. |
| If a result component is larger in magnitude than the largest |
| representable floating-point value in the data type of the destination |
| variable, an infinity encoding (+/-INF) is used. Signed or unsigned |
| integer values are sign-extended or zero-extended, respectively, if the |
| destination variable has more bits than the result, and have their most |
| significant bits discarded if the destination variable has fewer bits. |
| |
| Writes to individual components of a vector destination variable can be |
| controlled at compile time by individual component write masks specified |
| in the instruction. The component write mask is specified by the |
| <optWriteMask> grammar rule, and is a string of up to four characters, |
| naming the components to enable for writing. If no write mask is |
| specified, all components are enabled for writing. The characters "x", |
| "y", "z", and "w" match the x, y, z, and w components respectively. For |
| example, a write mask of ".xzw" indicates that the x, z, and w |
| components should be enabled for writing but the y component should not be |
| written. The grammar requires that the destination register mask |
| components must be listed in "xyzw" order. Additionally, write mask |
| components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and |
| "w", respectively. The grammar does not allow mixing "x", "y", "z", or |
| "w" components with "r", "g", "b", and "a" ones. |
| |
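| As an illustrative sketch of component write masks (variable names are |
| hypothetical; the last two lines show masks that the grammar rejects): |
| |
| MOV r.xzw, a; # writes x, z, and w; y is not written |
| MOV r.rba, a; # same mask written with "rgba" names |
| MOV r.zx, a; # invalid: components not in "xyzw" order |
| MOV r.xg, a; # invalid: mixes "xyzw" and "rgba" names |
| |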
| Writes to individual components of a vector destination variable, or to a |
| scalar destination variable, can also be controlled at run time using |
| condition code write masks. The condition code write mask is specified by |
| the <ccMask> grammar rule. If a mask is specified, a condition code |
| variable is loaded according to the <ccMaskRule> grammar rule and tested |
| as described in Table X.16 to produce a four-component vector of TRUE/FALSE |
| values. |
| |
| mask rule test name condition |
| --------------- ---------------------- ----------------- |
| EQ, EQ0, EQ1 equal !SF && ZF |
| GE, GE0, GE1 greater than or equal !(SF ^ OF) |
| GT, GT0, GT1 greater than (!SF ^ OF) && !ZF |
| LE, LE0, LE1 less than or equal SF ^ (ZF || OF) |
| LT, LT0, LT1 less than (SF && !ZF) ^ OF |
| NE, NE0, NE1 not equal SF || !ZF |
| FL, FL0, FL1 false always false |
| TR, TR0, TR1 true always true |
| |
| NAN, NAN0, NAN1 not a number SF && ZF |
| LEG, LEG0, LEG1 less, equal, or greater !SF || !ZF |
| (anything but a NaN) |
| |
| CF, CF0, CF1 carry flag CF |
| NCF, NCF0, NCF1 no carry flag !CF |
| OF, OF0, OF1 overflow flag OF |
| NOF, NOF0, NOF1 no overflow flag !OF |
| SF, SF0, SF1 sign flag SF |
| NSF, NSF0, NSF1 no sign flag !SF |
| AB, AB0, AB1 above CF && !ZF |
| BLE, BLE0, BLE1 below or equal !CF || ZF |
| |
| Table X.16, Condition Code Tests. The allowed rules are specified in |
| the "mask rule" column. If "0" or "1" is appended to the rule name |
| (e.g., "EQ1"), the corresponding condition code register (CC1 in this |
| example) is loaded, otherwise CC0 is loaded. After loading, each |
| component is tested, using the expression listed in the "condition" |
| column. |
| |
| After the condition code tests are performed, the four-component result |
| can be swizzled according to the <swizzleSuffix> grammar rule. Individual |
| components of the destination variable are written only if the |
| corresponding component of the swizzled condition code test result is |
| TRUE. If both a (compile-time) component write mask and a condition code |
| write mask are specified, destination variable components are written only |
| if the corresponding component is enabled in both masks. |
| |
| A program instruction can also optionally update one of the two condition |
| code registers if the "CC", "CC0", or "CC1" instruction modifier is |
| specified. These instruction modifiers update condition code register |
| CC0, CC0, or CC1, respectively. The instructions "ADD.CC" or "ADD.CC0" |
| will perform an add and update condition code zero, "ADD.CC1" will add and |
| update condition code one, and "ADD" will simply perform the add without a |
| condition code update. The components of the selected condition code |
| register are updated if and only if the corresponding component of the |
| destination variable is enabled by both write masks. For the purposes of |
| condition code update, a scalar destination variable is treated as a |
| vector where the scalar result is written to "x" (if enabled in the write |
| mask), and writes to the "y", "z", and "w" components are disabled. |
| |
| When condition code components are written, the condition code flags are |
| updated based on the corresponding component of the result. If a |
| component of the destination register is not enabled for writes, the |
| corresponding condition code component is also unchanged. |
| |
| For floating-point results, the sign flag (SF) is set if the result is |
| less than zero or is a NaN (not a number) value. The zero flag (ZF) is |
| set if the result is equal to zero or is a NaN. |
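| |
| The floating-point flag rule can be sketched in Python as follows. The |
| function name float_flags is illustrative, and only SF and ZF are modeled |
| because those are the only flags this paragraph defines for |
| floating-point results. |
| |
```python
import math

def float_flags(result):
    """Return (SF, ZF) for one floating-point result component."""
    is_nan = math.isnan(result)
    sf = is_nan or result < 0.0    # negative or NaN
    zf = is_nan or result == 0.0   # zero or NaN
    return sf, zf
```
| |
| A NaN result sets both flags, which is why the "NAN" test in Table X.16 |
| is SF && ZF. |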
| |
| For signed and unsigned integer results, the sign flag (SF) is set if the |
| most significant bit of the value written to the result variable is set |
| and the zero flag (ZF) is set if the result written is zero. For |
| instructions other than those performing an integer add or subtract (ADD, |
| MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared. |
| |
| For integer add or subtract operations, the overflow and carry flags are |
| computed by doing both signed and unsigned adds/subtracts as follows: |
| |
| The overflow flag (OF) is set by interpreting the two operands as signed |
| integers and performing a signed add or subtract. If the result is |
| representable as a signed integer (i.e., doesn't overflow), the overflow |
| flag is cleared; otherwise, it is set. |
| |
| The carry flag (CF) is set by interpreting the two operands as unsigned |
| integers and performing an unsigned add or subtract. If the result of |
| an add is representable as an unsigned integer (i.e., doesn't overflow), |
| the carry flag is cleared; otherwise, it is set. If the result of a |
| subtract is greater than or equal to zero, the carry flag is set; |
| otherwise, it is cleared. |
| |
| For the purposes of condition code setting, negation modifiers turn add |
| operations into subtracts and vice versa. If the operation is equivalent |
| to an add with both operands negated (-A-B), the carry and overflow flags |
| are both undefined. |
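| |
| A minimal Python sketch of the integer flag rules for a single component |
| follows. The 32-bit width (BITS) and the names to_signed and |
| add_sub_flags are assumptions made for illustration. Operands are passed |
| as unsigned bit patterns, and the negation-modifier cases described above |
| are not modeled. |
| |
```python
BITS = 32                      # assumed component width for this sketch
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(u):
    """Reinterpret an unsigned BITS-wide pattern as two's complement."""
    return u - (1 << BITS) if u & SIGN else u

def add_sub_flags(a, b, subtract=False):
    """Return (result, SF, ZF, OF, CF) for a + b or a - b."""
    a &= MASK
    b &= MASK
    sa, sb = to_signed(a), to_signed(b)
    if subtract:
        signed_exact = sa - sb
        unsigned_exact = a - b
        cf = unsigned_exact >= 0       # subtract: set if result >= 0
    else:
        signed_exact = sa + sb
        unsigned_exact = a + b
        cf = unsigned_exact > MASK     # add: set on unsigned overflow
    # OF is set when the exact signed result is not representable.
    of = not (-SIGN <= signed_exact <= SIGN - 1)
    result = unsigned_exact & MASK
    sf = bool(result & SIGN)           # most significant bit of the result
    zf = result == 0
    return result, sf, zf, of, cf
```
| |
| Under this model the subtract 5 - 3 sets CF and clears ZF, so it passes |
| the "AB" (above) test, while 3 - 5 sets SF with OF clear, so it passes the |
| signed "LT" test. |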
| |
| |
| Section 2.X.4.4, Program Texture Access |
| |
| Certain program instructions may access texture images, as described in |
| section 3.8. The coordinates, level-of-detail, and partial derivatives |
| used for performing the texture lookup are derived from values provided in |
| the program as described in the various sub-sections of Section 2.X.8. |
| These descriptions use the function |
| |
| result_t_vec |
| TextureSample(float_vec coord, float lod, float_vec ddx, |
| float_vec ddy, int_vec offset); |
| |
| which obtains a filtered texel value <tau> as described in Section 3.8.8 |
| and returns a 4-component vector (R,G,B,A) according to the format |
| conversions specified in Table 3.21. The result vector is interpreted as |
| floating-point, signed integer, or unsigned integer, according to the data |
| type modifier of the instruction. If the internal format of the texture |
| does not match the instruction's data type modifier, the results of the |
| texture lookup are undefined. |
| |
| (Note: For unextended OpenGL 2.0, all supported texture internal formats |
| store integer values but return floating-point results in the range [0,1] |
| on a texture lookup. The ARB_texture_float extension introduces |
| floating-point internal formats where components are both stored and |
| returned as floating-point values. The EXT_texture_integer extension |
| introduces formats that both store and return either signed or unsigned |
| integer values.) |
| |
| <coord> is a four-component floating-point vector from which the (s,t,r) |
| texture coordinates used for the texture access, the layer used for array |
| textures, and the reference value used for depth comparisons (section |
| 3.8.14) are extracted according to Table X.17. If the texture is a cube |
| map, (s,t,r) is projected to one of the six cube faces to produce a new |
| (s,t) vector according to Section 3.8.6. For array textures, the layer |
| used is derived by rounding the extracted floating-point component to the |
| nearest integer and clamping the result to the range [0,<n>-1], where <n> |
| is the number of layers in the texture. |
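| |
| The layer selection for array textures can be sketched in Python as |
| follows. The function name select_layer is illustrative. The text does |
| not say how exact ties (such as 2.5) are rounded, so the floor(x + 0.5) |
| rule used here is an assumption. |
| |
```python
import math

def select_layer(coord_component, num_layers):
    """Round to the nearest integer, then clamp to [0, num_layers - 1]."""
    nearest = math.floor(coord_component + 0.5)  # tie handling is assumed
    return max(0, min(num_layers - 1, nearest))
```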
| |
| <lod> specifies the level of detail parameter and replaces the value |
| computed in equation 3.18. <ddx> and <ddy> specify partial derivatives |
| (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture |
| coordinates, and may be used to derive footprint shapes for anisotropic |
| texture filtering. |
| |
| <offset> is a constant 3-component signed integer vector specified |
| according to the <texOffset> grammar rule, which is added to the computed |
| <u>, <v>, and <w> texel locations prior to sampling. One, two, or three |
| components may be specified in the instruction; if fewer than three are |
| specified, the remaining offset components are zero. A limited range of |
| offset values is supported; the minimum and maximum <texOffset> values |
| are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and |
| MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively. A program will fail to load: |
| |
| * if the texture target specified in the instruction is 1D, ARRAY1D, |
| SHADOW1D, or SHADOWARRAY1D, and the second or third component of the |
| offset vector is non-zero, |
| |
| * if the texture target specified in the instruction is 2D, RECT, |
| ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third |
| component of the offset vector is non-zero, |
| |
| * if the texture target is CUBE or SHADOWCUBE, and any component of the |
| offset vector is non-zero -- texel offsets are not supported for cube |
| map or buffer textures, or |
| |
| * if any component of the offset vector is less than |
| MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than |
| MAX_PROGRAM_TEXEL_OFFSET_EXT. |
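| |
| The load-time checks listed above can be sketched in Python as follows. |
| MIN_OFFSET and MAX_OFFSET are hypothetical stand-ins for the |
| implementation-dependent MIN_PROGRAM_TEXEL_OFFSET_EXT and |
| MAX_PROGRAM_TEXEL_OFFSET_EXT values, and the names USABLE_COMPONENTS and |
| offset_is_loadable are illustrative. |
| |
```python
# Hypothetical limits; real values are queried from the implementation.
MIN_OFFSET, MAX_OFFSET = -8, 7

# Number of leading offset components that may be non-zero for each target.
USABLE_COMPONENTS = {
    "1D": 1, "ARRAY1D": 1, "SHADOW1D": 1, "SHADOWARRAY1D": 1,
    "2D": 2, "RECT": 2, "ARRAY2D": 2,
    "SHADOW2D": 2, "SHADOWRECT": 2, "SHADOWARRAY2D": 2,
    "3D": 3,
    "CUBE": 0, "SHADOWCUBE": 0,   # offsets unsupported for cube maps
    "BUFFER": 0,                  # offsets unsupported for buffer textures
}

def offset_is_loadable(target, offset):
    """Return True if `offset` passes the load-time checks for `target`."""
    padded = (tuple(offset) + (0, 0, 0))[:3]   # missing components are zero
    if any(c < MIN_OFFSET or c > MAX_OFFSET for c in padded):
        return False
    usable = USABLE_COMPONENTS[target]
    return all(c == 0 for c in padded[usable:])
```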
| |
| (NOTE: Texel offsets are a new feature provided by this extension and are |
| described in more detail in edits to Section 3.8 below.) |
| |
| The texture used by TextureSample() is one of the textures bound to the |
| texture image unit whose number is specified in the instruction according |
| to the <texImageUnit> grammar rule. The texture target accessed is |
| specified according to the <texTarget> grammar rule and Table X.17. |
| Fixed-function texture enables are always ignored when determining the |
| texture to access in a program. |
| |
|                                          coordinates used |
| texTarget         Texture Type           s t r  layer  shadow |
| ----------------  ---------------------  -----  -----  ------ |
| 1D                TEXTURE_1D             x - -    -      - |
| 2D                TEXTURE_2D             x y -    -      - |
| 3D                TEXTURE_3D             x y z    -      - |
| CUBE              TEXTURE_CUBE_MAP       x y z    -      - |
| RECT              TEXTURE_RECTANGLE_ARB  x y -    -      - |
| ARRAY1D           TEXTURE_1D_ARRAY_EXT   x - -    y      - |
| ARRAY2D           TEXTURE_2D_ARRAY_EXT   x y -    z      - |
| SHADOW1D          TEXTURE_1D             x - -    -      z |
| SHADOW2D          TEXTURE_2D             x y -    -      z |
| SHADOWRECT        TEXTURE_RECTANGLE_ARB  x y -    -      z |
| SHADOWCUBE        TEXTURE_CUBE_MAP       x y z    -      w |
| SHADOWARRAY1D     TEXTURE_1D_ARRAY_EXT   x - -    y      z |
| SHADOWARRAY2D     TEXTURE_2D_ARRAY_EXT   x y -    z      w |
| BUFFER            TEXTURE_BUFFER_EXT     <not supported> |
| |
| Table X.17: Texture types accessed for each of the <texTarget>, and |
| coordinate mappings. The "SHADOW" and "ARRAY" targets are special |
| pseudo-targets described below. The "coordinates used" column indicates |
| the input values used for each coordinate of the texture lookup, the |
| layer selector for array textures, and the reference value for texture |
| comparisons. Buffer textures are not supported by normal texture lookup |
| functions, but are supported by TXF and TXQ, described below. |
| |
| Texture targets with "SHADOW" are used to access textures with a |
| DEPTH_COMPONENT base internal format using depth comparisons (Section |
| 3.8.14). Results of a texture access are undefined: |
| |
| * if a "SHADOW" target is used, and the corresponding texture has a base |
| internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE |
| of NONE, or |
| |
| * if a non-"SHADOW" target is used, and the corresponding texture has a |
| base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE |
| other than NONE. |
| |
| If the texture being accessed is not complete (or cube complete for |
| cubemap textures), no texture access is performed and the result is |
| undefined. |
| |
| A program will fail to load if it attempts to sample from multiple texture |
| targets (including the SHADOW pseudo-targets) on the same texture image |
| unit. For example, a program containing any two of the following |
| instructions will fail to load: |
| |
| TEX out, coord, texture[0], 1D; |
| TEX out, coord, texture[0], 2D; |
| TEX out, coord, texture[0], ARRAY2D; |
| TEX out, coord, texture[0], SHADOW2D; |
| TEX out, coord, texture[0], 3D; |
| |
| Additionally, multiple texture targets for a single texture image unit may |
| not be used at the same time by the GL. The error INVALID_OPERATION is |
| generated by Begin, RasterPos, or any command that performs an implicit |
| Begin if an enabled program accesses one texture target for a texture unit |
| while another enabled program or fixed-function fragment processing |
| accesses a different texture target for the same texture image unit. |
| |
| Some texture instructions use standard methods to compute partial |
| derivatives and/or the level-of-detail used to perform texture accesses. |
| For fragment programs, the functions |
| |
| float_vec ComputePartialsX(float_vec coord); |
| float_vec ComputePartialsY(float_vec coord); |
| |
| compute approximate component-wise partial derivatives of the |
| floating-point vector <coord> relative to the X and Y coordinates, |
| respectively. For vertex and geometry programs, these functions always |
| return (0,0,0,0). The function |
| |
| float ComputeLOD(float_vec ddx, float_vec ddy); |
| |
| maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx, |
| ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to |
| equation 3.18. |
| |
| The TXF instruction provides the ability to extract a single texel from a |
| specified texture image using the function |
| |
| result_t_vec TexelFetch(int_vec coord, int_vec offset); |
| |
| The extracted texel is converted to an (R,G,B,A) vector according to Table |
| 3.21. The result vector is interpreted as floating-point, signed integer, |
| or unsigned integer, according to the data type modifier of the |
| instruction. If the internal format of the texture is not compatible with |
| the instruction's data type modifier, the extracted texel value is |
| undefined. |
| |
| <coord> is a four-component signed integer vector used to identify the |
| single texel accessed. The (i,j,k) coordinates of the texel and the layer |
| used for array textures are extracted according to Table X.18. The level |
| of detail accessed is obtained by adding the w component of <coord> to the |
| base level (level_base). <offset> is a constant 3-component signed |
| integer vector added to the texel coordinates prior to the texel fetch as |
| described above. In addition to the restrictions described above, |
| non-zero offset components are also not supported for BUFFER targets. |
| |
| The texture used by TexelFetch() is specified by the image unit and target |
| parameters provided in the instruction, as for TextureSample() above. |
| Single texel fetches cannot perform depth comparisons or access cubemaps. |
| If a program contains a TXF instruction specifying one of the "SHADOW" or |
| "CUBE" targets, it will fail to load. |
| |
|                              coordinates used |
| texTarget         supported  i j k  layer  lod |
| ----------------  ---------  -----  -----  --- |
| 1D                yes        x - -    -     w |
| 2D                yes        x y -    -     w |
| 3D                yes        x y z    -     w |
| CUBE              no         - - -    -     - |
| RECT              yes        x y -    -     w |
| ARRAY1D           yes        x - -    y     w |
| ARRAY2D           yes        x y -    z     w |
| SHADOW1D          no         - - -    -     - |
| SHADOW2D          no         - - -    -     - |
| SHADOWRECT        no         - - -    -     - |
| SHADOWCUBE        no         - - -    -     - |
| SHADOWARRAY1D     no         - - -    -     - |
| SHADOWARRAY2D     no         - - -    -     - |
| BUFFER            yes        x - -    -     - |
| |
| Table X.18, Mappings of texel fetch coordinates to texel location. |
| |
| Single-texel fetches do not support LOD clamping or any texture wrap mode, |
| and require a mipmapped minification filter to access any level of detail |
| other than the base level. The results of the texel fetch are undefined: |
| |
| * if the computed LOD is less than the texture's base level (level_base) |
| or greater than the maximum level (level_max), |
| |
| * if the computed LOD is not the texture's base level and the texture's |
| minification filter is NEAREST or LINEAR, |
| |
| * if the layer specified for array textures is negative or greater than |
| or equal to the number of layers in the array texture, |
| |
| * if the (i,j,k) coordinates refer to a border texel outside |
| the defined extents of the specified LOD, where |
| |
| i < -b_s, j < -b_s, k < -b_s, |
| i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s, |
| |
| where the size parameters (w_s, h_s, d_s, and b_s) refer to the width, |
| height, depth, and border size of the image, as in equations 3.15, |
| 3.16, and 3.17, or |
| |
| * if the texture being accessed is not complete (or cube complete for |
| cubemaps). |
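| |
| The LOD and coordinate conditions above can be sketched in Python as |
| follows. The function name texel_fetch_defined and its parameters are |
| illustrative. The caller is assumed to supply the (w_s, h_s, d_s) size of |
| the selected level, listing only the dimensions the target uses, and the |
| array layer check is left out of this sketch. |
| |
```python
def texel_fetch_defined(coord_w, level_base, level_max, min_filter,
                        ijk, size, border=0, complete=True):
    """Return False when the rules above make the fetch result undefined."""
    lod = level_base + coord_w            # LOD is base level plus coord.w
    if lod < level_base or lod > level_max:
        return False
    # Levels other than the base level need a mipmapped minification filter.
    if lod != level_base and min_filter in ("NEAREST", "LINEAR"):
        return False
    # Each used coordinate must lie in [-b_s, size - b_s).
    for c, extent in zip(ijk, size):
        if c < -border or c >= extent - border:
            return False
    return complete
```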
| |
| |
| Section 2.X.5, Program Flow Control |
| |
| In addition to basic arithmetic, logical, and texture instructions, a |
| number of flow control instructions are provided, which are described in |
| detail in Section 2.X.8. Programs can contain several types of |
| instruction blocks: IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and |
| subroutine blocks. IF/ELSE/ENDIF blocks are a set of instructions |
| beginning with an "IF" instruction, ending with an "ENDIF" instruction, |
| and possibly containing an optional "ELSE" instruction. REP/ENDREP blocks |
| are a set of instructions beginning with a "REP" instruction and ending |
| with an "ENDREP" instruction. Subroutine blocks begin with an instruction |
| label identifying the name of the subroutine and ending just before the |
| next instruction label or the end of the program. Examples include the |
| following: |
| |
| MOVC CC, R0; |
| IF GT.x; |
|   MOV R0, R1; # executes if R0.x > 0 |
| ELSE; |
|   MOV R0, R2; # executes if R0.x <= 0 |
| ENDIF; |
| |
| REP repCount; |
|   ADD R0, R0, R1; |
| ENDREP; |
| |
| square: # subroutine to compute R0^2 |
|   MUL R0, R0, R0; |
|   RET; |
| main: |
|   MOV R0, 9.0; |
|   CAL square; # compute 9.0^2 in R0 |
| |
| IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and |
| inside subroutines. In all cases, each instruction block must be |
| terminated with the appropriate instruction (ENDIF for IF, ENDREP for |
| REP). Nested instruction blocks must be wholly contained within a block |
| -- if a REP instruction is found between an IF and ELSE instruction, the |
| corresponding ENDREP must also be present between the IF and ELSE. |
| Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks, |
| or inside other subroutines. A program will fail to load if any |
| instruction block is terminated by an incorrect instruction, is not |
| terminated before the block containing it, or contains an instruction |
| label. |
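| |
| The block structure rules above amount to a stack check, sketched here in |
| Python. The token representation (opcode strings, with a trailing ":" |
| marking an instruction label) and the function name |
| blocks_are_well_formed are assumptions made for illustration. Nesting |
| depth limits are not checked. |
| |
```python
def blocks_are_well_formed(tokens):
    """Return True if every IF and REP block is properly terminated."""
    stack = []                      # open blocks: "IF", "ELSE", or "REP"
    for tok in tokens:
        if tok.endswith(":"):
            if stack:               # label inside an unterminated block
                return False
        elif tok in ("IF", "REP"):
            stack.append(tok)
        elif tok == "ELSE":
            if not stack or stack[-1] != "IF":
                return False
            stack[-1] = "ELSE"      # this IF block now has its ELSE
        elif tok == "ENDIF":
            if not stack or stack.pop() not in ("IF", "ELSE"):
                return False
        elif tok == "ENDREP":
            if not stack or stack.pop() != "REP":
                return False
    return not stack                # nothing may remain open at the end
```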
| |
| IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions |
| to execute. If the condition is true, all instructions between the IF and |
| ELSE are executed. If the condition is false, all instructions between |
| the ELSE and ENDIF are executed. The ELSE instruction is optional. If |
| the ELSE is omitted, all instructions between the IF and ENDIF are |
| executed if the condition is true, or skipped if the condition is false. |
| A limited amount of nesting is supported -- a program will fail to load if |
| an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more |
| IF/ELSE/ENDIF blocks. |
| |
| REP/ENDREP blocks are used to execute a sequence of instructions multiple |
| times. The REP instruction includes an optional scalar operand to specify |
| a loop count indicating the number of times the block of instructions |
| should be repeated. If the loop count is omitted, the contents of a |
| REP/ENDREP block will be repeated indefinitely until the loop is |
| explicitly terminated. A limited amount of nesting is supported -- a |
| program will fail to load if a REP instruction is nested inside |
| MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks. |
| |
| Within a REP/ENDREP block, the CONT instruction can be used to terminate |
| the current iteration of the loop by effectively jumping to the ENDREP |
| instruction. The BRK instruction can be used to terminate the entire loop |
| by effectively jumping to the instruction immediately following the ENDREP |
| instruction. If CONT and BRK instructions are found inside multiply |
| nested REP/ENDREP blocks, they apply to the innermost block. A program |
| will fail to load if it includes a CONT or BRK instruction that is not |
| contained inside a REP/ENDREP block. |
| |
| A REP/ENDREP block without a specified loop count can result in an |
| infinite loop. To prevent obvious infinite loops, a program will fail to |
| load if it contains a REP/ENDREP block that contains neither a BRK |
| instruction at the current nesting level nor a RET instruction at any |
| nesting level. |
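| |
| The load-time check for obvious infinite loops can be sketched in Python |
| as follows. The token format ("REP" with no operand for an uncounted |
| loop, "REP n" for a counted one) and the function name |
| has_obvious_infinite_loop are illustrative. The sketch reads "current |
| nesting level" as loop nesting: a BRK inside a nested IF counts for the |
| enclosing REP, but a BRK inside a nested REP does not. That reading is an |
| assumption. |
| |
```python
def has_obvious_infinite_loop(tokens):
    """Return True if an uncounted REP block has no BRK or RET exit."""
    loops = []   # one [has_count, has_exit] entry per open REP block
    for tok in tokens:
        parts = tok.split()
        op = parts[0]
        if op == "REP":
            loops.append([len(parts) > 1, False])
        elif op == "BRK" and loops:
            loops[-1][1] = True          # BRK exits the innermost loop only
        elif op == "RET":
            for loop in loops:           # RET exits every enclosing loop
                loop[1] = True
        elif op == "ENDREP" and loops:
            has_count, has_exit = loops.pop()
            if not has_count and not has_exit:
                return True
    return False
```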
| |
| Subroutines are supported via the CAL and RET instructions. A subroutine |
| block is identified by an instruction label, which can be any valid |
| identifier according to the <instLabel> grammar rule. The CAL instruction |
| identifies a subroutine name to call according to the <instTarget> |
| grammar rule. |
| Instruction labels used in CAL instructions do not need to be defined in |
| the program text that precedes the instruction, but a program will fail to |
| load if it includes a CAL instruction that references an instruction label |
| that is not defined anywhere in the program. When a CAL instruction is |
| executed, it transfers control to the instruction immediately following |
| the specified instruction label. Subsequent instructions in that |
| subroutine are executed until a RET instruction is executed, or until |
| program execution reaches another instruction label or the end of the |
| program text. After the subroutine finishes, execution continues with the |
| instruction immediately following the CAL instruction. When a RET |
| instruction is issued, it will break out of any IF/ELSE/ENDIF or |
| REP/ENDREP blocks that contain it. |
| |
| Subroutines may call other subroutines before completing, up to an |
| implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls. |
| Subroutines may call any subroutine in the program, including themselves, |
| as long as the call depth limit is obeyed. Issuing a CAL instruction |
| while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has |
| undefined results, including possible program termination. |
| |
| Several flow control instructions include condition code tests. The IF |
| instruction requires a condition test to determine what instructions are |
| executed. The CONT, BRK, CAL, and RET instructions have an optional |
| condition code test; if the test fails, the instructions are not executed. |
| Condition code tests are specified by the <ccTest> grammar rule. The test |
| is evaluated like the condition code write mask (section 2.X.4.3), and |
| passes if and only if any of the four components passes. |
| |
| If an instruction label named "main" is specified, GPU program execution |
| begins with the instruction immediately following that label. Otherwise, |
| it begins with the first instruction of the program. Instructions are |
| executed in sequence until either a RET instruction is issued in the main |
| subroutine or the end of the program text is reached. |
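| |
| The subroutine and entry point rules of this section can be sketched as a |
| small Python interpreter. It is illustrative only: the token format, the |
| function name run, and the MAX_CALL_DEPTH value are assumptions, |
| condition code tests on CAL and RET are not modeled, and reaching a label |
| at the outermost level is assumed to end the program. |
| |
```python
MAX_CALL_DEPTH = 4  # hypothetical stand-in for MAX_PROGRAM_CALL_DEPTH_NV

def run(program):
    """Return the non-flow-control instructions in the order they execute."""
    labels = {tok[:-1]: i for i, tok in enumerate(program)
              if tok.endswith(":")}
    if "main" in labels:
        pc = labels["main"] + 1
    else:
        pc = 0
        while pc < len(program) and program[pc].endswith(":"):
            pc += 1                  # skip labels before the first instruction
    call_stack, trace = [], []
    while True:
        tok = program[pc] if pc < len(program) else None
        if tok is None or tok.endswith(":") or tok == "RET":
            # RET, a new label, or the end of the text ends the subroutine.
            if not call_stack:
                return trace         # leaving the outermost level ends the run
            pc = call_stack.pop()
        elif tok.startswith("CAL "):
            if len(call_stack) >= MAX_CALL_DEPTH:
                raise RuntimeError("call depth exceeded; results undefined")
            call_stack.append(pc + 1)
            pc = labels[tok.split()[1]] + 1
        else:
            trace.append(tok)
            pc += 1
```
| |
| Running this on the "square"/"main" example above executes the MOV in |
| main first and then the MUL in the subroutine. |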
| |
| |
| Section 2.X.6, Program Options |
| |
| Programs may specify a number of options to indicate that one or more |
| extended language features are used by the program. All program options |
| used by the program must be declared at the beginning of the program |
| string. Each program option specified in a program string will modify the |
| syntactic or semantic rules used to interpret the program and the execution |
| environment used to execute the program. Features in program options |
| not declared by the program are ignored, even if the option is otherwise |
| supported by the GL. Each option declaration consists of two tokens: the |
| keyword "OPTION" and an identifier. |
| |
| The set of available options depends on the program type, and is |
| enumerated in the specifications for each program type. Some program |
| types may not provide any options. |
| |
| |
| Section 2.X.7, Program Declarations |
| |
| Programs may include a number of declaration statements to specify |
| characteristics of the program. Each declaration statement is followed by |
| one or more arguments, separated by commas. |
| |
| The set of available declarations depends on the program type, and is |
| enumerated in the specifications for each program type. Some program |
| types may not provide declarations. |
| |
| |
| Section 2.X.8, Program Instruction Set |
| |
| The following sections enumerate the set of instructions supported for GPU |
| programs. |
| |
| Some instructions allow the use of one of the three basic data type |
| modifiers (floating point, signed integer, and unsigned integer). Unless |
| otherwise mentioned: |
| |
| * the result and all of the operands will be interpreted according to |
| the specified data type, and |
| |
| * if no data type modifier is specified, the instruction will operate as |
| though a floating-point modifier ("F") were specified. |
| |
| Some instructions will override one or both of these rules. |
| |
| |
| Section 2.X.8.Z, ABS: Absolute Value |
| |
| The ABS instruction performs a component-wise absolute value operation on |
| the single operand to yield a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = abs(tmp.x); |
| result.y = abs(tmp.y); |
| result.z = abs(tmp.z); |
| result.w = abs(tmp.w); |
| |
| ABS supports all three data type modifiers. Taking the absolute value of |
| an unsigned integer is not a useful operation, but is not illegal. |
| |
| |
| Section 2.X.8.Z, ADD: Add |
| |
| The ADD instruction performs a component-wise add of the two operands to |
| yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x + tmp1.x; |
| result.y = tmp0.y + tmp1.y; |
| result.z = tmp0.z + tmp1.z; |
| result.w = tmp0.w + tmp1.w; |
| |
| ADD supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, AND: Bitwise AND |
| |
| The AND instruction performs a bitwise AND operation on the components of |
| the two source vectors to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x & tmp1.x; |
| result.y = tmp0.y & tmp1.y; |
| result.z = tmp0.z & tmp1.z; |
| result.w = tmp0.w & tmp1.w; |
| |
| AND supports only signed and unsigned integer data type modifiers. If no |
| type modifier is specified, both operands and the result are treated as |
| signed integers. |
| |
| |
| Section 2.X.8.Z, BRK: Break out of Loop Instruction |
| |
| The BRK instruction conditionally transfers control to the instruction |
| immediately following the next ENDREP instruction. A BRK instruction has |
| no effect if the condition code test evaluates to FALSE. |
| |
| The following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   continue execution at instruction following the next ENDREP; |
| } |
| |
| |
| Section 2.X.8.Z, CAL: Subroutine Call |
| |
| The CAL instruction conditionally transfers control to the instruction |
| following the label specified in the instruction. It also pushes a |
| reference to the instruction immediately following the CAL instruction |
| onto the call stack, where execution will continue after executing the |
| matching RET instruction. The following pseudocode describes the |
| operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) { |
|     // undefined results |
|   } else { |
|     callStack[callStackDepth] = nextInstruction; |
|     callStackDepth++; |
|   } |
|   // continue execution at instruction following <instTarget> |
| } else { |
|   // do nothing |
| } |
| |
| In the pseudocode, <instTarget> is the label specified in the instruction |
| matching the <instTarget> grammar rule, <callStackDepth> is the current |
| depth of the call stack, <callStack> is an array holding the call stack, |
| and <nextInstruction> is a reference to the instruction immediately |
| following the CAL instruction in the program string. |
| |
| If the call stack overflows, the results of the CAL instruction are |
| undefined, and can result in immediate program termination. |
| |
| An instruction label signifies the beginning of a new subroutine. |
| Subroutines may not nest or overlap. If a CAL instruction is executed and |
| subsequent program execution reaches an instruction label before a |
| corresponding RET instruction is executed, the subroutine call returns |
| immediately, as though an unconditional RET instruction were inserted |
| immediately before the instruction label. |
| |
| (Note: On previous vertex program extensions -- NV_vertex_program2 and |
| NV_vertex_program3 -- instruction labels were also used as targets for |
| branch (BRA) instructions. This unstructured branching functionality has |
| been replaced with the structured branching constructs found in this |
| instruction set.) |
| |
| |
| Section 2.X.8.Z, CEIL: Ceiling |
| |
| The CEIL instruction loads a single vector operand and performs a |
| component-wise ceiling operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| iresult.x = ceil(tmp.x); |
| iresult.y = ceil(tmp.y); |
| iresult.z = ceil(tmp.z); |
| iresult.w = ceil(tmp.w); |
| |
| The ceiling operation returns the nearest integer greater than or equal to |
| the operand. For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and |
| ceil(+3.7) = +4.0. |
| |
| CEIL supports all three data type modifiers. The single operand is always |
| treated as a floating-point vector, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
| |
| |
| Section 2.X.8.Z, CMP: Compare |
| |
| The CMP instruction performs a component-wise comparison of the first |
| operand against zero, and copies the values of the second or third |
| operands based on the results of the compare. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x; |
| result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y; |
| result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z; |
| result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w; |
| |
| CMP supports all three data type modifiers. CMP with an unsigned data |
| type modifier is not a useful operation, but is not illegal. |
| |
| |
| Section 2.X.8.Z, CONT: Continue with Next Loop Iteration |
| |
| The CONT instruction conditionally transfers control to the next ENDREP |
| instruction. A CONT instruction has no effect if the condition code test |
| evaluates to FALSE. |
| |
| The following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   continue execution at the next ENDREP; |
| } |
| |
| |
| Section 2.X.8.Z, COS: Cosine with Reduction to [-PI,PI] |
| |
| The COS instruction approximates the trigonometric cosine of the angle |
| specified by the scalar operand and replicates it to all four components |
| of the result vector. The angle is specified in radians and does not have |
| to be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxCosine(tmp); |
| result.y = ApproxCosine(tmp); |
| result.z = ApproxCosine(tmp); |
| result.w = ApproxCosine(tmp); |
| |
| COS supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DDX: Partial Derivative Relative to X |
| |
| The DDX instruction computes approximate partial derivatives of a vector |
| operand with respect to the X window coordinate, and is only available to |
| fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, DDY: Partial Derivative Relative to Y |
| |
| The DDY instruction computes approximate partial derivatives of a vector |
| operand with respect to the Y window coordinate, and is only available to |
| fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, DIV: Divide Vector Components by Scalar |
| |
| The DIV instruction performs a component-wise divide of the first vector |
| operand by the second scalar operand to produce a 4-component result |
| vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x / tmp1; |
| result.y = tmp0.y / tmp1; |
| result.z = tmp0.z / tmp1; |
| result.w = tmp0.w / tmp1; |
| |
| DIV supports all three data type modifiers. For floating-point division, |
| this instruction is not guaranteed to produce results identical to a |
| RCP/MUL instruction sequence. |
| |
| The results of a signed or unsigned integer division by zero are |
| undefined. |
| |
| |
| Section 2.X.8.Z, DP2: 2-Component Dot Product |
| |
| The DP2 instruction computes a two-component dot product of the two |
| operands (using the first two components) and replicates the dot product |
| to all four components of the result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP2A: 2-Component Dot Product with Scalar Add |
| |
| The DP2A instruction computes a two-component dot product of the two |
| operands (using the first two components), adds the x component of the |
| third operand, and replicates the result to all four components of the |
| result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x; |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP2A supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP3: 3-Component Dot Product |
| |
| The DP3 instruction computes a three-component dot product of the two |
| operands (using the x, y, and z components) and replicates the dot product |
| to all four components of the result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP3 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP4: 4-Component Dot Product |
| |
| The DP4 instruction computes a four-component dot product of the two |
| operands and replicates the dot product to all four components of the |
| result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP4 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DPH: Homogeneous Dot Product |
| |
| The DPH instruction computes a three-component dot product of the two |
| operands (using the x, y, and z components), adds the w component of the |
| second operand, and replicates the sum to all four components of the |
| result vector. This is equivalent to a four-component dot product where |
| the w component of the first operand is forced to 1.0. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + tmp1.w; |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DPH supports only floating-point data type modifiers. |
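| |
| As an illustration only, the following Python sketch models the dot |
| product instructions above and checks the stated equivalence between |
| DPH and a four-component dot product. Vectors are plain 4-tuples; the |
| helper names (dp3, dp4, dph, replicate) are hypothetical and are not |
| part of this extension, and Python floats stand in for the |
| implementation's arithmetic. |

```python
def dp3(a, b):
    """Three-component dot product over x, y, z (models DP3)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dp4(a, b):
    """Four-component dot product (models DP4)."""
    return dp3(a, b) + a[3] * b[3]

def dph(a, b):
    """Homogeneous dot product: DP3 plus the w of the second operand."""
    return dp3(a, b) + b[3]

def replicate(s):
    """Replicate a scalar to all four result components."""
    return (s, s, s, s)

a = (1.0, 2.0, 3.0, 7.0)
b = (4.0, 5.0, 6.0, 0.5)
a_w1 = a[:3] + (1.0,)   # first operand with w forced to 1.0

print(replicate(dph(a, b)))
print(dph(a, b) == dp4(a_w1, b))
```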
| |
| |
| Section 2.X.8.Z, DST: Distance Vector |
| |
| The DST instruction computes a distance vector from two specially- |
| formatted operands. The first operand should be of the form [NA, d^2, |
| d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], |
| where NA values are not relevant to the calculation and d is a vector |
| length. If both vectors satisfy these conditions, the result vector will |
| be of the form [1.0, d, d^2, 1/d]. |
| |
| The exact behavior is specified in the following pseudo-code: |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = 1.0; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z; |
| result.w = tmp1.w; |
| |
| Given an arbitrary vector, d^2 can be obtained using the DP3 instruction |
| (using the same vector for both operands) and 1/d can be obtained from d^2 |
| using the RSQ instruction. |
| |
| This distance vector is useful for per-vertex light attenuation |
| calculations: a DP3 operation using the distance vector and an |
| attenuation constants vector as operands will yield the attenuation |
| factor. |
| |
| DST supports only floating-point data type modifiers. |
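| |
| The DP3/RSQ/DST/DP3 sequence described above can be sketched in Python |
| as follows. This is a model under stated assumptions, not the |
| instruction set itself: the function names are hypothetical, an exact |
| 1/sqrt stands in for the RSQ approximation, and the constants vector k |
| is assumed to hold constant, linear, and quadratic terms in x, y, z. |

```python
import math

def dp3(a, b):
    """Three-component dot product (models DP3)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dst(op0, op1):
    """Distance vector, following the DST pseudo-code component by component."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

def attenuation_terms(v, k):
    """Return the distance vector for v and its DP3 with constants k."""
    d2 = dp3(v, v)                  # d^2 from DP3 of v with itself
    inv_d = 1.0 / math.sqrt(d2)     # stands in for RSQ (exact, not approximate)
    dist = dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))
    return dist, dp3(dist, k)       # k.x * 1 + k.y * d + k.z * d^2

dist, poly = attenuation_terms((3.0, 4.0, 0.0), (1.0, 0.5, 0.25))
print(dist, poly)
```

| For the vector (3, 4, 0) the distance d is 5, so the distance vector |
| is approximately [1.0, 5.0, 25.0, 0.2]. |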
| |
| |
| Section 2.X.8.Z, ELSE: Start of If Test Else Block |
| |
| The ELSE instruction signifies the end of the "execute if true" portion of |
| an IF/ELSE/ENDIF block and the beginning of the "execute if false" |
| portion. |
| |
| If the condition evaluated at the IF statement was TRUE, when a program |
| reaches the ELSE statement, it has completed the entire "execute if true" |
| portion of the IF/ELSE/ENDIF block. Execution will continue at the |
| corresponding ENDIF instruction. |
| |
| If the condition evaluated at the IF statement was FALSE, program |
| execution would skip over the entire "execute if true" portion of the |
| IF/ELSE/ENDIF block, including the ELSE instruction. |
| |
| |
| Section 2.X.8.Z, EMIT: Emit Vertex |
| |
| The EMIT instruction emits a new vertex to be added to the current output |
| primitive generated by a geometry program, and is only available to |
| geometry programs. See the NV_geometry_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, ENDIF: End of If Test Block |
| |
| The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block. It has |
| no other effect on program execution. |
| |
| |
| Section 2.X.8.Z, ENDPRIM: End of Primitive |
| |
| A geometry program can emit multiple primitives in a single invocation. |
| The ENDPRIM instruction is used in a geometry program to signify the end |
| of the current primitive and the beginning of a new primitive of the same |
| type. It is only available to geometry programs. See the |
| NV_geometry_program4 specification for more details. |
| |
| |
| Section 2.X.8.Z, ENDREP: End of Repeat Block |
| |
| The ENDREP instruction specifies the end of a REP block. |
| |
| When used in conjunction with a REP instruction with a loop count, |
| ENDREP decrements the loop counter. If the decremented loop counter is |
| greater than zero, ENDREP transfers control to the instruction immediately |
| after the corresponding REP instruction. If the loop counter is less than |
| or equal to zero, execution continues at the instruction following the |
| ENDREP instruction. When used in conjunction with a REP instruction |
| without loop count, ENDREP always transfers control to the instruction |
| immediately after the REP instruction. |
| |
| if (REP instruction includes a loop count) { |
| LoopCount--; |
| if (LoopCount > 0) { |
| continue execution at instruction following corresponding REP |
| instruction; |
| } |
| } else { |
| continue execution at instruction following corresponding REP |
| instruction; |
| } |
| |
| |
| Section 2.X.8.Z, EX2: Exponential Base 2 |
| |
| The EX2 instruction approximates 2 raised to the power of the scalar |
| operand and replicates the approximation to all four components of the |
| result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = Approx2ToX(tmp); |
| result.y = Approx2ToX(tmp); |
| result.z = Approx2ToX(tmp); |
| result.w = Approx2ToX(tmp); |
| |
| EX2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, FLR: Floor |
| |
| The FLR instruction loads a single vector operand and performs a |
| component-wise floor operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = floor(tmp.x); |
| result.y = floor(tmp.y); |
| result.z = floor(tmp.z); |
| result.w = floor(tmp.w); |
| |
| The floor operation returns the nearest integer less than or equal to the |
| operand. For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7) |
| = +3.0. |
| |
| FLR supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
| |
| |
| Section 2.X.8.Z, FRC: Fraction |
| |
| The FRC instruction extracts the fractional portion of each component of |
| the operand to generate a result vector. The fractional portion of a |
| component is defined as the result after subtracting off the floor of the |
| component (see FLR), and is always in the range [0.0, 1.0). |
| |
| For negative values, the fractional portion is NOT the number written to |
| the right of the decimal point -- the fractional portion of -1.7 is not |
| 0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) |
| from -1.7. |
| |
| tmp = VectorLoad(op0); |
| result.x = fraction(tmp.x); |
| result.y = fraction(tmp.y); |
| result.z = fraction(tmp.z); |
| result.w = fraction(tmp.w); |
| |
| FRC supports only floating-point data type modifiers. |
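| |
| A minimal Python sketch of the fraction operation, assuming it is |
| computed exactly as described (the component minus its floor). The |
| function name frc is hypothetical. |

```python
import math

def frc(x):
    """Fractional portion: x minus floor(x), always in [0.0, 1.0)."""
    return x - math.floor(x)

for value in (-1.7, 3.7, 2.0, -0.25):
    print(value, frc(value))
```

| Note that frc(-1.7) is approximately 0.3, not 0.7, because floor(-1.7) |
| is -2.0. |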
| |
| |
| Section 2.X.8.Z, I2F: Integer to Float |
| |
| The I2F instruction converts the components of an integer vector operand |
| to floating-point to produce a floating-point result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = (float) tmp.x; |
| result.y = (float) tmp.y; |
| result.z = (float) tmp.z; |
| result.w = (float) tmp.w; |
| |
| I2F supports only signed and unsigned integer data type modifiers. The |
| single operand is interpreted according to the data type modifier. If no |
| data type modifier is specified, the operand is treated as a signed |
| integer vector. The result is always written as a float. |
| |
| |
| Section 2.X.8.Z, IF: Start of If Test Block |
| |
| The IF instruction performs a condition code test to determine what |
| instructions inside an IF/ELSE/ENDIF block are executed. If the test |
| passes, execution continues at the instruction immediately following the |
| IF instruction. If the test fails, IF transfers control to the |
| instruction immediately following the corresponding ELSE instruction (if |
| present) or the ENDIF instruction (if no ELSE is present). |
| |
| Implementations may have a limited ability to nest IF blocks in any |
| subroutine. If the number of IF/ENDIF blocks nested inside each other is |
| MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to compile. |
| |
| // Evaluate the condition. If the condition is true, continue at the |
| // next instruction. Otherwise, continue at the instruction following |
| // the corresponding ELSE (if present) or ENDIF. |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
| TestCC(cc.**c*) || TestCC(cc.***c)) { |
| continue execution at the next instruction; |
| } else if (IF block contains an ELSE statement) { |
| continue execution at instruction following corresponding ELSE; |
| } else { |
| continue execution at instruction following corresponding ENDIF; |
| } |
| |
| (Note: Unlike the NV_fragment_program2 extension, there is no run-time |
| limit on the maximum overall depth of IF/ENDIF nesting. As long as each |
| individual subroutine of the program obeys the static nesting limits, |
| there will be no run-time errors in the program. With the |
| NV_fragment_program2 extension, a program could terminate abnormally if it |
| called a subroutine inside a very deeply nested set of IF/ENDIF blocks and |
| the called subroutine also contained deeply nested IF/ENDIF blocks. Such |
| an error could occur even if neither subroutine exceeded static limits.) |
| |
| |
| Section 2.X.8.Z, KIL: Kill Fragment |
| |
| The KIL instruction conditionally kills a fragment, and is only available |
| to fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, LG2: Logarithm Base 2 |
| |
| The LG2 instruction approximates the base 2 logarithm of the scalar |
| operand and replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxLog2(tmp); |
| result.y = ApproxLog2(tmp); |
| result.z = ApproxLog2(tmp); |
| result.w = ApproxLog2(tmp); |
| |
| If the scalar operand is zero or negative, the result is undefined. |
| |
| LG2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, LIT: Compute Lighting Coefficients |
| |
| The LIT instruction accelerates lighting computations by computing |
| lighting coefficients for ambient, diffuse, and specular light |
| contributions. The "x" component of the single operand is assumed to hold |
| a diffuse dot product (n dot VP_pli, as in the vertex lighting equations |
| in Section 2.13.1). The "y" component of the operand is assumed to hold a |
| specular dot product (n dot h_i). The "w" component of the operand is |
| assumed to hold the specular exponent of the material (s_rm), and is |
| clamped to the range (-128, +128) exclusive. |
| |
| The "x" component of the result vector receives the value that should be |
| multiplied by the ambient light/material product (always 1.0). The "y" |
| component of the result vector receives the value that should be |
| multiplied by the diffuse light/material product (n dot VP_pli). The "z" |
| component of the result vector receives the value that should be |
| multiplied by the specular light/material product (f_i * (n dot h_i) ^ |
| s_rm). The "w" component of the result is the constant 1.0. |
| |
| Negative diffuse and specular dot products are clamped to 0.0, as is done |
| in the standard per-vertex lighting operations. In addition, if the |
| diffuse dot product is zero or negative, the specular coefficient is |
| forced to zero. |
| |
| tmp = VectorLoad(op0); |
| if (tmp.x < 0) tmp.x = 0; |
| if (tmp.y < 0) tmp.y = 0; |
| if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon); |
| else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon; |
| result.x = 1.0; |
| result.y = tmp.x; |
| result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0; |
| result.w = 1.0; |
| |
| Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0. |
| |
| LIT supports only floating-point data type modifiers. |
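| |
| The LIT pseudo-code can be modeled in Python as sketched below. |
| Several details are assumptions made for illustration: the value of |
| EPSILON (the text does not give one), the use of exact exponentiation |
| in place of RoughApproxPower, and the handling of a zero base with a |
| negative exponent, which is not covered by the description above. The |
| names lit and rough_power are hypothetical. |

```python
import math

EPSILON = 2.0 ** -16   # assumed value; not specified in the text

def rough_power(base, exp):
    """Stand-in for RoughApproxPower, using exact exponentiation."""
    if base == 0.0 and exp < 0.0:
        return math.inf            # assumption: this case is not defined above
    return base ** exp             # Python evaluates 0.0 ** 0.0 as 1.0

def lit(op):
    """Model of LIT: returns (ambient, diffuse, specular, 1.0) coefficients."""
    x, y, _, w = op
    x = max(x, 0.0)                # clamp negative diffuse dot product
    y = max(y, 0.0)                # clamp negative specular dot product
    limit = 128.0 - EPSILON
    w = min(max(w, -limit), limit)
    spec = rough_power(y, w) if x > 0.0 else 0.0
    return (1.0, x, spec, 1.0)

print(lit((0.5, 0.5, 0.0, 2.0)))
print(lit((-1.0, 0.9, 0.0, 8.0)))
```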
| |
| |
| Section 2.X.8.Z, LRP: Linear Interpolation |
| |
| The LRP instruction performs a component-wise linear interpolation between |
| the second and third operands using the first operand as the blend factor. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x; |
| result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y; |
| result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z; |
| result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w; |
| |
| LRP supports only floating-point data type modifiers. |
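| |
| As a small illustration in Python (the name lrp is hypothetical), note |
| the operand order: a blend factor of 1.0 selects the second operand |
| and a blend factor of 0.0 selects the third. |

```python
def lrp(f, a, b):
    """Component-wise f * a + (1 - f) * b, following the LRP pseudo-code."""
    return tuple(fi * ai + (1.0 - fi) * bi for fi, ai, bi in zip(f, a, b))

blend = (0.0, 1.0, 0.5, 0.25)
a = (10.0, 10.0, 10.0, 10.0)
b = (20.0, 20.0, 20.0, 20.0)
print(lrp(blend, a, b))
```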
| |
| |
| Section 2.X.8.Z, MAD: Multiply and Add |
| |
| The MAD instruction performs a component-wise multiply of the first two |
| operands, and then does a component-wise add of the product to the third |
| operand to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + tmp2.x; |
| result.y = tmp0.y * tmp1.y + tmp2.y; |
| result.z = tmp0.z * tmp1.z + tmp2.z; |
| result.w = tmp0.w * tmp1.w + tmp2.w; |
| |
| The multiplication and addition operations in this instruction are subject |
| to the same rules as described for the MUL and ADD instructions. |
| |
| MAD supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MAX: Maximum |
| |
| The MAX instruction computes component-wise maximums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x; |
| result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y; |
| result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z; |
| result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w; |
| |
| MAX supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MIN: Minimum |
| |
| The MIN instruction computes component-wise minimums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x; |
| result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y; |
| result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z; |
| result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w; |
| |
| MIN supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MOD: Modulus |
| |
| The MOD instruction performs a component-wise modulus operation on the first |
| vector operand by the second scalar operand to produce a 4-component result |
| vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x % tmp1; |
| result.y = tmp0.y % tmp1; |
| result.z = tmp0.z % tmp1; |
| result.w = tmp0.w % tmp1; |
| |
| MOD supports both signed and unsigned integer data type modifiers. If no |
| data type modifier is specified, both operands and the result are treated |
| as signed integers. |
| |
| A result component is undefined if the corresponding component of the |
| first operand is negative or if the second operand is less than or equal |
| to zero. |
| |
| |
| Section 2.X.8.Z, MOV: Move |
| |
| The MOV instruction copies the value of the operand to yield a result |
| vector. |
| |
| result = VectorLoad(op0); |
| |
| MOV supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MUL: Multiply |
| |
| The MUL instruction performs a component-wise multiply of the two operands |
| to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x * tmp1.x; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z * tmp1.z; |
| result.w = tmp0.w * tmp1.w; |
| |
| MUL supports all three data type modifiers. The MUL instruction |
| additionally supports three special modifiers. |
| |
| The "S24" and "U24" modifiers specify "fast" signed or unsigned integer |
| multiplies of 24-bit quantities, respectively. The results of such |
| multiplies are undefined if either operand is outside the range |
| [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24. If "S24" or "U24" is |
| specified, the data type is implied and normal data type modifiers may not |
| be provided. |
| |
| The "HI" modifier specifies a 32-bit integer multiply that returns the 32 |
| most significant bits of the 64-bit product. Integer multiplies without |
| the "HI" modifier normally return the least significant bits of the |
| product. If "HI" is specified, either of the "S" or "U" integer data type |
| modifiers must also be specified. |
| |
| Note that if condition code updates are performed on integer multiplies, |
| the overflow or carry flags are always cleared, even if the product |
| overflowed. If it is necessary to determine if the results of an integer |
| multiply overflowed, the MUL.HI instruction may be used. |
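| |
| The overflow check mentioned above can be sketched in Python for the |
| unsigned case. Python integers are unbounded, so the 32-bit results |
| are modeled with explicit masks; the function names are hypothetical. |
| The sketch assumes unsigned operands already in the range [0, 2^32-1]. |

```python
MASK32 = 0xFFFFFFFF

def mul_lo_u32(a, b):
    """Least significant 32 bits of the product (models MUL.U)."""
    return (a * b) & MASK32

def mul_hi_u32(a, b):
    """Most significant 32 bits of the 64-bit product (models MUL.HI.U)."""
    return ((a * b) >> 32) & MASK32

def mul_overflows_u32(a, b):
    """An unsigned product overflows 32 bits exactly when the high half is nonzero."""
    return mul_hi_u32(a, b) != 0

print(hex(mul_lo_u32(0x10000, 0x10000)), hex(mul_hi_u32(0x10000, 0x10000)))
print(mul_overflows_u32(65535, 65537))
```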
| |
| |
| Section 2.X.8.Z, NOT: Bitwise Not |
| |
| The NOT instruction performs a component-wise bitwise NOT operation on the |
| source vector to produce a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = ~tmp.x; |
| result.y = ~tmp.y; |
| result.z = ~tmp.z; |
| result.w = ~tmp.w; |
| |
| NOT supports only integer data type modifiers. If no type modifier is |
| specified, the operand and the result are treated as signed integers. |
| |
| |
| Section 2.X.8.Z, NRM: Normalize 3-Component Vector |
| |
| The NRM instruction normalizes the vector given by the x, y, and z |
| components of the vector operand to produce the x, y, and z components of |
| the result vector. The w component of the result is undefined. |
| |
| tmp = VectorLoad(op0); |
| scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z); |
| result.x = tmp.x * scale; |
| result.y = tmp.y * scale; |
| result.z = tmp.z * scale; |
| result.w = undefined; |
| |
| NRM supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, OR: Bitwise Or |
| |
| The OR instruction performs a bitwise OR operation on the components of |
| the two source vectors to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x | tmp1.x; |
| result.y = tmp0.y | tmp1.y; |
| result.z = tmp0.z | tmp1.z; |
| result.w = tmp0.w | tmp1.w; |
| |
| OR supports only integer data type modifiers. If no type modifier is |
| specified, both operands and the result are treated as signed integers. |
| |
| |
| Section 2.X.8.Z, PK2H: Pack Two 16-bit Floats |
| |
| The PK2H instruction converts the "x" and "y" components of the single |
| floating-point vector operand into 16-bit floating-point format, packs the |
| bit representation of these two floats into a 32-bit unsigned integer, and |
| replicates that value to all four components of the result vector. The |
| PK2H instruction can be reversed by the UP2H instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| /* result obtained by combining raw bits of tmp0.x, tmp0.y */ |
| result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| |
| PK2H supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer results, the bits can be |
| interpreted as described above. For floating-point result variables, the |
| packed results do not constitute a meaningful floating-point variable and |
| should only be used to feed future unpack instructions. |
| |
| A program will fail to load if it contains a PK2H instruction that writes |
| its results to a variable declared as "SHORT". |
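| |
| The bit layout produced by PK2H can be illustrated with Python's struct |
| module, whose "e" format is the IEEE 754 16-bit floating-point format. |
| This is a sketch only: the names pk2h and up2h are hypothetical, and |
| the rounding applied by struct when a value is not exactly |
| representable in 16 bits is an assumption that may differ from an |
| implementation's conversion. |

```python
import struct

def pk2h(x, y):
    """Pack x into bits 0-15 and y into bits 16-31 as 16-bit floats."""
    lo, = struct.unpack("<H", struct.pack("<e", x))
    hi, = struct.unpack("<H", struct.pack("<e", y))
    return lo | (hi << 16)

def up2h(bits):
    """Reverse of pk2h: recover the two 16-bit floats from a 32-bit value."""
    x, = struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))
    y, = struct.unpack("<e", struct.pack("<H", (bits >> 16) & 0xFFFF))
    return (x, y)

packed = pk2h(1.0, -2.0)
print(hex(packed), up2h(packed))
```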
| |
| |
| Section 2.X.8.Z, PK2US: Pack Two Floats as Unsigned 16-bit |
| |
| The PK2US instruction converts the "x" and "y" components of the single |
| floating-point vector operand into a packed pair of 16-bit unsigned |
| scalars. The scalars are represented in a bit pattern where all '0' bits |
| correspond to 0.0 and all '1' bits correspond to 1.0. The bit |
| representations of the two converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK2US instruction can be reversed by the UP2US |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */ |
| us.y = round(65535.0 * tmp0.y); |
| /* result obtained by combining raw bits of us. */ |
| result.x = ((us.x) | (us.y << 16)); |
| result.y = ((us.x) | (us.y << 16)); |
| result.z = ((us.x) | (us.y << 16)); |
| result.w = ((us.x) | (us.y << 16)); |
| |
| PK2US supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. |
| |
| A program will fail to load if it contains a PK2US instruction that writes |
| its results to a variable declared as "SHORT". |
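| |
| A Python sketch of the PK2US conversion follows. The name pk2us is |
| hypothetical. Python's built-in round() resolves ties to the nearest |
| even integer; treating the pseudo-code's round() the same way is an |
| assumption. |

```python
def pk2us(x, y):
    """Clamp x and y to [0, 1], scale to 16 bits, and pack into 32 bits."""
    def to_u16(v):
        v = min(max(v, 0.0), 1.0)
        return int(round(65535.0 * v))
    return to_u16(x) | (to_u16(y) << 16)

print(hex(pk2us(0.0, 1.0)))
print(hex(pk2us(1.5, -3.0)))   # out-of-range inputs are clamped first
```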
| |
| |
| Section 2.X.8.Z, PK4B: Pack Four Floats as Signed 8-bit |
| |
| The PK4B instruction converts the four components of the single |
| floating-point vector operand into 8-bit signed quantities. The signed |
| quantities are represented in a bit pattern where all '0' bits correspond |
| to -128/127 and all '1' bits correspond to +127/127. The bit |
| representations of the four converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK4B instruction can be reversed by the UP4B |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < -128/127) tmp0.x = -128/127; |
| if (tmp0.y < -128/127) tmp0.y = -128/127; |
| if (tmp0.z < -128/127) tmp0.z = -128/127; |
| if (tmp0.w < -128/127) tmp0.w = -128/127; |
| if (tmp0.x > +127/127) tmp0.x = +127/127; |
| if (tmp0.y > +127/127) tmp0.y = +127/127; |
| if (tmp0.z > +127/127) tmp0.z = +127/127; |
| if (tmp0.w > +127/127) tmp0.w = +127/127; |
| ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */ |
| ub.y = round(127.0 * tmp0.y + 128.0); |
| ub.z = round(127.0 * tmp0.z + 128.0); |
| ub.w = round(127.0 * tmp0.w + 128.0); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| PK4B supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. A program will fail to load if it contains a PK4B |
| instruction that writes its results to a variable declared as "SHORT". |
| |
| |
| Section 2.X.8.Z, PK4UB: Pack Four Floats as Unsigned 8-bit |
| |
| The PK4UB instruction converts the four components of the single |
| floating-point vector operand into a packed grouping of 8-bit unsigned |
| scalars. The scalars are represented in a bit pattern where all '0' bits |
| correspond to 0.0 and all '1' bits correspond to 1.0. The bit |
| representations of the four converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK4UB instruction can be reversed by the UP4UB |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| if (tmp0.z < 0.0) tmp0.z = 0.0; |
| if (tmp0.z > 1.0) tmp0.z = 1.0; |
| if (tmp0.w < 0.0) tmp0.w = 0.0; |
| if (tmp0.w > 1.0) tmp0.w = 1.0; |
| ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */ |
| ub.y = round(255.0 * tmp0.y); |
| ub.z = round(255.0 * tmp0.z); |
| ub.w = round(255.0 * tmp0.w); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| PK4UB supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. |
| |
| A program will fail to load if it contains a PK4UB instruction that writes |
| its results to a variable declared as "SHORT". |
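| |
| The two four-component pack operations (PK4B and PK4UB) differ only in |
| their clamp range and scaling, which the following Python sketch makes |
| explicit. The function names are hypothetical, and Python's round(), |
| which resolves ties to the nearest even integer, is assumed to be an |
| acceptable stand-in for the pseudo-code's round(). |

```python
def _pack4(parts):
    """Combine four 8-bit values; x occupies the least significant byte."""
    x, y, z, w = parts
    return x | (y << 8) | (z << 16) | (w << 24)

def pk4ub(v):
    """Clamp each component to [0, 1] and scale by 255 (models PK4UB)."""
    return _pack4([int(round(255.0 * min(max(c, 0.0), 1.0))) for c in v])

def pk4b(v):
    """Clamp each component to [-128/127, 1], scale and bias (models PK4B)."""
    lo, hi = -128.0 / 127.0, 1.0
    return _pack4([int(round(127.0 * min(max(c, lo), hi) + 128.0)) for c in v])

print(hex(pk4ub((0.0, 1.0, 0.0, 1.0))))
print(hex(pk4b((-2.0, 0.0, 1.0, 5.0))))
```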
| |
| |
| Section 2.X.8.Z, POW: Exponentiate |
| |
| The POW instruction approximates the value of the first scalar operand |
| raised to the power of the second scalar operand and replicates it to all |
| four components of the result vector. |
| |
| tmp0 = ScalarLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = ApproxPower(tmp0, tmp1); |
| result.y = ApproxPower(tmp0, tmp1); |
| result.z = ApproxPower(tmp0, tmp1); |
| result.w = ApproxPower(tmp0, tmp1); |
| |
| The exponentiation approximation function may be implemented using the |
| base 2 exponentiation and logarithm approximation operations in the EX2 |
| and LG2 instructions. In particular, |
| |
| ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)). |
| |
| Note that a logarithm may be involved even for cases where the exponent is |
| an integer. This means that it may not be possible to exponentiate |
| correctly with a negative base. In contrast, it is possible in a |
| "normal" mathematical formulation to raise negative numbers to integral |
| powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4). |
| |
| POW supports only floating-point data type modifiers. |
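| |
| The limitation with negative bases can be seen in a Python sketch of |
| the log/exp formulation. The function names are hypothetical, and |
| exact library functions stand in for the EX2 and LG2 approximations. |
| The sketch reports a failed logarithm as None; what an implementation |
| actually returns in that case is not specified here. |

```python
import math

def approx_power(a, b):
    """a raised to b, computed as 2^(b * log2(a))."""
    return 2.0 ** (b * math.log2(a))

def try_approx_power(a, b):
    """Like approx_power, but returns None when log2(a) has no real value."""
    try:
        return approx_power(a, b)
    except ValueError:
        return None

print(approx_power(2.0, 10.0))
print(try_approx_power(-3.0, 2.0))   # the logarithm of -3 fails
print((-3.0) ** 2)                   # direct exponentiation succeeds
```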
| |
| |
| Section 2.X.8.Z, RCC: Reciprocal (Clamped) |
| |
| The RCC instruction approximates the reciprocal of the scalar operand, |
| clamps the result to one of two ranges, and replicates the clamped result |
| to all four components of the result vector. |
| |
| If the approximated reciprocal is greater than 0.0, the result is clamped |
| to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater |
| than zero, the result is clamped to the range [-2^+64, -2^-64]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ClampApproxReciprocal(tmp); |
| result.y = ClampApproxReciprocal(tmp); |
| result.z = ClampApproxReciprocal(tmp); |
| result.w = ClampApproxReciprocal(tmp); |
| |
| RCC supports only floating-point data type modifiers. |
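| |
| The two clamp ranges can be sketched in Python as follows. The name |
| rcc is hypothetical, an exact division stands in for the reciprocal |
| approximation, and treating the reciprocal of a zero operand as an |
| infinity with the operand's sign is an assumption of this sketch. |

```python
import math

LO, HI = 2.0 ** -64, 2.0 ** 64

def rcc(x):
    """Reciprocal of x, clamped to [LO, HI] or [-HI, -LO] by sign."""
    recip = math.copysign(math.inf, x) if x == 0.0 else 1.0 / x
    if recip > 0.0:
        return min(max(recip, LO), HI)
    return min(max(recip, -HI), -LO)

for value in (4.0, 0.0, -0.0, 2.0 ** 100, -(2.0 ** 100)):
    print(value, rcc(value))
```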
| |
| |
| Section 2.X.8.Z, RCP: Reciprocal |
| |
| The RCP instruction approximates the reciprocal of the scalar operand and |
| replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxReciprocal(tmp); |
| result.y = ApproxReciprocal(tmp); |
| result.z = ApproxReciprocal(tmp); |
| result.w = ApproxReciprocal(tmp); |
| |
| RCP supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, REP: Start of Repeat Block |
| |
| The REP instruction begins a REP/ENDREP block. The REP instruction |
| supports an optional operand whose x component specifies the initial value |
| for the loop count. The loop count indicates the number of times the |
| instructions between the REP and corresponding ENDREP instruction will be |
| executed. If the initial value of the loop count is not positive, the |
| entire block is skipped and execution continues at the instruction |
| following the corresponding ENDREP instruction. If the loop count is |
| specified as a floating-point value, it is converted to the largest |
| integer less than or equal to the specified value (i.e., taking its |
| floor). |
| |
| If no operand is provided to REP, the loop count is ignored and the |
| corresponding ENDREP instruction unconditionally transfers control to the |
| instruction immediately following the REP instruction. The only way to |
| exit such a loop is with the BRK instruction. To prevent obvious infinite |
| loops, a program that includes a REP/ENDREP block with no loop count will |
| fail to compile unless it contains either a BRK instruction at the current |
| nesting level or a RET instruction at any nesting level. |
| |
| Implementations may have a limited ability to nest REP/ENDREP blocks. If |
| the number of REP/ENDREP blocks nested inside each other is |
| MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile. |
| |
| // Set up loop information for the new nesting level. |
| tmp = VectorLoad(op0); |
| LoopCount = floor(tmp.x); |
| if (LoopCount <= 0) { |
| continue execution at instruction following corresponding ENDREP; |
| } |
| |
| REP supports all three data type modifiers. The single operand is |
| interpreted according to the data type modifier. |
| |
| (Note: Unlike the NV_fragment_program2 extension, REP blocks in this |
| extension support fully general looping; the specified loop count can be |
| computed in the program itself. Additionally, there is no run-time limit |
| on the maximum overall depth of REP/ENDREP nesting. As long as each |
| individual subroutine of the program obeys the static nesting limits, |
| there will be no run-time errors in the program. With the |
| NV_fragment_program2 extension, a program could terminate abnormally if it |
| called a subroutine inside a deeply nested set of REP/ENDREP blocks and |
| the called subroutine also contained deeply nested REP/ENDREP blocks. |
| Such an error could occur even if neither subroutine exceeded static |
| limits.) |
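| |
| The interaction between REP and ENDREP for a block with a loop count |
| can be modeled in Python as sketched below. The function run_rep and |
| its callback are hypothetical; the sketch covers only the counted form |
| and does not model BRK, RET, or nesting limits. |

```python
import math

def run_rep(count, body):
    """Run body as a counted REP/ENDREP block; return the iteration count."""
    loop_count = math.floor(count)      # REP: floating-point counts are floored
    if loop_count <= 0:
        return 0                        # REP: skip the entire block
    iterations = 0
    while True:
        body(iterations)                # instructions between REP and ENDREP
        iterations += 1
        loop_count -= 1                 # ENDREP: decrement the loop counter
        if loop_count <= 0:
            break                       # ENDREP: fall through to next instruction
    return iterations

seen = []
print(run_rep(3.9, seen.append), seen)
print(run_rep(0.5, seen.append))
```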
| |
| |
| Section 2.X.8.Z, RET: Subroutine Return |
| |
| The RET instruction conditionally returns from a subroutine initiated by a |
| CAL instruction by popping an instruction reference off the top of the |
| call stack and transferring control to the referenced instruction. The |
| following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   if (callStackDepth <= 0) { |
|     // terminate program |
|   } else { |
|     callStackDepth--; |
|     instruction = callStack[callStackDepth]; |
|   } |
| |
|   // continue execution at <instruction> |
| } else { |
|   // do nothing |
| } |
| |
| In the pseudocode, <callStackDepth> is the depth of the call stack, |
| <callStack> is an array holding the call stack, and <instruction> is a |
| reference to an instruction previously pushed onto the call stack. |
| |
| If the call stack is empty when RET executes, the program terminates |
| normally. |
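
The RET behavior described above can be sketched as a small Python function. The name `execute_ret`, the `"TERMINATE"` sentinel, and the use of a Python list as the call stack are assumptions made for illustration; the condition-code test is reduced to a single boolean.

```python
def execute_ret(condition_passes, call_stack):
    """Model one RET instruction (hypothetical helper).

    Returns the instruction reference to resume at, None when the
    condition fails, or "TERMINATE" when the call stack is empty.
    """
    if not condition_passes:
        return None           # condition failed: RET does nothing
    if not call_stack:
        return "TERMINATE"    # empty stack: the program ends normally
    return call_stack.pop()   # resume at the most recently pushed reference

stack = [10, 20]              # references pushed by two nested CAL instructions
print(execute_ret(True, stack))   # 20
print(execute_ret(True, []))      # TERMINATE
```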
| |
| |
| Section 2.X.8.Z, RFL: Reflection Vector |
| |
| The RFL instruction computes the reflection of the second vector operand |
| (the "direction" vector) about the vector specified by the first vector |
| operand (the "axis" vector). Both operands are treated as 3D vectors (the |
| w components are ignored). The result vector is another 3D vector (the |
| "reflected direction" vector). The length of the result vector, ignoring |
| rounding errors, should equal that of the second operand. |
| |
| axis = VectorLoad(op0); |
| direction = VectorLoad(op1); |
| tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z); |
| tmp.x = (axis.x * direction.x + axis.y * direction.y + |
| axis.z * direction.z); |
| tmp.x = 2.0 * tmp.x; |
| tmp.x = tmp.x / tmp.w; |
| result.x = tmp.x * axis.x - direction.x; |
| result.y = tmp.x * axis.y - direction.y; |
| result.z = tmp.x * axis.z - direction.z; |
| result.w = undefined; |
| |
| RFL supports only floating-point data type modifiers. |
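
As a rough illustration, the RFL pseudocode translates directly into Python. The function name `rfl` is hypothetical, operands are modeled as 4-tuples, and the undefined w component is simply omitted from the result. A zero-length axis, which the pseudocode would divide by, raises `ZeroDivisionError` in this sketch.

```python
import math

def rfl(axis, direction):
    """Reflect `direction` about `axis`; w components are ignored."""
    ax, ay, az = axis[:3]
    dx, dy, dz = direction[:3]
    # 2 * (axis . direction) / (axis . axis); the axis need not be normalized.
    scale = 2.0 * (ax * dx + ay * dy + az * dz) / (ax * ax + ay * ay + az * az)
    return (scale * ax - dx, scale * ay - dy, scale * az - dz)

r = rfl((0.0, 2.0, 0.0, 1.0), (1.0, 1.0, 0.0, 1.0))
print(r)  # (-1.0, 1.0, 0.0)
```

In this example the reflected vector has the same length as the direction vector (the square root of 2), as the text above states.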
| |
| |
| Section 2.X.8.Z, ROUND: Round to Nearest Integer |
| |
| The ROUND instruction loads a single vector operand and performs a |
| component-wise round operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = round(tmp.x); |
| result.y = round(tmp.y); |
| result.z = round(tmp.z); |
| result.w = round(tmp.w); |
| |
| The round operation returns the nearest integer to the operand. If the |
| fractional portion of the operand is 0.5, round() selects the nearest even |
| integer. For example, round(-1.7) = -2.0, round(+1.0) = +1.0, |
| round(+3.7) = +4.0, and round(+2.5) = +2.0. |
| |
| ROUND supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
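
A minimal Python sketch of these rules follows. Python's built-in `round()` already rounds halfway cases to the nearest even integer, so it serves as a stand-in for the spec's round(). The function name `round_op`, the single-letter modifier keys, the 32-bit integer ranges, and the use of `None` for an undefined result are all assumptions of this example; non-finite inputs are not handled.

```python
def round_op(x, modifier="F"):
    """Model one component of ROUND (hypothetical helper).

    modifier: "F" float result, "S" signed integer, "U" unsigned integer.
    """
    nearest = round(x)  # round-half-to-even on floats, returns an int
    if modifier == "F":
        return float(nearest)
    # Assumed 32-bit integer ranges.
    lo, hi = (-2**31, 2**31 - 1) if modifier == "S" else (0, 2**32 - 1)
    if not lo <= nearest <= hi:
        return None  # not representable: the spec leaves the result undefined
    return nearest

print(round_op(2.5), round_op(3.5))  # 2.0 4.0
print(round_op(-1.7, "S"))           # -2
print(round_op(-1.7, "U"))           # None
```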
| |
| |
| Section 2.X.8.Z, RSQ: Reciprocal Square Root |
| |
| The RSQ instruction approximates the reciprocal of the square root of the |
| scalar operand and replicates it to all four components of the result |
| vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxRSQRT(tmp); |
| result.y = ApproxRSQRT(tmp); |
| result.z = ApproxRSQRT(tmp); |
| result.w = ApproxRSQRT(tmp); |
| |
| If the operand is less than or equal to zero, the results of the |
| instruction are undefined. |
| |
| RSQ supports only floating-point data type modifiers. |
| |
| Note that this instruction differs from the RSQ instruction in |
| ARB_vertex_program in that it does not implicitly take the absolute value |
| of its operand. The |abs| operator can be used to achieve equivalent |
| semantics. |
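
The difference from the ARB_vertex_program behavior can be sketched as follows. Both function names are hypothetical, `None` stands in for an undefined result, and the exact `math.sqrt` is used where hardware would compute an approximation.

```python
import math

def rsq(x):
    """Model RSQ as described here: no implicit absolute value."""
    if x <= 0.0:
        return None  # undefined for operands <= 0
    return 1.0 / math.sqrt(x)

def rsq_with_abs(x):
    """Model RSQ with the |abs| operator applied to the operand first."""
    return rsq(abs(x))

print(rsq(4.0))            # 0.5
print(rsq(-4.0))           # None
print(rsq_with_abs(-4.0))  # 0.5
```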
| |
| |
| Section 2.X.8.Z, SAD: Sum of Absolute Differences |
| |
| The SAD instruction performs a component-wise difference of the first two |
| integer operands (subtracting the second from the first), and then does a |
| component-wise add of the absolute value of the difference to the third |
| unsigned integer operand to yield an unsigned integer result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = abs(tmp0.x - tmp1.x) + tmp2.x; |
| result.y = abs(tmp0.y - tmp1.y) + tmp2.y; |
| result.z = abs(tmp0.z - tmp1.z) + tmp2.z; |
| result.w = abs(tmp0.w - tmp1.w) + tmp2.w; |
| |
| SAD supports signed and unsigned integer data type modifiers. The first |
| two operands are interpreted according to the data type modifier. The |
| third operand and the result are always unsigned integers. |
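
A short Python sketch of the SAD computation is shown below. The name `sad` is hypothetical, and the example uses Python's unbounded integers, so it does not model what happens when a result exceeds the range of the hardware's unsigned integer type.

```python
def sad(op0, op1, op2):
    """Component-wise |op0 - op1| + op2 over 4-component integer tuples."""
    return tuple(abs(a - b) + c for a, b, c in zip(op0, op1, op2))

print(sad((10, 3, 0, 7), (4, 9, 0, 7), (1, 1, 1, 1)))  # (7, 7, 1, 1)
```

Feeding the result back in as the third operand of a later SAD accumulates a running sum of absolute differences.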
| |
| |
| Section 2.X.8.Z, SCS: Sine/Cosine without Reduction |
| |
| The SCS instruction approximates the trigonometric sine and cosine of the |
| angle specified by the scalar operand and places the cosine in the x |
| component and the sine in the y component of the result vector. The z and |
| w components of the result vector are undefined. The angle is specified |
| in radians and must be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxCosine(tmp); |
| result.y = ApproxSine(tmp); |
| result.z = undefined; |
| result.w = undefined; |
| |
| If the scalar operand is not in the range [-PI,PI], the result vector is |
| undefined. |
| |
| SCS supports only floating-point data type modifiers. |
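
For illustration, the SCS result layout and range restriction might be modeled like this. The name `scs` is hypothetical, `None` stands in for an undefined result vector, and the undefined z and w components are left out of the returned tuple.

```python
import math

def scs(angle):
    """Return (cosine, sine) for an angle in [-PI, PI], else None."""
    if not -math.pi <= angle <= math.pi:
        return None  # out of range: the result vector is undefined
    return (math.cos(angle), math.sin(angle))  # x = cosine, y = sine

print(scs(0.0))  # (1.0, 0.0)
print(scs(4.0))  # None, since 4.0 > PI
```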
| |
| |
| Section 2.X.8.Z, SEQ: Set on Equal |
| |
| The SEQ instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE; |
| |
| SEQ supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
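
The TRUE and FALSE encodings can be summarized in a small Python sketch. The names `TRUE_FALSE` and `set_on` and the single-letter modifier keys are hypothetical, and the unsigned TRUE value assumes 32-bit integers.

```python
import operator

# (TRUE, FALSE) pairs per data type; the "U" entry assumes 32-bit integers.
TRUE_FALSE = {
    "F": (1.0, 0.0),
    "S": (-1, 0),
    "U": (0xFFFFFFFF, 0),
}

def set_on(compare, op0, op1, modifier):
    """Apply a component-wise comparison and encode each result."""
    true_val, false_val = TRUE_FALSE[modifier]
    return tuple(true_val if compare(a, b) else false_val
                 for a, b in zip(op0, op1))

print(set_on(operator.eq, (1, 2, 3, 4), (1, 0, 3, 0), "S"))  # (-1, 0, -1, 0)
```

Under the 32-bit assumption, the signed TRUE value (-1) and the unsigned TRUE value have the same bit pattern, so an integer result can be used directly as a bit mask. Passing a different comparison such as `operator.ge` or `operator.lt` models the other "Set on" instructions in the same way.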
| |
| |
| Section 2.X.8.Z, SFL: Set on False |
| |
| The SFL instruction is a degenerate case of the other "Set on" |
| instructions that sets all components of the result vector to a FALSE |
| value (described below). |
| |
| result.x = FALSE; |
| result.y = FALSE; |
| result.z = FALSE; |
| result.w = FALSE; |
| |
| SFL supports all data type modifiers. For floating-point data types, the |
| FALSE value is 0.0. For signed and unsigned integer data types, the FALSE |
| value is zero. |
| |
| |
| Section 2.X.8.Z, SGE: Set on Greater Than or Equal |
| |
| The SGE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| greater than or equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE; |
| |
| SGE supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SGT: Set on Greater Than |
| |
| The SGT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| greater than that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE; |
| |
| SGT supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SHL: Shift Left |
| |
| The SHL instruction performs a component-wise left shift of the bits of |
| the first operand by the value of the second scalar operand to produce a |
| result vector. The bits vacated during the shift operation are filled |
| with zeroes. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x << tmp1; |
| result.y = tmp0.y << tmp1; |
| result.z = tmp0.z << tmp1; |
| result.w = tmp0.w << tmp1; |
| |
| The results of a shift operation ("<<") are undefined if the value of the |
| second operand is negative, or greater than or equal to the number of bits |
| in the first operand. |
| |
| SHL supports both signed and unsigned integer data type modifiers. If no |
| modifier is provided, the operands and the result are treated as signed |
| integers. |
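
A minimal sketch of SHL in Python follows. The name `shl` and the `bits=32` default are assumptions; components are returned as raw bit patterns of that width, and `None` stands in for the undefined result of an out-of-range shift count.

```python
def shl(op0, shift, bits=32):
    """Shift each component left by the same scalar count."""
    if shift < 0 or shift >= bits:
        return None  # undefined per the text above
    mask = (1 << bits) - 1
    # Bits shifted past the top are discarded; vacated low bits are zero.
    return tuple((c << shift) & mask for c in op0)

print(shl((1, 2, 3, 0x80000000), 1))  # (2, 4, 6, 0)
print(shl((1, 1, 1, 1), 32))          # None
```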
| |
| |
| Section 2.X.8.Z, SHR: Shift Right |
| |
| The SHR instruction performs a component-wise right shift of the bits of |
| the first operand by the value of the second scalar operand to produce a |
| result vector. The bits vacated during the shift operation are filled with |
| zeros if the corresponding component of the first operand is non-negative |
| and with ones otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x >> tmp1; |
| result.y = tmp0.y >> tmp1; |
| result.z = tmp0.z >> tmp1; |
| result.w = tmp0.w >> tmp1; |
| |
| The results of a shift operation (">>") are undefined if the value of the |
| second operand is negative, or greater than or equal to the number of bits |
| in the first operand. |
| |
| SHR supports both signed and unsigned integer data type modifiers. If no |
| modifier is provided, the operands and the result are treated as signed |
| integers. |
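
The fill behavior for signed and unsigned operands can be sketched in Python as follows. The name `shr`, the `signed` flag, and the `bits=32` default are assumptions of this example, and `None` stands in for an undefined result. Python's `>>` on a negative integer already fills with ones, which matches the signed case.

```python
def shr(op0, shift, signed=True, bits=32):
    """Shift each component right by the same scalar count."""
    if shift < 0 or shift >= bits:
        return None  # undefined per the text above
    mask = (1 << bits) - 1
    if signed:
        # Arithmetic shift: negative components are filled with ones.
        return tuple(c >> shift for c in op0)
    # Unsigned: treat components as bit patterns, filled with zeros.
    return tuple((c & mask) >> shift for c in op0)

print(shr((-8, 8, -1, 0), 1))                        # (-4, 4, -1, 0)
print(shr((0xFFFFFFF8, 8, 0, 0), 1, signed=False))   # (2147483644, 4, 0, 0)
```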
| |
| |
| Section 2.X.8.Z, SIN: Sine with Reduction to [-PI,PI] |
| |
| The SIN instruction approximates the trigonometric sine of the angle |
| specified by the scalar operand and replicates it to all four components |
| of the result vector. The angle is specified in radians and does not have |
| to be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxSine(tmp); |
| result.y = ApproxSine(tmp); |
| result.z = ApproxSine(tmp); |
| result.w = ApproxSine(tmp); |
| |
| SIN supports only floating-point data type modifiers. |
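
One way to picture the range reduction named in the section title is sketched below. The specification does not say how an implementation performs the reduction; this example simply uses `math.remainder`, which maps an angle to an equivalent one in [-PI, PI]. The names `reduce_angle` and `sin_op` are hypothetical.

```python
import math

def reduce_angle(angle):
    """Map an angle in radians to an equivalent angle in [-PI, PI]."""
    return math.remainder(angle, 2.0 * math.pi)

def sin_op(angle):
    """Model SIN: the sine is replicated to all four components."""
    s = math.sin(reduce_angle(angle))
    return (s, s, s, s)

reduced = reduce_angle(7.0)
print(-math.pi <= reduced <= math.pi)  # True
```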
| |
| |
| Section 2.X.8.Z, SLE: Set on Less Than or Equal |
| |
| The SLE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| less than or equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE; |
| |
| SLE supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SLT: Set on Less Than |
| |
| The SLT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| less than that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE; |
| |
| SLT supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer d
|