| Name |
| |
| NV_gpu_program4 |
| |
| Name Strings |
| |
| GL_NV_gpu_program4 |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| |
| Status |
| |
| Shipping for GeForce 8 Series (November 2006) |
| |
| Version |
| |
| Last Modified Date: 09/11/2014 |
| NVIDIA Revision: 11 |
| |
| Number |
| |
| 322 |
| |
| Dependencies |
| |
| This extension is written against the OpenGL 2.0 specification. |
| |
| OpenGL 2.0 is not required, but we expect all implementations of this |
| extension will also support OpenGL 2.0. |
| |
| This extension is also written against the ARB_vertex_program |
| specification, which provides the basic mechanisms for the assembly |
| programming model used by this extension. |
| |
| This extension serves as the basis for the NV_fragment_program4, |
| NV_geometry_program4, and NV_vertex_program4 extensions, which all build |
| on this extension to support fragment, geometry, and vertex programs, |
| respectively. If "GL_NV_gpu_program4" is found in the extension string, |
| all of these extensions are supported. |
| |
| NV_parameter_buffer_object affects the definition of this extension. |
| |
| ARB_texture_rectangle trivially affects the definition of this extension. |
| |
| EXT_gpu_program_parameters trivially affects the definition of this |
| extension. |
| |
| EXT_texture_integer trivially affects the definition of this extension. |
| |
| EXT_texture_array trivially affects the definition of this extension. |
| |
| EXT_texture_buffer_object trivially affects the definition of this |
| extension. |
| |
| NV_primitive_restart trivially affects the definition of this extension. |
| |
| Overview |
| |
| This specification documents the common instruction set and basic |
| functionality provided by NVIDIA's 4th generation of assembly instruction |
| sets supporting programmable graphics pipeline stages. |
| |
| The instruction set builds upon the basic framework provided by the |
| ARB_vertex_program and ARB_fragment_program extensions to expose |
| considerably more capable hardware. In addition to new capabilities for |
| vertex and fragment programs, this extension provides a new program type |
| (geometry programs) further described in the NV_geometry_program4 |
| specification. |
| |
| NV_gpu_program4 provides a unified instruction set -- all instruction set |
| features are available for all program types, except for a small number of |
| features that make sense only for a specific program type. It provides |
| fully capable signed and unsigned integer data types, along with a set of |
| arithmetic, logical, and data type conversion instructions capable of |
| operating on integers. It also provides a uniform set of structured |
| branching constructs (if tests, loops, and subroutines) that fully support |
| run-time condition testing. |
| |
| This extension provides several new texture mapping capabilities. Shadow |
| cube maps are supported, where cube map faces can encode depth values. |
| Texture lookup instructions can include an immediate texel offset, which |
| can assist in advanced filtering. New instructions are provided to fetch |
| a single texel by address in a texture map (TXF) and query the size of a |
| specified texture level (TXQ). |
| |
| By and large, vertex and fragment programs written to ARB_vertex_program |
| and ARB_fragment_program can be ported directly by simply changing the |
| program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or |
| "!!NVfp4.0", and then modifying the code to take advantage of the expanded |
| feature set. There are a small number of areas where this extension is |
| not a functional superset of previous vertex program extensions, which are |
| documented in this specification. |
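| |
| The header change described above can be sketched as a small lookup. |
| The function name nv4_header_for is hypothetical and is not part of |
| any GL API; it only pairs the header strings quoted in this overview. |

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GL entry point): maps an ARB program header
 * to the NV_gpu_program4-family header named in the overview.  Returns
 * NULL for any header it does not recognize. */
const char *nv4_header_for(const char *arb_header)
{
    if (strcmp(arb_header, "!!ARBvp1.0") == 0)
        return "!!NVvp4.0";
    if (strcmp(arb_header, "!!ARBfp1.0") == 0)
        return "!!NVfp4.0";
    return NULL;
}
```

| Swapping the header is only the first step; the program body may still |
| need changes in the areas where this extension is not a superset. |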
| |
| |
| New Procedures and Functions |
| |
| void ProgramLocalParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramLocalParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramLocalParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramLocalParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramLocalParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| void ProgramLocalParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| void ProgramEnvParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramEnvParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramEnvParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramEnvParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramEnvParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| void ProgramEnvParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| void GetProgramLocalParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramLocalParameterIuivNV(enum target, uint index, |
| uint *params); |
| void GetProgramEnvParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramEnvParameterIuivNV(enum target, uint index, |
| uint *params); |
| |
| New Tokens |
| |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, |
| GetFloatv, and GetDoublev: |
| |
| MIN_PROGRAM_TEXEL_OFFSET_EXT 0x8904 |
| MAX_PROGRAM_TEXEL_OFFSET_EXT 0x8905 |
| |
| (note: these tokens are shared with the EXT_gpu_shader4 extension.) |
| |
| Accepted by the <pname> parameter of GetProgramivARB: |
| |
| PROGRAM_ATTRIB_COMPONENTS_NV 0x8906 |
| PROGRAM_RESULT_COMPONENTS_NV 0x8907 |
| MAX_PROGRAM_ATTRIB_COMPONENTS_NV 0x8908 |
| MAX_PROGRAM_RESULT_COMPONENTS_NV 0x8909 |
| MAX_PROGRAM_GENERIC_ATTRIBS_NV 0x8DA5 |
| MAX_PROGRAM_GENERIC_RESULTS_NV 0x8DA6 |
| |
| Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation) |
| |
| (Modify "Section 2.14.1" of the ARB_vertex_program specification, |
| describing program parameters.) |
| |
| Each program object has an associated array of program local parameters. |
| Program local parameters are four-component vectors whose components can |
| hold floating-point, signed integer, or unsigned integer values. The data |
| type of each local parameter is established when the parameter's values |
| are assigned. If a program attempts to read a local parameter using a |
| data type other than the one used when the parameter is set, the values |
| returned are undefined. ... The commands |
| |
| void ProgramLocalParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramLocalParameter4fvARB(enum target, uint index, |
| const float *params); |
| void ProgramLocalParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramLocalParameter4dvARB(enum target, uint index, |
| const double *params); |
| |
| void ProgramLocalParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramLocalParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramLocalParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramLocalParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| |
| update the values of the program local parameter numbered <index> |
| belonging to the program object currently bound to <target>. For the |
| non-vector versions of these commands, the four components of the |
| parameter are updated with the values of <x>, <y>, <z>, and <w>, |
| respectively. For the vector versions, the components of the parameter |
| are updated with the array of four values pointed to by <params>. The |
| error INVALID_VALUE is generated if <index> is greater than or equal to |
| the number of program local parameters supported by <target>. |
| |
| The commands |
| |
| void ProgramLocalParameters4fvNV(enum target, uint index, |
| sizei count, const float *params); |
| void ProgramLocalParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramLocalParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| update the values of the program local parameters numbered <index> through |
| <index> + <count> - 1 with the array of 4 * <count> values pointed to by |
| <params>. The error INVALID_VALUE is generated if the sum of <index> and |
| <count> is greater than the number of program local parameters supported |
| by <target>. |
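| |
| The range rule for the multi-parameter commands can be modeled as |
| follows. This is a sketch, not driver code: param_range_ok and |
| max_params are hypothetical names, with max_params standing in for the |
| number of program local parameters supported by <target>. |

```c
#include <stdbool.h>

/* Hypothetical model of the range check; not a GL entry point.
 * Returns true when parameters index .. index + count - 1 are all in
 * range, and false when INVALID_VALUE would be generated. */
bool param_range_ok(unsigned int index, int count, unsigned int max_params)
{
    /* Assumption: a negative count is rejected, following the general GL
     * rule for sizei arguments; the text above does not restate it. */
    if (count < 0)
        return false;
    /* Avoid forming index + count directly, since it could wrap around. */
    if (index > max_params)
        return false;
    return (unsigned int)count <= max_params - index;
}
```

| Note that the comparison is "greater than", so <index> equal to the |
| parameter count is accepted in this model when <count> is zero. |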
| |
| When a program local parameter is updated, the data type of its components |
| is assigned according to the data type of the provided values. If values |
| provided are of type "float" or "double", the components of the parameter |
| are floating-point. If the values provided are of type "int", the |
| components of the parameter are signed integers. If the values provided |
| are of type "uint", the components of the parameter are unsigned integers. |
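| |
| The typing rule above can be illustrated with a small model. This is a |
| sketch under stated assumptions: ProgramParam, ParamType, and the |
| param_* functions are hypothetical and only mimic the behavior of the |
| corresponding GL commands named in the comments. |

```c
#include <stdbool.h>

/* Hypothetical model of one four-component program parameter.  The tag
 * records which family of commands last wrote the parameter. */
typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } ParamType;

typedef struct {
    ParamType type;
    union {
        float        f[4];
        int          i[4];
        unsigned int u[4];
    } v;
} ProgramParam;

/* Models ProgramLocalParameter4fARB: components become floating-point. */
void param_set_4f(ProgramParam *p, float x, float y, float z, float w)
{
    p->type = PARAM_FLOAT;
    p->v.f[0] = x; p->v.f[1] = y; p->v.f[2] = z; p->v.f[3] = w;
}

/* Models ProgramLocalParameterI4iNV: components become signed integers. */
void param_set_4i(ProgramParam *p, int x, int y, int z, int w)
{
    p->type = PARAM_INT;
    p->v.i[0] = x; p->v.i[1] = y; p->v.i[2] = z; p->v.i[3] = w;
}

/* Models ProgramLocalParameterI4uiNV: components become unsigned. */
void param_set_4ui(ProgramParam *p, unsigned int x, unsigned int y,
                   unsigned int z, unsigned int w)
{
    p->type = PARAM_UINT;
    p->v.u[0] = x; p->v.u[1] = y; p->v.u[2] = z; p->v.u[3] = w;
}

/* True when a program reading the parameter as read_type gets defined
 * values, i.e. when read_type matches the type used to set it. */
bool param_read_is_defined(const ProgramParam *p, ParamType read_type)
{
    return p->type == read_type;
}
```

| In this model, setting a parameter again with a different command |
| simply replaces the tag, so the most recent update decides the type. |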
| |
| Additionally, each program target has an associated array of program |
| environment parameters. Unlike program local parameters, program |
| environment parameters are shared by all program objects of a given |
| target. Program environment parameters are four-component vectors whose |
| components can hold floating-point, signed integer, or unsigned integer |
| values. The data type of each environment parameter is established when |
| the parameter's values are assigned. If a program attempts to read an |
| environment parameter using a data type other than the one used when the |
| parameter is set, the values returned are undefined. ... The commands |
| |
| void ProgramEnvParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramEnvParameter4fvARB(enum target, uint index, |
| const float *params); |
| void ProgramEnvParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramEnvParameter4dvARB(enum target, uint index, |
| const double *params); |
| void ProgramEnvParameterI4iNV(enum target, uint index, |
| int x, int y, int z, int w); |
| void ProgramEnvParameterI4ivNV(enum target, uint index, |
| const int *params); |
| void ProgramEnvParameterI4uiNV(enum target, uint index, |
| uint x, uint y, uint z, uint w); |
| void ProgramEnvParameterI4uivNV(enum target, uint index, |
| const uint *params); |
| |
| update the values of the program environment parameter numbered <index> |
| for the given program target <target>. For the non-vector versions of |
| these commands, the four components of the parameter are updated with the |
| values of <x>, <y>, <z>, and <w>, respectively. For the vector versions, |
| the four components of the parameter are updated with the array of four |
| values pointed to by <params>. The error INVALID_VALUE is generated if |
| <index> is greater than or equal to the number of program environment |
| parameters supported by <target>. |
| |
| The commands |
| |
| void ProgramEnvParameters4fvNV(enum target, uint index, |
| sizei count, const float *params); |
| void ProgramEnvParametersI4ivNV(enum target, uint index, |
| sizei count, const int *params); |
| void ProgramEnvParametersI4uivNV(enum target, uint index, |
| sizei count, const uint *params); |
| |
| update the values of the program environment parameters numbered <index> |
| through <index> + <count> - 1 with the array of 4 * <count> values pointed |
| to by <params>. The error INVALID_VALUE is generated if the sum of |
| <index> and <count> is greater than the number of program environment |
| parameters supported by <target>. |
| |
| When a program environment parameter is updated, the data type of its |
| components is assigned according to the data type of the provided values. |
| If values provided are of type "float" or "double", the components of the |
| parameter are floating-point. If the values provided are of type "int", |
| the components of the parameter are signed integers. If the values |
| provided are of type "uint", the components of the parameter are unsigned |
| integers. |
| |
| ... |
| |
| |
| Insert New Section 2.X between Sections 2.Y and 2.Z: |
| |
| Section 2.X, GPU Programs |
| |
| The GL provides a number of different program targets that allow an |
| application to either replace certain fixed-function pipeline stages with |
| a fully programmable model or use a program to control aspects of the GL |
| pipeline that previously had only hard-wired behavior. |
| |
| A common base instruction set is available for all program types, |
| providing both integer and floating-point operations. Structured |
| branching operations and subroutine calls are available. Texture |
| mapping (loading data from external images) is supported for all |
| program types. The main differences between the different program |
| types are the sets of available inputs and outputs, which are program |
| type-specific, and a few instructions that are meaningful for only a |
| subset of program types. |
| |
| |
| |
| Section 2.X.2, Program Grammar |
| |
| GPU program strings are specified as an array of ASCII characters |
| containing the program text. When a GPU program is loaded by a call to |
| ProgramStringARB, the program string is parsed into a set of tokens |
| possibly separated by whitespace. Spaces, tabs, newlines, carriage |
| returns, and comments are considered whitespace. Comments begin with the |
| character "#" and are terminated by a newline, a carriage return, or the |
| end of the program array. |
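| |
| The whitespace and comment rules above can be sketched as a scanner |
| step. The function skip_whitespace is hypothetical; it takes an |
| explicit length because the program text is an array that need not be |
| NUL-terminated. |

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first character at or
 * after pos that is neither whitespace nor part of a comment, or len if
 * no such character exists. */
size_t skip_whitespace(const char *text, size_t len, size_t pos)
{
    while (pos < len) {
        char c = text[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the end
             * of the program array; the terminator itself is consumed as
             * ordinary whitespace on the next pass of the outer loop. */
            while (pos < len && text[pos] != '\n' && text[pos] != '\r')
                pos++;
        } else {
            break;
        }
    }
    return pos;
}
```

| A tokenizer built on this sketch would call it before reading each |
| token. |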
| |
| The Backus-Naur Form (BNF) grammar below specifies the syntactically valid |
| sequences for GPU programs. The set of valid tokens can be inferred |
| from the grammar. A line containing "/* empty */" represents an empty |
| string and is used to indicate optional rules. A program is invalid if it |
| contains any tokens or characters not defined in this specification. |
| |
| Note that this extension is not a standalone extension and a small number |
| of grammar rules are left to be defined in the extensions defining the |
| specific vertex, fragment, and geometry program types. |
| |
| |
| <program> ::= <optionSequence> <declSequence> |
| <statementSequence> "END" |
| |
| <optionSequence> ::= <option> <optionSequence> |
| | /* empty */ |
| |
| <option> ::= "OPTION" <identifier> ";" |
| |
| <declSequence> ::= /* empty */ |
| |
| <statementSequence> ::= <statement> <statementSequence> |
| | /* empty */ |
| |
| <statement> ::= <instruction> ";" |
| | <namingStatement> ";" |
| | <instLabel> ":" |
| |
| <instruction> ::= <ALUInstruction> |
| | <TexInstruction> |
| | <FlowInstruction> |
| |
| <ALUInstruction> ::= <VECTORop_instruction> |
| | <SCALARop_instruction> |
| | <BINSCop_instruction> |
| | <BINop_instruction> |
| | <VECSCAop_instruction> |
| | <TRIop_instruction> |
| | <SWZop_instruction> |
| |
| <TexInstruction> ::= <TEXop_instruction> |
| | <TXDop_instruction> |
| |
| <FlowInstruction> ::= <BRAop_instruction> |
| | <FLOWCCop_instruction> |
| | <IFop_instruction> |
| | <REPop_instruction> |
| | <ENDFLOWop_instruction> |
| |
| <VECTORop_instruction> ::= <VECTORop> <opModifiers> <instResult> "," |
| <instOperandV> |
| |
| <VECTORop> ::= "ABS" |
| | "CEIL" |
| | "FLR" |
| | "FRC" |
| | "I2F" |
| | "LIT" |
| | "MOV" |
| | "NOT" |
| | "NRM" |
| | "PK2H" |
| | "PK2US" |
| | "PK4B" |
| | "PK4UB" |
| | "ROUND" |
| | "SSG" |
| | "TRUNC" |
| |
| <SCALARop_instruction> ::= <SCALARop> <opModifiers> <instResult> "," |
| <instOperandS> |
| |
| <SCALARop> ::= "COS" |
| | "EX2" |
| | "LG2" |
| | "RCC" |
| | "RCP" |
| | "RSQ" |
| | "SCS" |
| | "SIN" |
| | "UP2H" |
| | "UP2US" |
| | "UP4B" |
| | "UP4UB" |
| |
| <BINSCop_instruction> ::= <BINSCop> <opModifiers> <instResult> "," |
| <instOperandS> "," <instOperandS> |
| |
| <BINSCop> ::= "POW" |
| |
| <VECSCAop_instruction> ::= <VECSCAop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandS> |
| |
| <VECSCAop> ::= "DIV" |
| | "SHL" |
| | "SHR" |
| | "MOD" |
| |
| <BINop_instruction> ::= <BINop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> |
| |
| <BINop> ::= "ADD" |
| | "AND" |
| | "DP3" |
| | "DP4" |
| | "DPH" |
| | "DST" |
| | "MAX" |
| | "MIN" |
| | "MUL" |
| | "OR" |
| | "RFL" |
| | "SEQ" |
| | "SFL" |
| | "SGE" |
| | "SGT" |
| | "SLE" |
| | "SLT" |
| | "SNE" |
| | "STR" |
| | "SUB" |
| | "XPD" |
| | "DP2" |
| | "XOR" |
| |
| <TRIop_instruction> ::= <TRIop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> "," |
| <instOperandV> |
| |
| <TRIop> ::= "CMP" |
| | "DP2A" |
| | "LRP" |
| | "MAD" |
| | "SAD" |
| | "X2D" |
| |
| <SWZop_instruction> ::= <SWZop> <opModifiers> <instResult> "," |
| <instOperandVNS> "," <extendedSwizzle> |
| |
| <SWZop> ::= "SWZ" |
| |
| <TEXop_instruction> ::= <TEXop> <opModifiers> <instResult> "," |
| <instOperandV> "," <texAccess> |
| |
| <TEXop> ::= "TEX" |
| | "TXB" |
| | "TXF" |
| | "TXL" |
| | "TXP" |
| | "TXQ" |
| |
| <TXDop_instruction> ::= <TXDop> <opModifiers> <instResult> "," |
| <instOperandV> "," <instOperandV> "," |
| <instOperandV> "," <texAccess> |
| |
| <TXDop> ::= "TXD" |
| |
| <BRAop_instruction> ::= <BRAop> <opModifiers> <instTarget> |
| <optBranchCond> |
| |
| <BRAop> ::= "CAL" |
| |
| <FLOWCCop_instruction> ::= <FLOWCCop> <opModifiers> <optBranchCond> |
| |
| <FLOWCCop> ::= "RET" |
| | "BRK" |
| | "CONT" |
| |
| <IFop_instruction> ::= <IFop> <opModifiers> <ccTest> |
| |
| <IFop> ::= "IF" |
| |
| <REPop_instruction> ::= <REPop> <opModifiers> <instOperandV> |
| | <REPop> <opModifiers> |
| |
| <REPop> ::= "REP" |
| |
| <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers> |
| |
| <ENDFLOWop> ::= "ELSE" |
| | "ENDIF" |
| | "ENDREP" |
| |
| <opModifiers> ::= <opModifierItem> <opModifiers> |
| | /* empty */ |
| |
| <opModifierItem> ::= "." <opModifier> |
| |
| <opModifier> ::= "F" |
| | "U" |
| | "S" |
| | "CC" |
| | "CC0" |
| | "CC1" |
| | "SAT" |
| | "SSAT" |
| | "NTC" |
| | "S24" |
| | "U24" |
| | "HI" |
| |
| <texAccess> ::= <texImageUnit> "," <texTarget> |
| | <texImageUnit> "," <texTarget> "," <texOffset> |
| |
| <texImageUnit> ::= "texture" <optArrayMemAbs> |
| |
| <texTarget> ::= "1D" |
| | "2D" |
| | "3D" |
| | "CUBE" |
| | "RECT" |
| | "SHADOW1D" |
| | "SHADOW2D" |
| | "SHADOWRECT" |
| | "ARRAY1D" |
| | "ARRAY2D" |
| | "SHADOWCUBE" |
| | "SHADOWARRAY1D" |
| | "SHADOWARRAY2D" |
| |
| <texOffset> ::= "(" <texOffsetComp> ")" |
| | "(" <texOffsetComp> "," <texOffsetComp> ")" |
| | "(" <texOffsetComp> "," <texOffsetComp> "," |
| <texOffsetComp> ")" |
| |
| <texOffsetComp> ::= <optSign> <int> |
| |
| <optBranchCond> ::= /* empty */ |
| | <ccMask> |
| |
| <instOperandV> ::= <instOperandAbsV> |
| | <instOperandBaseV> |
| |
| <instOperandAbsV> ::= <operandAbsNeg> "|" <instOperandBaseV> "|" |
| |
| <instOperandBaseV> ::= <operandNeg> <attribUseV> |
| | <operandNeg> <tempUseV> |
| | <operandNeg> <paramUseV> |
| | <operandNeg> <bufferUseV> |
| |
| <instOperandS> ::= <instOperandAbsS> |
| | <instOperandBaseS> |
| |
| <instOperandAbsS> ::= <operandAbsNeg> "|" <instOperandBaseS> "|" |
| |
| <instOperandBaseS> ::= <operandNeg> <attribUseS> |
| | <operandNeg> <tempUseS> |
| | <operandNeg> <paramUseS> |
| | <operandNeg> <bufferUseS> |
| |
| <instOperandVNS> ::= <attribUseVNS> |
| | <tempUseVNS> |
| | <paramUseVNS> |
| | <bufferUseVNS> |
| |
| <operandAbsNeg> ::= <optSign> |
| |
| <operandNeg> ::= <optSign> |
| |
| <instResult> ::= <instResultCC> |
| | <instResultBase> |
| |
| <instResultCC> ::= <instResultBase> <ccMask> |
| |
| <instResultBase> ::= <tempUseW> |
| | <resultUseW> |
| |
| <namingStatement> ::= <varMods> <ATTRIB_statement> |
| | <varMods> <PARAM_statement> |
| | <varMods> <TEMP_statement> |
| | <varMods> <OUTPUT_statement> |
| | <varMods> <BUFFER_statement> |
| | <ALIAS_statement> |
| |
| <ATTRIB_statement> ::= "ATTRIB" <establishName> "=" <attribUseD> |
| |
| <PARAM_statement> ::= <PARAM_singleStmt> |
| | <PARAM_multipleStmt> |
| |
| <PARAM_singleStmt> ::= "PARAM" <establishName> <paramSingleInit> |
| |
| <PARAM_multipleStmt> ::= "PARAM" <establishName> <optArraySize> |
| <paramMultipleInit> |
| |
| <paramSingleInit> ::= "=" <paramUseDB> |
| |
| <paramMultipleInit> ::= "=" "{" <paramMultInitList> "}" |
| |
| <paramMultInitList> ::= <paramUseDM> |
| | <paramUseDM> "," <paramMultInitList> |
| |
| <TEMP_statement> ::= "TEMP" <varNameList> |
| |
| <OUTPUT_statement> ::= "OUTPUT" <establishName> "=" <resultUseD> |
| |
| <varMods> ::= <varModifier> <varMods> |
| | /* empty */ |
| |
| <varModifier> ::= "SHORT" |
| | "LONG" |
| | "INT" |
| | "UINT" |
| | "FLOAT" |
| |
| <ALIAS_statement> ::= "ALIAS" <establishName> "=" <establishedName> |
| |
| <BUFFER_statement> ::= <bufferDeclType> <establishName> |
| <bufferSingleInit> |
| | <bufferDeclType> <establishName> |
| <optArraySize> <bufferMultInit> |
| |
| <bufferDeclType> ::= "BUFFER" |
| | "BUFFER4" |
| |
| <bufferSingleInit> ::= "=" <bufferUseDB> |
| |
| <bufferMultInit> ::= "=" "{" <bufferMultInitList> "}" |
| |
| <bufferMultInitList> ::= <bufferUseDM> |
| | <bufferUseDM> "," <bufferMultInitList> |
| |
| <varNameList> ::= <establishName> |
| | <establishName> "," <varNameList> |
| |
| <attribUseV> ::= <attribBasic> <swizzleSuffix> |
| | <attribVarName> <swizzleSuffix> |
| | <attribVarName> <arrayMem> <swizzleSuffix> |
| | <attribColor> <swizzleSuffix> |
| | <attribColor> "." <colorType> <swizzleSuffix> |
| |
| <attribUseS> ::= <attribBasic> <scalarSuffix> |
| | <attribVarName> <scalarSuffix> |
| | <attribVarName> <arrayMem> <scalarSuffix> |
| | <attribColor> <scalarSuffix> |
| | <attribColor> "." <colorType> <scalarSuffix> |
| |
| <attribUseVNS> ::= <attribBasic> |
| | <attribVarName> |
| | <attribVarName> <arrayMem> |
| | <attribColor> |
| | <attribColor> "." <colorType> |
| |
| <attribUseD> ::= <attribBasic> |
| | <attribColor> |
| | <attribColor> "." <colorType> |
| | <attribMulti> |
| |
| <paramUseV> ::= <paramVarName> <optArrayMem> <swizzleSuffix> |
| | <stateSingleItem> <swizzleSuffix> |
| | <programSingleItem> <swizzleSuffix> |
| | <constantVector> <swizzleSuffix> |
| | <constantScalar> |
| |
| <paramUseS> ::= <paramVarName> <optArrayMem> <scalarSuffix> |
| | <stateSingleItem> <scalarSuffix> |
| | <programSingleItem> <scalarSuffix> |
| | <constantVector> <scalarSuffix> |
| | <constantScalar> |
| |
| <paramUseVNS> ::= <paramVarName> <optArrayMem> |
| | <stateSingleItem> |
| | <programSingleItem> |
| | <constantVector> |
| | <constantScalar> |
| |
| <paramUseDB> ::= <stateSingleItem> |
| | <programSingleItem> |
| | <constantVector> |
| | <signedConstantScalar> |
| |
| <paramUseDM> ::= <stateMultipleItem> |
| | <programMultipleItem> |
| | <constantVector> |
| | <signedConstantScalar> |
| |
| <stateMultipleItem> ::= <stateSingleItem> |
| | "state" "." <stateMatrixRows> |
| |
| <stateSingleItem> ::= "state" "." <stateMaterialItem> |
| | "state" "." <stateLightItem> |
| | "state" "." <stateLightModelItem> |
| | "state" "." <stateLightProdItem> |
| | "state" "." <stateFogItem> |
| | "state" "." <stateMatrixRow> |
| | "state" "." <stateTexGenItem> |
| | "state" "." <stateClipPlaneItem> |
| | "state" "." <statePointItem> |
| | "state" "." <stateTexEnvItem> |
| | "state" "." <stateDepthItem> |
| |
| <stateMaterialItem> ::= "material" "." <stateMatProperty> |
| | "material" "." <faceType> "." |
| <stateMatProperty> |
| |
| <stateMatProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| | "emission" |
| | "shininess" |
| |
| <stateLightItem> ::= "light" <arrayMemAbs> "." <stateLightProperty> |
| |
| <stateLightProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| | "position" |
| | "attenuation" |
| | "spot" "." <stateSpotProperty> |
| | "half" |
| |
| <stateSpotProperty> ::= "direction" |
| |
| <stateLightModelItem> ::= "lightmodel" "." <stateLModProperty> |
| |
| <stateLModProperty> ::= "ambient" |
| | "scenecolor" |
| | <faceType> "." "scenecolor" |
| |
| <stateLightProdItem> ::= "lightprod" <arrayMemAbs> "." |
| <stateLProdProperty> |
| | "lightprod" <arrayMemAbs> "." <faceType> "." |
| <stateLProdProperty> |
| |
| <stateLProdProperty> ::= "ambient" |
| | "diffuse" |
| | "specular" |
| |
| <stateFogItem> ::= "fog" "." <stateFogProperty> |
| |
| <stateFogProperty> ::= "color" |
| | "params" |
| |
| <stateMatrixRows> ::= <stateMatrixItem> |
| | <stateMatrixItem> "." <stateMatModifier> |
| | <stateMatrixItem> "." "row" <arrayRange> |
| | <stateMatrixItem> "." <stateMatModifier> "." |
| "row" <arrayRange> |
| |
| <stateMatrixRow> ::= <stateMatrixItem> "." "row" <arrayMemAbs> |
| | <stateMatrixItem> "." <stateMatModifier> "." |
| "row" <arrayMemAbs> |
| |
| <stateMatrixItem> ::= "matrix" "." <stateMatrixName> |
| |
| <stateMatModifier> ::= "inverse" |
| | "transpose" |
| | "invtrans" |
| |
| <stateMatrixName> ::= "modelview" <optArrayMemAbs> |
| | "projection" |
| | "mvp" |
| | "texture" <optArrayMemAbs> |
| | "program" <arrayMemAbs> |
| |
| <stateTexGenItem> ::= "texgen" <optArrayMemAbs> "." |
| <stateTexGenType> "." <stateTexGenCoord> |
| |
| <stateTexGenType> ::= "eye" |
| | "object" |
| |
| <stateTexGenCoord> ::= "s" |
| | "t" |
| | "r" |
| | "q" |
| |
| <stateClipPlaneItem> ::= "clip" <arrayMemAbs> "." "plane" |
| |
| <statePointItem> ::= "point" "." <statePointProperty> |
| |
| <statePointProperty> ::= "size" |
| | "attenuation" |
| |
| <stateTexEnvItem> ::= "texenv" <optArrayMemAbs> "." |
| <stateTexEnvProperty> |
| |
| <stateTexEnvProperty> ::= "color" |
| |
| <stateDepthItem> ::= "depth" "." <stateDepthProperty> |
| |
| <stateDepthProperty> ::= "range" |
| |
| <programSingleItem> ::= <progEnvParam> |
| | <progLocalParam> |
| |
| <programMultipleItem> ::= <progEnvParams> |
| | <progLocalParams> |
| |
| <progEnvParams> ::= "program" "." "env" <arrayMemAbs> |
| | "program" "." "env" <arrayRange> |
| |
| <progEnvParam> ::= "program" "." "env" <arrayMemAbs> |
| |
| <progLocalParams> ::= "program" "." "local" <arrayMemAbs> |
| | "program" "." "local" <arrayRange> |
| |
| <progLocalParam> ::= "program" "." "local" <arrayMemAbs> |
| |
| <constantVector> ::= "{" <constantVectorList> "}" |
| |
| <constantVectorList> ::= <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> |
| | <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> "," |
| <signedConstantScalar> |
| |
| <signedConstantScalar> ::= <optSign> <constantScalar> |
| |
| <constantScalar> ::= <floatConstant> |
| | <intConstant> |
| |
| <floatConstant> ::= <float> |
| |
| <intConstant> ::= <int> |
| |
| <tempUseV> ::= <tempVarName> <swizzleSuffix> |
| |
| <tempUseS> ::= <tempVarName> <scalarSuffix> |
| |
| <tempUseVNS> ::= <tempVarName> |
| |
| <tempUseW> ::= <tempVarName> <optWriteMask> |
| |
| <resultUseW> ::= <resultBasic> <optWriteMask> |
| | <resultVarName> <optWriteMask> |
| |
| <resultUseD> ::= <resultBasic> |
| |
| <bufferUseV> ::= <bufferVarName> <optArrayMem> <swizzleSuffix> |
| |
| <bufferUseS> ::= <bufferVarName> <optArrayMem> <scalarSuffix> |
| |
| <bufferUseVNS> ::= <bufferVarName> <optArrayMem> |
| |
| <bufferUseDB> ::= <bufferBinding> <arrayMemAbs> |
| |
| <bufferUseDM> ::= <bufferBinding> <arrayMemAbs> |
| | <bufferBinding> <arrayRange> |
| | <bufferBinding> |
| |
| <bufferBinding> ::= "program" "." "buffer" <arrayMemAbs> |
| |
| <optArraySize> ::= "[" "]" |
| | "[" <int> "]" |
| |
| <optArrayMem> ::= /* empty */ |
| | <arrayMem> |
| |
| <arrayMem> ::= <arrayMemAbs> |
| | <arrayMemRel> |
| |
| <optArrayMemAbs> ::= /* empty */ |
| | <arrayMemAbs> |
| |
| <arrayMemAbs> ::= "[" <int> "]" |
| |
| <arrayMemRel> ::= "[" <arrayMemReg> <arrayMemOffset> "]" |
| |
| <arrayMemReg> ::= <addrUseS> |
| |
| <arrayMemOffset> ::= /* empty */ |
| | "+" <int> |
| | "-" <int> |
| |
| <arrayRange> ::= "[" <int> ".." <int> "]" |
| |
| <addrUseS> ::= <addrVarName> <scalarSuffix> |
| |
| <ccMask> ::= "(" <ccTest> ")" |
| |
| <ccTest> ::= <ccMaskRule> <swizzleSuffix> |
| |
| <ccMaskRule> ::= "EQ" |
| | "GE" |
| | "GT" |
| | "LE" |
| | "LT" |
| | "NE" |
| | "TR" |
| | "FL" |
| | "EQ0" |
| | "GE0" |
| | "GT0" |
| | "LE0" |
| | "LT0" |
| | "NE0" |
| | "TR0" |
| | "FL0" |
| | "EQ1" |
| | "GE1" |
| | "GT1" |
| | "LE1" |
| | "LT1" |
| | "NE1" |
| | "TR1" |
| | "FL1" |
| | "NAN" |
| | "NAN0" |
| | "NAN1" |
| | "LEG" |
| | "LEG0" |
| | "LEG1" |
| | "CF" |
| | "CF0" |
| | "CF1" |
| | "NCF" |
| | "NCF0" |
| | "NCF1" |
| | "OF" |
| | "OF0" |
| | "OF1" |
| | "NOF" |
| | "NOF0" |
| | "NOF1" |
| | "AB" |
| | "AB0" |
| | "AB1" |
| | "BLE" |
| | "BLE0" |
| | "BLE1" |
| | "SF" |
| | "SF0" |
| | "SF1" |
| | "NSF" |
| | "NSF0" |
| | "NSF1" |
| |
| <optWriteMask> ::= /* empty */ |
| | <xyzwMask> |
| | <rgbaMask> |
| |
| <xyzwMask> ::= "." "x" |
| | "." "y" |
| | "." "xy" |
| | "." "z" |
| | "." "xz" |
| | "." "yz" |
| | "." "xyz" |
| | "." "w" |
| | "." "xw" |
| | "." "yw" |
| | "." "xyw" |
| | "." "zw" |
| | "." "xzw" |
| | "." "yzw" |
| | "." "xyzw" |
| |
| <rgbaMask> ::= "." "r" |
| | "." "g" |
| | "." "rg" |
| | "." "b" |
| | "." "rb" |
| | "." "gb" |
| | "." "rgb" |
| | "." "a" |
| | "." "ra" |
| | "." "ga" |
| | "." "rga" |
| | "." "ba" |
| | "." "rba" |
| | "." "gba" |
| | "." "rgba" |
| |
| <swizzleSuffix> ::= /* empty */ |
| | "." <component> |
| | "." <xyzwSwizzle> |
| | "." <rgbaSwizzle> |
| |
| <extendedSwizzle> ::= <extSwizComp> "," <extSwizComp> "," |
| <extSwizComp> "," <extSwizComp> |
| |
| <extSwizComp> ::= <optSign> <xyzwExtSwizSel> |
| | <optSign> <rgbaExtSwizSel> |
| |
| <xyzwExtSwizSel> ::= "0" |
| | "1" |
| | <xyzwComponent> |
| |
| <rgbaExtSwizSel> ::= <rgbaComponent> |
| |
| <scalarSuffix> ::= "." <component> |
| |
| <component> ::= <xyzwComponent> |
| | <rgbaComponent> |
| |
| <xyzwComponent> ::= "x" |
| | "y" |
| | "z" |
| | "w" |
| |
| <rgbaComponent> ::= "r" |
| | "g" |
| | "b" |
| | "a" |
| |
| <optSign> ::= /* empty */ |
| | "-" |
| | "+" |
| |
| <faceType> ::= "front" |
| | "back" |
| |
| <colorType> ::= "primary" |
| | "secondary" |
| |
| <instLabel> ::= <identifier> |
| |
| <instTarget> ::= <identifier> |
| |
| <establishedName> ::= <identifier> |
| |
| <establishName> ::= <identifier> |
| |
| |
| The <int> rule matches an integer constant. The integer consists of a |
| sequence of one or more digits ("0" through "9"), or a sequence in |
| hexadecimal form beginning with "0x" followed by a sequence of one or more |
| hexadecimal digits ("0" through "9", "a" through "f", "A" through "F"). |
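| |
| A minimal sketch of the <int> rule follows. The name match_int is |
| hypothetical. It accepts only the lowercase "0x" prefix because that |
| is the only form named above, and it treats "0x" with no following |
| hexadecimal digit as the decimal integer "0"; that second choice is an |
| assumption of this sketch. |

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of leading characters of s
 * that match the <int> rule, or 0 if s does not start with an integer. */
size_t match_int(const char *s)
{
    size_t n = 0;

    /* Hexadecimal form: "0x" followed by one or more hex digits. */
    if (s[0] == '0' && s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        n = 2;
        while (isxdigit((unsigned char)s[n]))
            n++;
        return n;
    }

    /* Decimal form: one or more digits "0" through "9". */
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```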
| |
| The <float> rule matches a floating-point constant consisting of an |
| integer part, a decimal point, a fraction part, an "e" or "E", and an |
| optionally signed integer exponent. The integer and fraction parts both |
| consist of a sequence of one or more digits ("0" through "9"). Either the |
| integer part or the fraction part (not both) may be missing; either the |
| decimal point or the "e" (or "E") and the exponent (not both) may be |
| missing. Most grammar rules that allow floating-point values also allow |
| integers matching the <int> rule. |
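| |
| The two "not both" conditions of the <float> rule can be checked as |
| shown in this sketch. The name match_float is hypothetical, and the |
| function reflects one reading of the paragraph above: at least one of |
| the integer and fraction parts must be present, and at least one of |
| the decimal point and the exponent must be present. |

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of leading characters of s
 * that match the <float> rule, or 0 if s does not start with a float. */
size_t match_float(const char *s)
{
    size_t n = 0, int_digits = 0, frac_digits = 0;
    int has_point = 0, has_exp = 0;

    while (isdigit((unsigned char)s[n])) { n++; int_digits++; }

    if (s[n] == '.') {
        has_point = 1;
        n++;
        while (isdigit((unsigned char)s[n])) { n++; frac_digits++; }
    }

    /* The integer part and the fraction part may not both be missing. */
    if (int_digits == 0 && frac_digits == 0)
        return 0;

    /* Exponent: "e" or "E", an optional sign, then one or more digits.
     * It is consumed only if it is complete. */
    if (s[n] == 'e' || s[n] == 'E') {
        size_t m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            m++;
        if (isdigit((unsigned char)s[m])) {
            while (isdigit((unsigned char)s[m]))
                m++;
            has_exp = 1;
            n = m;
        }
    }

    /* The decimal point and the exponent may not both be missing;
     * otherwise the text is an <int>, not a <float>. */
    if (!has_point && !has_exp)
        return 0;
    return n;
}
```

| Under this reading, "1.", ".5", and "1e3" are floats, while "42" is |
| left to the <int> rule. |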
| |
| The <identifier> rule matches a sequence of one or more letters ("A" |
| through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"), |
| or dollar signs ("$"); the first character must not be a digit. Upper |
| and lower case letters are considered different (names are |
| case-sensitive). The following strings are reserved keywords and may not |
| be used as identifiers: "fragment" (for fragment programs only), "vertex" |
| (for vertex and geometry programs), "primitive" (for fragment and geometry |
| programs), "program", "result", "state", and "texture". |
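| |
| The <identifier> rule and the reserved keywords listed above can be |
| checked as in the following sketch. ProgramKind, is_reserved, and |
| is_identifier are hypothetical names. The sketch covers only the |
| keywords named in this paragraph; whether other tokens are also |
| rejected is outside what it models. |

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag for the program type being parsed. */
typedef enum { PROG_VERTEX, PROG_FRAGMENT, PROG_GEOMETRY } ProgramKind;

/* True if name is one of the reserved keywords for this program type. */
bool is_reserved(const char *name, ProgramKind kind)
{
    static const char *const always[] = {
        "program", "result", "state", "texture"
    };
    size_t i;

    for (i = 0; i < sizeof always / sizeof always[0]; i++)
        if (strcmp(name, always[i]) == 0)
            return true;
    if (strcmp(name, "fragment") == 0)
        return kind == PROG_FRAGMENT;
    if (strcmp(name, "vertex") == 0)
        return kind == PROG_VERTEX || kind == PROG_GEOMETRY;
    if (strcmp(name, "primitive") == 0)
        return kind == PROG_FRAGMENT || kind == PROG_GEOMETRY;
    return false;
}

/* True if name matches the <identifier> rule and is not reserved. */
bool is_identifier(const char *name, ProgramKind kind)
{
    size_t i;

    /* Must be non-empty and must not start with a digit. */
    if (name[0] == '\0' || (name[0] >= '0' && name[0] <= '9'))
        return false;
    for (i = 0; name[i] != '\0'; i++) {
        char c = name[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                  (c >= '0' && c <= '9') || c == '_' || c == '$';
        if (!ok)
            return false;
    }
    /* Comparison is case-sensitive, so "State" is not reserved. */
    return !is_reserved(name, kind);
}
```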
| |
| The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and |
| <bufferVarName> rules match identifiers that have been previously established |
| as names of temporary, program parameter, attribute, result, and program |
| parameter buffer variables, respectively. |
| |
| The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings |
| consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>) |
| or "r", "g", "b", "a" (<rgbaSwizzle>). |
| |
| The error INVALID_OPERATION is generated if a program fails to load |
| because it is not syntactically correct or because it violates one of the |
| semantic restrictions described in the following sections. |
| |
| A successfully loaded program is parsed into a sequence of instructions. |
| Each instruction is identified by its tokenized name. The operation of |
| these instructions when executed is defined in section 2.X.4. A |
| successfully loaded program string replaces the program string previously |
| loaded into the specified program object. If the OUT_OF_MEMORY error is |
| generated by ProgramStringARB, no change is made to the previous contents |
| of the current program object. |
| |
| |
| Section 2.X.3, Program Variables |
| |
| Programs may operate on a number of different variables during their |
| execution. The following sections define the different classes of |
| variables that can be declared and used by a program. |
| |
| Some variable classes require variable bindings. Variable classes with |
| bindings refer to state that is either generated or consumed outside the |
| program. Examples of variable bindings include a vertex's normal, the |
| position of a vertex computed by a vertex program, an interpolated texture |
| coordinate, and the diffuse color of light 1. Variables that are used |
| only during program execution do not have bindings. |
| |
| Variables may be declared explicitly according to the <namingStatement> |
| grammar rule. Explicit variable declarations allow a program to establish |
| a variable name that can be used to refer to a specified resource in |
| subsequent instructions. Variables may be declared anywhere in the |
| program string, but must be declared prior to use. A program will fail to |
| load if it declares the same variable name more than once, or if it refers |
| to a variable name that has not been previously declared in the program |
| string. |
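| |
| The declare-before-use and no-redeclaration rules can be modeled with a |
| small name table. The Python sketch below is illustrative only; NameTable |
| and ProgramLoadError are hypothetical names, not part of any GL API. |

```python
class ProgramLoadError(Exception):
    """Raised when the sketch decides a program would fail to load."""


class NameTable:
    """Tracks explicitly declared variable names in program-string order."""

    def __init__(self):
        self._names = {}

    def declare(self, name, resource):
        # Declaring the same name twice makes the program fail to load.
        if name in self._names:
            raise ProgramLoadError(f"'{name}' declared more than once")
        self._names[name] = resource

    def lookup(self, name):
        # Referring to an undeclared name makes the program fail to load.
        if name not in self._names:
            raise ProgramLoadError(f"'{name}' used before declaration")
        return self._names[name]
```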
| |
| Variables may also be declared implicitly, simply by using a variable |
| binding as an operand in a program instruction. Such uses are considered |
| to automatically create a nameless variable using the specified binding. |
| Only variables from classes with bindings can be declared implicitly. |
| |
| |
| Section 2.X.3.1, Program Variable Types |
| |
| Explicit variable declarations may include one or more modifiers that |
| specify additional information about the variable, such as the size and |
| data type of the components of the variable. Variable modifiers are |
| specified according to the <varModifier> grammar rule. |
| |
| By default, variables are considered typeless. They can be used in |
| instructions that read or write the variable as floating-point values, |
| signed integers, or unsigned integers. If a variable is written using one |
| data type but then read using a different one, the results of the |
| operation are undefined. Variables with bindings are considered to be |
| read or written when their values are produced or consumed; the data type |
| used by the GL is specified in the description of each binding. |
| |
| Explicitly declared variables may optionally have one data type modifier, |
| which can be used to detect data type mismatch errors. Type modifiers of |
| "INT", "UINT", and "FLOAT" indicate that the components of the variable |
| are stored as signed integers, unsigned integers, or floating-point |
| values, respectively. A program will fail to load if it attempts to read |
| or write a variable using a data type other than the one indicated by the |
| data type modifier. Variables without a data type modifier can be read or |
| written using any data type. |
| |
| Explicitly declared variables may optionally have one storage size |
| modifier. Variables declared as "SHORT" will be represented using at least |
| 16 bits per component. "SHORT" floating-point values will have at least 5 |
| bits of exponent and 10 bits of mantissa. Variables declared as "LONG" |
| will be represented with at least 32 bits per component. "LONG" |
| floating-point values will have at least 8 bits of exponent and 23 bits of |
| mantissa. If no size modifier is provided, the GL will automatically |
| select component sizes. Implementations are not required to support more |
| than one component size, so "SHORT", "LONG", and the default could all |
| refer to the same component size. The "LONG" modifier is supported only |
| for declarations of temporary variables ("TEMP"). The "SHORT" modifier is |
| supported only for declarations of temporary variables and result |
| variables ("OUTPUT"). |
| |
| Each variable declaration can include at most one data type and one |
| storage size modifier. A program will fail to load if it specifies |
| multiple data type or multiple storage size modifiers in a single variable |
| declaration. |
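| |
| A loader might validate the modifier list of a declaration along these |
| lines. The Python sketch is illustrative only: check_modifiers is a |
| hypothetical name, and treating an unsupported "LONG" or "SHORT" use as a |
| load failure is an assumption of the sketch. |

```python
DATA_TYPES = {"INT", "UINT", "FLOAT"}
STORAGE_SIZES = {"SHORT", "LONG"}


def check_modifiers(modifiers, decl_keyword):
    """Return (data_type, storage_size) or raise ValueError.

    modifiers    -- list of modifier strings, e.g. ["SHORT", "FLOAT"]
    decl_keyword -- "TEMP", "OUTPUT", "PARAM", "ATTRIB", ...
    """
    types = [m for m in modifiers if m in DATA_TYPES]
    sizes = [m for m in modifiers if m in STORAGE_SIZES]
    if len(types) > 1:
        raise ValueError("multiple data type modifiers")
    if len(sizes) > 1:
        raise ValueError("multiple storage size modifiers")
    size = sizes[0] if sizes else None
    if size == "LONG" and decl_keyword != "TEMP":
        raise ValueError("LONG is supported only for TEMP")
    if size == "SHORT" and decl_keyword not in ("TEMP", "OUTPUT"):
        raise ValueError("SHORT is supported only for TEMP and OUTPUT")
    return (types[0] if types else None, size)
```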
| |
| (NOTE: Fragment programs also support the modifiers "FLAT", "CENTROID", |
| and "NOPERSPECTIVE", which control how per-fragment attribute values are |
| produced. These modifiers are described in detail in the |
| NV_fragment_program4 specification.) |
| |
| Explicitly declared variables of all types may be declared as arrays. An |
| array variable has one or more members, numbered 0 through <n>-1, where |
| <n> is the number of entries in the array. The total number of entries in |
| the array can be declared using the <optArraySize> grammar rule. For |
| variable classes without bindings, an array size must be specified in the |
| program, and must be a positive integer. For variable classes with |
| bindings, a declared size is optional, and is taken from the number of |
| bindings assigned in the declaration if omitted. A program will fail to |
| load if the declared size of an array variable does not match the number |
| of assigned bindings. |
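| |
| The array sizing rules above can be summarized in a few lines. The Python |
| sketch is illustrative only; resolve_array_size is a hypothetical name, |
| and None stands in for "omitted" or "class without bindings". |

```python
def resolve_array_size(declared_size, num_bindings):
    """Return the entry count of an array declaration or raise ValueError.

    declared_size -- size from <optArraySize>, or None if omitted
    num_bindings  -- number of assigned bindings, or None for variable
                     classes without bindings (e.g. temporaries)
    """
    if num_bindings is None:
        # No bindings: an explicit positive size is required.
        if declared_size is None or declared_size <= 0:
            raise ValueError("array size must be a positive integer")
        return declared_size
    if declared_size is None:
        # Bindings present, size omitted: take it from the binding count.
        return num_bindings
    if declared_size != num_bindings:
        raise ValueError("declared size does not match binding count")
    return declared_size
```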
| |
| When a variable is declared as an array, instructions that use the |
| variable must specify an array member to access according to the |
| <arrayMem> grammar rule. A program will fail to load if it contains an |
| instruction that accesses an array variable without specifying an array |
| member or an instruction that specifies an array member for a non-array |
| variable. |
| |
| |
| Section 2.X.3.2, Program Attribute Variables |
| |
| Program attribute variables represent per-vertex or per-fragment inputs to |
| the program. All attribute variables have associated bindings, and are |
| read-only during program execution. Attribute variables may be declared |
| explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using |
| an attribute binding in an instruction. |
| |
| The set of available attribute bindings depends on the program type, and |
| is enumerated in the specifications for each program type. |
| |
| The set of bindings allowed for attribute array variables is limited to |
| attribute state grouped in arrays (e.g., texture coordinates, generic |
| vertex attributes). Additionally, all bindings assigned to the array must |
| be of the same binding type and must increase consecutively. Examples of |
| valid and invalid binding lists include: |
| |
| vertex.attrib[1], vertex.attrib[2] # valid, 2-entry array |
| vertex.texcoord[0..3] # valid, 4-entry array |
| vertex.attrib[1], vertex.attrib[3] # invalid, skipped attrib 2 |
| vertex.attrib[2], vertex.attrib[1] # invalid, wrong order |
| vertex.attrib[1], vertex.texcoord[2] # invalid, different types |
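| |
| The binding-list rule illustrated by these examples can be sketched as a |
| small check. The Python code is illustrative only; the function name and |
| the (binding_type, index) tuple representation are hypothetical, and |
| rejecting an empty list is an assumption of the sketch. |

```python
def is_valid_array_binding_list(bindings):
    """Check that bindings share one type and increase consecutively.

    bindings -- list of (binding_type, index) pairs with ranges expanded,
                e.g. vertex.texcoord[0..3] becomes
                [("texcoord", 0), ("texcoord", 1), ("texcoord", 2),
                 ("texcoord", 3)]
    """
    if not bindings:
        return False
    first_type, first_index = bindings[0]
    for offset, (binding_type, index) in enumerate(bindings):
        if binding_type != first_type:
            return False          # different binding types
        if index != first_index + offset:
            return False          # skipped or out-of-order index
    return True
```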
| |
| Additionally, attribute bindings may be used in no more than one array |
| variable accessed with relative addressing. |
| |
| Implementations may have a limit on the total number of attribute binding |
| components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV). |
| Programs that use more attribute binding components than this limit will |
| fail to load. The method of counting used attribute binding components is |
| implementation-dependent, but must satisfy the following properties: |
| |
| * If an attribute binding is not referenced in a program, or is |
| referenced only in declarations of attribute variables that are not |
| used, none of its components are counted. |
| |
| * An attribute binding component may be counted as used only if there |
| exists an instruction operand where |
| |
| - the component is enabled for read by the swizzle pattern (Section |
| 2.X.4.2), and |
| |
| - the attribute binding is |
| |
| - referenced directly by the operand, |
| |
| - bound to a declared variable referenced by the operand, or |
| |
| - bound to a declared array variable where another binding in |
| the array satisfies one of the two previous conditions. |
| |
| Implementations are not required to optimize out unused elements of an |
| attribute array or components that are used in only some elements of |
| an array. The last of these rules is intended to cover the case where |
| the same attribute binding is used in multiple variables. |
| |
| For example, an operand whose swizzle pattern selects only the x |
| component may result in the x component of an attribute binding being |
| counted, but may never result in the counting of the y, z, or w |
| components of any attribute binding. |
| |
| * Implementations are not required to determine that components read by |
| an instruction are actually unused due to: |
| |
| - instruction write masks (for example, a component-wise ADD |
| operation that only writes the "x" component doesn't have to read |
| the "y", "z", and "w" components of its operands) or |
| |
| - any other properties of the instruction (for example, the DP3 |
| instruction, which computes a 3-component dot product, doesn't have |
| to read the "w" component of its operands). |
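| |
| Because the counting method is implementation-dependent, the following |
| Python sketch only computes an upper bound consistent with the properties |
| above: the set of components that may be counted. The function name and |
| the data layout are hypothetical, and swizzles are assumed to be given in |
| "xyzw" form. |

```python
def countable_components(operands, arrays=()):
    """Return the (binding, component) pairs that may be counted as used.

    operands -- (binding, swizzle) pairs, one per instruction operand that
                references the binding directly or through a declared
                variable
    arrays   -- groups of bindings declared together in one array variable
    """
    used = set()
    for binding, swizzle in operands:
        # A referenced binding also exposes the other members of any
        # declared array it belongs to.
        members = {binding}
        for group in arrays:
            if binding in group:
                members.update(group)
        # Only components enabled by the swizzle pattern may be counted.
        for member in members:
            for component in set(swizzle):
                used.add((member, component))
    return used
```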
| |
| |
| Section 2.X.3.3, Program Parameters |
| |
| Program parameter variables are used as constants during program |
| execution. All program parameter variables have associated bindings and |
| are read-only during program execution. Program parameters retain their |
| values across program invocations, although their values may change |
| between invocations due to GL state changes. Program parameter variables |
| may be declared explicitly via the <PARAM_statement> grammar rule, or |
| implicitly by using a parameter binding in an instruction. Except where |
| otherwise specified, program parameter bindings always specify |
| floating-point values. |
| |
| When declaring program parameter array variables, all bindings are |
| supported and can be assigned to array members in any order. The only |
| restriction is that no parameter binding may be used more than once in |
| array variables accessed using relative addressing. A program will fail |
| to load if any program parameter binding is used more than once in a |
| single array accessed using relative addressing or used at least once in |
| two or more arrays accessed using relative addressing. |
| |
| |
| Constant Bindings |
| |
| If a program parameter binding matches the <constantScalar> or |
| <signedConstantScalar> grammar rules, the corresponding program parameter |
| variable is bound to the vector (X,X,X,X), where X is the value of the |
| specified constant. |
| |
| If a program parameter binding matches <constantVector>, the corresponding |
| program parameter variable is bound to the vector (X,Y,Z,W), where X, Y, |
| Z, and W are the values corresponding to the first, second, third, and |
| fourth match of <signedConstantScalar>. If fewer than four constants are |
| specified, Y, Z, and W assume the values 0, 0, and 1, if their respective |
| constants are not specified. |
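| |
| The expansion of scalar and vector constants to four components can be |
| sketched as follows. The Python code is illustrative only; expand_constant |
| is a hypothetical name, with a plain number standing in for a scalar |
| constant and a list standing in for <constantVector>. |

```python
def expand_constant(value):
    """Expand a constant binding to a 4-component tuple.

    value -- a number for <constantScalar>/<signedConstantScalar>, or a
             list of 1 to 4 numbers for <constantVector>
    """
    if isinstance(value, (int, float)):
        return (value,) * 4                 # scalar: (X, X, X, X)
    if not 1 <= len(value) <= 4:
        raise ValueError("a constant vector has 1 to 4 components")
    defaults = (0, 0, 0, 1)                 # fill values for Y, Z, W
    return tuple(value) + defaults[len(value):]
```

| Note that a scalar 2.5 expands to (2.5, 2.5, 2.5, 2.5), while the |
| one-element vector {2.5} expands to (2.5, 0, 0, 1). |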
| |
| Constant bindings can be interpreted as having signed integer, unsigned |
| integer, or floating-point values, depending on how they are used in the |
| program text. For constants in variable declarations, the components of |
| the constant are interpreted according to the variable's component data |
| type modifier. If no data type modifier is specified in a declaration, |
| constants are interpreted as floating-point values. For constant bindings |
| used directly in an instruction, the components of the constant are |
| interpreted according to the required data type of the operand. A program |
| will fail to load if it specifies a floating-point constant value |
| (matching the <floatConstant> grammar rule) that should be interpreted as |
| a signed or unsigned integer, or a negative integer constant value that |
| should be interpreted as an unsigned integer. |
| |
| If the value used to specify a floating-point constant cannot be exactly |
| represented, the nearest floating-point value will be used. If the value |
| used to specify an integer constant is too large to be represented, the |
| program will fail to load. |
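| |
| The interpretation rules for constants can be sketched as below. The |
| Python code is illustrative only: interpret_constant is a hypothetical |
| name, the 32-bit integer range is an assumption of the sketch, and Python |
| floats do not model the rounding to the nearest representable value. |

```python
def interpret_constant(text, data_type, bits=32):
    """Interpret a signed constant token as "FLOAT", "INT", or "UINT".

    Raises ValueError where the program would fail to load. The token is
    assumed to carry at most one leading sign.
    """
    negative = text.startswith("-")
    body = text.lstrip("+-")
    is_hex = body.startswith("0x")
    is_float = not is_hex and any(c in ".eE" for c in body)
    if is_float:
        if data_type != "FLOAT":
            raise ValueError("floating-point constant used as an integer")
        return float(text)
    magnitude = int(body[2:], 16) if is_hex else int(body, 10)
    value = -magnitude if negative else magnitude
    if data_type == "FLOAT":
        return float(value)      # integers are allowed where floats are
    if data_type == "UINT":
        if value < 0:
            raise ValueError("negative constant used as unsigned integer")
        low, high = 0, 2 ** bits - 1
    else:                        # "INT"
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= value <= high:
        raise ValueError("integer constant too large to be represented")
    return value
```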
| |
| |
| Program Environment/Local Parameter Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ------------------------------- |
| program.env[a] (x,y,z,w) program environment parameter a |
| program.local[a] (x,y,z,w) program local parameter a |
| program.env[a..b] (x,y,z,w) program environment parameters |
| a through b |
| program.local[a..b] (x,y,z,w) program local parameters |
| a through b |
| |
| Table X.1: Program Environment/Local Parameter Bindings. <a> and <b> |
| indicate parameter numbers, where <a> must be less than or equal to <b>. |
| |
| If a program parameter binding matches "program.env[a]" or |
| "program.local[a]", the four components of the program parameter variable |
| are filled with the four components of program environment parameter <a> |
| or program local parameter <a> respectively. |
| |
| Additionally, for program parameter array bindings, "program.env[a..b]" |
| and "program.local[a..b]" are equivalent to specifying program environment |
| or local parameters <a> through <b> in order, respectively. A program |
| using any of these bindings will fail to load if <a> is greater than <b>. |
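| |
| The range form can be read as shorthand for a list of single bindings. |
| The Python sketch below is illustrative only; expand_param_range is a |
| hypothetical name. |

```python
def expand_param_range(kind, a, b=None):
    """Expand program.env[a..b] or program.local[a..b] into single bindings.

    kind -- "env" or "local"
    a, b -- parameter numbers; b defaults to a for the single-entry form
    """
    if kind not in ("env", "local"):
        raise ValueError("kind must be 'env' or 'local'")
    if b is None:
        b = a
    if a > b:
        raise ValueError("<a> must be less than or equal to <b>")
    return [f"program.{kind}[{i}]" for i in range(a, b + 1)]
```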
| |
| Program environment and local parameters are typeless, and may be |
| specified as signed integer, unsigned integer, or floating-point |
| variables. If a program environment or local parameter is read using a |
| data type other than the one used to specify it, an undefined value is |
| returned. |
| |
| |
| Material Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.material.ambient (r,g,b,a) front ambient material color |
| state.material.diffuse (r,g,b,a) front diffuse material color |
| state.material.specular (r,g,b,a) front specular material color |
| state.material.emission (r,g,b,a) front emissive material color |
| state.material.shininess (s,0,0,1) front material shininess |
| state.material.front.ambient (r,g,b,a) front ambient material color |
| state.material.front.diffuse (r,g,b,a) front diffuse material color |
| state.material.front.specular (r,g,b,a) front specular material color |
| state.material.front.emission (r,g,b,a) front emissive material color |
| state.material.front.shininess (s,0,0,1) front material shininess |
| state.material.back.ambient (r,g,b,a) back ambient material color |
| state.material.back.diffuse (r,g,b,a) back diffuse material color |
| state.material.back.specular (r,g,b,a) back specular material color |
| state.material.back.emission (r,g,b,a) back emissive material color |
| state.material.back.shininess (s,0,0,1) back material shininess |
| |
| Table X.3: Material Property Bindings. If a material face is not |
| specified in the binding, the front property is used. |
| |
| If a program parameter binding matches any of the material properties |
| listed in Table X.3, the program parameter variable is filled according to |
| the table. For ambient, diffuse, specular, or emissive colors, the "x", |
| "y", "z", and "w" components are filled with the "r", "g", "b", and "a" |
| components, respectively, of the corresponding material color. For |
| material shininess, the "x" component is filled with the material's |
| specular exponent, and the "y", "z", and "w" components are filled with |
| the floating-point constants 0, 0, and 1, respectively. Bindings |
| containing ".back" refer to the back material; all other bindings refer to |
| the front material. |
| |
| Material properties can be changed inside a Begin/End pair, either |
| directly by calling Material, or indirectly through color material. |
| However, such property changes are not guaranteed to update program |
| parameter bindings until the following End command. Program parameter |
| variables bound to material properties changed inside a Begin/End pair are |
| undefined until the following End command. |
| |
| |
| Light Property Bindings |
| |
| Binding                            Components Underlying State |
| ---------------------------------  ---------- ----------------------------- |
| state.light[n].ambient             (r,g,b,a)  light n ambient color |
| state.light[n].diffuse             (r,g,b,a)  light n diffuse color |
| state.light[n].specular            (r,g,b,a)  light n specular color |
| state.light[n].position            (x,y,z,w)  light n position |
| state.light[n].attenuation         (a,b,c,e)  light n attenuation constants |
|                                               and spot light exponent |
| state.light[n].spot.direction      (x,y,z,c)  light n spot direction and |
|                                               cutoff angle cosine |
| state.light[n].half                (x,y,z,1)  light n infinite half-angle |
| state.lightmodel.ambient           (r,g,b,a)  light model ambient color |
| state.lightmodel.scenecolor        (r,g,b,a)  light model front scene color |
| state.lightmodel.front.scenecolor  (r,g,b,a)  light model front scene color |
| state.lightmodel.back.scenecolor   (r,g,b,a)  light model back scene color |
| state.lightprod[n].ambient         (r,g,b,a)  light n / front material |
|                                               ambient color product |
| state.lightprod[n].diffuse         (r,g,b,a)  light n / front material |
|                                               diffuse color product |
| state.lightprod[n].specular        (r,g,b,a)  light n / front material |
|                                               specular color product |
| state.lightprod[n].front.ambient   (r,g,b,a)  light n / front material |
|                                               ambient color product |
| state.lightprod[n].front.diffuse   (r,g,b,a)  light n / front material |
|                                               diffuse color product |
| state.lightprod[n].front.specular  (r,g,b,a)  light n / front material |
|                                               specular color product |
| state.lightprod[n].back.ambient    (r,g,b,a)  light n / back material |
|                                               ambient color product |
| state.lightprod[n].back.diffuse    (r,g,b,a)  light n / back material |
|                                               diffuse color product |
| state.lightprod[n].back.specular   (r,g,b,a)  light n / back material |
|                                               specular color product |
| |
| Table X.4: Light Property Bindings. <n> indicates a light number. |
| |
| If a program parameter binding matches "state.light[n].ambient", |
| "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z", |
| and "w" components of the program parameter variable are filled with the |
| "r", "g", "b", and "a" components, respectively, of the corresponding |
| light color. |
| |
| If a program parameter binding matches "state.light[n].position", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the "x", "y", "z", and "w" components, respectively, of the light |
| position. |
| |
| If a program parameter binding matches "state.light[n].attenuation", the |
| "x", "y", and "z" components of the program parameter variable are filled |
| with the constant, linear, and quadratic attenuation parameters of the |
| specified light, respectively (section 2.13.1). The "w" component of the |
| program parameter variable is filled with the spot light exponent of the |
| specified light. |
| |
| If a program parameter binding matches "state.light[n].spot.direction", |
| the "x", "y", and "z" components of the program parameter variable are |
| filled with the "x", "y", and "z" components of the spot light direction |
| of the specified light, respectively (section 2.13.1). The "w" component |
| of the program parameter variable is filled with the cosine of the spot |
| light cutoff angle of the specified light. |
| |
| If a program parameter binding matches "state.light[n].half", the "x", |
| "y", and "z" components of the program parameter variable are filled with |
| the x, y, and z components, respectively, of the normalized infinite |
| half-angle vector |
| |
| h_inf = || P + (0, 0, 1) ||. |
| |
| The "w" component is filled with 1.0. In the computation of h_inf, P |
| consists of the x, y, and z coordinates of the normalized vector from the |
| eye position P_e to the eye-space light position P_pli (section 2.13.1). |
| h_inf is defined to correspond to the normalized half-angle vector when |
| using an infinite light (w coordinate of the position is zero) and an |
| infinite viewer (v_bs is FALSE). For local lights or a local viewer, |
| h_inf is well-defined but does not match the normalized half-angle vector, |
| which will vary depending on the vertex position. |
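| |
| Reading the notation above as normalization of P + (0, 0, 1), the bound |
| value can be computed as in the following Python sketch. It is |
| illustrative only; infinite_half_angle is a hypothetical name, and |
| raising an error for a zero-length sum is an assumption of the sketch. |

```python
import math


def infinite_half_angle(p):
    """Return the (x, y, z, w) value bound to state.light[n].half.

    p -- normalized eye-space direction (x, y, z) toward the light
    """
    hx, hy, hz = p[0], p[1], p[2] + 1.0      # P + (0, 0, 1)
    length = math.sqrt(hx * hx + hy * hy + hz * hz)
    if length == 0.0:
        raise ValueError("P = (0, 0, -1) has no defined half-angle")
    return (hx / length, hy / length, hz / length, 1.0)
```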
| |
| If a program parameter binding matches "state.lightmodel.ambient", the |
| "x", "y", "z", and "w" components of the program parameter variable are |
| filled with the "r", "g", "b", and "a" components of the light model |
| ambient color, respectively. |
| |
| If a program parameter binding matches "state.lightmodel.scenecolor" or |
| "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of |
| the program parameter variable are filled with the "r", "g", and "b" |
| components respectively of the "front scene color" |
| |
| c_scene = a_cs * a_cm + e_cm, |
| |
| where a_cs is the light model ambient color, a_cm is the front ambient |
| material color, and e_cm is the front emissive material color. The "w" |
| component of the program parameter variable is filled with the alpha |
| component of the front diffuse material color. If a program parameter |
| binding matches "state.lightmodel.back.scenecolor", a similar back scene |
| color, computed using back-facing material properties, is used. The front |
| and back scene colors match the values that would be assigned to vertices |
| using conventional lighting if all lights were disabled. |
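| |
| The scene color formula can be evaluated component-wise, as in this |
| Python sketch. It is illustrative only; scene_color and its parameter |
| names are hypothetical, and each color is an (r, g, b, a) tuple. |

```python
def scene_color(lightmodel_ambient, material_ambient, material_emission,
                material_diffuse):
    """Return (r, g, b, a) for a scenecolor binding.

    RGB is c_scene = a_cs * a_cm + e_cm; alpha is taken from the diffuse
    material color. Pass front or back material colors as appropriate.
    """
    rgb = tuple(
        a_cs * a_cm + e_cm
        for a_cs, a_cm, e_cm in zip(
            lightmodel_ambient[:3], material_ambient[:3],
            material_emission[:3])
    )
    return rgb + (material_diffuse[3],)
```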
| |
| If a program parameter binding matches anything beginning with |
| "state.lightprod[n]", the "x", "y", and "z" components of the program |
| parameter variable are filled with the "r", "g", and "b" components, |
| respectively, of the corresponding light product. The three light product |
| components are the products of the corresponding color components of the |
| specified material property and the light color of the specified light |
| (see Table X.4). The "w" component of the program parameter variable is |
| filled with the alpha component of the specified material property. |
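| |
| A light product is thus a component-wise RGB product with the material |
| alpha passed through. The Python sketch below is illustrative only; |
| light_product is a hypothetical name. |

```python
def light_product(light_color, material_color):
    """Return (r, g, b, a) for a state.lightprod[n] binding.

    light_color    -- (r, g, b, a) ambient, diffuse, or specular light color
    material_color -- (r, g, b, a) matching front or back material color
    """
    r, g, b = (l * m for l, m in zip(light_color[:3], material_color[:3]))
    return (r, g, b, material_color[3])   # alpha comes from the material
```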
| |
| Light products depend on material properties, which can be changed inside |
| a Begin/End pair. Such property changes are not guaranteed to take effect |
| until the following End command. Program parameter variables bound to |
| light products whose corresponding material property changes inside a |
| Begin/End pair are undefined until the following End command. |
| |
| |
| Texture Coordinate Generation Property Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ---------------------------- |
| state.texgen[n].eye.s (a,b,c,d) TexGen eye linear plane |
| coefficients, s coord, unit n |
| state.texgen[n].eye.t (a,b,c,d) TexGen eye linear plane |
| coefficients, t coord, unit n |
| state.texgen[n].eye.r (a,b,c,d) TexGen eye linear plane |
| coefficients, r coord, unit n |
| state.texgen[n].eye.q (a,b,c,d) TexGen eye linear plane |
| coefficients, q coord, unit n |
| state.texgen[n].object.s (a,b,c,d) TexGen object linear plane |
| coefficients, s coord, unit n |
| state.texgen[n].object.t (a,b,c,d) TexGen object linear plane |
| coefficients, t coord, unit n |
| state.texgen[n].object.r (a,b,c,d) TexGen object linear plane |
| coefficients, r coord, unit n |
| state.texgen[n].object.q (a,b,c,d) TexGen object linear plane |
| coefficients, q coord, unit n |
| |
| Table X.5: Texture Coordinate Generation Property Bindings. "[n]" is |
| optional -- texture unit <n> is used if specified; texture unit 0 is |
| used otherwise. |
| |
| If a program parameter binding matches a set of TexGen plane coefficients, |
| the "x", "y", "z", and "w" components of the program parameter variable |
| are filled with the coefficients p1, p2, p3, and p4, respectively, for |
| object linear coefficients, and the coefficients p1', p2', p3', and p4', |
| respectively, for eye linear coefficients (section 2.10.4). |
| |
| |
| Fog Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.fog.color (r,g,b,a) RGB fog color (section 3.10) |
| state.fog.params (d,s,e,r) fog density, linear start |
| and end, and 1/(end-start) |
| (section 3.10) |
| |
| Table X.6: Fog Property Bindings |
| |
| If a program parameter binding matches "state.fog.color", the "x", "y", |
| "z", and "w" components of the program parameter variable are filled with |
| the "r", "g", "b", and "a" components, respectively, of the fog color |
| (section 3.10). |
| |
| If a program parameter binding matches "state.fog.params", the "x", "y", |
| and "z" components of the program parameter variable are filled with the |
| fog density, linear fog start, and linear fog end parameters (section |
| 3.10), respectively. The "w" component is filled with 1/(end-start), |
| where end and start are the linear fog end and start parameters, |
| respectively. |
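| |
| The four components of "state.fog.params" can be assembled as in this |
| Python sketch. It is illustrative only; fog_params is a hypothetical |
| name, and raising an error when end equals start is an assumption, since |
| the text above does not define that case. |

```python
def fog_params(density, start, end):
    """Return (d, s, e, r) with r = 1/(end-start)."""
    if end == start:
        raise ValueError("1/(end-start) is not defined when end == start")
    return (density, start, end, 1.0 / (end - start))
```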
| |
| |
| Clip Plane Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.clip[n].plane (a,b,c,d) clip plane n coefficients |
| |
| Table X.7: Clip Plane Property Bindings. <n> specifies the clip plane |
| number, and is required. |
| |
| If a program parameter binding matches "state.clip[n].plane", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the coefficients p1', p2', p3', and p4', respectively, of clip plane |
| <n> (section 2.11). |
| |
| |
| Point Property Bindings |
| |
| Binding Components Underlying State |
| ----------------------------- ---------- ---------------------------- |
| state.point.size (s,n,x,f) point size, min and max size |
| clamps, and fade threshold |
| (section 3.3) |
| state.point.attenuation (a,b,c,1) point size attenuation consts |
| |
| Table X.8: Point Property Bindings |
| |
| If a program parameter binding matches "state.point.size", the "x", "y", |
| "z", and "w" components of the program parameter variable are filled with |
| the point size, minimum point size, maximum point size, and fade |
| threshold, respectively (section 3.3). |
| |
| If a program parameter binding matches "state.point.attenuation", the "x", |
| "y", and "z" components of the program parameter variable are filled with |
| the constant, linear, and quadratic point size attenuation parameters (a, |
| b, and c), respectively (section 3.3). The "w" component is filled with |
| 1.0. |
| |
| |
| Texture Environment Property Bindings |
| |
| Binding Components Underlying State |
| ------------------------- ---------- ---------------------------- |
| state.texenv[n].color (r,g,b,a) texture environment n color |
| |
| Table X.9: Texture Environment Property Bindings. "[n]" is optional -- |
| texture unit <n> is used if specified; texture unit 0 is used otherwise. |
| |
| If a program parameter binding matches "state.texenv[n].color", the "x", |
| "y", "z", and "w" components of the program parameter variable are filled |
| with the "r", "g", "b", and "a" components, respectively, of the |
| corresponding texture environment color. Note that only "legacy" texture |
| units, as queried by MAX_TEXTURE_UNITS, include texture environment state. |
| Texture image units and texture coordinate sets do not have associated |
| texture environment state. |
| |
| |
| Depth Property Bindings |
| |
| Binding Components Underlying State |
| --------------------------- ---------- ---------------------------- |
| state.depth.range (n,f,d,1) Depth range near, far, and |
| (far-near) (section 2.10.1) |
| |
| Table X.10: Depth Property Bindings |
| |
| If a program parameter binding matches "state.depth.range", the "x" and |
| "y" components of the program parameter variable are filled with the |
| mappings of near and far clipping planes to window coordinates, |
| respectively. The "z" component is filled with the difference of the |
| mappings of near and far clipping planes, far minus near. The "w" |
| component is filled with 1.0. |
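| |
| As a brief illustration, the "state.depth.range" value can be built from |
| the near and far mappings. The Python sketch is illustrative only; |
| depth_range is a hypothetical name. |

```python
def depth_range(near, far):
    """Return (n, f, d, 1) where d is far minus near."""
    return (near, far, far - near, 1.0)
```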
| |
| |
| Matrix Property Bindings |
| |
| Binding Underlying State |
| ------------------------------------ --------------------------- |
| * state.matrix.modelview[n] modelview matrix n |
| state.matrix.projection projection matrix |
| state.matrix.mvp modelview-projection matrix |
| * state.matrix.texture[n] texture matrix n |
| state.matrix.program[n] program matrix n |
| |
| Table X.11: Base Matrix Property Bindings. The "[n]" syntax indicates |
| a specific matrix number. For modelview and texture matrices, a matrix |
| number is optional, and matrix zero will be used if the matrix number is |
| omitted. These base bindings may further be modified by an |
| inverse/transpose selector and a row selector. |
| |
| If the beginning of a program parameter binding matches any of the matrix |
| binding names listed in Table X.11, the binding corresponds to a 4x4 |
| matrix. If the parameter binding is followed by ".inverse", ".transpose", |
| or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose, |
| or transpose of the inverse, respectively, of the matrix specified in |
| Table X.11 is selected. Otherwise, the matrix specified in Table X.11 is |
| selected. If the specified matrix is poorly-conditioned (singular or |
| nearly so), its inverse matrix is undefined. The binding name |
| "state.matrix.mvp" refers to the product of modelview matrix zero and the |
| projection matrix, defined as |
| |
| MVP = P * M0, |
| |
| where P is the projection matrix and M0 is modelview matrix zero. |
| |
| If the selected matrix is followed by ".row[<a>]" (matching the |
| <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of |
| the program parameter variable are filled with the four entries of row <a> |
| of the selected matrix. In the example, |
| |
| PARAM m0 = state.matrix.modelview[1].row[0]; |
| PARAM m1 = state.matrix.projection.transpose.row[3]; |
| |
| the variable "m0" is set to the first row (row 0) of modelview matrix 1 |
| and "m1" is set to the last row (row 3) of the transpose of the projection |
| matrix. |
| |
| For program parameter array bindings, multiple rows of the selected matrix |
| can be bound via the <stateMatrixRows> grammar rule. If the selected |
| matrix binding is followed by ".row[<a>..<b>]", the result is equivalent |
| to specifying matrix rows <a> through <b>, in order. A program will fail |
| to load if <a> is greater than <b>. If no row selection is specified |
| (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order. |
| In the example, |
| |
| PARAM m2[] = { state.matrix.program[0].row[1..2] }; |
| PARAM m3[] = { state.matrix.program[0].transpose }; |
| |
| the array "m2" has two entries, containing rows 1 and 2 of program matrix |
| zero, and "m3" has four entries, containing all four rows of the transpose |
| of program matrix zero. |
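| |
| Row selection and the MVP product can be modeled with plain 4x4 lists. |
| The Python sketch below is illustrative only; all function names are |
| hypothetical, and the ".inverse" and ".invtrans" modifiers are left out |
| of the sketch. |

```python
def transpose(m):
    """Return the transpose of a 4x4 matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Return the 4x4 product a * b, e.g. MVP = matmul(P, M0)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def bind_matrix_rows(matrix, modifier=None, rows=None):
    """Return the list of rows bound by a matrix binding.

    matrix   -- 4x4 matrix as a list of four rows
    modifier -- None or "transpose"
    rows     -- None for rows 0 through 3, or an (a, b) pair for
                ".row[a..b]"; use (a, a) for a single ".row[a]"
    """
    if modifier == "transpose":
        matrix = transpose(matrix)
    elif modifier is not None:
        raise NotImplementedError("only 'transpose' is sketched here")
    a, b = (0, 3) if rows is None else rows
    if a > b:
        raise ValueError("<a> must not be greater than <b>")
    return [list(matrix[i]) for i in range(a, b + 1)]
```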
| |
| |
| Section 2.X.3.4, Program Temporaries |
| |
| Program temporary variables are used to hold temporary results during |
| program execution. Temporaries do not persist between program |
| invocations, and are undefined at the beginning of each program |
| invocation. |
| |
| Temporary variables are declared explicitly using the <TEMP_statement> |
| grammar rule. Each such statement can declare one or more temporaries. |
| Temporaries cannot be declared implicitly. Temporaries can be declared |
| using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT") |
| modifier. |
| |
| Temporary variables may be declared as arrays. Temporary variables |
| declared as arrays may be stored in slower memory than those not declared |
| as arrays, and it is recommended to use non-array variables unless array |
| functionality is required. |
| |
| |
| Section 2.X.3.5, Program Results |
| |
| Program result variables represent the per-vertex or per-fragment results |
| of the program. All result variables have associated bindings, are |
| write-only during program execution, and are undefined at the beginning of |
| each program invocation. Any vertex or fragment attributes corresponding |
| to unwritten result variables will be undefined in subsequent stages of |
| the pipeline. Result variables may be declared explicitly via the |
| <OUTPUT_statement> grammar rule, or implicitly by using a result binding |
| in an instruction. |
| |
| The set of available result bindings depends on the program type, and is |
| enumerated in the specifications for each program type. |
| |
| Result variables may generally be declared as arrays, but the set of |
| bindings allowed for arrays is limited to state grouped in arrays (e.g., |
| texture coordinates, clip distances, colors). Additionally, all bindings |
| assigned to the array must be of the same binding type and must increase |
| consecutively. Examples of valid and invalid binding lists for vertex |
| programs include: |
| |
| result.clip[1], result.clip[2] # valid, 2-entry array |
| result.texcoord[0..3] # valid, 4-entry array |
| result.texcoord[1], result.texcoord[3] # invalid, skipped texcoord 2 |
| result.texcoord[2], result.texcoord[1] # invalid, wrong order |
| result.texcoord[1], result.clip[2] # invalid, different types |
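| |
| The two array rules (one binding type, consecutively increasing indices) |
| can be checked mechanically. The sketch below is an assumption-laden |
| model: the (type, index) tuples are a hypothetical stand-in for parsed |
| result bindings, and "valid_result_array" is not a GL entry point. |
| |
```python
# Illustrative check only; each binding is a hypothetical (type, index)
# pair standing in for a parsed binding such as result.texcoord[1].

def valid_result_array(bindings):
    """True if all bindings share one type and indices rise by one."""
    kinds = {kind for kind, _ in bindings}
    if len(kinds) != 1:
        return False
    indices = [index for _, index in bindings]
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

# The five binding lists from the examples, in the same order.
examples = [
    [("clip", 1), ("clip", 2)],
    [("texcoord", i) for i in range(0, 4)],   # result.texcoord[0..3]
    [("texcoord", 1), ("texcoord", 3)],
    [("texcoord", 2), ("texcoord", 1)],
    [("texcoord", 1), ("clip", 2)],
]
verdicts = [valid_result_array(e) for e in examples]
```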
| |
| Additionally, result bindings may be used in no more than one array |
| addressed with relative addressing. |
| |
| Implementations may have a limit on the total number of result binding |
| components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV). |
| Programs that require more result binding components than this limit will |
| fail to load. The method of counting used result binding components is |
| implementation-dependent, but must satisfy the following properties: |
| |
| * If a result binding is not referenced in a program, or is referenced |
| only in declarations of result variables that are not used, none of |
| its components are counted. |
| |
| * A result binding component may be counted as used only if there exists |
| an instruction operand where |
| |
| - the component is enabled in the write mask (Section 2.X.4.3), and |
| |
| - the result binding is either |
| |
| - referenced directly by the operand, |
| |
| - bound to a declared variable referenced by the operand, or |
| |
| - bound to a declared array variable where another binding in |
| the array satisfies one of the two previous conditions. |
| |
| Implementations are not required to optimize out unused elements of a |
| result array or components that are used in only some elements of an |
| array. The last of these rules is intended to cover the case where |
| the same result binding is used in multiple variables. |
| |
| For example, an instruction whose write mask selects only the x |
| component may result in the x component of a result binding being |
| counted, but may never result in the counting of the y, z, or w |
| components of any result binding. |
| |
| |
| Section 2.X.3.6, Program Parameter Buffers |
| |
| Program parameter buffers are arrays consisting of single-component |
| typeless values or four-component typeless vectors stored in a buffer |
| object. The GL provides an implementation-dependent number of buffer |
| object binding points for each program target, to which buffer objects can |
| be attached. Program parameter buffer variables can be changed either by |
| updating the contents of bound buffer objects, or simply by changing the |
| buffer object attached to a binding point. |
| |
| Program parameter buffer variables are used as constants during program |
| execution. All program parameter buffer variables have an associated |
| binding and are read-only during program execution. Program parameter |
| buffers retain their values across program invocations, although their |
| values may change as buffer object bindings or contents change. Program |
| parameter buffer variables must be declared explicitly via the |
| <BUFFER_statement> grammar rule. Program parameter buffer bindings cannot |
| be used directly in executable instructions. |
| |
| Program parameter buffer variables are treated as an array of |
| single-component values if the <bufferDeclType> grammar rule matches |
| "BUFFER" or as an array of four-component vectors if it matches "BUFFER4". |
| A program will fail to load if a variable declared as "BUFFER" and another |
| variable declared as "BUFFER4" use the same buffer binding point. |
| |
| Program parameter buffer variables may be declared as arrays, but all |
| bindings assigned to the array must use the same binding point and must |
| increase consecutively. |
| |
| Binding                        Components  Underlying State |
| -----------------------------  ----------  ----------------------------- |
| program.buffer[a][b]           (x,x,x,x)   program parameter buffer a, |
|                                            element b |
| program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a, |
|                                            elements b through c |
| program.buffer[a]              (x,x,x,x)   program parameter buffer a, |
|                                            all elements |
| |
| Table X.12: Program Parameter Buffer Bindings. <a> indicates a buffer |
| number, <b> and <c> indicate individual elements. |
| |
| If a program parameter buffer binding matches "program.buffer[a][b]", the |
| program parameter variable is filled with element <b> of the buffer |
| object bound to binding point <a>. Each element of the bound buffer |
| object is treated as one or four words of data that can hold integer or |
| floating-point values. When a single-component binding is evaluated, the |
| selected word is broadcast to all four components of the variable. When a |
| four-component binding is evaluated, the four components of the buffer |
| element are loaded into the variable. If no buffer object is bound to |
| binding point <a>, or the bound buffer object is not large enough to hold |
| an element <b>, the values used are undefined. The binding point <a> must |
| be a nonnegative integer constant. |
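| |
| As a rough model of how a single binding is evaluated, the sketch below |
| reads one element from a byte string standing in for the bound buffer |
| object. The 32-bit word size, the little-endian layout, and the name |
| "load_buffer_element" are assumptions made for illustration; None marks |
| the cases where the values used are undefined. |
| |
```python
import struct

WORD = 4  # assumed word size in bytes (32-bit words)

def load_buffer_element(data, b, four_component):
    """Model program.buffer[a][b] for buffer contents `data` (bytes).

    Returns four raw 32-bit words, or None where the values are
    undefined because the buffer cannot hold element b.
    """
    words_per_element = 4 if four_component else 1
    start = b * words_per_element * WORD
    end = start + words_per_element * WORD
    if b < 0 or end > len(data):
        return None
    # Little-endian unsigned words are an assumption of this sketch.
    words = struct.unpack_from("<%dI" % words_per_element, data, start)
    # A single-component ("BUFFER") element is broadcast to x, y, z, w.
    return words if four_component else words * 4

# Eight hypothetical words: 10, 11, ..., 17.
data = struct.pack("<8I", *range(10, 18))
scalar_elem = load_buffer_element(data, 2, four_component=False)
vector_elem = load_buffer_element(data, 1, four_component=True)
too_far = load_buffer_element(data, 2, four_component=True)
```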
| |
| For program parameter buffer array declarations, "program.buffer[a][b..c]" |
| is equivalent to specifying elements <b> through <c> of the buffer object |
| bound to binding point <a> in order. |
| |
| For program parameter buffer array declarations, "program.buffer[a]" is |
| equivalent to specifying the entire buffer -- elements 0 through <n>-1, |
| where <n> is either the size of the array (if declared) or the |
| implementation-dependent maximum parameter buffer object size limit (if no |
| size is declared). |
| |
| |
| Section 2.X.3.7, Program Condition Code Registers |
| |
| The program condition code registers are four-component vectors. Each |
| component of this register is a collection of single-bit flags, including |
| a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry |
| flag (CF). There are two condition code registers (CC0 and CC1), whose |
| values are undefined at the beginning of program execution. |
| |
| Most program instructions can optionally update one of the condition code |
| registers, by designating the condition code to update in the instruction. |
| When a condition code component is updated, the four flags of each |
| component of the condition code are set according to the corresponding |
| component of the instruction result. Full details on the condition code |
| updates and tests can be found in Section 2.X.4.3. |
| |
| The value of these four flags can be combined in various condition code |
| tests, which can be used to mask writes to destination variables and to |
| perform conditional branches or other condition operations. |
| |
| |
| Section 2.X.3.8, Program Aliases |
| |
| Programs can create aliases by matching the <ALIAS_statement> grammar |
| rule. Aliases allow programs to use multiple variable names to refer to a |
| single underlying variable. For example, the statement |
| |
| ALIAS var1 = var0 |
| |
| establishes a variable name of "var1". Subsequent references to "var1" in |
| the program text are treated as references to "var0". The left hand side |
| of an ALIAS statement must be a new variable name, and the right hand side |
| must be an established variable name. |
| |
| Aliases are not considered variable declarations, so do not count against |
| the limits on the number of variable declarations allowed in the program |
| text. |
| |
| |
| Section 2.X.3.9, Program Resource Limits |
| |
| (see ARB_vertex_program specification, incorporates all the different |
| limits on instruction counts, temporaries, attribute bindings, program |
| parameters, and so on) |
| |
| |
| Section 2.X.4, Program Execution Environment |
| |
| The set of instructions supported for GPU programs is given in Table X.13 |
| below and is described in detail in Section 2.X.8. An instruction can use |
| up to three operands when it executes, and most instructions can write a |
| single result vector. Instructions may also specify one or more |
| modifiers, according to the <opModifiers> grammar rule. Instruction |
| modifiers affect how the specified operation is performed. |
| |
| GPU programs may operate on signed integer, unsigned integer, or |
| floating-point values; some instructions are capable of operating on any |
| of the three types. However, the data type of the operands and the result |
| are always determined based solely on the instruction and its modifiers. |
| If any of the variables used in the instruction are typeless, they will be |
| interpreted according to the data type derived from the instruction. If |
| any variables with a conflicting data type are used in the instruction, |
| the program will fail to load unless the "NTC" (no type checking) |
| instruction modifier is specified. |
| |
|              Modifiers |
| Instruction  F I C S H D  Out Inputs    Description |
| -----------  - - - - - -  --- --------  -------------------------------- |
| ABS          X X X X X F  v   v         absolute value |
| ADD          X X X X X F  v   v,v       add |
| AND          - X X - - S  v   v,v       bitwise and |
| BRK          - - - - - -  -   c         break out of loop instruction |
| CAL          - - - - - -  -   c         subroutine call |
| CEIL         X X X X X F  v   vf        ceiling |
| CMP          X X X X X F  v   v,v,v     compare |
| CONT         - - - - - -  -   c         continue with next loop iteration |
| COS          X - X X X F  s   s         cosine with reduction to [-PI,PI] |
| DIV          X X X X X F  v   v,s       divide vector components by scalar |
| DP2          X - X X X F  s   v,v       2-component dot product |
| DP2A         X - X X X F  s   v,v,v     2-comp. dot product w/scalar add |
| DP3          X - X X X F  s   v,v       3-component dot product |
| DP4          X - X X X F  s   v,v       4-component dot product |
| DPH          X - X X X F  s   v,v       homogeneous dot product |
| DST          X - X X X F  v   v,v       distance vector |
| ELSE         - - - - - -  -   -         start if test else block |
| ENDIF        - - - - - -  -   -         end if test block |
| ENDREP       - - - - - -  -   -         end of repeat block |
| EX2          X - X X X F  s   s         exponential base 2 |
| FLR          X X X X X F  v   vf        floor |
| FRC          X - X X X F  v   v         fraction |
| I2F          - X X - - S  vf  v         integer to float |
| IF           - - - - - -  -   c         start of if test block |
| KIL          X X - - X F  -   vc        kill fragment |
| LG2          X - X X X F  s   s         logarithm base 2 |
| LIT          X - X X X F  v   v         compute lighting coefficients |
| LRP          X - X X X F  v   v,v,v     linear interpolation |
| MAD          X X X X X F  v   v,v,v     multiply and add |
| MAX          X X X X X F  v   v,v       maximum |
| MIN          X X X X X F  v   v,v       minimum |
| MOD          - X X - - S  v   v,s       modulus vector components by scalar |
| MOV          X X X X X F  v   v         move |
| MUL          X X X X X F  v   v,v       multiply |
| NOT          - X X - - S  v   v         bitwise not |
| NRM          X - X X X F  v   v         normalize 3-component vector |
| OR           - X X - - S  v   v,v       bitwise or |
| PK2H         X X - - - F  s   vf        pack two 16-bit floats |
| PK2US        X X - - - F  s   vf        pack two floats as unsigned 16-bit |
| PK4B         X X - - - F  s   vf        pack four floats as signed 8-bit |
| PK4UB        X X - - - F  s   vf        pack four floats as unsigned 8-bit |
| POW          X - X X X F  s   s,s       exponentiate |
| RCC          X - X X X F  s   s         reciprocal (clamped) |
| RCP          X - X X X F  s   s         reciprocal |
| REP          X X - - X F  -   v         start of repeat block |
| RET          - - - - - -  -   c         subroutine return |
| RFL          X - X X X F  v   v,v       reflection vector |
| ROUND        X X X X X F  v   vf        round to nearest integer |
| RSQ          X - X X X F  s   s         reciprocal square root |
| SAD          - X X - - S  vu  v,v,vu    sum of absolute differences |
| SCS          X - X X X F  v   s         sine/cosine without reduction |
| SEQ          X X X X X F  v   v,v       set on equal |
| SFL          X X X X X F  v   v,v       set on false |
| SGE          X X X X X F  v   v,v       set on greater than or equal |
| SGT          X X X X X F  v   v,v       set on greater than |
| SHL          - X X - - S  v   v,s       shift left |
| SHR          - X X - - S  v   v,s       shift right |
| SIN          X - X X X F  s   s         sine with reduction to [-PI,PI] |
| SLE          X X X X X F  v   v,v       set on less than or equal |
| SLT          X X X X X F  v   v,v       set on less than |
| SNE          X X X X X F  v   v,v       set on not equal |
| SSG          X - X X X F  v   v         set sign |
| STR          X X X X X F  v   v,v       set on true |
| SUB          X X X X X F  v   v,v       subtract |
| SWZ          X - X X X F  v   v         extended swizzle |
| TEX          X X X X - F  v   vf        texture sample |
| TRUNC        X X X X X F  v   vf        truncate (round toward zero) |
| TXB          X X X X - F  v   vf        texture sample with bias |
| TXD          X X X X - F  v   vf,vf,vf  texture sample w/partials |
| TXF          X X X X - F  v   vs        texel fetch |
| TXL          X X X X - F  v   vf        texture sample w/LOD |
| TXP          X X X X - F  v   vf        texture sample w/projection |
| TXQ          - - - - - S  vs  vs        texture info query |
| UP2H         X X X X - F  vf  s         unpack two 16-bit floats |
| UP2US        X X X X - F  vf  s         unpack two unsigned 16-bit ints |
| UP4B         X X X X - F  vf  s         unpack four signed 8-bit ints |
| UP4UB        X X X X - F  vf  s         unpack four unsigned 8-bit ints |
| X2D          X - X X X F  v   v,v,v     2D coordinate transformation |
| XOR          - X X - - S  v   v,v       exclusive or |
| XPD          X - X X X F  v   v,v       cross product |
| |
| Table X.13: Summary of NV_gpu_program4 instructions. The "Modifiers" |
| columns specify the set of modifiers allowed for the instruction: |
| |
| F = floating-point data type modifiers |
| I = signed and unsigned integer data type modifiers |
| C = condition code update modifiers |
| S = clamping (saturation) modifiers |
| H = half-precision float data type suffix |
| D = default data type modifier (F, U, or S) |
| |
| The input and output columns describe the formats of the operands and |
| results of the instruction. |
| |
| v: 4-component vector (data type is inherited from operation) |
| vf: 4-component vector (data type is always floating-point) |
| vs: 4-component vector (data type is always signed integer) |
| vu: 4-component vector (data type is always unsigned integer) |
| s: scalar (replicated if written to a vector destination; |
| data type is inherited from operation) |
| c: condition code test result (e.g., "EQ", "GT1.x") |
| vc: 4-component vector or condition code test |
| |
| |
| Section 2.X.4.1, Program Instruction Modifiers |
| |
| There are several types of instruction modifiers available. A data type |
| modifier specifies that an instruction should operate on signed integer, |
| unsigned integer, or floating-point data, when multiple data types are |
| supported. A clamping modifier applies to instructions with |
| floating-point results, and specifies the range to which the results |
| should be clamped. A condition code update modifier specifies that the |
| instruction should update one of the condition code variables. Several |
| other special modifiers are also provided. |
| |
| Instruction modifiers may be specified as stand-alone modifiers or as |
| suffixes concatenated with the opcode name. A program will fail to load |
| if it contains an instruction that |
| |
| * specifies more than one modifier of any given type, |
| |
| * specifies a clamping modifier on an instruction, unless it produces |
| floating-point results, or |
| |
| * specifies a modifier that is not supported by the instruction (see |
| Table X.13 and the instruction description). |
| |
| Stand-alone instruction modifiers are specified according to the |
| <opModifiers> grammar rule using a ".<modifier>" syntax. Multiple |
| modifiers, separated by periods, may be specified. The set of supported |
| modifiers is described in Table X.14. |
| |
| Modifier  Description |
| --------  ----------------------------------------------- |
| F         Floating-point operation |
| U         Fixed-point operation, unsigned operands |
| S         Fixed-point operation, signed operands |
| CC        Update condition code register zero |
| CC0       Update condition code register zero |
| CC1       Update condition code register one |
| SAT       Floating-point results clamped to [0,1] |
| SSAT      Floating-point results clamped to [-1,1] |
| NTC       Disable type-checking on operands/results |
| S24       Signed multiply (24-bit operands) |
| U24       Unsigned multiply (24-bit operands) |
| HI        Multiplies two 32-bit integer operands, returns |
|           the 32 MSBs of the product |
| |
| Table X.14, Instruction Modifiers. |
| |
| "F", "U", and "S" modifiers are data type modifiers and specify that the |
| instruction should operate on floating-point, unsigned integer, or |
| signed integer values, respectively. For example, "ADD.F", "ADD.U", and |
| "ADD.S" specify component-wise addition of floating-point, unsigned |
| integer, or signed integer vectors, respectively. These modifiers specify |
| a data type, but do not specify a precision at which the operation is |
| performed. Floating-point operations will be carried out with an internal |
| precision no less than that used to represent the largest operand. |
| Fixed-point operations will be carried out using at least as many bits as |
| used to represent the largest operand. Operands represented with fewer |
| bits than used to perform the instruction will be promoted to a larger |
| data type. Signed integer operands will be sign-extended, where the most |
| significant bits are filled with ones if the operand is negative and zero |
| otherwise. Unsigned integer operands will be zero-extended, where the |
| most significant bits are always filled with zeroes. For some |
| instructions, the data type of some operands or the result is fixed; in |
| these cases, the data type modifier specifies the data type of the |
| remaining values. |
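| |
| The promotion of narrower integer operands can be illustrated on raw bit |
| patterns. This is a sketch only: the helper names are hypothetical, and |
| the 16-bit and 32-bit widths used in the sample values are assumptions |
| chosen for illustration, not widths mandated by this section. |
| |
```python
def zero_extend(value, from_bits):
    """Promote an unsigned operand: upper bits are filled with zeroes."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Promote a signed operand held as a raw bit pattern.

    The new upper bits are ones if the sign bit is set, zeroes otherwise.
    """
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

# Assumed widths: a 16-bit operand promoted to a 32-bit operation.
negative = sign_extend(0x8001, 16, 32)
positive = sign_extend(0x7FFF, 16, 32)
unsigned = zero_extend(0x8001, 16)
```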
| |
| "CC", "CC0", and "CC1" are condition code update modifiers that specify |
| that one of the condition code registers should be updated based on the |
| result of the instruction, as described in section 2.X.4.3. "CC" and |
| "CC0" specify that the condition code register CC0 be updated; "CC1" |
| specifies an update to CC1. If no condition code update modifier is |
| provided, the condition code registers will not be affected. |
| |
| "SAT" and "SSAT" are clamping modifiers that specify that the |
| floating-point components of the instruction result should be clamped to |
| [0,1] or [-1,1], respectively, before updating the condition code and the |
| destination variable. If no clamping suffix is specified, unclamped |
| results will be used for condition code updates (if any) and destination |
| variable writes. Clamping modifiers are not supported on instructions |
| that do not produce floating-point results. |
| |
| "NTC" (no type checking) disables data type checking on the instruction, |
| and allows instructions to use operands or result variables whose data |
| types are inconsistent with the expected data types of the instruction. |
| |
| "S24", "U24", and "HI" are special modifiers that are allowed only for the |
| MUL instruction, and are described in detail where MUL is documented. No |
| more than one such modifier may be provided for any instruction. |
| |
| If an instruction supports data type modifiers, but none is provided, a |
| default data type will be chosen based on the instruction, as specified in |
| Table X.13 and the instruction set description (Section 2.X.8). If |
| condition code update or clamping modifiers are not specified, the |
| corresponding operation will not be performed. |
| |
| Additionally, each instruction name may have one or more suffixes, |
| concatenated onto the base instruction name, that operate as instruction |
| modifiers. For conciseness, these suffixes are not spelled out in the |
| grammar -- the base opcode name is used as a placeholder for the opcode |
| and all of its possible suffixes. Instruction suffixes are provided |
| mainly for compatibility with prior GPU program instruction sets (e.g., |
| NV_vertex_program3, NV_fragment_program2, and predecessors). The set of |
| allowable suffixes, and their equivalent stand-alone modifiers, are listed |
| in Table X.15. |
| |
| Suffix  Modifier    Description |
| ------  ----------  --------------------------------------------------- |
| R       F           Floating-point operation, 32-bit precision |
| H       F(*)        Floating-point operation, at least 16-bit precision |
| C       CC0         Update condition code register zero |
| C0      CC0         Update condition code register zero |
| C1      CC1         Update condition code register one |
| _SAT    SAT         Floating-point results clamped to [0,1] |
| _SSAT   SSAT        Floating-point results clamped to [-1,1] |
| |
| Table X.15, Instruction Suffixes. |
| |
| The "R" and "H" suffixes specify floating-point operations and are |
| equivalent to the "F" data type modifier. They additionally specify a |
| minimum precision for the operations. Instructions with an "R" precision |
| modifier will be carried out at no less than IEEE single-precision |
| floating-point (8 bits of exponent, 23 bits of mantissa). Instructions |
| with an "H" precision modifier will be carried out at no less than 16-bit |
| floating-point precision (5 bits of exponent, 10 bits of mantissa). |
| |
| An instruction may have multiple suffixes, but they must appear in order, |
| with data type suffixes first, followed by condition code update suffixes, |
| followed by clamping suffixes. For example, "ADDR" carries out an add at |
| 32-bit precision. "ADDH_SAT" carries out an add at 16-bit precision (or |
| better) and clamps the results to [0,1]. "ADDRC1_SSAT" carries out an add |
| at 32-bit floating-point precision, clamps the results to [-1,1], and |
| updates condition code one based on the clamped result. |
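| |
| The suffix ordering rule can be sketched as a small decoder. This is an |
| illustrative model, not how any implementation parses opcodes: the name |
| "decode_suffixes" is hypothetical, the caller supplies the base opcode |
| because some opcode names end in suffix-like letters, and the minimum |
| precision implied by "R" and "H" is not modeled. |
| |
```python
import re

# Suffix to stand-alone modifier mapping, following Table X.15.
DATA_TYPE = {"R": "F", "H": "F"}   # precision implied by R/H not modeled
COND_CODE = {"C": "CC0", "C0": "CC0", "C1": "CC1"}
CLAMP = {"_SAT": "SAT", "_SSAT": "SSAT"}

# Data type suffix first, then condition code update, then clamping.
SUFFIX_ORDER = re.compile(r"(R|H)?(C0|C1|C)?(_SAT|_SSAT)?\Z")

def decode_suffixes(name, base):
    """Return the stand-alone modifiers for `name`, or None if invalid."""
    if not name.startswith(base):
        return None
    match = SUFFIX_ORDER.match(name[len(base):])
    if match is None:
        return None          # unknown suffix or suffixes out of order
    data, cond, clamp = match.groups()
    modifiers = []
    if data:
        modifiers.append(DATA_TYPE[data])
    if cond:
        modifiers.append(COND_CODE[cond])
    if clamp:
        modifiers.append(CLAMP[clamp])
    return modifiers
```
| |
| Under these assumptions, decode_suffixes("ADDRC1_SSAT", "ADD") yields the |
| modifiers F, CC1, and SSAT, while a name with the clamping suffix ahead |
| of the condition code suffix is rejected. |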
| |
| |
| Section 2.X.4.2, Program Operands |
| |
| Most program instructions operate on one or more scalar or vector |
| operands. Each operand specifies an operand variable, which is either the |
| name of a previously declared variable or an implicit variable declaration |
| created by using a variable binding in the instruction. Attribute, |
| parameter, or parameter buffer variables can be declared implicitly by |
| using a valid binding name in an operand. Instruction operands are |
| specified by the <instOperandV>, <instOperandS>, or <instOperandVNS> |
| grammar rules. |
| |
| If the operand variable is not an array, its contents are loaded directly. |
| If the operand variable is an array, a single element of the array is |
| loaded according to the <arrayMem> grammar rule. The elements of an array |
| are numbered from 0 to <n>-1, where <n> is the number of entries in the |
| array. Array members can be accessed using either absolute or relative |
| addressing. |
| |
| Absolute array addressing is used when the <arrayMemAbs> grammar rule is |
| matched; the array member to load is specified by the matching integer. |
| Out-of-bounds absolute array accesses are not allowed. If the specified |
| member number is greater than or equal to the size of the array, the |
| program will fail to load. |
| |
| Relative array addressing is used when the <arrayMemRel> grammar rule is |
| matched. This grammar rule allows the program to specify a scalar integer |
| operand and an optional constant offset, according to the <arrayMemReg> |
| and <arrayMemOffset> grammar rules. When performing relative addressing, |
| the GL evaluates the specified integer scalar operand (according to the |
| rules specified in this section) and adds the constant offset. The array |
| member loaded is given by this sum. The constant offset is considered |
| zero if an offset is omitted. If the sum is negative or is greater than |
| or equal to the size of the array, the results of the access are |
| undefined, but must not lead to program or GL termination. The set of |
| constant offsets supported for |
| relative addressing is limited to values in the range [0,<n>-1], where <n> |
| is the size of the array. A program will fail to load if it specifies an |
| offset outside that range. If offsets outside that range are required, |
| they can be applied by using an integer ADD instruction writing to a |
| temporary variable. |
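| |
| The relative addressing rules split into a load-time check on the |
| constant offset and a run-time index computation. The sketch below models |
| both; the function names are hypothetical, and None stands for an |
| undefined access. Any sum outside 0 through <n>-1 is treated as out of |
| bounds, since array elements are numbered from 0 to <n>-1. |
| |
```python
def check_offset(offset, n):
    """Load-time rule: constant offsets must lie in [0, n-1]."""
    if not 0 <= offset <= n - 1:
        raise ValueError("program fails to load: offset out of range")

def relative_index(register_value, offset, n):
    """Return the array member selected at run time.

    Returns None where the access is out of bounds and the result is
    undefined.
    """
    check_offset(offset, n)
    index = register_value + offset
    return index if 0 <= index < n else None

# Hypothetical 4-entry array addressed as array[reg + 1].
in_bounds = relative_index(2, 1, 4)
past_end = relative_index(3, 1, 4)
negative = relative_index(-2, 1, 4)
```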
| |
| After the operand is loaded, its components can be rearranged according to |
| the <swizzleSuffix> grammar rule, or it can be converted to a scalar |
| operand according to the <scalarSuffix> grammar rule. |
| |
| The <swizzleSuffix> grammar rule rearranges the components of a loaded |
| vector to produce another vector. If the <swizzleSuffix> rule matches the |
| <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????" |
| is used, where each question mark is replaced with one of "x", "y", "z", |
| "w", "r", "g", "b", or "a". For such patterns, the x, y, z, and w |
| components of the operand are taken from the vector components named by |
| the first, second, third, and fourth character of the pattern, |
| respectively. Swizzle components of "r", "g", "b", and "a" are equivalent |
| to "x", "y", "z", and "w", respectively. For example, if the swizzle |
| suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0}, |
| the result is the vector {8,9,9,2}. If the <swizzleSuffix> matches the |
| <component> grammar rule, a pattern of the form ".?" is used. For this |
| pattern, all four components of the operand are taken from the single |
| component identified by the pattern. If the swizzle suffix is omitted, |
| components are not rearranged and swizzling has no effect, as though |
| ".xyzw" were specified. |
| |
| The swizzle suffix rules do not allow mixing "x", "y", "z", or "w" |
| selectors with "r", "g", "b", or "a" selectors. A program will fail to |
| load if it contains a swizzle suffix with selectors from both of these |
| sets. |
| |
| The <scalarSuffix> grammar rule converts a vector to a scalar by selecting |
| a single component. The <scalarSuffix> rule is similar to the swizzle |
| selector, except that only a single component is selected. If the scalar |
| suffix is ".y" and the specified source contains {2,8,9,0}, the value is |
| the scalar value 8. |
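| |
| The swizzle and scalar suffix rules can be sketched as follows. The |
| helper names "swizzle" and "scalar" are hypothetical and the suffix is |
| passed as a plain string; the sample values are the ones used in the |
| text above. |
| |
```python
COMPONENTS = {"x": 0, "y": 1, "z": 2, "w": 3,
              "r": 0, "g": 1, "b": 2, "a": 3}

def swizzle(vec, suffix=""):
    """Apply a swizzle suffix such as ".yzzx", ".gbbr", or ".y"."""
    pattern = suffix.lstrip(".") or "xyzw"   # omitted suffix: no change
    if len(pattern) not in (1, 4):
        raise ValueError("swizzle must name one or four components")
    if any(c not in COMPONENTS for c in pattern):
        raise ValueError("unknown component selector")
    if set(pattern) & set("xyzw") and set(pattern) & set("rgba"):
        raise ValueError("program fails to load: mixed xyzw/rgba")
    if len(pattern) == 1:
        pattern *= 4          # single component replicated to all four
    return [vec[COMPONENTS[c]] for c in pattern]

def scalar(vec, suffix):
    """Apply a scalar suffix such as ".y", selecting one component."""
    if len(suffix.lstrip(".")) != 1:
        raise ValueError("scalar suffix must name one component")
    return swizzle(vec, suffix)[0]

source = [2, 8, 9, 0]
```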
| |
| Next, a component-wise negate operation is performed on the operand if the |
| <operandNeg> grammar rule matches "-". Negation is not performed if the |
| operand has no sign prefix, or is prefixed with "+". For unsigned integer |
| operands, the negate operation performs a two's complement operation. |
| |
| Next, a component-wise absolute value operation is performed on the |
| operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is |
| matched, by surrounding the operand with two "|" characters. The result |
| is optionally negated if the <operandAbsNeg> grammar rule matches "-". |
| For unsigned integer operands, the absolute value operation has no effect. |
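| |
| The order of the sign operations (negate, then absolute value, then the |
| optional negate of the absolute value) can be modeled as below. The |
| function name and its keyword arguments are hypothetical, and 32-bit |
| unsigned components are an assumption of this sketch. |
| |
```python
MASK32 = 0xFFFFFFFF  # assumes 32-bit unsigned integer components

def apply_sign_ops(vec, unsigned, neg=False, absolute=False,
                   abs_neg=False):
    """Apply negate, then absolute value, then the optional post-negate."""
    def negate(v):
        # Unsigned negate is a two's complement operation.
        return (-v) & MASK32 if unsigned else -v

    if neg:
        vec = [negate(v) for v in vec]
    if absolute:
        if not unsigned:      # absolute value has no effect on unsigned
            vec = [abs(v) for v in vec]
        if abs_neg:
            vec = [negate(v) for v in vec]
    return vec
```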
| |
| |
| Section 2.X.4.3, Program Destination Variable Update |
| |
| Most program instructions perform computations that produce a result, |
| which will be written to a variable. Each instruction that computes a |
| result specifies a destination variable, which is either the name of a |
| previously declared variable or an implicit variable declaration created |
| by using a variable binding in the instruction. Result variables can be |
| declared implicitly by using a valid program result binding name in the |
| result portion of the instruction. Instruction results are specified |
| according to the <instResult> grammar rule. |
| |
| The destination variable may be a single member of an array. In this |
| case, a single array member is specified using the <arrayMem> grammar |
| rule, and the array member to update is computed in the exact same manner |
| as done for operand loads. If the array member is computed at run time, |
| and is negative or greater than or equal to the size of the array, the |
| results of the destination variable update are undefined and could result |
| in overwriting other program variables. |
| |
| The results of the operation may be obtained at a different precision than |
| that used to store the destination variable. If so, the results are |
| converted to match the size of the destination variable. For |
| floating-point values, the results are rounded to the nearest |
| floating-point value that can be represented in the destination variable. |
| If a result component is larger in magnitude than the largest |
| representable floating-point value in the data type of the destination |
| variable, an infinity encoding (+/-INF) is used. Signed or unsigned |
| integer values are sign-extended or zero-extended, respectively, if the |
| destination variable has more bits than the result, and have their most |
| significant bits discarded if the destination variable has fewer bits. |
| |
| Writes to individual components of a vector destination variable can be |
| controlled at compile time by individual component write masks specified |
| in the instruction. The component write mask is specified by the |
| <optWriteMask> grammar rule, and is a string of up to four characters, |
| naming the components to enable for writing. If no write mask is |
| specified, all components are enabled for writing. The characters "x", |
| "y", "z", and "w" match the x, y, z, and w components respectively. For |
| example, a write mask of ".xzw" indicates that the x, z, and w |
| components should be enabled for writing but the y component should not be |
| written. The grammar requires that the destination register mask |
| components be listed in "xyzw" order. Additionally, write mask |
| components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and |
| "w", respectively. The grammar does not allow mixing "x", "y", "z", or |
| "w" components with "r", "g", "b", and "a" ones. |
| |
| Writes to individual components of a vector destination variable, or to a |
| scalar destination variable, can also be controlled at run time using |
| condition code write masks. The condition code write mask is specified by |
| the <ccMask> grammar rule. If a mask is specified, a condition code |
| variable is loaded according to the <ccMaskRule> grammar rule and tested |
| as described in Table X.16 to produce a four-component vector of TRUE/FALSE |
| values. |
| |
| mask rule        test name                condition |
| ---------------  -----------------------  ----------------- |
| EQ, EQ0, EQ1     equal                    !SF && ZF |
| GE, GE0, GE1     greater than or equal    !(SF ^ OF) |
| GT, GT0, GT1     greater than             (!SF ^ OF) && !ZF |
| LE, LE0, LE1     less than or equal       SF ^ (ZF || OF) |
| LT, LT0, LT1     less than                (SF && !ZF) ^ OF |
| NE, NE0, NE1     not equal                SF || !ZF |
| FL, FL0, FL1     false                    always false |
| TR, TR0, TR1     true                     always true |
| |
| NAN, NAN0, NAN1  not a number             SF && ZF |
| LEG, LEG0, LEG1  less, equal, or greater  !SF || !ZF |
|                  (anything but a NaN) |
| |
| CF, CF0, CF1     carry flag               CF |
| NCF, NCF0, NCF1  no carry flag            !CF |
| OF, OF0, OF1     overflow flag            OF |
| NOF, NOF0, NOF1  no overflow flag         !OF |
| SF, SF0, SF1     sign flag                SF |
| NSF, NSF0, NSF1  no sign flag             !SF |
| AB, AB0, AB1     above                    CF && !ZF |
| BLE, BLE0, BLE1  below or equal           !CF || ZF |
| |
| Table X.16, Condition Code Tests. The allowed rules are specified in |
| the "mask rule" column. If "0" or "1" is appended to the rule name |
| (e.g., "EQ1"), the corresponding condition code register (CC1 in this |
| example) is loaded, otherwise CC0 is loaded. After loading, each |
| component is tested, using the expression listed in the "condition" |
| column. |
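
(Note: The following Python sketch is illustrative only and is not part of this specification. It transcribes the "condition" column of Table X.16 for a single condition code component. The names CC_TESTS, parse_rule, and eval_cc are hypothetical, and each component is assumed to be modeled as four booleans SF, ZF, OF, and CF.)

```python
# Illustrative model of Table X.16; not part of the specification.
# One condition code component is modeled as four booleans: SF, ZF, OF, CF.
CC_TESTS = {
    "EQ":  lambda sf, zf, of, cf: (not sf) and zf,
    "GE":  lambda sf, zf, of, cf: not (sf ^ of),
    "GT":  lambda sf, zf, of, cf: ((not sf) ^ of) and not zf,
    "LE":  lambda sf, zf, of, cf: sf ^ (zf or of),
    "LT":  lambda sf, zf, of, cf: (sf and not zf) ^ of,
    "NE":  lambda sf, zf, of, cf: sf or not zf,
    "FL":  lambda sf, zf, of, cf: False,
    "TR":  lambda sf, zf, of, cf: True,
    "NAN": lambda sf, zf, of, cf: sf and zf,
    "LEG": lambda sf, zf, of, cf: (not sf) or (not zf),
    "CF":  lambda sf, zf, of, cf: cf,
    "NCF": lambda sf, zf, of, cf: not cf,
    "OF":  lambda sf, zf, of, cf: of,
    "NOF": lambda sf, zf, of, cf: not of,
    "SF":  lambda sf, zf, of, cf: sf,
    "NSF": lambda sf, zf, of, cf: not sf,
    "AB":  lambda sf, zf, of, cf: cf and not zf,
    "BLE": lambda sf, zf, of, cf: (not cf) or zf,
}


def parse_rule(rule):
    """Split a mask rule such as "EQ1" into (test name, CC register index)."""
    if rule[-1] in "01":
        return rule[:-1], int(rule[-1])
    return rule, 0  # no suffix selects CC0


def eval_cc(rule, sf, zf, of=False, cf=False):
    """Evaluate the named test on one condition code component."""
    name, _ = parse_rule(rule)
    return bool(CC_TESTS[name](bool(sf), bool(zf), bool(of), bool(cf)))
```

With SF and ZF both set (the encoding used for a floating-point NaN), all of the ordered tests EQ, GE, GT, LE, and LT fail in this model, while NE and NAN pass.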
| |
| After the condition code tests are performed, the four-component result |
| can be swizzled according to the <swizzleSuffix> grammar rule. Individual |
| components of the destination variable are written only if the |
| corresponding component of the swizzled condition code test result is |
| TRUE. If both a (compile-time) component write mask and a condition code |
| write mask are specified, destination variable components are written only |
| if the corresponding component is enabled in both masks. |
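
(Note: The following Python sketch is illustrative only. It models how a compile-time component write mask and a swizzled condition code test result jointly gate writes. The function masked_write and its parameter names are hypothetical; <cc_test> is assumed to hold the four TRUE/FALSE test results in x, y, z, w order.)

```python
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def masked_write(dest, result, write_mask, cc_test, swizzle="xyzw"):
    """Return the new destination after applying both write masks.

    dest, result: four-element lists (old and computed values)
    write_mask:   compile-time mask such as "xz" or "xyzw"
    cc_test:      four booleans produced by the condition code test
    swizzle:      four-letter swizzle applied to cc_test before masking
    """
    swizzled = [cc_test[INDEX[c]] for c in swizzle]
    out = list(dest)
    for name, i in INDEX.items():
        # A component is written only if it is enabled in both masks.
        if name in write_mask and swizzled[i]:
            out[i] = result[i]
    return out
```

For example, with a write mask of "xyz" and test results (TRUE, FALSE, TRUE, TRUE), only the x and z components are written; w is blocked by the component mask and y by the condition code mask.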
| |
| A program instruction can also optionally update one of the two condition |
| code registers if the "CC", "CC0", or "CC1" instruction modifier is |
| specified. These instruction modifiers update condition code register |
| CC0, CC0, or CC1, respectively. The instructions "ADD.CC" or "ADD.CC0" |
| will perform an add and update condition code zero, "ADD.CC1" will add and |
| update condition code one, and "ADD" will simply perform the add without a |
| condition code update. The components of the selected condition code |
| register are updated if and only if the corresponding component of the |
| destination variable is enabled by both write masks. For the purposes of |
| condition code update, a scalar destination variable is treated as a |
| vector where the scalar result is written to "x" (if enabled in the write |
| mask), and writes to the "y", "z", and "w" components are disabled. |
| |
| When condition code components are written, the condition code flags are |
| updated based on the corresponding component of the result. If a |
| component of the destination register is not enabled for writes, the |
| corresponding condition code component is also unchanged. |
| |
| For floating-point results, the sign flag (SF) is set if the result is |
| less than zero or is a NaN (not a number) value. The zero flag (ZF) is |
| set if the result is equal to zero or is a NaN. |
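
(Note: The following Python sketch is illustrative only. It computes the SF and ZF flags for one floating-point result component as described above; the function name float_flags is hypothetical.)

```python
import math


def float_flags(value):
    """Return (SF, ZF) for one floating-point result component."""
    is_nan = math.isnan(value)
    sf = is_nan or value < 0.0   # negative or NaN
    zf = is_nan or value == 0.0  # zero or NaN
    return sf, zf
```

A NaN result therefore sets both flags, which is the only way a floating-point result can produce SF and ZF together.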
| |
| For signed and unsigned integer results, the sign flag (SF) is set if the |
| most significant bit of the value written to the result variable is set |
| and the zero flag (ZF) is set if the result written is zero. For |
| instructions other than those performing an integer add or subtract (ADD, |
| MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared. |
| |
| For integer add or subtract operations, the overflow and carry flags are |
| computed by performing both signed and unsigned adds/subtracts, as follows: |
| |
| The overflow flag (OF) is set by interpreting the two operands as signed |
| integers and performing a signed add or subtract. If the result is |
| representable as a signed integer (i.e., doesn't overflow), the overflow |
| flag is cleared; otherwise, it is set. |
| |
| The carry flag (CF) is set by interpreting the two operands as unsigned |
| integers and performing an unsigned add or subtract. If the result of |
| an add is representable as an unsigned integer (i.e., doesn't overflow), |
| the carry flag is cleared; otherwise, it is set. If the result of a |
| subtract is greater than or equal to zero, the carry flag is set; |
| otherwise, it is cleared. |
| |
| For the purposes of condition code setting, negation modifiers turn add |
| operations into subtracts and vice versa. If the operation is equivalent |
| to an add with both operands negated (-A-B), the carry and overflow flags |
| are both undefined. |
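
(Note: The following Python sketch is illustrative only. It models the flag updates for an integer add or subtract as described above. The 32-bit width is an assumption made for the sketch and is not stated in this section; the names BITS, to_signed, and int_flags are hypothetical.)

```python
BITS = 32                      # assumed integer width for this sketch
MASK = (1 << BITS) - 1
SMIN = -(1 << (BITS - 1))
SMAX = (1 << (BITS - 1)) - 1


def to_signed(v):
    """Reinterpret the low BITS bits of v as a two's complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def int_flags(a, b, subtract=False):
    """Return (result_bits, SF, ZF, OF, CF) for a + b or a - b."""
    ua, ub = a & MASK, b & MASK
    sa, sb = to_signed(a), to_signed(b)
    if subtract:
        exact_signed = sa - sb
        cf = (ua - ub) >= 0          # set if the unsigned difference is >= 0
        bits = (ua - ub) & MASK
    else:
        exact_signed = sa + sb
        cf = (ua + ub) > MASK        # set if the unsigned sum overflows
        bits = (ua + ub) & MASK
    of = not (SMIN <= exact_signed <= SMAX)  # signed result not representable
    sf = bool(bits >> (BITS - 1))            # most significant bit of result
    zf = bits == 0
    return bits, sf, zf, of, cf
```

In this model, adding 1 to 0x7FFFFFFF sets OF and SF but not CF, while adding 1 to 0xFFFFFFFF sets CF and ZF but not OF.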
| |
| |
| Section 2.X.4.4, Program Texture Access |
| |
| Certain program instructions may access texture images, as described in |
| section 3.8. The coordinates, level-of-detail, and partial derivatives |
| used for performing the texture lookup are derived from values provided in |
| the program as described in the various sub-sections of Section 2.X.8. |
| These descriptions use the function |
| |
| result_t_vec |
| TextureSample(float_vec coord, float lod, float_vec ddx, |
|               float_vec ddy, int_vec offset); |
| |
| which obtains a filtered texel value <tau> as described in Section 3.8.8 |
| and returns a 4-component vector (R,G,B,A) according to the format |
| conversions specified in Table 3.21. The result vector is interpreted as |
| floating-point, signed integer, or unsigned integer, according to the data |
| type modifier of the instruction. If the internal format of the texture |
| does not match the instruction's data type modifier, the results of the |
| texture lookup are undefined. |
| |
| (Note: For unextended OpenGL 2.0, all supported texture internal formats |
| store integer values but return floating-point results in the range [0,1] |
| on a texture lookup. The ARB_texture_float extension introduces |
| floating-point internal formats where components are both stored and |
| returned as floating-point values. The EXT_texture_integer extension |
| introduces formats that both store and return either signed or unsigned |
| integer values.) |
| |
| <coord> is a four-component floating-point vector from which the (s,t,r) |
| texture coordinates used for the texture access, the layer used for array |
| textures, and the reference value used for depth comparisons (section |
| 3.8.14) are extracted according to Table X.17. If the texture is a cube |
| map, (s,t,r) is projected to one of the six cube faces to produce a new |
| (s,t) vector according to Section 3.8.6. For array textures, the layer |
| used is derived by rounding the extracted floating-point component to the |
| nearest integer and clamping the result to the range [0,<n>-1], where <n> |
| is the number of layers in the texture. |
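
(Note: The following Python sketch is illustrative only. It models the layer selection for array textures described above. The function name select_layer is hypothetical, and the handling of exact ties (x.5) is an assumption, since this section does not specify it.)

```python
import math


def select_layer(coord_component, num_layers):
    """Round the extracted coordinate to the nearest layer and clamp it."""
    nearest = math.floor(coord_component + 0.5)  # tie handling is assumed
    return max(0, min(nearest, num_layers - 1))  # clamp to [0, n-1]
```

For a four-layer texture, a coordinate of 2.4 selects layer 2, while out-of-range values such as 7.9 or -1.2 are clamped to layers 3 and 0, respectively.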
| |
| <lod> specifies the level of detail parameter and replaces the value |
| computed in equation 3.18. <ddx> and <ddy> specify partial derivatives |
| (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture |
| coordinates, and may be used to derive footprint shapes for anisotropic |
| texture filtering. |
| |
| <offset> is a constant 3-component signed integer vector specified |
| according to the <texOffset> grammar rule, which is added to the computed |
| <u>, <v>, and <w> texel locations prior to sampling. One, two, or three |
| components may be specified in the instruction; if fewer than three are |
| specified, the remaining offset components are zero. A limited range of |
| offset values is supported; the minimum and maximum <texOffset> values |
| are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and |
| MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively. A program will fail to load: |
| |
| * if the texture target specified in the instruction is 1D, ARRAY1D, |
| SHADOW1D, or SHADOWARRAY1D, and the second or third component of the |
| offset vector is non-zero, |
| |
| * if the texture target specified in the instruction is 2D, RECT, |
| ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third |
| component of the offset vector is non-zero, |
| |
| * if the texture target is CUBE or SHADOWCUBE, and any component of the |
| offset vector is non-zero -- texel offsets are not supported for cube |
| map or buffer textures, or |
| |
| * if any component of the offset vector is less than |
| MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than |
| MAX_PROGRAM_TEXEL_OFFSET_EXT. |
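
(Note: The following Python sketch is illustrative only. It models the four load-time checks listed above. The function offset_is_loadable and the set names are hypothetical; <min_offset> and <max_offset> stand in for the implementation-dependent MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT values, and the values -8 and 7 used in the usage note are placeholders, not values given by this specification.)

```python
ONE_D = {"1D", "ARRAY1D", "SHADOW1D", "SHADOWARRAY1D"}
TWO_D = {"2D", "RECT", "ARRAY2D", "SHADOW2D", "SHADOWRECT", "SHADOWARRAY2D"}
NO_OFFSET = {"CUBE", "SHADOWCUBE"}


def offset_is_loadable(target, offset, min_offset, max_offset):
    """Return False if a texel offset would cause the program to fail to load."""
    # Unspecified trailing components default to zero.
    off = list(offset) + [0] * (3 - len(offset))
    if target in ONE_D and (off[1] != 0 or off[2] != 0):
        return False
    if target in TWO_D and off[2] != 0:
        return False
    if target in NO_OFFSET and any(off):
        return False
    # Every component must lie within the supported range.
    return all(min_offset <= c <= max_offset for c in off)
```

With the placeholder range [-8, 7], offset_is_loadable("2D", (1, -1), -8, 7) accepts the offset, while a non-zero third component on a 2D target or any non-zero component on a CUBE target is rejected.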
| |
| (NOTE: Texel offsets are a new feature provided by this extension and are |
| described in more detail in edits to Section 3.8 below.) |
| |
| The texture used by TextureSample() is one of the textures bound to the |
| texture image unit whose number is specified in the instruction according |
| to the <texImageUnit> grammar rule. The texture target accessed is |
| specified according to the <texTarget> grammar rule and Table X.17. |
| Fixed-function texture enables are always ignored when determining the |
| texture to access in a program. |
| |
|                                            coordinates used |
| texTarget         Texture Type           s t r  layer  shadow |
| ----------------  ---------------------  -----  -----  ------ |
| 1D                TEXTURE_1D             x - -    -      - |
| 2D                TEXTURE_2D             x y -    -      - |
| 3D                TEXTURE_3D             x y z    -      - |
| CUBE              TEXTURE_CUBE_MAP       x y z    -      - |
| RECT              TEXTURE_RECTANGLE_ARB  x y -    -      - |
| ARRAY1D           TEXTURE_1D_ARRAY_EXT   x - -    y      - |
| ARRAY2D           TEXTURE_2D_ARRAY_EXT   x y -    z      - |
| SHADOW1D          TEXTURE_1D             x - -    -      z |
| SHADOW2D          TEXTURE_2D             x y -    -      z |
| SHADOWRECT        TEXTURE_RECTANGLE_ARB  x y -    -      z |
| SHADOWCUBE        TEXTURE_CUBE_MAP       x y z    -      w |
| SHADOWARRAY1D     TEXTURE_1D_ARRAY_EXT   x - -    y      z |
| SHADOWARRAY2D     TEXTURE_2D_ARRAY_EXT   x y -    z      w |
| BUFFER            TEXTURE_BUFFER_EXT     <not supported> |
| |
| Table X.17: Texture types accessed for each <texTarget>, and the |
| corresponding coordinate mappings. The "SHADOW" and "ARRAY" targets are |
| special pseudo-targets described below. The "coordinates used" columns |
| indicate the input values used for each coordinate of the texture lookup, |
| the layer selector for array textures, and the reference value for texture |
| comparisons. Buffer textures are not supported by normal texture lookup |
| functions, but are supported by TXF and TXQ, described below. |
| |
| Texture targets with "SHADOW" are used to access textures with a |
| DEPTH_COMPONENT base internal format using depth comparisons (Section |
| 3.8.14). Results of a texture access are undefined: |
| |
| * if a "SHADOW" target is used, and the corresponding texture has a base |
| internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE |
| of NONE, or |
| |
| * if a non-"SHADOW" target is used, and the corresponding texture has a |
| base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE |
| other than NONE. |
| |
| If the texture being accessed is not complete (or cube complete for |
| cubemap textures), no texture access is performed and the result is |
| undefined. |
| |
| A program will fail to load if it attempts to sample from multiple texture |
| targets (including the SHADOW pseudo-targets) on the same texture image |
| unit. For example, a program containing any two of the following |
| instructions will fail to load: |
| |
| TEX out, coord, texture[0], 1D; |
| TEX out, coord, texture[0], 2D; |
| TEX out, coord, texture[0], ARRAY2D; |
| TEX out, coord, texture[0], SHADOW2D; |
| TEX out, coord, texture[0], 3D; |
| |
| Additionally, multiple texture targets for a single texture image unit may |
| not be used at the same time by the GL. The error INVALID_OPERATION is |
| generated by Begin, RasterPos, or any command that performs an implicit |
| Begin if an enabled program accesses one texture target for a texture unit |
| while another enabled program or fixed-function fragment processing |
| accesses a different texture target for the same texture image unit. |
| |
| Some texture instructions use standard methods to compute partial |
| derivatives and/or the level-of-detail used to perform texture accesses. |
| For fragment programs, the functions |
| |
| float_vec ComputePartialsX(float_vec coord); |
| float_vec ComputePartialsY(float_vec coord); |
| |
| compute approximate component-wise partial derivatives of the |
| floating-point vector <coord> relative to the X and Y coordinates, |
| respectively. For vertex and geometry programs, these functions always |
| return (0,0,0,0). The function |
| |
| float ComputeLOD(float_vec ddx, float_vec ddy); |
| |
| maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx, |
| ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to |
| equation 3.18. |
| |
| The TXF instruction provides the ability to extract a single texel from a |
| specified texture image using the function |
| |
| result_t_vec TexelFetch(int_vec coord, int_vec offset); |
| |
| The extracted texel is converted to an (R,G,B,A) vector according to Table |
| 3.21. The result vector is interpreted as floating-point, signed integer, |
| or unsigned integer, according to the data type modifier of the |
| instruction. If the internal format of the texture is not compatible with |
| the instruction's data type modifier, the extracted texel value is |
| undefined. |
| |
| <coord> is a four-component signed integer vector used to identify the |
| single texel accessed. The (i,j,k) coordinates of the texel and the layer |
| used for array textures are extracted according to Table X.18. The level |
| of detail accessed is obtained by adding the w component of <coord> to the |
| base level (level_base). <offset> is a constant 3-component signed |
| integer vector added to the texel coordinates prior to the texel fetch as |
| described above. In addition to the restrictions described above, |
| non-zero offset components are also not supported for BUFFER targets. |
| |
| The texture used by TexelFetch() is specified by the image unit and target |
| parameters provided in the instruction, as for TextureSample() above. |
| Single texel fetches can not perform depth comparisons or access cubemaps. |
| If a program contains a TXF instruction specifying one of the "SHADOW" or |
| "CUBE" targets, it will fail to load. |
| |
|                              coordinates used |
| texTarget         supported  i j k  layer  lod |
| ----------------  ---------  -----  -----  --- |
| 1D                yes        x - -    -     w |
| 2D                yes        x y -    -     w |
| 3D                yes        x y z    -     w |
| CUBE              no         - - -    -     - |
| RECT              yes        x y -    -     w |
| ARRAY1D           yes        x - -    y     w |
| ARRAY2D           yes        x y -    z     w |
| SHADOW1D          no         - - -    -     - |
| SHADOW2D          no         - - -    -     - |
| SHADOWRECT        no         - - -    -     - |
| SHADOWCUBE        no         - - -    -     - |
| SHADOWARRAY1D     no         - - -    -     - |
| SHADOWARRAY2D     no         - - -    -     - |
| BUFFER            yes        x - -    -     - |
| |
| Table X.18, Mappings of texel fetch coordinates to texel location. |
| |
| Single-texel fetches do not support LOD clamping or any texture wrap mode, |
| and require a mipmapped minification filter to access any level of detail |
| other than the base level. The results of the texel fetch are undefined: |
| |
| * if the computed LOD is less than the texture's base level (level_base) |
| or greater than the maximum level (level_max), |
| |
| * if the computed LOD is not the texture's base level and the texture's |
| minification filter is NEAREST or LINEAR, |
| |
| * if the layer specified for array textures is negative or greater than |
| the number of layers in the array texture, |
| |
| * if the (i,j,k) coordinates refer to a border texel outside |
| the defined extents of the specified LOD, where |
| |
| i < -b_s, j < -b_s, k < -b_s, |
| i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s, |
| |
| where the size parameters (w_s, h_s, d_s, and b_s) refer to the width, |
| height, depth, and border size of the image, as in equations 3.15, |
| 3.16, and 3.17, or |
| |
| * if the texture being accessed is not complete (or cube complete for |
| cubemaps). |
| |
| |
| Section 2.X.5, Program Flow Control |
| |
| In addition to basic arithmetic, logical, and texture instructions, a |
| number of flow control instructions are provided, which are described in |
| detail in Section 2.X.8. Programs can contain several types of |
| instruction blocks: IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and |
| subroutine blocks. IF/ELSE/ENDIF blocks are a set of instructions |
| beginning with an "IF" instruction, ending with an "ENDIF" instruction, |
| and possibly containing an optional "ELSE" instruction. REP/ENDREP blocks |
| are a set of instructions beginning with a "REP" instruction and ending |
| with an "ENDREP" instruction. Subroutine blocks begin with an instruction |
| label identifying the name of the subroutine and ending just before the |
| next instruction label or the end of the program. Examples include the |
| following: |
| |
| MOVC CC, R0; |
| IF GT.x; |
|   MOV R0, R1;         # executes if R0.x > 0 |
| ELSE; |
|   MOV R0, R2;         # executes if R0.x <= 0 |
| ENDIF; |
| |
| REP repCount; |
|   ADD R0, R0, R1; |
| ENDREP; |
| |
| square:               # subroutine to compute R0^2 |
|   MUL R0, R0, R0; |
|   RET; |
| main: |
|   MOV R0, 9.0; |
|   CAL square;         # compute 9.0^2 in R0 |
| |
| IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and |
| inside subroutines. In all cases, each instruction block must be |
| terminated with the appropriate instruction (ENDIF for IF, ENDREP for |
| REP). Nested instruction blocks must be wholly contained within a block |
| -- if a REP instruction is found between an IF and ELSE instruction, the |
| corresponding ENDREP must also be present between the IF and ELSE. |
| Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks, |
| or inside other subroutines. A program will fail to load if any |
| instruction block is terminated by an incorrect instruction, is not |
| terminated before the block containing it, or contains an instruction |
| label. |
| |
| IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions |
| to execute. If the condition is true, all instructions between the IF and |
| ELSE are executed. If the condition is false, all instructions between |
| the ELSE and ENDIF are executed. The ELSE instruction is optional. If |
| the ELSE is omitted, all instructions between the IF and ENDIF are |
| executed if the condition is true, or skipped if the condition is false. |
| A limited amount of nesting is supported -- a program will fail to load if |
| an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more |
| IF/ELSE/ENDIF blocks. |
| |
| REP/ENDREP blocks are used to execute a sequence of instructions multiple |
| times. The REP instruction includes an optional scalar operand to specify |
| a loop count indicating the number of times the block of instructions |
| should be repeated. If the loop count is omitted, the contents of a |
| REP/ENDREP block will be repeated indefinitely until the loop is |
| explicitly terminated. A limited amount of nesting is supported -- a |
| program will fail to load if a REP instruction is nested inside |
| MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks. |
| |
| Within a REP/ENDREP block, the CONT instruction can be used to terminate |
| the current iteration of the loop by effectively jumping to the ENDREP |
| instruction. The BRK instruction can be used to terminate the entire loop |
| by effectively jumping to the instruction immediately following the ENDREP |
| instruction. If CONT and BRK instructions are found inside multiply |
| nested REP/ENDREP blocks, they apply to the innermost block. A program |
| will fail to load if it includes a CONT or BRK instruction that is not |
| contained inside a REP/ENDREP block. |
| |
| A REP/ENDREP block without a specified loop count can result in an |
| infinite loop. To prevent obvious infinite loops, a program will fail to |
| load if it contains a REP/ENDREP block that contains neither a BRK |
| instruction at the current nesting level nor a RET instruction at any |
| nesting level. |
| |
| Subroutines are supported via the CAL and RET instructions. A subroutine |
| block is identified by an instruction label, which can be any valid identifier |
| according to the <instLabel> grammar rule. The CAL instruction identifies |
| a subroutine name to call according to the <instTarget> grammar rule. |
| Instruction labels used in CAL instructions do not need to be defined in |
| the program text that precedes the instruction, but a program will fail to |
| load if it includes a CAL instruction that references an instruction label |
| that is not defined anywhere in the program. When a CAL instruction is |
| executed, it transfers control to the instruction immediately following |
| the specified instruction label. Subsequent instructions in that |
| subroutine are executed until a RET instruction is executed, or until |
| program execution reaches another instruction label or the end of the |
| program text. After the subroutine finishes, execution continues with the |
| instruction immediately following the CAL instruction. When a RET |
| instruction is issued, it will break out of any IF/ELSE/ENDIF or |
| REP/ENDREP blocks that contain it. |
| |
| Subroutines may call other subroutines before completing, up to an |
| implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls. |
| Subroutines may call any subroutine in the program, including themselves, |
| as long as the call depth limit is obeyed. Issuing a CAL instruction |
| while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has |
| undefined results, including possible program termination. |
| |
| Several flow control instructions include condition code tests. The IF |
| instruction requires a condition test to determine what instructions are |
| executed. The CONT, BRK, CAL, and RET instructions have an optional |
| condition code test; if the test fails, the instructions are not executed. |
| Condition code tests are specified by the <ccTest> grammar rule. The test |
| is evaluated like the condition code write mask (section 2.X.4.3), and |
| passes if and only if any of the four components passes. |
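
(Note: The following Python sketch is illustrative only. It models how a flow control instruction combines the four per-component test results. The function flow_test_passes is hypothetical, and treating a one-letter swizzle suffix as replicated to all four components is an assumption chosen to match the "IF GT.x" example above.)

```python
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def flow_test_passes(component_results, swizzle="xyzw"):
    """Return True if any swizzled component of the test result is TRUE."""
    if len(swizzle) == 1:
        swizzle = swizzle * 4  # assumed: ".x" behaves like ".xxxx"
    return any(component_results[INDEX[c]] for c in swizzle)
```

With test results (FALSE, TRUE, FALSE, FALSE), an unswizzled test passes because the y component passes, whereas the same test with a ".x" suffix fails.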
| |
| If an instruction label named "main" is specified, GPU program execution |
| begins with the instruction immediately following that label. Otherwise, |
| it begins with the first instruction of the program. Instructions are |
| executed in sequence until either a RET instruction is issued in the main |
| subroutine or the end of the program text is reached. |
| |
| |
| Section 2.X.6, Program Options |
| |
| Programs may specify a number of options to indicate that one or more |
| extended language features are used by the program. All program options |
| used by the program must be declared at the beginning of the program |
| string. Each program option specified in a program string will modify the |
| syntactic or semantic rules used to interpret the program and the execution |
| environment used to execute the program. Features in program options |
| not declared by the program are ignored, even if the option is otherwise |
| supported by the GL. Each option declaration consists of two tokens: the |
| keyword "OPTION" and an identifier. |
| |
| The set of available options depends on the program type, and is |
| enumerated in the specifications for each program type. Some program |
| types may not provide any options. |
| |
| |
| Section 2.X.7, Program Declarations |
| |
| Programs may include a number of declaration statements to specify |
| characteristics of the program. Each declaration statement is followed by |
| one or more arguments, separated by commas. |
| |
| The set of available declarations depends on the program type, and is |
| enumerated in the specifications for each program type. Some program |
| types may not provide declarations. |
| |
| |
| Section 2.X.8, Program Instruction Set |
| |
| The following sections enumerate the set of instructions supported for GPU |
| programs. |
| |
| Some instructions allow the use of one of the three basic data type |
| modifiers (floating point, signed integer, and unsigned integer). Unless |
| otherwise mentioned: |
| |
| * the result and all of the operands will be interpreted according to |
| the specified data type, and |
| |
| * if no data type modifier is specified, the instruction will operate as |
| though a floating-point modifier ("F") were specified. |
| |
| Some instructions will override one or both of these rules. |
| |
| |
| Section 2.X.8.Z, ABS: Absolute Value |
| |
| The ABS instruction performs a component-wise absolute value operation on |
| the single operand to yield a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = abs(tmp.x); |
| result.y = abs(tmp.y); |
| result.z = abs(tmp.z); |
| result.w = abs(tmp.w); |
| |
| ABS supports all three data type modifiers. Taking the absolute value of |
| an unsigned integer is not a useful operation, but is not illegal. |
| |
| |
| Section 2.X.8.Z, ADD: Add |
| |
| The ADD instruction performs a component-wise add of the two operands to |
| yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x + tmp1.x; |
| result.y = tmp0.y + tmp1.y; |
| result.z = tmp0.z + tmp1.z; |
| result.w = tmp0.w + tmp1.w; |
| |
| ADD supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, AND: Bitwise AND |
| |
| The AND instruction performs a bitwise AND operation on the components of |
| the two source vectors to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x & tmp1.x; |
| result.y = tmp0.y & tmp1.y; |
| result.z = tmp0.z & tmp1.z; |
| result.w = tmp0.w & tmp1.w; |
| |
| AND supports only signed and unsigned integer data type modifiers. If no |
| type modifier is specified, both operands and the result are treated as |
| signed integers. |
| |
| |
| Section 2.X.8.Z, BRK: Break out of Loop Instruction |
| |
| The BRK instruction conditionally transfers control to the instruction |
| immediately following the next ENDREP instruction. A BRK instruction has |
| no effect if the condition code test evaluates to FALSE. |
| |
| The following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   continue execution at instruction following the next ENDREP; |
| } |
| |
| |
| Section 2.X.8.Z, CAL: Subroutine Call |
| |
| The CAL instruction conditionally transfers control to the instruction |
| following the label specified in the instruction. It also pushes a |
| reference to the instruction immediately following the CAL instruction |
| onto the call stack, where execution will continue after executing the |
| matching RET instruction. The following pseudocode describes the |
| operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) { |
|     // undefined results |
|   } else { |
|     callStack[callStackDepth] = nextInstruction; |
|     callStackDepth++; |
|   } |
|   // continue execution at instruction following <instTarget> |
| } else { |
|   // do nothing |
| } |
| |
| In the pseudocode, <instTarget> is the label specified in the instruction |
| matching the <instTarget> grammar rule, <callStackDepth> is the current |
| depth of the call stack, <callStack> is an array holding the call stack, |
| and <nextInstruction> is a reference to the instruction immediately |
| following the CAL instruction in the program string. |
| |
| If the call stack overflows, the results of the CAL instruction are |
| undefined, and can result in immediate program termination. |
| |
| An instruction label signifies the beginning of a new subroutine. |
| Subroutines may not nest or overlap. If a CAL instruction is executed and |
| subsequent program execution reaches an instruction label before a |
| corresponding RET instruction is executed, the subroutine call returns |
| immediately, as though an unconditional RET instruction were inserted |
| immediately before the instruction label. |
| |
| (Note: On previous vertex program extensions -- NV_vertex_program2 and |
| NV_vertex_program3 -- instruction labels were also used as targets for |
| branch (BRA) instructions. This unstructured branching functionality has |
| been replaced with the structured branching constructs found in this |
| instruction set.) |
| |
| |
| Section 2.X.8.Z, CEIL: Ceiling |
| |
| The CEIL instruction loads a single vector operand and performs a |
| component-wise ceiling operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| iresult.x = ceil(tmp.x); |
| iresult.y = ceil(tmp.y); |
| iresult.z = ceil(tmp.z); |
| iresult.w = ceil(tmp.w); |
| |
| The ceiling operation returns the nearest integer greater than or equal to |
| the operand. For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and |
| ceil(+3.7) = +4.0. |
| |
| CEIL supports all three data type modifiers. The single operand is always |
| treated as a floating-point vector, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
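
(Note: The following Python sketch is illustrative only. It models CEIL for one component with each data type modifier. The function ceil_to is hypothetical; the modifier letters "S" and "U", the 32-bit integer ranges, and the use of None to stand for an undefined result are all assumptions made for the sketch. Only finite inputs are handled.)

```python
import math


def ceil_to(value, modifier="F"):
    """Ceiling of a finite float, written with the given data type modifier."""
    c = math.ceil(value)  # the operand is always treated as floating-point
    if modifier == "F":
        return float(c)
    if modifier == "S":   # assumed 32-bit signed range
        return c if -2**31 <= c <= 2**31 - 1 else None
    if modifier == "U":   # assumed 32-bit unsigned range
        return c if 0 <= c <= 2**32 - 1 else None
    raise ValueError("unknown data type modifier: " + modifier)
```

In this model ceil_to(-1.7) yields -1.0, ceil_to(3.7, "S") yields 4, and ceil_to(-1.7, "U") yields None because a negative value is not representable as an unsigned integer.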
| |
| |
| Section 2.X.8.Z, CMP: Compare |
| |
| The CMP instruction performs a component-wise comparison of the first |
| operand against zero, and copies the values of the second or third |
| operands based on the results of the compare. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x; |
| result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y; |
| result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z; |
| result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w; |
| |
| CMP supports all three data type modifiers. CMP with an unsigned data |
| type modifier is not a useful operation, but is not illegal. |
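
(Note: The following Python sketch is illustrative only. It mirrors the CMP pseudocode above for four-component operands; the function name cmp_instr is hypothetical.)

```python
def cmp_instr(op0, op1, op2):
    """Select op1 where op0 is negative, otherwise op2, per component."""
    return [b if a < 0 else c for a, b, c in zip(op0, op1, op2)]
```

Because an unsigned first operand is never less than zero, CMP with an unsigned data type modifier always selects the third operand, which is why that form is legal but not useful.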
| |
| |
| Section 2.X.8.Z, CONT: Continue with Next Loop Iteration |
| |
| The CONT instruction conditionally transfers control to the next ENDREP |
| instruction. A CONT instruction has no effect if the condition code test |
| evaluates to FALSE. |
| |
| The following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
|     TestCC(cc.**c*) || TestCC(cc.***c)) { |
|   continue execution at the next ENDREP; |
| } |
| |
| |
| Section 2.X.8.Z, COS: Cosine with Reduction to [-PI,PI] |
| |
| The COS instruction approximates the trigonometric cosine of the angle |
| specified by the scalar operand and replicates it to all four components |
| of the result vector. The angle is specified in radians and does not have |
| to be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxCosine(tmp); |
| result.y = ApproxCosine(tmp); |
| result.z = ApproxCosine(tmp); |
| result.w = ApproxCosine(tmp); |
| |
| COS supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DDX: Partial Derivative Relative to X |
| |
| The DDX instruction computes approximate partial derivatives of a vector |
| operand with respect to the X window coordinate, and is only available to |
| fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, DDY: Partial Derivative Relative to Y |
| |
| The DDY instruction computes approximate partial derivatives of a vector |
| operand with respect to the Y window coordinate, and is only available to |
| fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, DIV: Divide Vector Components by Scalar |
| |
| The DIV instruction performs a component-wise divide of the first vector |
| operand by the second scalar operand to produce a 4-component result |
| vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x / tmp1; |
| result.y = tmp0.y / tmp1; |
| result.z = tmp0.z / tmp1; |
| result.w = tmp0.w / tmp1; |
| |
| DIV supports all three data type modifiers. For floating-point division, |
| this instruction is not guaranteed to produce results identical to a |
| RCP/MUL instruction sequence. |
| |
| The results of a signed or unsigned integer division by zero are |
| undefined. |
| |
| |
| Section 2.X.8.Z, DP2: 2-Component Dot Product |
| |
| The DP2 instruction computes a two-component dot product of the two |
| operands (using the first two components) and replicates the dot product |
| to all four components of the result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP2A: 2-Component Dot Product with Scalar Add |
| |
| The DP2A instruction computes a two-component dot product of the two |
| operands (using the first two components), adds the x component of the |
| third operand, and replicates the result to all four components of the |
| result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x; |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP2A supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP3: 3-Component Dot Product |
| |
| The DP3 instruction computes a three-component dot product of the two |
| operands (using the x, y, and z components) and replicates the dot product |
| to all four components of the result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP3 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DP4: 4-Component Dot Product |
| |
| The DP4 instruction computes a four-component dot product of the two |
| operands and replicates the dot product to all four components of the |
| result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DP4 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, DPH: Homogeneous Dot Product |
| |
| The DPH instruction computes a three-component dot product of the two |
| operands (using the x, y, and z components), adds the w component of the |
| second operand, and replicates the sum to all four components of the |
| result vector. This is equivalent to a four-component dot product where |
| the w component of the first operand is forced to 1.0. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + tmp1.w; |
| result.x = dot; |
| result.y = dot; |
| result.z = dot; |
| result.w = dot; |
| |
| DPH supports only floating-point data type modifiers. |
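| |
| As an informal check of the equivalence stated above, the following C |
| sketch (not part of this extension) compares DPH against a four-component |
| dot product whose first operand has w forced to 1.0. The vec4 type and |
| the dp4() and dph() helpers are hypothetical names used only for this |
| illustration. |
| |
```c
/* Hypothetical four-component vector type used only for this sketch. */
typedef struct { float x, y, z, w; } vec4;

/* Four-component dot product, following the DP4 pseudo-code. */
static float dp4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Homogeneous dot product, following the DPH pseudo-code:
 * the w component of the first operand is never read. */
static float dph(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + b.w;
}
```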
| |
| |
| Section 2.X.8.Z, DST: Distance Vector |
| |
| The DST instruction computes a distance vector from two specially- |
| formatted operands. The first operand should be of the form [NA, d^2, |
| d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], |
| where NA values are not relevant to the calculation and d is a vector |
| length. If both vectors satisfy these conditions, the result vector will |
| be of the form [1.0, d, d^2, 1/d]. |
| |
| The exact behavior is specified in the following pseudo-code: |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = 1.0; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z; |
| result.w = tmp1.w; |
| |
| Given an arbitrary vector, d^2 can be obtained using the DP3 instruction |
| (using the same vector for both operands) and 1/d can be obtained from d^2 |
| using the RSQ instruction. |
| |
| This distance vector is useful for per-vertex light attenuation |
| calculations: a DP3 operation using the distance vector and an |
| attenuation constants vector as operands will yield the attenuation |
| factor. |
| |
| DST supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, ELSE: Start of If Test Else Block |
| |
| The ELSE instruction signifies the end of the "execute if true" portion of |
| an IF/ELSE/ENDIF block and the beginning of the "execute if false" |
| portion. |
| |
| If the condition evaluated at the IF statement was TRUE, when a program |
| reaches the ELSE statement, it has completed the entire "execute if true" |
| portion of the IF/ELSE/ENDIF block. Execution will continue at the |
| corresponding ENDIF instruction. |
| |
| If the condition evaluated at the IF statement was FALSE, program |
| execution would skip over the entire "execute if true" portion of the |
| IF/ELSE/ENDIF block, including the ELSE instruction. |
| |
| |
| Section 2.X.8.Z, EMIT: Emit Vertex |
| |
| The EMIT instruction emits a new vertex to be added to the current output |
| primitive generated by a geometry program, and is only available to |
| geometry programs. See the NV_geometry_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, ENDIF: End of If Test Block |
| |
| The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block. It has |
| no other effect on program execution. |
| |
| |
| Section 2.X.8.Z, ENDPRIM: End of Primitive |
| |
| A geometry program can emit multiple primitives in a single invocation. |
| The ENDPRIM instruction is used in a geometry program to signify the end |
| of the current primitive and the beginning of a new primitive of the same |
| type. It is only available to geometry programs. See the |
| NV_geometry_program4 specification for more details. |
| |
| |
| Section 2.X.8.Z, ENDREP: End of Repeat Block |
| |
| The ENDREP instruction specifies the end of a REP block. |
| |
| When used in conjunction with a REP instruction with a loop count, |
| ENDREP decrements the loop counter. If the decremented loop counter is |
| greater than zero, ENDREP transfers control to the instruction immediately |
| after the corresponding REP instruction. If the loop counter is less than |
| or equal to zero, execution continues at the instruction following the |
| ENDREP instruction. When used in conjunction with a REP instruction |
| without loop count, ENDREP always transfers control to the instruction |
| immediately after the REP instruction. |
| |
| if (REP instruction includes a loop count) { |
| LoopCount--; |
| if (LoopCount > 0) { |
| continue execution at instruction following corresponding REP |
| instruction; |
| } |
| } else { |
| continue execution at instruction following corresponding REP |
| instruction; |
| } |
| |
| |
| Section 2.X.8.Z, EX2: Exponential Base 2 |
| |
| The EX2 instruction approximates 2 raised to the power of the scalar |
| operand and replicates the approximation to all four components of the |
| result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = Approx2ToX(tmp); |
| result.y = Approx2ToX(tmp); |
| result.z = Approx2ToX(tmp); |
| result.w = Approx2ToX(tmp); |
| |
| EX2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, FLR: Floor |
| |
| The FLR instruction loads a single vector operand and performs a |
| component-wise floor operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = floor(tmp.x); |
| result.y = floor(tmp.y); |
| result.z = floor(tmp.z); |
| result.w = floor(tmp.w); |
| |
| The floor operation returns the nearest integer less than or equal to the |
| operand. For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7) |
| = +3.0. |
| |
| FLR supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
| |
| |
| Section 2.X.8.Z, FRC: Fraction |
| |
| The FRC instruction extracts the fractional portion of each component of |
| the operand to generate a result vector. The fractional portion of a |
| component is defined as the result after subtracting off the floor of the |
| component (see FLR), and is always in the range [0.0, 1.0). |
| |
| For negative values, the fractional portion is NOT the number written to |
| the right of the decimal point -- the fractional portion of -1.7 is not |
| 0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) |
| from -1.7. |
| |
| tmp = VectorLoad(op0); |
| result.x = fraction(tmp.x); |
| result.y = fraction(tmp.y); |
| result.z = fraction(tmp.z); |
| result.w = fraction(tmp.w); |
| |
| FRC supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, I2F: Integer to Float |
| |
| The I2F instruction converts the components of an integer vector operand |
| to floating-point to produce a floating-point result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = (float) tmp.x; |
| result.y = (float) tmp.y; |
| result.z = (float) tmp.z; |
| result.w = (float) tmp.w; |
| |
| I2F supports only signed and unsigned integer data type modifiers. The |
| single operand is interpreted according to the data type modifier. If no |
| data type modifier is specified, the operand is treated as a signed |
| integer vector. The result is always written as a float. |
| |
| |
| Section 2.X.8.Z, IF: Start of If Test Block |
| |
| The IF instruction performs a condition code test to determine what |
| instructions inside an IF/ELSE/ENDIF block are executed. If the test |
| passes, execution continues at the instruction immediately following the |
| IF instruction. If the test fails, IF transfers control to the |
| instruction immediately following the corresponding ELSE instruction (if |
| present) or the ENDIF instruction (if no ELSE is present). |
| |
| Implementations may have a limited ability to nest IF blocks in any |
| subroutine. If the number of IF/ENDIF blocks nested inside each other is |
| MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to compile. |
| |
| // Evaluate the condition. If the condition is true, continue at the |
| // next instruction. Otherwise, continue at the instruction following |
| // the corresponding ELSE (if present) or ENDIF. |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
| TestCC(cc.**c*) || TestCC(cc.***c)) { |
| continue execution at the next instruction; |
| } else if (IF block contains an ELSE statement) { |
| continue execution at instruction following corresponding ELSE; |
| } else { |
| continue execution at instruction following corresponding ENDIF; |
| } |
| |
| (Note: Unlike the NV_fragment_program2 extension, there is no run-time |
| limit on the maximum overall depth of IF/ENDIF nesting. As long as each |
| individual subroutine of the program obeys the static nesting limits, |
| there will be no run-time errors in the program. With the |
| NV_fragment_program2 extension, a program could terminate abnormally if it |
| called a subroutine inside a very deeply nested set of IF/ENDIF blocks and |
| the called subroutine also contained deeply nested IF/ENDIF blocks. Such |
| an error could occur even if neither subroutine exceeded static limits.) |
| |
| |
| Section 2.X.8.Z, KIL: Kill Fragment |
| |
| The KIL instruction conditionally kills a fragment, and is only available |
| to fragment programs. See the NV_fragment_program4 specification for more |
| details. |
| |
| |
| Section 2.X.8.Z, LG2: Logarithm Base 2 |
| |
| The LG2 instruction approximates the base 2 logarithm of the scalar |
| operand and replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxLog2(tmp); |
| result.y = ApproxLog2(tmp); |
| result.z = ApproxLog2(tmp); |
| result.w = ApproxLog2(tmp); |
| |
| If the scalar operand is zero or negative, the result is undefined. |
| |
| LG2 supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, LIT: Compute Lighting Coefficients |
| |
| The LIT instruction accelerates lighting computations by computing |
| lighting coefficients for ambient, diffuse, and specular light |
| contributions. The "x" component of the single operand is assumed to hold |
| a diffuse dot product (n dot VP_pli, as in the vertex lighting equations |
| in Section 2.13.1). The "y" component of the operand is assumed to hold a |
| specular dot product (n dot h_i). The "w" component of the operand is |
| assumed to hold the specular exponent of the material (s_rm), and is |
| clamped to the range (-128, +128) exclusive. |
| |
| The "x" component of the result vector receives the value that should be |
| multiplied by the ambient light/material product (always 1.0). The "y" |
| component of the result vector receives the value that should be |
| multiplied by the diffuse light/material product (n dot VP_pli). The "z" |
| component of the result vector receives the value that should be |
| multiplied by the specular light/material product (f_i * (n dot h_i) ^ |
| s_rm). The "w" component of the result is the constant 1.0. |
| |
| Negative diffuse and specular dot products are clamped to 0.0, as is done |
| in the standard per-vertex lighting operations. In addition, if the |
| diffuse dot product is zero or negative, the specular coefficient is |
| forced to zero. |
| |
| tmp = VectorLoad(op0); |
| if (tmp.x < 0) tmp.x = 0; |
| if (tmp.y < 0) tmp.y = 0; |
| if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon); |
| else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon; |
| result.x = 1.0; |
| result.y = tmp.x; |
| result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0; |
| result.w = 1.0; |
| |
| Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0. |
| |
| LIT supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, LRP: Linear Interpolation |
| |
| The LRP instruction performs a component-wise linear interpolation between |
| the second and third operands using the first operand as the blend factor. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x; |
| result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y; |
| result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z; |
| result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w; |
| |
| LRP supports only floating-point data type modifiers. |
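| |
| Note the operand order: a blend factor of 1.0 selects the second operand |
| and a blend factor of 0.0 selects the third. In GLSL terms, LRP with |
| operands (f, a, b) corresponds to mix(b, a, f). The scalar C sketch below |
| illustrates one component; the lrp() name is hypothetical. |
| |
```c
/* One component of LRP: f is the blend factor (first operand),
 * a is the second operand and b is the third operand. */
static float lrp(float f, float a, float b)
{
    return f * a + (1.0f - f) * b;
}
```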
| |
| |
| Section 2.X.8.Z, MAD: Multiply and Add |
| |
| The MAD instruction performs a component-wise multiply of the first two |
| operands, and then does a component-wise add of the product to the third |
| operand to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + tmp2.x; |
| result.y = tmp0.y * tmp1.y + tmp2.y; |
| result.z = tmp0.z * tmp1.z + tmp2.z; |
| result.w = tmp0.w * tmp1.w + tmp2.w; |
| |
| The multiplication and addition operations in this instruction are subject |
| to the same rules as described for the MUL and ADD instructions. |
| |
| MAD supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MAX: Maximum |
| |
| The MAX instruction computes component-wise maximums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x; |
| result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y; |
| result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z; |
| result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w; |
| |
| MAX supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MIN: Minimum |
| |
| The MIN instruction computes component-wise minimums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x; |
| result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y; |
| result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z; |
| result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w; |
| |
| MIN supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MOD: Modulus |
| |
| The MOD instruction performs a component-wise modulus operation on the first |
| vector operand by the second scalar operand to produce a 4-component result |
| vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x % tmp1; |
| result.y = tmp0.y % tmp1; |
| result.z = tmp0.z % tmp1; |
| result.w = tmp0.w % tmp1; |
| |
| MOD supports both signed and unsigned integer data type modifiers. If no |
| data type modifier is specified, both operands and the result are treated |
| as signed integers. |
| |
| A result component is undefined if the corresponding component of the |
| first operand is negative or if the second operand is less than or equal |
| to zero. |
| |
| |
| Section 2.X.8.Z, MOV: Move |
| |
| The MOV instruction copies the value of the operand to yield a result |
| vector. |
| |
| result = VectorLoad(op0); |
| |
| MOV supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, MUL: Multiply |
| |
| The MUL instruction performs a component-wise multiply of the two operands |
| to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x * tmp1.x; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z * tmp1.z; |
| result.w = tmp0.w * tmp1.w; |
| |
| MUL supports all three data type modifiers. The MUL instruction |
| additionally supports three special modifiers. |
| |
| The "S24" and "U24" modifiers specify "fast" signed or unsigned integer |
| multiplies of 24-bit quantities, respectively. The results of such |
| multiplies are undefined if either operand is outside the range |
| [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24. If "S24" or "U24" is |
| specified, the data type is implied and normal data type modifiers may not |
| be provided. |
| |
| The "HI" modifier specifies a 32-bit integer multiply that returns the 32 |
| most significant bits of the 64-bit product. Integer multiplies without |
| the "HI" modifier normally return the least significant bits of the |
| product. If "HI" is specified, either of the "S" or "U" integer data type |
| modifiers must also be specified. |
| |
| Note that if condition code updates are performed on integer multiplies, |
| the overflow or carry flags are always cleared, even if the product |
| overflowed. If it is necessary to determine if the results of an integer |
| multiply overflowed, the MUL.HI instruction may be used. |
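| |
| The overflow check suggested above can be sketched in C for the unsigned |
| case. The helper names are hypothetical; the sketch models MUL.U and |
| MUL.HI.U on one component using a 64-bit intermediate product, and treats |
| a non-zero high word as an overflow of the 32-bit result. |
| |
```c
#include <stdint.h>

/* Low 32 bits of the 64-bit product (what a plain unsigned MUL returns). */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of the 64-bit product (what the "HI" modifier returns). */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* An unsigned product fits in 32 bits exactly when its high word is zero. */
static int mul_overflows_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0;
}
```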
| |
| |
| Section 2.X.8.Z, NOT: Bitwise Not |
| |
| The NOT instruction performs a component-wise bitwise NOT operation on the |
| source vector to produce a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = ~tmp.x; |
| result.y = ~tmp.y; |
| result.z = ~tmp.z; |
| result.w = ~tmp.w; |
| |
| NOT supports only integer data type modifiers. If no type modifier is |
| specified, the operand and the result are treated as signed integers. |
| |
| |
| Section 2.X.8.Z, NRM: Normalize 3-Component Vector |
| |
| The NRM instruction normalizes the vector given by the x, y, and z |
| components of the vector operand to produce the x, y, and z components of |
| the result vector. The w component of the result is undefined. |
| |
| tmp = VectorLoad(op0); |
| scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z); |
| result.x = tmp.x * scale; |
| result.y = tmp.y * scale; |
| result.z = tmp.z * scale; |
| result.w = undefined; |
| |
| NRM supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, OR: Bitwise Or |
| |
| The OR instruction performs a bitwise OR operation on the components of |
| the two source vectors to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x | tmp1.x; |
| result.y = tmp0.y | tmp1.y; |
| result.z = tmp0.z | tmp1.z; |
| result.w = tmp0.w | tmp1.w; |
| |
| OR supports only integer data type modifiers. If no type modifier is |
| specified, both operands and the result are treated as signed integers. |
| |
| |
| Section 2.X.8.Z, PK2H: Pack Two 16-bit Floats |
| |
| The PK2H instruction converts the "x" and "y" components of the single |
| floating-point vector operand into 16-bit floating-point format, packs the |
| bit representation of these two floats into a 32-bit unsigned integer, and |
| replicates that value to all four components of the result vector. The |
| PK2H instruction can be reversed by the UP2H instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| /* result obtained by combining raw bits of tmp0.x, tmp0.y */ |
| result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| |
| PK2H supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer results, the bits can be |
| interpreted as described above. For floating-point result variables, the |
| packed results do not constitute a meaningful floating-point variable and |
| should only be used to feed future unpack instructions. |
| |
| A program will fail to load if it contains a PK2H instruction that writes |
| its results to a variable declared as "SHORT". |
| |
| |
| Section 2.X.8.Z, PK2US: Pack Two Floats as Unsigned 16-bit |
| |
| The PK2US instruction converts the "x" and "y" components of the single |
| floating-point vector operand into a packed pair of 16-bit unsigned |
| scalars. The scalars are represented in a bit pattern where all '0' bits |
| corresponds to 0.0 and all '1' bits corresponds to 1.0. The bit |
| representations of the two converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK2US instruction can be reversed by the UP2US |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */ |
| us.y = round(65535.0 * tmp0.y); |
| /* result obtained by combining raw bits of us. */ |
| result.x = ((us.x) | (us.y << 16)); |
| result.y = ((us.x) | (us.y << 16)); |
| result.z = ((us.x) | (us.y << 16)); |
| result.w = ((us.x) | (us.y << 16)); |
| |
| PK2US supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. |
| |
| A program will fail to load if it contains a PK2US instruction that writes |
| its results to a variable declared as "SHORT". |
| |
| |
| Section 2.X.8.Z, PK4B: Pack Four Floats as Signed 8-bit |
| |
| The PK4B instruction converts the four components of the single |
| floating-point vector operand into 8-bit signed quantities. The signed |
| quantities are represented in a bit pattern where all '0' bits corresponds |
| to -128/127 and all '1' bits corresponds to +127/127. The bit |
| representations of the four converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK4B instruction can be reversed by the UP4B |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < -128/127) tmp0.x = -128/127; |
| if (tmp0.y < -128/127) tmp0.y = -128/127; |
| if (tmp0.z < -128/127) tmp0.z = -128/127; |
| if (tmp0.w < -128/127) tmp0.w = -128/127; |
| if (tmp0.x > +127/127) tmp0.x = +127/127; |
| if (tmp0.y > +127/127) tmp0.y = +127/127; |
| if (tmp0.z > +127/127) tmp0.z = +127/127; |
| if (tmp0.w > +127/127) tmp0.w = +127/127; |
| ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */ |
| ub.y = round(127.0 * tmp0.y + 128.0); |
| ub.z = round(127.0 * tmp0.z + 128.0); |
| ub.w = round(127.0 * tmp0.w + 128.0); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| PK4B supports all three data type modifiers. The single operand is always |
| treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. A program will fail to load if it contains a PK4B |
| instruction that writes its results to a variable declared as "SHORT". |
| |
| |
| Section 2.X.8.Z, PK4UB: Pack Four Floats as Unsigned 8-bit |
| |
| The PK4UB instruction converts the four components of the single |
| floating-point vector operand into a packed grouping of 8-bit unsigned |
| scalars. The scalars are represented in a bit pattern where all '0' bits |
| corresponds to 0.0 and all '1' bits corresponds to 1.0. The bit |
| representations of the four converted components are packed into a 32-bit |
| unsigned integer, and that value is replicated to all four components of |
| the result vector. The PK4UB instruction can be reversed by the UP4UB |
| instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| if (tmp0.z < 0.0) tmp0.z = 0.0; |
| if (tmp0.z > 1.0) tmp0.z = 1.0; |
| if (tmp0.w < 0.0) tmp0.w = 0.0; |
| if (tmp0.w > 1.0) tmp0.w = 1.0; |
| ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */ |
| ub.y = round(255.0 * tmp0.y); |
| ub.z = round(255.0 * tmp0.z); |
| ub.w = round(255.0 * tmp0.w); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| PK4UB supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. For integer result variables, the |
| bits can be interpreted as described above. For floating-point result |
| variables, the packed results do not constitute a meaningful |
| floating-point variable and should only be used to feed future unpack |
| instructions. |
| |
| A program will fail to load if it contains a PK4UB instruction that writes |
| its results to a variable declared as "SHORT". |
| |
| |
| Section 2.X.8.Z, POW: Exponentiate |
| |
| The POW instruction approximates the value of the first scalar operand |
| raised to the power of the second scalar operand and replicates it to all |
| four components of the result vector. |
| |
| tmp0 = ScalarLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = ApproxPower(tmp0, tmp1); |
| result.y = ApproxPower(tmp0, tmp1); |
| result.z = ApproxPower(tmp0, tmp1); |
| result.w = ApproxPower(tmp0, tmp1); |
| |
| The exponentiation approximation function may be implemented using the |
| base 2 exponentiation and logarithm approximation operations in the EX2 |
| and LG2 instructions. In particular, |
| |
| ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)). |
| |
| Note that a logarithm may be involved even for cases where the exponent is |
| an integer. This means that it may not be possible to exponentiate |
| correctly with a negative base. In contrast, it is possible in a |
| "normal" mathematical formulation to raise negative numbers to integral |
| powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4). |
| |
| POW supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, RCC: Reciprocal (Clamped) |
| |
| The RCC instruction approximates the reciprocal of the scalar operand, |
| clamps the result to one of two ranges, and replicates the clamped result |
| to all four components of the result vector. |
| |
| If the approximated reciprocal is greater than 0.0, the result is clamped |
| to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater |
| than zero, the result is clamped to the range [-2^+64, -2^-64]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ClampApproxReciprocal(tmp); |
| result.y = ClampApproxReciprocal(tmp); |
| result.z = ClampApproxReciprocal(tmp); |
| result.w = ClampApproxReciprocal(tmp); |
| |
| RCC supports only floating-point data type modifiers. |
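| |
| One possible reading of ClampApproxReciprocal() is sketched in C below. |
| The rcc() name is hypothetical, an exact IEEE 754 division stands in for |
| the approximate reciprocal, and the branch structure follows the two |
| ranges described above. With this reading, the reciprocal of zero yields |
| a finite value of magnitude 2^64 rather than an infinity. |
| |
```c
#include <math.h>

static float rcc(float x)
{
    const float lo = ldexpf(1.0f, -64);   /* 2^-64 */
    const float hi = ldexpf(1.0f, 64);    /* 2^+64 */
    float r = 1.0f / x;                   /* stands in for the approximation */

    if (r > 0.0f) {
        /* Positive reciprocal: clamp to [2^-64, 2^+64]. */
        if (r < lo) r = lo;
        if (r > hi) r = hi;
    } else {
        /* Otherwise: clamp to [-2^+64, -2^-64]. */
        if (r < -hi) r = -hi;
        if (r > -lo) r = -lo;
    }
    return r;
}
```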
| |
| |
| Section 2.X.8.Z, RCP: Reciprocal |
| |
| The RCP instruction approximates the reciprocal of the scalar operand and |
| replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxReciprocal(tmp); |
| result.y = ApproxReciprocal(tmp); |
| result.z = ApproxReciprocal(tmp); |
| result.w = ApproxReciprocal(tmp); |
| |
| RCP supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, REP: Start of Repeat Block |
| |
| The REP instruction begins a REP/ENDREP block. The REP instruction |
| supports an optional operand whose x component specifies the initial value |
| for the loop count. The loop count indicates the number of times the |
| instructions between the REP and corresponding ENDREP instruction will be |
| executed. If the initial value of the loop count is not positive, the |
| entire block is skipped and execution continues at the instruction |
| following the corresponding ENDREP instruction. If the loop count is |
| specified as a floating-point value, it is converted to the largest |
| integer less than or equal to the specified value (i.e., taking its |
| floor). |
| |
| If no operand is provided to REP, the loop count is ignored and the |
| corresponding ENDREP instruction unconditionally transfers control to the |
| instruction immediately following the REP instruction. The only way to |
| exit such a loop is with the BRK instruction. To prevent obvious infinite |
| loops, a program that includes a REP/ENDREP block with no loop count will |
| fail to compile unless it contains either a BRK instruction at the current |
| nesting level or a RET instruction at any nesting level. |
| |
| Implementations may have a limited ability to nest REP/ENDREP blocks. If |
| the number of REP/ENDREP blocks nested inside each other is |
| MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile. |
| |
| // Set up loop information for the new nesting level. |
| tmp = VectorLoad(op0); |
| LoopCount = floor(tmp.x); |
| if (LoopCount <= 0) { |
| continue execution at the corresponding ENDREP; |
| } |
| |
| REP supports all three data type modifiers. The single operand is |
| interpreted according to the data type modifier. |
| |
| (Note: Unlike the NV_fragment_program2 extension, REP blocks in this |
| extension support fully general looping; the specified loop count can be |
| computed in the program itself. Additionally, there is no run-time limit |
| on the maximum overall depth of REP/ENDREP nesting. As long as each |
| individual subroutine of the program obeys the static nesting limits, |
| there will be no run-time errors in the program. With the |
| NV_fragment_program2 extension, a program could terminate abnormally if it |
| called a subroutine inside a deeply nested set of REP/ENDREP blocks and |
| the called subroutine also contained deeply nested REP/ENDREP blocks. |
| Such an error could occur even if neither subroutine exceeded static |
| limits.) |
| |
| |
| Section 2.X.8.Z, RET: Subroutine Return |
| |
| The RET instruction conditionally returns from a subroutine initiated by a |
| CAL instruction by popping an instruction reference off the top of the |
| call stack and transferring control to the referenced instruction. The |
| following pseudocode describes the operation of the instruction: |
| |
| if (TestCC(cc.c***) || TestCC(cc.*c**) || |
| TestCC(cc.**c*) || TestCC(cc.***c)) { |
| if (callStackDepth <= 0) { |
| // terminate program |
| } else { |
| callStackDepth--; |
| instruction = callStack[callStackDepth]; |
| } |
| |
| // continue execution at <instruction> |
| } else { |
| // do nothing |
| } |
| |
| In the pseudocode, <callStackDepth> is the depth of the call stack, |
| <callStack> is an array holding the call stack, and <instruction> is a |
| reference to an instruction previously pushed onto the call stack. |
| |
| If the call stack is empty when RET executes, the program terminates |
| normally. |
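| |
| As a non-normative illustration, the RET pseudocode above can be modeled |
| in Python. The function name "ret", the list used for the call stack, and |
| the returned (action, target) tuple are assumptions made for this sketch; |
| they are not defined by this extension. |
| |
```python
def ret(call_stack, condition_passes):
    """Model one RET instruction (hypothetical helper, not part of the spec).

    call_stack: list of saved instruction references, mutated in place.
    condition_passes: True if any of the four condition code tests passed.
    Returns an (action, target) tuple describing what happens next.
    """
    if not condition_passes:
        return ("fallthrough", None)    # condition failed: do nothing
    if not call_stack:
        return ("terminate", None)      # empty call stack: normal termination
    return ("jump", call_stack.pop())   # pop and continue at the saved reference
```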
| |
| |
| Section 2.X.8.Z, RFL: Reflection Vector |
| |
| The RFL instruction computes the reflection of the second vector operand |
| (the "direction" vector) about the vector specified by the first vector |
| operand (the "axis" vector). Both operands are treated as 3D vectors (the |
| w components are ignored). The result vector is another 3D vector (the |
| "reflected direction" vector). The length of the result vector, ignoring |
| rounding errors, should equal that of the second operand. |
| |
| axis = VectorLoad(op0); |
| direction = VectorLoad(op1); |
| tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z); |
| tmp.x = (axis.x * direction.x + axis.y * direction.y + |
| axis.z * direction.z); |
| tmp.x = 2.0 * tmp.x; |
| tmp.x = tmp.x / tmp.w; |
| result.x = tmp.x * axis.x - direction.x; |
| result.y = tmp.x * axis.y - direction.y; |
| result.z = tmp.x * axis.z - direction.z; |
| result.w = undefined; |
| |
| RFL supports only floating-point data type modifiers. |
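| |
| The following Python sketch mirrors the RFL pseudocode above and is |
| intended only as an illustration. The function name "rfl" and the use of |
| tuples for vectors are assumptions of this sketch. Python floats are |
| double precision, so results may differ in the low bits from a GPU. |
| |
```python
def rfl(axis, direction):
    """Reflect `direction` about `axis`; w components are ignored (sketch)."""
    ax, ay, az = axis[:3]
    dx, dy, dz = direction[:3]
    axis_len_sq = ax * ax + ay * ay + az * az
    # A zero-length axis raises ZeroDivisionError in this sketch.
    scale = 2.0 * (ax * dx + ay * dy + az * dz) / axis_len_sq
    # result.w is undefined in the pseudocode, so only x, y, z are returned.
    return (scale * ax - dx, scale * ay - dy, scale * az - dz)
```
| |
| For example, reflecting (1, 1, 0) about the axis (0, 1, 0) yields |
| (-1, 1, 0), which has the same length as the direction vector. |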
| |
| |
| Section 2.X.8.Z, ROUND: Round to Nearest Integer |
| |
| The ROUND instruction loads a single vector operand and performs a |
| component-wise round operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = round(tmp.x); |
| result.y = round(tmp.y); |
| result.z = round(tmp.z); |
| result.w = round(tmp.w); |
| |
| The round operation returns the nearest integer to the operand. If the |
| fractional portion of the operand is 0.5, round() selects the nearest even |
| integer. For example round(-1.7) = -2.0, round(+1.0) = +1.0, and |
| round(+3.7) = +4.0. |
| |
| ROUND supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
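| |
| The round-half-to-even rule described above can be checked with a short |
| Python sketch. Python's built-in round() applies the same tie-breaking |
| rule to floats; the wrapper name "round_op" is an assumption of this |
| sketch, and only finite inputs with a floating-point result are modeled. |
| |
```python
def round_op(vec):
    """Component-wise round to nearest, ties to even (illustrative sketch)."""
    # round() with a single float argument returns an int, rounding ties to
    # the nearest even integer; convert back to float for a float result.
    return tuple(float(round(c)) for c in vec)
```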
| |
| |
| Section 2.X.8.Z, RSQ: Reciprocal Square Root |
| |
| The RSQ instruction approximates the reciprocal of the square root of the |
| scalar operand and replicates it to all four components of the result |
| vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxRSQRT(tmp); |
| result.y = ApproxRSQRT(tmp); |
| result.z = ApproxRSQRT(tmp); |
| result.w = ApproxRSQRT(tmp); |
| |
| If the operand is less than or equal to zero, the results of the |
| instruction are undefined. |
| |
| RSQ supports only floating-point data type modifiers. |
| |
| Note that this instruction differs from the RSQ instruction in |
| ARB_vertex_program in that it does not implicitly take the absolute value |
| of its operand. The |abs| operator can be used to achieve equivalent |
| semantics. |
| |
| |
| Section 2.X.8.Z, SAD: Sum of Absolute Differences |
| |
| The SAD instruction performs a component-wise difference of the first two |
| integer operands (subtracting the second from the first), and then does a |
| component-wise add of the absolute value of the difference to the third |
| unsigned integer operand to yield an unsigned integer result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = abs(tmp0.x - tmp1.x) + tmp2.x; |
| result.y = abs(tmp0.y - tmp1.y) + tmp2.y; |
| result.z = abs(tmp0.z - tmp1.z) + tmp2.z; |
| result.w = abs(tmp0.w - tmp1.w) + tmp2.w; |
| |
| SAD supports signed and unsigned integer data type modifiers. The first |
| two operands are interpreted according to the data type modifier. The |
| third operand and the result are always unsigned integers. |
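| |
| A minimal Python sketch of the SAD pseudocode follows. The function name |
| "sad" is hypothetical, and wrapping the sum to 32 bits is an assumption of |
| this sketch; the text above does not state the overflow behavior. |
| |
```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit result width

def sad(op0, op1, op2):
    """Sum of absolute differences per component (illustrative sketch)."""
    # op0 and op1 are integer vectors; op2 and the result are unsigned.
    return tuple((abs(a - b) + c) & MASK32 for a, b, c in zip(op0, op1, op2))
```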
| |
| |
| Section 2.X.8.Z, SCS: Sine/Cosine without Reduction |
| |
| The SCS instruction approximates the trigonometric sine and cosine of the |
| angle specified by the scalar operand and places the cosine in the x |
| component and the sine in the y component of the result vector. The z and |
| w components of the result vector are undefined. The angle is specified |
| in radians and must be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxCosine(tmp); |
| result.y = ApproxSine(tmp); |
| result.z = undefined; |
| result.w = undefined; |
| |
| If the scalar operand is not in the range [-PI,PI], the result vector is |
| undefined. |
| |
| SCS supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, SEQ: Set on Equal |
| |
| The SEQ instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE; |
| |
| SEQ supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
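| |
| The TRUE and FALSE encodings described above depend only on the data type |
| modifier, so one Python sketch can model the whole family of "Set on" |
| comparisons. The function name "set_on", the table name, and the |
| single-letter keys are assumptions of this sketch; 32-bit integers are |
| also assumed for the unsigned TRUE value. |
| |
```python
import operator

# (TRUE, FALSE) per data type: float, signed integer, unsigned integer.
TRUE_FALSE = {
    "F": (1.0, 0.0),
    "S": (-1, 0),
    "U": (0xFFFFFFFF, 0),   # all bits set, assuming 32-bit integers
}

def set_on(compare, op0, op1, modifier="F"):
    """Component-wise comparison returning TRUE/FALSE values (sketch)."""
    true_value, false_value = TRUE_FALSE[modifier]
    return tuple(true_value if compare(a, b) else false_value
                 for a, b in zip(op0, op1))
```
| |
| Passing operator.eq models SEQ; operator.ge, operator.gt, operator.le, |
| operator.lt, and operator.ne would model the other comparisons. |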
| |
| |
| Section 2.X.8.Z, SFL: Set on False |
| |
| The SFL instruction is a degenerate case of the other "Set on" |
| instructions that sets all components of the result vector to a FALSE |
| value (described below). |
| |
| result.x = FALSE; |
| result.y = FALSE; |
| result.z = FALSE; |
| result.w = FALSE; |
| |
| SFL supports all data type modifiers. For floating-point data types, the |
| FALSE value is 0.0. For signed and unsigned integer data types, the FALSE |
| value is zero. |
| |
| |
| Section 2.X.8.Z, SGE: Set on Greater Than or Equal |
| |
| The SGE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| greater than or equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE; |
| |
| SGE supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SGT: Set on Greater Than |
| |
| The SGT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| greater than that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE; |
| |
| SGT supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SHL: Shift Left |
| |
| The SHL instruction performs a component-wise left shift of the bits of |
| the first operand by the value of the second scalar operand to produce a |
| result vector. The bits vacated during the shift operation are filled |
| with zeroes. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x << tmp1; |
| result.y = tmp0.y << tmp1; |
| result.z = tmp0.z << tmp1; |
| result.w = tmp0.w << tmp1; |
| |
| The results of a shift operation ("<<") are undefined if the value of the |
| second operand is negative, or greater than or equal to the number of bits |
| in the first operand. |
| |
| SHL supports both signed and unsigned integer data type modifiers. If no |
| modifier is provided, the operands and the result are treated as signed |
| integers. |
| |
| |
| Section 2.X.8.Z, SHR: Shift Right |
| |
| The SHR instruction performs a component-wise right shift of the bits of |
| the first operand by the value of the second scalar operand to produce a |
| result vector. The bits vacated during the shift operation are filled |
| with zeros if the operand is non-negative and with ones otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = tmp0.x >> tmp1; |
| result.y = tmp0.y >> tmp1; |
| result.z = tmp0.z >> tmp1; |
| result.w = tmp0.w >> tmp1; |
| |
| The results of a shift operation (">>") are undefined if the value of the |
| second operand is negative, or greater than or equal to the number of bits |
| in the first operand. |
| |
| SHR supports both signed and unsigned integer data type modifiers. If no |
| modifier is provided, the operands and the result are treated as signed |
| integers. |
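| |
| The fill rule for vacated bits can be illustrated with the following |
| Python sketch. The function name "shr" and the 32-bit default width are |
| assumptions; raising ValueError for an out-of-range shift count is a |
| choice made for this sketch, since the results are simply undefined. |
| |
```python
def shr(value, count, signed=True, bits=32):
    """Right shift of one component (illustrative sketch)."""
    if count < 0 or count >= bits:
        raise ValueError("shift count out of range: result is undefined")
    mask = (1 << bits) - 1
    pattern = value & mask                 # raw bit pattern of the operand
    if signed and pattern >> (bits - 1):   # sign bit set
        pattern -= 1 << bits               # reinterpret as a negative value
    shifted = pattern >> count             # arithmetic shift for negatives
    return shifted if signed else shifted & mask
```
| |
| With the same bit pattern 0xFFFFFFF8, a signed shift by one gives -4 |
| (ones shifted in), while an unsigned shift gives 0x7FFFFFFC (zeros |
| shifted in). |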
| |
| |
| Section 2.X.8.Z, SIN: Sine with Reduction to [-PI,PI] |
| |
| The SIN instruction approximates the trigonometric sine of the angle |
| specified by the scalar operand and replicates it to all four components |
| of the result vector. The angle is specified in radians and does not have |
| to be in the range [-PI,PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxSine(tmp); |
| result.y = ApproxSine(tmp); |
| result.z = ApproxSine(tmp); |
| result.w = ApproxSine(tmp); |
| |
| SIN supports only floating-point data type modifiers. |
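| |
| One way to picture the range reduction is the Python sketch below. It is |
| not a description of how any implementation performs the reduction; the |
| function names "reduce_angle" and "sin_op" are assumptions, and |
| math.remainder is used only because it maps an angle into [-PI,PI]. |
| |
```python
import math

def reduce_angle(angle):
    """Map an angle in radians into [-pi, pi] (illustrative sketch)."""
    # IEEE remainder with a period of 2*pi has magnitude at most pi.
    return math.remainder(angle, 2.0 * math.pi)

def sin_op(angle):
    """Sine of the reduced angle, replicated to all four components."""
    s = math.sin(reduce_angle(angle))
    return (s, s, s, s)
```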
| |
| |
| Section 2.X.8.Z, SLE: Set on Less Than or Equal |
| |
| The SLE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| less than or equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE; |
| |
| SLE supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SLT: Set on Less Than |
| |
| The SLT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| less than that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE; |
| |
| SLT supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SNE: Set on Not Equal |
| |
| The SNE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector returns a TRUE value |
| (described below) if the corresponding component of the first operand is |
| not equal to that of the second, and a FALSE value otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE; |
| result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE; |
| result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE; |
| result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE; |
| |
| SNE supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data |
| types, the TRUE value is -1 and the FALSE value is 0. For unsigned |
| integer data types, the TRUE value is the maximum integer value (all bits |
| are ones) and the FALSE value is zero. |
| |
| |
| Section 2.X.8.Z, SSG: Set Sign |
| |
| The SSG instruction generates a result vector containing the signs of |
| each component of the single vector operand. Each component of the |
| result vector is 1.0 if the corresponding component of the operand |
| is greater than zero, 0.0 if the corresponding component of the |
| operand is equal to zero, and -1.0 if the corresponding component |
| of the operand is less than zero. |
| |
| tmp = VectorLoad(op0); |
| result.x = SetSign(tmp.x); |
| result.y = SetSign(tmp.y); |
| result.z = SetSign(tmp.z); |
| result.w = SetSign(tmp.w); |
| |
| SSG supports only floating-point data type modifiers. |
| |
| |
| Section 2.X.8.Z, STR: Set on True |
| |
| The STR instruction is a degenerate case of the other "Set on" |
| instructions that sets all components of the result vector to a TRUE value |
| (described below). |
| |
| result.x = TRUE; |
| result.y = TRUE; |
| result.z = TRUE; |
| result.w = TRUE; |
| |
| STR supports all data type modifiers. For floating-point data types, the |
| TRUE value is 1.0. For signed integer data types, the TRUE value is -1. |
| For unsigned integer data types, the TRUE value is the maximum integer |
| value (all bits are ones). |
| |
| |
| Section 2.X.8.Z, SUB: Subtract |
| |
| The SUB instruction performs a component-wise subtraction of the second |
| operand from the first to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x - tmp1.x; |
| result.y = tmp0.y - tmp1.y; |
| result.z = tmp0.z - tmp1.z; |
| result.w = tmp0.w - tmp1.w; |
| |
| SUB supports all three data type modifiers. |
| |
| |
| Section 2.X.8.Z, SWZ: Extended Swizzle |
| |
| The SWZ instruction loads the single vector operand, and performs a |
| swizzle operation more powerful than that provided for loading normal |
| vector operands to yield a result vector. |
| |
| After the operand is loaded, the "x", "y", "z", and "w" components of the |
| result vector are selected by the first, second, third, and fourth matches |
| of the <extSwizComp> pattern in the <extendedSwizzle> rule. |
| |
| A result component can be selected from any of the four components of the |
| operand or the constants 0.0 and 1.0. The result component can also be |
| optionally negated. The following pseudocode describes the component |
| selection method. "operand" refers to the vector operand, "select" is an |
| enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the |
| <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively. |
| "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp> |
| matches "-". |
| |
| float ExtSwizComponent(floatVec operand, enum select, boolean negate) |
| { |
| float result; |
| switch (select) { |
| case ZERO: result = 0.0; break; |
| case ONE: result = 1.0; break; |
| case X: result = operand.x; break; |
| case Y: result = operand.y; break; |
| case Z: result = operand.z; break; |
| case W: result = operand.w; break; |
| } |
| if (negate) { |
| result = -result; |
| } |
| return result; |
| } |
| |
| The entire extended swizzle operation is then defined using the following |
| pseudocode: |
| |
| tmp = VectorLoad(op0); |
| result.x = ExtSwizComponent(tmp, xSelect, xNegate); |
| result.y = ExtSwizComponent(tmp, ySelect, yNegate); |
| result.z = ExtSwizComponent(tmp, zSelect, zNegate); |
| result.w = ExtSwizComponent(tmp, wSelect, wNegate); |
| |
| "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate", |
| "wSelect", and "wNegate" correspond to the "select" and "negate" values |
| above for the four <extSwizComp> matches. |
| |
| Since this instruction allows for component selection and negation for |
| each individual component, the grammar does not allow the use of the |
| normal swizzle and negation operations allowed for vector operands in |
| other instructions. |
| |
| SWZ supports only floating-point data type modifiers. |
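| |
| The component selection above can be exercised with a small Python |
| sketch. The function name "ext_swizzle" and the use of strings such as |
| "-x" for each selector are assumptions of this sketch; program text is |
| matched by the grammar rather than by string handling like this. |
| |
```python
def ext_swizzle(operand, selectors):
    """Apply four extended swizzle selectors to a 4-vector (sketch)."""
    source = {
        "0": 0.0, "1": 1.0,
        "x": operand[0], "y": operand[1], "z": operand[2], "w": operand[3],
    }
    result = []
    for sel in selectors:
        negate = sel.startswith("-")         # optional sign matched "-"
        value = source[sel.lstrip("+-")]     # constant or operand component
        result.append(-value if negate else value)
    return tuple(result)
```
| |
| For an operand of (1, 2, 3, 4), the selectors w, -x, 0, 1 produce |
| (4, -1, 0, 1). |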
| |
| |
| Section 2.X.8.Z, TEX: Texture Sample |
| |
| The TEX instruction takes the four components of a single floating-point |
| source vector and performs a filtered texture access as described in |
| Section 2.X.4.4. The returned (R,G,B,A) value is written to the |
| floating-point result vector. Partial derivatives and the level of detail |
| are computed automatically. |
| |
| tmp = VectorLoad(op0); |
| ddx = ComputePartialsX(tmp); |
| ddy = ComputePartialsY(tmp); |
| lambda = ComputeLOD(ddx, ddy); |
| result = TextureSample(tmp, lambda, ddx, ddy, texelOffset); |
| |
| TEX supports all three data type modifiers. The single operand is always |
| treated as a floating-point vector; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TRUNC: Truncate (Round Toward Zero) |
| |
| The TRUNC instruction loads a single vector operand and performs a |
| component-wise truncate operation to generate a result vector. |
| |
| tmp = VectorLoad(op0); |
| result.x = trunc(tmp.x); |
| result.y = trunc(tmp.y); |
| result.z = trunc(tmp.z); |
| result.w = trunc(tmp.w); |
| |
| The truncate operation returns the integer nearest to the operand whose |
| magnitude does not exceed that of the operand. For example, trunc(-1.7) = |
| -1.0, trunc(+1.0) = +1.0, and trunc(+3.7) = +3.0. |
| |
| TRUNC supports all three data type modifiers. The single operand is |
| always treated as a floating-point value, but the result is written as a |
| floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier. If a value is not exactly |
| representable using the data type of the result (e.g., an overflow or |
| writing a negative value to an unsigned integer), the result is undefined. |
| |
| |
| Section 2.X.8.Z, TXB: Texture Sample with Bias |
| |
| The TXB instruction takes the four components of a single floating-point |
| source vector and performs a filtered texture access as described in |
| Section 2.X.4.4. The returned (R,G,B,A) value is written to the |
| floating-point result vector. Partial derivatives and the level of detail |
| are computed automatically, but the fourth component of the source vector |
| is added to the computed LOD prior to sampling. |
| |
| tmp = VectorLoad(op0); |
| ddx = ComputePartialsX(tmp); |
| ddy = ComputePartialsY(tmp); |
| lambda = ComputeLOD(ddx, ddy); |
| result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset); |
| |
| The single source vector in the TXB instruction does not have enough |
| coordinates to specify a lookup into a two-dimensional array texture or |
| cube map texture with both an LOD bias and an explicit reference value for |
| depth comparison. A program will fail to load if it contains a TXB |
| instruction with a target of SHADOWCUBE or SHADOWARRAY2D. |
| |
| TXB supports all three data type modifiers. The single operand is always |
| treated as a floating-point vector; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TXD: Texture Sample with Partials |
| |
| The TXD instruction takes the four components of the first floating-point |
| source vector and performs a filtered texture access as described in |
| Section 2.X.4.4. The returned (R,G,B,A) value is written to the |
| floating-point result vector. The partial derivatives of the texture |
| coordinates with respect to X and Y are specified by the second and third |
| floating-point source vectors. The level of detail is computed |
| automatically using the provided partial derivatives. |
| |
| Note that for cube map texture targets, the provided partial derivatives |
| are in the coordinate system used before texture coordinates are projected |
| onto the appropriate cube face. The partial derivatives of the |
| post-projection texture coordinates, which are used for level-of-detail |
| and anisotropic filtering calculations, are derived from the original |
| coordinates and partial derivatives in an implementation-dependent manner. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| lambda = ComputeLOD(tmp1, tmp2); |
| result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset); |
| |
| TXD supports all three data type modifiers. All three operands are always |
| treated as floating-point vectors; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TXF: Texel Fetch |
| |
| The TXF instruction takes the four components of a single signed integer |
| source vector and performs a single texel fetch as described in Section |
| 2.X.4.4. The first three components provide the <i>, <j>, and <k> values |
| for the texel fetch, and the fourth component is used to determine the LOD |
| to access. The returned (R,G,B,A) value is written to the floating-point |
| result vector. Partial derivatives are irrelevant for single texel |
| fetches. |
| |
| tmp = VectorLoad(op0); |
| result = TexelFetch(tmp, texelOffset); |
| |
| TXF supports all three data type modifiers. The single vector operand is |
| treated as a signed integer vector; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TXL: Texture Sample with LOD |
| |
| The TXL instruction takes the four components of a single floating-point |
| source vector and performs a filtered texture access as described in |
| Section 2.X.4.4. The returned (R,G,B,A) value is written to the |
| floating-point result vector. The level of detail is taken from the |
| fourth component of the source vector. |
| |
| Partial derivatives are not computed by the TXL instruction and |
| anisotropic filtering is not performed. |
| |
| tmp = VectorLoad(op0); |
| ddx = (0,0,0); |
| ddy = (0,0,0); |
| result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset); |
| |
| The single source vector in the TXL instruction does not have enough |
| coordinates to specify a lookup into a 2D array or cube map texture with |
| both an explicit LOD and a reference value for depth comparison. A |
| program will fail to load if it contains a TXL instruction with a target |
| of SHADOWCUBE or SHADOWARRAY2D. |
| |
| TXL supports all three data type modifiers. The single vector operand is |
| treated as a floating-point vector; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TXP: Texture Sample with Projection |
| |
| The TXP instruction divides the first three components of its single |
| floating-point source vector by its fourth component, maps the results to |
| s, t, and r, and performs a filtered texture access as described in |
| Section 2.X.4.4. The returned (R,G,B,A) value is written to the |
| floating-point result vector. Partial derivatives and the level of detail |
| are computed automatically. |
| |
| tmp0 = VectorLoad(op0); |
| tmp0.x = tmp0.x / tmp0.w; |
| tmp0.y = tmp0.y / tmp0.w; |
| tmp0.z = tmp0.z / tmp0.w; |
| ddx = ComputePartialsX(tmp0); |
| ddy = ComputePartialsY(tmp0); |
| lambda = ComputeLOD(ddx, ddy); |
| result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset); |
| |
| The single source vector in the TXP instruction does not have enough |
| coordinates to specify a lookup into a 2D array or cube map texture with |
| both a Q coordinate and an explicit reference value for depth comparison. |
| A program will fail to load if it contains a TXP instruction with a target |
| of SHADOWCUBE or SHADOWARRAY2D. |
| |
| TXP supports all three data type modifiers. The single vector operand is |
| treated as a floating-point vector; the results are interpreted according |
| to the data type modifier. |
| |
| |
| Section 2.X.8.Z, TXQ: Texture Size Query |
| |
| The TXQ instruction takes the first component of the single integer vector |
| operand, adds the number of the base level of the specified texture to |
| determine a texture image level, and returns an integer result vector |
| containing the size of the image at that level of the texture. |
| |
| For one-dimensional and one-dimensional array textures, the "x" component |
| of the result vector is filled with the width of the image(s). For |
| two-dimensional, rectangle, cube map, and two-dimensional array textures, |
| the "x" and "y" components are filled with the width and height of the |
| image(s). For three-dimensional textures, the "x", "y", and "z" |
| components are filled with the width, height, and depth of the image. |
| Additionally, the number of layers in an array texture is returned in the |
| "y" component of the result for one-dimensional array textures or the "z" |
| component for two-dimensional array textures. All other components of the |
| result vector are undefined. For the purposes of this instruction, the |
| width, height, and depth of a texture do NOT include any border. |
| |
| tmp0 = VectorLoad(op0); |
| tmp0.x = tmp0.x + texture[op1].target[op2].base_level; |
| result.x = texture[op1].target[op2].level[tmp0.x].width; |
| result.y = texture[op1].target[op2].level[tmp0.x].height; |
| result.z = texture[op1].target[op2].level[tmp0.x].depth; |
| |
| If the level computed by adding the operand to the base level of the |
| texture is less than the base level number or greater than the maximum |
| level number, the results are undefined. |
| |
| TXQ supports no data type modifiers; the operand and the result vector |
| are both interpreted as signed integers. |
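| |
| The level selection in the TXQ pseudocode can be sketched in Python as |
| follows. The dictionary layout ("base_level", "max_level", "levels") and |
| the function name "txq" are hypothetical stand-ins for texture state, and |
| returning None for an out-of-range level is a choice of this sketch; the |
| results are undefined in that case. |
| |
```python
def txq(level_offset, texture):
    """Return (width, height, depth) of the selected level (sketch)."""
    level = level_offset + texture["base_level"]
    if level < texture["base_level"] or level > texture["max_level"]:
        return None                      # undefined results
    # Sizes are stored without any border in this hypothetical layout.
    width, height, depth = texture["levels"][level]
    return (width, height, depth)
```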
| |
| |
| Section 2.X.8.Z, UP2H: Unpack Two 16-bit Floats |
| |
| The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit |
| scalar operand. The first 16-bit float (stored in the 16 least |
| significant bits) is written into the "x" and "z" components of the result |
| vector; the second is written into the "y" and "w" components of the |
| result vector. |
| |
| This operation undoes the type conversion and packing performed by |
| the PK2H instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = (fp16) (RawBits(tmp) & 0xFFFF); |
| result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); |
| result.z = (fp16) (RawBits(tmp) & 0xFFFF); |
| result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); |
| |
| UP2H supports all three data type modifiers. The single operand is read |
| as a floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier; the 32 least significant bits of the |
| encoding are used for unpacking. For floating-point operand variables, it |
| is expected (but not required) that the operand was produced by a previous |
| pack instruction. The result is always written as a floating-point |
| vector. |
| |
| A program will fail to load if it contains a UP2H instruction whose |
| operand is a variable declared as "SHORT". |
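| |
| The unpacking above can be tried out with the Python sketch below. It |
| assumes the 16-bit floats use the IEEE 754 half-precision encoding, which |
| is what the "e" format of Python's struct module decodes; the function |
| name "up2h" is hypothetical. |
| |
```python
import struct

def up2h(bits32):
    """Unpack two 16-bit floats from a 32-bit pattern (illustrative sketch)."""
    lo = bits32 & 0xFFFF             # 16 least significant bits
    hi = (bits32 >> 16) & 0xFFFF     # 16 most significant bits
    # Reinterpret each 16-bit pattern as a half-precision float.
    x = struct.unpack("<e", struct.pack("<H", lo))[0]
    y = struct.unpack("<e", struct.pack("<H", hi))[0]
    return (x, y, x, y)
```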
| |
| |
| Section 2.X.8.Z, UP2US: Unpack Two Unsigned 16-bit Integers |
| |
| The UP2US instruction unpacks two 16-bit unsigned values packed |
| together in a 32-bit scalar operand. The unsigned quantities are |
| encoded where a bit pattern of all '0' bits corresponds to 0.0 and |
| a pattern of all '1' bits corresponds to 1.0. The "x" and "z" |
| components of the result vector are obtained from the 16 least |
| significant bits of the operand; the "y" and "w" components are |
| obtained from the 16 most significant bits. |
| |
| This operation undoes the type conversion and packing performed by |
| the PK2US instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; |
| result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; |
| result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; |
| result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; |
| |
| UP2US supports all three data type modifiers. The single operand is read |
| as a floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier; the 32 least significant bits of the |
| encoding are used for unpacking. For floating-point operand variables, it |
| is expected (but not required) that the operand was produced by a previous |
| pack instruction. The result is always written as a floating-point |
| vector. |
| |
| A GPU program will fail to load if it contains a UP2US instruction |
| whose operand is a variable declared as "SHORT". |
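| |
| A Python sketch of the same unpacking is shown below for illustration. |
| The function name "up2us" is an assumption; the argument is taken to be |
| the raw 32-bit pattern of the operand. |
| |
```python
def up2us(bits32):
    """Unpack two unsigned 16-bit values to [0.0, 1.0] (sketch)."""
    x = (bits32 & 0xFFFF) / 65535.0            # 16 least significant bits
    y = ((bits32 >> 16) & 0xFFFF) / 65535.0    # 16 most significant bits
    return (x, y, x, y)
```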
| |
| |
| Section 2.X.8.Z, UP4B: Unpack Four Signed 8-bit Integers |
| |
| The UP4B instruction unpacks four 8-bit signed values packed together |
| in a 32-bit scalar operand. The signed quantities are encoded where |
| a bit pattern of all '0' bits corresponds to -128/127 and a pattern |
| of all '1' bits corresponds to +127/127. The "x" component of the |
| result vector is the converted value corresponding to the 8 least |
| significant bits of the operand; the "w" component corresponds to |
| the 8 most significant bits. |
| |
| This operation undoes the type conversion and packing performed by |
| the PK4B instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0; |
| result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0; |
| result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0; |
| result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0; |
| |
| UP4B supports all three data type modifiers. The single operand is read |
| as a floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier; the 32 least significant bits of the |
| encoding are used for unpacking. For floating-point operand variables, it |
| is expected (but not required) that the operand was produced by a previous |
| pack instruction. The result is always written as a floating-point |
| vector. |
| |
| A program will fail to load if it contains a UP4B instruction whose |
| operand is a variable declared as "SHORT". |
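| |
| The biased encoding used by UP4B can be illustrated with this Python |
| sketch. The function name "up4b" is an assumption; the argument is taken |
| to be the raw 32-bit pattern of the operand. |
| |
```python
def up4b(bits32):
    """Unpack four biased signed 8-bit values (illustrative sketch)."""
    # Byte 0x00 maps to -128/127, 0x80 maps to 0.0, and 0xFF maps to +1.0.
    return tuple((((bits32 >> shift) & 0xFF) - 128) / 127.0
                 for shift in (0, 8, 16, 24))
```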
| |
| |
| Section 2.X.8.Z, UP4UB: Unpack Four Unsigned 8-bit Integers |
| |
| The UP4UB instruction unpacks four 8-bit unsigned values packed |
| together in a 32-bit scalar operand. The unsigned quantities are |
| encoded where a bit pattern of all '0' bits corresponds to 0.0 and a |
| pattern of all '1' bits corresponds to 1.0. The "x" component of the |
| result vector is obtained from the 8 least significant bits of the |
| operand; the "w" component is obtained from the 8 most significant |
| bits. |
| |
| This operation undoes the type conversion and packing performed by |
| the PK4UB instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0; |
| result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0; |
| result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0; |
| result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0; |
| |
| UP4UB supports all three data type modifiers. The single operand is read |
| as a floating-point value, a signed integer, or an unsigned integer, as |
| specified by the data type modifier; the 32 least significant bits of the |
| encoding are used for unpacking. For floating-point operand variables, it |
| is expected (but not required) that the operand was produced by a previous |
| pack instruction. The result is always written as a floating-point |
| vector. |
| |
| A program will fail to load if it contains a UP4UB instruction whose |
| operand is a variable declared as "SHORT". |
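| |
| (Non-normative illustration.) A minimal C sketch of the UP4UB |
| arithmetic, written as a loop over the four byte lanes. The function |
| name and the array-based result are assumptions made for this example. |
| |

```c
#include <stdint.h>

/* Unpack four unsigned normalized bytes into out[0..3] = (x, y, z, w).
 * A byte of 0x00 yields 0.0 and a byte of 0xFF yields 1.0. */
void up4ub(uint32_t bits, float out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
    }
}
```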
| |
| |
| Section 2.X.8.Z, X2D: 2D Coordinate Transformation |
| |
| The X2D instruction multiplies the 2D offset vector specified by the |
| "x" and "y" components of the second vector operand by the 2x2 matrix |
| specified by the four components of the third vector operand, and adds |
| the transformed offset vector to the 2D vector specified by the "x" |
| and "y" components of the first vector operand. The first component |
| of the sum is written to the "x" and "z" components of the result; |
| the second component is written to the "y" and "w" components of |
| the result. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; |
| result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; |
| result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; |
| result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; |
| |
| X2D supports only floating-point data type modifiers. |
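| |
| (Non-normative illustration.) The X2D computation can be sketched in C |
| as follows, with the third operand holding the 2x2 matrix in the order |
| (x, y) = first row and (z, w) = second row, as in the pseudocode above. |
| The Vec4 type and function name are hypothetical. |
| |

```c
/* Hypothetical four-component vector type, for illustration only. */
typedef struct { float x, y, z, w; } Vec4;

/* base + M * offset, where M = [ m.x m.y ; m.z m.w ].
 * The 2D sum is replicated into (x, y) and (z, w) of the result. */
Vec4 x2d(Vec4 base, Vec4 offset, Vec4 m)
{
    float sx = base.x + offset.x * m.x + offset.y * m.y;
    float sy = base.y + offset.x * m.z + offset.y * m.w;
    Vec4 r = { sx, sy, sx, sy };
    return r;
}
```

| |
| With an identity matrix (1, 0, 0, 1), the result is simply the sum of |
| the first two operands' "x" and "y" components. |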
| |
| |
| Section 2.X.8.Z, XOR: Exclusive Or |
| |
| The XOR instruction performs a bitwise XOR operation on the components of |
| the two source vectors to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x ^ tmp1.x; |
| result.y = tmp0.y ^ tmp1.y; |
| result.z = tmp0.z ^ tmp1.z; |
| result.w = tmp0.w ^ tmp1.w; |
| |
| XOR supports only integer data type modifiers. If no type modifier is |
| specified, both operands and the result are treated as signed integers. |
| |
| |
| Section 2.X.8.Z, XPD: Cross Product |
| |
| The XPD instruction computes the cross product using the first three |
| components of its two vector operands to generate the x, y, and z |
| components of the result vector. The w component of the result vector is |
| undefined. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y; |
| result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z; |
| result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x; |
| result.w = undefined; |
| |
| XPD supports only floating-point data type modifiers. |
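| |
| (Non-normative illustration.) A C sketch of the XPD computation. The |
| Vec4 type and function name are hypothetical. Because the "w" component |
| is undefined, this sketch arbitrarily writes 0.0 to it; callers should |
| not rely on that value. |
| |

```c
/* Hypothetical four-component vector type, for illustration only. */
typedef struct { float x, y, z, w; } Vec4;

/* Cross product of the xyz parts of a and b. */
Vec4 xpd(Vec4 a, Vec4 b)
{
    Vec4 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    r.w = 0.0f;  /* undefined by the instruction; arbitrary choice here */
    return r;
}
```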
| |
| |
| Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization) |
| |
| Modify Section 3.8.1, Texture Image Specification, p. 150 |
| |
| (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture |
| targets that can be used with DEPTH_COMPONENT textures) Textures with a |
| base internal format of DEPTH_COMPONENT are supported by texture image |
| specification commands only if <target> is TEXTURE_1D, TEXTURE_2D, |
| TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT, |
| TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D, |
| PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB, |
| PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT. Using this |
| format in conjunction with any other target will result in an |
| INVALID_OPERATION error. |
| |
| |
| Delete Section 3.8.7, Texture Wrap Modes. (The language in this section |
| is folded into updates to the following section, and is no longer needed |
| here.) |
| |
| |
| Modify Section 3.8.8, Texture Minification: |
| |
| (replace the last paragraph, p. 171): Let s(x,y) be the function that |
| associates an s texture coordinate with each set of window coordinates |
| (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously. |
| Let |
| |
| u(x,y) = w_t * s(x,y) + offsetu_shader, |
| v(x,y) = h_t * t(x,y) + offsetv_shader, and |
| w(x,y) = d_t * r(x,y) + offsetw_shader, |
| |
| where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17 |
| with w_s, h_s, and d_s equal to the width, height, and depth of the image |
| array whose level is level_base. (offsetu_shader, offsetv_shader, |
| offsetw_shader) is the texel offset specified in the vertex, geometry, or |
| fragment program instruction used to perform the access. For |
| fixed-function texture accesses, all three shader offsets are taken to be |
| zero. For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0; |
| for two-dimensional textures, define w(x,y) == 0. |
| |
| After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the |
| corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT. Let |
| |
| u'(x,y) = clamp(u(x,y), 0, w_t),    if TEXTURE_WRAP_S is CLAMP |
|           clamp(u(x,y), -w_t, w_t), if TEXTURE_WRAP_S is |
|                                       MIRROR_CLAMP_EXT, or |
|           u(x,y),                   otherwise |
| v'(x,y) = clamp(v(x,y), 0, h_t),    if TEXTURE_WRAP_T is CLAMP |
|           clamp(v(x,y), -h_t, h_t), if TEXTURE_WRAP_T is |
|                                       MIRROR_CLAMP_EXT, or |
|           v(x,y),                   otherwise |
| w'(x,y) = clamp(w(x,y), 0, d_t),    if TEXTURE_WRAP_R is CLAMP |
|           clamp(w(x,y), -d_t, d_t), if TEXTURE_WRAP_R is |
|                                       MIRROR_CLAMP_EXT, or |
|           w(x,y),                   otherwise, |
| |
| where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a> |
| is greater than <c>, and <a> otherwise. |
| |
| (start a new paragraph with "For a polygon, rho is given at a fragment |
| with window coordinates...", and then continue with the original spec |
| text.) |
| |
| (replace text starting with the last paragraph on p. 172, continuing to |
| the end of p. 174) |
| |
| When lambda indicates minification, the value assigned to |
| TEXTURE_MIN_FILTER is used to determine how the texture value for a |
| fragment is selected. |
| |
| When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level |
| level_base that is nearest (in Manhattan distance) to that specified by |
| (s,t,r) is obtained. Let i, j, and k be integers such that: |
| |
| i = apply_wrap(floor(u'(x,y))), |
| j = apply_wrap(floor(v'(x,y))), and |
| k = apply_wrap(floor(w'(x,y))), |
| |
| where the coordinate returned by apply_wrap() is as defined by Table X.19. |
| The values of i, j, and k are then modified according to the texture wrap |
| modes, as described in Table 3.19, to produce new values (i', j', and k'). |
| For a three-dimensional texture, the texel at location (i,j,k) becomes the |
| texture value. For a two-dimensional texture, k is irrelevant, and the |
| texel at location (i,j) becomes the texture value. For a one-dimensional |
| texture, j and k are irrelevant, and the texel at location i becomes the |
| texture value. |
| |
| Wrap mode                   Result |
| --------------------------  ------------------------------------------ |
| CLAMP_TO_EDGE               clamp(coord, 0, size-1) |
| CLAMP_TO_BORDER             clamp(coord, -1, size) |
| CLAMP                       { clamp(coord, 0, size-1), |
|                             {   for NEAREST filtering |
|                             { clamp(coord, -1, size), |
|                             {   for LINEAR filtering |
| REPEAT                      mod(coord, size) |
| MIRROR_CLAMP_TO_EDGE_EXT    clamp(mirror(coord), 0, size-1) |
| MIRROR_CLAMP_TO_BORDER_EXT  clamp(mirror(coord), 0, size) |
| MIRROR_CLAMP_EXT            { clamp(mirror(coord), 0, size-1), |
|                             {   for NEAREST filtering |
|                             { clamp(mirror(coord), 0, size), |
|                             {   for LINEAR filtering |
| MIRRORED_REPEAT             (size-1) - mirror(mod(coord, 2*size)-size) |
| |
| Table X.19: Texel location wrap mode application. mod(<a>,<b>) is |
| defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to |
| return <a> if <a> is greater than or equal to zero or -(1+<a>) |
| otherwise. The values of "wrap mode" and size are TEXTURE_WRAP_S and |
| w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k |
| coordinates, respectively. The coordinate clamping applied for CLAMP and |
| MIRROR_CLAMP_EXT depends on the filtering mode (NEAREST or LINEAR). |
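| |
| (Non-normative illustration.) The following C sketch applies Table X.19 |
| to an integer texel coordinate for the four wrap modes whose result does |
| not depend on the filter. The enum, its enumerant names, and the helper |
| names are hypothetical; real GL token values are not used, and <size> is |
| assumed to be positive. |
| |

```c
/* Hypothetical wrap mode identifiers (not real GL enum values). */
typedef enum {
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP_TO_BORDER,
    WRAP_REPEAT,
    WRAP_MIRRORED_REPEAT
} WrapMode;

static int clamp_int(int a, int lo, int hi)
{
    return a < lo ? lo : (a > hi ? hi : a);
}

/* mod(a,b) = a - b*floor(a/b), for b > 0; result is in [0, b). */
static int floor_mod(int a, int b)
{
    int r = a % b;
    return r < 0 ? r + b : r;
}

/* mirror(a) = a if a >= 0, -(1+a) otherwise. */
static int mirror_int(int a)
{
    return a >= 0 ? a : -(1 + a);
}

/* Map a texel coordinate to a texel location per Table X.19. */
int apply_wrap(WrapMode mode, int coord, int size)
{
    switch (mode) {
    case WRAP_CLAMP_TO_EDGE:   return clamp_int(coord, 0, size - 1);
    case WRAP_CLAMP_TO_BORDER: return clamp_int(coord, -1, size);
    case WRAP_REPEAT:          return floor_mod(coord, size);
    case WRAP_MIRRORED_REPEAT:
        return (size - 1) - mirror_int(floor_mod(coord, 2 * size) - size);
    }
    return coord;  /* unreachable for the modes listed above */
}
```

| |
| For a texture of size 4, MIRRORED_REPEAT maps coordinates 4, 5, 6, 7 to |
| locations 3, 2, 1, 0, and coordinate -1 to location 0. |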
| |
| If the selected (i,j,k), (i,j), or i location refers to a border texel |
| that satisfies any of the following conditions: |
| |
| i < -b_s, |
| j < -b_s, |
| k < -b_s, |
| i >= w_t + b_s, |
| j >= h_t + b_s, or |
| k >= d_t + b_s, |
| |
| then the border values defined by TEXTURE_BORDER_COLOR are used in place |
| of the non-existent texel. If the texture contains color components, the |
| values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match |
| the texture's internal format in a manner consistent with table 3.15. If |
| the texture contains depth components, the first component of |
| TEXTURE_BORDER_COLOR is interpreted as a depth value. |
| |
| When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image |
| array of level level_base is selected. Let: |
| |
| i_0 = apply_wrap(floor(u' - 0.5)), |
| j_0 = apply_wrap(floor(v' - 0.5)), |
| k_0 = apply_wrap(floor(w' - 0.5)), |
| i_1 = apply_wrap(floor(u' - 0.5) + 1), |
| j_1 = apply_wrap(floor(v' - 0.5) + 1), |
| k_1 = apply_wrap(floor(w' - 0.5) + 1), |
| alpha = frac(u' - 0.5), |
| beta = frac(v' - 0.5), |
| gamma = frac(w' - 0.5), |
| |
| where frac(<x>) denotes the fractional part of <x>. |
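| |
| (Non-normative illustration.) For a single axis, the LINEAR selection |
| above can be sketched in C as follows. The sketch assumes the REPEAT |
| wrap mode for apply_wrap() and a positive <size>; the LinearTap type and |
| all function names are hypothetical. |
| |

```c
/* Two texel locations and the blend weight along one axis. */
typedef struct { int i0, i1; float alpha; } LinearTap;

/* floor() for floats in int range, without depending on libm. */
static int floor_to_int(float x)
{
    int i = (int)x;               /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* apply_wrap() for the REPEAT mode: mod(coord, size). */
static int wrap_repeat(int coord, int size)
{
    int r = coord % size;
    return r < 0 ? r + size : r;
}

/* i_0, i_1, and alpha for the clamped coordinate u'. */
LinearTap linear_tap(float u_prime, int size)
{
    float shifted = u_prime - 0.5f;
    int base = floor_to_int(shifted);
    LinearTap t;
    t.i0 = wrap_repeat(base, size);
    t.i1 = wrap_repeat(base + 1, size);
    t.alpha = shifted - (float)base;  /* frac(u' - 0.5) */
    return t;
}
```

| |
| For example, with u' = 0.25 and size 4, the sketch selects texels 3 and |
| 0 with alpha = 0.75, so the sample wraps around the left edge. |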
| |
| For a three-dimensional texture, the texture value tau is found as... |
| |
| (replace last paragraph, p.174) For any texel in the equation above that |
| refers to a border texel outside the defined range of the image, the texel |
| value is taken from the texture border color as with NEAREST filtering. |
| |
| |
| Modify Section 3.8.14, Texture Comparison Modes (p. 185) |
| |
| (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is |
| used for depth comparisons on cubemap textures) |
| |
| Let D_t be the depth texture value, in the range [0, 1]. For |
| fixed-function texture lookups, let R be the interpolated <r> texture |
| coordinate, clamped to the range [0, 1]. For texture lookups generated by |
| a program instruction, let R be the reference value for depth comparisons |
| provided in the instruction, also clamped to [0, 1]. Then the effective |
| texture value L_t, I_t, or A_t is computed as follows: |
| |
| |
| Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment |
| Operations and the Frame Buffer) |
| |
| None. |
| |
| |
| Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions) |
| |
| None. |
| |
| |
| Additions to Chapter 6 of the OpenGL 1.5 Specification (State and |
| State Requests) |
| |
| Modify Section 6.1.12 of the ARB_vertex_program specification. |
| |
| (Add new integer program parameter queries, plus language that program |
| environment or local parameter query results are undefined if the query |
| specifies a data type incompatible with the data type of the parameter |
| being queried.) |
| |
| The commands |
| |
| void GetProgramEnvParameterdvARB(enum target, uint index, |
| double *params); |
| void GetProgramEnvParameterfvARB(enum target, uint index, |
| float *params); |
| void GetProgramEnvParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramEnvParameterIuivNV(enum target, uint index, |
| uint *params); |
| |
| obtain the current value for the program environment parameter numbered |
| <index> for the given program target <target>, and places the information |
| in the array <params>. The values returned are undefined if the data type |
| of the components of the parameter is not compatible with the data type of |
| <params>. Floating-point components are compatible with "double" or |
| "float"; signed and unsigned integer components are compatible with "int" |
| and "uint", respectively. The error INVALID_ENUM is generated if <target> |
| specifies a nonexistent program target or a program target that does not |
| support program environment parameters. The error INVALID_VALUE is |
| generated if <index> is greater than or equal to the |
| implementation-dependent number of supported program environment |
| parameters for the program target. |
| |
| ... |
| |
| The commands |
| |
| void GetProgramLocalParameterdvARB(enum target, uint index, |
| double *params); |
| void GetProgramLocalParameterfvARB(enum target, uint index, |
| float *params); |
| void GetProgramLocalParameterIivNV(enum target, uint index, |
| int *params); |
| void GetProgramLocalParameterIuivNV(enum target, uint index, |
| uint *params); |
| |
| obtain the current value for the program local parameter numbered <index> |
| belonging to the program object currently bound to <target>, and places |
| the information in the array <params>. The values returned are undefined |
| if the data type of the components of the parameter is not compatible with |
| the data type of <params>. Floating-point components are compatible with |
| "double" or "float"; signed and unsigned integer components are compatible |
| with "int" and "uint", respectively. The error INVALID_ENUM is generated |
| if <target> specifies a nonexistent program target or a program target |
| that does not support program local parameters. The error INVALID_VALUE |
| is generated if <index> is greater than or equal to the |
| implementation-dependent number of supported program local parameters for |
| the program target. |
| |
| ... |
| |
| The command |
| |
| void GetProgramivARB(enum target, enum pname, int *params); |
| |
| obtains program state for the program target <target>, writing ... |
| |
| (add new paragraphs describing the new supported queries) |
| |
| If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or |
| PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer |
| holding the number of active attribute or result variable components, |
| respectively, used by the program object currently bound to <target>. |
| |
| If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or |
| MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer |
| holding the maximum number of active attribute or result variable |
| components, respectively, supported for programs of type <target>. |
| |
| |
| Additions to Appendix A of the OpenGL 1.5 Specification (Invariance) |
| |
| None. |
| |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| None. |
| |
| |
| GLX Protocol |
| |
| The following new rendering commands are sent to the server as part |
| of a glXRender request. |
| |
| ProgramLocalParameterI4ivNV |
| |
| 2 28 rendering command length |
| 2 4303 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 INT32 params[0] |
| 4 INT32 params[1] |
| 4 INT32 params[2] |
| 4 INT32 params[3] |
| |
| ProgramLocalParameterI4uivNV |
| |
| 2 28 rendering command length |
| 2 4305 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 params[0] |
| 4 CARD32 params[1] |
| 4 CARD32 params[2] |
| 4 CARD32 params[3] |
| |
| ProgramEnvParameterI4ivNV |
| |
| 2 28 rendering command length |
| 2 4307 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 INT32 params[0] |
| 4 INT32 params[1] |
| 4 INT32 params[2] |
| 4 INT32 params[3] |
| |
| ProgramEnvParameterI4uivNV |
| |
| 2 28 rendering command length |
| 2 4309 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 params[0] |
| 4 CARD32 params[1] |
| 4 CARD32 params[2] |
| 4 CARD32 params[3] |
| |
| The following new rendering commands are added. These can be sent as a |
| glXRender or glXRenderLarge request. |
| |
| ProgramLocalParametersI4ivNV |
| |
| 2 16+count*4*4 rendering command length |
| 2 4304 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 count |
| 4*count*4 LISTofINT32 params |
| |
| If the command is encoded in a glXRenderLarge request, the |
| command opcode and command length fields above are expanded to |
| 4 bytes each: |
| |
| 4 20+count*4*4 rendering command length |
| 4 4304 rendering command opcode |
| |
| ProgramLocalParametersI4uivNV |
| |
| 2 16+count*4*4 rendering command length |
| 2 4306 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 count |
| 4*count*4 LISTofCARD32 params |
| |
| If the command is encoded in a glXRenderLarge request, the |
| command opcode and command length fields above are expanded to |
| 4 bytes each: |
| |
| 4 20+count*4*4 rendering command length |
| 4 4306 rendering command opcode |
| |
| ProgramEnvParametersI4ivNV |
| |
| 2 16+count*4*4 rendering command length |
| 2 4308 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 count |
| 4*count*4 LISTofINT32 params |
| |
| If the command is encoded in a glXRenderLarge request, the |
| command opcode and command length fields above are expanded to |
| 4 bytes each: |
| |
| 4 20+count*4*4 rendering command length |
| 4 4308 rendering command opcode |
| |
| ProgramEnvParametersI4uivNV |
| |
| 2 16+count*4*4 rendering command length |
| 2 4310 rendering command opcode |
| 4 ENUM target |
| 4 CARD32 index |
| 4 CARD32 count |
| 4*count*4 LISTofCARD32 params |
| |
| If the command is encoded in a glXRenderLarge request, the |
| command opcode and command length fields above are expanded to |
| 4 bytes each: |
| |
| 4 20+count*4*4 rendering command length |
| 4 4310 rendering command opcode |
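| |
| (Non-normative illustration.) The command length arithmetic above can be |
| sketched in C as follows, using ProgramLocalParametersI4ivNV (opcode |
| 4304) as the example. The function and macro names are hypothetical. The |
| encoder writes fields in host byte order and handles only the glXRender |
| form; a real client would follow the byte order negotiated for the X |
| connection and switch to glXRenderLarge for oversized commands. |
| |

```c
#include <stdint.h>
#include <string.h>

/* Opcode listed above for ProgramLocalParametersI4ivNV. */
#define OPCODE_PROGRAM_LOCAL_PARAMETERS_I4IV_NV 4304u

/* glXRender form: 16-byte header plus <count> vectors of four INT32. */
uint32_t glx_render_length(uint32_t count)
{
    return 16u + count * 4u * 4u;
}

/* glXRenderLarge form: length and opcode fields grow to 4 bytes each. */
uint32_t glx_render_large_length(uint32_t count)
{
    return 20u + count * 4u * 4u;
}

/* Encode the glXRender form into buf (host byte order).
 * Returns the number of bytes written, or 0 if the command does not
 * fit in a 16-bit length field or in the supplied buffer. */
size_t encode_local_params_i4iv(uint8_t *buf, size_t cap, uint32_t target,
                                uint32_t index, uint32_t count,
                                const int32_t *params)
{
    if (count > (0xFFFFu - 16u) / 16u) return 0;
    uint32_t len = glx_render_length(count);
    if (len > cap) return 0;

    uint16_t len16 = (uint16_t)len;
    uint16_t op16 = (uint16_t)OPCODE_PROGRAM_LOCAL_PARAMETERS_I4IV_NV;
    memcpy(buf + 0, &len16, 2);    /* rendering command length */
    memcpy(buf + 2, &op16, 2);     /* rendering command opcode */
    memcpy(buf + 4, &target, 4);   /* ENUM target */
    memcpy(buf + 8, &index, 4);    /* CARD32 index */
    memcpy(buf + 12, &count, 4);   /* CARD32 count */
    if (count > 0) {
        memcpy(buf + 16, params, (size_t)count * 16u);
    }
    return len;
}
```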
| |
| The remaining commands are non-rendering commands. These commands |
| are sent separately (i.e., not as part of a glXRender or |
| glXRenderLarge request), using the glXVendorPrivateWithReply |
| request: |
| |
| GetProgramLocalParameterIivNV |
| 1 CARD8 opcode (X assigned) |
| 1 17 GLX opcode (X_GLXVendorPrivateWithReply) |
| 2 5 request length |
| 4 1365 vendor specific opcode |
| 4 GLX_CONTEXT_TAG context tag |
| 4 ENUM target |
| 4 CARD32 index |
| => |
| 1 1 reply |
| 1 CARD8 unused |
| 2 CARD16 sequence number |
| 4 4 reply length |
| 24 CARD32 unused |
| 16 INT32 params |
| |
| GetProgramLocalParameterIuivNV |
| 1 CARD8 opcode (X assigned) |
| 1 17 GLX opcode (X_GLXVendorPrivateWithReply) |
| 2 5 request length |
| 4 1366 vendor specific opcode |
| 4 GLX_CONTEXT_TAG context tag |
| 4 ENUM target |
| 4 CARD32 index |
| => |
| 1 1 reply |
| 1 CARD8 unused |
| 2 CARD16 sequence number |
| 4 4 reply length |
| 24 CARD32 unused |
| 16 CARD32 params |
| |
| GetProgramEnvParameterIivNV |
| 1 CARD8 opcode (X assigned) |
| 1 17 GLX opcode (X_GLXVendorPrivateWithReply) |
| 2 5 request length |
| 4 1367 vendor specific opcode |
| 4 GLX_CONTEXT_TAG context tag |
| 4 ENUM target |
| 4 CARD32 index |
| => |
| 1 1 reply |
| 1 CARD8 unused |
| 2 CARD16 sequence number |
| 4 4 reply length |
| 24 CARD32 unused |
| 16 INT32 params |
| |
| GetProgramEnvParameterIuivNV |
| 1 CARD8 opcode (X assigned) |
| 1 17 GLX opcode (X_GLXVendorPrivateWithReply) |
| 2 5 request length |
| 4 1368 vendor specific opcode |
| 4 GLX_CONTEXT_TAG context tag |
| 4 ENUM target |
| 4 CARD32 index |
| => |
| 1 1 reply |
| 1 CARD8 unused |
| 2 CARD16 sequence number |
| 4 4 reply length |
| 24 CARD32 unused |
| 16 CARD32 params |
| |
| Errors |
| |
| The error INVALID_VALUE is generated by ProgramLocalParameter4fARB, |
| ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB, |
| ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV, |
| ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV, |
| ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB, |
| GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and |
| GetProgramLocalParameterIuivNV if <index> is greater than or equal to the |
| number of program local parameters supported by <target>. |
| |
| The error INVALID_VALUE is generated by ProgramEnvParameter4fARB, |
| ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB, |
| ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV, |
| ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV, |
| ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB, |
| GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and |
| GetProgramEnvParameterIuivNV if <index> is greater than or equal to the |
| number of program environment parameters supported by <target>. |
| |
| The error INVALID_VALUE is generated by ProgramLocalParameters4fvEXT, |
| ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum |
| of <index> and <count> is greater than the number of program local |
| parameters supported by <target>. |
| |
| The error INVALID_VALUE is generated by ProgramEnvParameters4fvEXT, |
| ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of |
| <index> and <count> is greater than the number of program environment |
| parameters supported by <target>. |
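| |
| (Non-normative illustration.) The two INVALID_VALUE conditions above can |
| be sketched as C predicates. The function names are hypothetical, and |
| <max_params> stands for the implementation-dependent parameter count for |
| the target. The range check is arranged so that the sum of <index> and |
| <count> cannot overflow. |
| |

```c
#include <stdbool.h>
#include <stdint.h>

/* Single-parameter commands: valid only if index < max_params. */
bool param_index_valid(uint32_t index, uint32_t max_params)
{
    return index < max_params;
}

/* Multi-parameter commands: valid only if index + count <= max_params.
 * Written without forming index + count, which could wrap around. */
bool param_range_valid(uint32_t index, uint32_t count, uint32_t max_params)
{
    return count <= max_params && index <= max_params - count;
}
```

| |
| An implementation would generate INVALID_VALUE when the corresponding |
| predicate is false. |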
| |
| |
| Dependencies on NV_parameter_buffer_object |
| |
| If NV_parameter_buffer_object is not supported, references to program |
| parameter buffer variables and bindings should be removed. |
| |
| |
| Dependencies on ARB_texture_rectangle |
| |
| If ARB_texture_rectangle is not supported, references to rectangle |
| textures and the RECT and SHADOWRECT texture target identifiers should be |
| removed. |
| |
| |
| Dependencies on EXT_gpu_program_parameters |
| |
| If EXT_gpu_program_parameters is not supported, references to the |
| Program{Local,Env}Parameters4fvEXT commands, which set multiple program |
| local or environment parameters in a single call, should be removed. |
| These prototypes were included in this spec for completeness only. |
| |
| |
| Dependencies on EXT_texture_integer |
| |
| If EXT_texture_integer is not supported, references to texture lookups |
| returning integer values in Section 2.X.4.4 (Texture Access) should be |
| removed, and all texture formats are considered to produce floating-point |
| values. |
| |
| |
| Dependencies on EXT_texture_array |
| |
| If EXT_texture_array is not supported, references to array textures in |
| Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as |
| should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and |
| "SHADOWARRAY2D" tokens. |
| |
| |
| Dependencies on EXT_texture_buffer_object |
| |
| If EXT_texture_buffer_object is not supported, references to buffer |
| textures in Section 2.X.4.4 (Texture Access) and elsewhere should be |
| removed, as should all references to the "BUFFER" tokens. |
| |
| |
| Dependencies on NV_primitive_restart |
| |
| If NV_primitive_restart is supported, index values causing a primitive |
| restart are not considered as specifying an End command, followed by |
| another Begin. Primitive restart is therefore not guaranteed to |
| immediately update bindings for material properties changed inside a |
| Begin/End. The spec language says they "are not guaranteed to update |
| program parameter bindings until the following End command." |
| |
| |
| New State |
| |
|                                                      Initial |
| Get Value                     Type  Get Command      Value    Description             Sec     Attrib |
| ----------------------------  ----  ---------------  -------  ----------------------  ------  ------ |
| PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components    6.1.12  - |
|                                                               used for attributes |
| PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components    6.1.12  - |
|                                                               used for results |
| |
| Table X.20. New Program Object State. Program object queries return |
| attributes of the program object currently bound to the program target |
| <target>. |
| |
| |
| New Implementation Dependent State |
| |
|                                                          Minimum |
| Get Value                         Type  Get Command      Value    Description            Sec.     Attrib |
| --------------------------------  ----  ---------------  -------  ---------------------  -------  ------ |
| MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      -8       minimum texel offset   2.x.4.4  - |
|                                                                   allowed in lookup |
| MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      +7       maximum texel offset   2.x.4.4  - |
|                                                                   allowed in lookup |
| MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   - |
|                                                                   components allowed |
|                                                                   for attributes |
| MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   - |
|                                                                   components allowed |
|                                                                   for results |
| MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   - |
|                                                                   attribute vectors |
|                                                                   supported |
| MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   - |
|                                                                   result vectors |
|                                                                   supported |
| MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    - |
|                                                                   call stack depth |
| MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB  48       maximum program        2.X.5    - |
|                                                                   if nesting |
| MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    - |
|                                                                   loop nesting |
| |
| Table X.21: New Implementation-Dependent Values Introduced by |
| NV_gpu_program4. (*) means that the required minimum is program |
| type-specific. There are separate limits for each program type. |
| |
| |
| Issues |
| |
| (1) How does this extension differ from previous NV_vertex_program and |
| NV_fragment_program extensions? |
| |
| RESOLVED: |
| |
| - This extension provides a uniform set of instructions and bindings. |
| Unlike previous extensions, the set of instructions and bindings |
| available is generally the same. The only exceptions are a small |
| number of instructions and bindings that make sense for one specific |
| program type. |
| |
| - This extension supports integer data types and provides a |
| full-fledged integer instruction set. |
| |
| - This extension supports array variables of all types, including |
| temporaries. Array variables can be accessed directly or indirectly |
| (using integer temporaries as indices). |
| |
| - This extension provides a uniform set of structured branching |
| constructs (if tests, loops, subroutines) that fully support |
| run-time condition testing. Previous versions of NV_vertex_program |
| provided unstructured branching. Previous versions of |
| NV_fragment_program provided structured branching constructs, but the |
| support was more limited -- for example, looping constructs couldn't |
| specify loop counts with values computed at run time. |
| |
| - This extension supports geometry programs, which are described in |
| more detail in the NV_geometry_program4 extension. |
| |
| - This extension provides the ability to specify and use cubemap |
| textures with a DEPTH_COMPONENT internal format. Shadow mapping is |
| supported; the Q texture coordinate is used as the reference value |
| for comparisons. |
| |
| (2) Is this extension backward-compatible with previous NV_vertex_program |
| and NV_fragment_program extensions? If not, what support has been |
| removed? |
| |
| RESOLVED: This extension is largely, but not completely, |
| backward-compatible. Functionality removed includes: |
| |
| - Unstructured branching: NV_vertex_program2 included a general |
| branch instruction "BRA" that could be used to jump to an arbitrary |
| instruction. The "CAL" instruction could "call" to an arbitrary |
| instruction into code that was not necessarily structured as simple |
| subroutine blocks. Arbitrary unstructured branching can be |
| difficult to implement efficiently on highly parallel GPU |
| architectures, while basic structured branching is not nearly as |
| difficult. |
| |
| This extension retains the "CAL" instruction but treats each block |
| of code between instruction labels as a separate subroutine. The |
| "BRA" instruction and arbitrary branching have been removed. The |
| structured branching constructs in this extension are sufficient to |
| implement almost all of the looping/branching support in high-level |
| languages ("goto" being the most obvious exception). |
| |
| - Address registers: NV_vertex_program added the notion of address |
| registers, which were effectively under-powered integer temporaries. |
| The set of instructions used to manipulate address registers was |
| severely limited. NV_vertex_program[23] extended the original |
| scalars to vectors and added a few more instructions to manipulate |
| address registers. Fragment programs had no address registers until |
| NV_fragment_program2 added the loop counter, which was very similar |
| in functionality to vertex program address registers, but even more |
| limited. This extension adds true integer temporaries, which can |
| accomplish everything old address registers could do, and much more. |
| Address register support was removed to simplify the API. |
| |
| - NV_fragment_program2 LOOP construct: NV_fragment_program2 added a |
| LOOP instruction, which let you repeat a block of code <N> times, |
| with a parallel loop counter that started at <A> and stepped by <B> |
| on each iteration. This construct was significantly limited in |
| several ways -- the loop count had to be constant, and you could |
| only access the innermost loop counter in a nested loop. This |
| extension discards the support and retains the simpler "REP" |
| construct to implement loops. If desired, a loop counter can be |
| implemented by manipulating an integer temporary. The "BRK" |
| instruction (conditional break) is retained, and a "CONT" |
| instruction (conditional continue) is added. Additionally, the loop |
| count need not be a constant. |
| |
| - NV_vertex_program and ARB_vertex_program EXP and LOG instructions: |
| NV_vertex_program provided EXP and LOG instructions that computed a |
| rough approximation of 2^x or log_2(x) and provided some additional |
| values that could help refine the approximation. Those opcodes were |
| carried forward into ARB_vertex_program. Both ARB_vertex_program |
| and NV_vertex_program2 provided EX2 and LG2 instructions that |
| computed a better approximation. All fragment program extensions |
| also provided EX2 and LG2, but did not bother to include EXP and |
| LOG. On the hardware targeted by this extension, there is no |
| advantage to using EXP and LOG, so these opcodes have been removed |
| for simplicity. |
| |
| - NV_vertex_program3 and NV_fragment_program2 provide the ability to |
| do indirect addressing of inputs/outputs when using bindings in |
| instructions -- for example: |
| |
| MOV R0, vertex.attrib[A0.x+2]; # vertex |
| MOV result.texcoord[A0.y], R1; # vertex |
| MOV R2, fragment.texcoord[A0.x]; # fragment |
| |
| This extension provides indexing capability, but using named array |
| variables instead. |
| |
| ATTRIB attribs[] = { vertex.attrib[2..5] }; |
| MOV R0, attribs[A0.x]; |
| OUTPUT outcoords[] = { result.texcoord[0..3] }; |
| MOV outcoords[A0.y], R1; |
| ATTRIB texcoords[] = { fragment.texcoord[0..2] }; |
| MOV R2, texcoords[A0.x]; |
| |
| This approach makes the set of attribute and result bindings more |
| regular. Additionally, it helps the assembler determine which |
| vertex/fragment attributes are actually needed -- when the assembler |
| sees constructs like "fragment.texcoord[A0.x]", it must treat *all* |
| texture coordinates as live unless it can determine the range of |
| values used for indexing. The named array variable approach |
| explicitly identifies which attributes are needed when indexing is |
| used. |
| |
| Functionality altered includes: |
| |
| - The RSQ instruction in the original NV_vertex_program and |
| ARB_vertex_program extensions implicitly took the absolute value of |
| its operand. Since the ARB extensions don't have numerics |
| guarantees, computing the reciprocal square root of a negative value |
| was not meaningful. To allow for the possibility of taking the |
| reciprocal square root of a negative value (which should yield NaN |
| -- "not a number"), the RSQ instruction in this extension no |
| longer implicitly takes the absolute value of its operand. |
| Equivalent functionality can be achieved using the explicit |abs| |
| absolute value operator on the operand to RSQ. |
| |
| - The results of texture lookups accessing inconsistent textures are |
| now undefined, instead of producing a fixed constant vector. |
| |
| |
| (3) What should this set of extensions be called? |
| |
| RESOLVED: NV_gpu_program4, NV_vertex_program4, NV_fragment_program4, |
| and NV_geometry_program4. Only NV_gpu_program4 will appear in the |
| extension string; the other three specifications exist simply to define |
| vertex, fragment, and geometry program-specific features. |
| |
| The "gpu_program" name was chosen due to the common instruction set |
| intended to run on GPUs. On previous chip generations, the vertex and |
| fragment instruction sets were similar, but there were enough |
| differences to package them separately. |
| |
| The choice of "4" indicates that this is the fourth generation of |
| programmable hardware from NVIDIA. The GeForce3 and GeForce4 series |
| supported NV_vertex_program. The GeForce FX series supported |
| NV_vertex_program2 and added fragment programmability with |
| NV_fragment_program. Around this time, the OpenGL Architecture Review |
| Board (ARB) approved ARB_vertex_program and ARB_fragment_program |
| extensions, and NVIDIA added NV_vertex_program2_option and |
| NV_fragment_program_option extensions exposing GeForce FX features using |
| the ARB extensions' instruction set. The GeForce6 and GeForce7 series |
| brought the NV_vertex_program3 and NV_fragment_program2 extensions, |
| which extend the ARB extensions further. This extension adds geometry |
| programs, and brings the "version number" for each of these extensions |
| up to "4". |
| |
| |
| (4) This extension adds integer data type support in programmable |
| shaders that were previously float-centric. Should applications be able |
| to pass integer values directly to the shaders, and if so, how does it |
| work? |
| |
| RESOLVED: The diagram at the bottom of this issue depicts data flows in |
| the GL, as extended by this and related extensions. |
| |
| This extension generalizes some state to be "typeless", instead of being |
| strongly typed (and almost invariably floating-point) as in the core |
| specification. We introduce a new set of functions to specify GL state |
| as signed or unsigned integer values, instead of floating point values. |
| These functions include: |
| |
| * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as |
| integers. This extension does not create "integer" versions for |
| fixed-function attribute functions (e.g., glColor, glTexCoord), |
| which remain fully floating-point. |
| |
| * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and |
| local parameters as integers. |
| |
| * TexImage*() with EXT_texture_integer internal formats -- Specify |
| texture images as containing integer data whose values are not |
| converted to floating-point values. |
| |
| * NV_parameter_buffer_object functions -- Bind (typeless) buffer |
| object data stores for use as program parameters. These buffer |
| objects can be loaded with either integer or floating-point data. |
| |
| * EXT_texture_buffer_object functions -- Bind (typeless) buffer object |
| data stores for use as textures. These buffer objects can be loaded |
| with either integer or floating-point data. |
| |
| Each type of program (using NV_gpu_program4 and related extensions) can |
| read attributes using any data type (float, signed integer, unsigned |
| integer) and write result values used by subsequent stages using any |
| data type. |
| |
| Finally, there are several new places where integer data can be |
| consumed by the GL: |
| |
| * NV_transform_feedback -- Stream transformed vertex attribute |
| components to a (typeless) buffer object. The transformed |
| attributes can be written as signed or unsigned integers in vertex |
| and geometry programs. |
| |
| * EXT_texture_integer internal formats and framebuffer objects -- |
| Provide support for rendering to integer texture formats, where |
| final fragment values are treated as signed or unsigned integers, |
| rather than floating-point values. |
| |
| The diagram below represents a substantial portion of the GL pipeline. |
| Each line connecting blocks represents an interface where data is |
| "produced" from the GL state or by fixed-function or programmable |
| pipeline stages and "consumed" by another pipeline stage. Each producer |
| and consumer is labeled with a data type. For producers, the |
| "(typeless)" designation generally means that the state and/or output |
| can be written as floating-point values or as signed or unsigned |
| integers. "(float)" means that the outputs are always written as |
| floating-point. The same distinction applies to consumers -- |
| "(typeless)" means that the consumer is capable of reading inputs using |
| any data type, and "(float)" means that consumer always reads inputs as |
| floating-point values. |
| |
| To get sane results, applications must ensure that each value passed |
| between pipeline stages is produced and consumed using the same data |
| type. If a value is written in one stage as a floating-point value, it |
| must be read as a floating-point value as well. If such a value is read |
| as a signed or unsigned integer, its value is considered undefined. In |
| practice, the raw bits used to represent the floating-point value (IEEE |
| single-precision floating-point encoding in the initial implementation |
| of this spec) will be treated as an integer. |
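| |
| As a minimal C sketch of that practical behavior (the function names are |
| hypothetical, and a mismatched read remains undefined as far as this |
| specification is concerned), the following contrasts reinterpreting the |
| raw bits of a float with numerically converting it: |

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 raw bits of an IEEE single-precision float as an
 * unsigned integer, with no numeric conversion. */
uint32_t float_bits_as_uint(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning in C */
    return bits;
}

/* Contrast: an ordinary numeric float-to-integer conversion. */
uint32_t float_converted_to_uint(float f)
{
    return (uint32_t)f;
}
```

| Under this model, the value 1.0 written as a float and read back as an |
| unsigned integer appears as 0x3F800000 rather than 1. |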
| |
| Type matching between stages is not enforced by the GL, because the |
| overhead of doing so would be substantial. Such overhead would include: |
| |
| * matching the inputs and outputs of each pipeline stage |
| (fixed-function or programmable) every time the program |
| configuration or fixed-function state changes, |
| |
| * tracking the data type of each generic vertex attribute and checking |
| it against the vertex program's inputs, |
| |
| * tracking the data type of each program parameter and checking it |
| against the manner in which the parameters are used in programs, |
| |
| * matching color buffers against fragment program outputs. |
| |
| Such error checking is certainly valuable, but the additional CPU |
| overhead cost is substantial. Given that current CPUs often have a hard |
| time keeping up with high-end GPUs, adding more overhead is a step in |
| the wrong direction. We expect developer tools, such as instrumented |
| drivers, to be able to provide type checking on most interfaces. |
| |
| The diagram below depicts assembly programmability. Using vertex, |
| geometry, and fragment shaders provided by the OpenGL Shading Language |
| (GLSL) isn't substantially different from the assembly interface, except |
| that the interfaces between programmable pipeline stages are more |
| tightly coupled in GLSL (vertex, geometry, and fragment shaders are |
| linked together into a single program object), and that shader variables |
| are more strongly typed in GLSL than in the assembly interface. |
| |
| In the figure below, the first programmable stage is vertex program |
| execution. All inputs read by the vertex program must be specified in |
| the GL vertex APIs (immediate mode or vertex arrays) using a data type |
| matching the data type read by the shader. Additionally, |
| vertex programs (and all other program types) can read program |
| parameters, parameter buffers, and textures. In all cases the |
| parameter, buffer, or texture data must be accessed in the shader using |
| the same data type used to specify the data. If vertex programs are |
| disabled, fixed-function vertex processing is used. Fixed-function |
| vertex processing is fully floating-point, and all the conventional |
| vertex attributes and state used by fixed-function are floating-point |
| values. |
| |
| After vertex processing, an optional geometry program can be executed, |
| which reads attributes written by vertex programs (or fixed-function) and |
| writes out new vertex attributes. The vertex attributes it reads must |
| have been written by the vertex program (or fixed-function) using a |
| matching data type. |
| |
| After geometry program execution, vertex attributes can optionally be |
| written out to buffer objects using the NV_transform_feedback extension. |
| The vertex attributes are written by the GL to the buffer objects using |
| the same data type used to write the attribute in the geometry program |
| (or vertex program if geometry programs are disabled). |
| |
| Then, rasterization generates fragments based on transformed vertices. |
| Most attributes written by vertex or geometry programs can be read by |
| fragment programs, after the rasterization hardware "interpolates" them. |
| This extension allows fragment programs to control how each attribute is |
| interpolated. If an attribute is flat-shaded, it will be taken from the |
| output attribute of the provoking vertex of the primitive using the same |
| data type. If an attribute is smooth-shaded, the per-vertex attributes |
| will be interpreted as floating-point values and interpolated to yield |
| a floating-point result. One necessary consequence of this is that any |
| integer per-fragment attributes must be flat-shaded. To prevent some |
| interpolation type errors, assembly and GLSL fragment shaders will not |
| compile if they declare an integer fragment attribute that is not flat |
| shaded. [NOTE: While point primitives generally have constant |
| attributes, any integer attributes must still be flat-shaded; point |
| rasterization may perform (degenerate) floating-point interpolation.] |
| |
| Fragment programs must read attributes using data types matching the |
| outputs of the interpolation or flat-shading operations. They may write |
| one or more color outputs using any data type, but the data type used |
| must match the corresponding framebuffer attachments. Outputs directed |
| at signed or unsigned integer textures (EXT_texture_integer) must be |
| written using the appropriate integer data type; all other outputs must |
| be written as floating-point values. Note that some of the |
| fixed-function per-fragment operations (e.g., blending, alpha test) are |
| specified as floating-point operations and are skipped when directed at |
| signed or unsigned integer color buffers. |
| |
| |
| |
| generic conventional |
| vertex vertex |
| attributes attributes |
| | (typeless) | (float) |
| | | |
| | | |
| | +----------------------+ |
| program | | | |
| parameters ----+ | | | |
| (typeless) | | | (typeless) | (float) |
| | V V V |
| constant +-+----------> vertex fixed-function |
| buffers ----+ |(typeless) program vertex |
| (typeless) | | | | |
| | | | (typeless) | (float) |
| textures ----+ | V | |
| (typeless) | |<----------------------+ |
| | | | |
| | | +---------------+ |
| | | | | |
| | | | (typeless) | |
| | | V | |
| | +---------> geometry | |
| | |(typeless) program | |
| | | | | |
| | | | (typeless) | |
| | | V | |
| | | |<--------------+ |
| | | | |
| | | | |
| | | +-----------------+ |
| | | | |(typeless) |
| | | | v |
| | | | transform |
| | | | feedback |
| | | | buffers |
| | | | |
| | | | |
| | | +-----------------------+ |
| | | | | |
| | | | (float) | (typeless) |
| | | V V |
| | | interpolated flat |
| | | attributes attributes |
| | | | | |
| | | | (float) | (typeless) |
| | | V | |
| | | |<----------------------+ |
| | | | |
| | | +-----------------------+ |
| | | | | |
| | | | (typeless) | (float) |
| | |(typeless) V V |
| | +---------> fragment +------> fixed-function |
| | program |(float) fragment |
| | | | | |
| +--------------------------/|/--------+ | |
| | | |
| | (typeless) | (float) |
| V | |
| |<----------------------+ |
| | |
| +-----------------------+------ .... |
| | | |
| | (typeless) | (typeless) |
| V V |
| color color |
| attachment attachment |
| 0 1 |
| |
| |
| (5) Instructions can operate on signed integer, unsigned integer, and |
| floating-point values. Some operations make sense on all three data |
| types. How is this supported, and what type checking support is provided |
| by the assembler? |
| |
| RESOLVED: One important property of the instruction set is that the |
| data type for all operands and the result is fully specified by the |
| instructions themselves. For instructions (such as ADD) that make sense |
| for both integer and floating-point values, an optional data type |
| modifier is provided to indicate which type of operation should be |
| performed. For example, "ADD.S", "ADD.U", and "ADD.F" add signed |
| integers, unsigned integers, or floating-point values, respectively. If |
| no data type modifier is provided, ".F" is assumed if the instruction |
| can apply to floating-point values and ".S" is assumed otherwise. |
| |
| To help identify errors where the wrong data type is used -- for |
| example, adding integer values in an ADD instruction that omits a data |
| type modifier and thus defaults to "ADD.F" -- variables may be declared |
| with optional data type modifiers. In the following code: |
| |
| INT TEMP a; |
| UINT TEMP b; |
| FLOAT TEMP c; |
| TEMP d; |
| |
| "a", "b", "c", and "d" are declared as temporary variables holding |
| signed integer, unsigned integer, floating-point, and typeless values. |
| Since each instruction fully specifies the data type of each operand and |
| its result, these data types can be checked against the data type |
| assigned to the variables operated on. If the types don't match, and |
| the variable is not typeless, an error is reported. The opcode modifier |
| ".NTC" can be used to ignore such errors on a per-opcode basis, if |
| required. |
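| |
| The default-modifier and type-checking rules described in this issue |
| can be sketched in C as follows. The enum and function names are |
| assumptions made for illustration; the assembler's actual data |
| structures are not described in this specification. |

```c
#include <stdbool.h>

/* Hypothetical type tags: TYPE_NONE models a typeless variable. */
typedef enum { TYPE_NONE, TYPE_FLOAT, TYPE_INT, TYPE_UINT } DataType;

/* Data type an opcode operates on when no modifier is written:
 * ".F" if the opcode can apply to floating-point values, else ".S". */
DataType default_op_type(bool supports_float)
{
    return supports_float ? TYPE_FLOAT : TYPE_INT;
}

/* Returns true if using a variable declared as var_type in an operand
 * slot of type op_type should be reported as an error. */
bool is_type_error(DataType var_type, DataType op_type, bool ntc)
{
    if (ntc)                    /* ".NTC" suppresses the check          */
        return false;
    if (var_type == TYPE_NONE)  /* typeless variables match any operand */
        return false;
    return var_type != op_type;
}
```

| With these rules, an unmodified ADD applied to an "INT TEMP" variable is |
| flagged, because the opcode defaults to ".F" while the variable is |
| declared as a signed integer. |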
| |
| Note that when bindings are used directly in instructions, they are |
| always considered typeless for simplicity. Some fixed-function bindings |
| have an obvious data type, but other bindings (e.g., program parameters) |
| can hold either integer or floating-point values, depending on how they |
| were specified. |
| |
| Variable data types are optional. Typeless variables are provided |
| because some programs may want to reuse the same variable in several |
| places with different data types. |
| |
| (6) Should both signed (INT) and unsigned integer (UINT) data types be |
| provided? |
| |
| RESOLVED: Yes. Signed and unsigned integer operations are supported. |
| Providing both "INT" and "UINT" variable modifiers distinguishes between |
| signed and unsigned values for type checking purposes, to ensure that |
| unsigned values aren't read as signed values and vice versa. |
| |
| This specification says that if a value is read as a signed integer, but |
| was written as an unsigned integer, the value returned is undefined. |
| However, signed and unsigned integers are interchangeable in practice, |
| except for very large unsigned integers (which can't be represented as |
| signed values of the equivalent size) or negative signed integers. |
| |
| If programs know that they won't generate negative or very large values, |
| signed and unsigned integers can be used interchangeably. To avoid type |
| errors in the assembler in this case, typeless variables can be used. |
| Or the ".NTC" modifier can be used when appropriate. |
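| |
| A small C sketch of the "interchangeable in practice" range, assuming |
| 32-bit two's complement integers (the function name is hypothetical): |

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 32-bit pattern denotes the same number whether it is read
 * as a signed or as an unsigned integer, i.e. its sign bit is clear. */
bool same_signed_and_unsigned(uint32_t bits)
{
    return (bits & 0x80000000u) == 0;
}
```

| Patterns from 0 through 0x7FFFFFFF pass; 0x80000000 and above are the |
| "very large unsigned" or "negative signed" values mentioned above. |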
| |
| (7) Integer and floating-point constants are supported in the instruction |
| set. Integer constants might be interpreted to mean either "real integer" |
| values or floating-point values. How are they supported? |
| |
| RESOLVED: When an obvious floating point constant is specified (e.g., |
| "3.0"), the developers' intent is clear. If you try to use a |
| floating-point value in an instruction that wants an integer operand, or |
| in a declaration of an integer parameter variable, the program will fail |
| to load. An integer constant used in an instruction isn't quite as clear. |
| But its meaning can be easily inferred because the operand types of |
| instructions are well-known at compile time. An integer multiply |
| involving the constant "2" will interpret the "2" as an integer. A |
| floating-point multiply involving the same constant "2" will interpret |
| it as a floating-point value. |
| |
| The only real problem is for a parameter declaration that is typeless. |
| For typed variables, the intent is clear: |
| |
| INT PARAM two = 2; # use integer 2 |
| FLOAT PARAM twoPt0 = 2; # use floating-point 2.0 |
| |
| For typeless variables, there's no context to go on: |
| |
| PARAM two = 2; # 2? 2.0? |
| |
| This extension is intended to be largely upward-compatible with |
| ARB_vertex_program, ARB_fragment_program, and the other extensions built |
| on top of them. In all of these, the previous declaration is legal and |
| means "2.0". For compatibility, we choose to interpret integer |
| constants in this case as floating-point values. The assembler in the |
| NVIDIA implementation will issue a warning if this case ever occurs. |
| |
| This extension does not provide decoration of integer constant values -- |
| we considered adding suffixed integers such as "2U" to mean "2, and |
| don't even think about converting me to a float!". We expect that it |
| will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate |
| effectively. |
| |
| (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported? |
| |
| RESOLVED: Yes. |
| |
| (9) Should we provide data type modifiers with explicit component sizes? |
| For example, "INT8", "FLOAT16", or "INT32". If so, should we provide a |
| mechanism to query the size (in bits) of a variable, or of different |
| variable types/qualifiers? |
| |
| RESOLVED: No. |
| |
| (10) Should this extension provide better support for array variables? |
| |
| RESOLVED: Yes; array variables of all types are allowed. |
| |
| In ARB_vertex_program, program parameter (constant) variables could be |
| addressed as arrays. Temporary variables, vertex attributes, and vertex |
| results could not be declared as arrays. |
| |
| In NV_vertex_program3 and NV_fragment_program2, relative addressing was |
| supported in program bindings: |
| |
| MOV R0, vertex.attrib[A0.x]; # vertex |
| MOV result.texcoord[A0.x], R0; # vertex |
| MOV R0, fragment.texcoord[A0.x]; # fragment -- inside LOOP |
| |
| Explicitly declared attribute or result arrays were not supported, and |
| temporaries could also not be arrays. |
| |
| This extension allows users to declare attribute, result, and temporary |
| arrays such as: |
| |
| ATTRIB attribs[] = { vertex.attrib[7..11] }; |
| TEMP scratch[10]; |
| RESULT texcoords[] = { result.texcoord[0..3] }; |
| |
| Additionally, the relative addressing mechanisms provided by |
| NV_vertex_program3 and NV_fragment_program2 are NOT supported in this |
| extension -- instead, declared array variables are the only way to get |
| relative addressing. Using declared arrays allows the assembler to |
| identify which attributes will actually be used. An expression like |
| "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are |
| referenced, and the assembler must be conservative in this case and |
| assume that they all are. |
| |
| (11) Is relative addressing of temporaries allowed? |
| |
| RESOLVED: Yes. However, arrays of temporaries may end up being stored |
| in off-chip memory, and may be slower to access than non-array |
| temporaries. |
| |
| (12) Should this extension add bindings to pass generic attributes between |
| vertex, geometry, and fragment programs, or are texture coordinates |
| sufficient? |
| |
| RESOLVED: While texture coordinates have been used in the past, generic |
| attributes should be provided. |
| |
| The assembler provides a large set of bindings and automatically |
| eliminates generic attributes or components that are unused. At each |
| interface between programs, there is an implementation-dependent limit |
| on the number of attribute components that can be passed. |
| |
| There are several reasons that this approach was chosen. First, if the |
| number of attributes that can be passed between program stages exceeds |
| the number of existing texture coordinate sets supported when specifying |
| vertices, a second implementation-dependent number of texture coordinates |
| would need to be exposed to cover the number supported between stages. |
| Second, the mechanisms described above reduce or eliminate the need to |
| pack attributes into four-component vectors. Third, "texture |
| coordinates" that have been historically used for texture lookups don't |
| need to be used to pass values that aren't used this way. |
| |
| (13) The structured branching support in NV_fragment_program2 provides a |
| REP instruction that says to repeat a block of code <N> times, as well as |
| a LOOP instruction that does the same, but also provides a special loop |
| counter variable. What sort of looping mechanism should we provide here? |
| |
| RESOLVED: Provide only the REP instruction. The functionality provided |
| by the LOOP instruction can be easily achieved by using an integer |
| temporary as the loop index. This avoids two annoyances of the old LOOP |
| model: (a) the loop index (A0.x) is a special variable name, while all |
| other variables are declared normally and (b) instructions can only |
| access the loop index of the innermost loop -- loop indices at higher |
| nesting levels are not accessible. |
| |
| One other option was considered -- a "LOOPV" instruction (LOOP with a |
| variable), where the program specifies a variable name and component to |
| hold the loop index, instead of using the implicit variable name "A0.x". |
| In the end, it was decided that using an integer temporary as a loop |
| counter was sufficient. |
| |
| (14) The structured branching support in NV_fragment_program2 provides a |
| REP instruction that requires a loop count. Some looping constructs may |
| not have a definite loop count, such as a "while" statement in C. Should |
| this construct be supported, and if so, how? |
| |
| RESOLVED: The REP instruction is extended to make the loop count |
| optional. If no loop count is provided, the REP instruction specifies a |
| loop that can only be exited using the BRK (break) or RET instructions. |
| To avoid obvious infinite loops, an error will be reported if a |
| REP/ENDREP block contains no BRK instruction at the current nesting |
| level and no RET instruction at any nesting level. |
| |
| To implement a loop like "while (value < 7.0) ...", code such as the |
| following can be used: |
| |
| TEMP cc; # dummy variable |
| REP; |
| SLT.CC cc.x, value.x, 7.0; # compare value.x to 7.0, set CC0 |
| BRK EQ.x; # break out if not true (SLT result is 0) |
| ... |
| ... # presumably update value! |
| ... |
| ENDREP; |
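| |
| For comparison, the same control flow can be written in C. This is only |
| a sketch: the function name and the "step" update are placeholders for |
| whatever the loop body does to "value", and the loop ends only if the |
| body eventually makes the comparison fail (here, a positive step). |

```c
/* C rendering of a count-less REP loop that exits through BRK once
 * "value < 7.0" no longer holds. Returns the number of iterations. */
int count_iterations(float value, float step)
{
    int iterations = 0;
    for (;;) {                                    /* REP; (no count)   */
        float cc = (value < 7.0f) ? 1.0f : 0.0f;  /* SLT-style result  */
        if (cc == 0.0f)                           /* comparison failed */
            break;                                /* BRK               */
        value += step;                            /* loop body         */
        iterations++;
    }                                             /* ENDREP;           */
    return iterations;
}
```

| Starting from 0.0 with a step of 1.0, the body runs for the values 0 |
| through 6 and the loop exits when the value reaches 7.0. |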
| |
| (15) The structured branching support in NV_fragment_program2 provides a |
| BRK instruction that operates like C's "break" statement. Should we |
| provide something similar to C's "continue" statement, which skips to the |
| next iteration of the loop? |
| |
| RESOLVED: Yes, a new CONT opcode is provided for this purpose. |
| |
| (16) Can the BRK or CONT instructions break out of multiple levels of |
| nested loops at once? |
| |
| RESOLVED: No. BRK and CONT only exit the current nesting level. To |
| break out of multiple levels of nested loops, multiple BRK/CONT |
| instructions are required. |
| |
| (17) For REP instructions, is the loop counter reloaded on each iteration |
| of the loop? |
| |
| RESOLVED: No. The loop counter is loaded once at the top of the loop, |
| compared to zero at the top of the loop, and decremented when each loop |
| iteration completes. A program may overwrite the variable used to |
| specify the initial value of the loop counter inside the loop without |
| affecting the number of times the loop body is executed. |
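| |
| The following C sketch models that behavior. It assumes the body runs |
| while the private counter is greater than zero; the function name and |
| the value written by the body are illustrative only. |

```c
/* Models a REP loop whose body overwrites the variable that supplied
 * the loop count. Returns how many times the body executed. */
int rep_body_executions(int *count_var)
{
    int counter = *count_var;   /* loaded once, before the first test */
    int executed = 0;
    while (counter > 0) {       /* compared to zero at the top        */
        *count_var = 1000;      /* body clobbers the source variable  */
        executed++;
        counter--;              /* decremented after each iteration   */
    }
    return executed;
}
```

| A count of 3 still yields exactly three executions, even though the |
| source variable holds 1000 after the first pass. |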
| |
| (18) How are floating-point values represented in this extension? What |
| about floating-point arithmetic operations? |
| |
| RESOLVED: In the initial hardware implementation of this extension, |
| floating-point values are represented using the standard 32-bit IEEE |
| single-precision encoding, consisting of a sign bit, 8 exponent bits, |
| and 23 mantissa bits. Special encodings for NaN (not a number), +/-INF |
| (infinity), and positive and negative zero are supported. Denorms |
| (nonzero magnitudes below 2^-126, with an exponent encoding of "0" and no |
| implied leading one) are supported, but may be flushed to zero, |
| preserving the sign bit of the original value. Arithmetic operations |
| are carried out at single-precision using normal IEEE floating-point |
| rules, including special rules for generating infinities, NaNs, and |
| zeros of each sign. |
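| |
| As an illustration of the encoding, this C sketch splits a float into |
| its sign, exponent, and mantissa fields and models a sign-preserving |
| flush of denorms to zero. The helper names are hypothetical, and whether |
| a given implementation actually flushes is left open by the text above. |

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw 32-bit encoding out of a float. */
uint32_t f32_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Build a float from a raw 32-bit encoding. */
float f32_from_bits(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

uint32_t f32_sign(float f)     { return f32_bits(f) >> 31; }           /* 1 bit   */
uint32_t f32_exponent(float f) { return (f32_bits(f) >> 23) & 0xFFu; } /* 8 bits  */
uint32_t f32_mantissa(float f) { return f32_bits(f) & 0x7FFFFFu; }     /* 23 bits */

/* Replace a denorm (exponent field 0, nonzero mantissa) with a zero of
 * the same sign; every other value passes through unchanged. */
float flush_denorm(float f)
{
    uint32_t b = f32_bits(f);
    int is_denorm = ((b >> 23) & 0xFFu) == 0 && (b & 0x7FFFFFu) != 0;
    return is_denorm ? f32_from_bits(b & 0x80000000u) : f;
}
```

| For example, the smallest negative denorm (encoding 0x80000001) flushes |
| to negative zero (encoding 0x80000000). |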
| |
| Floating-point temporaries declared as "SHORT" may be, but are not |
| necessarily, stored as 16-bit "fp16" values (sign bit, five exponent |
| bits, ten mantissa bits), as specified in the NV_float_buffer and |
| ARB_half_float_pixel extensions. |
| |
| (19) Should we provide a method to declare how fragment attributes are |
| interpolated? It is possible to have flat-shaded attributes, |
| perspective-corrected attributes, and centroid-sampled attributes. |
| |
| RESOLVED: Yes. Fragment program attribute variable declarations may |
| specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers. |
| |
| These modifiers are documented in detail in the NV_fragment_program4 |
| specification. |
| |
| (20) Should vertex and primitive identifiers be supported? If so, how? |
| |
| RESOLVED: A vertex identifier is available as "vertex.id" in a vertex |
| program. The vertex ID is equal to the value effectively passed to |
| ArrayElement when the vertex is specified, and is defined only if vertex |
| arrays are used with buffer objects (VBOs). |
| |
| A primitive identifier is available as "primitive.id" in a geometry or |
| fragment program. The primitive ID is equal to the number of primitives |
| processed since the last implicit or explicit call to glBegin(). |
| |
| See the NV_vertex_program4 spec for more information on vertex IDs, and |
| the NV_geometry_program4 or NV_fragment_program4 specs for more |
| information on primitive IDs. |
| |
| (21) For integer opcodes, should a bitwise inversion operator "~" be |
| provided, analogous to the existing negation operator? |
| |
| RESOLVED: No. If this operator were provided, it might allow a program |
| to evaluate the expression "a&(~b)" using a single instruction: |
| |
| AND.U a, a, ~b; |
| |
| Instead, it is necessary to do something like: |
| |
| UINT TEMP t; |
| NOT.U t, b; |
| AND.U a, a, t; |
| |
| If necessary, this functionality could be added in a subsequent |
| extension. |
| |
| (22) What happens if you negate or take the absolute value of the |
| biggest-magnitude negative integer? |
| |
| RESOLVED: Signed integers are represented using two's complement |
| representation. For 32-bit integers, the largest possible value is |
| 2^31-1; the smallest possible value is -2^31. There is no way to |
| represent 2^31, which is what these operators "should" return. The |
| value returned in this case is the original value of -2^31. |
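| |
| A C sketch of this wraparound, assuming 32-bit two's complement |
| integers. The function names are hypothetical. The negation is done in |
| unsigned arithmetic because signed overflow is undefined in C, and the |
| conversion back to int32_t relies on the usual wrapping behavior of |
| common compilers. |

```c
#include <stdint.h>

/* Two's complement negation that wraps instead of overflowing. */
int32_t neg_s32(int32_t x)
{
    return (int32_t)(0u - (uint32_t)x);
}

/* Absolute value built on the wrapping negation. */
int32_t abs_s32(int32_t x)
{
    return x < 0 ? neg_s32(x) : x;
}
```

| Both functions return -2^31 when given -2^31, matching the resolution |
| above; all other inputs behave as expected. |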
| |
| (23) How do condition codes work? How are they different from those |
| provided in previous NVIDIA extensions? |
| |
| RESOLVED: There are two condition codes -- CC0 and CC1 -- each of which |
| is a four-component vector. The condition codes are set based on the |
| result of an instruction that specifies a condition code update |
| modifier. Examples include: |
| |
| ADD.S.CC R0, R1, R2; # add signed integers R1 and R2, update |
| # CC0 based on the result, write the |
| # final value to R0 |
| ADD.F.CC1 R3, R4, R5; # add floats R4 and R5, update CC1 based |
| # on the result, write the final value |
| # to R3 |
| ADD.U.CC0 R6.xy, R7, R8; # add unsigned integers R7 and R8, update |
| # CC0 (x and y components) based on the |
| # result, write the final value to R6 |
| # (x and y components) |
| |
| Condition codes can be used for conditional writes, conditional |
| branches, or other operations. The condition codes aren't used |
| directly, but are instead used with a condition code test such as "LT" |
| (less than) or "EQ" (equal to). Examples include: |
| |
| MOV R0 (GT.x), R1; # move R1 to R0 only if the x component of |
| # CC0 indicates a result of ">0" |
| MOV R2 (NE1), R3; # component-wise move of R3 to R2 if the |
| # corresponding component of CC1 |
| # indicates a result of "!=0" |
| IF LE0.xyxy; # execute the block of code if the x or |
| ... # y components of CC0 indicate a result |
| ENDIF; # of "<=0" |
| REP; |
| ... |
| BRK EQ1.xyzx; # break out of loop if the x, y, or z |
| ENDREP; # components of CC1 indicate a result of |
| # "==0". |
| |
| Previous NVIDIA extensions provide eight tests, which are still |
| supported here. The tests "EQ" (equal), "GE" (greater/equal), "GT" |
| (greater than), "LE" (less/equal), "LT" (less than), and "NE" (not |
| equal) can be used to determine the relation to zero of the result used |
| to set the condition code. The tests "TR" (true) and "FL" (false) are |
| special tests that always evaluate to true or false, respectively. |
| |
| For floating-point results, a NaN (not a number) encoding causes the |
| "NE" condition to evaluate to TRUE and all other conditions to evaluate |
| to FALSE. IEEE encodings for "negative" and "positive" zero are both |
| treated as equal to zero. |
| |
| Condition codes are implemented as a set of flags, which are set |
| depending on the type of operation, as described in the spec. |
| |
| For instructions that return floating-point or signed integer values, |
| the normal condition code tests reliably indicate the relationship of |
| the result to zero. For instructions that return unsigned values, the |
| condition codes are a bit more complicated. For example, the sign flag |
| is set if the most significant bit of the result written is set. As a |
| result, very large unsigned integer values (e.g., 0x80000000 - |
| 0xFFFFFFFF) are effectively treated as negative values. Condition code |
| tests should be used with care with unsigned results -- to test if an |
| unsigned integer is ">0", use a sequence like: |
| |
| MOV.U.CC R0, R1; # move R1 to R0, set condition code |
| IF NE; # test if the result is "!=0", a very |
| ... # large value might fail "GT"! |
| ENDIF; |
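| |
| The pitfall can be modeled in C. The sketch assumes a simplified flag |
| model in which "GT" requires a nonzero result with the sign flag clear; |
| the function names are hypothetical, and the normative flag definitions |
| are those in the body of the spec. |

```c
#include <stdbool.h>
#include <stdint.h>

/* "NE" test on an integer result: true if any bit is set. */
bool cc_test_ne(uint32_t result)
{
    return result != 0;
}

/* "GT" test on an integer result: nonzero and sign flag clear, where
 * the sign flag mirrors the most significant bit of the result. */
bool cc_test_gt(uint32_t result)
{
    bool sign = (result & 0x80000000u) != 0;
    return result != 0 && !sign;
}
```

| Under this model, 0x80000000 passes "NE" but fails "GT", which is why |
| "NE" is the safer test for an unsigned ">0" check. |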
| |
| This extension provides a number of additional condition code tests |
| useful for different floating-point or integer operations: |
| |
| * NAN (not a number) is true if a floating-point result is a NaN. LEG |
| (less, equal to, or greater) is the opposite of NAN. |
| |
| * CF (carry flag) is true if an unsigned add overflows, or if an |
| unsigned subtract produces a non-negative value. NCF (no carry |
| flag) is the opposite of CF. |
| |
| * OF (overflow flag) is true if a signed add or subtract overflows. |
| NOF (no overflow flag) is the opposite of OF. |
| |
| * SF (sign flag) is true if the sign flag is set. NSF (no sign flag) |
| is the opposite of SF. |
| |
| * AB (above) is true if an unsigned subtract produces a positive |
| result. BLE (below or equal) is the opposite of AB, and is true if |
| an unsigned subtract produces a negative result or zero. Note that |
| CF can be used to test if the result is greater than or equal to |
| zero, and NCF can be used to test if the result is less than zero. |
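| |
| The carry, overflow, and "above" conditions listed here can be sketched |
| in C for 32-bit operands. The function names are hypothetical and the |
| code models only the cases described in this issue. |

```c
#include <stdbool.h>
#include <stdint.h>

/* CF after an unsigned add: the true sum does not fit in 32 bits. */
bool cf_add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

/* CF after an unsigned subtract a - b: the true difference is >= 0. */
bool cf_sub_u32(uint32_t a, uint32_t b)
{
    return a >= b;
}

/* AB after an unsigned subtract a - b: the true difference is > 0. */
bool ab_sub_u32(uint32_t a, uint32_t b)
{
    return a > b;
}

/* OF after a signed add: the true sum is outside the 32-bit range. */
bool of_add_s32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    return sum > INT32_MAX || sum < INT32_MIN;
}
```

| Note that CF and AB differ only when the operands are equal: the |
| difference is then zero, so CF holds while AB does not. |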
| |
| (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work |
| with integer values and/or condition codes? |
| |
| RESOLVED: "Set on" instructions comparing signed and unsigned values |
| return zero if the condition is false, and an integer with all bits set |
| if the condition is true. If the result is signed, it is interpreted as |
| -1. If the result is unsigned, it is interpreted as the largest unsigned |
| value (0xFFFFFFFF for 32-bit integers). This is different from the |
| floating-point "set on", which is defined to return 1.0. |
| |
| This specific result encoding was chosen so that bitwise operators (NOT, |
| AND, OR, XOR) can be used to evaluate boolean expressions. |
| |
| When performing condition code tests on the results of an integer "set |
| on" instruction, keep in mind that a TRUE result has the most |
| significant bit set and will be interpreted as a negative value. To |
| test if a condition is true, use "NE" (!=0). A condition code test of |
| "GT" will always fail if the condition code was written by an integer |
| "set on" instruction. |
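| |
| A C sketch of the integer result encoding and of combining results with |
| bitwise operators. The function names are hypothetical; "in_range" is |
| simply an example of a boolean expression built from two comparisons. |

```c
#include <stdint.h>

/* Integer "set on less than": all bits set if true, zero if false. */
uint32_t slt_s(int32_t a, int32_t b)
{
    return a < b ? 0xFFFFFFFFu : 0u;
}

/* Integer "set on greater than or equal". */
uint32_t sge_s(int32_t a, int32_t b)
{
    return a >= b ? 0xFFFFFFFFu : 0u;
}

/* (x >= lo) && (x < hi), evaluated with a bitwise AND of two masks. */
uint32_t in_range(int32_t x, int32_t lo, int32_t hi)
{
    return sge_s(x, lo) & slt_s(x, hi);
}

/* Floating-point "set on less than" for contrast: 1.0 if true. */
float slt_f(float a, float b)
{
    return a < b ? 1.0f : 0.0f;
}
```

| Because TRUE is all ones, a bitwise NOT of a result is again a valid |
| TRUE or FALSE value, which would not hold for an encoding of 1. |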
| |
| (25) What new texture functionality is provided? |
| |
| RESOLVED: Several new features are provided. |
| |
| First, the TXF (texel fetch) instruction allows programs to access a |
| texture map like a normal array. Integer coordinates identifying an |
| individual texel and LOD are provided, and the corresponding texture |
| data is returned without filtering of any type. |
| |
| Second, the TXQ (texture size query) instruction allows programs to |
| query the size of a specified level of detail of a texture. This |
| feature allows programs to perform computations dependent on the size of |
| the texture without having to pass the size as a program parameter or |
| via some other mechanism. |
| |
| Third, applications may specify a constant texel offset in a texture |
| instruction that moves the texture sample point by the specified number |
| of texels. This offset can be used to perform custom texture filtering, |
| and is also independent of the size of the texture LOD -- the same |
| offsets are applied, regardless of the mipmap level. |
| |
| Fourth, shadow mapping is supported for cube map textures. The first |
| three coordinates are the normal (s,t,r) coordinates for a cube map |
| texture lookup, and the fourth component is a depth reference value that |
| can be compared to the depth value stored in the texture. |
| |
| (26) What "consistency" requirements are in effect for textures accessed |
| via the TXF (texel fetch) instruction? |
| |
| UNRESOLVED: The texture must be usable for regular texture mapping |
| operations -- if texture sizes or formats are inconsistent and a |
| mipmapped min filter is used, the results are undefined. |
| |
| (27) How does the TXF instruction work with bordered textures? |
| |
| RESOLVED: The entire image can be accessed, including the border |
| texels. For a 64x64 2D texture plus border (66x66 overall), the lower |
| left border texel is accessed using the coordinates (-1,-1); the upper |
| right border texel is accessed using the coordinates (64,64). |
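| |
| The coordinate convention above can be sketched in C as follows. The |
| row-major storage layout and the function name are assumptions made for |
| this example only; the extension does not define how texels are stored. |

```c
#include <stddef.h>

/* Hypothetical row-major layout in which border texels are stored
 * together with the interior image.  (x, y) are TXF-style coordinates,
 * so x == -border addresses the leftmost border column. */
static size_t bordered_texel_index(int x, int y, int width, int border)
{
    size_t row_length = (size_t)(width + 2 * border);
    return (size_t)(y + border) * row_length + (size_t)(x + border);
}
```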
| |
| (28) What should TXQ (texture size query) return for "irrelevant" texture |
| sizes (e.g., height of a 1D texture)? Should it return any other |
| information at the same time? |
| |
| RESOLVED: This specification leaves all "extra" components undefined. |
| |
| (29) How do texture offsets interact with cubemap textures? |
| |
| RESOLVED: They are not supported in this extension. |
| |
| (30) How do texture offsets interact with mipmapped textures? |
| |
| RESOLVED: The texture offsets are added after the (s,t,r) coordinates |
| have been divided by q (if applicable) and converted to (u,v,w) |
| coordinates by multiplying by the size of the selected texture level. |
| The offsets are added to the (u,v,w) coordinates, and always move the |
| sample point by an integral number of texel coordinates. If multiple |
| mipmaps are accessed, the sample point in each mipmap level is moved by |
| an identical offset. The applied offsets are independent of the |
| selected mipmap level. |
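| |
| A minimal C sketch of the order of operations described above, for a |
| single coordinate. The function and parameter names are hypothetical, |
| and wrap modes and filtering are deliberately left out. |

```c
/* Unnormalized coordinate u for one mipmap level with a texel offset
 * applied.  s and q are the texture coordinate and its projective
 * divisor; level_size is the width of the selected level in texels. */
static float offset_texcoord(float s, float q, int level_size, int offset)
{
    float u = (s / q) * (float)level_size;  /* divide by q, then scale   */
    return u + (float)offset;               /* same offset at each level */
}
```

| With s = 0.5 and an offset of 1, the sample point moves from texel 32 |
| to 33 in a 64-texel level and from 16 to 17 in a 32-texel level. |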
| |
| (31) How do shadow cube maps work? |
| |
| UNRESOLVED: An application can define a cube map texture with a |
| DEPTH_COMPONENT internal format, and then render a scene using the cube |
| map faces as the depth buffer(s). When rendering, the projection should |
| be set up using the "center" of the cubemap as the eye, and using a |
| normal projection matrix. When applying the shadow map, the fragment |
| program should read the (x,y,z) eye coordinates, compute the length of |
| the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate to |
| [0,1] space using the same parameters used to derive Z in the projection |
| matrix. A 4-component vector consisting of x, y, z, and this computed |
| depth value should be passed to the texture lookup, and normal shadow |
| mapping operations will be performed. |
| |
| This issue should include the math needed to do this computation and |
| sample code. |
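| |
| One possible form of this computation is sketched below in C. It is |
| not taken from this specification: it assumes each cube face was |
| rendered with a standard OpenGL perspective projection (as built by |
| glFrustum) with near and far plane distances n and f, and the default |
| [0,1] depth range. All names are hypothetical. |

```c
/* Local helpers so that no math library is required. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* Depth reference value for a point at (x, y, z) relative to the cube
 * map center, ASSUMING a glFrustum-style projection with near plane n
 * and far plane f and a depth range of [0,1]. */
static double cube_shadow_depth(double x, double y, double z,
                                double n, double f)
{
    /* Length of the major axis: distance along the selected face. */
    double ma = max_d(abs_d(x), max_d(abs_d(y), abs_d(z)));

    /* Normalized device Z in [-1,1] for an eye-space depth of ma. */
    double z_ndc = (f + n) / (f - n) - (2.0 * f * n) / ((f - n) * ma);

    /* Map [-1,1] to the [0,1] window depth range. */
    return 0.5 * z_ndc + 0.5;
}
```

| Under these assumptions, a point on the near plane maps to 0 and a |
| point on the far plane maps to 1. |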
| |
| (32) Integer multiplies can overflow by a lot. Should there be some way |
| to return the high part of both unsigned and signed integer multiplies? |
| |
| RESOLVED: Yes. The ".HI" modifier is provided to return the 32 |
| MSBs of a 32x32 integer multiply. The instruction sequence: |
| |
| INT TEMP R0, R1, R2, R3; |
| MUL.S R0, R2, R3; |
| MUL.S.HI R1, R2, R3; |
| |
| will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs of |
| the 64-bit result in R0 and the 32 MSBs in R1. |
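| |
| For reference, the same split can be modeled in C with a 64-bit |
| intermediate. This is only a sketch of the arithmetic; the function |
| names are hypothetical. |

```c
#include <stdint.h>

/* 32 LSBs of the 64-bit signed product (models MUL.S). */
static uint32_t mul_lo_s(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * (int64_t)b);
}

/* 32 MSBs of the 64-bit signed product (models MUL.S.HI).  The product
 * is shifted as an unsigned value to avoid relying on how the compiler
 * shifts negative numbers; the final conversion back to int32_t assumes
 * the usual two's complement behavior. */
static int32_t mul_hi_s(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    return (int32_t)(uint32_t)(p >> 32);
}
```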
| |
| (33) Should there be any other special multiplication modifiers? |
| |
| RESOLVED: Yes. The ".S24" and ".U24" modifiers allow for signed and |
| unsigned integer multiplies where both operands are guaranteed to fit in |
| the least significant 24 bits. On some architectures supporting this |
| extension, ".S24" and ".U24" integer multiplies may be faster than |
| general-purpose ".S" and ".U" multiplies. If either value doesn't fit |
| in 24 bits, the results of the operation are undefined -- |
| implementations may, but are not required to, ignore the MSBs of the |
| operands if ".S24" or ".U24" is specified. |
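| |
| A program generator that wants to use these modifiers safely might |
| check operand ranges first. The C sketch below shows one such check; |
| it assumes that "fits in 24 bits" means the usual unsigned and two's |
| complement ranges, and the function names are hypothetical. |

```c
#include <stdint.h>

/* True if v can be represented as an unsigned 24-bit integer. */
static int fits_u24(uint32_t v)
{
    return v <= 0x00FFFFFFu;
}

/* True if v can be represented as a signed (two's complement) 24-bit
 * integer, i.e. lies in [-2^23, 2^23 - 1]. */
static int fits_s24(int32_t v)
{
    return v >= -0x00800000 && v <= 0x007FFFFF;
}
```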
| |
| (34) This extension provides subroutines, but doesn't provide a stack to |
| push and pop parameters. How do we deal with this? NV_vertex_program3 |
| supported PUSHA/POPA instructions to push and pop address registers. |
| |
| RESOLVED: No explicit stack is required. A program can implement a |
| stack by allocating a temporary array plus a single integer temporary to |
| use as the stack "pointer". For example: |
| |
| TEMP stack[256]; # 256 4-component vectors |
| INT TEMP sp; # sp.x == stack pointer |
| INT TEMP cc; # condition code results |
| |
| function: |
| SGE.S.CC cc.x, sp.x, 256; # compute stackPointer >= 256 |
| RET NE.x; # return if TRUE |
| MOV stack[sp.x], R0; # push R0 onto the stack |
| ADD.S sp.x, sp.x, 1; |
| ... |
| SUB.S sp.x, sp.x, 1; # pop R0 off the stack |
| MOV R0, stack[sp.x]; |
| RET; |
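| |
| The same idea, modeled in C purely for illustration. The overflow |
| check mirrors the early RET above; the underflow check in pop() is an |
| addition for this sketch and has no counterpart in the assembly. |

```c
#define STACK_SIZE 256

typedef struct { float v[4]; } vec4;   /* one 4-component register */

static vec4 stack[STACK_SIZE];         /* models "TEMP stack[256]" */
static int  sp = 0;                    /* models "sp.x"            */

/* Push r; returns 0 and leaves the stack untouched if it is full. */
static int push(vec4 r)
{
    if (sp >= STACK_SIZE)
        return 0;
    stack[sp] = r;
    sp += 1;
    return 1;
}

/* Pop into *r; returns 0 if the stack is empty. */
static int pop(vec4 *r)
{
    if (sp <= 0)
        return 0;
    sp -= 1;
    *r = stack[sp];
    return 1;
}
```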
| |
| (35) Should we provide new vector semantics for previously-defined opcodes |
| (e.g., LG2 computes a component-wise logarithm)? |
| |
| RESOLVED: Not in this extension. The instructions we define here are |
| compatible with the vector or scalar nature of previously defined |
| opcodes. This simplifies the implementation of an assembler that needs |
| to support both old and new instruction sets. |
| |
| (36) Should it really be undefined to read from a register storing data of |
| one type with an instruction of the other type (e.g., to read the bits of |
| a floating-point number as an unsigned integer)? |
| |
| RESOLVED: The spec describes undefined results for simplicity. In |
| practice, mixing data types can be done, where signed integers are |
| represented as two's complement integers and floating-point numbers are |
| represented using IEEE single-precision representation. For example: |
| |
| TEMP R0, R1; # typeless |
| MOV.U R0, 0x3F800000; # R0 = 1.0 |
| MOV.U R1, 0xBF800000; # R1 = -1.0 |
| MUL.F R0, R0, R1; # R0 = -1 * 1 = -1 (0xBF800000) |
| XOR.U R0, R0, R1; # R0 = 0xBF800000 ^ 0xBF800000 = 0 |
| NOT.U R0, R0; # R0 = 0xFFFFFFFF |
| I2F.S R0, R0; # R0 = -1.0 (0xFFFFFFFF = -1 signed) |
| SEQ.F R0, R0, R1; # R0 = 1.0 (-1.0 == -1.0) |
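| |
| The sequence above can be traced on a CPU with the C sketch below, |
| which assumes that "float" is an IEEE single-precision value and that |
| integers are two's complement. The helper names are hypothetical. |

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit word as a float, and back. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Follows the instruction sequence step by step; returns the SEQ.F result. */
static float mixed_type_example(void)
{
    uint32_t r0 = 0x3F800000u;                       /* MOV.U: 1.0        */
    uint32_t r1 = 0xBF800000u;                       /* MOV.U: -1.0       */
    r0 = float_to_bits(bits_to_float(r0) * bits_to_float(r1)); /* MUL.F   */
    r0 ^= r1;                                        /* XOR.U: 0          */
    r0 = ~r0;                                        /* NOT.U: 0xFFFFFFFF */
    r0 = float_to_bits((float)(int32_t)r0);          /* I2F.S: -1.0       */
    return (bits_to_float(r0) == bits_to_float(r1)) ? 1.0f : 0.0f; /* SEQ.F */
}
```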
| |
| (37) Buffer objects can be sourced as program parameters using the |
| NV_parameter_buffer_object extension. How are they accessed in a program? |
| |
| RESOLVED: The instruction set and existing program environment and |
| local parameter bindings operate largely on four-component vectors. |
| However, NV_parameter_buffer_object exposes the ability to reach into |
| buffers consisting of user-generated data or data written to the buffer |
| object by the GPU. Such data sets may not consist entirely of |
| four-component floating-point vectors, so a four-component vector API |
| may be unnatural. An application might need to reformat its data set to |
| deal with this issue. Or it might generate odd code to compensate for |
| mis-alignment -- for example, reading an array of 3-component vectors by |
| doing two four-component vector accesses and then rotating based on |
| alignment. Neither approach is particularly satisfying. |
| |
| Instead, this extension takes the approach of treating parameter buffers |
| as arrays of scalar words. When an individual buffer element is read, |
| the single word is replicated to produce a four-component vector. To |
| access an array of 3-component vectors, code like the following can be |
| used: |
| |
| PARAM buffer[] = { program.buffer[0] }; |
| INT TEMP index; |
| TEMP R0; |
| ... |
| MUL.S index.x, index.x, 3; # to read "vec3" #X, compute 3*X |
| MOV R0.x, buffer[index.x+0]; |
| MOV R0.y, buffer[index.x+1]; |
| MOV R0.z, buffer[index.x+2]; |
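| |
| Viewed from the application side, the addressing above corresponds to |
| the following C sketch, where the buffer is a tightly packed array of |
| scalar words. The type and function names are hypothetical. |

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Read "vec3" number n from a buffer of tightly packed scalar words. */
static vec3 read_vec3(const float *words, size_t n)
{
    size_t base = 3 * n;                 /* first word of element n */
    vec3 v = { words[base + 0], words[base + 1], words[base + 2] };
    return v;
}
```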
| |
| (38) Should recursion be allowed? If so, how is the total amount of |
| recursion limited? |
| |
| RESOLVED: Recursion is allowed, and a call stack is provided by the |
| implementation. The size of the call stack is limited to the |
| implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the |
| call stack is full, the results of further CAL instructions are |
| undefined. In the initial implementation of this extension, such |
| instructions will have no effect. |
| |
| Note that no stack is provided to hold local registers; a program may |
| implement its own via a temporary array and integer stack "pointer". |
| |
| (39) Variables are all four-component vectors in previous extensions. |
| Should scalar or small-vector variables be provided? |
| |
| RESOLVED: It would be a useful feature, but it was left out for |
| simplicity. In practice, a variable where only the X component is used |
| will be equivalent to a scalar. |
| |
| (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple |
| components of data into a single component. The bit packing is |
| well-defined. Should we require specific data types (e.g., unsigned |
| integer) to hold packed values? |
| |
| RESOLVED: No. Previous instruction sets only allowed programs to write |
| packed values to a floating-point variable (the only data type |
| provided). We will allow packed results to be written to a variable of |
| any data type. Integer instructions can be used to manipulate bits of |
| packed data in place. |
| |
| (41) What happens when converting integers to floats or vice versa if |
| there is insufficient precision or range to represent the result? |
| |
| RESOLVED: For integer-to-float conversions, the nearest representable |
| floating-point value is used, and the least significant bits of the |
| original integer value are lost. For float-to-integer conversions, |
| out-of-range values are clamped to the nearest representable integer. |
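| |
| Both behaviors can be illustrated with the C sketch below. It assumes |
| IEEE single precision with the default round-to-nearest mode. The |
| in-range rounding (truncation) and the NaN handling are choices made |
| for this example only and are not taken from this issue. |

```c
#include <stdint.h>

/* Integer to float: the nearest representable value is produced, so the
 * low-order bits of large integers are lost. */
static float i2f(int32_t i)
{
    return (float)i;
}

/* Float to integer with out-of-range values clamped. */
static int32_t f2i_clamp(float f)
{
    if (f != f)                  /* NaN: returns 0 in this sketch only */
        return 0;
    if (f >= 2147483648.0f)      /* 2^31 and above */
        return INT32_MAX;
    if (f <= -2147483648.0f)     /* -2^31 and below */
        return INT32_MIN;
    return (int32_t)f;           /* in range: truncate toward zero */
}
```

| For example, 16777217 (2^24 + 1) has no exact single-precision |
| representation and converts to 16777216.0. |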
| |
| (42) Why are some of the grammar rules so bizarre (e.g., attribUseD, |
| attribUseV, attribUseS, attribUseVNS)? |
| |
| RESOLVED: This grammar is based upon the original ARB_vertex_program |
| grammar, which has a number of "interesting" characteristics. In |
| particular, some of the bindings provided by ARB_vertex_program naturally |
| require some amount of lookahead. For example, a vertex program can |
| write an output color using any of the following: |
| |
| MOV result.color, 0; # primary color |
| MOV result.color.primary, 0; # primary color again |
| MOV result.color.secondary, 0; # secondary color this time |
| |
| The pieces of the color binding are separated by "." tokens. However, |
| writemasks are also supported, which also use "." before the write |
| mask. So, we could also have something like: |
| |
| MOV result.color.xyz, 0; # primary color with W masked off |
| |
| In this form, a parser needs to look at both the "." and the "xyz" to |
| determine that the binding being used is "result.color" (and not |
| "result.color.secondary"). |
| |
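| The lookahead decision for the forms listed above can be sketched in C |
| as follows. This is a simplified, hypothetical helper covering only the |
| "primary" and "secondary" suffixes shown here, not the full grammar. |

```c
#include <string.h>

/* Given the token that follows "result.color.", return 1 if it extends
 * the binding name and 0 if it must be treated as a write mask. */
static int color_suffix_is_binding(const char *token)
{
    return strcmp(token, "primary") == 0 ||
           strcmp(token, "secondary") == 0;
}
```

| |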
| Additionally, some checks that should probably be semantic errors (e.g., |
| allowing different swizzle or scalar operand selectors per instruction, |
| or disallowing both in the case of SWZ) were specified in the original |
| grammar. |
| |
| ARB_fragment_program and subsequent NVIDIA instruction sets built upon |
| this, and the grammar for this extension was rewritten in the current |
| form so it could be validated more easily. |
| |
| (43) This is an NV extension (NV_gpu_program4). Why does the |
| MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix? |
| |
| RESOLVED: This token is shared between this extension and the |
| comparable high-level GLSL programmability extension (EXT_gpu_shader4). |
| Rather than provide a duplicate set of token names, we simply use the |
| EXT version here. |
| |
| (44) For the purposes of determining the number of attribute and result |
| components, how are "scalar" attributes counted? For example, only |
| the x component of the "pointsize" per-vertex output is actually |
| relevant. |
| |
| RESOLVED: Implementations are allowed to count all inputs and outputs |
| as full four-component vectors. To avoid this, apply appropriate write |
| masks or swizzles. |
| |
| For example, writing to "result.pointsize" may count as four components. |
| Consistently writing to "result.pointsize.x" may only count as one. |
| Similarly, reading a fragment's fog coordinate as "fragment.fogcoord" |
| may count as four components; "fragment.fogcoord.x" will only count as |
| one. |
| |
| Revision History |
| |
| Rev. Date Author Changes |
| ---- -------- -------- -------------------------------------------- |
| 11 09/11/14 pbrown Fix cut-and-paste error in PK2US section. |
| |
| 10 12/14/09 mgodse Added GLX protocol. |
| |
| 9 10/29/09 pbrown Add language for previously undocumented errors |
| when using "SHORT" and "LONG" modifiers on |
| variable declarations. They're allowed only on |
| "TEMP" statements, except that "SHORT" is |
| allowed for "OUTPUT" as well. |
| |
| 8 08/11/08 jbreton Clarified that when a MOD instruction is |
| performed on negative operands the result is |
| undefined. |
| |
| 7 07/29/08 pbrown Discovered additional issues with texture wrap |
| handling, replaced with logic that applies wrap |
| modes per sample. Add a few instruction |
| pseudo-code lines explicitly identifying |
| undefined components. |
| |
| 6 05/02/08 pbrown Fix the prototype for the internal TexelFetch() |
| function used in the spec language; texel |
| coordinates are signed integers. |
| |
| 5 02/22/08 pbrown Clarified that when counting attribute/result |
| components, irrelevant/undefined components |
| can still count against the limits. |
| |
| 4 02/04/08 pbrown Fix errors in texture wrap mode handling. |
| Added a missing clamp to avoid sampling border |
| in REPEAT mode. Fixed incorrectly specified |
| weights for LINEAR filtering. |
| |
| 3 02/09/07 pbrown Updated status section (now released). |
| |
| 2 10/19/06 pbrown Change the token suffix for maximum texel offset |
| values from NV to EXT, since it is shared with |
| EXT_gpu_shader4. Clarify what happens on a |
| negate of an unsigned value. Fix typo in data |
| type modifier description. Add missing |
| description of the "BUFFER4" declaration |
| keyword. |
| |
| 1 pbrown Internal spec development. |