Name
NV_gpu_program4
Name Strings
GL_NV_gpu_program4
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Status
Shipping for GeForce 8 Series (November 2006)
Version
Last Modified Date: 09/11/2014
NVIDIA Revision: 11
Number
322
Dependencies
This extension is written against the OpenGL 2.0 specification.
OpenGL 2.0 is not required, but we expect all implementations of this
extension will also support OpenGL 2.0.
This extension is also written against the ARB_vertex_program
specification, which provides the basic mechanisms for the assembly
programming model used by this extension.
This extension serves as the basis for the NV_fragment_program4,
NV_geometry_program4, and NV_vertex_program4 extensions, which all build on
this extension to support fragment, geometry, and vertex programs,
respectively. If "GL_NV_gpu_program4" is found in the extension string,
all of these extensions are supported.
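As an illustration of the extension string check, the following is a minimal
sketch in C. The helper name has_extension is hypothetical and not part of
any GL API. It assumes the extension string is a space-separated list of
names, as returned by glGetString(GL_EXTENSIONS) in OpenGL 2.0, and matches
only complete names so that a longer name sharing the same prefix is not
mistaken for this extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a complete,
 * space-delimited token in `extensions`, and 0 otherwise.  A plain
 * strstr() is not sufficient, because one extension name may be a
 * prefix of another. */
static int has_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    assert(extensions != NULL && name != NULL);
    len = strlen(name);
    if (len == 0)
        return 0;

    for (p = extensions; (p = strstr(p, name)) != NULL; p += len) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

In an application, the first argument would typically be the string obtained
from the GL, and the second would be "GL_NV_gpu_program4".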
NV_parameter_buffer_object affects the definition of this extension.
ARB_texture_rectangle trivially affects the definition of this extension.
EXT_gpu_program_parameters trivially affects the definition of this
extension.
EXT_texture_integer trivially affects the definition of this extension.
EXT_texture_array trivially affects the definition of this extension.
EXT_texture_buffer_object trivially affects the definition of this
extension.
NV_primitive_restart trivially affects the definition of this extension.
Overview
This specification documents the common instruction set and basic
functionality provided by NVIDIA's 4th generation of assembly instruction
sets supporting programmable graphics pipeline stages.
The instruction set builds upon the basic framework provided by the
ARB_vertex_program and ARB_fragment_program extensions to expose
considerably more capable hardware. In addition to new capabilities for
vertex and fragment programs, this extension provides a new program type
(geometry programs) further described in the NV_geometry_program4
specification.
NV_gpu_program4 provides a unified instruction set -- all instruction set
features are available for all program types, except for a small number of
features that make sense only for a specific program type. It provides
fully capable signed and unsigned integer data types, along with a set of
arithmetic, logical, and data type conversion instructions capable of
operating on integers. It also provides a uniform set of structured
branching constructs (if tests, loops, and subroutines) that fully support
run-time condition testing.
This extension provides several new texture mapping capabilities. Shadow
cube maps are supported, where cube map faces can encode depth values.
Texture lookup instructions can include an immediate texel offset, which
can assist in advanced filtering. New instructions are provided to fetch
a single texel by address in a texture map (TXF) and query the size of a
specified texture level (TXQ).
By and large, vertex and fragment programs written to ARB_vertex_program
and ARB_fragment_program can be ported directly by simply changing the
program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
"!!NVfp4.0", and then modifying the code to take advantage of the expanded
feature set. There are a small number of areas where this extension is
not a functional superset of previous vertex program extensions, which are
documented in this specification.
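The header change described above can be sketched as a small string
operation. The helper name port_program_header is hypothetical. The sketch
rewrites only the header and copies the rest of the program text unchanged;
it does not check that the program avoids the areas where this extension is
not a functional superset, so the result is not guaranteed to load.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `src` to `dst`, replacing a leading
 * "!!ARBvp1.0" or "!!ARBfp1.0" header with "!!NVvp4.0" or "!!NVfp4.0".
 * Returns 1 on success, or 0 if no known header is found or `dst` is
 * too small to hold the result. */
static int port_program_header(const char *src, char *dst, size_t dst_size)
{
    static const char *const old_hdr[] = { "!!ARBvp1.0", "!!ARBfp1.0" };
    static const char *const new_hdr[] = { "!!NVvp4.0",  "!!NVfp4.0"  };
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < 2; i++) {
        size_t old_len = strlen(old_hdr[i]);
        if (strncmp(src, old_hdr[i], old_len) == 0) {
            /* New header followed by the untouched program body. */
            int n = snprintf(dst, dst_size, "%s%s", new_hdr[i], src + old_len);
            return n >= 0 && (size_t)n < dst_size;
        }
    }
    return 0;
}
```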
New Procedures and Functions
void ProgramLocalParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramLocalParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramLocalParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramLocalParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramLocalParameterI4uivNV(enum target, uint index,
const uint *params);
void ProgramLocalParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
void ProgramEnvParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramEnvParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramEnvParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramEnvParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramEnvParameterI4uivNV(enum target, uint index,
const uint *params);
void ProgramEnvParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
void GetProgramLocalParameterIivNV(enum target, uint index,
int *params);
void GetProgramLocalParameterIuivNV(enum target, uint index,
uint *params);
void GetProgramEnvParameterIivNV(enum target, uint index,
int *params);
void GetProgramEnvParameterIuivNV(enum target, uint index,
uint *params);
New Tokens
Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
GetFloatv, and GetDoublev:
MIN_PROGRAM_TEXEL_OFFSET_EXT 0x8904
MAX_PROGRAM_TEXEL_OFFSET_EXT 0x8905
(note: these tokens are shared with the EXT_gpu_shader4 extension.)
Accepted by the <pname> parameter of GetProgramivARB:
PROGRAM_ATTRIB_COMPONENTS_NV 0x8906
PROGRAM_RESULT_COMPONENTS_NV 0x8907
MAX_PROGRAM_ATTRIB_COMPONENTS_NV 0x8908
MAX_PROGRAM_RESULT_COMPONENTS_NV 0x8909
MAX_PROGRAM_GENERIC_ATTRIBS_NV 0x8DA5
MAX_PROGRAM_GENERIC_RESULTS_NV 0x8DA6
Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)
(Modify "Section 2.14.1" of the ARB_vertex_program specification,
describing program parameters.)
Each program object has an associated array of program local parameters.
Program local parameters are four-component vectors whose components can
hold floating-point, signed integer, or unsigned integer values. The data
type of each local parameter is established when the parameter's values
are assigned. If a program attempts to read a local parameter using a
data type other than the one used when the parameter is set, the values
returned are undefined. ... The commands
void ProgramLocalParameter4fARB(enum target, uint index,
float x, float y, float z, float w);
void ProgramLocalParameter4fvARB(enum target, uint index,
const float *params);
void ProgramLocalParameter4dARB(enum target, uint index,
double x, double y, double z, double w);
void ProgramLocalParameter4dvARB(enum target, uint index,
const double *params);
void ProgramLocalParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramLocalParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramLocalParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramLocalParameterI4uivNV(enum target, uint index,
const uint *params);
update the values of the program local parameter numbered <index>
belonging to the program object currently bound to <target>. For the
non-vector versions of these commands, the four components of the
parameter are updated with the values of <x>, <y>, <z>, and <w>,
respectively. For the vector versions, the components of the parameter
are updated with the array of four values pointed to by <params>. The
error INVALID_VALUE is generated if <index> is greater than or equal to
the number of program local parameters supported by <target>.
The commands
void ProgramLocalParameters4fvNV(enum target, uint index,
sizei count, const float *params);
void ProgramLocalParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramLocalParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
update the values of the program local parameters numbered <index> through
<index> + <count> - 1 with the array of 4 * <count> values pointed to by
<params>. The error INVALID_VALUE is generated if the sum of <index> and
<count> is greater than the number of program local parameters supported
by <target>.
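The two range checks above can be modeled as follows. This is a minimal
sketch, not the GL implementation: the function names and the max_params
argument (standing in for the number of program local parameters supported
by the target) are assumptions made for illustration. Negative values of
<count> are not covered by the text above and are excluded from the sketch.

```c
#include <assert.h>

/* Single-parameter update: accepted only if index < max_params.
 * Returns 1 if accepted, 0 if INVALID_VALUE would be generated. */
static int single_update_in_range(unsigned index, unsigned max_params)
{
    return index < max_params;
}

/* Multi-parameter update: INVALID_VALUE if index + count > max_params.
 * Returns 1 if accepted, 0 if INVALID_VALUE would be generated. */
static int multi_update_in_range(unsigned index, int count,
                                 unsigned max_params)
{
    unsigned n;

    assert(count >= 0);  /* negative counts are outside this sketch */
    n = (unsigned)count;
    /* Equivalent to index + n <= max_params, written so that the
     * addition cannot wrap around. */
    return n <= max_params && index <= max_params - n;
}
```

Note that the two checks differ at the boundary: an <index> equal to the
number of supported parameters is rejected by the single-parameter commands,
but is accepted by the multi-parameter commands when <count> is zero.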
When a program local parameter is updated, the data type of its components
is assigned according to the data type of the provided values. If values
provided are of type "float" or "double", the components of the parameter
are floating-point. If the values provided are of type "int", the
components of the parameter are signed integers. If the values provided
are of type "uint", the components of the parameter are unsigned integers.
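One way to picture the typing rule is a parameter that records the data type
of its most recent update. The following is a sketch under that assumption;
the type and function names are hypothetical, and an implementation is not
required to store such a tag. The sketch only captures which reads return
defined values.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } ParamType;

/* Hypothetical model of one four-component program parameter. */
typedef struct {
    ParamType type;          /* data type of the most recent update */
    union {
        float    f[4];
        int      i[4];
        unsigned u[4];
    } v;
} ProgramParam;

/* Models ProgramLocalParameter4fARB: components become floating-point. */
static void param_set_4f(ProgramParam *p, float x, float y, float z, float w)
{
    assert(p != NULL);
    p->type = PARAM_FLOAT;
    p->v.f[0] = x; p->v.f[1] = y; p->v.f[2] = z; p->v.f[3] = w;
}

/* Models ProgramLocalParameterI4iNV: components become signed integers. */
static void param_set_I4i(ProgramParam *p, int x, int y, int z, int w)
{
    assert(p != NULL);
    p->type = PARAM_INT;
    p->v.i[0] = x; p->v.i[1] = y; p->v.i[2] = z; p->v.i[3] = w;
}

/* Models ProgramLocalParameterI4uiNV: components become unsigned integers. */
static void param_set_I4ui(ProgramParam *p, unsigned x, unsigned y,
                           unsigned z, unsigned w)
{
    assert(p != NULL);
    p->type = PARAM_UINT;
    p->v.u[0] = x; p->v.u[1] = y; p->v.u[2] = z; p->v.u[3] = w;
}

/* Returns 1 if a program reading `p` as `read_type` gets defined values,
 * 0 if the values returned are undefined. */
static int param_read_is_defined(const ProgramParam *p, ParamType read_type)
{
    assert(p != NULL);
    return p->type == read_type;
}
```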
Additionally, each program target has an associated array of program
environment parameters. Unlike program local parameters, program
environment parameters are shared by all program objects of a given
target. Program environment parameters are four-component vectors whose
components can hold floating-point, signed integer, or unsigned integer
values. The data type of each environment parameter is established when
the parameter's values are assigned. If a program attempts to read an
environment parameter using a data type other than the one used when the
parameter is set, the values returned are undefined. ... The commands
void ProgramEnvParameter4fARB(enum target, uint index,
float x, float y, float z, float w);
void ProgramEnvParameter4fvARB(enum target, uint index,
const float *params);
void ProgramEnvParameter4dARB(enum target, uint index,
double x, double y, double z, double w);
void ProgramEnvParameter4dvARB(enum target, uint index,
const double *params);
void ProgramEnvParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramEnvParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramEnvParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramEnvParameterI4uivNV(enum target, uint index,
const uint *params);
update the values of the program environment parameter numbered <index>
for the given program target <target>. For the non-vector versions of
these commands, the four components of the parameter are updated with the
values of <x>, <y>, <z>, and <w>, respectively. For the vector versions,
the four components of the parameter are updated with the array of four
values pointed to by <params>. The error INVALID_VALUE is generated if
<index> is greater than or equal to the number of program environment
parameters supported by <target>.
The commands
void ProgramEnvParameters4fvNV(enum target, uint index,
sizei count, const float *params);
void ProgramEnvParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramEnvParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
update the values of the program environment parameters numbered <index>
through <index> + <count> - 1 with the array of 4 * <count> values pointed
to by <params>. The error INVALID_VALUE is generated if the sum of
<index> and <count> is greater than the number of program environment
parameters supported by <target>.
When a program environment parameter is updated, the data type of its
components is assigned according to the data type of the provided values.
If values provided are of type "float" or "double", the components of the
parameter are floating-point. If the values provided are of type "int",
the components of the parameter are signed integers. If the values
provided are of type "uint", the components of the parameter are unsigned
integers.
...
Insert New Section 2.X between Sections 2.Y and 2.Z:
Section 2.X, GPU Programs
The GL provides a number of different program targets that allow an
application to either replace certain fixed-function pipeline stages with
a fully programmable model or use a program to control aspects of the GL
pipeline that previously had only hard-wired behavior.
A common base instruction set is available for all program types,
providing both integer and floating-point operations. Structured
branching operations and subroutine calls are available. Texture
mapping (loading data from external images) is supported for all
program types. The main differences between the different program
types are the set of available inputs and outputs, which are program
type-specific, and a few instructions that are meaningful for only a
subset of program types.
Section 2.X.2, Program Grammar
GPU program strings are specified as an array of ASCII characters
containing the program text. When a GPU program is loaded by a call to
ProgramStringARB, the program string is parsed into a set of tokens
possibly separated by whitespace. Spaces, tabs, newlines, carriage
returns, and comments are considered whitespace. Comments begin with the
character "#" and are terminated by a newline, a carriage return, or the
end of the program array.
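The whitespace and comment rules above can be sketched as a scanner helper.
The function name skip_whitespace and the use of an explicit end pointer
(rather than a terminating NUL) are assumptions for illustration; the end
pointer mirrors the fact that a program string is passed with an explicit
length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner helper: returns a pointer to the first character
 * in [p, end) that is not whitespace, or `end` if there is none.
 * Spaces, tabs, newlines, carriage returns, and "#" comments count as
 * whitespace; a comment runs up to a newline, a carriage return, or the
 * end of the program array. */
static const char *skip_whitespace(const char *p, const char *end)
{
    assert(p != NULL && end != NULL && p <= end);
    while (p < end) {
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
            p++;
        } else if (*p == '#') {
            /* Consume the comment body; the terminator itself is
             * consumed as ordinary whitespace on the next pass. */
            while (p < end && *p != '\n' && *p != '\r')
                p++;
        } else {
            break;
        }
    }
    return p;
}
```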
The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
sequences for GPU programs. The set of valid tokens can be inferred
from the grammar. A line containing "/* empty */" represents an empty
string and is used to indicate optional rules. A program is invalid if it
contains any tokens or characters not defined in this specification.
Note that this extension is not a standalone extension and a small number
of grammar rules are left to be defined in the extensions defining the
specific vertex, fragment, and geometry program types.
<program> ::= <optionSequence> <declSequence>
<statementSequence> "END"
<optionSequence> ::= <option> <optionSequence>
| /* empty */
<option> ::= "OPTION" <identifier> ";"
<declSequence> ::= /* empty */
<statementSequence> ::= <statement> <statementSequence>
| /* empty */
<statement> ::= <instruction> ";"
| <namingStatement> ";"
| <instLabel> ":"
<instruction> ::= <ALUInstruction>
| <TexInstruction>
| <FlowInstruction>
<ALUInstruction> ::= <VECTORop_instruction>
| <SCALARop_instruction>
| <BINSCop_instruction>
| <BINop_instruction>
| <VECSCAop_instruction>
| <TRIop_instruction>
| <SWZop_instruction>
<TexInstruction> ::= <TEXop_instruction>
| <TXDop_instruction>
<FlowInstruction> ::= <BRAop_instruction>
| <FLOWCCop_instruction>
| <IFop_instruction>
| <REPop_instruction>
| <ENDFLOWop_instruction>
<VECTORop_instruction> ::= <VECTORop> <opModifiers> <instResult> ","
<instOperandV>
<VECTORop> ::= "ABS"
| "CEIL"
| "FLR"
| "FRC"
| "I2F"
| "LIT"
| "MOV"
| "NOT"
| "NRM"
| "PK2H"
| "PK2US"
| "PK4B"
| "PK4UB"
| "ROUND"
| "SSG"
| "TRUNC"
<SCALARop_instruction> ::= <SCALARop> <opModifiers> <instResult> ","
<instOperandS>
<SCALARop> ::= "COS"
| "EX2"
| "LG2"
| "RCC"
| "RCP"
| "RSQ"
| "SCS"
| "SIN"
| "UP2H"
| "UP2US"
| "UP4B"
| "UP4UB"
<BINSCop_instruction> ::= <BINSCop> <opModifiers> <instResult> ","
<instOperandS> "," <instOperandS>
<BINSCop> ::= "POW"
<VECSCAop_instruction> ::= <VECSCAop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandS>
<VECSCAop> ::= "DIV"
| "SHL"
| "SHR"
| "MOD"
<BINop_instruction> ::= <BINop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV>
<BINop> ::= "ADD"
| "AND"
| "DP3"
| "DP4"
| "DPH"
| "DST"
| "MAX"
| "MIN"
| "MUL"
| "OR"
| "RFL"
| "SEQ"
| "SFL"
| "SGE"
| "SGT"
| "SLE"
| "SLT"
| "SNE"
| "STR"
| "SUB"
| "XPD"
| "DP2"
| "XOR"
<TRIop_instruction> ::= <TRIop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV> ","
<instOperandV>
<TRIop> ::= "CMP"
| "DP2A"
| "LRP"
| "MAD"
| "SAD"
| "X2D"
<SWZop_instruction> ::= <SWZop> <opModifiers> <instResult> ","
<instOperandVNS> "," <extendedSwizzle>
<SWZop> ::= "SWZ"
<TEXop_instruction> ::= <TEXop> <opModifiers> <instResult> ","
<instOperandV> "," <texAccess>
<TEXop> ::= "TEX"
| "TXB"
| "TXF"
| "TXL"
| "TXP"
| "TXQ"
<TXDop_instruction> ::= <TXDop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV> ","
<instOperandV> "," <texAccess>
<TXDop> ::= "TXD"
<BRAop_instruction> ::= <BRAop> <opModifiers> <instTarget>
<optBranchCond>
<BRAop> ::= "CAL"
<FLOWCCop_instruction> ::= <FLOWCCop> <opModifiers> <optBranchCond>
<FLOWCCop> ::= "RET"
| "BRK"
| "CONT"
<IFop_instruction> ::= <IFop> <opModifiers> <ccTest>
<IFop> ::= "IF"
<REPop_instruction> ::= <REPop> <opModifiers> <instOperandV>
| <REPop> <opModifiers>
<REPop> ::= "REP"
<ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>
<ENDFLOWop> ::= "ELSE"
| "ENDIF"
| "ENDREP"
<opModifiers> ::= <opModifierItem> <opModifiers>
| /* empty */
<opModifierItem> ::= "." <opModifier>
<opModifier> ::= "F"
| "U"
| "S"
| "CC"
| "CC0"
| "CC1"
| "SAT"
| "SSAT"
| "NTC"
| "S24"
| "U24"
| "HI"
<texAccess> ::= <texImageUnit> "," <texTarget>
| <texImageUnit> "," <texTarget> "," <texOffset>
<texImageUnit> ::= "texture" <optArrayMemAbs>
<texTarget> ::= "1D"
| "2D"
| "3D"
| "CUBE"
| "RECT"
| "SHADOW1D"
| "SHADOW2D"
| "SHADOWRECT"
| "ARRAY1D"
| "ARRAY2D"
| "SHADOWCUBE"
| "SHADOWARRAY1D"
| "SHADOWARRAY2D"
<texOffset> ::= "(" <texOffsetComp> ")"
| "(" <texOffsetComp> "," <texOffsetComp> ")"
| "(" <texOffsetComp> "," <texOffsetComp> ","
<texOffsetComp> ")"
<texOffsetComp> ::= <optSign> <int>
<optBranchCond> ::= /* empty */
| <ccMask>
<instOperandV> ::= <instOperandAbsV>
| <instOperandBaseV>
<instOperandAbsV> ::= <operandAbsNeg> "|" <instOperandBaseV> "|"
<instOperandBaseV> ::= <operandNeg> <attribUseV>
| <operandNeg> <tempUseV>
| <operandNeg> <paramUseV>
| <operandNeg> <bufferUseV>
<instOperandS> ::= <instOperandAbsS>
| <instOperandBaseS>
<instOperandAbsS> ::= <operandAbsNeg> "|" <instOperandBaseS> "|"
<instOperandBaseS> ::= <operandNeg> <attribUseS>
| <operandNeg> <tempUseS>
| <operandNeg> <paramUseS>
| <operandNeg> <bufferUseS>
<instOperandVNS> ::= <attribUseVNS>
| <tempUseVNS>
| <paramUseVNS>
| <bufferUseVNS>
<operandAbsNeg> ::= <optSign>
<operandNeg> ::= <optSign>
<instResult> ::= <instResultCC>
| <instResultBase>
<instResultCC> ::= <instResultBase> <ccMask>
<instResultBase> ::= <tempUseW>
| <resultUseW>
<namingStatement> ::= <varMods> <ATTRIB_statement>
| <varMods> <PARAM_statement>
| <varMods> <TEMP_statement>
| <varMods> <OUTPUT_statement>
| <varMods> <BUFFER_statement>
| <ALIAS_statement>
<ATTRIB_statement> ::= "ATTRIB" <establishName> "=" <attribUseD>
<PARAM_statement> ::= <PARAM_singleStmt>
| <PARAM_multipleStmt>
<PARAM_singleStmt> ::= "PARAM" <establishName> <paramSingleInit>
<PARAM_multipleStmt> ::= "PARAM" <establishName> <optArraySize>
<paramMultipleInit>
<paramSingleInit> ::= "=" <paramUseDB>
<paramMultipleInit> ::= "=" "{" <paramMultInitList> "}"
<paramMultInitList> ::= <paramUseDM>
| <paramUseDM> "," <paramMultInitList>
<TEMP_statement> ::= "TEMP" <varNameList>
<OUTPUT_statement> ::= "OUTPUT" <establishName> "=" <resultUseD>
<varMods> ::= <varModifier> <varMods>
| /* empty */
<varModifier> ::= "SHORT"
| "LONG"
| "INT"
| "UINT"
| "FLOAT"
<ALIAS_statement> ::= "ALIAS" <establishName> "=" <establishedName>
<BUFFER_statement> ::= <bufferDeclType> <establishName>
<bufferSingleInit>
| <bufferDeclType> <establishName>
<optArraySize> <bufferMultInit>
<bufferDeclType> ::= "BUFFER"
| "BUFFER4"
<bufferSingleInit> ::= "=" <bufferUseDB>
<bufferMultInit> ::= "=" "{" <bufferMultInitList> "}"
<bufferMultInitList> ::= <bufferUseDM>
| <bufferUseDM> "," <bufferMultInitList>
<varNameList> ::= <establishName>
| <establishName> "," <varNameList>
<attribUseV> ::= <attribBasic> <swizzleSuffix>
| <attribVarName> <swizzleSuffix>
| <attribVarName> <arrayMem> <swizzleSuffix>
| <attribColor> <swizzleSuffix>
| <attribColor> "." <colorType> <swizzleSuffix>
<attribUseS> ::= <attribBasic> <scalarSuffix>
| <attribVarName> <scalarSuffix>
| <attribVarName> <arrayMem> <scalarSuffix>
| <attribColor> <scalarSuffix>
| <attribColor> "." <colorType> <scalarSuffix>
<attribUseVNS> ::= <attribBasic>
| <attribVarName>
| <attribVarName> <arrayMem>
| <attribColor>
| <attribColor> "." <colorType>
<attribUseD> ::= <attribBasic>
| <attribColor>
| <attribColor> "." <colorType>
| <attribMulti>
<paramUseV> ::= <paramVarName> <optArrayMem> <swizzleSuffix>
| <stateSingleItem> <swizzleSuffix>
| <programSingleItem> <swizzleSuffix>
| <constantVector> <swizzleSuffix>
| <constantScalar>
<paramUseS> ::= <paramVarName> <optArrayMem> <scalarSuffix>
| <stateSingleItem> <scalarSuffix>
| <programSingleItem> <scalarSuffix>
| <constantVector> <scalarSuffix>
| <constantScalar>
<paramUseVNS> ::= <paramVarName> <optArrayMem>
| <stateSingleItem>
| <programSingleItem>
| <constantVector>
| <constantScalar>
<paramUseDB> ::= <stateSingleItem>
| <programSingleItem>
| <constantVector>
| <signedConstantScalar>
<paramUseDM> ::= <stateMultipleItem>
| <programMultipleItem>
| <constantVector>
| <signedConstantScalar>
<stateMultipleItem> ::= <stateSingleItem>
| "state" "." <stateMatrixRows>
<stateSingleItem> ::= "state" "." <stateMaterialItem>
| "state" "." <stateLightItem>
| "state" "." <stateLightModelItem>
| "state" "." <stateLightProdItem>
| "state" "." <stateFogItem>
| "state" "." <stateMatrixRow>
| "state" "." <stateTexGenItem>
| "state" "." <stateClipPlaneItem>
| "state" "." <statePointItem>
| "state" "." <stateTexEnvItem>
| "state" "." <stateDepthItem>
<stateMaterialItem> ::= "material" "." <stateMatProperty>
| "material" "." <faceType> "."
<stateMatProperty>
<stateMatProperty> ::= "ambient"
| "diffuse"
| "specular"
| "emission"
| "shininess"
<stateLightItem> ::= "light" <arrayMemAbs> "." <stateLightProperty>
<stateLightProperty> ::= "ambient"
| "diffuse"
| "specular"
| "position"
| "attenuation"
| "spot" "." <stateSpotProperty>
| "half"
<stateSpotProperty> ::= "direction"
<stateLightModelItem> ::= "lightmodel" "." <stateLModProperty>
<stateLModProperty> ::= "ambient"
| "scenecolor"
| <faceType> "." "scenecolor"
<stateLightProdItem> ::= "lightprod" <arrayMemAbs> "."
<stateLProdProperty>
| "lightprod" <arrayMemAbs> "." <faceType> "."
<stateLProdProperty>
<stateLProdProperty> ::= "ambient"
| "diffuse"
| "specular"
<stateFogItem> ::= "fog" "." <stateFogProperty>
<stateFogProperty> ::= "color"
| "params"
<stateMatrixRows> ::= <stateMatrixItem>
| <stateMatrixItem> "." <stateMatModifier>
| <stateMatrixItem> "." "row" <arrayRange>
| <stateMatrixItem> "." <stateMatModifier> "."
"row" <arrayRange>
<stateMatrixRow> ::= <stateMatrixItem> "." "row" <arrayMemAbs>
| <stateMatrixItem> "." <stateMatModifier> "."
"row" <arrayMemAbs>
<stateMatrixItem> ::= "matrix" "." <stateMatrixName>
<stateMatModifier> ::= "inverse"
| "transpose"
| "invtrans"
<stateMatrixName> ::= "modelview" <optArrayMemAbs>
| "projection"
| "mvp"
| "texture" <optArrayMemAbs>
| "program" <arrayMemAbs>
<stateTexGenItem> ::= "texgen" <optArrayMemAbs> "."
<stateTexGenType> "." <stateTexGenCoord>
<stateTexGenType> ::= "eye"
| "object"
<stateTexGenCoord> ::= "s"
| "t"
| "r"
| "q"
<stateClipPlaneItem> ::= "clip" <arrayMemAbs> "." "plane"
<statePointItem> ::= "point" "." <statePointProperty>
<statePointProperty> ::= "size"
| "attenuation"
<stateTexEnvItem> ::= "texenv" <optArrayMemAbs> "."
<stateTexEnvProperty>
<stateTexEnvProperty> ::= "color"
<stateDepthItem> ::= "depth" "." <stateDepthProperty>
<stateDepthProperty> ::= "range"
<programSingleItem> ::= <progEnvParam>
| <progLocalParam>
<programMultipleItem> ::= <progEnvParams>
| <progLocalParams>
<progEnvParams> ::= "program" "." "env" <arrayMemAbs>
| "program" "." "env" <arrayRange>
<progEnvParam> ::= "program" "." "env" <arrayMemAbs>
<progLocalParams> ::= "program" "." "local" <arrayMemAbs>
| "program" "." "local" <arrayRange>
<progLocalParam> ::= "program" "." "local" <arrayMemAbs>
<constantVector> ::= "{" <constantVectorList> "}"
<constantVectorList> ::= <signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar>
<signedConstantScalar> ::= <optSign> <constantScalar>
<constantScalar> ::= <floatConstant>
| <intConstant>
<floatConstant> ::= <float>
<intConstant> ::= <int>
<tempUseV> ::= <tempVarName> <swizzleSuffix>
<tempUseS> ::= <tempVarName> <scalarSuffix>
<tempUseVNS> ::= <tempVarName>
<tempUseW> ::= <tempVarName> <optWriteMask>
<resultUseW> ::= <resultBasic> <optWriteMask>
| <resultVarName> <optWriteMask>
<resultUseD> ::= <resultBasic>
<bufferUseV> ::= <bufferVarName> <optArrayMem> <swizzleSuffix>
<bufferUseS> ::= <bufferVarName> <optArrayMem> <scalarSuffix>
<bufferUseVNS> ::= <bufferVarName> <optArrayMem>
<bufferUseDB> ::= <bufferBinding> <arrayMemAbs>
<bufferUseDM> ::= <bufferBinding> <arrayMemAbs>
| <bufferBinding> <arrayRange>
| <bufferBinding>
<bufferBinding> ::= "program" "." "buffer" <arrayMemAbs>
<optArraySize> ::= "[" "]"
| "[" <int> "]"
<optArrayMem> ::= /* empty */
| <arrayMem>
<arrayMem> ::= <arrayMemAbs>
| <arrayMemRel>
<optArrayMemAbs> ::= /* empty */
| <arrayMemAbs>
<arrayMemAbs> ::= "[" <int> "]"
<arrayMemRel> ::= "[" <arrayMemReg> <arrayMemOffset> "]"
<arrayMemReg> ::= <addrUseS>
<arrayMemOffset> ::= /* empty */
| "+" <int>
| "-" <int>
<arrayRange> ::= "[" <int> ".." <int> "]"
<addrUseS> ::= <addrVarName> <scalarSuffix>
<ccMask> ::= "(" <ccTest> ")"
<ccTest> ::= <ccMaskRule> <swizzleSuffix>
<ccMaskRule> ::= "EQ"
| "GE"
| "GT"
| "LE"
| "LT"
| "NE"
| "TR"
| "FL"
| "EQ0"
| "GE0"
| "GT0"
| "LE0"
| "LT0"
| "NE0"
| "TR0"
| "FL0"
| "EQ1"
| "GE1"
| "GT1"
| "LE1"
| "LT1"
| "NE1"
| "TR1"
| "FL1"
| "NAN"
| "NAN0"
| "NAN1"
| "LEG"
| "LEG0"
| "LEG1"
| "CF"
| "CF0"
| "CF1"
| "NCF"
| "NCF0"
| "NCF1"
| "OF"
| "OF0"
| "OF1"
| "NOF"
| "NOF0"
| "NOF1"
| "AB"
| "AB0"
| "AB1"
| "BLE"
| "BLE0"
| "BLE1"
| "SF"
| "SF0"
| "SF1"
| "NSF"
| "NSF0"
| "NSF1"
<optWriteMask> ::= /* empty */
| <xyzwMask>
| <rgbaMask>
<xyzwMask> ::= "." "x"
| "." "y"
| "." "xy"
| "." "z"
| "." "xz"
| "." "yz"
| "." "xyz"
| "." "w"
| "." "xw"
| "." "yw"
| "." "xyw"
| "." "zw"
| "." "xzw"
| "." "yzw"
| "." "xyzw"
<rgbaMask> ::= "." "r"
| "." "g"
| "." "rg"
| "." "b"
| "." "rb"
| "." "gb"
| "." "rgb"
| "." "a"
| "." "ra"
| "." "ga"
| "." "rga"
| "." "ba"
| "." "rba"
| "." "gba"
| "." "rgba"
<swizzleSuffix> ::= /* empty */
| "." <component>
| "." <xyzwSwizzle>
| "." <rgbaSwizzle>
<extendedSwizzle> ::= <extSwizComp> "," <extSwizComp> ","
<extSwizComp> "," <extSwizComp>
<extSwizComp> ::= <optSign> <xyzwExtSwizSel>
| <optSign> <rgbaExtSwizSel>
<xyzwExtSwizSel> ::= "0"
| "1"
| <xyzwComponent>
<rgbaExtSwizSel> ::= <rgbaComponent>
<scalarSuffix> ::= "." <component>
<component> ::= <xyzwComponent>
| <rgbaComponent>
<xyzwComponent> ::= "x"
| "y"
| "z"
| "w"
<rgbaComponent> ::= "r"
| "g"
| "b"
| "a"
<optSign> ::= /* empty */
| "-"
| "+"
<faceType> ::= "front"
| "back"
<colorType> ::= "primary"
| "secondary"
<instLabel> ::= <identifier>
<instTarget> ::= <identifier>
<establishedName> ::= <identifier>
<establishName> ::= <identifier>
The <int> rule matches an integer constant. The integer consists of a
sequence of one or more digits ("0" through "9"), or a sequence in
hexadecimal form beginning with "0x" followed by a sequence of one or more
hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").
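A matcher for the <int> rule might look like the sketch below. The function
name match_int is hypothetical. Only the lowercase "0x" prefix is accepted,
since that is the only form named above; an "0x" with no hexadecimal digits
after it is treated as the decimal constant "0" followed by other text, which
is a choice made by this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of the NUL-terminated string
 * `s` that matches the <int> rule, or 0 if there is no match. */
static size_t match_int(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    /* Hexadecimal form: "0x" followed by one or more hex digits. */
    if (s[0] == '0' && s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        n = 2;
        while (isxdigit((unsigned char)s[n]))
            n++;
        return n;
    }
    /* Decimal form: one or more digits "0" through "9". */
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```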
The <float> rule matches a floating-point constant consisting of an
integer part, a decimal point, a fraction part, an "e" or "E", and an
optionally signed integer exponent. The integer and fraction parts both
consist of a sequence of one or more digits ("0" through "9"). Either the
integer part or the fraction part (not both) may be missing; either the
decimal point or the "e" (or "E") and the exponent (not both) may be
missing. Most grammar rules that allow floating-point values also allow
integers matching the <int> rule.
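The <float> rule has several optional pieces, so a worked matcher may help.
The following is a sketch with a hypothetical function name, match_float. It
accepts forms such as "1.5", "1.", ".5", "1e5", and "1.5e-3", and rejects a
lone "." or a plain digit sequence, which matches <int> instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of the NUL-terminated string
 * `s` that matches the <float> rule, or 0 if there is no match. */
static size_t match_float(const char *s)
{
    size_t n = 0, int_digits = 0, frac_digits = 0;
    int has_point = 0, has_exp = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[n])) { n++; int_digits++; }
    if (s[n] == '.') {
        has_point = 1;
        n++;
        while (isdigit((unsigned char)s[n])) { n++; frac_digits++; }
    }
    /* The integer part and the fraction part may not both be missing. */
    if (int_digits == 0 && frac_digits == 0)
        return 0;

    /* Optional exponent: "e" or "E", optional sign, one or more digits. */
    if (s[n] == 'e' || s[n] == 'E') {
        size_t m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            m++;
        if (isdigit((unsigned char)s[m])) {
            while (isdigit((unsigned char)s[m]))
                m++;
            n = m;
            has_exp = 1;
        }
    }
    /* The decimal point and the exponent may not both be missing. */
    if (!has_point && !has_exp)
        return 0;
    return n;
}
```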
The <identifier> rule matches a sequence of one or more letters ("A"
through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),
or dollar signs ("$"); the first character must not be a digit. Upper
and lower case letters are considered different (names are
case-sensitive). The following strings are reserved keywords and may not
be used as identifiers: "fragment" (for fragment programs only), "vertex"
(for vertex and geometry programs), "primitive" (for fragment and geometry
programs), "program", "result", "state", and "texture".
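The identifier and reserved keyword rules can be sketched as two checks. The
names ProgType, is_identifier, and is_reserved are hypothetical. The sketch
assumes the default "C" locale, in which isalnum() accepts only the ASCII
letters and digits listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { PROG_VERTEX, PROG_FRAGMENT, PROG_GEOMETRY } ProgType;

/* Returns 1 if the whole string `s` matches the <identifier> rule. */
static int is_identifier(const char *s)
{
    size_t i;

    assert(s != NULL);
    if (s[0] == '\0' || isdigit((unsigned char)s[0]))
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '_' && c != '$')
            return 0;
    }
    return 1;
}

/* Returns 1 if `s` is a reserved keyword for programs of type `type`.
 * The comparison is case-sensitive. */
static int is_reserved(const char *s, ProgType type)
{
    static const char *const always[] = {
        "program", "result", "state", "texture"
    };
    size_t i;

    assert(s != NULL);
    for (i = 0; i < sizeof always / sizeof always[0]; i++)
        if (strcmp(s, always[i]) == 0)
            return 1;
    if (strcmp(s, "fragment") == 0)
        return type == PROG_FRAGMENT;
    if (strcmp(s, "vertex") == 0)
        return type == PROG_VERTEX || type == PROG_GEOMETRY;
    if (strcmp(s, "primitive") == 0)
        return type == PROG_FRAGMENT || type == PROG_GEOMETRY;
    return 0;
}
```

A name is usable as an identifier when is_identifier() returns 1 and
is_reserved() returns 0 for the program type being loaded.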
The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
<bufferVarName> rules match identifiers that have been previously established
as names of temporary, program parameter, attribute, result, and program
parameter buffer variables, respectively.
The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings
consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
or "r", "g", "b", "a" (<rgbaSwizzle>).
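As a sketch of the swizzle rules, the check below accepts a string only if
it has exactly four characters drawn entirely from one of the two component
sets. The function name is_swizzle is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `s` matches <xyzwSwizzle> or <rgbaSwizzle>: exactly four
 * characters, all from "xyzw" or all from "rgba".  The two sets cannot
 * be mixed within one swizzle. */
static int is_swizzle(const char *s)
{
    assert(s != NULL);
    if (strlen(s) != 4)
        return 0;
    return strspn(s, "xyzw") == 4 || strspn(s, "rgba") == 4;
}
```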
The error INVALID_OPERATION is generated if a program fails to load
because it is not syntactically correct or because it violates one of the
semantic restrictions described in the following sections.
A successfully loaded program is parsed into a sequence of instructions.
Each instruction is identified by its tokenized name. The operation of
these instructions when executed is defined in section 2.X.4. A
successfully loaded program string replaces the program string previously
loaded into the specified program object. If the OUT_OF_MEMORY error is
generated by ProgramStringARB, no change is made to the previous contents
of the current program object.
Section 2.X.3, Program Variables
Programs may operate on a number of different variables during their
execution. The following sections define the different classes of
variables that can be declared and used by a program.
Some variable classes require variable bindings. Variable classes with
bindings refer to state that is either generated or consumed outside the
program. Examples of variable bindings include a vertex's normal, the
position of a vertex computed by a vertex program, an interpolated texture
coordinate, and the diffuse color of light 1. Variables that are used
only during program execution do not have bindings.
Variables may be declared explicitly according to the <namingStatement>
grammar rule. Explicit variable declarations allow a program to establish
a variable name that can be used to refer to a specified resource in
subsequent instructions. Variables may be declared anywhere in the
program string, but must be declared prior to use. A program will fail to
load if it declares the same variable name more than once, or if it refers
to a variable name that has not been previously declared in the program
string.
Variables may also be declared implicitly, simply by using a variable
binding as an operand in a program instruction. Such uses are considered
to automatically create a nameless variable using the specified binding.
Only variables from classes with bindings can be declared implicitly.
Section 2.X.3.1, Program Variable Types
Explicit variable declarations may include one or more modifiers that
specify additional information about the variable, such as the size and
data type of the components of the variable. Variable modifiers are
specified according to the <varModifier> grammar rule.
By default, variables are considered typeless. They can be used in
instructions that read or write the variable as floating-point values,
signed integers, or unsigned integers. If a variable is written using one
data type but then read using a different one, the results of the
operation are undefined. Variables with bindings are considered to be
read or written when their values are produced or consumed; the data type
used by the GL is specified in the description of each binding.
Explicitly declared variables may optionally have one data type modifier,
which can be used to detect data type mismatch errors. Type modifiers of
"INT", "UINT", and "FLOAT" indicate that the components of the variable
are stored as signed integers, unsigned integers, or floating-point
values, respectively. A program will fail to load if it attempts to read
or write a variable using a data type other than the one indicated by the
data type modifier. Variables without a data type modifier can be read or
written using any data type.
Explicitly declared variables may optionally have one storage size
modifier. Variables declared as "SHORT" will be represented using at least
16 bits per component. "SHORT" floating-point values will have at least 5
bits of exponent and 10 bits of mantissa. Variables declared as "LONG"
will be represented with at least 32 bits per component. "LONG"
floating-point values will have at least 8 bits of exponent and 23 bits of
mantissa. If no size modifier is provided, the GL will automatically
select component sizes. Implementations are not required to support more
than one component size, so "SHORT", "LONG", and the default could all
refer to the same component size. The "LONG" modifier is supported only
for declarations of temporary variables ("TEMP"). The "SHORT" modifier is
supported only for declarations of temporary variables and result
variables ("OUTPUT").
Each variable declaration can include at most one data type and one
storage size modifier. A program will fail to load if it specifies
multiple data type or multiple storage size modifiers in a single variable
declaration.
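As a non-normative illustration, the modifier rules above can be sketched in Python. The function name check_var_modifiers and the representation of a declaration as a variable class string plus a list of modifier strings are assumptions made for this sketch, not part of the extension. Rejecting "LONG" or "SHORT" on an unsupported variable class is also an assumption; the text above states only where those modifiers are supported.

```python
DATA_TYPES = {"INT", "UINT", "FLOAT"}
STORAGE_SIZES = {"SHORT", "LONG"}

def check_var_modifiers(var_class, modifiers):
    """Return (data_type, storage_size); raise ValueError on a rule violation.

    var_class and modifiers are hypothetical inputs, e.g. "TEMP" and
    ["SHORT", "FLOAT"].  None in the result means "no modifier given".
    Modifiers outside the two sets (such as "FLAT") are ignored here.
    """
    types = [m for m in modifiers if m in DATA_TYPES]
    sizes = [m for m in modifiers if m in STORAGE_SIZES]
    if len(types) > 1:
        raise ValueError("multiple data type modifiers")
    if len(sizes) > 1:
        raise ValueError("multiple storage size modifiers")
    size = sizes[0] if sizes else None
    # Assumption: using a size modifier where it is not supported is an error.
    if size == "LONG" and var_class != "TEMP":
        raise ValueError("LONG is supported only for TEMP")
    if size == "SHORT" and var_class not in ("TEMP", "OUTPUT"):
        raise ValueError("SHORT is supported only for TEMP and OUTPUT")
    return (types[0] if types else None, size)
```

For example, check_var_modifiers("TEMP", ["SHORT", "FLOAT"]) is accepted, while a declaration carrying both "INT" and "UINT" is rejected.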
(NOTE: Fragment programs also support the modifiers "FLAT", "CENTROID",
and "NOPERSPECTIVE", which control how per-fragment attribute values are
produced. These modifiers are described in detail in the
NV_fragment_program4 specification.)
Explicitly declared variables of all types may be declared as arrays. An
array variable has one or more members, numbered 0 through <n>-1, where
<n> is the number of entries in the array. The total number of entries in
the array can be declared using the <optArraySize> grammar rule. For
variable classes without bindings, an array size must be specified in the
program, and must be a positive integer. For variable classes with
bindings, a declared size is optional, and is taken from the number of
bindings assigned in the declaration if omitted. A program will fail to
load if the declared size of an array variable does not match the number
of assigned bindings.
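The array size rules above can be illustrated with a small, non-normative Python sketch. The function resolve_array_size and its arguments are hypothetical; a declared size of None stands for an omitted <optArraySize>.

```python
def resolve_array_size(declared_size, num_bindings, has_bindings):
    """Return the number of array entries, or raise ValueError.

    declared_size: int or None (None models an omitted size).
    num_bindings:  number of bindings assigned in the declaration.
    has_bindings:  True for variable classes that have bindings.
    """
    if not has_bindings:
        # Classes without bindings need an explicit, positive size.
        if declared_size is None or declared_size <= 0:
            raise ValueError("array size must be a positive integer")
        return declared_size
    if declared_size is None:
        # Size is taken from the number of assigned bindings.
        return num_bindings
    if declared_size != num_bindings:
        raise ValueError("declared size does not match number of bindings")
    return declared_size
```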
When a variable is declared as an array, instructions that use the
variable must specify an array member to access according to the
<arrayMem> grammar rule. A program will fail to load if it contains an
instruction that accesses an array variable without specifying an array
member or an instruction that specifies an array member for a non-array
variable.
Section 2.X.3.2, Program Attribute Variables
Program attribute variables represent per-vertex or per-fragment inputs to
the program. All attribute variables have associated bindings, and are
read-only during program execution. Attribute variables may be declared
explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
an attribute binding in an instruction.
The set of available attribute bindings depends on the program type, and
is enumerated in the specifications for each program type.
The set of bindings allowed for attribute array variables is limited to
attribute state grouped in arrays (e.g., texture coordinates, generic
vertex attributes). Additionally, all bindings assigned to the array must
be of the same binding type and must increase consecutively. Examples of
valid and invalid binding lists include:
  vertex.attrib[1], vertex.attrib[2]    # valid, 2-entry array
  vertex.texcoord[0..3]                 # valid, 4-entry array
  vertex.attrib[1], vertex.attrib[3]    # invalid, skipped attrib 2
  vertex.attrib[2], vertex.attrib[1]    # invalid, wrong order
  vertex.attrib[1], vertex.texcoord[2]  # invalid, different types
Additionally, attribute bindings may be used in no more than one array
variable accessed with relative addressing.
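The "same binding type, consecutively increasing" rule for array binding lists can be checked mechanically. The following Python sketch is illustrative only; it models each binding as a hypothetical (binding_type, index) pair and does not check whether the binding type is one that is grouped in arrays.

```python
def expand_range(binding_type, first, last):
    """Expand a range such as vertex.texcoord[0..3] into individual pairs."""
    return [(binding_type, i) for i in range(first, last + 1)]

def is_valid_array_binding_list(bindings):
    """Return True if all bindings share one type and increase by one.

    bindings: list of (binding_type, index) pairs, for example
    [("vertex.attrib", 1), ("vertex.attrib", 2)].
    """
    if not bindings:
        return False  # an array has one or more members
    first_type, first_index = bindings[0]
    for offset, (binding_type, index) in enumerate(bindings):
        if binding_type != first_type:
            return False  # different binding types
        if index != first_index + offset:
            return False  # skipped entry or wrong order
    return True
```

Applied to the five binding lists shown above, this check accepts the first two and rejects the last three.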
Implementations may have a limit on the total number of attribute binding
components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
Programs that use more attribute binding components than this limit will
fail to load. The method of counting used attribute binding components is
implementation-dependent, but must satisfy the following properties:
  * If an attribute binding is not referenced in a program, or is
    referenced only in declarations of attribute variables that are not
    used, none of its components are counted.

  * An attribute binding component may be counted as used only if there
    exists an instruction operand where

      - the component is enabled for read by the swizzle pattern (Section
        2.X.4.2), and

      - the attribute binding is

          - referenced directly by the operand,

          - bound to a declared variable referenced by the operand, or

          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.

    Implementations are not required to optimize out unused elements of an
    attribute array or components that are used in only some elements of
    an array. The last of these rules is intended to cover the case where
    the same attribute binding is used in multiple variables.

    For example, an operand whose swizzle pattern selects only the x
    component may result in the x component of an attribute binding being
    counted, but may never result in the counting of the y, z, or w
    components of any attribute binding.

  * Implementations are not required to determine that components read by
    an instruction are actually unused due to:

      - instruction write masks (for example, a component-wise ADD
        operation that only writes the "x" component doesn't have to read
        the "y", "z", and "w" components of its operands) or

      - any other properties of the instruction (for example, the DP3
        instruction, which computes a 3-component dot product, doesn't
        have to read the "w" component of its operands).
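Because the counting method is implementation-dependent, the following Python sketch shows only one method that appears consistent with the properties above. It handles the simplest case, operands that reference attribute bindings directly, and ignores declared variables and arrays. The function name and the (binding, swizzle) operand representation are assumptions made for illustration.

```python
def count_attrib_components(operands):
    """Count attribute binding components read by direct binding references.

    operands: iterable of (binding_name, swizzle) pairs, where swizzle is a
    string over "xyzw" such as "xxxx" or "xyzw" (hypothetical representation).
    A component is counted once no matter how many operands read it.
    """
    used = set()
    for binding, swizzle in operands:
        for component in swizzle:
            if component not in "xyzw":
                raise ValueError("bad swizzle component: " + component)
            used.add((binding, component))
    return len(used)
```

With this method, an operand using the swizzle "xxxx" contributes only the x component of its binding, matching the example given above. Write masks and instruction properties such as those of DP3 are not considered.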
Section 2.X.3.3, Program Parameters
Program parameter variables are used as constants during program
execution. All program parameter variables have associated bindings and
are read-only during program execution. Program parameters retain their
values across program invocations, although their values may change
between invocations due to GL state changes. Program parameter variables
may be declared explicitly via the <PARAM_statement> grammar rule, or
implicitly by using a parameter binding in an instruction. Except where
otherwise specified, program parameter bindings always specify
floating-point values.
When declaring program parameter array variables, all bindings are
supported and can be assigned to array members in any order. The only
restriction is that no parameter binding may be used more than once in
array variables accessed using relative addressing. A program will fail
to load if any program parameter binding is used more than once in a
single array accessed using relative addressing or used at least once in
two or more arrays accessed using relative addressing.
Constant Bindings
If a program parameter binding matches the <constantScalar> or
<signedConstantScalar> grammar rules, the corresponding program parameter
variable is bound to the vector (X,X,X,X), where X is the value of the
specified constant.
If a program parameter binding matches <constantVector>, the corresponding
program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
Z, and W are the values corresponding to the first, second, third, and
fourth match of <signedConstantScalar>. If fewer than four constants are
specified, Y, Z, and W assume the values 0, 0, and 1, if their respective
constants are not specified.
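The two constant binding forms can be illustrated with a short, non-normative Python sketch. The function names are hypothetical; the fill values 0, 0, and 1 are those given above.

```python
def scalar_constant(x):
    """A scalar constant binding is replicated to all four components."""
    return (x, x, x, x)

def vector_constant(*values):
    """A vector constant binding with one to four specified components.

    Unspecified Y, Z, and W components take the values 0, 0, and 1.
    """
    if not 1 <= len(values) <= 4:
        raise ValueError("a constant vector has one to four components")
    defaults = (0, 0, 0, 1)  # the first entry is never used; X is always given
    return tuple(values) + defaults[len(values):]
```

For example, a constant vector written with the two values 1 and 2 would be bound to (1, 2, 0, 1).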
Constant bindings can be interpreted as having signed integer, unsigned
integer, or floating-point values, depending on how they are used in the
program text. For constants in variable declarations, the components of
the constant are interpreted according to the variable's component data
type modifier. If no data type modifier is specified in a declaration,
constants are interpreted as floating-point values. For constant bindings
used directly in an instruction, the components of the constant are
interpreted according to the required data type of the operand. A program
will fail to load if it specifies a floating-point constant value
(matching the <floatConstant> grammar rule) that should be interpreted as
a signed or unsigned integer, or a negative integer constant value that
should be interpreted as an unsigned integer.
If the value used to specify a floating-point constant can not be exactly
represented, the nearest floating-point value will be used. If the value
used to specify an integer constant is too large to be represented, the
program will fail to load.
Program Environment/Local Parameter Bindings
Binding                   Components Underlying State
------------------------- ---------- -------------------------------
program.env[a]            (x,y,z,w)  program environment parameter a
program.local[a]          (x,y,z,w)  program local parameter a
program.env[a..b]         (x,y,z,w)  program environment parameters
                                     a through b
program.local[a..b]       (x,y,z,w)  program local parameters
                                     a through b
Table X.1: Program Environment/Local Parameter Bindings. <a> and <b>
indicate parameter numbers, where <a> must be less than or equal to <b>.
If a program parameter binding matches "program.env[a]" or
"program.local[a]", the four components of the program parameter variable
are filled with the four components of program environment parameter <a>
or program local parameter <a> respectively.
Additionally, for program parameter array bindings, "program.env[a..b]"
and "program.local[a..b]" are equivalent to specifying program environment
or local parameters <a> through <b> in order, respectively. A program
using any of these bindings will fail to load if <a> is greater than <b>.
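A minimal Python sketch of the range expansion described above follows. The function name and the use of binding strings as the result are illustrative assumptions.

```python
def expand_parameter_range(kind, a, b):
    """Expand program.env[a..b] or program.local[a..b] into single bindings.

    kind is "env" or "local".  Raises ValueError when a > b, which models
    the program failing to load.
    """
    if kind not in ("env", "local"):
        raise ValueError("kind must be 'env' or 'local'")
    if a > b:
        raise ValueError("range start is greater than range end")
    return ["program.%s[%d]" % (kind, i) for i in range(a, b + 1)]
```

A range with <a> equal to <b> is allowed and yields a single binding.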
Program environment and local parameters are typeless, and may be
specified as signed integer, unsigned integer, or floating-point
variables. If a program environment parameter is read using a data type
other than the one used to specify it, an undefined value is returned.
Material Property Bindings
Binding                        Components Underlying State
------------------------------ ---------- ----------------------------
state.material.ambient         (r,g,b,a)  front ambient material color
state.material.diffuse         (r,g,b,a)  front diffuse material color
state.material.specular        (r,g,b,a)  front specular material color
state.material.emission        (r,g,b,a)  front emissive material color
state.material.shininess       (s,0,0,1)  front material shininess
state.material.front.ambient   (r,g,b,a)  front ambient material color
state.material.front.diffuse   (r,g,b,a)  front diffuse material color
state.material.front.specular  (r,g,b,a)  front specular material color
state.material.front.emission  (r,g,b,a)  front emissive material color
state.material.front.shininess (s,0,0,1)  front material shininess
state.material.back.ambient    (r,g,b,a)  back ambient material color
state.material.back.diffuse    (r,g,b,a)  back diffuse material color
state.material.back.specular   (r,g,b,a)  back specular material color
state.material.back.emission   (r,g,b,a)  back emissive material color
state.material.back.shininess  (s,0,0,1)  back material shininess
Table X.3: Material Property Bindings. If a material face is not
specified in the binding, the front property is used.
If a program parameter binding matches any of the material properties
listed in Table X.3, the program parameter variable is filled according to
the table. For ambient, diffuse, specular, or emissive colors, the "x",
"y", "z", and "w" components are filled with the "r", "g", "b", and "a"
components, respectively, of the corresponding material color. For
material shininess, the "x" component is filled with the material's
specular exponent, and the "y", "z", and "w" components are filled with
the floating-point constants 0, 0, and 1, respectively. Bindings
containing ".back" refer to the back material; all other bindings refer to
the front material.
Material properties can be changed inside a Begin/End pair, either
directly by calling Material, or indirectly through color material.
However, such property changes are not guaranteed to update program
parameter bindings until the following End command. Program parameter
variables bound to material properties changed inside a Begin/End pair are
undefined until the following End command.
Light Property Bindings
Binding                       Components Underlying State
----------------------------- ---------- ----------------------------
state.light[n].ambient        (r,g,b,a)  light n ambient color
state.light[n].diffuse        (r,g,b,a)  light n diffuse color
state.light[n].specular       (r,g,b,a)  light n specular color
state.light[n].position       (x,y,z,w)  light n position
state.light[n].attenuation    (a,b,c,e)  light n attenuation constants
                                         and spot light exponent
state.light[n].spot.direction (x,y,z,c)  light n spot direction and
                                         cutoff angle cosine
state.light[n].half           (x,y,z,1)  light n infinite half-angle
state.lightmodel.ambient      (r,g,b,a)  light model ambient color
state.lightmodel.scenecolor   (r,g,b,a)  light model front scene color
state.lightmodel.             (r,g,b,a)  light model front scene color
  front.scenecolor
state.lightmodel.             (r,g,b,a)  light model back scene color
  back.scenecolor
state.lightprod[n].ambient    (r,g,b,a)  light n / front material
                                         ambient color product
state.lightprod[n].diffuse    (r,g,b,a)  light n / front material
                                         diffuse color product
state.lightprod[n].specular   (r,g,b,a)  light n / front material
                                         specular color product
state.lightprod[n].           (r,g,b,a)  light n / front material
  front.ambient                          ambient color product
state.lightprod[n].           (r,g,b,a)  light n / front material
  front.diffuse                          diffuse color product
state.lightprod[n].           (r,g,b,a)  light n / front material
  front.specular                         specular color product
state.lightprod[n].           (r,g,b,a)  light n / back material
  back.ambient                           ambient color product
state.lightprod[n].           (r,g,b,a)  light n / back material
  back.diffuse                           diffuse color product
state.lightprod[n].           (r,g,b,a)  light n / back material
  back.specular                          specular color product
Table X.4: Light Property Bindings. <n> indicates a light number.
If a program parameter binding matches "state.light[n].ambient",
"state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
and "w" components of the program parameter variable are filled with the
"r", "g", "b", and "a" components, respectively, of the corresponding
light color.
If a program parameter binding matches "state.light[n].position", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the "x", "y", "z", and "w" components, respectively, of the light
position.
If a program parameter binding matches "state.light[n].attenuation", the
"x", "y", and "z" components of the program parameter variable are filled
with the constant, linear, and quadratic attenuation parameters of the
specified light, respectively (section 2.13.1). The "w" component of the
program parameter variable is filled with the spot light exponent of the
specified light.
If a program parameter binding matches "state.light[n].spot.direction",
the "x", "y", and "z" components of the program parameter variable are
filled with the "x", "y", and "z" components of the spot light direction
of the specified light, respectively (section 2.13.1). The "w" component
of the program parameter variable is filled with the cosine of the spot
light cutoff angle of the specified light.
If a program parameter binding matches "state.light[n].half", the "x",
"y", and "z" components of the program parameter variable are filled with
the x, y, and z components, respectively, of the normalized infinite
half-angle vector
h_inf = || P + (0, 0, 1) ||.
The "w" component is filled with 1.0. In the computation of h_inf, P
consists of the x, y, and z coordinates of the normalized vector from the
eye position P_e to the eye-space light position P_pli (section 2.13.1).
h_inf is defined to correspond to the normalized half-angle vector when
using an infinite light (w coordinate of the position is zero) and an
infinite viewer (v_bs is FALSE). For local lights or a local viewer,
h_inf is well-defined but does not match the normalized half-angle vector,
which will vary depending on the vertex position.
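The computation of the "state.light[n].half" binding can be sketched in Python as follows. This is a non-normative illustration: the function names are hypothetical, and the caller is assumed to supply P, the direction from the eye position to the eye-space light position, as a 3-component tuple. The || ... || notation above is read as normalization.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def infinite_half_angle(p):
    """Return (x, y, z, 1.0), with xyz = normalize(P + (0, 0, 1)).

    p is normalized first, so it need not be unit length on input.
    The result is not meaningful when P points along (0, 0, -1), because
    the sum is then the zero vector and cannot be normalized.
    """
    px, py, pz = normalize(p)
    hx, hy, hz = normalize((px, py, pz + 1.0))
    return (hx, hy, hz, 1.0)
```

For a light direction P of (0, 0, 1), the half-angle vector is also (0, 0, 1), and the binding holds (0, 0, 1, 1).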
If a program parameter binding matches "state.lightmodel.ambient", the
"x", "y", "z", and "w" components of the program parameter variable are
filled with the "r", "g", "b", and "a" components of the light model
ambient color, respectively.
If a program parameter binding matches "state.lightmodel.scenecolor" or
"state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
the program parameter variable are filled with the "r", "g", and "b"
components respectively of the "front scene color"
c_scene = a_cs * a_cm + e_cm,
where a_cs is the light model ambient color, a_cm is the front ambient
material color, and e_cm is the front emissive material color. The "w"
component of the program parameter variable is filled with the alpha
component of the front diffuse material color. If a program parameter
binding matches "state.lightmodel.back.scenecolor", a similar back scene
color, computed using back-facing material properties, is used. The front
and back scene colors match the values that would be assigned to vertices
using conventional lighting if all lights were disabled.
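As an illustration, the scene color binding can be computed as in the following Python sketch. The function name and the representation of colors as 4-component (r, g, b, a) tuples are assumptions; d_cm denotes the diffuse material color, which supplies only the alpha component.

```python
def scene_color(a_cs, a_cm, e_cm, d_cm):
    """Return the scene color binding (x, y, z, w).

    a_cs: light model ambient color
    a_cm: ambient material color (front or back)
    e_cm: emissive material color (front or back)
    d_cm: diffuse material color (front or back); only alpha is used
    """
    # c_scene = a_cs * a_cm + e_cm, evaluated per RGB component
    rgb = tuple(a_cs[i] * a_cm[i] + e_cm[i] for i in range(3))
    return rgb + (d_cm[3],)
```

Passing front material colors yields the value for "state.lightmodel.front.scenecolor"; passing back material colors yields the value for "state.lightmodel.back.scenecolor".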
If a program parameter binding matches anything beginning with
"state.lightprod[n]", the "x", "y", and "z" components of the program
parameter variable are filled with the "r", "g", and "b" components,
respectively, of the corresponding light product. The three light product
components are the products of the corresponding color components of the
specified material property and the light color of the specified light
(see Table X.4). The "w" component of the program parameter variable is
filled with the alpha component of the specified material property.
Light products depend on material properties, which can be changed inside
a Begin/End pair. Such property changes are not guaranteed to take effect
until the following End command. Program parameter variables bound to
light products whose corresponding material property changes inside a
Begin/End pair are undefined until the following End command.
Texture Coordinate Generation Property Bindings
Binding                   Components Underlying State
------------------------- ---------- ----------------------------
state.texgen[n].eye.s     (a,b,c,d)  TexGen eye linear plane
                                     coefficients, s coord, unit n
state.texgen[n].eye.t     (a,b,c,d)  TexGen eye linear plane
                                     coefficients, t coord, unit n
state.texgen[n].eye.r     (a,b,c,d)  TexGen eye linear plane
                                     coefficients, r coord, unit n
state.texgen[n].eye.q     (a,b,c,d)  TexGen eye linear plane
                                     coefficients, q coord, unit n
state.texgen[n].object.s  (a,b,c,d)  TexGen object linear plane
                                     coefficients, s coord, unit n
state.texgen[n].object.t  (a,b,c,d)  TexGen object linear plane
                                     coefficients, t coord, unit n
state.texgen[n].object.r  (a,b,c,d)  TexGen object linear plane
                                     coefficients, r coord, unit n
state.texgen[n].object.q  (a,b,c,d)  TexGen object linear plane
                                     coefficients, q coord, unit n
Table X.5: Texture Coordinate Generation Property Bindings. "[n]" is
optional -- texture unit <n> is used if specified; texture unit 0 is
used otherwise.
If a program parameter binding matches a set of TexGen plane coefficients,
the "x", "y", "z", and "w" components of the program parameter variable
are filled with the coefficients p1, p2, p3, and p4, respectively, for
object linear coefficients, and the coefficients p1', p2', p3', and p4',
respectively, for eye linear coefficients (section 2.10.4).
Fog Property Bindings
Binding                       Components Underlying State
----------------------------- ---------- ----------------------------
state.fog.color               (r,g,b,a)  RGB fog color (section 3.10)
state.fog.params              (d,s,e,r)  fog density, linear start
                                         and end, and 1/(end-start)
                                         (section 3.10)
Table X.6: Fog Property Bindings
If a program parameter binding matches "state.fog.color", the "x", "y",
"z", and "w" components of the program parameter variable are filled with
the "r", "g", "b", and "a" components, respectively, of the fog color
(section 3.10).
If a program parameter binding matches "state.fog.params", the "x", "y",
and "z" components of the program parameter variable are filled with the
fog density, linear fog start, and linear fog end parameters (section
3.10), respectively. The "w" component is filled with 1/(end-start),
where end and start are the linear fog end and start parameters,
respectively.
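The layout of the "state.fog.params" binding can be illustrated with a short Python sketch. The function name is hypothetical. The behavior when the linear fog start and end are equal is not described above, so this sketch simply lets the division fail in that case.

```python
def fog_params(density, start, end):
    """Return (density, start, end, 1/(end-start)) as (x, y, z, w).

    Raises ZeroDivisionError when start == end; that case is not
    covered by the description above.
    """
    return (density, start, end, 1.0 / (end - start))
```

For example, a density of 0.5 with linear fog from 1.0 to 5.0 gives (0.5, 1.0, 5.0, 0.25).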
Clip Plane Property Bindings
Binding                       Components Underlying State
----------------------------- ---------- ----------------------------
state.clip[n].plane           (a,b,c,d)  clip plane n coefficients
Table X.7: Clip Plane Property Bindings. <n> specifies the clip plane
number, and is required.
If a program parameter binding matches "state.clip[n].plane", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the coefficients p1', p2', p3', and p4', respectively, of clip plane
<n> (section 2.11).
Point Property Bindings
Binding                       Components Underlying State
----------------------------- ---------- ----------------------------
state.point.size              (s,n,x,f)  point size, min and max size
                                         clamps, and fade threshold
                                         (section 3.3)
state.point.attenuation       (a,b,c,1)  point size attenuation consts
Table X.8: Point Property Bindings
If a program parameter binding matches "state.point.size", the "x", "y",
"z", and "w" components of the program parameter variable are filled with
the point size, minimum point size, maximum point size, and fade
threshold, respectively (section 3.3).
If a program parameter binding matches "state.point.attenuation", the "x",
"y", and "z" components of the program parameter variable are filled with
the constant, linear, and quadratic point size attenuation parameters (a,
b, and c), respectively (section 3.3). The "w" component is filled with
1.0.
Texture Environment Property Bindings
Binding                   Components Underlying State
------------------------- ---------- ----------------------------
state.texenv[n].color     (r,g,b,a)  texture environment n color
Table X.9: Texture Environment Property Bindings. "[n]" is optional --
texture unit <n> is used if specified; texture unit 0 is used otherwise.
If a program parameter binding matches "state.texenv[n].color", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the "r", "g", "b", and "a" components, respectively, of the
corresponding texture environment color. Note that only "legacy" texture
units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
Texture image units and texture coordinate sets do not have associated
texture environment state.
Depth Property Bindings
Binding                     Components Underlying State
--------------------------- ---------- ----------------------------
state.depth.range           (n,f,d,1)  Depth range near, far, and
                                       (far-near) (section 2.10.1)
Table X.10: Depth Property Bindings
If a program parameter binding matches "state.depth.range", the "x" and
"y" components of the program parameter variable are filled with the
mappings of near and far clipping planes to window coordinates,
respectively. The "z" component is filled with the difference of the
mappings of near and far clipping planes, far minus near. The "w"
component is filled with 1.0.
Matrix Property Bindings
Binding                              Underlying State
------------------------------------ ---------------------------
* state.matrix.modelview[n]          modelview matrix n
  state.matrix.projection            projection matrix
  state.matrix.mvp                   modelview-projection matrix
* state.matrix.texture[n]            texture matrix n
  state.matrix.program[n]            program matrix n
Table X.11: Base Matrix Property Bindings. The "[n]" syntax indicates
a specific matrix number. For modelview and texture matrices, a matrix
number is optional, and matrix zero will be used if the matrix number is
omitted. These base bindings may further be modified by an
inverse/transpose selector and a row selector.
If the beginning of a program parameter binding matches any of the matrix
binding names listed in Table X.11, the binding corresponds to a 4x4
matrix. If the parameter binding is followed by ".inverse", ".transpose",
or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
or transpose of the inverse, respectively, of the matrix specified in
Table X.11 is selected. Otherwise, the matrix specified in Table X.11 is
selected. If the specified matrix is poorly-conditioned (singular or
nearly so), its inverse matrix is undefined. The binding name
"state.matrix.mvp" refers to the product of modelview matrix zero and the
projection matrix, defined as
MVP = P * M0,
where P is the projection matrix and M0 is modelview matrix zero.
If the selected matrix is followed by ".row[<a>]" (matching the
<stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
the program parameter variable are filled with the four entries of row <a>
of the selected matrix. In the example,
PARAM m0 = state.matrix.modelview[1].row[0];
PARAM m1 = state.matrix.projection.transpose.row[3];
the variable "m0" is set to the first row (row 0) of modelview matrix 1
and "m1" is set to the last row (row 3) of the transpose of the projection
matrix.
For program parameter array bindings, multiple rows of the selected matrix
can be bound via the <stateMatrixRows> grammar rule. If the selected
matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
to specifying matrix rows <a> through <b>, in order. A program will fail
to load if <a> is greater than <b>. If no row selection is specified
(<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
In the example,
PARAM m2[] = { state.matrix.program[0].row[1..2] };
PARAM m3[] = { state.matrix.program[0].transpose };
the array "m2" has two entries, containing rows 1 and 2 of program matrix
zero, and "m3" has four entries, containing all four rows of the transpose
of program matrix zero.
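The matrix selection and row binding rules can be sketched in Python. This is a non-normative illustration: matrices are modeled as lists of four rows, the function names are hypothetical, and only the ".transpose" modifier is implemented (".inverse" and ".invtrans" are omitted to keep the sketch short).

```python
def transpose(m):
    """Return the transpose of a 4x4 matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Return the 4x4 product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def bind_rows(matrix, modifier=None, first=0, last=3):
    """Return rows first..last of the selected matrix as 4-tuples.

    With the default arguments, rows 0 through 3 are bound in order,
    modeling a binding with no row selection.
    """
    if first > last:
        raise ValueError("row range start is greater than row range end")
    if modifier is None:
        selected = matrix
    elif modifier == "transpose":
        selected = transpose(matrix)
    else:
        raise NotImplementedError("only 'transpose' is modeled here")
    return [tuple(selected[r]) for r in range(first, last + 1)]
```

In this model, the "m2" example corresponds to bind_rows(program0, first=1, last=2), and "m3" corresponds to bind_rows(program0, "transpose"). The matrix for "state.matrix.mvp" would be matmul(P, M0).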
Section 2.X.3.4, Program Temporaries
Program temporary variables are used to hold temporary results during
program execution. Temporaries do not persist between program
invocations, and are undefined at the beginning of each program
invocation.
Temporary variables are declared explicitly using the <TEMP_statement>
grammar rule. Each such statement can declare one or more temporaries.
Temporaries can not be declared implicitly. Temporaries can be declared
using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT")
modifier.
Temporary variables may be declared as arrays. Temporary variables
declared as arrays may be stored in slower memory than those not declared
as arrays, and it is recommended to use non-array variables unless array
functionality is required.
Section 2.X.3.5, Program Results
Program result variables represent the per-vertex or per-fragment results
of the program. All result variables have associated bindings, are
write-only during program execution, and are undefined at the beginning of
each program invocation. Any vertex or fragment attributes corresponding
to unwritten result variables will be undefined in subsequent stages of
the pipeline. Result variables may be declared explicitly via the
<OUTPUT_statement> grammar rule, or implicitly by using a result binding
in an instruction.
The set of available result bindings depends on the program type, and is
enumerated in the specifications for each program type.
Result variables may generally be declared as arrays, but the set of
bindings allowed for arrays is limited to state grouped in arrays (e.g.,
texture coordinates, clip distances, colors). Additionally, all bindings
assigned to the array must be of the same binding type and must increase
consecutively. Examples of valid and invalid binding lists for vertex
programs include:
  result.clip[1], result.clip[2]          # valid, 2-entry array
  result.texcoord[0..3]                   # valid, 4-entry array
  result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
  result.texcoord[2], result.texcoord[1]  # invalid, wrong order
  result.texcoord[1], result.clip[2]      # invalid, different types
Additionally, result bindings may be used in no more than one array
addressed with relative addressing.
Implementations may have a limit on the total number of result binding
components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
Programs that require more result binding components than this limit will
fail to load. The method of counting used result binding components is
implementation-dependent, but must satisfy the following properties:
  * If a result binding is not referenced in a program, or is referenced
    only in declarations of result variables that are not used, none of
    its components are counted.

  * A result binding component may be counted as used only if there exists
    an instruction operand where

      - the component is enabled in the write mask (Section 2.X.4.3), and

      - the result binding is either

          - referenced directly by the operand,

          - bound to a declared variable referenced by the operand, or

          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.

    Implementations are not required to optimize out unused elements of a
    result array or components that are used in only some elements of an
    array. The last of these rules is intended to cover the case where
    the same result binding is used in multiple variables.

    For example, an instruction whose write mask selects only the x
    component may result in the x component of a result binding being
    counted, but may never result in the counting of the y, z, or w
    components of any result binding.
Section 2.X.3.6, Program Parameter Buffers
Program parameter buffers are arrays consisting of single-component
typeless values or four-component typeless vectors stored in a buffer
object. The GL provides an implementation-dependent number of buffer
object binding points for each program target, to which buffer objects can
be attached. Program parameter buffer variables can be changed either by
updating the contents of bound buffer objects, or simply by changing the
buffer object attached to a binding point.
Program parameter buffer variables are used as constants during program
execution. All program parameter buffer variables have an associated
binding and are read-only during program execution. Program parameter
buffers retain their values across program invocations, although their
values may change as buffer object bindings or contents change. Program
parameter buffer variables must be declared explicitly via the
<BUFFER_statement> grammar rule. Program parameter buffer bindings can
not be used directly in executable instructions.
Program parameter buffer variables are treated as an array of
single-component values if the <bufferDeclType> grammar rule matches
"BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
A program will fail to load if a variable declared as "BUFFER" and another
variable declared as "BUFFER4" use the same buffer binding point.
Program parameter buffer variables may be declared as arrays, but all
bindings assigned to the array must use the same binding point and must
increase consecutively.
Binding                       Components Underlying State
----------------------------- ---------- -----------------------------
program.buffer[a][b]          (x,x,x,x)  program parameter buffer a,
                                         element b
program.buffer[a][b..c]       (x,x,x,x)  program parameter buffer a,
                                         elements b through c
program.buffer[a]             (x,x,x,x)  program parameter buffer a,
                                         all elements
Table X.12: Program Parameter Buffer Bindings. <a> indicates a buffer
number, <b> and <c> indicate individual elements.
If a program parameter buffer binding matches "program.buffer[a][b]", the
program parameter variable is filled with element <b> of the buffer
object bound to binding point <a>. Each element of the bound buffer
object is treated as one or four words of data that can hold integer or
floating-point values. When a single-component binding is evaluated, the
selected word is broadcast to all four components of the variable. When a
four-component binding is evaluated, the four components of the buffer
element are loaded into the variable. If no buffer object is bound to
binding point <a>, or the bound buffer object is not large enough to hold
element <b>, the values used are undefined. The binding point <a> must
be a nonnegative integer constant.
For program parameter buffer array declarations, "program.buffer[a][b..c]"
is equivalent to specifying elements <b> through <c> of the buffer object
bound to binding point <a> in order.
For program parameter buffer array declarations, "program.buffer[a]" is
equivalent to specifying the entire buffer -- elements 0 through <n>-1,
where <n> is either the size of the array (if declared) or the
implementation-dependent maximum parameter buffer object size limit (if no
size is declared).
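The difference between "BUFFER" and "BUFFER4" element fetches can be illustrated with a Python sketch. This is a model only: the bound buffer object is represented as a list of words, the function name is hypothetical, and the value None stands in for "undefined" results, which a real implementation is free to produce in any way.

```python
def fetch_buffer_element(words, index, four_component):
    """Return the 4-component value of element `index`, or None if undefined.

    words:          list of words modeling the bound buffer object, or
                    None when no buffer object is bound.
    index:          nonnegative element number.
    four_component: False models "BUFFER", True models "BUFFER4".
    """
    if words is None:
        return None  # no buffer object bound: undefined
    if four_component:
        base = 4 * index
        if base + 4 > len(words):
            return None  # buffer too small for this element: undefined
        return tuple(words[base:base + 4])
    if index >= len(words):
        return None  # buffer too small for this element: undefined
    word = words[index]
    return (word, word, word, word)  # broadcast to all four components
```

With a buffer holding the eight words 0 through 7, element 1 is (1, 1, 1, 1) under "BUFFER" and (4, 5, 6, 7) under "BUFFER4".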
Section 2.X.3.7, Program Condition Code Registers
The program condition code registers are four-component vectors. Each
component of this register is a collection of single-bit flags, including
a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry
flag (CF). There are two condition code registers (CC0 and CC1), whose
values are undefined at the beginning of program execution.
Most program instructions can optionally update one of the condition code
registers, by designating the condition code to update in the instruction.
When a condition code component is updated, the four flags of each
component of the condition code are set according to the corresponding
component of the instruction result. Full details on the condition code
updates and tests can be found in Section 2.X.4.3.
The value of these four flags can be combined in various condition code
tests, which can be used to mask writes to destination variables and to
perform conditional branches or other condition operations.
Section 2.X.3.8, Program Aliases
Programs can create aliases by matching the <ALIAS_statement> grammar
rule. Aliases allow programs to use multiple variable names to refer to a
single underlying variable. For example, the statement
ALIAS var1 = var0
establishes a variable name of "var1". Subsequent references to "var1" in
the program text are treated as references to "var0". The left hand side
of an ALIAS statement must be a new variable name, and the right hand side
must be an established variable name.
Aliases are not considered variable declarations, so they do not count against
the limits on the number of variable declarations allowed in the program
text.
Section 2.X.3.9, Program Resource Limits
(see ARB_vertex_program specification, incorporates all the different
limits on instruction counts, temporaries, attribute bindings, program
parameters, and so on)
Section 2.X.4, Program Execution Environment
The set of instructions supported for GPU programs is given in Table X.13
below and is described in detail in Section 2.X.8. An instruction can use
up to three operands when it executes, and most instructions can write a
single result vector. Instructions may also specify one or more
modifiers, according to the <opModifiers> grammar rule. Instruction
modifiers affect how the specified operation is performed.
GPU programs may operate on signed integer, unsigned integer, or
floating-point values; some instructions are capable of operating on any
of the three types. However, the data type of the operands and the result
are always determined based solely on the instruction and its modifiers.
If any of the variables used in the instruction are typeless, they will be
interpreted according to the data type derived from the instruction. If
any variables with a conflicting data type are used in the instruction,
the program will fail to load unless the "NTC" (no type checking)
instruction modifier is specified.
             Modifiers
Instruction  F I C S H D  Out  Inputs    Description
-----------  - - - - - -  ---  --------  --------------------------------
ABS          X X X X X F  v    v         absolute value
ADD          X X X X X F  v    v,v       add
AND          - X X - - S  v    v,v       bitwise and
BRK          - - - - - -  -    c         break out of loop instruction
CAL          - - - - - -  -    c         subroutine call
CEIL         X X X X X F  v    vf        ceiling
CMP          X X X X X F  v    v,v,v     compare
CONT         - - - - - -  -    c         continue with next loop iteration
COS          X - X X X F  s    s         cosine with reduction to [-PI,PI]
DIV          X X X X X F  v    v,s       divide vector components by scalar
DP2          X - X X X F  s    v,v       2-component dot product
DP2A         X - X X X F  s    v,v,v     2-comp. dot product w/scalar add
DP3          X - X X X F  s    v,v       3-component dot product
DP4          X - X X X F  s    v,v       4-component dot product
DPH          X - X X X F  s    v,v       homogeneous dot product
DST          X - X X X F  v    v,v       distance vector
ELSE         - - - - - -  -    -         start if test else block
ENDIF        - - - - - -  -    -         end if test block
ENDREP       - - - - - -  -    -         end of repeat block
EX2          X - X X X F  s    s         exponential base 2
FLR          X X X X X F  v    vf        floor
FRC          X - X X X F  v    v         fraction
I2F          - X X - - S  vf   v         integer to float
IF           - - - - - -  -    c         start of if test block
KIL          X X - - X F  -    vc        kill fragment
LG2          X - X X X F  s    s         logarithm base 2
LIT          X - X X X F  v    v         compute lighting coefficients
LRP          X - X X X F  v    v,v,v     linear interpolation
MAD          X X X X X F  v    v,v,v     multiply and add
MAX          X X X X X F  v    v,v       maximum
MIN          X X X X X F  v    v,v       minimum
MOD          - X X - - S  v    v,s       modulus vector components by scalar
MOV          X X X X X F  v    v         move
MUL          X X X X X F  v    v,v       multiply
NOT          - X X - - S  v    v         bitwise not
NRM          X - X X X F  v    v         normalize 3-component vector
OR           - X X - - S  v    v,v       bitwise or
PK2H         X X - - - F  s    vf        pack two 16-bit floats
PK2US        X X - - - F  s    vf        pack two floats as unsigned 16-bit
PK4B         X X - - - F  s    vf        pack four floats as signed 8-bit
PK4UB        X X - - - F  s    vf        pack four floats as unsigned 8-bit
POW          X - X X X F  s    s,s       exponentiate
RCC          X - X X X F  s    s         reciprocal (clamped)
RCP          X - X X X F  s    s         reciprocal
REP          X X - - X F  -    v         start of repeat block
RET          - - - - - -  -    c         subroutine return
RFL          X - X X X F  v    v,v       reflection vector
ROUND        X X X X X F  v    vf        round to nearest integer
RSQ          X - X X X F  s    s         reciprocal square root
SAD          - X X - - S  vu   v,v,vu    sum of absolute differences
SCS          X - X X X F  v    s         sine/cosine without reduction
SEQ          X X X X X F  v    v,v       set on equal
SFL          X X X X X F  v    v,v       set on false
SGE          X X X X X F  v    v,v       set on greater than or equal
SGT          X X X X X F  v    v,v       set on greater than
SHL          - X X - - S  v    v,s       shift left
SHR          - X X - - S  v    v,s       shift right
SIN          X - X X X F  s    s         sine with reduction to [-PI,PI]
SLE          X X X X X F  v    v,v       set on less than or equal
SLT          X X X X X F  v    v,v       set on less than
SNE          X X X X X F  v    v,v       set on not equal
SSG          X - X X X F  v    v         set sign
STR          X X X X X F  v    v,v       set on true
SUB          X X X X X F  v    v,v       subtract
SWZ          X - X X X F  v    v         extended swizzle
TEX          X X X X - F  v    vf        texture sample
TRUNC        X X X X X F  v    vf        truncate (round toward zero)
TXB          X X X X - F  v    vf        texture sample with bias
TXD          X X X X - F  v    vf,vf,vf  texture sample w/partials
TXF          X X X X - F  v    vs        texel fetch
TXL          X X X X - F  v    vf        texture sample w/LOD
TXP          X X X X - F  v    vf        texture sample w/projection
TXQ          - - - - - S  vs   vs        texture info query
UP2H         X X X X - F  vf   s         unpack two 16-bit floats
UP2US        X X X X - F  vf   s         unpack two unsigned 16-bit ints
UP4B         X X X X - F  vf   s         unpack four signed 8-bit ints
UP4UB        X X X X - F  vf   s         unpack four unsigned 8-bit ints
X2D          X - X X X F  v    v,v,v     2D coordinate transformation
XOR          - X X - - S  v    v,v       exclusive or
XPD          X - X X X F  v    v,v       cross product
Table X.13: Summary of NV_gpu_program4 instructions. The "Modifiers"
columns specify the set of modifiers allowed for the instruction:
F = floating-point data type modifiers
I = signed and unsigned integer data type modifiers
C = condition code update modifiers
S = clamping (saturation) modifiers
H = half-precision float data type suffix
D = default data type modifier (F, U, or S)
The input and output columns describe the formats of the operands and
results of the instruction.
v: 4-component vector (data type is inherited from operation)
vf: 4-component vector (data type is always floating-point)
vs: 4-component vector (data type is always signed integer)
vu: 4-component vector (data type is always unsigned integer)
s: scalar (replicated if written to a vector destination;
data type is inherited from operation)
c: condition code test result (e.g., "EQ", "GT1.x")
vc: 4-component vector or condition code test
Section 2.X.4.1, Program Instruction Modifiers
There are several types of instruction modifiers available. A data type
modifier specifies that an instruction should operate on signed integer,
unsigned integer, or floating-point data, when multiple data types are
supported. A clamping modifier applies to instructions with
floating-point results, and specifies the range to which the results
should be clamped. A condition code update modifier specifies that the
instruction should update one of the condition code variables. Several
other special modifiers are also provided.
Instruction modifiers may be specified as stand-alone modifiers or as
suffixes concatenated with the opcode name. A program will fail to load
if it contains an instruction that
* specifies more than one modifier of any given type,
* specifies a clamping modifier on an instruction, unless it produces
floating-point results, or
* specifies a modifier that is not supported by the instruction (see
Table X.13 and the instruction description).
Stand-alone instruction modifiers are specified according to the
<opModifiers> grammar rule using a ".<modifier>" syntax. Multiple
modifiers, separated by periods, may be specified. The set of supported
modifiers is described in Table X.14.
Modifier  Description
--------  -----------------------------------------------
F         Floating-point operation
U         Fixed-point operation, unsigned operands
S         Fixed-point operation, signed operands
CC        Update condition code register zero
CC0       Update condition code register zero
CC1       Update condition code register one
SAT       Floating-point results clamped to [0,1]
SSAT      Floating-point results clamped to [-1,1]
NTC       Disable type-checking on operands/results
S24       Signed multiply (24-bit operands)
U24       Unsigned multiply (24-bit operands)
HI        Multiplies two 32-bit integer operands, returns
          the 32 MSBs of the product
Table X.14, Instruction Modifiers.
"F", "U", and "S" modifiers are data type modifiers and specify that the
instruction should operate on floating-point, unsigned integer, or
signed integer values, respectively. For example, "ADD.F", "ADD.U", and
"ADD.S" specify component-wise addition of floating-point, unsigned
integer, or signed integer vectors, respectively. These modifiers specify
a data type, but do not specify a precision at which the operation is
performed. Floating-point operations will be carried out with an internal
precision no less than that used to represent the largest operand.
Fixed-point operations will be carried out using at least as many bits as
used to represent the largest operand. Operands represented with fewer
bits than used to perform the instruction will be promoted to a larger
data type. Signed integer operands will be sign-extended, where the most
significant bits are filled with ones if the operand is negative and zero
otherwise. Unsigned integer operands will be zero-extended, where the
most significant bits are always filled with zeroes. For some
instructions, the data type of some operands or the result is fixed; in
these cases, the data type modifier specifies the data type of the
remaining values.
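The operand promotion rule above can be sketched in Python as follows.
This is an illustration under stated assumptions, not a description of
any implementation: the helper name promote is hypothetical, and values
are modeled as raw bit patterns held in Python integers.

```python
def promote(value_bits, from_bits, to_bits, signed):
    """Widen a from_bits-wide bit pattern to to_bits bits.

    Signed operands are sign-extended (new most significant bits are ones
    if the operand is negative); unsigned operands are zero-extended.
    """
    value_bits &= (1 << from_bits) - 1
    if signed and (value_bits >> (from_bits - 1)) & 1:
        # Negative value: fill the added most significant bits with ones.
        fill = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return value_bits | fill
    # Non-negative or unsigned: the added bits stay zero.
    return value_bits
```

For example, the 8-bit pattern 0x80 becomes 0xFFFFFF80 when promoted to 32
bits as a signed value and 0x00000080 when promoted as an unsigned value.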
"CC", "CC0", and "CC1" are condition code update modifiers that specify
that one of the condition code registers should be updated based on the
result of the instruction, as described in section 2.X.4.3. "CC" and
"CC0" specify that the condition code register CC0 be updated; "CC1"
specifies an update to CC1. If no condition code update modifier is
provided, the condition code registers will not be affected.
"SAT" and "SSAT" are clamping modifiers that specify that the
floating-point components of the instruction result should be clamped to
[0,1] or [-1,1], respectively, before updating the condition code and the
destination variable. If no clamping suffix is specified, unclamped
results will be used for condition code updates (if any) and destination
variable writes. Clamping modifiers are not supported on instructions
that do not produce floating-point results.
"NTC" (no type checking) disables data type checking on the instruction,
and allows instructions to use operands or result variables whose data
types are inconsistent with the expected data types of the instruction.
"S24", "U24", and "HI" are special modifiers that are allowed only for the
MUL instruction, and are described in detail where MUL is documented. No
more than one such modifier may be provided for any instruction.
If an instruction supports data type modifiers, but none is provided, a
default data type will be chosen based on the instruction, as specified in
Table X.13 and the instruction set description (Section 2.X.8). If
condition code update or clamping modifiers are not specified, the
corresponding operation will not be performed.
Additionally, each instruction name may have one or more suffixes,
concatenated onto the base instruction name, that operate as instruction
modifiers. For conciseness, these suffixes are not spelled out in the
grammar -- the base opcode name is used as a placeholder for the opcode
and all of its possible suffixes. Instruction suffixes are provided
mainly for compatibility with prior GPU program instruction sets (e.g.,
NV_vertex_program3, NV_fragment_program2, and predecessors). The set of
allowable suffixes, and their equivalent stand-alone modifiers, are listed
in Table X.15.
Suffix  Modifier    Description
------  ----------  ---------------------------------------------------
R       F           Floating-point operation, 32-bit precision
H       F(*)        Floating-point operation, at least 16-bit precision
C       CC0         Update condition code register zero
C0      CC0         Update condition code register zero
C1      CC1         Update condition code register one
_SAT    SAT         Floating-point results clamped to [0,1]
_SSAT   SSAT        Floating-point results clamped to [-1,1]
Table X.15, Instruction Suffixes.
The "R" and "H" suffixes specify floating-point operations and are
equivalent to the "F" data type modifier. They additionally specify a
minimum precision for the operations. Instructions with an "R" precision
modifier will be carried out at no less than IEEE single-precision
floating-point (8 bits of exponent, 23 bits of mantissa). Instructions
with an "H" precision modifier will be carried out at no less than 16-bit
floating-point precision (5 bits of exponent, 10 bits of mantissa).
An instruction may have multiple suffixes, but they must appear in order,
with data type suffixes first, followed by condition code update suffixes,
followed by clamping suffixes. For example, "ADDR" carries out an add at
32-bit precision. "ADDH_SAT" carries out an add at 16-bit precision (or
better) and clamps the results to [0,1]. "ADDRC1_SSAT" carries out an add
at 32-bit floating-point precision, clamps the results to [-1,1], and
updates condition code one based on the clamped result.
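The suffix ordering rule can be illustrated with a small Python sketch
that maps a suffixed opcode to the stand-alone modifiers of Table X.15.
The function name split_suffixes is hypothetical, and the caller is
assumed to supply the base opcode name, since the text above does not
describe how a base name is separated from its suffixes. The sketch also
drops the minimum-precision information carried by "R" and "H".

```python
def split_suffixes(name, base):
    """Map opcode suffixes to stand-alone modifiers (Table X.15)."""
    if not name.startswith(base):
        raise ValueError("opcode does not start with base name")
    rest = name[len(base):]
    mods = []
    # 1. Data type suffix comes first.
    if rest[:1] in ("R", "H"):
        mods.append("F")
        rest = rest[1:]
    # 2. Condition code update suffix (longer spellings tried first).
    for suffix, mod in (("C0", "CC0"), ("C1", "CC1"), ("C", "CC0")):
        if rest.startswith(suffix):
            mods.append(mod)
            rest = rest[len(suffix):]
            break
    # 3. Clamping suffix comes last.
    for suffix, mod in (("_SSAT", "SSAT"), ("_SAT", "SAT")):
        if rest == suffix:
            mods.append(mod)
            rest = ""
            break
    if rest:
        raise ValueError("unrecognized or out-of-order suffix: " + rest)
    return mods
```

With a base name of "ADD", "ADDRC1_SSAT" yields the modifiers F, CC1, and
SSAT, while a misordered name such as "ADD_SATR" is rejected.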
Section 2.X.4.2, Program Operands
Most program instructions operate on one or more scalar or vector
operands. Each operand specifies an operand variable, which is either the
name of a previously declared variable or an implicit variable declaration
created by using a variable binding in the instruction. Attribute,
parameter, or parameter buffer variables can be declared implicitly by
using a valid binding name in an operand. Instruction operands are
specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
grammar rules.
If the operand variable is not an array, its contents are loaded directly.
If the operand variable is an array, a single element of the array is
loaded according to the <arrayMem> grammar rule. The elements of an array
are numbered from 0 to <n>-1, where <n> is the number of entries in the
array. Array members can be accessed using either absolute or relative
addressing.
Absolute array addressing is used when the <arrayMemAbs> grammar rule is
matched; the array member to load is specified by the matching integer.
Out-of-bounds absolute array accesses are not allowed. If the specified
member number is greater than or equal to the size of the array, the
program will fail to load.
Relative array addressing is used when the <arrayMemRel> grammar rule is
matched. This grammar rule allows the program to specify a scalar integer
operand and an optional constant offset, according to the <arrayMemReg>
and <arrayMemOffset> grammar rules. When performing relative addressing,
the GL evaluates the specified integer scalar operand (according to the
rules specified in this section) and adds the constant offset. The array
member loaded is given by this sum. The constant offset is considered
zero if an offset is omitted. If the sum is negative or is greater than or
equal to the size of the array, the results of the access are undefined,
but must not lead to program or GL termination. The set of constant
offsets supported for
relative addressing is limited to values in the range [0,<n>-1], where <n>
is the size of the array. A program will fail to load if it specifies an
offset outside that range. If offsets outside that range are required,
they can be applied by using an integer ADD instruction writing to a
temporary variable.
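A minimal Python sketch of relative addressing follows. The names
check_offset and relative_index are hypothetical, and None is used purely
as a stand-in for "undefined result". An index equal to the array size is
treated as out of bounds, consistent with the rule given for destination
updates in Section 2.X.4.3.

```python
def check_offset(offset, array_size):
    """Load-time check: constant offsets must lie in [0, n-1]."""
    if not 0 <= offset <= array_size - 1:
        raise ValueError("program fails to load: offset out of range")


def relative_index(address_value, offset, array_size):
    """Run-time array member for a relative access.

    Returns None where the access result would be undefined.
    """
    check_offset(offset, array_size)
    index = address_value + offset
    if index < 0 or index >= array_size:
        return None
    return index
```

For an 8-element array, an address value of 2 with a constant offset of 3
selects member 5, while an address value of 5 with the same offset falls
outside the array.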
After the operand is loaded, its components can be rearranged according to
the <swizzleSuffix> grammar rule, or it can be converted to a scalar
operand according to the <scalarSuffix> grammar rule.
The <swizzleSuffix> grammar rule rearranges the components of a loaded
vector to produce another vector. If the <swizzleSuffix> rule matches the
<xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
is used, where each question mark is replaced with one of "x", "y", "z",
"w", "r", "g", "b", or "a". For such patterns, the x, y, z, and w
components of the operand are taken from the vector components named by
the first, second, third, and fourth character of the pattern,
respectively. Swizzle components of "r", "g", "b", and "a" are equivalent
to "x", "y", "z", and "w", respectively. For example, if the swizzle
suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
the result is the vector {8,9,9,2}. If the <swizzleSuffix> matches the
<component> grammar rule, a pattern of the form ".?" is used. For this
pattern, all four components of the operand are taken from the single
component identified by the pattern. If the swizzle suffix is omitted,
components are not rearranged and swizzling has no effect, as though
".xyzw" were specified.
The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
selectors with "r", "g", "b", or "a" selectors. A program will fail to
load if it contains a swizzle suffix with selectors from both of these
sets.
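The swizzle rules can be sketched in Python as shown below. The function
name swizzle is hypothetical, and a ValueError stands in for "the program
fails to load".

```python
_XYZW = "xyzw"
_RGBA = "rgba"


def swizzle(vec, suffix=""):
    """Apply a swizzle suffix such as ".yzzx", ".gbbr" or ".y" to vec."""
    if suffix == "":
        # Omitted suffix behaves like ".xyzw".
        return list(vec)
    if not suffix.startswith("."):
        raise ValueError("swizzle suffix must start with '.'")
    pattern = suffix[1:]
    if pattern and all(c in _XYZW for c in pattern):
        names = _XYZW
    elif pattern and all(c in _RGBA for c in pattern):
        names = _RGBA
    else:
        # Unknown selectors, or a mix of xyzw and rgba selectors.
        raise ValueError("invalid or mixed swizzle selectors: " + suffix)
    if len(pattern) == 1:
        # Single-component form: replicate to all four components.
        pattern = pattern * 4
    if len(pattern) != 4:
        raise ValueError("swizzle must name one or four components")
    return [vec[names.index(c)] for c in pattern]
```

Applied to the source {2,8,9,0}, both ".yzzx" and ".gbbr" produce
{8,9,9,2}, and ".y" produces {8,8,8,8}.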
The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
a single component. The <scalarSuffix> rule is similar to the swizzle
selector, except that only a single component is selected. If the scalar
suffix is ".y" and the specified source contains {2,8,9,0}, the value is
the scalar value 8.
Next, a component-wise negate operation is performed on the operand if the
<operandNeg> grammar rule matches "-". Negation is not performed if the
operand has no sign prefix, or is prefixed with "+". For unsigned integer
operands, the negate operation performs a two's complement operation.
Next, a component-wise absolute value operation is performed on the
operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
matched, by surrounding the operand with two "|" characters. The result
is optionally negated if the <operandAbsNeg> grammar rule matches "-".
For unsigned integer operands, the absolute value operation has no effect.
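The negate and absolute value steps can be sketched for a single component
in Python. The function name apply_operand_modifiers is hypothetical, and
the sketch assumes 32-bit unsigned components; the text above does not fix
a component width.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit unsigned components


def _negate(value, unsigned):
    # Unsigned negation is a two's complement of the bit pattern.
    return (-value) & WORD_MASK if unsigned else -value


def apply_operand_modifiers(value, unsigned=False, negate=False,
                            absolute=False, abs_negate=False):
    """Apply "-", "|...|" and "-|...|" to one operand component, in order."""
    if negate:
        value = _negate(value, unsigned)
    if absolute:
        if not unsigned:
            value = abs(value)  # no effect on unsigned operands
        if abs_negate:
            value = _negate(value, unsigned)
    return value
```

Under the 32-bit assumption, negating the unsigned value 1 gives
0xFFFFFFFF, and taking the absolute value of an unsigned operand leaves
it unchanged.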
Section 2.X.4.3, Program Destination Variable Update
Most program instructions perform computations that produce a result,
which will be written to a variable. Each instruction that computes a
result specifies a destination variable, which is either the name of a
previously declared variable or an implicit variable declaration created
by using a variable binding in the instruction. Result variables can be
declared implicitly by using a valid program result binding name in the
result portion of the instruction. Instruction results are specified
according to the <instResult> grammar rule.
The destination variable may be a single member of an array. In this
case, a single array member is specified using the <arrayMem> grammar
rule, and the array member to update is computed in the exact same manner
as done for operand loads. If the array member is computed at run time,
and is negative or greater than or equal to the size of the array, the
results of the destination variable update are undefined and could result
in overwriting other program variables.
The results of the operation may be obtained at a different precision than
that used to store the destination variable. If so, the results are
converted to match the size of the destination variable. For
floating-point values, the results are rounded to the nearest
floating-point value that can be represented in the destination variable.
If a result component is larger in magnitude than the largest
representable floating-point value in the data type of the destination
variable, an infinity encoding (+/-INF) is used. Signed or unsigned
integer values are sign-extended or zero-extended, respectively, if the
destination variable has more bits than the result, and have their most
significant bits discarded if the destination variable has fewer bits.
Writes to individual components of a vector destination variable can be
controlled at compile time by individual component write masks specified
in the instruction. The component write mask is specified by the
<optWriteMask> grammar rule, and is a string of up to four characters,
naming the components to enable for writing. If no write mask is
specified, all components are enabled for writing. The characters "x",
"y", "z", and "w" match the x, y, z, and w components respectively. For
example, a write mask of ".xzw" indicates that the x, z, and w
components should be enabled for writing but the y component should not be
written. The grammar requires that the destination register mask
components must be listed in "xyzw" order. Additionally, write mask
components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
"w", respectively. The grammar does not allow mixing "x", "y", "z", or
"w" components with "r", "g", "b", and "a" ones.
Writes to individual components of a vector destination variable, or to a
scalar destination variable, can also be controlled at run time using
condition code write masks. The condition code write mask is specified by
the <ccMask> grammar rule. If a mask is specified, a condition code
variable is loaded according to the <ccMaskRule> grammar rule and tested
as described in Table X.16 to produce a four-component vector of TRUE/FALSE
values.
mask rule        test name                condition
---------------  -----------------------  -----------------
EQ, EQ0, EQ1     equal                    !SF && ZF
GE, GE0, GE1     greater than or equal    !(SF ^ OF)
GT, GT0, GT1     greater than             (!SF ^ OF) && !ZF
LE, LE0, LE1     less than or equal       SF ^ (ZF || OF)
LT, LT0, LT1     less than                (SF && !ZF) ^ OF
NE, NE0, NE1     not equal                SF || !ZF
FL, FL0, FL1     false                    always false
TR, TR0, TR1     true                     always true
NAN, NAN0, NAN1  not a number             SF && ZF
LEG, LEG0, LEG1  less, equal, or greater  !SF || !ZF
                 (anything but a NaN)
CF, CF0, CF1     carry flag               CF
NCF, NCF0, NCF1  no carry flag            !CF
OF, OF0, OF1     overflow flag            OF
NOF, NOF0, NOF1  no overflow flag         !OF
SF, SF0, SF1     sign flag                SF
NSF, NSF0, NSF1  no sign flag             !SF
AB, AB0, AB1     above                    CF && !ZF
BLE, BLE0, BLE1  below or equal           !CF || ZF
Table X.16, Condition Code Tests. The allowed rules are specified in
the "mask rule" column. If "0" or "1" is appended to the rule name
(e.g., "EQ1"), the corresponding condition code register (CC1 in this
example) is loaded, otherwise CC0 is loaded. After loading, each
component is tested, using the expression listed in the "condition"
column.
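The tests of Table X.16 can be transcribed into Python as a sketch. The
names CC_TESTS and cc_test are hypothetical, each condition code component
is modeled as a dictionary of four booleans, and the expressions are
copied literally from the "condition" column, with "!SF ^ OF" read as
"(!SF) ^ OF".

```python
CC_TESTS = {
    "EQ":  lambda f: (not f["SF"]) and f["ZF"],
    "GE":  lambda f: not (f["SF"] ^ f["OF"]),
    "GT":  lambda f: ((not f["SF"]) ^ f["OF"]) and not f["ZF"],
    "LE":  lambda f: f["SF"] ^ (f["ZF"] or f["OF"]),
    "LT":  lambda f: (f["SF"] and not f["ZF"]) ^ f["OF"],
    "NE":  lambda f: f["SF"] or not f["ZF"],
    "FL":  lambda f: False,
    "TR":  lambda f: True,
    "NAN": lambda f: f["SF"] and f["ZF"],
    "LEG": lambda f: (not f["SF"]) or (not f["ZF"]),
    "CF":  lambda f: f["CF"],
    "NCF": lambda f: not f["CF"],
    "OF":  lambda f: f["OF"],
    "NOF": lambda f: not f["OF"],
    "SF":  lambda f: f["SF"],
    "NSF": lambda f: not f["SF"],
    "AB":  lambda f: f["CF"] and not f["ZF"],
    "BLE": lambda f: (not f["CF"]) or f["ZF"],
}


def cc_test(rule, cc0, cc1):
    """Return four booleans for a mask rule such as "EQ", "GT1" or "NAN0".

    cc0 and cc1 are lists of four flag dictionaries (one per component).
    """
    register = cc0  # no digit suffix selects CC0
    if rule[-1] in ("0", "1"):
        register = cc1 if rule[-1] == "1" else cc0
        rule = rule[:-1]
    test = CC_TESTS[rule]
    return [bool(test(component)) for component in register]
```

The swizzle applied to the test result and the final masked write are not
covered by this sketch.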
After the condition code tests are performed, the four-component result
can be swizzled according to the <swizzleSuffix> grammar rule. Individual
components of the destination variable are written only if the
corresponding component of the swizzled condition code test result is
TRUE. If both a (compile-time) component write mask and a condition code
write mask are specified, destination variable components are written only
if the corresponding component is enabled in both masks.
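The interaction of the two masks can be sketched in Python. The names
parse_write_mask and masked_write are hypothetical; a ValueError stands in
for a grammar error, and the condition code test result is passed in as
four booleans that have already been swizzled.

```python
def parse_write_mask(mask=""):
    """Return four booleans (x, y, z, w) for a component write mask."""
    if mask == "":
        return [True, True, True, True]  # no mask: all components enabled
    if not mask.startswith(".") or not 1 <= len(mask) - 1 <= 4:
        raise ValueError("malformed write mask: " + mask)
    chars = mask[1:]
    for names in ("xyzw", "rgba"):
        if all(c in names for c in chars):
            positions = [names.index(c) for c in chars]
            break
    else:
        raise ValueError("unknown or mixed components: " + mask)
    # Components must be listed in "xyzw" order, without repeats.
    if any(a >= b for a, b in zip(positions, positions[1:])):
        raise ValueError("components out of order: " + mask)
    return [i in positions for i in range(4)]


def masked_write(dest, result, write_mask="", cc_result=(True,) * 4):
    """Write a component only if it is enabled in both masks."""
    enabled = parse_write_mask(write_mask)
    return [r if (w and c) else d
            for d, r, w, c in zip(dest, result, enabled, cc_result)]
```

For example, with a write mask of ".xzw" and a condition code test result
of (TRUE, TRUE, FALSE, TRUE), only the x and w components are written.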
A program instruction can also optionally update one of the two condition
code registers if the "CC", "CC0", or "CC1" instruction modifier is
specified. These instruction modifiers update condition code register
CC0, CC0, or CC1, respectively. The instructions "ADD.CC" or "ADD.CC0"
will perform an add and update condition code zero, "ADD.CC1" will add and
update condition code one, and "ADD" will simply perform the add without a
condition code update. The components of the selected condition code
register are updated if and only if the corresponding component of the
destination variable is enabled by both write masks. For the purposes of
condition code update, a scalar destination variable is treated as a
vector where the scalar result is written to "x" (if enabled in the write
mask), and writes to the "y", "z", and "w" components are disabled.
When condition code components are written, the condition code flags are
updated based on the corresponding component of the result. If a
component of the destination register is not enabled for writes, the
corresponding condition code component is also unchanged.
For floating-point results, the sign flag (SF) is set if the result is
less than zero or is a NaN (not a number) value. The zero flag (ZF) is
set if the result is equal to zero or is a NaN.
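A short Python sketch of the floating-point rule follows. The function
name float_flags is hypothetical, and only SF and ZF are returned because
the paragraph above does not describe OF or CF for floating-point results.

```python
import math


def float_flags(x):
    """Sign and zero flags for one floating-point result component."""
    is_nan = math.isnan(x)
    return {
        "SF": x < 0 or is_nan,   # less than zero, or NaN
        "ZF": x == 0 or is_nan,  # equal to zero, or NaN
    }
```

A NaN result sets both flags, which is the combination that the "NAN"
test in Table X.16 checks for.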
For signed and unsigned integer results, the sign flag (SF) is set if the
most significant bit of the value written to the result variable is set
and the zero flag (ZF) is set if the result written is zero. For
instructions other than those performing an integer add or subtract (ADD,
MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.
For integer add or subtract operations, the overflow and carry flags are
set by performing both signed and unsigned adds/subtracts as follows:
The overflow flag (OF) is set by interpreting the two operands as signed
integers and performing a signed add or subtract. If the result is
representable as a signed integer (i.e., doesn't overflow), the overflow
flag is cleared; otherwise, it is set.
The carry flag (CF) is set by interpreting the two operands as unsigned
integers and performing an unsigned add or subtract. If the result of
an add is representable as an unsigned integer (i.e., doesn't overflow),
the carry flag is cleared; otherwise, it is set. If the result of a
subtract is greater than or equal to zero, the carry flag is set;
otherwise, it is cleared.
For the purposes of condition code setting, negation modifiers turn add
operations into subtracts and vice versa. If the operation is equivalent
to an add with both operands negated (-A-B), the carry and overflow flags
are both undefined.
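The integer flag rules can be sketched in Python for a plain add or
subtract of two operands. The function name int_flags is hypothetical,
32-bit operands are assumed, and the sketch does not model negation
modifiers or the undefined -A-B case.

```python
BITS = 32  # assumption: 32-bit integer operands
MASK = (1 << BITS) - 1


def to_signed(u):
    """Reinterpret an unsigned BITS-wide pattern as a signed integer."""
    return u - (1 << BITS) if u & (1 << (BITS - 1)) else u


def int_flags(a, b, subtract=False):
    """Flags for a + b or a - b on BITS-wide bit patterns."""
    a &= MASK
    b &= MASK
    sa, sb = to_signed(a), to_signed(b)
    if subtract:
        signed_result = sa - sb
        unsigned_result = a - b
        cf = unsigned_result >= 0      # set when no borrow occurs
    else:
        signed_result = sa + sb
        unsigned_result = a + b
        cf = unsigned_result > MASK    # set on unsigned overflow
    # OF: signed result not representable in BITS bits.
    of = not (-(1 << (BITS - 1)) <= signed_result <= (1 << (BITS - 1)) - 1)
    stored = unsigned_result & MASK    # value written to the result
    return {
        "SF": bool(stored >> (BITS - 1)),
        "ZF": stored == 0,
        "OF": of,
        "CF": cf,
    }
```

Under these assumptions, 0xFFFFFFFF + 1 sets ZF and CF but not OF, while
0x7FFFFFFF + 1 sets SF and OF but not CF.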
Section 2.X.4.4, Program Texture Access
Certain program instructions may access texture images, as described in
section 3.8. The coordinates, level-of-detail, and partial derivatives
used for performing the texture lookup are derived from values provided in
the program as described in the various sub-sections of Section 2.X.8.
These descriptions use the function
    result_t_vec
    TextureSample(float_vec coord, float lod, float_vec ddx,
                  float_vec ddy, int_vec offset);
which obtains a filtered texel value <tau> as described in Section 3.8.8
and returns a 4-component vector (R,G,B,A) according to the format
conversions specified in Table 3.21. The result vector is interpreted as
floating-point, signed integer, or unsigned integer, according to the data
type modifier of the instruction. If the internal format of the texture
does not match the instruction's data type modifier, the results of the
texture lookup are undefined.
(Note: For unextended OpenGL 2.0, all supported texture internal formats
store integer values but return floating-point results in the range [0,1]
on a texture lookup. The ARB_texture_float extension introduces
floating-point internal formats where components are both stored and
returned as floating-point values. The EXT_texture_integer extension
introduces formats that both store and return either signed or unsigned
integer values.)
<coord> is a four-component floating-point vector from which the (s,t,r)
texture coordinates used for the texture access, the layer used for array
textures, and the reference value used for depth comparisons (section
3.8.14) are extracted according to Table X.17. If the texture is a cube
map, (s,t,r) is projected to one of the six cube faces to produce a new
(s,t) vector according to Section 3.8.6. For array textures, the layer
used is derived by rounding the extracted floating-point component to the
nearest integer and clamping the result to the range [0,<n>-1], where <n>
is the number of layers in the texture.
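The layer selection for array textures can be sketched in Python. The
function name select_layer is hypothetical, and rounding exact halves
upward is an assumption; the text above says only "nearest integer".

```python
import math


def select_layer(coord_component, num_layers):
    """Round to the nearest integer, then clamp to [0, num_layers - 1]."""
    nearest = math.floor(coord_component + 0.5)  # assumption: ties round up
    return max(0, min(num_layers - 1, nearest))
```

For a texture with 8 layers, a coordinate of 2.4 selects layer 2, and a
coordinate of 9.7 is clamped to layer 7.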
<lod> specifies the level of detail parameter and replaces the value
computed in equation 3.18. <ddx> and <ddy> specify partial derivatives
(ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
coordinates, and may be used to derive footprint shapes for anisotropic
texture filtering.
<offset> is a constant 3-component signed integer vector specified
according to the <texOffset> grammar rule, which is added to the computed
<u>, <v>, and <w> texel locations prior to sampling. One, two, or three
components may be specified in the instruction; if fewer than three are
specified, the remaining offset components are zero. A limited range of
offset values are supported; the minimum and maximum <texOffset> values
are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively. A program will fail to load:
* if the texture target specified in the instruction is 1D, ARRAY1D,
SHADOW1D, or SHADOWARRAY1D, and the second or third component of the
offset vector is non-zero,
* if the texture target specified in the instruction is 2D, RECT,
ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
component of the offset vector is non-zero,
* if the texture target is CUBE or SHADOWCUBE, and any component of the
offset vector is non-zero -- texel offsets are not supported for cube
map or buffer textures, or
* if any component of the offset vector is less than
MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
MAX_PROGRAM_TEXEL_OFFSET_EXT.
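The load-time checks on texel offsets can be sketched in Python. The
function name check_tex_offset is hypothetical, and the values of
ASSUMED_MIN_OFFSET and ASSUMED_MAX_OFFSET are placeholders for the
implementation-dependent MIN_PROGRAM_TEXEL_OFFSET_EXT and
MAX_PROGRAM_TEXEL_OFFSET_EXT limits. A ValueError stands in for "the
program fails to load". Only the targets named in the list above are
checked, so the BUFFER target is not handled.

```python
# Placeholder limits; the real values are implementation-dependent.
ASSUMED_MIN_OFFSET = -8
ASSUMED_MAX_OFFSET = 7

_ONE_DIMENSIONAL = {"1D", "ARRAY1D", "SHADOW1D", "SHADOWARRAY1D"}
_TWO_DIMENSIONAL = {"2D", "RECT", "ARRAY2D", "SHADOW2D", "SHADOWRECT",
                    "SHADOWARRAY2D"}
_NO_OFFSETS = {"CUBE", "SHADOWCUBE"}


def check_tex_offset(target, offset):
    """Validate a 1- to 3-component offset; return it padded to 3."""
    if not 1 <= len(offset) <= 3:
        raise ValueError("offset must have one to three components")
    # Unspecified components are zero.
    padded = tuple(offset) + (0,) * (3 - len(offset))
    if target in _ONE_DIMENSIONAL and (padded[1] != 0 or padded[2] != 0):
        raise ValueError("second/third offset must be zero for " + target)
    if target in _TWO_DIMENSIONAL and padded[2] != 0:
        raise ValueError("third offset must be zero for " + target)
    if target in _NO_OFFSETS and any(c != 0 for c in padded):
        raise ValueError("offsets are not supported for " + target)
    if any(c < ASSUMED_MIN_OFFSET or c > ASSUMED_MAX_OFFSET for c in padded):
        raise ValueError("offset outside the supported range")
    return padded
```

For example, an offset of (1,-1) on a 2D target is accepted and padded to
(1,-1,0), while any non-zero offset on a CUBE target is rejected.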
(NOTE: Texel offsets are a new feature provided by this extension and are
described in more detail in edits to Section 3.8 below.)
The texture used by TextureSample() is one of the textures bound to the
texture image unit whose number is specified in the instruction according
to the <texImageUnit> grammar rule. The texture target accessed is
specified according to the <texTarget> grammar rule and Table X.17.
Fixed-function texture enables are always ignored when determining the
texture to access in a program.
                                         coordinates used
texTarget         Texture Type           s t r  layer  shadow
----------------  ---------------------  -----  -----  ------
1D                TEXTURE_1D             x - -    -      -
2D                TEXTURE_2D             x y -    -      -
3D                TEXTURE_3D             x y z    -      -
CUBE              TEXTURE_CUBE_MAP       x y z    -      -
RECT              TEXTURE_RECTANGLE_ARB  x y -    -      -
ARRAY1D           TEXTURE_1D_ARRAY_EXT   x - -    y      -
ARRAY2D           TEXTURE_2D_ARRAY_EXT   x y -    z      -
SHADOW1D          TEXTURE_1D             x - -    -      z
SHADOW2D          TEXTURE_2D             x y -    -      z
SHADOWRECT        TEXTURE_RECTANGLE_ARB  x y -    -      z
SHADOWCUBE        TEXTURE_CUBE_MAP       x y z    -      w
SHADOWARRAY1D     TEXTURE_1D_ARRAY_EXT   x - -    y      z
SHADOWARRAY2D     TEXTURE_2D_ARRAY_EXT   x y -    z      w
BUFFER            TEXTURE_BUFFER_EXT     <not supported>
Table X.17: Texture types accessed for each <texTarget>, and
coordinate mappings. The "SHADOW" and "ARRAY" targets are special
pseudo-targets described below. The "coordinates used" columns indicate
the input values used for each coordinate of the texture lookup, the
layer selector for array textures, and the reference value for texture
comparisons. Buffer textures are not supported by normal texture lookup
functions, but are supported by TXF and TXQ, described below.
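The coordinate mappings of Table X.17 can be expressed as a lookup table,
sketched here in Python. The names COORDINATE_MAP and extract_coordinates
are hypothetical. The BUFFER target is omitted because it is not
supported by normal texture lookups.

```python
COORDINATE_MAP = {
    # target:         (s,   t,    r,    layer, shadow)
    "1D":             ("x", None, None, None, None),
    "2D":             ("x", "y",  None, None, None),
    "3D":             ("x", "y",  "z",  None, None),
    "CUBE":           ("x", "y",  "z",  None, None),
    "RECT":           ("x", "y",  None, None, None),
    "ARRAY1D":        ("x", None, None, "y",  None),
    "ARRAY2D":        ("x", "y",  None, "z",  None),
    "SHADOW1D":       ("x", None, None, None, "z"),
    "SHADOW2D":       ("x", "y",  None, None, "z"),
    "SHADOWRECT":     ("x", "y",  None, None, "z"),
    "SHADOWCUBE":     ("x", "y",  "z",  None, "w"),
    "SHADOWARRAY1D":  ("x", None, None, "y",  "z"),
    "SHADOWARRAY2D":  ("x", "y",  None, "z",  "w"),
}


def extract_coordinates(target, coord):
    """Pick s, t, r, layer and shadow values out of a 4-component coord."""
    names = ("s", "t", "r", "layer", "shadow")
    mapping = COORDINATE_MAP[target]
    return {name: coord["xyzw".index(component)]
            for name, component in zip(names, mapping)
            if component is not None}
```

For a SHADOWARRAY2D access, x and y supply (s,t), z supplies the layer,
and w supplies the depth comparison reference value.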
Texture targets with "SHADOW" are used to access textures with a
DEPTH_COMPONENT base internal format using depth comparisons (Section
3.8.14). Results of a texture access are undefined:
* if a "SHADOW" target is used, and the corresponding texture has a base
internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE
of NONE, or
* if a non-"SHADOW" target is used, and the corresponding texture has a
base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
other than NONE.
If the texture being accessed is not complete (or cube complete for
cubemap textures), no texture access is performed and the result is
undefined.
A program will fail to load if it attempts to sample from multiple texture
targets (including the SHADOW pseudo-targets) on the same texture image
unit. For example, a program containing any two of the following
instructions will fail to load:
TEX out, coord, texture[0], 1D;
TEX out, coord, texture[0], 2D;
TEX out, coord, texture[0], ARRAY2D;
TEX out, coord, texture[0], SHADOW2D;
TEX out, coord, texture[0], 3D;
Additionally, multiple texture targets for a single texture image unit may
not be used at the same time by the GL. The error INVALID_OPERATION is
generated by Begin, RasterPos, or any command that performs an implicit
Begin if an enabled program accesses one texture target for a texture unit
while another enabled program or fixed-function fragment processing
accesses a different texture target for the same texture image unit.
Some texture instructions use standard methods to compute partial
derivatives and/or the level-of-detail used to perform texture accesses.
For fragment programs, the functions
float_vec ComputePartialsX(float_vec coord);
float_vec ComputePartialsY(float_vec coord);
compute approximate component-wise partial derivatives of the
floating-point vector <coord> relative to the X and Y coordinates,
respectively. For vertex and geometry programs, these functions always
return (0,0,0,0). The function
float ComputeLOD(float_vec ddx, float_vec ddy);
maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
equation 3.18.
The TXF instruction provides the ability to extract a single texel from a
specified texture image using the function
result_t_vec TexelFetch(int_vec coord, int_vec offset);
The extracted texel is converted to an (R,G,B,A) vector according to Table
3.21. The result vector is interpreted as floating-point, signed integer,
or unsigned integer, according to the data type modifier of the
instruction. If the internal format of the texture is not compatible with
the instruction's data type modifier, the extracted texel value is
undefined.
<coord> is a four-component signed integer vector used to identify the
single texel accessed. The (i,j,k) coordinates of the texel and the layer
used for array textures are extracted according to Table X.18. The level
of detail accessed is obtained by adding the w component of <coord> to the
base level (level_base). <offset> is a constant 3-component signed
integer vector added to the texel coordinates prior to the texel fetch as
described above. In addition to the restrictions described above,
non-zero offset components are also not supported for BUFFER targets.
The texture used by TexelFetch() is specified by the image unit and target
parameters provided in the instruction, as for TextureSample() above.
Single texel fetches cannot perform depth comparisons or access cubemaps.
If a program contains a TXF instruction specifying one of the "SHADOW" or
"CUBE" targets, it will fail to load.
                             coordinates used
texTarget         supported  i j k  layer  lod
----------------  ---------  -----  -----  ---
1D                   yes     x - -    -     w
2D                   yes     x y -    -     w
3D                   yes     x y z    -     w
CUBE                 no      - - -    -     -
RECT                 yes     x y -    -     w
ARRAY1D              yes     x - -    y     w
ARRAY2D              yes     x y -    z     w
SHADOW1D             no      - - -    -     -
SHADOW2D             no      - - -    -     -
SHADOWRECT           no      - - -    -     -
SHADOWCUBE           no      - - -    -     -
SHADOWARRAY1D        no      - - -    -     -
SHADOWARRAY2D        no      - - -    -     -
BUFFER               yes     x - -    -     -
Table X.18: Mappings of texel fetch coordinates to texel location.
Single-texel fetches do not support LOD clamping or any texture wrap mode,
and require a mipmapped minification filter to access any level of detail
other than the base level. The results of the texel fetch are undefined:
* if the computed LOD is less than the texture's base level (level_base)
or greater than the maximum level (level_max),
* if the computed LOD is not the texture's base level and the texture's
minification filter is NEAREST or LINEAR,
* if the layer specified for array textures is negative or greater than
the number of layers in the array texture,
* if the (i,j,k) coordinates refer to a border texel outside
the defined extents of the specified LOD, where
i < -b_s, j < -b_s, k < -b_s,
i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,
where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,
height, depth, and border size of the image, as in equations 3.15,
3.16, and 3.17, or
* if the texture being accessed is not complete (or cube complete for
cubemaps).
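The conditions above can be collected into a single predicate. The following C sketch is illustrative only: the TexInfo and LevelSize structures and the function name are hypothetical and are not part of this extension. It assumes the caller passes 0 for unused coordinates and 1 for unused image dimensions.

```c
#include <stdbool.h>

/* Hypothetical description of a texture, for illustration only. */
typedef struct {
    int  level_base;     /* base mipmap level (level_base) */
    int  level_max;      /* maximum mipmap level (level_max) */
    bool mipmapped_min;  /* true if the minification filter uses mipmaps */
    bool is_array;       /* true for ARRAY1D / ARRAY2D targets */
    int  num_layers;     /* number of layers (array textures only) */
    bool complete;       /* texture completeness */
} TexInfo;

/* Width, height, depth, and border size of the accessed level. */
typedef struct { int w_s, h_s, d_s, b_s; } LevelSize;

/* Returns true when none of the listed "undefined result" cases apply. */
bool texel_fetch_is_defined(const TexInfo *t, const LevelSize *sz,
                            int i, int j, int k, int layer, int lod_offset)
{
    /* The w component of <coord> is added to the base level. */
    int lod = t->level_base + lod_offset;

    if (!t->complete) return false;
    if (lod < t->level_base || lod > t->level_max) return false;
    if (lod != t->level_base && !t->mipmapped_min) return false;
    /* Mirrors the text literally: negative or greater than the layer count. */
    if (t->is_array && (layer < 0 || layer > t->num_layers)) return false;
    if (i < -sz->b_s || j < -sz->b_s || k < -sz->b_s) return false;
    if (i >= sz->w_s - sz->b_s || j >= sz->h_s - sz->b_s ||
        k >= sz->d_s - sz->b_s) return false;
    return true;
}
```

For an 8x8 texture with no border, a fetch at (7,7) on the base level is defined under this sketch, while a fetch at (8,0) is not.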
Section 2.X.5, Program Flow Control
In addition to basic arithmetic, logical, and texture instructions, a
number of flow control instructions are provided, which are described in
detail in Section 2.X.8. Programs can contain several types of
instruction blocks: IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and
subroutine blocks. IF/ELSE/ENDIF blocks are a set of instructions
beginning with an "IF" instruction, ending with an "ENDIF" instruction,
and possibly containing an optional "ELSE" instruction. REP/ENDREP blocks
are a set of instructions beginning with a "REP" instruction and ending
with an "ENDREP" instruction. Subroutine blocks begin with an instruction
label identifying the name of the subroutine and end just before the
next instruction label or the end of the program. Examples include the
following:
MOVC CC, R0;
IF GT.x;
MOV R0, R1; # executes if R0.x > 0
ELSE;
MOV R0, R2; # executes if R0.x <= 0
ENDIF;
REP repCount;
ADD R0, R0, R1;
ENDREP;
square: # subroutine to compute R0^2
MUL R0, R0, R0;
RET;
main:
MOV R0, 9.0;
CAL square; # compute 9.0^2 in R0
IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
inside subroutines. In all cases, each instruction block must be
terminated with the appropriate instruction (ENDIF for IF, ENDREP for
REP). Nested instruction blocks must be wholly contained within a block
-- if a REP instruction is found between an IF and ELSE instruction, the
corresponding ENDREP must also be present between the IF and ELSE.
Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
or inside other subroutines. A program will fail to load if any
instruction block is terminated by an incorrect instruction, is not
terminated before the block containing it, or contains an instruction
label.
IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
to execute. If the condition is true, all instructions between the IF and
ELSE are executed. If the condition is false, all instructions between
the ELSE and ENDIF are executed. The ELSE instruction is optional. If
the ELSE is omitted, all instructions between the IF and ENDIF are
executed if the condition is true, or skipped if the condition is false.
A limited amount of nesting is supported -- a program will fail to load if
an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
IF/ELSE/ENDIF blocks.
REP/ENDREP blocks are used to execute a sequence of instructions multiple
times. The REP instruction includes an optional scalar operand to specify
a loop count indicating the number of times the block of instructions
should be repeated. If the loop count is omitted, the contents of a
REP/ENDREP block will be repeated indefinitely until the loop is
explicitly terminated. A limited amount of nesting is supported -- a
program will fail to load if a REP instruction is nested inside
MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.
Within a REP/ENDREP block, the CONT instruction can be used to terminate
the current iteration of the loop by effectively jumping to the ENDREP
instruction. The BRK instruction can be used to terminate the entire loop
by effectively jumping to the instruction immediately following the ENDREP
instruction. If CONT and BRK instructions are found inside multiply
nested REP/ENDREP blocks, they apply to the innermost block. A program
will fail to load if it includes a CONT or BRK instruction that is not
contained inside a REP/ENDREP block.
A REP/ENDREP block without a specified loop count can result in an
infinite loop. To prevent obvious infinite loops, a program will fail to
load if it contains a REP/ENDREP block that contains neither a BRK
instruction at the current nesting level nor a RET instruction at any
nesting level.
Subroutines are supported via the CAL and RET instructions. A subroutine
block is identified by an instruction label, which can be any valid identifier
according to the <instLabel> grammar rule. The CAL instruction identifies
a subroutine name to call according to the <instTarget> grammar rule.
Instruction labels used in CAL instructions do not need to be defined in
the program text that precedes the instruction, but a program will fail to
load if it includes a CAL instruction that references an instruction label
that is not defined anywhere in the program. When a CAL instruction is
executed, it transfers control to the instruction immediately following
the specified instruction label. Subsequent instructions in that
subroutine are executed until a RET instruction is executed, or until
program execution reaches another instruction label or the end of the
program text. After the subroutine finishes, execution continues with the
instruction immediately following the CAL instruction. When a RET
instruction is issued, it will break out of any IF/ELSE/ENDIF or
REP/ENDREP blocks that contain it.
Subroutines may call other subroutines before completing, up to an
implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.
Subroutines may call any subroutine in the program, including themselves,
as long as the call depth limit is obeyed. Issuing a CAL instruction
while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has
undefined results, including possible program termination.
Several flow control instructions include condition code tests. The IF
instruction requires a condition test to determine what instructions are
executed. The CONT, BRK, CAL, and RET instructions have an optional
condition code test; if the test fails, the instructions are not executed.
Condition code tests are specified by the <ccTest> grammar rule. The test
is evaluated like the condition code write mask (section 2.X.4.3), and
passes if and only if any of the four components passes.
If an instruction label named "main" is specified, GPU program execution
begins with the instruction immediately following that label. Otherwise,
it begins with the first instruction of the program. Instructions are
executed in sequence until either a RET instruction is issued in the main
subroutine or the end of the program text is reached.
Section 2.X.6, Program Options
Programs may specify a number of options to indicate that one or more
extended language features are used by the program. All program options
used by the program must be declared at the beginning of the program
string. Each program option specified in a program string will modify the
syntactic or semantic rules used to interpret the program and the execution
environment used to execute the program. Features in program options
not declared by the program are ignored, even if the option is otherwise
supported by the GL. Each option declaration consists of two tokens: the
keyword "OPTION" and an identifier.
The set of available options depends on the program type, and is
enumerated in the specifications for each program type. Some program
types may not provide any options.
Section 2.X.7, Program Declarations
Programs may include a number of declaration statements to specify
characteristics of the program. Each declaration statement is followed by
one or more arguments, separated by commas.
The set of available declarations depends on the program type, and is
enumerated in the specifications for each program type. Some program
types may not provide declarations.
Section 2.X.8, Program Instruction Set
The following sections enumerate the set of instructions supported for GPU
programs.
Some instructions allow the use of one of the three basic data type
modifiers (floating point, signed integer, and unsigned integer). Unless
otherwise mentioned:
* the result and all of the operands will be interpreted according to
the specified data type, and
* if no data type modifier is specified, the instruction will operate as
though a floating-point modifier ("F") were specified.
Some instructions will override one or both of these rules.
Section 2.X.8.Z, ABS: Absolute Value
The ABS instruction performs a component-wise absolute value operation on
the single operand to yield a result vector.
tmp = VectorLoad(op0);
result.x = abs(tmp.x);
result.y = abs(tmp.y);
result.z = abs(tmp.z);
result.w = abs(tmp.w);
ABS supports all three data type modifiers. Taking the absolute value of
an unsigned integer is not a useful operation, but is not illegal.
Section 2.X.8.Z, ADD: Add
The ADD instruction performs a component-wise add of the two operands to
yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x + tmp1.x;
result.y = tmp0.y + tmp1.y;
result.z = tmp0.z + tmp1.z;
result.w = tmp0.w + tmp1.w;
ADD supports all three data type modifiers.
Section 2.X.8.Z, AND: Bitwise AND
The AND instruction performs a bitwise AND operation on the components of
the two source vectors to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x & tmp1.x;
result.y = tmp0.y & tmp1.y;
result.z = tmp0.z & tmp1.z;
result.w = tmp0.w & tmp1.w;
AND supports only signed and unsigned integer data type modifiers. If no
type modifier is specified, both operands and the result are treated as
signed integers.
Section 2.X.8.Z, BRK: Break out of Loop Instruction
The BRK instruction conditionally transfers control to the instruction
immediately following the next ENDREP instruction. A BRK instruction has
no effect if the condition code test evaluates to FALSE.
The following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at instruction following the next ENDREP;
}
Section 2.X.8.Z, CAL: Subroutine Call
The CAL instruction conditionally transfers control to the instruction
following the label specified in the instruction. It also pushes a
reference to the instruction immediately following the CAL instruction
onto the call stack, where execution will continue after executing the
matching RET instruction. The following pseudocode describes the
operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
// undefined results
} else {
callStack[callStackDepth] = nextInstruction;
callStackDepth++;
}
// continue execution at instruction following <instTarget>
} else {
// do nothing
}
In the pseudocode, <instTarget> is the label specified in the instruction
matching the <branchLabel> grammar rule, <callStackDepth> is the current
depth of the call stack, <callStack> is an array holding the call stack,
and <nextInstruction> is a reference to the instruction immediately
following the CAL instruction in the program string.
If the call stack overflows, the results of the CAL instruction are
undefined, and can result in immediate program termination.
An instruction label signifies the beginning of a new subroutine.
Subroutines may not nest or overlap. If a CAL instruction is executed and
subsequent program execution reaches an instruction label before a
corresponding RET instruction is executed, the subroutine call returns
immediately, as though an unconditional RET instruction were inserted
immediately before the instruction label.
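The call stack behavior described above can be modeled with a small C sketch. It is illustrative only: CALL_DEPTH_LIMIT is a hypothetical stand-in for the implementation-dependent value of MAX_PROGRAM_CALL_DEPTH_NV, and the type and function names are not part of this extension. Returning -1 from an empty stack is an assumption used here to model a return from the main subroutine.

```c
#include <stdbool.h>

/* Hypothetical limit standing in for MAX_PROGRAM_CALL_DEPTH_NV. */
#define CALL_DEPTH_LIMIT 4

typedef struct {
    int stack[CALL_DEPTH_LIMIT];  /* saved return locations */
    int depth;                    /* callStackDepth in the pseudocode */
} CallStack;

/* Models a CAL whose condition code test passed. Returns false on
 * overflow, where the specification leaves the results undefined. */
bool cal_push(CallStack *cs, int next_instruction)
{
    if (cs->depth >= CALL_DEPTH_LIMIT) return false;
    cs->stack[cs->depth++] = next_instruction;
    return true;
}

/* Models a RET, or reaching an instruction label inside a subroutine.
 * Returns the instruction index to resume at, or -1 if the stack is
 * empty (assumed here to mean the main subroutine has returned). */
int ret_pop(CallStack *cs)
{
    if (cs->depth == 0) return -1;
    return cs->stack[--cs->depth];
}
```

Because returns pop the most recent entry first, nested calls resume in the reverse of the order in which they were made.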
(Note: On previous vertex program extensions -- NV_vertex_program2 and
NV_vertex_program3 -- instruction labels were also used as targets for
branch (BRA) instructions. This unstructured branching functionality has
been replaced with the structured branching constructs found in this
instruction set.)
Section 2.X.8.Z, CEIL: Ceiling
The CEIL instruction loads a single vector operand and performs a
component-wise ceiling operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = ceil(tmp.x);
result.y = ceil(tmp.y);
result.z = ceil(tmp.z);
result.w = ceil(tmp.w);
The ceiling operation returns the nearest integer greater than or equal to
the operand. For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
ceil(+3.7) = +4.0.
CEIL supports all three data type modifiers. The single operand is always
treated as a floating-point vector, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, CMP: Compare
The CMP instruction performs a component-wise comparison of the first
operand against zero, and copies the values of the second or third
operands based on the results of the compare.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;
CMP supports all three data type modifiers. CMP with an unsigned data
type modifier is not a useful operation, but is not illegal.
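A minimal C sketch of the floating-point case follows. The Vec4 type and function names are hypothetical. Note that a zero component in the first operand selects the third operand, since the test is strictly "less than zero".

```c
typedef struct { float x, y, z, w; } Vec4;

/* Selects a when c is negative, otherwise b. */
static float select_if_negative(float c, float a, float b)
{
    return (c < 0.0f) ? a : b;
}

/* Component-wise CMP following the pseudocode above. */
Vec4 cmp(Vec4 op0, Vec4 op1, Vec4 op2)
{
    Vec4 r;
    r.x = select_if_negative(op0.x, op1.x, op2.x);
    r.y = select_if_negative(op0.y, op1.y, op2.y);
    r.z = select_if_negative(op0.z, op1.z, op2.z);
    r.w = select_if_negative(op0.w, op1.w, op2.w);
    return r;
}
```

With an unsigned first operand the "less than zero" test can never pass, so the pseudocode always selects the third operand. That is presumably why the unsigned form is described as not useful.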
Section 2.X.8.Z, CONT: Continue with Next Loop Iteration
The CONT instruction conditionally transfers control to the next ENDREP
instruction. A CONT instruction has no effect if the condition code test
evaluates to FALSE.
The following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at the next ENDREP;
}
Section 2.X.8.Z, COS: Cosine with Reduction to [-PI,PI]
The COS instruction approximates the trigonometric cosine of the angle
specified by the scalar operand and replicates it to all four components
of the result vector. The angle is specified in radians and does not have
to be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxCosine(tmp);
result.y = ApproxCosine(tmp);
result.z = ApproxCosine(tmp);
result.w = ApproxCosine(tmp);
COS supports only floating-point data type modifiers.
Section 2.X.8.Z, DDX: Partial Derivative Relative to X
The DDX instruction computes approximate partial derivatives of a vector
operand with respect to the X window coordinate, and is only available to
fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, DDY: Partial Derivative Relative to Y
The DDY instruction computes approximate partial derivatives of a vector
operand with respect to the Y window coordinate, and is only available to
fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, DIV: Divide Vector Components by Scalar
The DIV instruction performs a component-wise divide of the first vector
operand by the second scalar operand to produce a 4-component result
vector.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x / tmp1;
result.y = tmp0.y / tmp1;
result.z = tmp0.z / tmp1;
result.w = tmp0.w / tmp1;
DIV supports all three data type modifiers. For floating-point division,
this instruction is not guaranteed to produce results identical to a
RCP/MUL instruction sequence.
The results of a signed or unsigned integer division by zero are
undefined.
Section 2.X.8.Z, DP2: 2-Component Dot Product
The DP2 instruction computes a two-component dot product of the two
operands (using the first two components) and replicates the dot product
to all four components of the result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP2 supports only floating-point data type modifiers.
Section 2.X.8.Z, DP2A: 2-Component Dot Product with Scalar Add
The DP2A instruction computes a two-component dot product of the two
operands (using the first two components), adds the x component of the
third operand, and replicates the result to all four components of the
result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP2A supports only floating-point data type modifiers.
Section 2.X.8.Z, DP3: 3-Component Dot Product
The DP3 instruction computes a three-component dot product of the two
operands (using the x, y, and z components) and replicates the dot product
to all four components of the result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP3 supports only floating-point data type modifiers.
Section 2.X.8.Z, DP4: 4-Component Dot Product
The DP4 instruction computes a four-component dot product of the two
operands and replicates the dot product to all four components of the
result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP4 supports only floating-point data type modifiers.
Section 2.X.8.Z, DPH: Homogeneous Dot Product
The DPH instruction computes a three-component dot product of the two
operands (using the x, y, and z components), adds the w component of the
second operand, and replicates the sum to all four components of the
result vector. This is equivalent to a four-component dot product where
the w component of the first operand is forced to 1.0.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + tmp1.w;
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DPH supports only floating-point data type modifiers.
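The stated equivalence with a four-component dot product can be checked with a short C sketch. The Vec4 type and function names are hypothetical and only model the scalar dot product, not the replication to all four result components.

```c
typedef struct { float x, y, z, w; } Vec4;

/* Four-component dot product, as computed by DP4. */
float dp4(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w);
}

/* Homogeneous dot product, as computed by DPH: the w component of the
 * first operand is ignored and treated as 1.0. */
float dph(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + b.w;
}
```

Calling dph(a, b) gives the same value as dp4 on a copy of a whose w component has been set to 1.0, whatever a.w originally held.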
Section 2.X.8.Z, DST: Distance Vector
The DST instruction computes a distance vector from two specially-
formatted operands. The first operand should be of the form [NA, d^2,
d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
where NA values are not relevant to the calculation and d is a vector
length. If both vectors satisfy these conditions, the result vector will
be of the form [1.0, d, d^2, 1/d].
The exact behavior is specified in the following pseudo-code:
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = 1.0;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z;
result.w = tmp1.w;
Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
(using the same vector for both operands) and 1/d can be obtained from d^2
using the RSQ instruction.
This distance vector is useful for per-vertex light attenuation
calculations: a DP3 operation using the distance vector and an
attenuation constants vector as operands will yield the attenuation
factor.
DST supports only floating-point data type modifiers.
Section 2.X.8.Z, ELSE: Start of If Test Else Block
The ELSE instruction signifies the end of the "execute if true" portion of
an IF/ELSE/ENDIF block and the beginning of the "execute if false"
portion.
If the condition evaluated at the IF statement was TRUE, when a program
reaches the ELSE statement, it has completed the entire "execute if true"
portion of the IF/ELSE/ENDIF block. Execution will continue at the
corresponding ENDIF instruction.
If the condition evaluated at the IF statement was FALSE, program
execution would skip over the entire "execute if true" portion of the
IF/ELSE/ENDIF block, including the ELSE instruction.
Section 2.X.8.Z, EMIT: Emit Vertex
The EMIT instruction emits a new vertex to be added to the current output
primitive generated by a geometry program, and is only available to
geometry programs. See the NV_geometry_program4 specification for more
details.
Section 2.X.8.Z, ENDIF: End of If Test Block
The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block. It has
no other effect on program execution.
Section 2.X.8.Z, ENDPRIM: End of Primitive
A geometry program can emit multiple primitives in a single invocation.
The ENDPRIM instruction is used in a geometry program to signify the end
of the current primitive and the beginning of a new primitive of the same
type. It is only available to geometry programs. See the
NV_geometry_program4 specification for more details.
Section 2.X.8.Z, ENDREP: End of Repeat Block
The ENDREP instruction specifies the end of a REP block.
When used in conjunction with a REP instruction with a loop count,
ENDREP decrements the loop counter. If the decremented loop counter is
greater than zero, ENDREP transfers control to the instruction immediately
after the corresponding REP instruction. If the loop counter is less than
or equal to zero, execution continues at the instruction following the
ENDREP instruction. When used in conjunction with a REP instruction
without loop count, ENDREP always transfers control to the instruction
immediately after the REP instruction.
if (REP instruction includes a loop count) {
LoopCount--;
if (LoopCount > 0) {
continue execution at instruction following corresponding REP
instruction;
}
} else {
continue execution at instruction following corresponding REP
instruction;
}
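The ENDREP decision can be sketched in C as follows. The function names are hypothetical. The sketch assumes the loop body has already been entered; how REP itself treats its loop count on entry is not covered by this section.

```c
#include <stdbool.h>

/* Returns true when ENDREP transfers control back to the instruction
 * following the corresponding REP. For counted loops the counter is
 * decremented first, as in the pseudocode above. */
bool endrep_jumps_back(bool has_count, int *loop_count)
{
    if (!has_count) {
        return true;          /* uncounted loops always repeat */
    }
    (*loop_count)--;
    return *loop_count > 0;
}

/* Counts how many times the body of a counted loop runs once it has
 * been entered with the given loop count. */
int count_iterations(int count)
{
    int iterations = 0;
    int loop_count = count;
    do {
        iterations++;         /* loop body executes here */
    } while (endrep_jumps_back(true, &loop_count));
    return iterations;
}
```

Under this model, a loop entered with a count of 3 executes its body three times, and an uncounted loop leaves the counter untouched.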
Section 2.X.8.Z, EX2: Exponential Base 2
The EX2 instruction approximates 2 raised to the power of the scalar
operand and replicates the approximation to all four components of the
result vector.
tmp = ScalarLoad(op0);
result.x = Approx2ToX(tmp);
result.y = Approx2ToX(tmp);
result.z = Approx2ToX(tmp);
result.w = Approx2ToX(tmp);
EX2 supports only floating-point data type modifiers.
Section 2.X.8.Z, FLR: Floor
The FLR instruction loads a single vector operand and performs a
component-wise floor operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = floor(tmp.x);
result.y = floor(tmp.y);
result.z = floor(tmp.z);
result.w = floor(tmp.w);
The floor operation returns the nearest integer less than or equal to the
operand. For example floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7)
= +3.0.
FLR supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, FRC: Fraction
The FRC instruction extracts the fractional portion of each component of
the operand to generate a result vector. The fractional portion of a
component is defined as the result after subtracting off the floor of the
component (see FLR), and is always in the range [0.0, 1.0).
For negative values, the fractional portion is NOT the number written to
the right of the decimal point. The fractional portion of -1.7 is 0.3, not
0.7; the value 0.3 is produced by subtracting the floor of -1.7 (-2.0)
from -1.7.
tmp = VectorLoad(op0);
result.x = fraction(tmp.x);
result.y = fraction(tmp.y);
result.z = fraction(tmp.z);
result.w = fraction(tmp.w);
FRC supports only floating-point data type modifiers.
Section 2.X.8.Z, I2F: Integer to Float
The I2F instruction converts the components of an integer vector operand
to floating-point to produce a floating-point result vector.
tmp = VectorLoad(op0);
result.x = (float) tmp.x;
result.y = (float) tmp.y;
result.z = (float) tmp.z;
result.w = (float) tmp.w;
I2F supports only signed and unsigned integer data type modifiers. The
single operand is interpreted according to the data type modifier. If no
data type modifier is specified, the operand is treated as a signed
integer vector. The result is always written as a float.
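The effect of the data type modifier on the operand can be seen by converting the same 32-bit pattern under both interpretations. The following C sketch is illustrative only, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Interprets the 32-bit pattern as a signed integer, then converts it
 * to float. */
float i2f_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);  /* reinterpret without changing bits */
    return (float)s;
}

/* Interprets the 32-bit pattern as an unsigned integer, then converts
 * it to float. */
float i2f_unsigned(uint32_t bits)
{
    return (float)bits;
}
```

The pattern 0xFFFFFFFF converts to -1.0 when treated as signed. When treated as unsigned it converts to 4294967296.0, because 4294967295 is not exactly representable in single precision and rounds to the nearest float.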
Section 2.X.8.Z, IF: Start of If Test Block
The IF instruction performs a condition code test to determine what
instructions inside an IF/ELSE/ENDIF block are executed. If the test
passes, execution continues at the instruction immediately following the
IF instruction. If the test fails, IF transfers control to the
instruction immediately following the corresponding ELSE instruction (if
present) or the ENDIF instruction (if no ELSE is present).
Implementations may have a limited ability to nest IF blocks in any
subroutine. If the number of IF/ENDIF blocks nested inside each other is
MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to load.
// Evaluate the condition. If the condition is true, continue at the
// next instruction. Otherwise, continue at the instruction following the
// corresponding ELSE (if present) or ENDIF.
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at the next instruction;
} else if (IF block contains an ELSE statement) {
continue execution at instruction following corresponding ELSE;
} else {
continue execution at instruction following corresponding ENDIF;
}
(Note: Unlike the NV_fragment_program2 extension, there is no run-time
limit on the maximum overall depth of IF/ENDIF nesting. As long as each
individual subroutine of the program obeys the static nesting limits,
there will be no run-time errors in the program. With the
NV_fragment_program2 extension, a program could terminate abnormally if it
called a subroutine inside a very deeply nested set of IF/ENDIF blocks and
the called subroutine also contained deeply nested IF/ENDIF blocks. Such
an error could occur even if neither subroutine exceeded static limits.)
Section 2.X.8.Z, KIL: Kill Fragment
The KIL instruction conditionally kills a fragment, and is only available
to fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, LG2: Logarithm Base 2
The LG2 instruction approximates the base 2 logarithm of the scalar
operand and replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxLog2(tmp);
result.y = ApproxLog2(tmp);
result.z = ApproxLog2(tmp);
result.w = ApproxLog2(tmp);
If the scalar operand is zero or negative, the result is undefined.
LG2 supports only floating-point data type modifiers.
Section 2.X.8.Z, LIT: Compute Lighting Coefficients
The LIT instruction accelerates lighting computations by computing
lighting coefficients for ambient, diffuse, and specular light
contributions. The "x" component of the single operand is assumed to hold
a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
in Section 2.13.1). The "y" component of the operand is assumed to hold a
specular dot product (n dot h_i). The "w" component of the operand is
assumed to hold the specular exponent of the material (s_rm), and is
clamped to the range (-128, +128) exclusive.
The "x" component of the result vector receives the value that should be
multiplied by the ambient light/material product (always 1.0). The "y"
component of the result vector receives the value that should be
multiplied by the diffuse light/material product (n dot VP_pli). The "z"
component of the result vector receives the value that should be
multiplied by the specular light/material product (f_i * (n dot h_i) ^
s_rm). The "w" component of the result is the constant 1.0.
Negative diffuse and specular dot products are clamped to 0.0, as is done
in the standard per-vertex lighting operations. In addition, if the
diffuse dot product is zero or negative, the specular coefficient is
forced to zero.
tmp = VectorLoad(op0);
if (tmp.x < 0) tmp.x = 0;
if (tmp.y < 0) tmp.y = 0;
if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
result.x = 1.0;
result.y = tmp.x;
result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
result.w = 1.0;
Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.
LIT supports only floating-point data type modifiers.
Section 2.X.8.Z, LRP: Linear Interpolation
The LRP instruction performs a component-wise linear interpolation between
the second and third operands using the first operand as the blend factor.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
LRP supports only floating-point data type modifiers.
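As a quick illustration of the operand order, the following C sketch mirrors the LRP pseudocode. The vec4 type and the function names are assumptions made for this example only.

```c
/* Hypothetical 4-component vector type, used only for this sketch. */
typedef struct { float x, y, z, w; } vec4;

/* One component of LRP: t is the blend factor (first operand). */
static float lrp1(float t, float a, float b)
{
    return t * a + (1.0f - t) * b;
}

/* Component-wise LRP: t = op0, a = op1, b = op2. */
static vec4 lrp(vec4 t, vec4 a, vec4 b)
{
    vec4 r;
    r.x = lrp1(t.x, a.x, b.x);
    r.y = lrp1(t.y, a.y, b.y);
    r.z = lrp1(t.z, a.z, b.z);
    r.w = lrp1(t.w, a.w, b.w);
    return r;
}
```

Note that a blend factor of 1.0 selects the second operand and a blend factor of 0.0 selects the third.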
Section 2.X.8.Z, MAD: Multiply and Add
The MAD instruction performs a component-wise multiply of the first two
operands, and then does a component-wise add of the product to the third
operand to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + tmp2.x;
result.y = tmp0.y * tmp1.y + tmp2.y;
result.z = tmp0.z * tmp1.z + tmp2.z;
result.w = tmp0.w * tmp1.w + tmp2.w;
The multiplication and addition operations in this instruction are subject
to the same rules as described for the MUL and ADD instructions.
MAD supports all three data type modifiers.
Section 2.X.8.Z, MAX: Maximum
The MAX instruction computes component-wise maximums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;
MAX supports all three data type modifiers.
Section 2.X.8.Z, MIN: Minimum
The MIN instruction computes component-wise minimums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;
MIN supports all three data type modifiers.
Section 2.X.8.Z, MOD: Modulus
The MOD instruction computes the component-wise modulus of the first
vector operand with respect to the second scalar operand, producing a
4-component result vector.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x % tmp1;
result.y = tmp0.y % tmp1;
result.z = tmp0.z % tmp1;
result.w = tmp0.w % tmp1;
MOD supports both signed and unsigned integer data type modifiers. If no
data type modifier is specified, both operands and the result are treated
as signed integers.
A result component is undefined if the corresponding component of the
first operand is negative or if the second operand is less than or equal
to zero.
Section 2.X.8.Z, MOV: Move
The MOV instruction copies the value of the operand to yield a result
vector.
result = VectorLoad(op0);
MOV supports all three data type modifiers.
Section 2.X.8.Z, MUL: Multiply
The MUL instruction performs a component-wise multiply of the two operands
to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x * tmp1.x;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z * tmp1.z;
result.w = tmp0.w * tmp1.w;
MUL supports all three data type modifiers. The MUL instruction
additionally supports three special modifiers.
The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
multiplies of 24-bit quantities, respectively. The results of such
multiplies are undefined if either operand is outside the range
[-2^23,+2^23-1] for S24 or [0,2^24-1] for U24. If "S24" or "U24" is
specified, the data type is implied and normal data type modifiers may not
be provided.
The "HI" modifier specifies a 32-bit integer multiply that returns the 32
most significant bits of the 64-bit product. Integer multiplies without
the "HI" modifier normally return the least significant bits of the
product. If "HI" is specified, either of the "S" or "U" integer data type
modifiers must also be specified.
Note that if condition code updates are performed on integer multiplies,
the overflow or carry flags are always cleared, even if the product
overflowed. If it is necessary to determine if the results of an integer
multiply overflowed, the MUL.HI instruction may be used.
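The overflow check suggested above can be sketched in C for the unsigned case. The function names are hypothetical; the sketch simply computes the full 64-bit product on the host and splits it into the halves that a normal multiply and a "HI" multiply would return.

```c
#include <stdint.h>

/* Least significant 32 bits of the product, as a plain unsigned MUL
 * would return. */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Most significant 32 bits of the 64-bit product, as an unsigned MUL
 * with the "HI" modifier would return. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* An unsigned 32-bit multiply overflowed if and only if the high half
 * of the product is non-zero. */
static int mul_overflows_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0;
}
```

For signed multiplies the test is different: the product fits in 32 bits only if the high half equals the sign extension of the low half.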
Section 2.X.8.Z, NOT: Bitwise Not
The NOT instruction performs a component-wise bitwise NOT operation on the
source vector to produce a result vector.
tmp = VectorLoad(op0);
result.x = ~tmp.x;
result.y = ~tmp.y;
result.z = ~tmp.z;
result.w = ~tmp.w;
NOT supports only integer data type modifiers. If no type modifier is
specified, the operand and the result are treated as signed integers.
Section 2.X.8.Z, NRM: Normalize 3-Component Vector
The NRM instruction normalizes the vector given by the x, y, and z
components of the vector operand to produce the x, y, and z components of
the result vector. The w component of the result is undefined.
tmp = VectorLoad(op0);
scale = ApproxRSQRT(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
result.x = tmp.x * scale;
result.y = tmp.y * scale;
result.z = tmp.z * scale;
result.w = undefined;
NRM supports only floating-point data type modifiers.
Section 2.X.8.Z, OR: Bitwise Or
The OR instruction performs a bitwise OR operation on the components of
the two source vectors to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x | tmp1.x;
result.y = tmp0.y | tmp1.y;
result.z = tmp0.z | tmp1.z;
result.w = tmp0.w | tmp1.w;
OR supports only integer data type modifiers. If no type modifier is
specified, both operands and the result are treated as signed integers.
Section 2.X.8.Z, PK2H: Pack Two 16-bit Floats
The PK2H instruction converts the "x" and "y" components of the single
floating-point vector operand into 16-bit floating-point format, packs the
bit representation of these two floats into a 32-bit unsigned integer, and
replicates that value to all four components of the result vector. The
PK2H instruction can be reversed by the UP2H instruction below.
tmp0 = VectorLoad(op0);
/* result obtained by combining raw bits of tmp0.x, tmp0.y */
result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
PK2H supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer results, the bits can be
interpreted as described above. For floating-point result variables, the
packed results do not constitute a meaningful floating-point variable and
should only be used to feed future unpack instructions.
A program will fail to load if it contains a PK2H instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK2US: Pack Two Floats as Unsigned 16-bit
The PK2US instruction converts the "x" and "y" components of the single
floating-point vector operand into a packed pair of 16-bit unsigned
scalars. The scalars are represented in a bit pattern where all '0' bits
correspond to 0.0 and all '1' bits correspond to 1.0. The bit
representations of the two converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK2US instruction can be reversed by the UP2US
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
us.y = round(65535.0 * tmp0.y);
/* result obtained by combining raw bits of us. */
result.x = ((us.x) | (us.y << 16));
result.y = ((us.x) | (us.y << 16));
result.z = ((us.x) | (us.y << 16));
result.w = ((us.x) | (us.y << 16));
PK2US supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions.
A program will fail to load if it contains a PK2US instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK4B: Pack Four Floats as Signed 8-bit
The PK4B instruction converts the four components of the single
floating-point vector operand into 8-bit signed quantities. The signed
quantities are represented in a bit pattern where all '0' bits correspond
to -128/127 and all '1' bits correspond to +127/127. The bit
representations of the four converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK4B instruction can be reversed by the UP4B
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < -128/127) tmp0.x = -128/127;
if (tmp0.y < -128/127) tmp0.y = -128/127;
if (tmp0.z < -128/127) tmp0.z = -128/127;
if (tmp0.w < -128/127) tmp0.w = -128/127;
if (tmp0.x > +127/127) tmp0.x = +127/127;
if (tmp0.y > +127/127) tmp0.y = +127/127;
if (tmp0.z > +127/127) tmp0.z = +127/127;
if (tmp0.w > +127/127) tmp0.w = +127/127;
ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
ub.y = round(127.0 * tmp0.y + 128.0);
ub.z = round(127.0 * tmp0.z + 128.0);
ub.w = round(127.0 * tmp0.w + 128.0);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
PK4B supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions. A program will fail to load if it contains a PK4B
instruction that writes its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK4UB: Pack Four Floats as Unsigned 8-bit
The PK4UB instruction converts the four components of the single
floating-point vector operand into a packed grouping of 8-bit unsigned
scalars. The scalars are represented in a bit pattern where all '0' bits
corresponds to 0.0 and all '1' bits correspond to 1.0. The bit
representations of the four converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK4UB instruction can be reversed by the UP4UB
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
if (tmp0.z < 0.0) tmp0.z = 0.0;
if (tmp0.z > 1.0) tmp0.z = 1.0;
if (tmp0.w < 0.0) tmp0.w = 0.0;
if (tmp0.w > 1.0) tmp0.w = 1.0;
ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
ub.y = round(255.0 * tmp0.y);
ub.z = round(255.0 * tmp0.z);
ub.w = round(255.0 * tmp0.w);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
PK4UB supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions.
A program will fail to load if it contains a PK4UB instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, POW: Exponentiate
The POW instruction approximates the value of the first scalar operand
raised to the power of the second scalar operand and replicates it to all
four components of the result vector.
tmp0 = ScalarLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = ApproxPower(tmp0, tmp1);
result.y = ApproxPower(tmp0, tmp1);
result.z = ApproxPower(tmp0, tmp1);
result.w = ApproxPower(tmp0, tmp1);
The exponentiation approximation function may be implemented using the
base 2 exponentiation and logarithm approximation operations in the EX2
and LG2 instructions. In particular,
ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).
Note that a logarithm may be involved even for cases where the exponent is
an integer. This means that it may not be possible to exponentiate
correctly with a negative base. In contrast, it is possible in a
"normal" mathematical formulation to raise negative numbers to integral
powers (e.g., (-3)^2 == 9 and (-0.5)^-2 == 4).
POW supports only floating-point data type modifiers.
Section 2.X.8.Z, RCC: Reciprocal (Clamped)
The RCC instruction approximates the reciprocal of the scalar operand,
clamps the result to one of two ranges, and replicates the clamped result
to all four components of the result vector.
If the approximated reciprocal is greater than 0.0, the result is clamped
to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater
than zero, the result is clamped to the range [-2^+64, -2^-64].
tmp = ScalarLoad(op0);
result.x = ClampApproxReciprocal(tmp);
result.y = ClampApproxReciprocal(tmp);
result.z = ClampApproxReciprocal(tmp);
result.w = ClampApproxReciprocal(tmp);
RCC supports only floating-point data type modifiers.
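A minimal C sketch of the clamping rule follows. The function name is borrowed from the pseudocode, an exact IEEE 754 division stands in for the approximate reciprocal, and the treatment of zero and NaN operands simply follows host floating-point behavior, which is an assumption of this example.

```c
/* Reciprocal clamped to [2^-64, 2^+64] if positive, or to
 * [-2^+64, -2^-64] otherwise.  Hexadecimal float literals give the
 * exact powers of two. */
static float clamp_approx_reciprocal(float v)
{
    const float lo = 0x1p-64f;
    const float hi = 0x1p+64f;
    float r = 1.0f / v;

    if (r > 0.0f) {
        if (r < lo) r = lo;
        else if (r > hi) r = hi;
    } else {
        if (r < -hi) r = -hi;
        else if (r > -lo) r = -lo;
    }
    return r;
}
```

With this formulation the reciprocal of +0.0 (positive infinity on an IEEE 754 host) is clamped to 2^+64, so the result is always finite for non-NaN operands.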
Section 2.X.8.Z, RCP: Reciprocal
The RCP instruction approximates the reciprocal of the scalar operand and
replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxReciprocal(tmp);
result.y = ApproxReciprocal(tmp);
result.z = ApproxReciprocal(tmp);
result.w = ApproxReciprocal(tmp);
RCP supports only floating-point data type modifiers.
Section 2.X.8.Z, REP: Start of Repeat Block
The REP instruction begins a REP/ENDREP block. The REP instruction
supports an optional operand whose x component specifies the initial value
for the loop count. The loop count indicates the number of times the
instructions between the REP and corresponding ENDREP instruction will be
executed. If the initial value of the loop count is not positive, the
entire block is skipped and execution continues at the instruction
following the corresponding ENDREP instruction. If the loop count is
specified as a floating-point value, it is converted to the largest
integer less than or equal to the specified value (i.e., taking its
floor).
If no operand is provided to REP, the loop count is ignored and the
corresponding ENDREP instruction unconditionally transfers control to the
instruction immediately following the REP instruction. The only way to
exit such a loop is with the BRK instruction. To prevent obvious infinite
loops, a program that includes a REP/ENDREP block with no loop count will
fail to compile unless it contains either a BRK instruction at the current
nesting level or a RET instruction at any nesting level.
Implementations may have a limited ability to nest REP/ENDREP blocks. If
the number of REP/ENDREP blocks nested inside each other is
MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.
// Set up loop information for the new nesting level.
tmp = VectorLoad(op0);
LoopCount = floor(tmp.x);
if (LoopCount <= 0) {
continue execution after the corresponding ENDREP;
}
REP supports all three data type modifiers. The single operand is
interpreted according to the data type modifier.
(Note: Unlike the NV_fragment_program2 extension, REP blocks in this
extension support fully general looping; the specified loop count can be
computed in the program itself. Additionally, there is no run-time limit
on the maximum overall depth of REP/ENDREP nesting. As long as each
individual subroutine of the program obeys the static nesting limits,
there will be no run-time errors in the program. With the
NV_fragment_program2 extension, a program could terminate abnormally if it
called a subroutine inside a deeply nested set of REP/ENDREP blocks and
the called subroutine also contained deeply nested REP/ENDREP blocks.
Such an error could occur even if neither subroutine exceeded static
limits.)
Section 2.X.8.Z, RET: Subroutine Return
The RET instruction conditionally returns from a subroutine initiated by a
CAL instruction by popping an instruction reference off the top of the
call stack and transferring control to the referenced instruction. The
following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
if (callStackDepth <= 0) {
// terminate program
} else {
callStackDepth--;
instruction = callStack[callStackDepth];
}
// continue execution at <instruction>
} else {
// do nothing
}
In the pseudocode, <callStackDepth> is the depth of the call stack,
<callStack> is an array holding the call stack, and <instruction> is a
reference to an instruction previously pushed onto the call stack.
If the call stack is empty when RET executes, the program terminates
normally.
Section 2.X.8.Z, RFL: Reflection Vector
The RFL instruction computes the reflection of the second vector operand
(the "direction" vector) about the vector specified by the first vector
operand (the "axis" vector). Both operands are treated as 3D vectors (the
w components are ignored). The result vector is another 3D vector (the
"reflected direction" vector). The length of the result vector, ignoring
rounding errors, should equal that of the second operand.
axis = VectorLoad(op0);
direction = VectorLoad(op1);
tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
tmp.x = (axis.x * direction.x + axis.y * direction.y +
axis.z * direction.z);
tmp.x = 2.0 * tmp.x;
tmp.x = tmp.x / tmp.w;
result.x = tmp.x * axis.x - direction.x;
result.y = tmp.x * axis.y - direction.y;
result.z = tmp.x * axis.z - direction.z;
result.w = undefined;
RFL supports only floating-point data type modifiers.
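The RFL computation can be checked on a host CPU with the following C sketch. The vec3 type and the function names are assumptions made for this example; exact division replaces whatever reciprocal approximation an implementation might use.

```c
/* Hypothetical 3-component vector type, used only for this sketch. */
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Reflect "direction" about "axis".  The axis need not be normalized
 * because the projection is divided by the squared axis length. */
static vec3 rfl(vec3 axis, vec3 direction)
{
    float s = 2.0f * dot3(axis, direction) / dot3(axis, axis);
    vec3 r;
    r.x = s * axis.x - direction.x;
    r.y = s * axis.y - direction.y;
    r.z = s * axis.z - direction.z;
    return r;
}
```

Reflecting (1, 2, 3) about the axis (0, 0, 2) gives (-1, -2, 3), which has the same length as the input direction.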
Section 2.X.8.Z, ROUND: Round to Nearest Integer
The ROUND instruction loads a single vector operand and performs a
component-wise round operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = round(tmp.x);
result.y = round(tmp.y);
result.z = round(tmp.z);
result.w = round(tmp.w);
The round operation returns the nearest integer to the operand. If the
fractional portion of the operand is 0.5, round() selects the nearest even
integer. For example, round(-1.7) = -2.0, round(+1.0) = +1.0,
round(+3.7) = +4.0, and round(+2.5) = +2.0.
ROUND supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, RSQ: Reciprocal Square Root
The RSQ instruction approximates the reciprocal of the square root of the
scalar operand and replicates it to all four components of the result
vector.
tmp = ScalarLoad(op0);
result.x = ApproxRSQRT(tmp);
result.y = ApproxRSQRT(tmp);
result.z = ApproxRSQRT(tmp);
result.w = ApproxRSQRT(tmp);
If the operand is less than or equal to zero, the results of the
instruction are undefined.
RSQ supports only floating-point data type modifiers.
Note that this instruction differs from the RSQ instruction in
ARB_vertex_program in that it does not implicitly take the absolute value
of its operand. The |abs| operator can be used to achieve equivalent
semantics.
Section 2.X.8.Z, SAD: Sum of Absolute Differences
The SAD instruction performs a component-wise difference of the first two
integer operands (subtracting the second from the first), and then does a
component-wise add of the absolute value of the difference to the third
unsigned integer operand to yield an unsigned integer result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
result.w = abs(tmp0.w - tmp1.w) + tmp2.w;
SAD supports signed and unsigned integer data type modifiers. The first
two operands are interpreted according to the data type modifier. The
third operand and the result are always unsigned integers.
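One component of SAD can be sketched in C as follows. The function names are hypothetical, and the sketch assumes that the absolute difference is the mathematical one (computed without intermediate wraparound) and that the final addition wraps modulo 2^32; this section does not spell out either behavior.

```c
#include <stdint.h>

/* Signed variant: the first two operands are signed, the accumulator
 * and the result are unsigned. */
static uint32_t sad_s32(int32_t a, int32_t b, uint32_t acc)
{
    int64_t d = (int64_t)a - (int64_t)b;  /* 64 bits avoid overflow */
    if (d < 0) d = -d;
    return (uint32_t)d + acc;
}

/* Unsigned variant: all operands and the result are unsigned. */
static uint32_t sad_u32(uint32_t a, uint32_t b, uint32_t acc)
{
    uint32_t d = (a > b) ? a - b : b - a;
    return d + acc;
}
```

Chaining several such operations, each feeding its result in as the next accumulator, sums the absolute differences over a block of values.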
Section 2.X.8.Z, SCS: Sine/Cosine without Reduction
The SCS instruction approximates the trigonometric sine and cosine of the
angle specified by the scalar operand and places the cosine in the x
component and the sine in the y component of the result vector. The z and
w components of the result vector are undefined. The angle is specified
in radians and must be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxCosine(tmp);
result.y = ApproxSine(tmp);
result.z = undefined;
result.w = undefined;
If the scalar operand is not in the range [-PI,PI], the result vector is
undefined.
SCS supports only floating-point data type modifiers.
Section 2.X.8.Z, SEQ: Set on Equal
The SEQ instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;
SEQ supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
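The three TRUE/FALSE encodings can be sketched in C for a single component. The function names are hypothetical. The select_u() helper shows one possible use of the all-ones integer TRUE value as a bit mask; that usage is an illustration, not something this section prescribes.

```c
#include <stdint.h>

/* Floating-point SEQ: TRUE is 1.0, FALSE is 0.0. */
static float seq_f(float a, float b)
{
    return (a == b) ? 1.0f : 0.0f;
}

/* Signed integer SEQ: TRUE is -1, FALSE is 0. */
static int32_t seq_s(int32_t a, int32_t b)
{
    return (a == b) ? -1 : 0;
}

/* Unsigned integer SEQ: TRUE is all ones, FALSE is 0. */
static uint32_t seq_u(uint32_t a, uint32_t b)
{
    return (a == b) ? UINT32_MAX : 0u;
}

/* Pick a where mask is all ones and b where mask is zero. */
static uint32_t select_u(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) | (~mask & b);
}
```

The signed and unsigned TRUE values share the same bit pattern, so the result of either integer comparison can feed bitwise operations such as AND and OR.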
Section 2.X.8.Z, SFL: Set on False
The SFL instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to a FALSE
value (described below).
result.x = FALSE;
result.y = FALSE;
result.z = FALSE;
result.w = FALSE;
SFL supports all data type modifiers. For floating-point data types, the
FALSE value is 0.0. For signed and unsigned integer data types, the FALSE
value is zero.
Section 2.X.8.Z, SGE: Set on Greater Than or Equal
The SGE instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
greater than or equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;
SGE supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SGT: Set on Greater Than
The SGT instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
greater than that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;
SGT supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SHL: Shift Left
The SHL instruction performs a component-wise left shift of the bits of
the first operand by the value of the second scalar operand to produce a
result vector. The bits vacated during the shift operation are filled
with zeroes.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x << tmp1;
result.y = tmp0.y << tmp1;
result.z = tmp0.z << tmp1;
result.w = tmp0.w << tmp1;
The results of a shift operation ("<<") are undefined if the value of the
second operand is negative, or greater than or equal to the number of bits
in the first operand.
SHL supports both signed and unsigned integer data type modifiers. If no
modifier is provided, the operands and the result are treated as signed
integers.
Section 2.X.8.Z, SHR: Shift Right
The SHR instruction performs a component-wise right shift of the bits of
the first operand by the value of the second scalar operand to produce a
result vector. The bits vacated during the shift operation are filled with
zeroes if the operand is non-negative and ones otherwise.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x >> tmp1;
result.y = tmp0.y >> tmp1;
result.z = tmp0.z >> tmp1;
result.w = tmp0.w >> tmp1;
The results of a shift operation (">>") are undefined if the value of the
second operand is negative, or greater than or equal to the number of bits
in the first operand.
SHR supports both signed and unsigned integer data type modifiers. If no
modifier is provided, the operands and the result are treated as signed
integers.
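The difference between the signed and unsigned forms of SHR can be sketched in C. The function names are hypothetical. Because right-shifting a negative value is implementation-defined in C, the signed variant builds the sign-filling shift from operations with fully defined behavior. Shift counts must be in the range [0,31], matching the restriction stated above.

```c
#include <stdint.h>

/* Unsigned shift: vacated bits are always filled with zeroes. */
static uint32_t shr_u32(uint32_t v, unsigned n)
{
    return v >> n;
}

/* Signed shift: vacated bits are filled with copies of the sign bit. */
static int32_t shr_s32(int32_t v, unsigned n)
{
    if (v >= 0)
        return (int32_t)((uint32_t)v >> n);
    /* For negative v, ~v is non-negative.  Shifting it and complementing
     * again fills the vacated bits with ones. */
    return ~(int32_t)(~(uint32_t)v >> n);
}
```

Shifting -8 right by one bit gives -4 in the signed form, while the same bit pattern shifted as an unsigned value gives 0x7FFFFFFC.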
Section 2.X.8.Z, SIN: Sine with Reduction to [-PI,PI]
The SIN instruction approximates the trigonometric sine of the angle
specified by the scalar operand and replicates it to all four components
of the result vector. The angle is specified in radians and does not have
to be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxSine(tmp);
result.y = ApproxSine(tmp);
result.z = ApproxSine(tmp);
result.w = ApproxSine(tmp);
SIN supports only floating-point data type modifiers.
Section 2.X.8.Z, SLE: Set on Less Than or Equal
The SLE instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
less than or equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;
SLE supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SLT: Set on Less Than
The SLT instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
less than that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;
SLT supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer d