Name
NV_gpu_program4
Name Strings
GL_NV_gpu_program4
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Status
Shipping for GeForce 8 Series (November 2006)
Version
Last Modified Date: 09/11/2014
NVIDIA Revision: 11
Number
322
Dependencies
This extension is written against the OpenGL 2.0 specification.
OpenGL 2.0 is not required, but we expect all implementations of this
extension will also support OpenGL 2.0.
This extension is also written against the ARB_vertex_program
specification, which provides the basic mechanisms for the assembly
programming model used by this extension.
This extension serves as the basis for the NV_fragment_program4,
NV_geometry_program4, and NV_vertex_program4 extensions, which all build on
this extension to support fragment, geometry, and vertex programs,
respectively. If "GL_NV_gpu_program4" is found in the extension string,
all of these extensions are supported.
NV_parameter_buffer_object affects the definition of this extension.
ARB_texture_rectangle trivially affects the definition of this extension.
EXT_gpu_program_parameters trivially affects the definition of this
extension.
EXT_texture_integer trivially affects the definition of this extension.
EXT_texture_array trivially affects the definition of this extension.
EXT_texture_buffer_object trivially affects the definition of this
extension.
NV_primitive_restart trivially affects the definition of this extension.
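The extension string test described above can be sketched in C as follows.
This is a minimal illustration, not part of the extension: the helper name
has_extension is hypothetical, and the list it examines is assumed to be the
space-separated string an application obtains from GetString with EXTENSIONS
(glGetString(GL_EXTENSIONS) in the C binding).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any GL API): returns 1 if <name> appears
   as a complete, space-delimited token in <ext_list>, and 0 otherwise.
   A plain strstr() is not enough, because one extension name can be a
   prefix of another.  <name> is assumed to contain no spaces. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    if (ext_list == NULL || name == NULL || *name == '\0')
        return 0;
    len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;  /* keep searching after this partial match */
    }
    return 0;
}
```

An application that finds "GL_NV_gpu_program4" this way may, per the text
above, assume that the fragment, geometry, and vertex program extensions
built on it are also supported.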
Overview
This specification documents the common instruction set and basic
functionality provided by NVIDIA's 4th generation of assembly instruction
sets supporting programmable graphics pipeline stages.
The instruction set builds upon the basic framework provided by the
ARB_vertex_program and ARB_fragment_program extensions to expose
considerably more capable hardware. In addition to new capabilities for
vertex and fragment programs, this extension provides a new program type
(geometry programs) further described in the NV_geometry_program4
specification.
NV_gpu_program4 provides a unified instruction set -- all instruction set
features are available for all program types, except for a small number of
features that make sense only for a specific program type. It provides
fully capable signed and unsigned integer data types, along with a set of
arithmetic, logical, and data type conversion instructions capable of
operating on integers. It also provides a uniform set of structured
branching constructs (if tests, loops, and subroutines) that fully support
run-time condition testing.
This extension provides several new texture mapping capabilities. Shadow
cube maps are supported, where cube map faces can encode depth values.
Texture lookup instructions can include an immediate texel offset, which
can assist in advanced filtering. New instructions are provided to fetch
a single texel by address in a texture map (TXF) and query the size of a
specified texture level (TXQ).
By and large, vertex and fragment programs written to ARB_vertex_program
and ARB_fragment_program can be ported directly by simply changing the
program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
"!!NVfp4.0", and then modifying the code to take advantage of the expanded
feature set. There are a small number of areas where this extension is
not a functional superset of previous vertex program extensions, which are
documented in this specification.
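The header change described above can be sketched in C as follows. The
helper name port_program_header is hypothetical, and the sketch performs
only the header substitution; it does not detect the areas where this
extension is not a functional superset of earlier program extensions, so a
ported program still has to be checked against this specification.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies <src> into <dst>, replacing a leading
   "!!ARBvp1.0" or "!!ARBfp1.0" header with "!!NVvp4.0" or "!!NVfp4.0".
   Returns 0 on success, or -1 if the header is not recognized or <dst>
   is too small.  The program body is copied unchanged. */
static int port_program_header(const char *src, char *dst, size_t dst_size)
{
    static const char *const old_hdr[] = { "!!ARBvp1.0", "!!ARBfp1.0" };
    static const char *const new_hdr[] = { "!!NVvp4.0", "!!NVfp4.0" };
    size_t i;

    for (i = 0; i < 2; i++) {
        size_t old_len = strlen(old_hdr[i]);
        if (strncmp(src, old_hdr[i], old_len) == 0) {
            int n = snprintf(dst, dst_size, "%s%s", new_hdr[i], src + old_len);
            return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
        }
    }
    return -1;
}
```

The resulting string would then be loaded with ProgramStringARB in the same
way as the original program.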
New Procedures and Functions
void ProgramLocalParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramLocalParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramLocalParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramLocalParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramLocalParameterI4uivNV(enum target, uint index,
const uint *params);
void ProgramLocalParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
void ProgramEnvParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramEnvParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramEnvParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramEnvParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramEnvParameterI4uivNV(enum target, uint index,
const uint *params);
void ProgramEnvParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
void GetProgramLocalParameterIivNV(enum target, uint index,
int *params);
void GetProgramLocalParameterIuivNV(enum target, uint index,
uint *params);
void GetProgramEnvParameterIivNV(enum target, uint index,
int *params);
void GetProgramEnvParameterIuivNV(enum target, uint index,
uint *params);
New Tokens
Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
GetFloatv, and GetDoublev:
MIN_PROGRAM_TEXEL_OFFSET_EXT 0x8904
MAX_PROGRAM_TEXEL_OFFSET_EXT 0x8905
(note: these tokens are shared with the EXT_gpu_shader4 extension.)
Accepted by the <pname> parameter of GetProgramivARB:
PROGRAM_ATTRIB_COMPONENTS_NV 0x8906
PROGRAM_RESULT_COMPONENTS_NV 0x8907
MAX_PROGRAM_ATTRIB_COMPONENTS_NV 0x8908
MAX_PROGRAM_RESULT_COMPONENTS_NV 0x8909
MAX_PROGRAM_GENERIC_ATTRIBS_NV 0x8DA5
MAX_PROGRAM_GENERIC_RESULTS_NV 0x8DA6
Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)
(Modify "Section 2.14.1" of the ARB_vertex_program specification,
describing program parameters.)
Each program object has an associated array of program local parameters.
Program local parameters are four-component vectors whose components can
hold floating-point, signed integer, or unsigned integer values. The data
type of each local parameter is established when the parameter's values
are assigned. If a program attempts to read a local parameter using a
data type other than the one used when the parameter is set, the values
returned are undefined. ... The commands
void ProgramLocalParameter4fARB(enum target, uint index,
float x, float y, float z, float w);
void ProgramLocalParameter4fvARB(enum target, uint index,
const float *params);
void ProgramLocalParameter4dARB(enum target, uint index,
double x, double y, double z, double w);
void ProgramLocalParameter4dvARB(enum target, uint index,
const double *params);
void ProgramLocalParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramLocalParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramLocalParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramLocalParameterI4uivNV(enum target, uint index,
const uint *params);
update the values of the program local parameter numbered <index>
belonging to the program object currently bound to <target>. For the
non-vector versions of these commands, the four components of the
parameter are updated with the values of <x>, <y>, <z>, and <w>,
respectively. For the vector versions, the components of the parameter
are updated with the array of four values pointed to by <params>. The
error INVALID_VALUE is generated if <index> is greater than or equal to
the number of program local parameters supported by <target>.
The commands
void ProgramLocalParameters4fvNV(enum target, uint index,
sizei count, const float *params);
void ProgramLocalParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramLocalParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
update the values of the program local parameters numbered <index> through
<index> + <count> - 1 with the array of 4 * <count> values pointed to by
<params>. The error INVALID_VALUE is generated if the sum of <index> and
<count> is greater than the number of program local parameters supported
by <target>.
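The range rule for the multi-parameter update commands can be modeled in C
as follows. This is a sketch of the rule only, not of any implementation:
the helper name param_range_ok is hypothetical, and <max_params> stands for
the number of program local parameters supported by the target, which an
application would presumably query with GetProgramivARB and
MAX_PROGRAM_LOCAL_PARAMETERS_ARB.

```c
#include <stdint.h>

/* Hypothetical model of the INVALID_VALUE rule for the multi-parameter
   update commands: updating parameters <index> .. <index>+<count>-1 is in
   range only if <index> + <count> does not exceed <max_params>.  The sum
   is formed in 64 bits so that it cannot wrap around.
   Rejecting a negative <count> is an assumption of this sketch; the text
   above does not discuss that case. */
static int param_range_ok(uint32_t index, int32_t count, uint32_t max_params)
{
    if (count < 0)
        return 0;
    return (uint64_t)index + (uint64_t)count <= (uint64_t)max_params;
}
```

Note that the comparison is "greater than", not "greater than or equal to":
an update that ends exactly at the last supported parameter is legal.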
When a program local parameter is updated, the data type of its components
is assigned according to the data type of the provided values. If values
provided are of type "float" or "double", the components of the parameter
are floating-point. If the values provided are of type "int", the
components of the parameter are signed integers. If the values provided
are of type "uint", the components of the parameter are unsigned integers.
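The typing rule above can be pictured with a small C model. The types and
function names below (param_type, program_param, param_set_f, and so on)
are hypothetical and purely illustrative; they do not describe how an
implementation stores parameters. The tag records the data type
established by the most recent update, and a read is well defined only when
the program reads using that same type.

```c
#include <stdint.h>

/* Hypothetical tag for the data type established by the last update. */
typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } param_type;

/* Hypothetical model of one four-component program parameter. */
typedef struct {
    param_type type;
    union {
        float    f[4];
        int32_t  i[4];
        uint32_t u[4];
    } v;
} program_param;

/* Models the float and double commands: components become floating-point. */
static void param_set_f(program_param *p, float x, float y, float z, float w)
{
    p->type = PARAM_FLOAT;
    p->v.f[0] = x; p->v.f[1] = y; p->v.f[2] = z; p->v.f[3] = w;
}

/* Models the "I4i" commands: components become signed integers. */
static void param_set_i(program_param *p, int32_t x, int32_t y, int32_t z, int32_t w)
{
    p->type = PARAM_INT;
    p->v.i[0] = x; p->v.i[1] = y; p->v.i[2] = z; p->v.i[3] = w;
}

/* Models the "I4ui" commands: components become unsigned integers. */
static void param_set_u(program_param *p, uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    p->type = PARAM_UINT;
    p->v.u[0] = x; p->v.u[1] = y; p->v.u[2] = z; p->v.u[3] = w;
}

/* Returns 1 if reading the parameter as <read_as> yields defined values. */
static int param_read_defined(const program_param *p, param_type read_as)
{
    return p->type == read_as;
}
```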
Additionally, each program target has an associated array of program
environment parameters. Unlike program local parameters, program
environment parameters are shared by all program objects of a given
target. Program environment parameters are four-component vectors whose
components can hold floating-point, signed integer, or unsigned integer
values. The data type of each environment parameter is established when
the parameter's values are assigned. If a program attempts to read an
environment parameter using a data type other than the one used when the
parameter is set, the values returned are undefined. ... The commands
void ProgramEnvParameter4fARB(enum target, uint index,
float x, float y, float z, float w);
void ProgramEnvParameter4fvARB(enum target, uint index,
const float *params);
void ProgramEnvParameter4dARB(enum target, uint index,
double x, double y, double z, double w);
void ProgramEnvParameter4dvARB(enum target, uint index,
const double *params);
void ProgramEnvParameterI4iNV(enum target, uint index,
int x, int y, int z, int w);
void ProgramEnvParameterI4ivNV(enum target, uint index,
const int *params);
void ProgramEnvParameterI4uiNV(enum target, uint index,
uint x, uint y, uint z, uint w);
void ProgramEnvParameterI4uivNV(enum target, uint index,
const uint *params);
update the values of the program environment parameter numbered <index>
for the given program target <target>. For the non-vector versions of
these commands, the four components of the parameter are updated with the
values of <x>, <y>, <z>, and <w>, respectively. For the vector versions,
the four components of the parameter are updated with the array of four
values pointed to by <params>. The error INVALID_VALUE is generated if
<index> is greater than or equal to the number of program environment
parameters supported by <target>.
The commands
void ProgramEnvParameters4fvNV(enum target, uint index,
sizei count, const float *params);
void ProgramEnvParametersI4ivNV(enum target, uint index,
sizei count, const int *params);
void ProgramEnvParametersI4uivNV(enum target, uint index,
sizei count, const uint *params);
update the values of the program environment parameters numbered <index>
through <index> + <count> - 1 with the array of 4 * <count> values pointed
to by <params>. The error INVALID_VALUE is generated if the sum of
<index> and <count> is greater than the number of program environment
parameters supported by <target>.
When a program environment parameter is updated, the data type of its
components is assigned according to the data type of the provided values.
If values provided are of type "float" or "double", the components of the
parameter are floating-point. If the values provided are of type "int",
the components of the parameter are signed integers. If the values
provided are of type "uint", the components of the parameter are unsigned
integers.
...
Insert New Section 2.X between Sections 2.Y and 2.Z:
Section 2.X, GPU Programs
The GL provides a number of different program targets that allow an
application to either replace certain fixed-function pipeline stages with
a fully programmable model or use a program to control aspects of the GL
pipeline that previously had only hard-wired behavior.
A common base instruction set is available for all program types,
providing both integer and floating-point operations. Structured
branching operations and subroutine calls are available. Texture
mapping (loading data from external images) is supported for all
program types. The main differences between the different program
types are the set of available inputs and outputs, which are program type-
specific, and a few instructions that are meaningful for only a subset
of program types.
Section 2.X.2, Program Grammar
GPU program strings are specified as an array of ASCII characters
containing the program text. When a GPU program is loaded by a call to
ProgramStringARB, the program string is parsed into a set of tokens
possibly separated by whitespace. Spaces, tabs, newlines, carriage
returns, and comments are considered whitespace. Comments begin with the
character "#" and are terminated by a newline, a carriage return, or the
end of the program array.
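The whitespace and comment rules above can be sketched as a scanner step in
C. The function name skip_whitespace is hypothetical, and the sketch
assumes a NUL-terminated copy of the program text, whereas ProgramStringARB
receives an explicit length.

```c
/* Hypothetical scanner step: returns a pointer to the first character at or
   after <s> that is neither whitespace nor part of a comment.  Assumes <s>
   is NUL-terminated; the NUL stands in for the end of the program array. */
static const char *skip_whitespace(const char *s)
{
    for (;;) {
        /* Spaces, tabs, newlines, and carriage returns are whitespace. */
        while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
            s++;
        if (*s != '#')
            return s;
        /* A comment runs to a newline, a carriage return, or the end. */
        while (*s != '\0' && *s != '\n' && *s != '\r')
            s++;
    }
}
```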
The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
sequences for GPU programs. The set of valid tokens can be inferred
from the grammar. A line containing "/* empty */" represents an empty
string and is used to indicate optional rules. A program is invalid if it
contains any tokens or characters not defined in this specification.
Note that this extension is not a standalone extension and a small number
of grammar rules are left to be defined in the extensions defining the
specific vertex, fragment, and geometry program types.
<program> ::= <optionSequence> <declSequence>
<statementSequence> "END"
<optionSequence> ::= <option> <optionSequence>
| /* empty */
<option> ::= "OPTION" <identifier> ";"
<declSequence> ::= /* empty */
<statementSequence> ::= <statement> <statementSequence>
| /* empty */
<statement> ::= <instruction> ";"
| <namingStatement> ";"
| <instLabel> ":"
<instruction> ::= <ALUInstruction>
| <TexInstruction>
| <FlowInstruction>
<ALUInstruction> ::= <VECTORop_instruction>
| <SCALARop_instruction>
| <BINSCop_instruction>
| <BINop_instruction>
| <VECSCAop_instruction>
| <TRIop_instruction>
| <SWZop_instruction>
<TexInstruction> ::= <TEXop_instruction>
| <TXDop_instruction>
<FlowInstruction> ::= <BRAop_instruction>
| <FLOWCCop_instruction>
| <IFop_instruction>
| <REPop_instruction>
| <ENDFLOWop_instruction>
<VECTORop_instruction> ::= <VECTORop> <opModifiers> <instResult> ","
<instOperandV>
<VECTORop> ::= "ABS"
| "CEIL"
| "FLR"
| "FRC"
| "I2F"
| "LIT"
| "MOV"
| "NOT"
| "NRM"
| "PK2H"
| "PK2US"
| "PK4B"
| "PK4UB"
| "ROUND"
| "SSG"
| "TRUNC"
<SCALARop_instruction> ::= <SCALARop> <opModifiers> <instResult> ","
<instOperandS>
<SCALARop> ::= "COS"
| "EX2"
| "LG2"
| "RCC"
| "RCP"
| "RSQ"
| "SCS"
| "SIN"
| "UP2H"
| "UP2US"
| "UP4B"
| "UP4UB"
<BINSCop_instruction> ::= <BINSCop> <opModifiers> <instResult> ","
<instOperandS> "," <instOperandS>
<BINSCop> ::= "POW"
<VECSCAop_instruction> ::= <VECSCAop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandS>
<VECSCAop> ::= "DIV"
| "SHL"
| "SHR"
| "MOD"
<BINop_instruction> ::= <BINop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV>
<BINop> ::= "ADD"
| "AND"
| "DP3"
| "DP4"
| "DPH"
| "DST"
| "MAX"
| "MIN"
| "MUL"
| "OR"
| "RFL"
| "SEQ"
| "SFL"
| "SGE"
| "SGT"
| "SLE"
| "SLT"
| "SNE"
| "STR"
| "SUB"
| "XPD"
| "DP2"
| "XOR"
<TRIop_instruction> ::= <TRIop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV> ","
<instOperandV>
<TRIop> ::= "CMP"
| "DP2A"
| "LRP"
| "MAD"
| "SAD"
| "X2D"
<SWZop_instruction> ::= <SWZop> <opModifiers> <instResult> ","
<instOperandVNS> "," <extendedSwizzle>
<SWZop> ::= "SWZ"
<TEXop_instruction> ::= <TEXop> <opModifiers> <instResult> ","
<instOperandV> "," <texAccess>
<TEXop> ::= "TEX"
| "TXB"
| "TXF"
| "TXL"
| "TXP"
| "TXQ"
<TXDop_instruction> ::= <TXDop> <opModifiers> <instResult> ","
<instOperandV> "," <instOperandV> ","
<instOperandV> "," <texAccess>
<TXDop> ::= "TXD"
<BRAop_instruction> ::= <BRAop> <opModifiers> <instTarget>
<optBranchCond>
<BRAop> ::= "CAL"
<FLOWCCop_instruction> ::= <FLOWCCop> <opModifiers> <optBranchCond>
<FLOWCCop> ::= "RET"
| "BRK"
| "CONT"
<IFop_instruction> ::= <IFop> <opModifiers> <ccTest>
<IFop> ::= "IF"
<REPop_instruction> ::= <REPop> <opModifiers> <instOperandV>
| <REPop> <opModifiers>
<REPop> ::= "REP"
<ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>
<ENDFLOWop> ::= "ELSE"
| "ENDIF"
| "ENDREP"
<opModifiers> ::= <opModifierItem> <opModifiers>
| /* empty */
<opModifierItem> ::= "." <opModifier>
<opModifier> ::= "F"
| "U"
| "S"
| "CC"
| "CC0"
| "CC1"
| "SAT"
| "SSAT"
| "NTC"
| "S24"
| "U24"
| "HI"
<texAccess> ::= <texImageUnit> "," <texTarget>
| <texImageUnit> "," <texTarget> "," <texOffset>
<texImageUnit> ::= "texture" <optArrayMemAbs>
<texTarget> ::= "1D"
| "2D"
| "3D"
| "CUBE"
| "RECT"
| "SHADOW1D"
| "SHADOW2D"
| "SHADOWRECT"
| "ARRAY1D"
| "ARRAY2D"
| "SHADOWCUBE"
| "SHADOWARRAY1D"
| "SHADOWARRAY2D"
<texOffset> ::= "(" <texOffsetComp> ")"
| "(" <texOffsetComp> "," <texOffsetComp> ")"
| "(" <texOffsetComp> "," <texOffsetComp> ","
<texOffsetComp> ")"
<texOffsetComp> ::= <optSign> <int>
<optBranchCond> ::= /* empty */
| <ccMask>
<instOperandV> ::= <instOperandAbsV>
| <instOperandBaseV>
<instOperandAbsV> ::= <operandAbsNeg> "|" <instOperandBaseV> "|"
<instOperandBaseV> ::= <operandNeg> <attribUseV>
| <operandNeg> <tempUseV>
| <operandNeg> <paramUseV>
| <operandNeg> <bufferUseV>
<instOperandS> ::= <instOperandAbsS>
| <instOperandBaseS>
<instOperandAbsS> ::= <operandAbsNeg> "|" <instOperandBaseS> "|"
<instOperandBaseS> ::= <operandNeg> <attribUseS>
| <operandNeg> <tempUseS>
| <operandNeg> <paramUseS>
| <operandNeg> <bufferUseS>
<instOperandVNS> ::= <attribUseVNS>
| <tempUseVNS>
| <paramUseVNS>
| <bufferUseVNS>
<operandAbsNeg> ::= <optSign>
<operandNeg> ::= <optSign>
<instResult> ::= <instResultCC>
| <instResultBase>
<instResultCC> ::= <instResultBase> <ccMask>
<instResultBase> ::= <tempUseW>
| <resultUseW>
<namingStatement> ::= <varMods> <ATTRIB_statement>
| <varMods> <PARAM_statement>
| <varMods> <TEMP_statement>
| <varMods> <OUTPUT_statement>
| <varMods> <BUFFER_statement>
| <ALIAS_statement>
<ATTRIB_statement> ::= "ATTRIB" <establishName> "=" <attribUseD>
<PARAM_statement> ::= <PARAM_singleStmt>
| <PARAM_multipleStmt>
<PARAM_singleStmt> ::= "PARAM" <establishName> <paramSingleInit>
<PARAM_multipleStmt> ::= "PARAM" <establishName> <optArraySize>
<paramMultipleInit>
<paramSingleInit> ::= "=" <paramUseDB>
<paramMultipleInit> ::= "=" "{" <paramMultInitList> "}"
<paramMultInitList> ::= <paramUseDM>
| <paramUseDM> "," <paramMultInitList>
<TEMP_statement> ::= "TEMP" <varNameList>
<OUTPUT_statement> ::= "OUTPUT" <establishName> "=" <resultUseD>
<varMods> ::= <varModifier> <varMods>
| /* empty */
<varModifier> ::= "SHORT"
| "LONG"
| "INT"
| "UINT"
| "FLOAT"
<ALIAS_statement> ::= "ALIAS" <establishName> "=" <establishedName>
<BUFFER_statement> ::= <bufferDeclType> <establishName>
<bufferSingleInit>
| <bufferDeclType> <establishName>
<optArraySize> <bufferMultInit>
<bufferDeclType> ::= "BUFFER"
| "BUFFER4"
<bufferSingleInit> ::= "=" <bufferUseDB>
<bufferMultInit> ::= "=" "{" <bufferMultInitList> "}"
<bufferMultInitList> ::= <bufferUseDM>
| <bufferUseDM> "," <bufferMultInitList>
<varNameList> ::= <establishName>
| <establishName> "," <varNameList>
<attribUseV> ::= <attribBasic> <swizzleSuffix>
| <attribVarName> <swizzleSuffix>
| <attribVarName> <arrayMem> <swizzleSuffix>
| <attribColor> <swizzleSuffix>
| <attribColor> "." <colorType> <swizzleSuffix>
<attribUseS> ::= <attribBasic> <scalarSuffix>
| <attribVarName> <scalarSuffix>
| <attribVarName> <arrayMem> <scalarSuffix>
| <attribColor> <scalarSuffix>
| <attribColor> "." <colorType> <scalarSuffix>
<attribUseVNS> ::= <attribBasic>
| <attribVarName>
| <attribVarName> <arrayMem>
| <attribColor>
| <attribColor> "." <colorType>
<attribUseD> ::= <attribBasic>
| <attribColor>
| <attribColor> "." <colorType>
| <attribMulti>
<paramUseV> ::= <paramVarName> <optArrayMem> <swizzleSuffix>
| <stateSingleItem> <swizzleSuffix>
| <programSingleItem> <swizzleSuffix>
| <constantVector> <swizzleSuffix>
| <constantScalar>
<paramUseS> ::= <paramVarName> <optArrayMem> <scalarSuffix>
| <stateSingleItem> <scalarSuffix>
| <programSingleItem> <scalarSuffix>
| <constantVector> <scalarSuffix>
| <constantScalar>
<paramUseVNS> ::= <paramVarName> <optArrayMem>
| <stateSingleItem>
| <programSingleItem>
| <constantVector>
| <constantScalar>
<paramUseDB> ::= <stateSingleItem>
| <programSingleItem>
| <constantVector>
| <signedConstantScalar>
<paramUseDM> ::= <stateMultipleItem>
| <programMultipleItem>
| <constantVector>
| <signedConstantScalar>
<stateMultipleItem> ::= <stateSingleItem>
| "state" "." <stateMatrixRows>
<stateSingleItem> ::= "state" "." <stateMaterialItem>
| "state" "." <stateLightItem>
| "state" "." <stateLightModelItem>
| "state" "." <stateLightProdItem>
| "state" "." <stateFogItem>
| "state" "." <stateMatrixRow>
| "state" "." <stateTexGenItem>
| "state" "." <stateClipPlaneItem>
| "state" "." <statePointItem>
| "state" "." <stateTexEnvItem>
| "state" "." <stateDepthItem>
<stateMaterialItem> ::= "material" "." <stateMatProperty>
| "material" "." <faceType> "."
<stateMatProperty>
<stateMatProperty> ::= "ambient"
| "diffuse"
| "specular"
| "emission"
| "shininess"
<stateLightItem> ::= "light" <arrayMemAbs> "." <stateLightProperty>
<stateLightProperty> ::= "ambient"
| "diffuse"
| "specular"
| "position"
| "attenuation"
| "spot" "." <stateSpotProperty>
| "half"
<stateSpotProperty> ::= "direction"
<stateLightModelItem> ::= "lightmodel" "." <stateLModProperty>
<stateLModProperty> ::= "ambient"
| "scenecolor"
| <faceType> "." "scenecolor"
<stateLightProdItem> ::= "lightprod" <arrayMemAbs> "."
<stateLProdProperty>
| "lightprod" <arrayMemAbs> "." <faceType> "."
<stateLProdProperty>
<stateLProdProperty> ::= "ambient"
| "diffuse"
| "specular"
<stateFogItem> ::= "fog" "." <stateFogProperty>
<stateFogProperty> ::= "color"
| "params"
<stateMatrixRows> ::= <stateMatrixItem>
| <stateMatrixItem> "." <stateMatModifier>
| <stateMatrixItem> "." "row" <arrayRange>
| <stateMatrixItem> "." <stateMatModifier> "."
"row" <arrayRange>
<stateMatrixRow> ::= <stateMatrixItem> "." "row" <arrayMemAbs>
| <stateMatrixItem> "." <stateMatModifier> "."
"row" <arrayMemAbs>
<stateMatrixItem> ::= "matrix" "." <stateMatrixName>
<stateMatModifier> ::= "inverse"
| "transpose"
| "invtrans"
<stateMatrixName> ::= "modelview" <optArrayMemAbs>
| "projection"
| "mvp"
| "texture" <optArrayMemAbs>
| "program" <arrayMemAbs>
<stateTexGenItem> ::= "texgen" <optArrayMemAbs> "."
<stateTexGenType> "." <stateTexGenCoord>
<stateTexGenType> ::= "eye"
| "object"
<stateTexGenCoord> ::= "s"
| "t"
| "r"
| "q"
<stateClipPlaneItem> ::= "clip" <arrayMemAbs> "." "plane"
<statePointItem> ::= "point" "." <statePointProperty>
<statePointProperty> ::= "size"
| "attenuation"
<stateTexEnvItem> ::= "texenv" <optArrayMemAbs> "."
<stateTexEnvProperty>
<stateTexEnvProperty> ::= "color"
<stateDepthItem> ::= "depth" "." <stateDepthProperty>
<stateDepthProperty> ::= "range"
<programSingleItem> ::= <progEnvParam>
| <progLocalParam>
<programMultipleItem> ::= <progEnvParams>
| <progLocalParams>
<progEnvParams> ::= "program" "." "env" <arrayMemAbs>
| "program" "." "env" <arrayRange>
<progEnvParam> ::= "program" "." "env" <arrayMemAbs>
<progLocalParams> ::= "program" "." "local" <arrayMemAbs>
| "program" "." "local" <arrayRange>
<progLocalParam> ::= "program" "." "local" <arrayMemAbs>
<constantVector> ::= "{" <constantVectorList> "}"
<constantVectorList> ::= <signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar>
| <signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar> ","
<signedConstantScalar>
<signedConstantScalar> ::= <optSign> <constantScalar>
<constantScalar> ::= <floatConstant>
| <intConstant>
<floatConstant> ::= <float>
<intConstant> ::= <int>
<tempUseV> ::= <tempVarName> <swizzleSuffix>
<tempUseS> ::= <tempVarName> <scalarSuffix>
<tempUseVNS> ::= <tempVarName>
<tempUseW> ::= <tempVarName> <optWriteMask>
<resultUseW> ::= <resultBasic> <optWriteMask>
| <resultVarName> <optWriteMask>
<resultUseD> ::= <resultBasic>
<bufferUseV> ::= <bufferVarName> <optArrayMem> <swizzleSuffix>
<bufferUseS> ::= <bufferVarName> <optArrayMem> <scalarSuffix>
<bufferUseVNS> ::= <bufferVarName> <optArrayMem>
<bufferUseDB> ::= <bufferBinding> <arrayMemAbs>
<bufferUseDM> ::= <bufferBinding> <arrayMemAbs>
| <bufferBinding> <arrayRange>
| <bufferBinding>
<bufferBinding> ::= "program" "." "buffer" <arrayMemAbs>
<optArraySize> ::= "[" "]"
| "[" <int> "]"
<optArrayMem> ::= /* empty */
| <arrayMem>
<arrayMem> ::= <arrayMemAbs>
| <arrayMemRel>
<optArrayMemAbs> ::= /* empty */
| <arrayMemAbs>
<arrayMemAbs> ::= "[" <int> "]"
<arrayMemRel> ::= "[" <arrayMemReg> <arrayMemOffset> "]"
<arrayMemReg> ::= <addrUseS>
<arrayMemOffset> ::= /* empty */
| "+" <int>
| "-" <int>
<arrayRange> ::= "[" <int> ".." <int> "]"
<addrUseS> ::= <addrVarName> <scalarSuffix>
<ccMask> ::= "(" <ccTest> ")"
<ccTest> ::= <ccMaskRule> <swizzleSuffix>
<ccMaskRule> ::= "EQ"
| "GE"
| "GT"
| "LE"
| "LT"
| "NE"
| "TR"
| "FL"
| "EQ0"
| "GE0"
| "GT0"
| "LE0"
| "LT0"
| "NE0"
| "TR0"
| "FL0"
| "EQ1"
| "GE1"
| "GT1"
| "LE1"
| "LT1"
| "NE1"
| "TR1"
| "FL1"
| "NAN"
| "NAN0"
| "NAN1"
| "LEG"
| "LEG0"
| "LEG1"
| "CF"
| "CF0"
| "CF1"
| "NCF"
| "NCF0"
| "NCF1"
| "OF"
| "OF0"
| "OF1"
| "NOF"
| "NOF0"
| "NOF1"
| "AB"
| "AB0"
| "AB1"
| "BLE"
| "BLE0"
| "BLE1"
| "SF"
| "SF0"
| "SF1"
| "NSF"
| "NSF0"
| "NSF1"
<optWriteMask> ::= /* empty */
| <xyzwMask>
| <rgbaMask>
<xyzwMask> ::= "." "x"
| "." "y"
| "." "xy"
| "." "z"
| "." "xz"
| "." "yz"
| "." "xyz"
| "." "w"
| "." "xw"
| "." "yw"
| "." "xyw"
| "." "zw"
| "." "xzw"
| "." "yzw"
| "." "xyzw"
<rgbaMask> ::= "." "r"
| "." "g"
| "." "rg"
| "." "b"
| "." "rb"
| "." "gb"
| "." "rgb"
| "." "a"
| "." "ra"
| "." "ga"
| "." "rga"
| "." "ba"
| "." "rba"
| "." "gba"
| "." "rgba"
<swizzleSuffix> ::= /* empty */
| "." <component>
| "." <xyzwSwizzle>
| "." <rgbaSwizzle>
<extendedSwizzle> ::= <extSwizComp> "," <extSwizComp> ","
<extSwizComp> "," <extSwizComp>
<extSwizComp> ::= <optSign> <xyzwExtSwizSel>
| <optSign> <rgbaExtSwizSel>
<xyzwExtSwizSel> ::= "0"
| "1"
| <xyzwComponent>
<rgbaExtSwizSel> ::= <rgbaComponent>
<scalarSuffix> ::= "." <component>
<component> ::= <xyzwComponent>
| <rgbaComponent>
<xyzwComponent> ::= "x"
| "y"
| "z"
| "w"
<rgbaComponent> ::= "r"
| "g"
| "b"
| "a"
<optSign> ::= /* empty */
| "-"
| "+"
<faceType> ::= "front"
| "back"
<colorType> ::= "primary"
| "secondary"
<instLabel> ::= <identifier>
<instTarget> ::= <identifier>
<establishedName> ::= <identifier>
<establishName> ::= <identifier>
The <int> rule matches an integer constant. The integer consists of a
sequence of one or more digits ("0" through "9"), or a sequence in
hexadecimal form beginning with "0x" followed by a sequence of one or more
hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").
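A matcher for the <int> rule can be sketched in C as follows. The function
name match_int is hypothetical, and taking the longest possible match is an
assumption of this sketch rather than something the text above states.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical matcher: returns the number of leading characters of <s>
   that form an <int>, or 0 if <s> does not begin with one.  Only a
   lowercase "0x" prefix is accepted, as in the rule above. */
static size_t match_int(const char *s)
{
    size_t n = 0;

    /* Hexadecimal form: "0x" followed by one or more hex digits. */
    if (s[0] == '0' && s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        n = 2;
        while (isxdigit((unsigned char)s[n]))
            n++;
        return n;
    }
    /* Decimal form: one or more digits. */
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```

With this sketch, "0x" followed by no hexadecimal digit matches only the
leading "0" as a decimal integer.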
The <float> rule matches a floating-point constant consisting of an
integer part, a decimal point, a fraction part, an "e" or "E", and an
optionally signed integer exponent. The integer and fraction parts both
consist of a sequence of one or more digits ("0" through "9"). Either the
integer part or the fraction part (not both) may be missing; either the
decimal point or the "e" (or "E") and the exponent (not both) may be
missing. Most grammar rules that allow floating-point values also allow
integers matching the <int> rule.
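The <float> rule can be sketched as a C matcher. The function name
match_float is hypothetical, and the longest-match behavior is an
assumption of the sketch. Strings with neither a decimal point nor an
exponent are rejected here because they match <int> instead.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical matcher: returns the number of leading characters of <s>
   that form a <float>, or 0 if <s> does not begin with one. */
static size_t match_float(const char *s)
{
    size_t n = 0, int_digits = 0, frac_digits = 0;
    int has_point = 0, has_exp = 0;

    while (isdigit((unsigned char)s[n])) { n++; int_digits++; }
    if (s[n] == '.') {
        has_point = 1;
        n++;
        while (isdigit((unsigned char)s[n])) { n++; frac_digits++; }
    }
    /* The integer part and the fraction part may not both be missing. */
    if (int_digits == 0 && frac_digits == 0)
        return 0;
    if (s[n] == 'e' || s[n] == 'E') {
        size_t m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            m++;
        /* The exponent is consumed only if at least one digit follows. */
        if (isdigit((unsigned char)s[m])) {
            while (isdigit((unsigned char)s[m]))
                m++;
            n = m;
            has_exp = 1;
        }
    }
    /* The decimal point and the exponent may not both be missing. */
    return (has_point || has_exp) ? n : 0;
}
```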
The <identifier> rule matches a sequence of one or more letters ("A"
through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),
or dollar signs ("$"); the first character must not be a digit. Upper
and lower case letters are considered different (names are
case-sensitive). The following strings are reserved keywords and may not
be used as identifiers: "fragment" (for fragment programs only), "vertex"
(for vertex and geometry programs), "primitive" (for fragment and geometry
programs), "program", "result", "state", and "texture".
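The <identifier> rule and the reserved keyword list can be sketched in C as
follows. The names prog_type, is_reserved, and is_valid_identifier are
hypothetical. The sketch checks only the keywords listed above; it makes no
claim about other names that individual program types may restrict.

```c
#include <string.h>

/* Hypothetical tag for the program type being parsed. */
typedef enum { PROG_VERTEX, PROG_FRAGMENT, PROG_GEOMETRY } prog_type;

/* Letters, "_" and "$" are allowed anywhere; digits are not allowed first. */
static int is_ident_char(char c, int first)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        c == '_' || c == '$')
        return 1;
    return !first && c >= '0' && c <= '9';
}

/* Reserved keywords; three of them depend on the program type. */
static int is_reserved(const char *s, prog_type t)
{
    if (!strcmp(s, "program") || !strcmp(s, "result") ||
        !strcmp(s, "state") || !strcmp(s, "texture"))
        return 1;
    if (!strcmp(s, "fragment"))
        return t == PROG_FRAGMENT;
    if (!strcmp(s, "vertex"))
        return t == PROG_VERTEX || t == PROG_GEOMETRY;
    if (!strcmp(s, "primitive"))
        return t == PROG_FRAGMENT || t == PROG_GEOMETRY;
    return 0;
}

/* Returns 1 if <s> is usable as an identifier in a program of type <t>. */
static int is_valid_identifier(const char *s, prog_type t)
{
    size_t i;

    if (s[0] == '\0')
        return 0;
    for (i = 0; s[i] != '\0'; i++)
        if (!is_ident_char(s[i], i == 0))
            return 0;
    return !is_reserved(s, t);
}
```

Because names are case-sensitive, "State" is accepted by this sketch even
though "state" is reserved.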
The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
<bufferVarName> rules match identifiers that have been previously established
as names of temporary, program parameter, attribute, result, and program
parameter buffer variables, respectively.
The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings
consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
or "r", "g", "b", "a" (<rgbaSwizzle>).
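The swizzle rules can be checked with a few lines of C. The function name
is_swizzle is hypothetical. The sketch accepts a string only if it has
exactly four characters, all drawn from one of the two component sets;
mixing the sets (for example "xyba") is rejected.

```c
#include <string.h>

/* Hypothetical check: returns 1 if <s> matches <xyzwSwizzle> or
   <rgbaSwizzle>, and 0 otherwise. */
static int is_swizzle(const char *s)
{
    if (strlen(s) != 4)
        return 0;
    /* strspn() returns the length of the leading run drawn from the set. */
    return strspn(s, "xyzw") == 4 || strspn(s, "rgba") == 4;
}
```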
The error INVALID_OPERATION is generated if a program fails to load
because it is not syntactically correct or for one of the semantic
restrictions described in the following sections.
A successfully loaded program is parsed into a sequence of instructions.
Each instruction is identified by its tokenized name. The operation of
these instructions when executed is defined in section 2.X.4. A
successfully loaded program string replaces the program string previously
loaded into the specified program object. If the OUT_OF_MEMORY error is
generated by ProgramStringARB, no change is made to the previous contents
of the current program object.
Section 2.X.3, Program Variables
Programs may operate on a number of different variables during their
execution. The following sections define the different classes of
variables that can be declared and used by a program.
Some variable classes require variable bindings. Variable classes with
bindings refer to state that is either generated or consumed outside the
program. Examples of variable bindings include a vertex's normal, the
position of a vertex computed by a vertex program, an interpolated texture
coordinate, and the diffuse color of light 1. Variables that are used
only during program execution do not have bindings.
Variables may be declared explicitly according to the <namingStatement>
grammar rule. Explicit variable declarations allow a program to establish
a variable name that can be used to refer to a specified resource in
subsequent instructions. Variables may be declared anywhere in the
program string, but must be declared prior to use. A program will fail to
load if it declares the same variable name more than once, or if it refers
to a variable name that has not been previously declared in the program
string.
Variables may also be declared implicitly, simply by using a variable
binding as an operand in a program instruction. Such uses are considered
to automatically create a nameless variable using the specified binding.
Only variables from classes with bindings can be declared implicitly.
Section 2.X.3.1, Program Variable Types
Explicit variable declarations may include one or more modifiers that
specify additional information about the variable, such as the size and
data type of the components of the variable. Variable modifiers are
specified according to the <varModifier> grammar rule.
By default, variables are considered typeless. They can be used in
instructions that read or write the variable as floating-point values,
signed integers, or unsigned integers. If a variable is written using one
data type but then read using a different one, the results of the
operation are undefined. Variables with bindings are considered to be
read or written when their values are produced or consumed; the data type
used by the GL is specified in the description of each binding.
Explicitly declared variables may optionally have one data type modifier,
which can be used to detect data type mismatch errors. Type modifiers of
"INT", "UINT", and "FLOAT" indicate that the components of the variable
are stored as signed integers, unsigned integers, or floating-point
values, respectively. A program will fail to load if it attempts to read
or write a variable using a data type other than the one indicated by the
data type modifier. Variables without a data type modifier can be read or
written using any data type.
Explicitly declared variables may optionally have one storage size
modifier. Variables declared as "SHORT" will be represented using at least
16 bits per component. "SHORT" floating-point values will have at least 5
bits of exponent and 10 bits of mantissa. Variables declared as "LONG"
will be represented with at least 32 bits per component. "LONG"
floating-point values will have at least 8 bits of exponent and 23 bits of
mantissa. If no size modifier is provided, the GL will automatically
select component sizes. Implementations are not required to support more
than one component size, so "SHORT", "LONG", and the default could all
refer to the same component size. The "LONG" modifier is supported only
for declarations of temporary variables ("TEMP"). The "SHORT" modifier is
supported only for declarations of temporary variables and result
variables ("OUTPUT").
Each variable declaration can include at most one data type and one
storage size modifier. A program will fail to load if it specifies
multiple data type or multiple storage size modifiers in a single variable
declaration.
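The modifier rules above can be summarized in a short Python sketch. It is illustrative only and not part of this extension: the function name "modifiers_ok" and the string encoding of variable classes are assumptions, and the fragment-program interpolation modifiers ("FLAT", "CENTROID", "NOPERSPECTIVE") are deliberately not modeled.

```python
DATA_TYPES = {"INT", "UINT", "FLOAT"}
STORAGE_SIZES = {"SHORT", "LONG"}

def modifiers_ok(var_class, modifiers):
    """Return True if a declaration with these modifiers could load.

    var_class is a declaration keyword such as "TEMP", "OUTPUT", "PARAM",
    or "ATTRIB" (hypothetical encoding); modifiers is a list of strings.
    """
    types = [m for m in modifiers if m in DATA_TYPES]
    sizes = [m for m in modifiers if m in STORAGE_SIZES]
    # This sketch rejects anything that is not a type or size modifier.
    if len(types) + len(sizes) != len(modifiers):
        return False
    # At most one data type and at most one storage size modifier.
    if len(types) > 1 or len(sizes) > 1:
        return False
    # "LONG" is supported only for temporaries.
    if "LONG" in sizes and var_class != "TEMP":
        return False
    # "SHORT" is supported only for temporaries and result variables.
    if "SHORT" in sizes and var_class not in ("TEMP", "OUTPUT"):
        return False
    return True
```

Under these assumptions, a "LONG INT" temporary is accepted, while a "LONG" result variable or a declaration carrying both "INT" and "UINT" is rejected.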
(NOTE: Fragment programs also support the modifiers "FLAT", "CENTROID",
and "NOPERSPECTIVE", which control how per-fragment attribute values are
produced. These modifiers are described in detail in the
NV_fragment_program4 specification.)
Explicitly declared variables of all types may be declared as arrays. An
array variable has one or more members, numbered 0 through <n>-1, where
<n> is the number of entries in the array. The total number of entries in
the array can be declared using the <optArraySize> grammar rule. For
variable classes without bindings, an array size must be specified in the
program, and must be a positive integer. For variable classes with
bindings, a declared size is optional, and is taken from the number of
bindings assigned in the declaration if omitted. A program will fail to
load if the declared size of an array variable does not match the number
of assigned bindings.
When a variable is declared as an array, instructions that use the
variable must specify an array member to access according to the
<arrayMem> grammar rule. A program will fail to load if it contains an
instruction that accesses an array variable without specifying an array
member or an instruction that specifies an array member for a non-array
variable.
Section 2.X.3.2, Program Attribute Variables
Program attribute variables represent per-vertex or per-fragment inputs to
the program. All attribute variables have associated bindings, and are
read-only during program execution. Attribute variables may be declared
explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
an attribute binding in an instruction.
The set of available attribute bindings depends on the program type, and
is enumerated in the specifications for each program type.
The set of bindings allowed for attribute array variables is limited to
attribute state grouped in arrays (e.g., texture coordinates, generic
vertex attributes). Additionally, all bindings assigned to the array must
be of the same binding type and must increase consecutively. Examples of
valid and invalid binding lists include:
  vertex.attrib[1], vertex.attrib[2]    # valid, 2-entry array
  vertex.texcoord[0..3]                 # valid, 4-entry array
  vertex.attrib[1], vertex.attrib[3]    # invalid, skipped attrib 2
  vertex.attrib[2], vertex.attrib[1]    # invalid, wrong order
  vertex.attrib[1], vertex.texcoord[2]  # invalid, different types
Additionally, attribute bindings may be used in no more than one array
variable accessed with relative addressing.
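The "same binding type, consecutively increasing" rule for attribute array bindings can be expressed as a small check. The Python sketch below is illustrative only: the helper names are hypothetical, the regular expression covers only indexed bindings of the form shown in the examples, and rejecting a range whose first index exceeds its last is an assumption of this sketch.

```python
import re

def parse_indexed_binding(text):
    """Split 'vertex.attrib[2]' into ('vertex.attrib', 2, 2) and
    'vertex.texcoord[0..3]' into ('vertex.texcoord', 0, 3)."""
    m = re.fullmatch(r"([A-Za-z_.]+)\[(\d+)(?:\.\.(\d+))?\]", text.strip())
    if m is None:
        raise ValueError(f"not an indexed binding: {text!r}")
    first = int(m.group(2))
    last = int(m.group(3)) if m.group(3) is not None else first
    return m.group(1), first, last

def is_valid_array_binding_list(bindings):
    """True if all bindings share one type and indices increase by one."""
    kind = None
    expected = None
    for text in bindings:
        name, first, last = parse_indexed_binding(text)
        if first > last:          # assumption: reversed ranges are invalid
            return False
        if kind is None:
            kind = name           # first binding fixes the binding type
        elif name != kind or first != expected:
            return False          # different type, gap, or wrong order
        expected = last + 1
    return kind is not None
```

Applied to the five binding lists above, this check accepts the first two and rejects the remaining three.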
Implementations may have a limit on the total number of attribute binding
components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
Programs that use more attribute binding components than this limit will
fail to load. The method of counting used attribute binding components is
implementation-dependent, but must satisfy the following properties:
  * If an attribute binding is not referenced in a program, or is
    referenced only in declarations of attribute variables that are not
    used, none of its components are counted.
  * An attribute binding component may be counted as used only if there
    exists an instruction operand where
      - the component is enabled for read by the swizzle pattern (Section
        2.X.4.2), and
      - the attribute binding is
          - referenced directly by the operand,
          - bound to a declared variable referenced by the operand, or
          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.
    Implementations are not required to optimize out unused elements of an
    attribute array or components that are used in only some elements of
    an array. The last of these rules is intended to cover the case where
    the same attribute binding is used in multiple variables.
    For example, an operand whose swizzle pattern selects only the x
    component may result in the x component of an attribute binding being
    counted, but may never result in the counting of the y, z, or w
    components of any attribute binding.
  * Implementations are not required to determine that components read by
    an instruction are actually unused due to:
      - instruction write masks (for example, a component-wise ADD
        operation that only writes the "x" component doesn't have to read
        the "y", "z", and "w" components of its operands) or
      - any other properties of the instruction (for example, the DP3
        instruction, which computes a 3-component dot product, doesn't
        have to read the "w" component of its operands).
Section 2.X.3.3, Program Parameters
Program parameter variables are used as constants during program
execution. All program parameter variables have associated bindings and
are read-only during program execution. Program parameters retain their
values across program invocations, although their values may change
between invocations due to GL state changes. Program parameter variables
may be declared explicitly via the <PARAM_statement> grammar rule, or
implicitly by using a parameter binding in an instruction. Except where
otherwise specified, program parameter bindings always specify
floating-point values.
When declaring program parameter array variables, all bindings are
supported and can be assigned to array members in any order. The only
restriction is that no parameter binding may be used more than once in
array variables accessed using relative addressing. A program will fail
to load if any program parameter binding is used more than once in a
single array accessed using relative addressing or used at least once in
two or more arrays accessed using relative addressing.
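The restriction on reusing parameter bindings can be sketched as a counting check over all arrays that are accessed with relative addressing. The Python below is illustrative only; the function name and the (bindings, uses_relative_addressing) representation are assumptions, and each binding is assumed to be already expanded to one string per array member.

```python
from collections import Counter

def relative_arrays_ok(arrays):
    """arrays: iterable of (bindings, uses_relative_addressing) pairs.

    Returns False if any binding appears more than once among the arrays
    accessed with relative addressing, whether inside a single array or
    across two or more arrays.
    """
    counts = Counter()
    for bindings, relative in arrays:
        if relative:
            counts.update(bindings)
    return all(n == 1 for n in counts.values())
```

Arrays that are never accessed with relative addressing do not contribute to the count, so a binding may still appear in any number of them.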
Constant Bindings
If a program parameter binding matches the <constantScalar> or
<signedConstantScalar> grammar rules, the corresponding program parameter
variable is bound to the vector (X,X,X,X), where X is the value of the
specified constant.
If a program parameter binding matches <constantVector>, the corresponding
program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
Z, and W are the values corresponding to the first, second, third, and
fourth match of <signedConstantScalar>. If fewer than four constants are
specified, Y, Z, and W assume the values 0, 0, and 1, if their respective
constants are not specified.
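The difference between scalar and vector constant bindings is easy to overlook: a scalar is replicated to all four components, while a short vector is padded with 0, 0, and 1. The Python sketch below is illustrative only, and both function names are hypothetical.

```python
def expand_constant_scalar(x):
    """Scalar constant binding: bound to (X, X, X, X)."""
    return (x, x, x, x)

def expand_constant_vector(values):
    """Vector constant binding with one to four constants: missing
    Y, Z, and W components default to 0, 0, and 1, respectively."""
    values = tuple(values)
    if not 1 <= len(values) <= 4:
        raise ValueError("a constant vector holds one to four constants")
    defaults = (0, 0, 0, 1)
    return values + defaults[len(values):]
```

With these helpers, the scalar constant 5 expands to (5, 5, 5, 5), whereas the one-element vector {5} expands to (5, 0, 0, 1).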
Constant bindings can be interpreted as having signed integer, unsigned
integer, or floating-point values, depending on how they are used in the
program text. For constants in variable declarations, the components of
the constant are interpreted according to the variable's component data
type modifier. If no data type modifier is specified in a declaration,
constants are interpreted as floating-point values. For constant bindings
used directly in an instruction, the components of the constant are
interpreted according to the required data type of the operand. A program
will fail to load if it specifies a floating-point constant value
(matching the <floatConstant> grammar rule) that should be interpreted as
a signed or unsigned integer, or a negative integer constant value that
should be interpreted as an unsigned integer.
If the value used to specify a floating-point constant can not be exactly
represented, the nearest floating-point value will be used. If the value
used to specify an integer constant is too large to be represented, the
program will fail to load.
Program Environment/Local Parameter Bindings
  Binding                    Components  Underlying State
  -------------------------  ----------  -------------------------------
  program.env[a]             (x,y,z,w)   program environment parameter a
  program.local[a]           (x,y,z,w)   program local parameter a
  program.env[a..b]          (x,y,z,w)   program environment parameters
                                         a through b
  program.local[a..b]        (x,y,z,w)   program local parameters
                                         a through b
Table X.1: Program Environment/Local Parameter Bindings. <a> and <b>
indicate parameter numbers, where <a> must be less than or equal to <b>.
If a program parameter binding matches "program.env[a]" or
"program.local[a]", the four components of the program parameter variable
are filled with the four components of program environment parameter <a>
or program local parameter <a> respectively.
Additionally, for program parameter array bindings, "program.env[a..b]"
and "program.local[a..b]" are equivalent to specifying program environment
or local parameters <a> through <b> in order, respectively. A program
using any of these bindings will fail to load if <a> is greater than <b>.
Program environment and local parameters are typeless, and may be
specified as signed integer, unsigned integer, or floating-point
variables. If a program environment parameter is read using a data type
other than the one used to specify it, an undefined value is returned.
Material Property Bindings
  Binding                         Components  Underlying State
  ------------------------------  ----------  ----------------------------
  state.material.ambient          (r,g,b,a)   front ambient material color
  state.material.diffuse          (r,g,b,a)   front diffuse material color
  state.material.specular         (r,g,b,a)   front specular material color
  state.material.emission         (r,g,b,a)   front emissive material color
  state.material.shininess        (s,0,0,1)   front material shininess
  state.material.front.ambient    (r,g,b,a)   front ambient material color
  state.material.front.diffuse    (r,g,b,a)   front diffuse material color
  state.material.front.specular   (r,g,b,a)   front specular material color
  state.material.front.emission   (r,g,b,a)   front emissive material color
  state.material.front.shininess  (s,0,0,1)   front material shininess
  state.material.back.ambient     (r,g,b,a)   back ambient material color
  state.material.back.diffuse     (r,g,b,a)   back diffuse material color
  state.material.back.specular    (r,g,b,a)   back specular material color
  state.material.back.emission    (r,g,b,a)   back emissive material color
  state.material.back.shininess   (s,0,0,1)   back material shininess
Table X.3: Material Property Bindings. If a material face is not
specified in the binding, the front property is used.
If a program parameter binding matches any of the material properties
listed in Table X.3, the program parameter variable is filled according to
the table. For ambient, diffuse, specular, or emissive colors, the "x",
"y", "z", and "w" components are filled with the "r", "g", "b", and "a"
components, respectively, of the corresponding material color. For
material shininess, the "x" component is filled with the material's
specular exponent, and the "y", "z", and "w" components are filled with
the floating-point constants 0, 0, and 1, respectively. Bindings
containing ".back" refer to the back material; all other bindings refer to
the front material.
Material properties can be changed inside a Begin/End pair, either
directly by calling Material, or indirectly through color material.
However, such property changes are not guaranteed to update program
parameter bindings until the following End command. Program parameter
variables bound to material properties changed inside a Begin/End pair are
undefined until the following End command.
Light Property Bindings
  Binding                            Components  Underlying State
  ---------------------------------  ----------  ----------------------------
  state.light[n].ambient             (r,g,b,a)   light n ambient color
  state.light[n].diffuse             (r,g,b,a)   light n diffuse color
  state.light[n].specular            (r,g,b,a)   light n specular color
  state.light[n].position            (x,y,z,w)   light n position
  state.light[n].attenuation         (a,b,c,e)   light n attenuation constants
                                                 and spot light exponent
  state.light[n].spot.direction      (x,y,z,c)   light n spot direction and
                                                 cutoff angle cosine
  state.light[n].half                (x,y,z,1)   light n infinite half-angle
  state.lightmodel.ambient           (r,g,b,a)   light model ambient color
  state.lightmodel.scenecolor        (r,g,b,a)   light model front scene color
  state.lightmodel.front.scenecolor  (r,g,b,a)   light model front scene color
  state.lightmodel.back.scenecolor   (r,g,b,a)   light model back scene color
  state.lightprod[n].ambient         (r,g,b,a)   light n / front material
                                                 ambient color product
  state.lightprod[n].diffuse         (r,g,b,a)   light n / front material
                                                 diffuse color product
  state.lightprod[n].specular        (r,g,b,a)   light n / front material
                                                 specular color product
  state.lightprod[n].front.ambient   (r,g,b,a)   light n / front material
                                                 ambient color product
  state.lightprod[n].front.diffuse   (r,g,b,a)   light n / front material
                                                 diffuse color product
  state.lightprod[n].front.specular  (r,g,b,a)   light n / front material
                                                 specular color product
  state.lightprod[n].back.ambient    (r,g,b,a)   light n / back material
                                                 ambient color product
  state.lightprod[n].back.diffuse    (r,g,b,a)   light n / back material
                                                 diffuse color product
  state.lightprod[n].back.specular   (r,g,b,a)   light n / back material
                                                 specular color product
Table X.4: Light Property Bindings. <n> indicates a light number.
If a program parameter binding matches "state.light[n].ambient",
"state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
and "w" components of the program parameter variable are filled with the
"r", "g", "b", and "a" components, respectively, of the corresponding
light color.
If a program parameter binding matches "state.light[n].position", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the "x", "y", "z", and "w" components, respectively, of the light
position.
If a program parameter binding matches "state.light[n].attenuation", the
"x", "y", and "z" components of the program parameter variable are filled
with the constant, linear, and quadratic attenuation parameters of the
specified light, respectively (section 2.13.1). The "w" component of the
program parameter variable is filled with the spot light exponent of the
specified light.
If a program parameter binding matches "state.light[n].spot.direction",
the "x", "y", and "z" components of the program parameter variable are
filled with the "x", "y", and "z" components of the spot light direction
of the specified light, respectively (section 2.13.1). The "w" component
of the program parameter variable is filled with the cosine of the spot
light cutoff angle of the specified light.
If a program parameter binding matches "state.light[n].half", the "x",
"y", and "z" components of the program parameter variable are filled with
the x, y, and z components, respectively, of the normalized infinite
half-angle vector
h_inf = || P + (0, 0, 1) ||.
The "w" component is filled with 1.0. In the computation of h_inf, P
consists of the x, y, and z coordinates of the normalized vector from the
eye position P_e to the eye-space light position P_pli (section 2.13.1).
h_inf is defined to correspond to the normalized half-angle vector when
using an infinite light (w coordinate of the position is zero) and an
infinite viewer (v_bs is FALSE). For local lights or a local viewer,
h_inf is well-defined but does not match the normalized half-angle vector,
which will vary depending on the vertex position.
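In the formula for h_inf, the double bars denote normalization of the enclosed vector rather than its length. The Python sketch below is illustrative only and shows the computation under that reading; the function names are hypothetical, and the caller is assumed to supply the (unnormalized) direction from the eye to the light directly.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def infinite_half_angle(direction_to_light):
    """Return the (x, y, z, w) value of the "state.light[n].half" binding.

    direction_to_light: x, y, z of the vector from the eye position to
    the eye-space light position (any non-zero length).
    """
    p = normalize(direction_to_light)            # P in the text above
    h = normalize((p[0], p[1], p[2] + 1.0))      # h_inf = ||P + (0,0,1)||
    return h + (1.0,)                            # w is filled with 1.0
```

A light direction of exactly (0, 0, -1) makes P + (0, 0, 1) the zero vector, so this sketch would divide by zero; the text above does not say what value results in that case.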
If a program parameter binding matches "state.lightmodel.ambient", the
"x", "y", "z", and "w" components of the program parameter variable are
filled with the "r", "g", "b", and "a" components of the light model
ambient color, respectively.
If a program parameter binding matches "state.lightmodel.scenecolor" or
"state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
the program parameter variable are filled with the "r", "g", and "b"
components respectively of the "front scene color"
c_scene = a_cs * a_cm + e_cm,
where a_cs is the light model ambient color, a_cm is the front ambient
material color, and e_cm is the front emissive material color. The "w"
component of the program parameter variable is filled with the alpha
component of the front diffuse material color. If a program parameter
binding matches "state.lightmodel.back.scenecolor", a similar back scene
color, computed using back-facing material properties, is used. The front
and back scene colors match the values that would be assigned to vertices
using conventional lighting if all lights were disabled.
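The scene color computation is component-wise over r, g, and b, with the alpha taken from a different material property. The Python sketch below is illustrative only; the function and parameter names are hypothetical, and each color is assumed to be an (r, g, b, a) tuple.

```python
def scene_color(light_model_ambient, mat_ambient, mat_emission, mat_diffuse):
    """Return (x, y, z, w) for a scene color binding.

    rgb follows c_scene = a_cs * a_cm + e_cm, evaluated per component;
    w is the alpha component of the diffuse material color.  Pass front
    or back material properties to obtain the front or back scene color.
    """
    rgb = tuple(
        a_cs * a_cm + e_cm
        for a_cs, a_cm, e_cm in zip(
            light_model_ambient[:3], mat_ambient[:3], mat_emission[:3]
        )
    )
    return rgb + (mat_diffuse[3],)
```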
If a program parameter binding matches anything beginning with
"state.lightprod[n]", the "x", "y", and "z" components of the program
parameter variable are filled with the "r", "g", and "b" components,
respectively, of the corresponding light product. The three light product
components are the products of the corresponding color components of the
specified material property and the light color of the specified light
(see Table X.4). The "w" component of the program parameter variable is
filled with the alpha component of the specified material property.
Light products depend on material properties, which can be changed inside
a Begin/End pair. Such property changes are not guaranteed to take effect
until the following End command. Program parameter variables bound to
light products whose corresponding material property changes inside a
Begin/End pair are undefined until the following End command.
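A light product pairs one light color with the material property of the same name. The Python sketch below is illustrative only; the function name is hypothetical, and both inputs are assumed to be (r, g, b, a) tuples.

```python
def light_product(light_color, material_color):
    """Return (x, y, z, w) for a "state.lightprod[n]" binding.

    rgb is the component-wise product of the light color and the
    material color; w is the alpha of the material color alone.
    """
    rgb = tuple(l * m for l, m in zip(light_color[:3], material_color[:3]))
    return rgb + (material_color[3],)
```

For example, "state.lightprod[n].back.diffuse" would correspond to calling this helper with the diffuse color of light n and the back diffuse material color.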
Texture Coordinate Generation Property Bindings
  Binding                    Components  Underlying State
  -------------------------  ----------  ----------------------------
  state.texgen[n].eye.s      (a,b,c,d)   TexGen eye linear plane
                                         coefficients, s coord, unit n
  state.texgen[n].eye.t      (a,b,c,d)   TexGen eye linear plane
                                         coefficients, t coord, unit n
  state.texgen[n].eye.r      (a,b,c,d)   TexGen eye linear plane
                                         coefficients, r coord, unit n
  state.texgen[n].eye.q      (a,b,c,d)   TexGen eye linear plane
                                         coefficients, q coord, unit n
  state.texgen[n].object.s   (a,b,c,d)   TexGen object linear plane
                                         coefficients, s coord, unit n
  state.texgen[n].object.t   (a,b,c,d)   TexGen object linear plane
                                         coefficients, t coord, unit n
  state.texgen[n].object.r   (a,b,c,d)   TexGen object linear plane
                                         coefficients, r coord, unit n
  state.texgen[n].object.q   (a,b,c,d)   TexGen object linear plane
                                         coefficients, q coord, unit n
Table X.5: Texture Coordinate Generation Property Bindings. "[n]" is
optional -- texture unit <n> is used if specified; texture unit 0 is
used otherwise.
If a program parameter binding matches a set of TexGen plane coefficients,
the "x", "y", "z", and "w" components of the program parameter variable
are filled with the coefficients p1, p2, p3, and p4, respectively, for
object linear coefficients, and the coefficients p1', p2', p3', and p4',
respectively, for eye linear coefficients (section 2.10.4).
Fog Property Bindings
  Binding           Components  Underlying State
  ----------------  ----------  ----------------------------
  state.fog.color   (r,g,b,a)   RGB fog color (section 3.10)
  state.fog.params  (d,s,e,r)   fog density, linear start
                                and end, and 1/(end-start)
                                (section 3.10)
Table X.6: Fog Property Bindings
If a program parameter binding matches "state.fog.color", the "x", "y",
"z", and "w" components of the program parameter variable are filled with
the "r", "g", "b", and "a" components, respectively, of the fog color
(section 3.10).
If a program parameter binding matches "state.fog.params", the "x", "y",
and "z" components of the program parameter variable are filled with the
fog density, linear fog start, and linear fog end parameters (section
3.10), respectively. The "w" component is filled with 1/(end-start),
where end and start are the linear fog end and start parameters,
respectively.
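The precomputed reciprocal in the "w" component lets a program evaluate the linear fog factor with a multiply instead of a divide. The Python sketch below is illustrative only; both function names are hypothetical, and the linear fog factor (end - c) / (end - start) is the conventional OpenGL linear fog equation, brought in here as context rather than taken from this section.

```python
def fog_params(density, start, end):
    """Return (x, y, z, w) for the "state.fog.params" binding."""
    # w = 1 / (end - start); the text does not say what is bound when
    # end == start, and this sketch would raise ZeroDivisionError.
    return (density, start, end, 1.0 / (end - start))

def linear_fog_factor(params, c):
    """Linear fog factor (end - c) / (end - start) using the stored
    reciprocal, where c is the fog coordinate."""
    _, _, end, inv_range = params
    return (end - c) * inv_range
```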
Clip Plane Property Bindings
  Binding              Components  Underlying State
  -------------------  ----------  ----------------------------
  state.clip[n].plane  (a,b,c,d)   clip plane n coefficients
Table X.7: Clip Plane Property Bindings. <n> specifies the clip plane
number, and is required.
If a program parameter binding matches "state.clip[n].plane", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the coefficients p1', p2', p3', and p4', respectively, of clip plane
<n> (section 2.11).
Point Property Bindings
  Binding                  Components  Underlying State
  -----------------------  ----------  ----------------------------
  state.point.size         (s,n,x,f)   point size, min and max size
                                       clamps, and fade threshold
                                       (section 3.3)
  state.point.attenuation  (a,b,c,1)   point size attenuation consts
Table X.8: Point Property Bindings
If a program parameter binding matches "state.point.size", the "x", "y",
"z", and "w" components of the program parameter variable are filled with
the point size, minimum point size, maximum point size, and fade
threshold, respectively (section 3.3).
If a program parameter binding matches "state.point.attenuation", the "x",
"y", and "z" components of the program parameter variable are filled with
the constant, linear, and quadratic point size attenuation parameters (a,
b, and c), respectively (section 3.3). The "w" component is filled with
1.0.
Texture Environment Property Bindings
  Binding                Components  Underlying State
  ---------------------  ----------  ----------------------------
  state.texenv[n].color  (r,g,b,a)   texture environment n color
Table X.9: Texture Environment Property Bindings. "[n]" is optional --
texture unit <n> is used if specified; texture unit 0 is used otherwise.
If a program parameter binding matches "state.texenv[n].color", the "x",
"y", "z", and "w" components of the program parameter variable are filled
with the "r", "g", "b", and "a" components, respectively, of the
corresponding texture environment color. Note that only "legacy" texture
units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
Texture image units and texture coordinate sets do not have associated
texture environment state.
Depth Property Bindings
  Binding            Components  Underlying State
  -----------------  ----------  ----------------------------
  state.depth.range  (n,f,d,1)   Depth range near, far, and
                                 (far-near) (section 2.10.1)
Table X.10: Depth Property Bindings
If a program parameter binding matches "state.depth.range", the "x" and
"y" components of the program parameter variable are filled with the
mappings of near and far clipping planes to window coordinates,
respectively. The "z" component is filled with the difference of the
mappings of near and far clipping planes, far minus near. The "w"
component is filled with 1.0.
Matrix Property Bindings
    Binding                    Underlying State
    -------------------------  ---------------------------
  * state.matrix.modelview[n]  modelview matrix n
    state.matrix.projection    projection matrix
    state.matrix.mvp           modelview-projection matrix
  * state.matrix.texture[n]    texture matrix n
    state.matrix.program[n]    program matrix n
Table X.11: Base Matrix Property Bindings. The "[n]" syntax indicates
a specific matrix number. For modelview and texture matrices, a matrix
number is optional, and matrix zero will be used if the matrix number is
omitted. These base bindings may further be modified by an
inverse/transpose selector and a row selector.
If the beginning of a program parameter binding matches any of the matrix
binding names listed in Table X.11, the binding corresponds to a 4x4
matrix. If the parameter binding is followed by ".inverse", ".transpose",
or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
or transpose of the inverse, respectively, of the matrix specified in
Table X.11 is selected. Otherwise, the matrix specified in Table X.11 is
selected. If the specified matrix is poorly-conditioned (singular or
nearly so), its inverse matrix is undefined. The binding name
"state.matrix.mvp" refers to the product of modelview matrix zero and the
projection matrix, defined as
MVP = P * M0,
where P is the projection matrix and M0 is modelview matrix zero.
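The order of the product matters: the modelview matrix is applied to a vertex first and the projection matrix second. The Python sketch below is illustrative only; the function names are hypothetical, and matrices are assumed to be 4x4 lists of rows in conventional mathematical layout (multiplying column vectors on the right).

```python
def mat_mul(a, b):
    """Return the 4x4 matrix product a * b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]

def modelview_projection(projection, modelview0):
    """MVP = P * M0, as bound by "state.matrix.mvp"."""
    return mat_mul(projection, modelview0)
```

Swapping the two arguments generally gives a different matrix, so a program that rebuilds this product by hand has to keep the same order.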
If the selected matrix is followed by ".row[<a>]" (matching the
<stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
the program parameter variable are filled with the four entries of row <a>
of the selected matrix. In the example,
PARAM m0 = state.matrix.modelview[1].row[0];
PARAM m1 = state.matrix.projection.transpose.row[3];
the variable "m0" is set to the first row (row 0) of modelview matrix 1
and "m1" is set to the last row (row 3) of the transpose of the projection
matrix.
For program parameter array bindings, multiple rows of the selected matrix
can be bound via the <stateMatrixRows> grammar rule. If the selected
matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
to specifying matrix rows <a> through <b>, in order. A program will fail
to load if <a> is greater than <b>. If no row selection is specified
(<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
In the example,
PARAM m2[] = { state.matrix.program[0].row[1..2] };
PARAM m3[] = { state.matrix.program[0].transpose };
the array "m2" has two entries, containing rows 1 and 2 of program matrix
zero, and "m3" has four entries, containing all four rows of the transpose
of program matrix zero.
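Row selection is applied after the optional matrix modifier, so ".transpose.row[3]" picks row 3 of the transposed matrix. The Python sketch below is illustrative only; the function names are hypothetical, matrices are assumed to be 4x4 lists of rows, and only the ".transpose" modifier is modeled (".inverse" and ".invtrans" are left out).

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def bind_matrix_rows(matrix, modifier=None, first=0, last=3):
    """Return the rows bound to a parameter array, one tuple per entry.

    With no row selection, rows 0 through 3 are bound in order.
    """
    if first > last:
        raise ValueError("program fails to load: first row > last row")
    if modifier is None:
        selected = matrix
    elif modifier == "transpose":
        selected = transpose(matrix)
    else:
        raise NotImplementedError("only 'transpose' is modeled here")
    return [tuple(selected[r]) for r in range(first, last + 1)]
```

In terms of the example above, "m2" corresponds to bind_matrix_rows(m, None, 1, 2) and "m3" to bind_matrix_rows(m, "transpose"), where m stands for program matrix zero.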
Section 2.X.3.4, Program Temporaries
Program temporary variables are used to hold temporary results during
program execution. Temporaries do not persist between program
invocations, and are undefined at the beginning of each program
invocation.
Temporary variables are declared explicitly using the <TEMP_statement>
grammar rule. Each such statement can declare one or more temporaries.
Temporaries can not be declared implicitly. Temporaries can be declared
using any component size ("SHORT" or "LONG") and type ("FLOAT", "INT", or "UINT")
modifier.
Temporary variables may be declared as arrays. Temporary variables
declared as arrays may be stored in slower memory than those not declared
as arrays, and it is recommended to use non-array variables unless array
functionality is required.
Section 2.X.3.5, Program Results
Program result variables represent the per-vertex or per-fragment results
of the program. All result variables have associated bindings, are
write-only during program execution, and are undefined at the beginning of
each program invocation. Any vertex or fragment attributes corresponding
to unwritten result variables will be undefined in subsequent stages of
the pipeline. Result variables may be declared explicitly via the
<OUTPUT_statement> grammar rule, or implicitly by using a result binding
in an instruction.
The set of available result bindings depends on the program type, and is
enumerated in the specifications for each program type.
Result variables may generally be declared as arrays, but the set of
bindings allowed for arrays is limited to state grouped in arrays (e.g.,
texture coordinates, clip distances, colors). Additionally, all bindings
assigned to the array must be of the same binding type and must increase
consecutively. Examples of valid and invalid binding lists for vertex
programs include:
  result.clip[1], result.clip[2]          # valid, 2-entry array
  result.texcoord[0..3]                   # valid, 4-entry array
  result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
  result.texcoord[2], result.texcoord[1]  # invalid, wrong order
  result.texcoord[1], result.clip[2]      # invalid, different types
Additionally, result bindings may be used in no more than one array
addressed with relative addressing.
Implementations may have a limit on the total number of result binding
components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
Programs that require more result binding components than this limit will
fail to load. The method of counting used result binding components is
implementation-dependent, but must satisfy the following properties:
  * If a result binding is not referenced in a program, or is referenced
    only in declarations of result variables that are not used, none of
    its components are counted.
  * A result binding component may be counted as used only if there exists
    an instruction operand where
      - the component is enabled in the write mask (Section 2.X.4.3), and
      - the result binding is either
          - referenced directly by the operand,
          - bound to a declared variable referenced by the operand, or
          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.
    Implementations are not required to optimize out unused elements of a
    result array or components that are used in only some elements of an
    array. The last of these rules is intended to cover the case where
    the same result binding is used in multiple variables.
    For example, an instruction whose write mask selects only the x
    component may result in the x component of a result binding being
    counted, but may never result in the counting of the y, z, or w
    components of any result binding.
Section 2.X.3.6, Program Parameter Buffers
Program parameter buffers are arrays consisting of single-component
typeless values or four-component typeless vectors stored in a buffer
object. The GL provides an implementation-dependent number of buffer
object binding points for each program target, to which buffer objects can
be attached. Program parameter buffer variables can be changed either by
updating the contents of bound buffer objects, or simply by changing the
buffer object attached to a binding point.
Program parameter buffer variables are used as constants during program
execution. All program parameter buffer variables have an associated
binding and are read-only during program execution. Program parameter
buffers retain their values across program invocations, although their
values may change as buffer object bindings or contents change. Program
parameter buffer variables must be declared explicitly via the
<BUFFER_statement> grammar rule. Program parameter buffer bindings can
not be used directly in executable instructions.
Program parameter buffer variables are treated as an array of
single-component values if the <bufferDeclType> grammar rule matches
"BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
A program will fail to load if a variable declared as "BUFFER" and another
variable declared as "BUFFER4" use the same buffer binding point.
Program parameter buffer variables may be declared as arrays, but all
bindings assigned to the array must use the same binding point and must
increase consecutively.
  Binding                  Components  Underlying State
  -----------------------  ----------  -----------------------------
  program.buffer[a][b]     (x,x,x,x)   program parameter buffer a,
                                       element b
  program.buffer[a][b..c]  (x,x,x,x)   program parameter buffer a,
                                       elements b through c
  program.buffer[a]        (x,x,x,x)   program parameter buffer a,
                                       all elements
Table X.12: Program Parameter Buffer Bindings. <a> indicates a buffer
number, <b> and <c> indicate individual elements.
If a program parameter buffer binding matches "program.buffer[a][b]", the
program parameter variable is filled with element <b> of the buffer
object bound to binding point <a>. Each element of the bound buffer
object is treated as one or four words of data that can hold integer or
floating-point values. When a single-component binding is evaluated, the
selected word is broadcast to all four components of the variable. When a
four-component binding is evaluated, the four components of the buffer
element are loaded into the variable. If no buffer object is bound to
binding point <a>, or the bound buffer object is not large enough to hold
element <b>, the values used are undefined. The binding point <a> must
be a nonnegative integer constant.
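The difference between single-component ("BUFFER") and four-component ("BUFFER4") element fetches can be sketched as follows. The Python is illustrative only: the function name is hypothetical, the buffer contents are modeled as a flat list of words, and None stands in for the undefined values produced by an out-of-range element.

```python
def fetch_buffer_element(words, index, four_component):
    """Return the (x, y, z, w) value of program.buffer[a][index].

    words: contents of the buffer object bound to binding point a,
           as a flat list of words (empty if nothing usable is bound).
    four_component: True for "BUFFER4" variables, False for "BUFFER".
    """
    if four_component:
        base = 4 * index
        if base + 4 > len(words):
            return None                   # undefined: buffer too small
        return tuple(words[base:base + 4])
    if index >= len(words):
        return None                       # undefined: buffer too small
    return (words[index],) * 4            # broadcast one word to x,y,z,w
```

Note that the same element index addresses different words in the two modes: element 1 of a "BUFFER" variable is word 1, while element 1 of a "BUFFER4" variable covers words 4 through 7.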
For program parameter buffer array declarations, "program.buffer[a][b..c]"
is equivalent to specifying elements <b> through <c> of the buffer object
bound to binding point <a> in order.
For program parameter buffer array declarations, "program.buffer[a]" is
equivalent to specifying the entire buffer -- elements 0 through <n>-1,
where <n> is either the size of the array (if declared) or the
implementation-dependent maximum parameter buffer object size limit (if no
size is declared).
Section 2.X.3.7, Program Condition Code Registers
The program condition code registers are four-component vectors. Each
component of this register is a collection of single-bit flags, including
a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry
flag (CF). There are two condition code registers (CC0 and CC1), whose
values are undefined at the beginning of program execution.
Most program instructions can optionally update one of the condition code
registers, by designating the condition code to update in the instruction.
When a condition code component is updated, the four flags of each
component of the condition code are set according to the corresponding
component of the instruction result. Full details on the condition code
updates and tests can be found in Section 2.X.4.3.
The values of these four flags can be combined in various condition code
tests, which can be used to mask writes to destination variables and to
perform conditional branches or other conditional operations.
Section 2.X.3.8, Program Aliases
Programs can create aliases by matching the <ALIAS_statement> grammar
rule. Aliases allow programs to use multiple variable names to refer to a
single underlying variable. For example, the statement
ALIAS var1 = var0
establishes a variable name of "var1". Subsequent references to "var1" in
the program text are treated as references to "var0". The left hand side
of an ALIAS statement must be a new variable name, and the right hand side
must be an established variable name.
Aliases are not considered variable declarations, so they do not count
against the limits on the number of variable declarations allowed in the
program text.
Section 2.X.3.9, Program Resource Limits
(see ARB_vertex_program specification, incorporates all the different
limits on instruction counts, temporaries, attribute bindings, program
parameters, and so on)
Section 2.X.4, Program Execution Environment
The set of instructions supported for GPU programs is given in Table X.13
below and is described in detail in Section 2.X.8. An instruction can use
up to three operands when it executes, and most instructions can write a
single result vector. Instructions may also specify one or more
modifiers, according to the <opModifiers> grammar rule. Instruction
modifiers affect how the specified operation is performed.
GPU programs may operate on signed integer, unsigned integer, or
floating-point values; some instructions are capable of operating on any
of the three types. However, the data type of the operands and the result
are always determined based solely on the instruction and its modifiers.
If any of the variables used in the instruction are typeless, they will be
interpreted according to the data type derived from the instruction. If
any variables with a conflicting data type are used in the instruction,
the program will fail to load unless the "NTC" (no type checking)
instruction modifier is specified.
Modifiers
Instruction F I C S H D Out Inputs Description
----------- - - - - - - --- -------- --------------------------------
ABS X X X X X F v v absolute value
ADD X X X X X F v v,v add
AND - X X - - S v v,v bitwise and
BRK - - - - - - - c break out of loop instruction
CAL - - - - - - - c subroutine call
CEIL X X X X X F v vf ceiling
CMP X X X X X F v v,v,v compare
CONT - - - - - - - c continue with next loop iteration
COS X - X X X F s s cosine with reduction to [-PI,PI]
DIV X X X X X F v v,s divide vector components by scalar
DP2 X - X X X F s v,v 2-component dot product
DP2A X - X X X F s v,v,v 2-comp. dot product w/scalar add
DP3 X - X X X F s v,v 3-component dot product
DP4 X - X X X F s v,v 4-component dot product
DPH X - X X X F s v,v homogeneous dot product
DST X - X X X F v v,v distance vector
ELSE - - - - - - - - start if test else block
ENDIF - - - - - - - - end if test block
ENDREP - - - - - - - - end of repeat block
EX2 X - X X X F s s exponential base 2
FLR X X X X X F v vf floor
FRC X - X X X F v v fraction
I2F - X X - - S vf v integer to float
IF - - - - - - - c start of if test block
KIL X X - - X F - vc kill fragment
LG2 X - X X X F s s logarithm base 2
LIT X - X X X F v v compute lighting coefficients
LRP X - X X X F v v,v,v linear interpolation
MAD X X X X X F v v,v,v multiply and add
MAX X X X X X F v v,v maximum
MIN X X X X X F v v,v minimum
MOD - X X - - S v v,s modulus vector components by scalar
MOV X X X X X F v v move
MUL X X X X X F v v,v multiply
NOT - X X - - S v v bitwise not
NRM X - X X X F v v normalize 3-component vector
OR - X X - - S v v,v bitwise or
PK2H X X - - - F s vf pack two 16-bit floats
PK2US X X - - - F s vf pack two floats as unsigned 16-bit
PK4B X X - - - F s vf pack four floats as signed 8-bit
PK4UB X X - - - F s vf pack four floats as unsigned 8-bit
POW X - X X X F s s,s exponentiate
RCC X - X X X F s s reciprocal (clamped)
RCP X - X X X F s s reciprocal
REP X X - - X F - v start of repeat block
RET - - - - - - - c subroutine return
RFL X - X X X F v v,v reflection vector
ROUND X X X X X F v vf round to nearest integer
RSQ X - X X X F s s reciprocal square root
SAD - X X - - S vu v,v,vu sum of absolute differences
SCS X - X X X F v s sine/cosine without reduction
SEQ X X X X X F v v,v set on equal
SFL X X X X X F v v,v set on false
SGE X X X X X F v v,v set on greater than or equal
SGT X X X X X F v v,v set on greater than
SHL - X X - - S v v,s shift left
SHR - X X - - S v v,s shift right
SIN X - X X X F s s sine with reduction to [-PI,PI]
SLE X X X X X F v v,v set on less than or equal
SLT X X X X X F v v,v set on less than
SNE X X X X X F v v,v set on not equal
SSG X - X X X F v v set sign
STR X X X X X F v v,v set on true
SUB X X X X X F v v,v subtract
SWZ X - X X X F v v extended swizzle
TEX X X X X - F v vf texture sample
TRUNC X X X X X F v vf truncate (round toward zero)
TXB X X X X - F v vf texture sample with bias
TXD X X X X - F v vf,vf,vf texture sample w/partials
TXF X X X X - F v vs texel fetch
TXL X X X X - F v vf texture sample w/LOD
TXP X X X X - F v vf texture sample w/projection
TXQ - - - - - S vs vs texture info query
UP2H X X X X - F vf s unpack two 16-bit floats
UP2US X X X X - F vf s unpack two unsigned 16-bit ints
UP4B X X X X - F vf s unpack four signed 8-bit ints
UP4UB X X X X - F vf s unpack four unsigned 8-bit ints
X2D X - X X X F v v,v,v 2D coordinate transformation
XOR - X X - - S v v,v exclusive or
XPD X - X X X F v v,v cross product
Table X.13: Summary of NV_gpu_program4 instructions. The "Modifiers"
columns specify the set of modifiers allowed for the instruction:
F = floating-point data type modifiers
I = signed and unsigned integer data type modifiers
C = condition code update modifiers
S = clamping (saturation) modifiers
H = half-precision float data type suffix
D = default data type modifier (F, U, or S)
The input and output columns describe the formats of the operands and
results of the instruction.
v: 4-component vector (data type is inherited from operation)
vf: 4-component vector (data type is always floating-point)
vs: 4-component vector (data type is always signed integer)
vu: 4-component vector (data type is always unsigned integer)
s: scalar (replicated if written to a vector destination;
data type is inherited from operation)
c: condition code test result (e.g., "EQ", "GT1.x")
vc: 4-component vector or condition code test
Section 2.X.4.1, Program Instruction Modifiers
There are several types of instruction modifiers available. A data type
modifier specifies that an instruction should operate on signed integer,
unsigned integer, or floating-point data, when multiple data types are
supported. A clamping modifier applies to instructions with
floating-point results, and specifies the range to which the results
should be clamped. A condition code update modifier specifies that the
instruction should update one of the condition code variables. Several
other special modifiers are also provided.
Instruction modifiers may be specified as stand-alone modifiers or as
suffixes concatenated with the opcode name. A program will fail to load
if it contains an instruction that
* specifies more than one modifier of any given type,
* specifies a clamping modifier on an instruction, unless it produces
floating-point results, or
* specifies a modifier that is not supported by the instruction (see
Table X.13 and the instruction description).
Stand-alone instruction modifiers are specified according to the
<opModifiers> grammar rule using a ".<modifier>" syntax. Multiple
modifiers, separated by periods, may be specified. The set of supported
modifiers is described in Table X.14.
Modifier Description
-------- -----------------------------------------------
F Floating-point operation
U Fixed-point operation, unsigned operands
S Fixed-point operation, signed operands
CC Update condition code register zero
CC0 Update condition code register zero
CC1 Update condition code register one
SAT Floating-point results clamped to [0,1]
SSAT Floating-point results clamped to [-1,1]
NTC Disable type-checking on operands/results
S24 Signed multiply (24-bit operands)
U24 Unsigned multiply (24-bit operands)
HI Multiplies two 32-bit integer operands, returns
the 32 MSBs of the product
Table X.14, Instruction Modifiers.
"F", "U", and "S" modifiers are data type modifiers and specify that the
instruction should operate on floating-point, unsigned integer, or
signed integer values, respectively. For example, "ADD.F", "ADD.U", and
"ADD.S" specify component-wise addition of floating-point, unsigned
integer, or signed integer vectors, respectively. These modifiers specify
a data type, but do not specify a precision at which the operation is
performed. Floating-point operations will be carried out with an internal
precision no less than that used to represent the largest operand.
Fixed-point operations will be carried out using at least as many bits as
used to represent the largest operand. Operands represented with fewer
bits than used to perform the instruction will be promoted to a larger
data type. Signed integer operands will be sign-extended, where the most
significant bits are filled with ones if the operand is negative and zero
otherwise. Unsigned integer operands will be zero-extended, where the
most significant bits are always filled with zeroes. For some
instructions, the data type of some operands or of the result is fixed; in
these cases, the data type modifier specifies the data type of the
remaining values.
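The operand promotion rules above can be sketched in Python as follows.
This is an illustration under stated assumptions, not a description of any
implementation: the function names are hypothetical, and operands are
modeled as raw bit patterns held in non-negative Python integers.

```python
def zero_extend(value, from_bits, to_bits):
    """Promote an unsigned operand: the new upper bits are always zero."""
    assert 0 < from_bits <= to_bits
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Promote a signed operand: the new upper bits copy the sign bit."""
    assert 0 < from_bits <= to_bits
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Negative operand: fill bits from_bits..to_bits-1 with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value
```

For example, promoting the 8-bit pattern 0x80 to 32 bits gives 0xFFFFFF80
when it is treated as signed and 0x00000080 when it is treated as unsigned.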
"CC", "CC0", and "CC1" are condition code update modifiers that specify
that one of the condition code registers should be updated based on the
result of the instruction, as described in section 2.X.4.3. "CC" and
"CC0" specify that the condition code register CC0 be updated; "CC1"
specifies an update to CC1. If no condition code update modifier is
provided, the condition code registers will not be affected.
"SAT" and "SSAT" are clamping modifiers that specify that the
floating-point components of the instruction result should be clamped to
[0,1] or [-1,1], respectively, before updating the condition code and the
destination variable. If no clamping suffix is specified, unclamped
results will be used for condition code updates (if any) and destination
variable writes. Clamping modifiers are not supported on instructions
that do not produce floating-point results.
"NTC" (no type checking) disables data type checking on the instruction,
and allows instructions to use operands or result variables whose data
types are inconsistent with the expected data types of the instruction.
"S24", "U24", and "HI" are special modifiers that are allowed only for the
MUL instruction, and are described in detail where MUL is documented. No
more than one such modifier may be provided for any instruction.
If an instruction supports data type modifiers, but none is provided, a
default data type will be chosen based on the instruction, as specified in
Table X.13 and the instruction set description (Section 2.X.8). If
condition code update or clamping modifiers are not specified, the
corresponding operation will not be performed.
Additionally, each instruction name may have one or more suffixes,
concatenated onto the base instruction name, that operate as instruction
modifiers. For conciseness, these suffixes are not spelled out in the
grammar -- the base opcode name is used as a placeholder for the opcode
and all of its possible suffixes. Instruction suffixes are provided
mainly for compatibility with prior GPU program instruction sets (e.g.,
NV_vertex_program3, NV_fragment_program2, and predecessors). The set of
allowable suffixes, and their equivalent stand-alone modifiers, are listed
in Table X.15.
Suffix Modifier Description
------ ---------- ---------------------------------------------------
R F Floating-point operation, 32-bit precision
H F(*) Floating-point operation, at least 16-bit precision
C CC0 Update condition code register zero
C0 CC0 Update condition code register zero
C1 CC1 Update condition code register one
_SAT SAT Floating-point results clamped to [0,1]
_SSAT SSAT Floating-point results clamped to [-1,1]
Table X.15, Instruction Suffixes.
The "R" and "H" suffixes specify floating-point operations and are
equivalent to the "F" data type modifier. They additionally specify a
minimum precision for the operations. Instructions with an "R" precision
modifier will be carried out at no less than IEEE single-precision
floating-point (8 bits of exponent, 23 bits of mantissa). Instructions
with an "H" precision modifier will be carried out at no less than 16-bit
floating-point precision (5 bits of exponent, 10 bits of mantissa).
An instruction may have multiple suffixes, but they must appear in order,
with data type suffixes first, followed by condition code update suffixes,
followed by clamping suffixes. For example, "ADDR" carries out an add at
32-bit precision. "ADDH_SAT" carries out an add at 16-bit precision (or
better) and clamps the results to [0,1]. "ADDRC1_SSAT" carries out an add
at 32-bit floating-point precision, clamps the results to [-1,1], and
updates condition code one based on the clamped result.
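The suffix ordering rule can be illustrated with a small Python sketch that
splits an opcode into the equivalent stand-alone modifiers of Table X.15.
The function name is hypothetical. The base opcode must be supplied by the
caller, because several opcode names (for example "SHR" and "FRC")
themselves end in letters that are also suffixes.

```python
import re

# Suffix -> stand-alone modifier, following Table X.15.
_DTYPE = {"R": "F", "H": "F"}   # "H" additionally implies a minimum precision
_CC = {"C": "CC0", "C0": "CC0", "C1": "CC1"}
_CLAMP = {"_SAT": "SAT", "_SSAT": "SSAT"}

# Required order: data type, then condition code update, then clamping.
_SUFFIXES = re.compile(r"(R|H)?(C[01]?)?(_SAT|_SSAT)?")


def split_suffixes(base, opcode):
    """Return the stand-alone modifiers implied by the suffixes on opcode."""
    if not opcode.startswith(base):
        raise ValueError("opcode does not start with the base name")
    match = _SUFFIXES.fullmatch(opcode[len(base):])
    if match is None:
        raise ValueError("unknown or out-of-order suffixes")
    dtype, cc, clamp = match.groups()
    modifiers = []
    if dtype:
        modifiers.append(_DTYPE[dtype])
    if cc:
        modifiers.append(_CC[cc])
    if clamp:
        modifiers.append(_CLAMP[clamp])
    return modifiers
```

With this sketch, split_suffixes("ADD", "ADDRC1_SSAT") yields
["F", "CC1", "SSAT"], while an out-of-order name such as "ADD_SATR" is
rejected.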
Section 2.X.4.2, Program Operands
Most program instructions operate on one or more scalar or vector
operands. Each operand specifies an operand variable, which is either the
name of a previously declared variable or an implicit variable declaration
created by using a variable binding in the instruction. Attribute,
parameter, or parameter buffer variables can be declared implicitly by
using a valid binding name in an operand. Instruction operands are
specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
grammar rules.
If the operand variable is not an array, its contents are loaded directly.
If the operand variable is an array, a single element of the array is
loaded according to the <arrayMem> grammar rule. The elements of an array
are numbered from 0 to <n>-1, where <n> is the number of entries in the
array. Array members can be accessed using either absolute or relative
addressing.
Absolute array addressing is used when the <arrayMemAbs> grammar rule is
matched; the array member to load is specified by the matching integer.
Out-of-bounds array absolute accesses are not allowed. If the specified
member number is greater than or equal to the size of the array, the
program will fail to load.
Relative array addressing is used when the <arrayMemRel> grammar rule is
matched. This grammar rule allows the program to specify a scalar integer
operand and an optional constant offset, according to the <arrayMemReg>
and <arrayMemOffset> grammar rules. When performing relative addressing,
the GL evaluates the specified integer scalar operand (according to the
rules specified in this section) and adds the constant offset. The array
member loaded is given by this sum. The constant offset is considered
zero if an offset is omitted. If the sum is negative or is greater than
or equal to the size of the array, the results of the access are
undefined, but must not lead to program or GL termination. The set of
constant offsets supported for
relative addressing is limited to values in the range [0,<n>-1], where <n>
is the size of the array. A program will fail to load if it specifies an
offset outside that range. If offsets outside that range are required,
they can be applied by using an integer ADD instruction writing to a
temporary variable.
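The relative addressing rules can be sketched in Python as follows. The
function names are hypothetical, and None is used here only to mark an
access whose result the specification leaves undefined. The sketch treats
an index equal to the array size as out of bounds, matching the rule given
for destination variable updates in Section 2.X.4.3.

```python
def check_offset(offset, array_size):
    """Load-time check: constant offsets must lie in [0, array_size - 1]."""
    if not 0 <= offset <= array_size - 1:
        raise ValueError("program fails to load: offset out of range")


def relative_index(reg_value, offset, array_size):
    """Return the array member selected by relative addressing.

    reg_value is the run-time value of the integer scalar operand.
    Returns None when the access is out of bounds (result undefined).
    """
    check_offset(offset, array_size)
    index = reg_value + offset
    return index if 0 <= index < array_size else None
```

For an 8-element array, a register value of 2 with a constant offset of 3
selects member 5.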
After the operand is loaded, its components can be rearranged according to
the <swizzleSuffix> grammar rule, or it can be converted to a scalar
operand according to the <scalarSuffix> grammar rule.
The <swizzleSuffix> grammar rule rearranges the components of a loaded
vector to produce another vector. If the <swizzleSuffix> rule matches the
<xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
is used, where each question mark is replaced with one of "x", "y", "z",
"w", "r", "g", "b", or "a". For such patterns, the x, y, z, and w
components of the operand are taken from the vector components named by
the first, second, third, and fourth character of the pattern,
respectively. Swizzle components of "r", "g", "b", and "a" are equivalent
to "x", "y", "z", and "w", respectively. For example, if the swizzle
suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
the result is the vector {8,9,9,2}. If the <swizzleSuffix> matches the
<component> grammar rule, a pattern of the form ".?" is used. For this
pattern, all four components of the operand are taken from the single
component identified by the pattern. If the swizzle suffix is omitted,
components are not rearranged and swizzling has no effect, as though
".xyzw" were specified.
The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
selectors with "r", "g", "b", or "a" selectors. A program will fail to
load if it contains a swizzle suffix with selectors from both of these
sets.
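The swizzle rules can be illustrated with the following Python sketch. The
function name is hypothetical, and the suffix is passed without its leading
period.

```python
_XYZW = "xyzw"
_RGBA = "rgba"


def swizzle(vec, suffix="xyzw"):
    """Apply a swizzle suffix to a four-component vector."""
    if all(c in _XYZW for c in suffix):
        names = _XYZW
    elif all(c in _RGBA for c in suffix):
        names = _RGBA
    else:
        # Selectors from both sets, or unknown selectors.
        raise ValueError("program fails to load: invalid swizzle suffix")
    if len(suffix) == 1:
        suffix *= 4          # ".?" replicates one component four times
    if len(suffix) != 4:
        raise ValueError("swizzle must name one or four components")
    return [vec[names.index(c)] for c in suffix]
```

For a source vector of {2,8,9,0}, both swizzle(v, "yzzx") and
swizzle(v, "gbbr") return {8,9,9,2}, and swizzle(v, "y") returns {8,8,8,8}.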
The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
a single component. The <scalarSuffix> rule is similar to the swizzle
selector, except that only a single component is selected. If the scalar
suffix is ".y" and the specified source contains {2,8,9,0}, the value is
the scalar value 8.
Next, a component-wise negate operation is performed on the operand if the
<operandNeg> grammar rule matches "-". Negation is not performed if the
operand has no sign prefix, or is prefixed with "+". For unsigned integer
operands, the negate operation performs a two's complement operation.
Next, a component-wise absolute value operation is performed on the
operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
matched, by surrounding the operand with two "|" characters. The result
is optionally negated if the <operandAbsNeg> grammar rule matches "-".
For unsigned integer operands, the absolute value operation has no effect.
Section 2.X.4.3, Program Destination Variable Update
Most program instructions perform computations that produce a result,
which will be written to a variable. Each instruction that computes a
result specifies a destination variable, which is either the name of a
previously declared variable or an implicit variable declaration created
by using a variable binding in the instruction. Result variables can be
declared implicitly by using a valid program result binding name in the
result portion of the instruction. Instruction results are specified
according to the <instResult> grammar rule.
The destination variable may be a single member of an array. In this
case, a single array member is specified using the <arrayMem> grammar
rule, and the array member to update is computed in the exact same manner
as done for operand loads. If the array member is computed at run time,
and is negative or greater than or equal to the size of the array, the
results of the destination variable update are undefined and could result
in overwriting other program variables.
The results of the operation may be obtained at a different precision than
that used to store the destination variable. If so, the results are
converted to match the size of the destination variable. For
floating-point values, the results are rounded to the nearest
floating-point value that can be represented in the destination variable.
If a result component is larger in magnitude than the largest
representable floating-point value in the data type of the destination
variable, an infinity encoding (+/-INF) is used. Signed or unsigned
integer values are sign-extended or zero-extended, respectively, if the
destination variable has more bits than the result, and have their most
significant bits discarded if the destination variable has fewer bits.
Writes to individual components of a vector destination variable can be
controlled at compile time by individual component write masks specified
in the instruction. The component write mask is specified by the
<optWriteMask> grammar rule, and is a string of up to four characters,
naming the components to enable for writing. If no write mask is
specified, all components are enabled for writing. The characters "x",
"y", "z", and "w" match the x, y, z, and w components respectively. For
example, a write mask of ".xzw" indicates that the x, z, and w
components should be enabled for writing but the y component should not be
written. The grammar requires that the destination register mask
components be listed in "xyzw" order. Additionally, write mask
components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
"w", respectively. The grammar does not allow mixing "x", "y", "z", or
"w" components with "r", "g", "b", and "a" ones.
Writes to individual components of a vector destination variable, or to a
scalar destination variable, can also be controlled at run time using
condition code write masks. The condition code write mask is specified by
the <ccMask> grammar rule. If a mask is specified, a condition code
variable is loaded according to the <ccMaskRule> grammar rule and tested
as described in Table X.16 to produce a four-component vector of TRUE/FALSE
values.
mask rule         test name                condition
---------------   -----------------------  -----------------
EQ, EQ0, EQ1      equal                    !SF && ZF
GE, GE0, GE1      greater than or equal    !(SF ^ OF)
GT, GT0, GT1      greater than             (!SF ^ OF) && !ZF
LE, LE0, LE1      less than or equal       SF ^ (ZF || OF)
LT, LT0, LT1      less than                (SF && !ZF) ^ OF
NE, NE0, NE1      not equal                SF || !ZF
FL, FL0, FL1      false                    always false
TR, TR0, TR1      true                     always true
NAN, NAN0, NAN1   not a number             SF && ZF
LEG, LEG0, LEG1   less, equal, or greater  !SF || !ZF
                  (anything but a NaN)
CF, CF0, CF1      carry flag               CF
NCF, NCF0, NCF1   no carry flag            !CF
OF, OF0, OF1      overflow flag            OF
NOF, NOF0, NOF1   no overflow flag         !OF
SF, SF0, SF1      sign flag                SF
NSF, NSF0, NSF1   no sign flag             !SF
AB, AB0, AB1      above                    CF && !ZF
BLE, BLE0, BLE1   below or equal           !CF || ZF
Table X.16, Condition Code Tests. The allowed rules are specified in
the "mask rule" column. If "0" or "1" is appended to the rule name
(e.g., "EQ1"), the corresponding condition code register (CC1 in this
example) is loaded, otherwise CC0 is loaded. After loading, each
component is tested, using the expression listed in the "condition"
column.
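The tests of Table X.16 can be transcribed into a short Python sketch. The
names Flags, CC_TESTS, and cc_test are hypothetical; each condition code
register is modeled as a list of four Flags tuples, one per component.

```python
from collections import namedtuple

# One condition code component: four single-bit flags.
Flags = namedtuple("Flags", "SF ZF OF CF")

# Conditions transcribed from Table X.16; "^" is exclusive or.
CC_TESTS = {
    "EQ":  lambda f: (not f.SF) and f.ZF,
    "GE":  lambda f: not (f.SF ^ f.OF),
    "GT":  lambda f: ((not f.SF) ^ f.OF) and not f.ZF,
    "LE":  lambda f: f.SF ^ (f.ZF or f.OF),
    "LT":  lambda f: (f.SF and not f.ZF) ^ f.OF,
    "NE":  lambda f: f.SF or not f.ZF,
    "FL":  lambda f: False,
    "TR":  lambda f: True,
    "NAN": lambda f: f.SF and f.ZF,
    "LEG": lambda f: (not f.SF) or (not f.ZF),
    "CF":  lambda f: f.CF,
    "NCF": lambda f: not f.CF,
    "OF":  lambda f: f.OF,
    "NOF": lambda f: not f.OF,
    "SF":  lambda f: f.SF,
    "NSF": lambda f: not f.SF,
    "AB":  lambda f: f.CF and not f.ZF,
    "BLE": lambda f: (not f.CF) or f.ZF,
}


def cc_test(rule, cc0, cc1):
    """Evaluate a mask rule such as "GT" or "EQ1" on all four components."""
    register = cc1 if rule.endswith("1") else cc0   # no suffix or "0": CC0
    test = CC_TESTS[rule.rstrip("01")]
    return [bool(test(f)) for f in register]
```

The result is the four-component vector of TRUE/FALSE values that is
subsequently swizzled and used as the condition code write mask.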
After the condition code tests are performed, the four-component result
can be swizzled according to the <swizzleSuffix> grammar rule. Individual
components of the destination variable are written only if the
corresponding component of the swizzled condition code test result is
TRUE. If both a (compile-time) component write mask and a condition code
write mask are specified, destination variable components are written only
if the corresponding component is enabled in both masks.
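The interaction of the two masks can be sketched in Python as follows. The
function names are hypothetical. The component write mask is passed without
its leading period, and the condition code write mask is passed as the
already swizzled vector of TRUE/FALSE test results. Rejecting repeated
components is an assumption drawn from the "xyzw" ordering requirement.

```python
def parse_write_mask(mask):
    """Return four booleans (x, y, z, w) for a write mask such as "xzw".

    An empty mask enables all four components.
    """
    if mask == "":
        return [True] * 4
    for names in ("xyzw", "rgba"):
        if all(c in names for c in mask):
            positions = [names.index(c) for c in mask]
            # Components must appear in order, each at most once.
            if positions != sorted(set(positions)):
                raise ValueError("mask components out of order or repeated")
            return [i in positions for i in range(4)]
    raise ValueError("invalid or mixed mask components")


def masked_write(dest, result, write_mask, cc_mask):
    """Write result into dest where both masks enable the component."""
    return [r if (w and c) else d
            for d, r, w, c in zip(dest, result, write_mask, cc_mask)]
```

With a write mask of ".xzw" and a condition code test result of
(TRUE, TRUE, FALSE, TRUE), only the x and w components are written.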
A program instruction can also optionally update one of the two condition
code registers if the "CC", "CC0", or "CC1" instruction modifier is
specified. These instruction modifiers update condition code register
CC0, CC0, or CC1, respectively. The instructions "ADD.CC" or "ADD.CC0"
will perform an add and update condition code zero, "ADD.CC1" will add and
update condition code one, and "ADD" will simply perform the add without a
condition code update. The components of the selected condition code
register are updated if and only if the corresponding component of the
destination variable is enabled by both write masks. For the purposes of
condition code update, a scalar destination variable is treated as a
vector where the scalar result is written to "x" (if enabled in the write
mask), and writes to the "y", "z", and "w" components are disabled.
When condition code components are written, the condition code flags are
updated based on the corresponding component of the result. If a
component of the destination register is not enabled for writes, the
corresponding condition code component is also unchanged.
For floating-point results, the sign flag (SF) is set if the result is
less than zero or is a NaN (not a number) value. The zero flag (ZF) is
set if the result is equal to zero or is a NaN.
For signed and unsigned integer results, the sign flag (SF) is set if the
most significant bit of the value written to the result variable is set
and the zero flag (ZF) is set if the result written is zero. For
instructions other than those performing an integer add or subtract (ADD,
MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.
For integer add or subtract operations, the overflow and carry flags are
determined by performing both signed and unsigned adds/subtracts as follows:
The overflow flag (OF) is set by interpreting the two operands as signed
integers and performing a signed add or subtract. If the result is
representable as a signed integer (i.e., doesn't overflow), the overflow
flag is cleared; otherwise, it is set.
The carry flag (CF) is set by interpreting the two operands as unsigned
integers and performing an unsigned add or subtract. If the result of
an add is representable as an unsigned integer (i.e., doesn't overflow),
the carry flag is cleared; otherwise, it is set. If the result of a
subtract is greater than or equal to zero, the carry flag is set;
otherwise, it is cleared.
For the purposes of condition code setting, negation modifiers turn add
operations into subtracts and vice versa. If the operation is equivalent
to an add with both operands negated (-A-B), the carry and overflow flags
are both undefined.
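The flag rules for a plain integer add or subtract can be sketched in
Python as follows. The function name is hypothetical, the 32-bit default
width is an assumption, and operands are passed as raw bit patterns.
Operand negation and the undefined (-A-B) case are not modeled.

```python
def int_add_flags(a, b, subtract=False, bits=32):
    """Return (result, SF, ZF, OF, CF) for an integer add or subtract."""
    mask = (1 << bits) - 1

    def signed(v):
        # Reinterpret a raw bit pattern as a two's complement value.
        return v - (1 << bits) if v >> (bits - 1) else v

    a &= mask
    b &= mask
    if subtract:
        unsigned_result = a - b
        signed_result = signed(a) - signed(b)
        cf = unsigned_result >= 0        # set when the difference is >= 0
    else:
        unsigned_result = a + b
        signed_result = signed(a) + signed(b)
        cf = unsigned_result > mask      # set when the unsigned add overflows
    # OF is set when the signed result is not representable.
    of = not (-(1 << (bits - 1)) <= signed_result < (1 << (bits - 1)))
    result = unsigned_result & mask
    sf = bool(result >> (bits - 1))      # most significant bit of the result
    zf = result == 0
    return result, sf, zf, of, cf
```

For example, subtracting 5 from 7 sets CF and clears ZF, so the "AB"
(above) test of Table X.16 passes; subtracting 7 from 5 clears CF, so the
"BLE" (below or equal) test passes.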
Section 2.X.4.4, Program Texture Access
Certain program instructions may access texture images, as described in
section 3.8. The coordinates, level-of-detail, and partial derivatives
used for performing the texture lookup are derived from values provided in
the program as described in the various sub-sections of Section 2.X.8.
These descriptions use the function
result_t_vec
TextureSample(float_vec coord, float lod, float_vec ddx,
float_vec ddy, int_vec offset);
which obtains a filtered texel value <tau> as described in Section 3.8.8
and returns a 4-component vector (R,G,B,A) according to the format
conversions specified in Table 3.21. The result vector is interpreted as
floating-point, signed integer, or unsigned integer, according to the data
type modifier of the instruction. If the internal format of the texture
does not match the instruction's data type modifier, the results of the
texture lookup are undefined.
(Note: For unextended OpenGL 2.0, all supported texture internal formats
store integer values but return floating-point results in the range [0,1]
on a texture lookup. The ARB_texture_float extension introduces
floating-point internal formats where components are both stored and
returned as floating-point values. The EXT_texture_integer extension
introduces formats that both store and return either signed or unsigned
integer values.)
<coord> is a four-component floating-point vector from which the (s,t,r)
texture coordinates used for the texture access, the layer used for array
textures, and the reference value used for depth comparisons (section
3.8.14) are extracted according to Table X.17. If the texture is a cube
map, (s,t,r) is projected to one of the six cube faces to produce a new
(s,t) vector according to Section 3.8.6. For array textures, the layer
used is derived by rounding the extracted floating-point component to the
nearest integer and clamping the result to the range [0,<n>-1], where <n>
is the number of layers in the texture.
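The layer selection rule for array textures can be sketched in Python as
follows. The function name is hypothetical, and rounding exact ties upward
is an assumption, since the text above says only "nearest integer".

```python
import math


def array_layer(coord_component, num_layers):
    """Round to the nearest integer, then clamp to [0, num_layers - 1]."""
    nearest = math.floor(coord_component + 0.5)   # tie handling is assumed
    return max(0, min(num_layers - 1, nearest))
```

For a texture with four layers, a coordinate of 2.6 selects layer 3, and a
coordinate of 9.7 is clamped to layer 3.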
<lod> specifies the level of detail parameter and replaces the value
computed in equation 3.18. <ddx> and <ddy> specify partial derivatives
(ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
coordinates, and may be used to derive footprint shapes for anisotropic
texture filtering.
<offset> is a constant 3-component signed integer vector specified
according to the <texOffset> grammar rule, which is added to the computed
<u>, <v>, and <w> texel locations prior to sampling. One, two, or three
components may be specified in the instruction; if fewer than three are
specified, the remaining offset components are zero. A limited range of
offset values are supported; the minimum and maximum <texOffset> values
are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively. A program will fail to load:
* if the texture target specified in the instruction is 1D, ARRAY1D,
SHADOW1D, or SHADOWARRAY1D, and the second or third component of the
offset vector is non-zero,
* if the texture target specified in the instruction is 2D, RECT,
ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
component of the offset vector is non-zero,
* if the texture target is CUBE or SHADOWCUBE, and any component of the
offset vector is non-zero -- texel offsets are not supported for cube
map or buffer textures, or
* if any component of the offset vector is less than
MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
MAX_PROGRAM_TEXEL_OFFSET_EXT.
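The load-time checks on <texOffset> can be sketched in Python as follows.
The names are hypothetical. The minimum and maximum offsets are passed in
as parameters because they are implementation-dependent, and the BUFFER
entry is inferred from the remark that buffer textures do not support texel
offsets.

```python
# Number of leading offset components that may be non-zero per target.
_OFFSET_DIMS = {
    "1D": 1, "ARRAY1D": 1, "SHADOW1D": 1, "SHADOWARRAY1D": 1,
    "2D": 2, "RECT": 2, "ARRAY2D": 2,
    "SHADOW2D": 2, "SHADOWRECT": 2, "SHADOWARRAY2D": 2,
    "3D": 3,
    "CUBE": 0, "SHADOWCUBE": 0,
    "BUFFER": 0,   # inferred, see the note above
}


def offset_is_valid(target, offset, min_offset, max_offset):
    """Return True if a one- to three-component offset passes the checks."""
    # Components omitted from the instruction are zero.
    full = tuple(offset) + (0,) * (3 - len(offset))
    dims = _OFFSET_DIMS[target]
    if any(c != 0 for c in full[dims:]):
        return False
    return all(min_offset <= c <= max_offset for c in full)
```

Assuming, purely for illustration, limits of -8 and 7, the offset (1,-1) is
accepted for a 2D target and rejected for a 1D target.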
(NOTE: Texel offsets are a new feature provided by this extension and are
described in more detail in edits to Section 3.8 below.)
The texture used by TextureSample() is one of the textures bound to the
texture image unit whose number is specified in the instruction according
to the <texImageUnit> grammar rule. The texture target accessed is
specified according to the <texTarget> grammar rule and Table X.17.
Fixed-function texture enables are always ignored when determining the
texture to access in a program.
                                        coordinates used
texTarget      Texture Type           s t r  layer  shadow
-------------  ---------------------  -----  -----  ------
1D             TEXTURE_1D             x - -    -      -
2D             TEXTURE_2D             x y -    -      -
3D             TEXTURE_3D             x y z    -      -
CUBE           TEXTURE_CUBE_MAP       x y z    -      -
RECT           TEXTURE_RECTANGLE_ARB  x y -    -      -
ARRAY1D        TEXTURE_1D_ARRAY_EXT   x - -    y      -
ARRAY2D        TEXTURE_2D_ARRAY_EXT   x y -    z      -
SHADOW1D       TEXTURE_1D             x - -    -      z
SHADOW2D       TEXTURE_2D             x y -    -      z
SHADOWRECT     TEXTURE_RECTANGLE_ARB  x y -    -      z
SHADOWCUBE     TEXTURE_CUBE_MAP       x y z    -      w
SHADOWARRAY1D  TEXTURE_1D_ARRAY_EXT   x - -    y      z
SHADOWARRAY2D  TEXTURE_2D_ARRAY_EXT   x y -    z      w
BUFFER         TEXTURE_BUFFER_EXT     <not supported>
Table X.17: Texture types accessed for each <texTarget>, and
coordinate mappings. The "SHADOW" and "ARRAY" targets are special
pseudo-targets described below. The "coordinates used" columns indicate
the input values used for each coordinate of the texture lookup, the
layer selector for array textures, and the reference value for texture
comparisons. Buffer textures are not supported by normal texture lookup
functions, but are supported by TXF and TXQ, described below.
Texture targets with "SHADOW" are used to access textures with a
DEPTH_COMPONENT base internal format using depth comparisons (Section
3.8.14). Results of a texture access are undefined:
* if a "SHADOW" target is used, and the corresponding texture has a base
internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE
of NONE, or
* if a non-"SHADOW" target is used, and the corresponding texture has a
base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
other than NONE.
If the texture being accessed is not complete (or cube complete for
cubemap textures), no texture access is performed and the result is
undefined.
A program will fail to load if it attempts to sample from multiple texture
targets (including the SHADOW pseudo-targets) on the same texture image
unit. For example, a program containing any two of the following
instructions will fail to load:
TEX out, coord, texture[0], 1D;
TEX out, coord, texture[0], 2D;
TEX out, coord, texture[0], ARRAY2D;
TEX out, coord, texture[0], SHADOW2D;
TEX out, coord, texture[0], 3D;
Additionally, multiple texture targets for a single texture image unit may
not be used at the same time by the GL. The error INVALID_OPERATION is
generated by Begin, RasterPos, or any command that performs an implicit
Begin if an enabled program accesses one texture target for a texture unit
while another enabled program or fixed-function fragment processing
accesses a different texture target for the same texture image unit.
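The load-time rule on texture targets can be pictured as a simple
consistency check over every texture access in a program. The following C
sketch is illustrative only: the names TexTarget, TexAccess, MAX_UNITS, and
targets_consistent are hypothetical and are not defined by this extension,
and only a subset of the targets is listed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not part of the extension. */
enum { MAX_UNITS = 32 };  /* assumed bound on image units for this sketch */

typedef enum {
    TGT_NONE = 0, TGT_1D, TGT_2D, TGT_3D, TGT_CUBE, TGT_RECT,
    TGT_ARRAY1D, TGT_ARRAY2D, TGT_SHADOW1D, TGT_SHADOW2D
} TexTarget;

typedef struct {
    int       unit;    /* texture image unit, e.g. 0 for texture[0] */
    TexTarget target;  /* target named in the instruction */
} TexAccess;

/* Returns true if no image unit is accessed through two different
   targets, treating SHADOW pseudo-targets as distinct targets. */
bool targets_consistent(const TexAccess *acc, size_t n)
{
    TexTarget seen[MAX_UNITS] = { TGT_NONE };
    for (size_t i = 0; i < n; i++) {
        int u = acc[i].unit;
        if (u < 0 || u >= MAX_UNITS)
            return false;            /* out of range for this sketch */
        if (seen[u] == TGT_NONE)
            seen[u] = acc[i].target; /* first access on this unit */
        else if (seen[u] != acc[i].target)
            return false;            /* e.g. 2D and SHADOW2D on one unit */
    }
    return true;
}
```

Repeated accesses to the same unit with the same target pass the check;
mixing 2D and SHADOW2D on texture[0] does not.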
Some texture instructions use standard methods to compute partial
derivatives and/or the level-of-detail used to perform texture accesses.
For fragment programs, the functions
float_vec ComputePartialsX(float_vec coord);
float_vec ComputePartialsY(float_vec coord);
compute approximate component-wise partial derivatives of the
floating-point vector <coord> relative to the X and Y coordinates,
respectively. For vertex and geometry programs, these functions always
return (0,0,0,0). The function
float ComputeLOD(float_vec ddx, float_vec ddy);
maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
equation 3.18.
The TXF instruction provides the ability to extract a single texel from a
specified texture image using the function
result_t_vec TexelFetch(int_vec coord, int_vec offset);
The extracted texel is converted to an (R,G,B,A) vector according to Table
3.21. The result vector is interpreted as floating-point, signed integer,
or unsigned integer, according to the data type modifier of the
instruction. If the internal format of the texture is not compatible with
the instruction's data type modifier, the extracted texel value is
undefined.
<coord> is a four-component signed integer vector used to identify the
single texel accessed. The (i,j,k) coordinates of the texel and the layer
used for array textures are extracted according to Table X.18. The level
of detail accessed is obtained by adding the w component of <coord> to the
base level (level_base). <offset> is a constant 3-component signed
integer vector added to the texel coordinates prior to the texel fetch as
described above. In addition to the restrictions described above,
non-zero offset components are also not supported for BUFFER targets.
The texture used by TexelFetch() is specified by the image unit and target
parameters provided in the instruction, as for TextureSample() above.
Single texel fetches cannot perform depth comparisons or access cubemaps.
If a program contains a TXF instruction specifying one of the "SHADOW" or
"CUBE" targets, it will fail to load.
                             coordinates used
texTarget         supported  i j k  layer  lod
----------------  ---------  -----  -----  ---
1D                yes        x - -    -     w
2D                yes        x y -    -     w
3D                yes        x y z    -     w
CUBE              no         - - -    -     -
RECT              yes        x y -    -     w
ARRAY1D           yes        x - -    y     w
ARRAY2D           yes        x y -    z     w
SHADOW1D          no         - - -    -     -
SHADOW2D          no         - - -    -     -
SHADOWRECT        no         - - -    -     -
SHADOWCUBE        no         - - -    -     -
SHADOWARRAY1D     no         - - -    -     -
SHADOWARRAY2D     no         - - -    -     -
BUFFER            yes        x - -    -     -
Table X.18: Mappings of texel fetch coordinates to texel location.
Single-texel fetches do not support LOD clamping or any texture wrap mode,
and require a mipmapped minification filter to access any level of detail
other than the base level. The results of the texel fetch are undefined:
* if the computed LOD is less than the texture's base level (level_base)
or greater than the maximum level (level_max),
* if the computed LOD is not the texture's base level and the texture's
minification filter is NEAREST or LINEAR,
* if the layer specified for array textures is negative or greater than
the number of layers in the array texture,
* if the texel at the (i,j,k) coordinates refers to a border texel outside
the defined extents of the specified LOD, where
i < -b_s, j < -b_s, k < -b_s,
i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,
where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,
height, depth, and border size of the image, as in equations 3.15,
3.16, and 3.17, or
* if the texture being accessed is not complete (or cube complete for
cubemaps).
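The conditions under which a texel fetch is undefined can be collected into
a single predicate. The C sketch below is a non-normative illustration; the
FetchTexture structure and texel_fetch_defined function are hypothetical
names. It assumes the caller passes 0 for unused coordinates and a size of 1
for unused dimensions, and it mirrors the layer wording of the list above
literally.

```c
#include <stdbool.h>

/* Hypothetical description of the selected texture; field names are
   illustrative and not taken from the extension. */
typedef struct {
    int  level_base, level_max;  /* base and maximum mipmap levels */
    bool mipmapped_min_filter;   /* false for NEAREST or LINEAR */
    bool is_array;               /* true for ARRAY1D / ARRAY2D targets */
    int  num_layers;             /* only meaningful when is_array is true */
    int  w_s, h_s, d_s, b_s;     /* width, height, depth, border of the LOD */
    bool complete;               /* texture completeness */
} FetchTexture;

/* Returns true when none of the "undefined result" conditions apply.
   lod is the computed level: level_base plus the w component of <coord>. */
bool texel_fetch_defined(const FetchTexture *t,
                         int i, int j, int k, int layer, int lod)
{
    if (!t->complete)
        return false;
    if (lod < t->level_base || lod > t->level_max)
        return false;
    if (lod != t->level_base && !t->mipmapped_min_filter)
        return false;
    /* Mirrors the text literally: "negative or greater than". */
    if (t->is_array && (layer < 0 || layer > t->num_layers))
        return false;
    if (i < -t->b_s || j < -t->b_s || k < -t->b_s)
        return false;
    if (i >= t->w_s - t->b_s || j >= t->h_s - t->b_s || k >= t->d_s - t->b_s)
        return false;
    return true;
}
```

For an 8x8 borderless 2D texture with a NEAREST minification filter, texel
(7,0) at the base level is defined, while texel (8,0) or any access to
level 1 is not.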
Section 2.X.5, Program Flow Control
In addition to basic arithmetic, logical, and texture instructions, a
number of flow control instructions are provided, which are described in
detail in Section 2.X.8. Programs can contain several types of
instruction blocks: IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and
subroutine blocks. IF/ELSE/ENDIF blocks are a set of instructions
beginning with an "IF" instruction, ending with an "ENDIF" instruction,
and possibly containing an optional "ELSE" instruction. REP/ENDREP blocks
are a set of instructions beginning with a "REP" instruction and ending
with an "ENDREP" instruction. Subroutine blocks begin with an instruction
label identifying the name of the subroutine and ending just before the
next instruction label or the end of the program. Examples include the
following:
MOVC CC, R0;
IF GT.x;
MOV R0, R1; # executes if R0.x > 0
ELSE;
MOV R0, R2; # executes if R0.x <= 0
ENDIF;
REP repCount;
ADD R0, R0, R1;
ENDREP;
square: # subroutine to compute R0^2
MUL R0, R0, R0;
RET;
main:
MOV R0, 9.0;
CAL square; # compute 9.0^2 in R0
IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
inside subroutines. In all cases, each instruction block must be
terminated with the appropriate instruction (ENDIF for IF, ENDREP for
REP). Nested instruction blocks must be wholly contained within a block
-- if a REP instruction is found between an IF and ELSE instruction, the
corresponding ENDREP must also be present between the IF and ELSE.
Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
or inside other subroutines. A program will fail to load if any
instruction block is terminated by an incorrect instruction, is not
terminated before the block containing it, or contains an instruction
label.
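The nesting rules above amount to a stack discipline over block-opening and
block-closing instructions. The following C sketch shows one way a loader
might check them. It is an illustration under stated assumptions: the Op
enumeration, the MAX_OPEN bound, and blocks_well_formed are hypothetical,
and the program is assumed to be already reduced to a flat array of
instruction kinds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds; everything else is OP_OTHER. */
typedef enum {
    OP_OTHER, OP_IF, OP_ELSE, OP_ENDIF, OP_REP, OP_ENDREP, OP_LABEL
} Op;

enum { MAX_OPEN = 64 };  /* assumed bound on open blocks for this sketch */

/* Returns true if every block is closed by the matching instruction,
   nested blocks are wholly contained, and no block contains a label. */
bool blocks_well_formed(const Op *prog, size_t n)
{
    Op open[MAX_OPEN];   /* kinds of the currently open blocks */
    size_t depth = 0;

    for (size_t i = 0; i < n; i++) {
        switch (prog[i]) {
        case OP_IF:
        case OP_REP:
            if (depth == MAX_OPEN)
                return false;
            open[depth++] = prog[i];
            break;
        case OP_ELSE:
            /* ELSE must belong to the innermost open IF, at most once. */
            if (depth == 0 || open[depth - 1] != OP_IF)
                return false;
            open[depth - 1] = OP_ELSE;
            break;
        case OP_ENDIF:
            if (depth == 0 ||
                (open[depth - 1] != OP_IF && open[depth - 1] != OP_ELSE))
                return false;
            depth--;
            break;
        case OP_ENDREP:
            if (depth == 0 || open[depth - 1] != OP_REP)
                return false;
            depth--;
            break;
        case OP_LABEL:
            if (depth != 0)
                return false;  /* label inside an open block */
            break;
        default:
            break;
        }
    }
    return depth == 0;  /* every block must be terminated */
}
```

A REP opened between IF and ELSE must see its ENDREP before the ELSE; the
sequence IF, REP, ELSE, ENDREP, ENDIF is therefore rejected.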
IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
to execute. If the condition is true, all instructions between the IF and
ELSE are executed. If the condition is false, all instructions between
the ELSE and ENDIF are executed. The ELSE instruction is optional. If
the ELSE is omitted, all instructions between the IF and ENDIF are
executed if the condition is true, or skipped if the condition is false.
A limited amount of nesting is supported -- a program will fail to load if
an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
IF/ELSE/ENDIF blocks.
REP/ENDREP blocks are used to execute a sequence of instructions multiple
times. The REP instruction includes an optional scalar operand to specify
a loop count indicating the number of times the block of instructions
should be repeated. If the loop count is omitted, the contents of a
REP/ENDREP block will be repeated indefinitely until the loop is
explicitly terminated. A limited amount of nesting is supported -- a
program will fail to load if a REP instruction is nested inside
MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.
Within a REP/ENDREP block, the CONT instruction can be used to terminate
the current iteration of the loop by effectively jumping to the ENDREP
instruction. The BRK instruction can be used to terminate the entire loop
by effectively jumping to the instruction immediately following the ENDREP
instruction. If CONT and BRK instructions are found inside multiply
nested REP/ENDREP blocks, they apply to the innermost block. A program
will fail to load if it includes a CONT or BRK instruction that is not
contained inside a REP/ENDREP block.
A REP/ENDREP block without a specified loop count can result in an
infinite loop. To prevent obvious infinite loops, a program will fail to
load if it contains a REP/ENDREP block that contains neither a BRK
instruction at the current nesting level nor a RET instruction at any
nesting level.
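The check for obvious infinite loops can be sketched as a single pass over
the program. The C code below is an illustration, not the loader's actual
algorithm. It assumes that the check applies to REP blocks without a loop
count, that "current nesting level" refers to loop nesting (so a BRK inside
a nested IF still counts for the enclosing REP), and that block nesting is
already known to be well formed. LoopOp, MAX_LOOPS, and loops_can_exit are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds relevant to the check. */
typedef enum {
    L_OTHER, L_REP_COUNTED, L_REP_UNCOUNTED, L_ENDREP, L_BRK, L_RET
} LoopOp;

enum { MAX_LOOPS = 64 };  /* assumed bound on loop nesting for this sketch */

/* Returns false if some REP without a loop count contains neither a BRK
   at its own loop nesting level nor a RET at any nesting level. */
bool loops_can_exit(const LoopOp *prog, size_t n)
{
    bool uncounted[MAX_LOOPS];  /* open loop has no loop count */
    bool has_exit[MAX_LOOPS];   /* open loop has a BRK or RET so far */
    size_t depth = 0;

    for (size_t i = 0; i < n; i++) {
        switch (prog[i]) {
        case L_REP_COUNTED:
        case L_REP_UNCOUNTED:
            if (depth == MAX_LOOPS)
                return false;
            uncounted[depth] = (prog[i] == L_REP_UNCOUNTED);
            has_exit[depth] = false;
            depth++;
            break;
        case L_BRK:
            if (depth > 0)
                has_exit[depth - 1] = true;  /* innermost loop only */
            break;
        case L_RET:
            for (size_t d = 0; d < depth; d++)
                has_exit[d] = true;          /* exits every open loop */
            break;
        case L_ENDREP:
            if (depth == 0)
                return false;
            depth--;
            if (uncounted[depth] && !has_exit[depth])
                return false;
            break;
        default:
            break;
        }
    }
    return true;
}
```

A BRK in an inner loop does not satisfy the rule for an outer REP without a
loop count, whereas a RET at any depth does.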
Subroutines are supported via the CAL and RET instructions. A subroutine
block is identified by an instruction label, which can be any valid
identifier according to the <instLabel> grammar rule. The CAL instruction
identifies
a subroutine name to call according to the <instTarget> grammar rule.
Instruction labels used in CAL instructions do not need to be defined in
the program text that precedes the instruction, but a program will fail to
load if it includes a CAL instruction that references an instruction label
that is not defined anywhere in the program. When a CAL instruction is
executed, it transfers control to the instruction immediately following
the specified instruction label. Subsequent instructions in that
subroutine are executed until a RET instruction is executed, or until
program execution reaches another instruction label or the end of the
program text. After the subroutine finishes, execution continues with the
instruction immediately following the CAL instruction. When a RET
instruction is issued, it will break out of any IF/ELSE/ENDIF or
REP/ENDREP blocks that contain it.
Subroutines may call other subroutines before completing, up to an
implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.
Subroutines may call any subroutine in the program, including themselves,
as long as the call depth limit is obeyed. Issuing a CAL instruction
while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has
undefined results, including possible program termination.
Several flow control instructions include condition code tests. The IF
instruction requires a condition test to determine what instructions are
executed. The CONT, BRK, CAL, and RET instructions have an optional
condition code test; if the test fails, the instructions are not executed.
Condition code tests are specified by the <ccTest> grammar rule. The test
is evaluated like the condition code write mask (section 2.X.4.3), and
passes if and only if any of the four components passes.
If an instruction label named "main" is specified, GPU program execution
begins with the instruction immediately following that label. Otherwise,
it begins with the first instruction of the program. Instructions are
executed in sequence until either a RET instruction is issued in the main
subroutine or the end of the program text is reached.
Section 2.X.6, Program Options
Programs may specify a number of options to indicate that one or more
extended language features are used by the program. All program options
used by the program must be declared at the beginning of the program
string. Each program option specified in a program string will modify the
syntactic or semantic rules used to interpret the program and the execution
environment used to execute the program. Features in program options
not declared by the program are ignored, even if the option is otherwise
supported by the GL. Each option declaration consists of two tokens: the
keyword "OPTION" and an identifier.
The set of available options depends on the program type, and is
enumerated in the specifications for each program type. Some program
types may not provide any options.
Section 2.X.7, Program Declarations
Programs may include a number of declaration statements to specify
characteristics of the program. Each declaration statement is followed by
one or more arguments, separated by commas.
The set of available declarations depends on the program type, and is
enumerated in the specifications for each program type. Some program
types may not provide declarations.
Section 2.X.8, Program Instruction Set
The following sections enumerate the set of instructions supported for GPU
programs.
Some instructions allow the use of one of the three basic data type
modifiers (floating point, signed integer, and unsigned integer). Unless
otherwise mentioned:
* the result and all of the operands will be interpreted according to
the specified data type, and
* if no data type modifier is specified, the instruction will operate as
though a floating-point modifier ("F") were specified.
Some instructions will override one or both of these rules.
Section 2.X.8.Z, ABS: Absolute Value
The ABS instruction performs a component-wise absolute value operation on
the single operand to yield a result vector.
tmp = VectorLoad(op0);
result.x = abs(tmp.x);
result.y = abs(tmp.y);
result.z = abs(tmp.z);
result.w = abs(tmp.w);
ABS supports all three data type modifiers. Taking the absolute value of
an unsigned integer is not a useful operation, but is not illegal.
Section 2.X.8.Z, ADD: Add
The ADD instruction performs a component-wise add of the two operands to
yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x + tmp1.x;
result.y = tmp0.y + tmp1.y;
result.z = tmp0.z + tmp1.z;
result.w = tmp0.w + tmp1.w;
ADD supports all three data type modifiers.
Section 2.X.8.Z, AND: Bitwise AND
The AND instruction performs a bitwise AND operation on the components of
the two source vectors to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x & tmp1.x;
result.y = tmp0.y & tmp1.y;
result.z = tmp0.z & tmp1.z;
result.w = tmp0.w & tmp1.w;
AND supports only signed and unsigned integer data type modifiers. If no
type modifier is specified, both operands and the result are treated as
signed integers.
Section 2.X.8.Z, BRK: Break out of Loop Instruction
The BRK instruction conditionally transfers control to the instruction
immediately following the next ENDREP instruction. A BRK instruction has
no effect if the condition code test evaluates to FALSE.
The following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at instruction following the next ENDREP;
}
Section 2.X.8.Z, CAL: Subroutine Call
The CAL instruction conditionally transfers control to the instruction
following the label specified in the instruction. It also pushes a
reference to the instruction immediately following the CAL instruction
onto the call stack, where execution will continue after executing the
matching RET instruction. The following pseudocode describes the
operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
// undefined results
} else {
callStack[callStackDepth] = nextInstruction;
callStackDepth++;
}
// continue execution at instruction following <instTarget>
} else {
// do nothing
}
In the pseudocode, <instTarget> is the label specified in the instruction
matching the <branchLabel> grammar rule, <callStackDepth> is the current
depth of the call stack, <callStack> is an array holding the call stack,
and <nextInstruction> is a reference to the instruction immediately
following the CAL instruction in the program string.
If the call stack overflows, the results of the CAL instruction are
undefined, and can result in immediate program termination.
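The call stack behavior can be modeled with a small fixed-size stack. The C
sketch below is illustrative only: CALL_DEPTH_LIMIT stands in for the
implementation-dependent MAX_PROGRAM_CALL_DEPTH_NV, instruction references
are modeled as integer indices, and the CallStack type and helper functions
are hypothetical. Because overflow is undefined, the sketch simply reports
it to the caller.

```c
#include <stdbool.h>

enum { CALL_DEPTH_LIMIT = 4 };  /* stand-in for MAX_PROGRAM_CALL_DEPTH_NV */

typedef struct {
    int stack[CALL_DEPTH_LIMIT];  /* saved instruction indices */
    int depth;                    /* number of calls not yet completed */
} CallStack;

/* Models the push performed by CAL. Returns false on overflow, where
   the results are undefined. */
bool call_push(CallStack *cs, int next_instruction)
{
    if (cs->depth >= CALL_DEPTH_LIMIT)
        return false;
    cs->stack[cs->depth++] = next_instruction;
    return true;
}

/* Models RET. Returns the instruction index to resume at, or -1 when
   the stack is empty, meaning a RET in the main subroutine. */
int call_pop(CallStack *cs)
{
    if (cs->depth == 0)
        return -1;
    return cs->stack[--cs->depth];
}
```

Return addresses come back in last-in, first-out order, so nested calls
resume at the instruction following the most recent CAL.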
An instruction label signifies the beginning of a new subroutine.
Subroutines may not nest or overlap. If a CAL instruction is executed and
subsequent program execution reaches an instruction label before a
corresponding RET instruction is executed, the subroutine call returns
immediately, as though an unconditional RET instruction were inserted
immediately before the instruction label.
(Note: On previous vertex program extensions -- NV_vertex_program2 and
NV_vertex_program3 -- instruction labels were also used as targets for
branch (BRA) instructions. This unstructured branching functionality has
been replaced with the structured branching constructs found in this
instruction set.)
Section 2.X.8.Z, CEIL: Ceiling
The CEIL instruction loads a single vector operand and performs a
component-wise ceiling operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = ceil(tmp.x);
result.y = ceil(tmp.y);
result.z = ceil(tmp.z);
result.w = ceil(tmp.w);
The ceiling operation returns the nearest integer greater than or equal to
the operand. For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
ceil(+3.7) = +4.0.
CEIL supports all three data type modifiers. The single operand is always
treated as a floating-point vector, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, CMP: Compare
The CMP instruction performs a component-wise comparison of the first
operand against zero, and copies the values of the second or third
operands based on the results of the compare.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;
CMP supports all three data type modifiers. CMP with an unsigned data
type modifier is not a useful operation, but is not illegal.
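As an illustration of the floating-point case, the following C sketch
mirrors the CMP pseudocode. The Vec4 type and cmp_f function are
hypothetical helpers, not part of the extension.

```c
/* Hypothetical four-component float vector. */
typedef struct { float x, y, z, w; } Vec4;

/* Per-component select: picks b where a is negative, otherwise c. */
static float cmp1(float a, float b, float c)
{
    return (a < 0.0f) ? b : c;
}

/* Floating-point CMP, following the pseudocode above. */
Vec4 cmp_f(Vec4 op0, Vec4 op1, Vec4 op2)
{
    Vec4 r;
    r.x = cmp1(op0.x, op1.x, op2.x);
    r.y = cmp1(op0.y, op1.y, op2.y);
    r.z = cmp1(op0.z, op1.z, op2.z);
    r.w = cmp1(op0.w, op1.w, op2.w);
    return r;
}
```

A first operand of zero selects the third operand, since the test is a
strict less-than comparison. With an unsigned modifier the test can never
pass, so the third operand would always be selected.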
Section 2.X.8.Z, CONT: Continue with Next Loop Iteration
The CONT instruction conditionally transfers control to the next ENDREP
instruction. A CONT instruction has no effect if the condition code test
evaluates to FALSE.
The following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at the next ENDREP;
}
Section 2.X.8.Z, COS: Cosine with Reduction to [-PI,PI]
The COS instruction approximates the trigonometric cosine of the angle
specified by the scalar operand and replicates it to all four components
of the result vector. The angle is specified in radians and does not have
to be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxCosine(tmp);
result.y = ApproxCosine(tmp);
result.z = ApproxCosine(tmp);
result.w = ApproxCosine(tmp);
COS supports only floating-point data type modifiers.
Section 2.X.8.Z, DDX: Partial Derivative Relative to X
The DDX instruction computes approximate partial derivatives of a vector
operand with respect to the X window coordinate, and is only available to
fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, DDY: Partial Derivative Relative to Y
The DDY instruction computes approximate partial derivatives of a vector
operand with respect to the Y window coordinate, and is only available to
fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, DIV: Divide Vector Components by Scalar
The DIV instruction performs a component-wise divide of the first vector
operand by the second scalar operand to produce a 4-component result
vector.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x / tmp1;
result.y = tmp0.y / tmp1;
result.z = tmp0.z / tmp1;
result.w = tmp0.w / tmp1;
DIV supports all three data type modifiers. For floating-point division,
this instruction is not guaranteed to produce results identical to a
RCP/MUL instruction sequence.
The results of a signed or unsigned integer division by zero are
undefined.
Section 2.X.8.Z, DP2: 2-Component Dot Product
The DP2 instruction computes a two-component dot product of the two
operands (using the first two components) and replicates the dot product
to all four components of the result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP2 supports only floating-point data type modifiers.
Section 2.X.8.Z, DP2A: 2-Component Dot Product with Scalar Add
The DP2A instruction computes a two-component dot product of the two
operands (using the first two components), adds the x component of the
third operand, and replicates the result to all four components of the
result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP2A supports only floating-point data type modifiers.
Section 2.X.8.Z, DP3: 3-Component Dot Product
The DP3 instruction computes a three-component dot product of the two
operands (using the x, y, and z components) and replicates the dot product
to all four components of the result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP3 supports only floating-point data type modifiers.
Section 2.X.8.Z, DP4: 4-Component Dot Product
The DP4 instruction computes a four-component dot product of the two
operands and replicates the dot product to all four components of the
result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DP4 supports only floating-point data type modifiers.
Section 2.X.8.Z, DPH: Homogeneous Dot Product
The DPH instruction computes a three-component dot product of the two
operands (using the x, y, and z components), adds the w component of the
second operand, and replicates the sum to all four components of the
result vector. This is equivalent to a four-component dot product where
the w component of the first operand is forced to 1.0.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + tmp1.w;
result.x = dot;
result.y = dot;
result.z = dot;
result.w = dot;
DPH supports only floating-point data type modifiers.
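The equivalence between DPH and a four-component dot product can be checked
directly. In the C sketch below, Vec4, dp4, and dph are hypothetical helpers
that follow the DP4 and DPH pseudocode; they are not part of the extension.

```c
/* Hypothetical four-component float vector. */
typedef struct { float x, y, z, w; } Vec4;

/* Four-component dot product, as in the DP4 pseudocode. */
float dp4(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w);
}

/* Homogeneous dot product: the w component of the first operand is
   ignored and treated as 1.0. */
float dph(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + b.w;
}
```

For a = (1, 2, 3, 7) and b = (4, 5, 6, 0.5), dph(a, b) gives 32.5, the same
value as dp4 applied to (1, 2, 3, 1) and b. One common use is multiplying a
3D position by one row of a transformation matrix without first writing 1.0
into the position's w component.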
Section 2.X.8.Z, DST: Distance Vector
The DST instruction computes a distance vector from two specially-
formatted operands. The first operand should be of the form [NA, d^2,
d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
where NA values are not relevant to the calculation and d is a vector
length. If both vectors satisfy these conditions, the result vector will
be of the form [1.0, d, d^2, 1/d].
The exact behavior is specified in the following pseudo-code:
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = 1.0;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z;
result.w = tmp1.w;
Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
(using the same vector for both operands) and 1/d can be obtained from d^2
using the RSQ instruction.
This distance vector is useful for per-vertex light attenuation
calculations: a DP3 operation using the distance vector and an
attenuation constants vector as operands will yield the attenuation
factor.
DST supports only floating-point data type modifiers.
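The following C sketch works through the DST computation for a distance of
d = 2. The Vec4 type and the dst_vec and dp3 helpers are hypothetical names
used for illustration. Components marked NA in the text are set to 0 here.
The final DP3 evaluates k0 + k1*d + k2*d^2 for an assumed constants vector
(k0, k1, k2).

```c
/* Hypothetical four-component float vector. */
typedef struct { float x, y, z, w; } Vec4;

/* DST, following the pseudocode above. */
Vec4 dst_vec(Vec4 op0, Vec4 op1)
{
    Vec4 r;
    r.x = 1.0f;
    r.y = op0.y * op1.y;  /* d^2 * (1/d) = d */
    r.z = op0.z;          /* d^2 */
    r.w = op1.w;          /* 1/d */
    return r;
}

/* Three-component dot product, as in the DP3 pseudocode. */
float dp3(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z);
}
```

With op0 = (0, 4, 4, 0) and op1 = (0, 0.5, 0, 0.5), the result is
(1, 2, 4, 0.5). Taking a DP3 of that result with the constants
(1, 0.5, 0.25) gives 1 + 0.5*2 + 0.25*4 = 3.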
Section 2.X.8.Z, ELSE: Start of If Test Else Block
The ELSE instruction signifies the end of the "execute if true" portion of
an IF/ELSE/ENDIF block and the beginning of the "execute if false"
portion.
If the condition evaluated at the IF statement was TRUE, when a program
reaches the ELSE statement, it has completed the entire "execute if true"
portion of the IF/ELSE/ENDIF block. Execution will continue at the
corresponding ENDIF instruction.
If the condition evaluated at the IF statement was FALSE, program
execution would skip over the entire "execute if true" portion of the
IF/ELSE/ENDIF block, including the ELSE instruction.
Section 2.X.8.Z, EMIT: Emit Vertex
The EMIT instruction emits a new vertex to be added to the current output
primitive generated by a geometry program, and is only available to
geometry programs. See the NV_geometry_program4 specification for more
details.
Section 2.X.8.Z, ENDIF: End of If Test Block
The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block. It has
no other effect on program execution.
Section 2.X.8.Z, ENDPRIM: End of Primitive
A geometry program can emit multiple primitives in a single invocation.
The ENDPRIM instruction is used in a geometry program to signify the end
of the current primitive and the beginning of a new primitive of the same
type. It is only available to geometry programs. See the
NV_geometry_program4 specification for more details.
Section 2.X.8.Z, ENDREP: End of Repeat Block
The ENDREP instruction specifies the end of a REP block.
When used in conjunction with a REP instruction with a loop count,
ENDREP decrements the loop counter. If the decremented loop counter is
greater than zero, ENDREP transfers control to the instruction immediately
after the corresponding REP instruction. If the loop counter is less than
or equal to zero, execution continues at the instruction following the
ENDREP instruction. When used in conjunction with a REP instruction
without loop count, ENDREP always transfers control to the instruction
immediately after the REP instruction.
if (REP instruction includes a loop count) {
LoopCount--;
if (LoopCount > 0) {
continue execution at instruction following corresponding REP
instruction;
}
} else {
continue execution at instruction following corresponding REP
instruction;
}
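To see how many times a counted loop body runs under the ENDREP rule alone,
the pseudocode can be traced in C. This sketch assumes the loop body has
already been entered with a loop count of at least 1; how the REP
instruction itself treats other counts is outside this sketch. The function
name rep_body_executions is hypothetical.

```c
/* Counts executions of the body of a counted REP/ENDREP block, using
   only the ENDREP behavior described above. Assumes loop_count >= 1. */
int rep_body_executions(int loop_count)
{
    int executions = 0;
    do {
        executions++;         /* instructions between REP and ENDREP */
        loop_count--;         /* ENDREP decrements the loop counter */
    } while (loop_count > 0); /* branch back to instruction after REP */
    return executions;
}
```

Under these assumptions, a loop count of N results in N executions of the
body.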
Section 2.X.8.Z, EX2: Exponential Base 2
The EX2 instruction approximates 2 raised to the power of the scalar
operand and replicates the approximation to all four components of the
result vector.
tmp = ScalarLoad(op0);
result.x = Approx2ToX(tmp);
result.y = Approx2ToX(tmp);
result.z = Approx2ToX(tmp);
result.w = Approx2ToX(tmp);
EX2 supports only floating-point data type modifiers.
Section 2.X.8.Z, FLR: Floor
The FLR instruction loads a single vector operand and performs a
component-wise floor operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = floor(tmp.x);
result.y = floor(tmp.y);
result.z = floor(tmp.z);
result.w = floor(tmp.w);
The floor operation returns the nearest integer less than or equal to the
operand. For example floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7)
= +3.0.
FLR supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, FRC: Fraction
The FRC instruction extracts the fractional portion of each component of
the operand to generate a result vector. The fractional portion of a
component is defined as the result after subtracting off the floor of the
component (see FLR), and is always in the range [0.0, 1.0).
For negative values, the fractional portion is NOT the number written to
the right of the decimal point -- the fractional portion of -1.7 is not
0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0)
from -1.7.
tmp = VectorLoad(op0);
result.x = fraction(tmp.x);
result.y = fraction(tmp.y);
result.z = fraction(tmp.z);
result.w = fraction(tmp.w);
FRC supports only floating-point data type modifiers.
Section 2.X.8.Z, I2F: Integer to Float
The I2F instruction converts the components of an integer vector operand
to floating-point to produce a floating-point result vector.
tmp = VectorLoad(op0);
result.x = (float) tmp.x;
result.y = (float) tmp.y;
result.z = (float) tmp.z;
result.w = (float) tmp.w;
I2F supports only signed and unsigned integer data type modifiers. The
single operand is interpreted according to the data type modifier. If no
data type modifier is specified, the operand is treated as a signed
integer vector. The result is always written as a float.
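The effect of the data type modifier on I2F can be shown by converting the
same 32-bit pattern under both interpretations. The C sketch below is an
illustration; i2f_signed and i2f_unsigned are hypothetical helpers, and the
sketch assumes a two's complement representation for signed integers.

```c
#include <stdint.h>
#include <string.h>

/* Converts a 32-bit pattern interpreted as a signed integer. */
float i2f_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);  /* reinterpret the bits, no conversion */
    return (float)s;
}

/* Converts a 32-bit pattern interpreted as an unsigned integer. */
float i2f_unsigned(uint32_t bits)
{
    return (float)bits;
}
```

The pattern 0x80000000 converts to -2147483648.0 when treated as signed and
to 2147483648.0 when treated as unsigned, so the modifier matters whenever
the high bit of the operand is set.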
Section 2.X.8.Z, IF: Start of If Test Block
The IF instruction performs a condition code test to determine what
instructions inside an IF/ELSE/ENDIF block are executed. If the test
passes, execution continues at the instruction immediately following the
IF instruction. If the test fails, IF transfers control to the
instruction immediately following the corresponding ELSE instruction (if
present) or the ENDIF instruction (if no ELSE is present).
Implementations may have a limited ability to nest IF blocks in any
subroutine. If the number of IF/ENDIF blocks nested inside each other is
MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to load.
// Evaluate the condition. If the condition is true, continue at the
// next instruction. Otherwise, continue at the instruction following
// the corresponding ELSE (if present) or ENDIF.
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
TestCC(cc.**c*) || TestCC(cc.***c)) {
continue execution at the next instruction;
} else if (IF block contains an ELSE statement) {
continue execution at instruction following corresponding ELSE;
} else {
continue execution at instruction following corresponding ENDIF;
}
(Note: Unlike the NV_fragment_program2 extension, there is no run-time
limit on the maximum overall depth of IF/ENDIF nesting. As long as each
individual subroutine of the program obeys the static nesting limits,
there will be no run-time errors in the program. With the
NV_fragment_program2 extension, a program could terminate abnormally if it
called a subroutine inside a very deeply nested set of IF/ENDIF blocks and
the called subroutine also contained deeply nested IF/ENDIF blocks. Such
an error could occur even if neither subroutine exceeded static limits.)
Section 2.X.8.Z, KIL: Kill Fragment
The KIL instruction conditionally kills a fragment, and is only available
to fragment programs. See the NV_fragment_program4 specification for more
details.
Section 2.X.8.Z, LG2: Logarithm Base 2
The LG2 instruction approximates the base 2 logarithm of the scalar
operand and replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxLog2(tmp);
result.y = ApproxLog2(tmp);
result.z = ApproxLog2(tmp);
result.w = ApproxLog2(tmp);
If the scalar operand is zero or negative, the result is undefined.
LG2 supports only floating-point data type modifiers.
Section 2.X.8.Z, LIT: Compute Lighting Coefficients
The LIT instruction accelerates lighting computations by computing
lighting coefficients for ambient, diffuse, and specular light
contributions. The "x" component of the single operand is assumed to hold
a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
in Section 2.13.1). The "y" component of the operand is assumed to hold a
specular dot product (n dot h_i). The "w" component of the operand is
assumed to hold the specular exponent of the material (s_rm), and is
clamped to the range (-128, +128) exclusive.
The "x" component of the result vector receives the value that should be
multiplied by the ambient light/material product (always 1.0). The "y"
component of the result vector receives the value that should be
multiplied by the diffuse light/material product (n dot VP_pli). The "z"
component of the result vector receives the value that should be
multiplied by the specular light/material product (f_i * (n dot h_i) ^
s_rm). The "w" component of the result is the constant 1.0.
Negative diffuse and specular dot products are clamped to 0.0, as is done
in the standard per-vertex lighting operations. In addition, if the
diffuse dot product is zero or negative, the specular coefficient is
forced to zero.
tmp = VectorLoad(op0);
if (tmp.x < 0) tmp.x = 0;
if (tmp.y < 0) tmp.y = 0;
if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
result.x = 1.0;
result.y = tmp.x;
result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
result.w = 1.0;
Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.
LIT supports only floating-point data type modifiers.
Section 2.X.8.Z, LRP: Linear Interpolation
The LRP instruction performs a component-wise linear interpolation between
the second and third operands using the first operand as the blend factor.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
LRP supports only floating-point data type modifiers.
Section 2.X.8.Z, MAD: Multiply and Add
The MAD instruction performs a component-wise multiply of the first two
operands, and then does a component-wise add of the product to the third
operand to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + tmp2.x;
result.y = tmp0.y * tmp1.y + tmp2.y;
result.z = tmp0.z * tmp1.z + tmp2.z;
result.w = tmp0.w * tmp1.w + tmp2.w;
The multiplication and addition operations in this instruction are subject
to the same rules as described for the MUL and ADD instructions.
MAD supports all three data type modifiers.
Section 2.X.8.Z, MAX: Maximum
The MAX instruction computes component-wise maximums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;
MAX supports all three data type modifiers.
Section 2.X.8.Z, MIN: Minimum
The MIN instruction computes component-wise minimums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;
MIN supports all three data type modifiers.
Section 2.X.8.Z, MOD: Modulus
The MOD instruction performs a component-wise modulus operation on the first
vector operand by the second scalar operand to produce a 4-component result
vector.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x % tmp1;
result.y = tmp0.y % tmp1;
result.z = tmp0.z % tmp1;
result.w = tmp0.w % tmp1;
MOD supports both signed and unsigned integer data type modifiers. If no
data type modifier is specified, both operands and the result are treated
as signed integers.
A result component is undefined if the corresponding component of the
first operand is negative or if the second operand is less than or equal
to zero.
Section 2.X.8.Z, MOV: Move
The MOV instruction copies the value of the operand to yield a result
vector.
result = VectorLoad(op0);
MOV supports all three data type modifiers.
Section 2.X.8.Z, MUL: Multiply
The MUL instruction performs a component-wise multiply of the two operands
to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x * tmp1.x;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z * tmp1.z;
result.w = tmp0.w * tmp1.w;
MUL supports all three data type modifiers. The MUL instruction
additionally supports three special modifiers.
The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
multiplies of 24-bit quantities, respectively. The results of such
multiplies are undefined if either operand is outside the range
[-2^23,+2^23-1] for S24 or [0,2^24-1] for U24. If "S24" or "U24" is
specified, the data type is implied and normal data type modifiers may not
be provided.
The "HI" modifier specifies a 32-bit integer multiply that returns the 32
most significant bits of the 64-bit product. Integer multiplies without
the "HI" modifier normally return the least significant bits of the
product. If "HI" is specified, either of the "S" or "U" integer data type
modifiers must also be specified.
Note that if condition code updates are performed on integer multiplies,
the overflow or carry flags are always cleared, even if the product
overflowed. If it is necessary to determine if the results of an integer
multiply overflowed, the MUL.HI instruction may be used.
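The overflow check suggested above can be sketched in C for the unsigned case. The function names are hypothetical; the sketch simply models MUL.U.HI as the upper half of a 64-bit product and treats any non-zero upper half as an overflow of the 32-bit multiply.

```c
#include <stdint.h>

/* Models the "HI" modifier for unsigned operands: the 32 most
 * significant bits of the 64-bit product. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* An unsigned 32-bit multiply overflowed if any high bit is set. */
static int mul_overflows_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0;
}
```

For signed operands the test would differ: the high word must equal the sign extension of the low word rather than zero.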
Section 2.X.8.Z, NOT: Bitwise Not
The NOT instruction performs a component-wise bitwise NOT operation on the
source vector to produce a result vector.
tmp = VectorLoad(op0);
result.x = ~tmp.x;
result.y = ~tmp.y;
result.z = ~tmp.z;
result.w = ~tmp.w;
NOT supports only integer data type modifiers. If no type modifier is
specified, the operand and the result are treated as signed integers.
Section 2.X.8.Z, NRM: Normalize 3-Component Vector
The NRM instruction normalizes the vector given by the x, y, and z
components of the vector operand to produce the x, y, and z components of
the result vector. The w component of the result is undefined.
tmp = VectorLoad(op0);
scale = ApproxRSQRT(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
result.x = tmp.x * scale;
result.y = tmp.y * scale;
result.z = tmp.z * scale;
result.w = undefined;
NRM supports only floating-point data type modifiers.
Section 2.X.8.Z, OR: Bitwise Or
The OR instruction performs a bitwise OR operation on the components of
the two source vectors to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x | tmp1.x;
result.y = tmp0.y | tmp1.y;
result.z = tmp0.z | tmp1.z;
result.w = tmp0.w | tmp1.w;
OR supports only integer data type modifiers. If no type modifier is
specified, both operands and the result are treated as signed integers.
Section 2.X.8.Z, PK2H: Pack Two 16-bit Floats
The PK2H instruction converts the "x" and "y" components of the single
floating-point vector operand into 16-bit floating-point format, packs the
bit representation of these two floats into a 32-bit unsigned integer, and
replicates that value to all four components of the result vector. The
PK2H instruction can be reversed by the UP2H instruction below.
tmp0 = VectorLoad(op0);
/* result obtained by combining raw bits of tmp0.x, tmp0.y */
result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
PK2H supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer results, the bits can be
interpreted as described above. For floating-point result variables, the
packed results do not constitute a meaningful floating-point variable and
should only be used to feed future unpack instructions.
A program will fail to load if it contains a PK2H instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK2US: Pack Two Floats as Unsigned 16-bit
The PK2US instruction converts the "x" and "y" components of the single
floating-point vector operand into a packed pair of 16-bit unsigned
scalars. The scalars are represented in a bit pattern where all '0' bits
correspond to 0.0 and all '1' bits correspond to 1.0. The bit
representations of the two converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK2US instruction can be reversed by the UP2US
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
us.y = round(65535.0 * tmp0.y);
/* result obtained by combining raw bits of us. */
result.x = ((us.x) | (us.y << 16));
result.y = ((us.x) | (us.y << 16));
result.z = ((us.x) | (us.y << 16));
result.w = ((us.x) | (us.y << 16));
PK2US supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions.
A program will fail to load if it contains a PK2US instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK4B: Pack Four Floats as Signed 8-bit
The PK4B instruction converts the four components of the single
floating-point vector operand into 8-bit signed quantities. The signed
quantities are represented in a bit pattern where all '0' bits correspond
to -128/127 and all '1' bits correspond to +127/127. The bit
representations of the four converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK4B instruction can be reversed by the UP4B
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < -128/127) tmp0.x = -128/127;
if (tmp0.y < -128/127) tmp0.y = -128/127;
if (tmp0.z < -128/127) tmp0.z = -128/127;
if (tmp0.w < -128/127) tmp0.w = -128/127;
if (tmp0.x > +127/127) tmp0.x = +127/127;
if (tmp0.y > +127/127) tmp0.y = +127/127;
if (tmp0.z > +127/127) tmp0.z = +127/127;
if (tmp0.w > +127/127) tmp0.w = +127/127;
ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
ub.y = round(127.0 * tmp0.y + 128.0);
ub.z = round(127.0 * tmp0.z + 128.0);
ub.w = round(127.0 * tmp0.w + 128.0);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
PK4B supports all three data type modifiers. The single operand is always
treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions. A program will fail to load if it contains a PK4B
instruction that writes its results to a variable declared as "SHORT".
Section 2.X.8.Z, PK4UB: Pack Four Floats as Unsigned 8-bit
The PK4UB instruction converts the four components of the single
floating-point vector operand into a packed grouping of 8-bit unsigned
scalars. The scalars are represented in a bit pattern where all '0' bits
correspond to 0.0 and all '1' bits correspond to 1.0. The bit
representations of the four converted components are packed into a 32-bit
unsigned integer, and that value is replicated to all four components of
the result vector. The PK4UB instruction can be reversed by the UP4UB
instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
if (tmp0.z < 0.0) tmp0.z = 0.0;
if (tmp0.z > 1.0) tmp0.z = 1.0;
if (tmp0.w < 0.0) tmp0.w = 0.0;
if (tmp0.w > 1.0) tmp0.w = 1.0;
ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
ub.y = round(255.0 * tmp0.y);
ub.z = round(255.0 * tmp0.z);
ub.w = round(255.0 * tmp0.w);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
PK4UB supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. For integer result variables, the
bits can be interpreted as described above. For floating-point result
variables, the packed results do not constitute a meaningful
floating-point variable and should only be used to feed future unpack
instructions.
A program will fail to load if it contains a PK4UB instruction that writes
its results to a variable declared as "SHORT".
Section 2.X.8.Z, POW: Exponentiate
The POW instruction approximates the value of the first scalar operand
raised to the power of the second scalar operand and replicates it to all
four components of the result vector.
tmp0 = ScalarLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = ApproxPower(tmp0, tmp1);
result.y = ApproxPower(tmp0, tmp1);
result.z = ApproxPower(tmp0, tmp1);
result.w = ApproxPower(tmp0, tmp1);
The exponentiation approximation function may be implemented using the
base 2 exponentiation and logarithm approximation operations in the EX2
and LG2 instructions. In particular,
ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).
Note that a logarithm may be involved even for cases where the exponent is
an integer. This means that it may not be possible to exponentiate
correctly with a negative base. In contrast, it is possible in a
"normal" mathematical formulation to raise negative numbers to integral
powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4).
POW supports only floating-point data type modifiers.
Section 2.X.8.Z, RCC: Reciprocal (Clamped)
The RCC instruction approximates the reciprocal of the scalar operand,
clamps the result to one of two ranges, and replicates the clamped result
to all four components of the result vector.
If the approximated reciprocal is greater than 0.0, the result is clamped
to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater
than zero, the result is clamped to the range [-2^+64, -2^-64].
tmp = ScalarLoad(op0);
result.x = ClampApproxReciprocal(tmp);
result.y = ClampApproxReciprocal(tmp);
result.z = ClampApproxReciprocal(tmp);
result.w = ClampApproxReciprocal(tmp);
RCC supports only floating-point data type modifiers.
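One possible reading of ClampApproxReciprocal() is sketched below in C. The function name is hypothetical, an exact IEEE 754 division is assumed in place of the hardware approximation, and the handling of NaN inputs is not covered by the text above.

```c
#include <math.h>

static float clamp_approx_reciprocal(float v)
{
    const float min_mag = ldexpf(1.0f, -64);   /* 2^-64 */
    const float max_mag = ldexpf(1.0f, 64);    /* 2^+64 */
    float r = 1.0f / v;   /* assumes IEEE 754: 1/+0 = +inf, 1/-0 = -inf */

    if (r > 0.0f) {
        /* Positive reciprocal: clamp to [2^-64, 2^+64]. */
        if (r < min_mag) r = min_mag;
        if (r > max_mag) r = max_mag;
    } else {
        /* Not greater than zero: clamp to [-2^+64, -2^-64]. */
        if (r > -min_mag) r = -min_mag;
        if (r < -max_mag) r = -max_mag;
    }
    return r;   /* a NaN input passes through unchanged in this sketch */
}
```

Under these assumptions, an operand of +0.0 produces 2^+64 rather than infinity, and a very large operand such as 1e30 produces 2^-64 rather than a value near zero.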
Section 2.X.8.Z, RCP: Reciprocal
The RCP instruction approximates the reciprocal of the scalar operand and
replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxReciprocal(tmp);
result.y = ApproxReciprocal(tmp);
result.z = ApproxReciprocal(tmp);
result.w = ApproxReciprocal(tmp);
RCP supports only floating-point data type modifiers.
Section 2.X.8.Z, REP: Start of Repeat Block
The REP instruction begins a REP/ENDREP block. The REP instruction
supports an optional operand whose x component specifies the initial value
for the loop count. The loop count indicates the number of times the
instructions between the REP and corresponding ENDREP instruction will be
executed. If the initial value of the loop count is not positive, the
entire block is skipped and execution continues at the instruction
following the corresponding ENDREP instruction. If the loop count is
specified as a floating-point value, it is converted to the largest
integer less than or equal to the specified value (i.e., taking its
floor).
If no operand is provided to REP, the loop count is ignored and the
corresponding ENDREP instruction unconditionally transfers control to the
instruction immediately following the REP instruction. The only way to
exit such a loop is with the BRK instruction. To prevent obvious infinite
loops, a program that includes a REP/ENDREP block with no loop count will
fail to compile unless it contains either a BRK instruction at the current
nesting level or a RET instruction at any nesting level.
Implementations may have a limited ability to nest REP/ENDREP blocks. If
the number of REP/ENDREP blocks nested inside each other is
MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.
// Set up loop information for the new nesting level.
tmp = VectorLoad(op0);
LoopCount = floor(tmp.x);
if (LoopCount <= 0) {
  continue execution at the instruction following the corresponding ENDREP;
}
REP supports all three data type modifiers. The single operand is
interpreted according to the data type modifier.
(Note: Unlike the NV_fragment_program2 extension, REP blocks in this
extension support fully general looping; the specified loop count can be
computed in the program itself. Additionally, there is no run-time limit
on the maximum overall depth of REP/ENDREP nesting. As long as each
individual subroutine of the program obeys the static nesting limits,
there will be no run-time errors in the program. With the
NV_fragment_program2 extension, a program could terminate abnormally if it
called a subroutine inside a deeply nested set of REP/ENDREP blocks and
the called subroutine also contained deeply nested REP/ENDREP blocks.
Such an error could occur even if neither subroutine exceeded static
limits.)
Section 2.X.8.Z, RET: Subroutine Return
The RET instruction conditionally returns from a subroutine initiated by a
CAL instruction by popping an instruction reference off the top of the
call stack and transferring control to the referenced instruction. The
following pseudocode describes the operation of the instruction:
if (TestCC(cc.c***) || TestCC(cc.*c**) ||
    TestCC(cc.**c*) || TestCC(cc.***c)) {
  if (callStackDepth <= 0) {
    // terminate program
  } else {
    callStackDepth--;
    instruction = callStack[callStackDepth];
  }
  // continue execution at <instruction>
} else {
  // do nothing
}
In the pseudocode, <callStackDepth> is the depth of the call stack,
<callStack> is an array holding the call stack, and <instruction> is a
reference to an instruction previously pushed onto the call stack.
If the call stack is empty when RET executes, the program terminates
normally.
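The call stack behavior described above can be modeled with a small C sketch. All names here (call_stack, MAX_CALL_DEPTH, PROGRAM_END, ret) are hypothetical, instructions are represented as plain integer indices, and the condition code test is reduced to a single flag.

```c
#define MAX_CALL_DEPTH 8     /* arbitrary limit chosen for this sketch */
#define PROGRAM_END   (-1)   /* sentinel meaning "program terminates"  */

typedef struct {
    int stack[MAX_CALL_DEPTH];   /* saved instruction references */
    int depth;                   /* number of entries in use     */
} call_stack;

/* Returns the index of the next instruction to execute.
 * 'condition' models the result of the condition code test and
 * 'next' is the instruction that follows the RET. */
static int ret(call_stack *cs, int condition, int next)
{
    if (!condition) {
        return next;             /* test failed: RET does nothing */
    }
    if (cs->depth <= 0) {
        return PROGRAM_END;      /* empty stack: normal termination */
    }
    cs->depth--;
    return cs->stack[cs->depth]; /* resume at the popped reference */
}
```

A RET whose condition fails leaves the stack untouched, a RET with a non-empty stack pops one entry, and a RET with an empty stack ends the program.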
Section 2.X.8.Z, RFL: Reflection Vector
The RFL instruction computes the reflection of the second vector operand
(the "direction" vector) about the vector specified by the first vector
operand (the "axis" vector). Both operands are treated as 3D vectors (the
w components are ignored). The result vector is another 3D vector (the
"reflected direction" vector). The length of the result vector, ignoring
rounding errors, should equal that of the second operand.
axis = VectorLoad(op0);
direction = VectorLoad(op1);
tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
tmp.x = (axis.x * direction.x + axis.y * direction.y +
axis.z * direction.z);
tmp.x = 2.0 * tmp.x;
tmp.x = tmp.x / tmp.w;
result.x = tmp.x * axis.x - direction.x;
result.y = tmp.x * axis.y - direction.y;
result.z = tmp.x * axis.z - direction.z;
result.w = undefined;
RFL supports only floating-point data type modifiers.
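A direct C transcription of the RFL pseudocode is sketched below. The vec4 type and rfl() name are hypothetical, and the undefined w component is arbitrarily set to zero so the function returns a fully initialized value.

```c
/* Hypothetical 4-component vector type used only for this sketch. */
typedef struct { float x, y, z, w; } vec4;

static vec4 rfl(vec4 axis, vec4 direction)
{
    vec4 result;
    float axis_len_sq = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
    float dot = axis.x * direction.x + axis.y * direction.y +
                axis.z * direction.z;
    float k = 2.0f * dot / axis_len_sq;

    result.x = k * axis.x - direction.x;
    result.y = k * axis.y - direction.y;
    result.z = k * axis.z - direction.z;
    result.w = 0.0f;   /* undefined in the specification; zero is arbitrary */
    return result;
}
```

Reflecting the direction (1, 1, 0) about the axis (0, 1, 0) gives (-1, 1, 0), which has the same length as the input direction. Because of the division by the squared axis length, the axis does not need to be normalized.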
Section 2.X.8.Z, ROUND: Round to Nearest Integer
The ROUND instruction loads a single vector operand and performs a
component-wise round operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = round(tmp.x);
result.y = round(tmp.y);
result.z = round(tmp.z);
result.w = round(tmp.w);
The round operation returns the nearest integer to the operand. If the
fractional portion of the operand is 0.5, round() selects the nearest even
integer. For example round(-1.7) = -2.0, round(+1.0) = +1.0, and
round(+3.7) = +4.0.
ROUND supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
Section 2.X.8.Z, RSQ: Reciprocal Square Root
The RSQ instruction approximates the reciprocal of the square root of the
scalar operand and replicates it to all four components of the result
vector.
tmp = ScalarLoad(op0);
result.x = ApproxRSQRT(tmp);
result.y = ApproxRSQRT(tmp);
result.z = ApproxRSQRT(tmp);
result.w = ApproxRSQRT(tmp);
If the operand is less than or equal to zero, the results of the
instruction are undefined.
RSQ supports only floating-point data type modifiers.
Note that this instruction differs from the RSQ instruction in
ARB_vertex_program in that it does not implicitly take the absolute value
of its operand. The |abs| operator can be used to achieve equivalent
semantics.
Section 2.X.8.Z, SAD: Sum of Absolute Differences
The SAD instruction performs a component-wise difference of the first two
integer operands (subtracting the second from the first), and then does a
component-wise add of the absolute value of the difference to the third
unsigned integer operand to yield an unsigned integer result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
result.w = abs(tmp0.w - tmp1.w) + tmp2.w;
SAD supports signed and unsigned integer data type modifiers. The first
two operands are interpreted according to the data type modifier. The
third operand and the result are always unsigned integers.
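The per-component SAD computation can be sketched in C for both data type modifiers. The function names are hypothetical, and widening to 64 bits in the signed case is a choice made here to keep the C code free of signed overflow; the text above does not describe overflow behavior.

```c
#include <stdint.h>

/* Signed variant: first two operands are signed, accumulator and
 * result are unsigned. */
static uint32_t sad_s(int32_t a, int32_t b, uint32_t acc)
{
    int64_t diff = (int64_t)a - (int64_t)b;  /* cannot overflow in 64 bits */
    if (diff < 0) diff = -diff;
    return (uint32_t)diff + acc;
}

/* Unsigned variant: the absolute difference is taken without wrapping. */
static uint32_t sad_u(uint32_t a, uint32_t b, uint32_t acc)
{
    return (a > b ? a - b : b - a) + acc;
}
```

For instance, with operands 3, 10, and 5, both variants return 12.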
Section 2.X.8.Z, SCS: Sine/Cosine without Reduction
The SCS instruction approximates the trigonometric sine and cosine of the
angle specified by the scalar operand and places the cosine in the x
component and the sine in the y component of the result vector. The z and
w components of the result vector are undefined. The angle is specified
in radians and must be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxCosine(tmp);
result.y = ApproxSine(tmp);
result.z = undefined;
result.w = undefined;
If the scalar operand is not in the range [-PI,PI], the result vector is
undefined.
SCS supports only floating-point data type modifiers.
Section 2.X.8.Z, SEQ: Set on Equal
The SEQ instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;
SEQ supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
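The TRUE and FALSE encodings for the three data types can be illustrated for a single component in C. The function names are hypothetical. The select_u() helper is an illustrative assumption about how an all-ones integer TRUE value could be consumed as a bit mask; it is not an instruction defined here.

```c
#include <stdint.h>

/* Floating-point SEQ: TRUE is 1.0, FALSE is 0.0. */
static float seq_f(float a, float b)          { return a == b ? 1.0f : 0.0f; }

/* Signed integer SEQ: TRUE is -1, FALSE is 0. */
static int32_t seq_s(int32_t a, int32_t b)    { return a == b ? -1 : 0; }

/* Unsigned integer SEQ: TRUE is all bits set, FALSE is 0. */
static uint32_t seq_u(uint32_t a, uint32_t b) { return a == b ? UINT32_MAX : 0u; }

/* Illustration only: an all-ones TRUE value works directly as a mask. */
static uint32_t select_u(uint32_t mask, uint32_t if_true, uint32_t if_false)
{
    return (if_true & mask) | (if_false & ~mask);
}
```

The signed and unsigned TRUE values share the same bit pattern; only their interpretation differs. The same encodings apply to the other "Set on" instructions.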
Section 2.X.8.Z, SFL: Set on False
The SFL instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to a FALSE
value (described below).
result.x = FALSE;
result.y = FALSE;
result.z = FALSE;
result.w = FALSE;
SFL supports all data type modifiers. For floating-point data types, the
FALSE value is 0.0. For signed and unsigned integer data types, the FALSE
value is zero.
Section 2.X.8.Z, SGE: Set on Greater Than or Equal
The SGE instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
greater than or equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;
SGE supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SGT: Set on Greater Than
The SGT instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
greater than that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;
SGT supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SHL: Shift Left
The SHL instruction performs a component-wise left shift of the bits of
the first operand by the value of the second scalar operand to produce a
result vector. The bits vacated during the shift operation are filled
with zeroes.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x << tmp1;
result.y = tmp0.y << tmp1;
result.z = tmp0.z << tmp1;
result.w = tmp0.w << tmp1;
The results of a shift operation ("<<") are undefined if the value of the
second operand is negative, or greater than or equal to the number of bits
in the first operand.
SHL supports both signed and unsigned integer data type modifiers. If no
modifier is provided, the operands and the result are treated as signed
integers.
Section 2.X.8.Z, SHR: Shift Right
The SHR instruction performs a component-wise right shift of the bits of
the first operand by the value of the second scalar operand to produce a
result vector. The bits vacated during the shift operation are filled with
zeroes if the operand is non-negative and ones otherwise.
tmp0 = VectorLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = tmp0.x >> tmp1;
result.y = tmp0.y >> tmp1;
result.z = tmp0.z >> tmp1;
result.w = tmp0.w >> tmp1;
The results of a shift operation (">>") are undefined if the value of the
second operand is negative, or greater than or equal to the number of bits
in the first operand.
SHR supports both signed and unsigned integer data type modifiers. If no
modifiers are provided, the operands and the result are treated as signed
integers.
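The fill rule for vacated bits can be sketched in C. The function names are hypothetical. Because right-shifting a negative signed value is implementation-defined in C, the signed variant below is written so that it only ever shifts non-negative values. As in the text above, shift counts outside the range 0 to 31 are not handled.

```c
#include <stdint.h>

/* Signed SHR: vacated bits are filled with copies of the sign bit. */
static int32_t shr_s(int32_t v, unsigned count)
{
    /* For negative v, ~v is non-negative; shifting it and inverting
     * again fills the vacated high bits with ones. */
    return v < 0 ? ~(~v >> count) : v >> count;
}

/* Unsigned SHR: vacated bits are always filled with zeroes. */
static uint32_t shr_u(uint32_t v, unsigned count)
{
    return v >> count;
}
```

The same bit pattern therefore shifts differently depending on the data type modifier: 0x80000000 shifted right by 31 is 1 when treated as unsigned and -1 when treated as signed.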
Section 2.X.8.Z, SIN: Sine with Reduction to [-PI,PI]
The SIN instruction approximates the trigonometric sine of the angle
specified by the scalar operand and replicates it to all four components
of the result vector. The angle is specified in radians and does not have
to be in the range [-PI,PI].
tmp = ScalarLoad(op0);
result.x = ApproxSine(tmp);
result.y = ApproxSine(tmp);
result.z = ApproxSine(tmp);
result.w = ApproxSine(tmp);
SIN supports only floating-point data type modifiers.
Section 2.X.8.Z, SLE: Set on Less Than or Equal
The SLE instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
less than or equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;
SLE supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SLT: Set on Less Than
The SLT instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
less than that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;
SLT supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SNE: Set on Not Equal
The SNE instruction performs a component-wise comparison of the two
operands. Each component of the result vector returns a TRUE value
(described below) if the corresponding component of the first operand is
not equal to that of the second, and a FALSE value otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;
result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;
result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;
result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;
SNE supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
types, the TRUE value is -1 and the FALSE value is 0. For unsigned
integer data types, the TRUE value is the maximum integer value (all bits
are ones) and the FALSE value is zero.
Section 2.X.8.Z, SSG: Set Sign
The SSG instruction generates a result vector containing the signs of
each component of the single vector operand. Each component of the
result vector is 1.0 if the corresponding component of the operand
is greater than zero, 0.0 if the corresponding component of the
operand is equal to zero, and -1.0 if the corresponding component
of the operand is less than zero.
tmp = VectorLoad(op0);
result.x = SetSign(tmp.x);
result.y = SetSign(tmp.y);
result.z = SetSign(tmp.z);
result.w = SetSign(tmp.w);
SSG supports only floating-point data type modifiers.
Section 2.X.8.Z, STR: Set on True
The STR instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to a TRUE value
(described below).
result.x = TRUE;
result.y = TRUE;
result.z = TRUE;
result.w = TRUE;
STR supports all data type modifiers. For floating-point data types, the
TRUE value is 1.0. For signed integer data types, the TRUE value is -1.
For unsigned integer data types, the TRUE value is the maximum integer
value (all bits are ones).
Section 2.X.8.Z, SUB: Subtract
The SUB instruction performs a component-wise subtraction of the second
operand from the first to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x - tmp1.x;
result.y = tmp0.y - tmp1.y;
result.z = tmp0.z - tmp1.z;
result.w = tmp0.w - tmp1.w;
SUB supports all three data type modifiers.
Section 2.X.8.Z, SWZ: Extended Swizzle
The SWZ instruction loads the single vector operand, and performs a
swizzle operation more powerful than that provided for loading normal
vector operands to yield a result vector.
After the operand is loaded, the "x", "y", "z", and "w" components of the
result vector are selected by the first, second, third, and fourth matches
of the <extSwizComp> pattern in the <extendedSwizzle> rule.
A result component can be selected from any of the four components of the
operand or the constants 0.0 and 1.0. The result component can also be
optionally negated. The following pseudocode describes the component
selection method. "operand" refers to the vector operand, "select" is an
enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the
<extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.
"negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>
matches "-".
float ExtSwizComponent(floatVec operand, enum select, boolean negate)
{
  float result;
  switch (select) {
    case ZERO: result = 0.0; break;
    case ONE:  result = 1.0; break;
    case X:    result = operand.x; break;
    case Y:    result = operand.y; break;
    case Z:    result = operand.z; break;
    case W:    result = operand.w; break;
  }
  if (negate) {
    result = -result;
  }
  return result;
}
The entire extended swizzle operation is then defined using the following
pseudocode:
tmp = VectorLoad(op0);
result.x = ExtSwizComponent(tmp, xSelect, xNegate);
result.y = ExtSwizComponent(tmp, ySelect, yNegate);
result.z = ExtSwizComponent(tmp, zSelect, zNegate);
result.w = ExtSwizComponent(tmp, wSelect, wNegate);
"xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",
"wSelect", and "wNegate" correspond to the "select" and "negate" values
above for the four <extSwizComp> matches.
Since this instruction allows for component selection and negation for
each individual component, the grammar does not allow the use of the
normal swizzle and negation operations allowed for vector operands in
other instructions.
SWZ supports only floating-point data type modifiers.
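The component selection above can be sketched in Python as follows. This is an illustrative model only, not part of the extension: the names ext_swiz_component and swz, and the representation of a vector as a 4-tuple, are assumptions made for the example.

```python
# Hypothetical Python model of the SWZ pseudocode; not part of the extension.
ZERO, ONE, X, Y, Z, W = range(6)

def ext_swiz_component(operand, select, negate):
    """Select one result component from a 4-tuple operand (x, y, z, w)."""
    if select == ZERO:
        result = 0.0
    elif select == ONE:
        result = 1.0
    else:
        # X, Y, Z, W map to tuple indices 0..3.
        result = operand[select - X]
    return -result if negate else result

def swz(operand, selectors):
    """Apply four (select, negate) pairs, one per result component."""
    return tuple(ext_swiz_component(operand, s, n) for s, n in selectors)
```

For example, swz((1.0, 2.0, 3.0, 4.0), [(W, False), (X, True), (ZERO, False), (ONE, True)]) models an extended swizzle written as "w, -x, 0, -1" and yields (4.0, -1.0, 0.0, -1.0).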
Section 2.X.8.Z, TEX: Texture Sample
The TEX instruction takes the four components of a single floating-point
source vector and performs a filtered texture access as described in
Section 2.X.4.4. The returned (R,G,B,A) value is written to the
floating-point result vector. Partial derivatives and the level of detail
are computed automatically.
tmp = VectorLoad(op0);
ddx = ComputePartialsX(tmp);
ddy = ComputePartialsY(tmp);
lambda = ComputeLOD(ddx, ddy);
result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);
TEX supports all three data type modifiers. The single operand is always
treated as a floating-point vector; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TRUNC: Truncate (Round Toward Zero)
The TRUNC instruction loads a single vector operand and performs a
component-wise truncate operation to generate a result vector.
tmp = VectorLoad(op0);
result.x = trunc(tmp.x);
result.y = trunc(tmp.y);
result.z = trunc(tmp.z);
result.w = trunc(tmp.w);
The truncate operation rounds the operand toward zero: it returns the
integer nearest to the operand whose magnitude does not exceed that of the
operand. For example, trunc(-1.7) = -1.0, trunc(+1.0) = +1.0, and
trunc(+3.7) = +3.0.
TRUNC supports all three data type modifiers. The single operand is
always treated as a floating-point value, but the result is written as a
floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier. If a value is not exactly
representable using the data type of the result (e.g., an overflow or
writing a negative value to an unsigned integer), the result is undefined.
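As a rough illustration of the rounding rule and of the unsigned-integer case, the following Python sketch may help. The function names are hypothetical, and modeling an "undefined" result as None is an assumption of this example; an implementation may return any value in that case.

```python
import math

def trunc_component(value):
    """Round a finite float toward zero, keeping a floating-point result."""
    return float(math.trunc(value))

def trunc_to_unsigned(value, bits=32):
    """Truncate and write as an unsigned integer of the given width.

    Returns None where the text above leaves the result undefined
    (negative or too large to represent); None is only a stand-in.
    """
    t = math.trunc(value)
    if t < 0 or t >= (1 << bits):
        return None
    return t
```

Here trunc_component(-1.7) gives -1.0 and trunc_to_unsigned(-1.7) gives None, reflecting that a negative value written to an unsigned integer is undefined.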
Section 2.X.8.Z, TXB: Texture Sample with Bias
The TXB instruction takes the four components of a single floating-point
source vector and performs a filtered texture access as described in
Section 2.X.4.4. The returned (R,G,B,A) value is written to the
floating-point result vector. Partial derivatives and the level of detail
are computed automatically, but the fourth component of the source vector
is added to the computed LOD prior to sampling.
tmp = VectorLoad(op0);
ddx = ComputePartialsX(tmp);
ddy = ComputePartialsY(tmp);
lambda = ComputeLOD(ddx, ddy);
result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);
The single source vector in the TXB instruction does not have enough
coordinates to specify a lookup into a two-dimensional array texture or
cube map texture with both an LOD bias and an explicit reference value for
depth comparison. A program will fail to load if it contains a TXB
instruction with a target of SHADOWCUBE or SHADOWARRAY2D.
TXB supports all three data type modifiers. The single operand is always
treated as a floating-point vector; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TXD: Texture Sample with Partials
The TXD instruction takes the four components of the first floating-point
source vector and performs a filtered texture access as described in
Section 2.X.4.4. The returned (R,G,B,A) value is written to the
floating-point result vector. The partial derivatives of the texture
coordinates with respect to X and Y are specified by the second and third
floating-point source vectors. The level of detail is computed
automatically using the provided partial derivatives.
Note that for cube map texture targets, the provided partial derivatives
are in the coordinate system used before texture coordinates are projected
onto the appropriate cube face. The partial derivatives of the
post-projection texture coordinates, which are used for level-of-detail
and anisotropic filtering calculations, are derived from the original
coordinates and partial derivatives in an implementation-dependent manner.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
lambda = ComputeLOD(tmp1, tmp2);
result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);
TXD supports all three data type modifiers. All three operands are always
treated as floating-point vectors; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TXF: Texel Fetch
The TXF instruction takes the four components of a single signed integer
source vector and performs a single texel fetch as described in Section
2.X.4.4. The first three components provide the <i>, <j>, and <k> values
for the texel fetch, and the fourth component is used to determine the LOD
to access. The returned (R,G,B,A) value is written to the floating-point
result vector. Partial derivatives are irrelevant for single texel
fetches.
tmp = VectorLoad(op0);
result = TexelFetch(tmp, texelOffset);
TXF supports all three data type modifiers. The single vector operand is
treated as a signed integer vector; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TXL: Texture Sample with LOD
The TXL instruction takes the four components of a single floating-point
source vector and performs a filtered texture access as described in
Section 2.X.4.4. The returned (R,G,B,A) value is written to the
floating-point result vector. The level of detail is taken from the
fourth component of the source vector.
Partial derivatives are not computed by the TXL instruction and
anisotropic filtering is not performed.
tmp = VectorLoad(op0);
ddx = (0,0,0);
ddy = (0,0,0);
result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);
The single source vector in the TXL instruction does not have enough
coordinates to specify a lookup into a 2D array or cube map texture with
both an explicit LOD and a reference value for depth comparison. A
program will fail to load if it contains a TXL instruction with a target
of SHADOWCUBE or SHADOWARRAY2D.
TXL supports all three data type modifiers. The single vector operand is
treated as a floating-point vector; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TXP: Texture Sample with Projection
The TXP instruction divides the first three components of its single
floating-point source vector by its fourth component, maps the results to
s, t, and r, and performs a filtered texture access as described in
Section 2.X.4.4. The returned (R,G,B,A) value is written to the
floating-point result vector. Partial derivatives and the level of detail
are computed automatically.
tmp0 = VectorLoad(op0);
tmp0.x = tmp0.x / tmp0.w;
tmp0.y = tmp0.y / tmp0.w;
tmp0.z = tmp0.z / tmp0.w;
ddx = ComputePartialsX(tmp0);
ddy = ComputePartialsY(tmp0);
lambda = ComputeLOD(ddx, ddy);
result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset);
The single source vector in the TXP instruction does not have enough
coordinates to specify a lookup into a 2D array or cube map texture with
both a Q coordinate and an explicit reference value for depth comparison.
A program will fail to load if it contains a TXP instruction with a target
of SHADOWCUBE or SHADOWARRAY2D.
TXP supports all three data type modifiers. The single vector operand is
treated as a floating-point vector; the results are interpreted according
to the data type modifier.
Section 2.X.8.Z, TXQ: Texture Size Query
The TXQ instruction takes the first component of the single integer vector
operand, adds the number of the base level of the specified texture to
determine a texture image level, and returns an integer result vector
containing the size of the image at that level of the texture.
For one-dimensional and one-dimensional array textures, the "x" component
of the result vector is filled with the width of the image(s). For
two-dimensional, rectangle, cube map, and two-dimensional array textures,
the "x" and "y" components are filled with the width and height of the
image(s). For three-dimensional textures, the "x", "y", and "z"
components are filled with the width, height, and depth of the image.
Additionally, the number of layers in an array texture is returned in the
"y" component of the result for one-dimensional array textures or the "z"
component for two-dimensional array textures. All other components of the
result vector are undefined. For the purposes of this instruction, the
width, height, and depth of a texture do NOT include any border.
tmp0 = VectorLoad(op0);
tmp0.x = tmp0.x + texture[op1].target[op2].base_level;
result.x = texture[op1].target[op2].level[tmp0.x].width;
result.y = texture[op1].target[op2].level[tmp0.x].height;
result.z = texture[op1].target[op2].level[tmp0.x].depth;
If the level computed by adding the operand to the base level of the
texture is less than the base level number or greater than the maximum
level number, the results are undefined.
TXQ supports no data type modifiers; the scalar operand and the result
vector are both interpreted as signed integers.
Section 2.X.8.Z, UP2H: Unpack Two 16-bit Floats
The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
scalar operand. The first 16-bit float (stored in the 16 least
significant bits) is written into the "x" and "z" components of the result
vector; the second is written into the "y" and "w" components of the
result vector.
This operation undoes the type conversion and packing performed by
the PK2H instruction.
tmp = ScalarLoad(op0);
result.x = (fp16) (RawBits(tmp) & 0xFFFF);
result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
result.z = (fp16) (RawBits(tmp) & 0xFFFF);
result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
UP2H supports all three data type modifiers. The single operand is read
as a floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier; the 32 least significant bits of the
encoding are used for unpacking. For floating-point operand variables, it
is expected (but not required) that the operand was produced by a previous
pack instruction. The result is always written as a floating-point
vector.
A program will fail to load if it contains a UP2H instruction whose
operand is a variable declared as "SHORT".
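The unpacking can be modeled in Python with the standard struct module, as sketched below. The function name up2h is hypothetical, and the sketch assumes that the 16-bit float encoding matches the IEEE 754 binary16 layout understood by struct's "e" format.

```python
import struct

def up2h(raw_bits):
    """Unpack two 16-bit floats from a 32-bit integer bit pattern."""
    lo = raw_bits & 0xFFFF
    hi = (raw_bits >> 16) & 0xFFFF
    # Reinterpret each 16-bit pattern as a half-precision float.
    x = struct.unpack('<e', struct.pack('<H', lo))[0]
    y = struct.unpack('<e', struct.pack('<H', hi))[0]
    return (x, y, x, y)
```

With that assumption, up2h(0xC0003C00) returns (1.0, -2.0, 1.0, -2.0), since 0x3C00 encodes 1.0 and 0xC000 encodes -2.0.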
Section 2.X.8.Z, UP2US: Unpack Two Unsigned 16-bit Integers
The UP2US instruction unpacks two 16-bit unsigned values packed
together in a 32-bit scalar operand. The unsigned quantities are
encoded where a bit pattern of all '0' bits corresponds to 0.0 and
a pattern of all '1' bits corresponds to 1.0. The "x" and "z"
components of the result vector are obtained from the 16 least
significant bits of the operand; the "y" and "w" components are
obtained from the 16 most significant bits.
This operation undoes the type conversion and packing performed by
the PK2US instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
UP2US supports all three data type modifiers. The single operand is read
as a floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier; the 32 least significant bits of the
encoding are used for unpacking. For floating-point operand variables, it
is expected (but not required) that the operand was produced by a previous
pack instruction. The result is always written as a floating-point
vector.
A GPU program will fail to load if it contains a UP2US instruction
whose operand is a variable declared as "SHORT".
Section 2.X.8.Z, UP4B: Unpack Four Signed 8-bit Integers
The UP4B instruction unpacks four 8-bit signed values packed together
in a 32-bit scalar operand. The signed quantities are encoded where
a bit pattern of all '0' bits corresponds to -128/127 and a pattern
of all '1' bits corresponds to +127/127. The "x" component of the
result vector is the converted value corresponding to the 8 least
significant bits of the operand; the "w" component corresponds to
the 8 most significant bits.
This operation undoes the type conversion and packing performed by
the PK4B instruction.
tmp = ScalarLoad(op0);
result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;
UP4B supports all three data type modifiers. The single operand is read
as a floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier; the 32 least significant bits of the
encoding are used for unpacking. For floating-point operand variables, it
is expected (but not required) that the operand was produced by a previous
pack instruction. The result is always written as a floating-point
vector.
A program will fail to load if it contains a UP4B instruction whose
operand is a variable declared as "SHORT".
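A small Python model of the UP4B conversion is given below for illustration. The name up4b is hypothetical; the arithmetic follows the pseudocode above.

```python
def up4b(raw_bits):
    """Unpack four biased signed bytes from a 32-bit integer bit pattern."""
    # Byte 0 (least significant) feeds x; byte 3 feeds w.
    return tuple((((raw_bits >> shift) & 0xFF) - 128) / 127.0
                 for shift in (0, 8, 16, 24))
```

For the bit pattern 0xFF8000FF this yields (1.0, -128/127, 0.0, 1.0): a byte of 0xFF maps to +127/127, 0x00 maps to -128/127, and 0x80 maps to zero.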
Section 2.X.8.Z, UP4UB: Unpack Four Unsigned 8-bit Integers
The UP4UB instruction unpacks four 8-bit unsigned values packed
together in a 32-bit scalar operand. The unsigned quantities are
encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
pattern of all '1' bits corresponds to 1.0. The "x" component of the
result vector is obtained from the 8 least significant bits of the
operand; the "w" component is obtained from the 8 most significant
bits.
This operation undoes the type conversion and packing performed by
the PK4UB instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0;
result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0;
result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;
UP4UB supports all three data type modifiers. The single operand is read
as a floating-point value, a signed integer, or an unsigned integer, as
specified by the data type modifier; the 32 least significant bits of the
encoding are used for unpacking. For floating-point operand variables, it
is expected (but not required) that the operand was produced by a previous
pack instruction. The result is always written as a floating-point
vector.
A program will fail to load if it contains a UP4UB instruction whose
operand is a variable declared as "SHORT".
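The unsigned variant differs only in the absence of a bias and in the divisor. The sketch below uses a hypothetical function name, up4ub, and follows the pseudocode above.

```python
def up4ub(raw_bits):
    """Unpack four unsigned normalized bytes from a 32-bit bit pattern."""
    # Byte 0 (least significant) feeds x; byte 3 feeds w.
    return tuple(((raw_bits >> shift) & 0xFF) / 255.0
                 for shift in (0, 8, 16, 24))
```

For example, up4ub(0xFF000080) returns (128/255, 0.0, 0.0, 1.0).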
Section 2.X.8.Z, X2D: 2D Coordinate Transformation
The X2D instruction multiplies the 2D offset vector specified by the
"x" and "y" components of the second vector operand by the 2x2 matrix
specified by the four components of the third vector operand, and adds
the transformed offset vector to the 2D vector specified by the "x"
and "y" components of the first vector operand. The first component
of the sum is written to the "x" and "z" components of the result;
the second component is written to the "y" and "w" components of
the result.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
X2D supports only floating-point data type modifiers.
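To make the matrix layout concrete, the following Python sketch mirrors the pseudocode above. The function name x2d and the tuple representation of vectors are assumptions of the example. The third operand is read as a row-major 2x2 matrix (x, y in the first row; z, w in the second).

```python
def x2d(op0, op1, op2):
    """Transform the 2D offset op1.xy by the 2x2 matrix op2 and add op0.xy."""
    x = op0[0] + op1[0] * op2[0] + op1[1] * op2[1]
    y = op0[1] + op1[0] * op2[2] + op1[1] * op2[3]
    # The 2D sum is replicated into (x, y, z, w).
    return (x, y, x, y)
```

With the matrix (0, -1, 1, 0), which rotates the offset by 90 degrees, x2d((1.0, 2.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0), (0.0, -1.0, 1.0, 0.0)) returns (-3.0, 5.0, -3.0, 5.0).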
Section 2.X.8.Z, XOR: Exclusive Or
The XOR instruction performs a bitwise XOR operation on the components of
the two source vectors to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x ^ tmp1.x;
result.y = tmp0.y ^ tmp1.y;
result.z = tmp0.z ^ tmp1.z;
result.w = tmp0.w ^ tmp1.w;
XOR supports only integer data type modifiers. If no type modifier is
specified, both operands and the result are treated as signed integers.
Section 2.X.8.Z, XPD: Cross Product
The XPD instruction computes the cross product using the first three
components of its two vector operands to generate the x, y, and z
components of the result vector. The w component of the result vector is
undefined.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;
result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;
result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;
result.w = undefined;
XPD supports only floating-point data type modifiers.
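For reference, the cross product above can be modeled as in the Python sketch below. The name xpd is hypothetical, and representing the undefined w component as None is an assumption of this example.

```python
def xpd(op0, op1):
    """Cross product of the xyz components of two 4-tuples."""
    x = op0[1] * op1[2] - op0[2] * op1[1]
    y = op0[2] * op1[0] - op0[0] * op1[2]
    z = op0[0] * op1[1] - op0[1] * op1[0]
    # w is undefined in the instruction; None is only a placeholder.
    return (x, y, z, None)
```

The cross product of the unit X and unit Y vectors, xpd((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)), gives (0.0, 0.0, 1.0, None).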
Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)
Modify Section 3.8.1, Texture Image Specification, p. 150
(modify 4th paragraph, p. 151 -- add cubemaps to the list of texture
targets that can be used with DEPTH_COMPONENT textures) Textures with a
base internal format of DEPTH_COMPONENT are supported by texture image
specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,
TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,
TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,
PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,
PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT. Using this
format in conjunction with any other target will result in an
INVALID_OPERATION error.
Delete Section 3.8.7, Texture Wrap Modes. (The language in this section
is folded into updates to the following section, and is no longer needed
here.)
Modify Section 3.8.8, Texture Minification:
(replace the last paragraph, p. 171): Let s(x,y) be the function that
associates an s texture coordinate with each set of window coordinates
(x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.
Let
u(x,y) = w_t * s(x,y) + offsetu_shader,
v(x,y) = h_t * t(x,y) + offsetv_shader, and
w(x,y) = d_t * r(x,y) + offsetw_shader,
where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17
with w_s, h_s, and d_s equal to the width, height, and depth of the image
array whose level is level_base. (offsetu_shader, offsetv_shader,
offsetw_shader) is the texel offset specified in the vertex, geometry, or
fragment program instruction used to perform the access. For
fixed-function texture accesses, all three shader offsets are taken to be
zero. For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0;
for two-dimensional textures, define w(x,y) == 0.
After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the
corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT. Let
u'(x,y) = clamp(u(x,y), 0, w_t),     if TEXTURE_WRAP_S is CLAMP
          clamp(u(x,y), -w_t, w_t),  if TEXTURE_WRAP_S is MIRROR_CLAMP_EXT, or
          u(x,y),                    otherwise
v'(x,y) = clamp(v(x,y), 0, h_t),     if TEXTURE_WRAP_T is CLAMP
          clamp(v(x,y), -h_t, h_t),  if TEXTURE_WRAP_T is MIRROR_CLAMP_EXT, or
          v(x,y),                    otherwise
w'(x,y) = clamp(w(x,y), 0, d_t),     if TEXTURE_WRAP_R is CLAMP
          clamp(w(x,y), -d_t, d_t),  if TEXTURE_WRAP_R is MIRROR_CLAMP_EXT, or
          w(x,y),                    otherwise,
where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a> is
greater than <c>, and <a> otherwise.
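The pre-filter clamping step can be sketched in Python as shown below. The names clamp_coord and the string wrap-mode identifiers are assumptions made for the example; "size" stands for the texture dimension along the axis of the coordinate being clamped.

```python
def clamp(a, b, c):
    """Return b if a < b, c if a > c, and a otherwise."""
    if a < b:
        return b
    if a > c:
        return c
    return a

def clamp_coord(coord, size, wrap_mode):
    """Clamp an unnormalized coordinate before texel selection."""
    if wrap_mode == "CLAMP":
        return clamp(coord, 0, size)
    if wrap_mode == "MIRROR_CLAMP_EXT":
        return clamp(coord, -size, size)
    # All other wrap modes leave the coordinate unchanged at this stage.
    return coord
```

For a texture of width 8, clamp_coord(9.5, 8, "CLAMP") returns 8, while clamp_coord(-9.5, 8, "REPEAT") passes the coordinate through unchanged.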
(start a new paragraph with "For a polygon, rho is given at a fragment
with window coordinates...", and then continue with the original spec
text.)
(replace text starting with the last paragraph on p. 172, continuing to
the end of p. 174)
When lambda indicates minification, the value assigned to
TEXTURE_MIN_FILTER is used to determine how the texture value for a
fragment is selected.
When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
level_base that is nearest (in Manhattan distance) to that specified by
(s,t,r) is obtained. Let i, j, and k be integers such that:
i = apply_wrap(floor(u'(x,y))),
j = apply_wrap(floor(v'(x,y))), and
k = apply_wrap(floor(w'(x,y))),
where the coordinate returned by apply_wrap() is as defined by Table X.19.
The values of i, j, and k are then modified according to the texture wrap
modes, as described in Table 3.19, to produce new values (i', j', and k').
For a three-dimensional texture, the texel at location (i,j,k) becomes the
texture value. For a two-dimensional texture, k is irrelevant, and the
texel at location (i,j) becomes the texture value. For a one-dimensional
texture, j and k are irrelevant, and the texel at location i becomes the
texture value.
Wrap mode                    Result
--------------------------   ------------------------------------------
CLAMP_TO_EDGE                clamp(coord, 0, size-1)
CLAMP_TO_BORDER              clamp(coord, -1, size)
CLAMP                        { clamp(coord, 0, size-1),
                             {   for NEAREST filtering
                             { clamp(coord, -1, size),
                             {   for LINEAR filtering
REPEAT                       mod(coord, size)
MIRROR_CLAMP_TO_EDGE_EXT     clamp(mirror(coord), 0, size-1)
MIRROR_CLAMP_TO_BORDER_EXT   clamp(mirror(coord), 0, size)
MIRROR_CLAMP_EXT             { clamp(mirror(coord), 0, size-1),
                             {   for NEAREST filtering
                             { clamp(mirror(coord), 0, size),
                             {   for LINEAR filtering
MIRRORED_REPEAT              (size-1) - mirror(mod(coord, 2*size)-size)
Table X.19: Texel location wrap mode application. mod(<a>,<b>) is
defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to
return <a> if <a> is greater than or equal to zero or -(1+<a>)
otherwise. The values of "wrap mode" and size are TEXTURE_WRAP_S and
w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k
coordinates, respectively. The coordinate clamping applied for the CLAMP
and MIRROR_CLAMP_EXT wrap modes depends on the filtering mode (NEAREST or
LINEAR).
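The wrap mode table can be read as the following Python sketch. It is illustrative only: the function signature, the string wrap-mode names, and the "linear" flag are assumptions of the example, and the sketch assumes that mirror() is applied to the texel coordinate in every mirrored mode.

```python
import math

def mod(a, b):
    """mod(a, b) = a - b*floor(a/b), as defined for Table X.19."""
    return a - b * math.floor(a / b)

def mirror(a):
    """mirror(a) = a if a >= 0, and -(1+a) otherwise."""
    return a if a >= 0 else -(1 + a)

def clamp(a, b, c):
    return b if a < b else c if a > c else a

def apply_wrap(coord, size, wrap_mode, linear=False):
    """Map an integer texel coordinate according to the wrap mode."""
    if wrap_mode == "CLAMP_TO_EDGE":
        return clamp(coord, 0, size - 1)
    if wrap_mode == "CLAMP_TO_BORDER":
        return clamp(coord, -1, size)
    if wrap_mode == "CLAMP":
        return clamp(coord, -1, size) if linear else clamp(coord, 0, size - 1)
    if wrap_mode == "REPEAT":
        return mod(coord, size)
    if wrap_mode == "MIRROR_CLAMP_TO_EDGE_EXT":
        return clamp(mirror(coord), 0, size - 1)
    if wrap_mode == "MIRROR_CLAMP_TO_BORDER_EXT":
        return clamp(mirror(coord), 0, size)
    if wrap_mode == "MIRROR_CLAMP_EXT":
        upper = size if linear else size - 1
        return clamp(mirror(coord), 0, upper)
    if wrap_mode == "MIRRORED_REPEAT":
        return (size - 1) - mirror(mod(coord, 2 * size) - size)
    raise ValueError("unknown wrap mode: " + wrap_mode)
```

For a texture of size 8, apply_wrap(-1, 8, "REPEAT") returns 7, and apply_wrap(8, 8, "MIRRORED_REPEAT") also returns 7, since the first texel past the right edge reflects back onto the last texel.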
If the selected (i,j,k), (i,j), or i location refers to a border texel
that satisfies any of the following conditions:
i < -b_s,
j < -b_s,
k < -b_s,
i >= w_t + b_s,
j >= h_t + b_s, or
k >= d_t + b_s,
then the border values defined by TEXTURE_BORDER_COLOR are used in place
of the non-existent texel. If the texture contains color components, the
values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match
the texture's internal format in a manner consistent with table 3.15. If
the texture contains depth components, the first component of
TEXTURE_BORDER_COLOR is interpreted as a depth value.
When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image
array of level level_base is selected. Let:
i_0 = apply_wrap(floor(u' - 0.5)),
j_0 = apply_wrap(floor(v' - 0.5)),
k_0 = apply_wrap(floor(w' - 0.5)),
i_1 = apply_wrap(floor(u' - 0.5) + 1),
j_1 = apply_wrap(floor(v' - 0.5) + 1),
k_1 = apply_wrap(floor(w' - 0.5) + 1),
alpha = frac(u' - 0.5),
beta = frac(v' - 0.5),
gamma = frac(w' - 0.5),
where frac(<x>) denotes the fractional part of <x>.
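For a single axis, the texel pair and blend weight used by LINEAR filtering can be computed as in the Python sketch below. The function name linear_footprint is hypothetical, and the sketch returns the indices before apply_wrap() is applied to them.

```python
import math

def linear_footprint(coord):
    """Return (index_0, index_1, weight) for one clamped coordinate.

    index_0 and index_1 are the unwrapped texel indices; weight is the
    fractional part (alpha, beta, or gamma) for that axis.
    """
    shifted = coord - 0.5
    base = math.floor(shifted)
    return base, base + 1, shifted - base
```

A coordinate of 2.75 selects texels 2 and 3 with a weight of 0.25. A coordinate of 0.25 selects texels -1 and 0 with a weight of 0.75, where the index -1 is then resolved by the wrap mode.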
For a three-dimensional texture, the texture value tau is found as...
(replace last paragraph, p.174) For any texel in the equation above that
refers to a border texel outside the defined range of the image, the texel
value is taken from the texture border color as with NEAREST filtering.
Modify Section 3.8.14, Texture Comparison Modes (p. 185)
(modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is
used for depth comparisons on cubemap textures)
Let D_t be the depth texture value, in the range [0, 1]. For
fixed-function texture lookups, let R be the interpolated <r> texture
coordinate, clamped to the range [0, 1]. For texture lookups generated by
a program instruction, let R be the reference value for depth comparisons
provided in the instruction, also clamped to [0, 1]. Then the effective
texture value L_t, I_t, or A_t is computed as follows:
Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment
Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)
None.
Additions to Chapter 6 of the OpenGL 1.5 Specification (State and
State Requests)
Modify Section 6.1.12 of the ARB_vertex_program specification.
(Add new integer program parameter queries, plus language that program
environment or local parameter query results are undefined if the query
specifies a data type incompatible with the data type of the parameter
being queried.)
The commands
void GetProgramEnvParameterdvARB(enum target, uint index,
double *params);
void GetProgramEnvParameterfvARB(enum target, uint index,
float *params);
void GetProgramEnvParameterIivNV(enum target, uint index,
int *params);
void GetProgramEnvParameterIuivNV(enum target, uint index,
uint *params);
obtain the current value for the program environment parameter numbered
<index> for the given program target <target>, and places the information
in the array <params>. The values returned are undefined if the data type
of the components of the parameter is not compatible with the data type of
<params>. Floating-point components are compatible with "double" or
"float"; signed and unsigned integer components are compatible with "int"
and "uint", respectively. The error INVALID_ENUM is generated if <target>
specifies a nonexistent program target or a program target that does not
support program environment parameters. The error INVALID_VALUE is
generated if <index> is greater than or equal to the
implementation-dependent number of supported program environment
parameters for the program target.
...
The commands
void GetProgramLocalParameterdvARB(enum target, uint index,
double *params);
void GetProgramLocalParameterfvARB(enum target, uint index,
float *params);
void GetProgramLocalParameterIivNV(enum target, uint index,
int *params);
void GetProgramLocalParameterIuivNV(enum target, uint index,
uint *params);
obtain the current value for the program local parameter numbered <index>
belonging to the program object currently bound to <target>, and places
the information in the array <params>. The values returned are undefined
if the data type of the components of the parameter is not compatible with
the data type of <params>. Floating-point components are compatible with
"double" or "float"; signed and unsigned integer components are compatible
with "int" and "uint", respectively. The error INVALID_ENUM is generated
if <target> specifies a nonexistent program target or a program target
that does not support program local parameters. The error INVALID_VALUE
is generated if <index> is greater than or equal to the
implementation-dependent number of supported program local parameters for
the program target.
...
The command
void GetProgramivARB(enum target, enum pname, int *params);
obtains program state for the program target <target>, writing ...
(add new paragraphs describing the new supported queries)
If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or
PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
holding the number of active attribute or result variable components,
respectively, used by the program object currently bound to <target>.
If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
holding the maximum number of active attribute or result variable
components, respectively, supported for programs of type <target>.
Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)
None.
Additions to the AGL/GLX/WGL Specifications
None.
GLX Protocol
The following new rendering commands are sent to the server as part
of a glXRender request.
ProgramLocalParameterI4ivNV
2 28 rendering command length
2 4303 rendering command opcode
4 ENUM target
4 CARD32 index
4 INT32 params[0]
4 INT32 params[1]
4 INT32 params[2]
4 INT32 params[3]
ProgramLocalParameterI4uivNV
2 28 rendering command length
2 4305 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 params[0]
4 CARD32 params[1]
4 CARD32 params[2]
4 CARD32 params[3]
ProgramEnvParameterI4ivNV
2 28 rendering command length
2 4307 rendering command opcode
4 ENUM target
4 CARD32 index
4 INT32 params[0]
4 INT32 params[1]
4 INT32 params[2]
4 INT32 params[3]
ProgramEnvParameterI4uivNV
2 28 rendering command length
2 4309 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 params[0]
4 CARD32 params[1]
4 CARD32 params[2]
4 CARD32 params[3]
The following new rendering commands are added. These can be sent as a
glXRender or glXRenderLarge request.
ProgramLocalParametersI4ivNV
2 16+count*4*4 rendering command length
2 4304 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 count
4*count*4 LISTofINT32 params
If the command is encoded in a glXRenderLarge request, the
command opcode and command length fields above are expanded to
4 bytes each:
4 20+count*4*4 rendering command length
4 4304 rendering command opcode
ProgramLocalParametersI4uivNV
2 16+count*4*4 rendering command length
2 4306 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 count
4*count*4 LISTofCARD32 params
If the command is encoded in a glXRenderLarge request, the
command opcode and command length fields above are expanded to
4 bytes each:
4 20+count*4*4 rendering command length
4 4306 rendering command opcode
ProgramEnvParametersI4ivNV
2 16+count*4*4 rendering command length
2 4308 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 count
4*count*4 LISTofINT32 params
If the command is encoded in a glXRenderLarge request, the
command opcode and command length fields above are expanded to
4 bytes each:
4 20+count*4*4 rendering command length
4 4308 rendering command opcode
ProgramEnvParametersI4uivNV
2 16+count*4*4 rendering command length
2 4310 rendering command opcode
4 ENUM target
4 CARD32 index
4 CARD32 count
4*count*4 LISTofCARD32 params
If the command is encoded in a glXRenderLarge request, the
command opcode and command length fields above are expanded to
4 bytes each:
4 20+count*4*4 rendering command length
4 4310 rendering command opcode
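The command length fields above follow directly from the field sizes: a 2-byte length, a 2-byte opcode, and three 4-byte fields (target, index, count) give a 16-byte header, followed by <count> four-component vectors of 4 bytes per component. In a glXRenderLarge request the length and opcode fields grow to 4 bytes each, making the header 20 bytes. The Python sketch below restates that arithmetic; the function name is hypothetical.

```python
def parameters_i4_command_length(count, large=False):
    """Length in bytes of a Program{Local,Env}ParametersI4{i,ui}vNV command.

    'large' selects the glXRenderLarge encoding, where the length and
    opcode fields are 4 bytes each instead of 2.
    """
    header = 20 if large else 16
    return header + count * 4 * 4
```

For example, a command carrying two parameter vectors occupies 48 bytes in a glXRender request and 52 bytes in a glXRenderLarge request.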
The remaining commands are non-rendering commands. These commands
are sent separately (i.e., not as part of a glXRender or
glXRenderLarge request), using the glXVendorPrivateWithReply
request:
GetProgramLocalParameterIivNV
1 CARD8 opcode (X assigned)
1 17 GLX opcode (X_GLXVendorPrivateWithReply)
2 5 request length
4 1365 vendor specific opcode
4 GLX_CONTEXT_TAG context tag
4 ENUM target
4 CARD32 index
=>
1 1 reply
1 CARD8 unused
2 CARD16 sequence number
4 4 reply length
24 CARD32 unused
16 INT32 params
GetProgramLocalParameterIuivNV
1 CARD8 opcode (X assigned)
1 17 GLX opcode (X_GLXVendorPrivateWithReply)
2 5 request length
4 1366 vendor specific opcode
4 GLX_CONTEXT_TAG context tag
4 ENUM target
4 CARD32 index
=>
1 1 reply
1 CARD8 unused
2 CARD16 sequence number
4 4 reply length
24 CARD32 unused
16 CARD32 params
GetProgramEnvParameterIivNV
1 CARD8 opcode (X assigned)
1 17 GLX opcode (X_GLXVendorPrivateWithReply)
2 5 request length
4 1367 vendor specific opcode
4 GLX_CONTEXT_TAG context tag
4 ENUM target
4 CARD32 index
=>
1 1 reply
1 CARD8 unused
2 CARD16 sequence number
4 4 reply length
24 CARD32 unused
16 INT32 params
GetProgramEnvParameterIuivNV
1 CARD8 opcode (X assigned)
1 17 GLX opcode (X_GLXVendorPrivateWithReply)
2 5 request length
4 1368 vendor specific opcode
4 GLX_CONTEXT_TAG context tag
4 ENUM target
4 CARD32 index
=>
1 1 reply
1 CARD8 unused
2 CARD16 sequence number
4 4 reply length
24 CARD32 unused
16 CARD32 params
Errors
The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,
ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,
ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,
ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV,
ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
number of program local parameters supported by <target>.
The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,
ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,
ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,
ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,
ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
number of program environment parameters supported by <target>.
The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,
ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum
of <index> and <count> is greater than the number of program local
parameters supported by <target>.
The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,
ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of
<index> and <count> is greater than the number of program environment
parameters supported by <target>.
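The two kinds of range check described above can be summarized in a small model. This is an illustrative Python sketch, not driver code; the function names and the limit of 256 used in the usage note are assumptions made for the example.

```python
INVALID_VALUE = "INVALID_VALUE"
NO_ERROR = "NO_ERROR"


def check_single_parameter(index, max_parameters):
    """Model of the check for commands that read or write one parameter."""
    # Error if <index> is greater than or equal to the supported count.
    return INVALID_VALUE if index >= max_parameters else NO_ERROR


def check_parameter_range(index, count, max_parameters):
    """Model of the check for commands that update <count> parameters."""
    # Error if <index> + <count> is greater than the supported count.
    return INVALID_VALUE if index + count > max_parameters else NO_ERROR
```

With a hypothetical limit of 256 parameters, index 255 is accepted by the single-parameter check, while a range starting at 250 with a count of 7 is rejected.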
Dependencies on NV_parameter_buffer_object
If NV_parameter_buffer_object is not supported, references to program
parameter buffer variables and bindings should be removed.
Dependencies on ARB_texture_rectangle
If ARB_texture_rectangle is not supported, references to rectangle
textures and the RECT and SHADOWRECT texture target identifiers should be
removed.
Dependencies on EXT_gpu_program_parameters
If EXT_gpu_program_parameters is not supported, references to the
Program{Local,Env}Parameters4fvNV commands, which set multiple program
local or environment parameters in a single call, should be removed.
These prototypes were included in this spec for completeness only.
Dependencies on EXT_texture_integer
If EXT_texture_integer is not supported, references to texture lookups
returning integer values in Section 2.X.4.4 (Texture Access) should be
removed, and all texture formats are considered to produce floating-point
values.
Dependencies on EXT_texture_array
If EXT_texture_array is not supported, references to array textures in
Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as
should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and
"SHADOWARRAY2D" tokens.
Dependencies on EXT_texture_buffer_object
If EXT_texture_buffer_object is not supported, references to buffer
textures in Section 2.X.4.4 (Texture Access) and elsewhere should be
removed, as should all references to the "BUFFER" tokens.
Dependencies on NV_primitive_restart
If NV_primitive_restart is supported, index values causing a primitive
restart are not considered as specifying an End command, followed by
another Begin. Primitive restart is therefore not guaranteed to
immediately update bindings for material properties changed inside a
Begin/End. The spec language says they "are not guaranteed to update
program parameter bindings until the following End command."
New State
                                                     Initial
Get Value                     Type  Get Command      Value    Description             Sec     Attrib
----------------------------  ----  ---------------  -------  ----------------------  ------  ------
PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components    6.1.12  -
                                                              used for attributes
PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components    6.1.12  -
                                                              used for results
Table X.20. New Program Object State. Program object queries return
attributes of the program object currently bound to the program target
<target>.
New Implementation Dependent State
                                                         Minimum
Get Value                         Type  Get Command      Value    Description            Sec.     Attrib
--------------------------------  ----  ---------------  -------  ---------------------  -------  ------
MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      -8       minimum texel offset   2.x.4.4  -
                                                                  allowed in lookup
MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      +7       maximum texel offset   2.x.4.4  -
                                                                  allowed in lookup
MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   -
                                                                  components allowed
                                                                  for attributes
MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   -
                                                                  components allowed
                                                                  for results
MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   -
                                                                  attribute vectors
                                                                  supported
MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   -
                                                                  result vectors
                                                                  supported
MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    -
                                                                  call stack depth
MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB  48       maximum program        2.X.5    -
                                                                  if nesting
MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    -
                                                                  loop nesting
Table X.21: New Implementation-Dependent Values Introduced by
NV_gpu_program4. (*) means that the required minimum is program
type-specific. There are separate limits for each program type.
Issues
(1) How does this extension differ from previous NV_vertex_program and
NV_fragment_program extensions?
RESOLVED:
- This extension provides a uniform set of instructions and bindings.
Unlike previous extensions, the set of instructions and bindings
available is generally the same for all program types. The only
exceptions are a small number of instructions and bindings that make
sense for one specific program type.
- This extension supports integer data types and provides a
full-fledged integer instruction set.
- This extension supports array variables of all types, including
temporaries. Array variables can be accessed directly or indirectly
(using integer temporaries as indices).
- This extension provides a uniform set of structured branching
constructs (if tests, loops, subroutines) that fully support
run-time condition testing. Previous versions of NV_vertex_program
provided unstructured branching. Previous versions of
NV_fragment_program provided structured branching constructs, but the
support was more limited -- for example, looping constructs couldn't
specify loop counts with values computed at run time.
- This extension supports geometry programs, which are described in
more detail in the NV_geometry_program4 extension.
- This extension provides the ability to specify and use cubemap
textures with a DEPTH_COMPONENT internal format. Shadow mapping is
supported; the Q texture coordinate is used as the reference value
for comparisons.
(2) Is this extension backward-compatible with previous NV_vertex_program
and NV_fragment_program extensions? If not, what support has been
removed?
RESOLVED: This extension is largely, but not completely,
backward-compatible. Functionality removed includes:
- Unstructured branching: NV_vertex_program2 included a general
branch instruction "BRA" that could be used to jump to an arbitrary
instruction. The "CAL" instruction could "call" to an arbitrary
instruction into code that was not necessarily structured as simple
subroutine blocks. Arbitrary unstructured branching can be
difficult to implement efficiently on highly parallel GPU
architectures, while basic structured branching is not nearly as
difficult.
This extension retains the "CAL" instruction but treats each block
of code between instruction labels as a separate subroutine. The
"BRA" instruction and arbitrary branching have been removed. The
structured branching constructs in this extension are sufficient to
implement almost all of the looping/branching support in high-level
languages ("goto" being the most obvious exception).
- Address registers: NV_vertex_program added the notion of address
registers, which were effectively under-powered integer temporaries.
The set of instructions used to manipulate address registers was
severely limited. NV_vertex_program[23] extended the original
scalars to vectors and added a few more instructions to manipulate
address registers. Fragment programs had no address registers until
NV_fragment_program2 added the loop counter, which was very similar
in functionality to vertex program address registers, but even more
limited. This extension adds true integer temporaries, which can
accomplish everything old address registers could do, and much more.
Address register support was removed to simplify the API.
- NV_fragment_program2 LOOP construct: NV_fragment_program2 added a
LOOP instruction, which let you repeat a block of code <N> times,
with a parallel loop counter that started at <A> and stepped by <B>
on each iteration. This construct was significantly limited in
several ways -- the loop count had to be constant, and you could
only access the innermost loop counter in a nested loop. This
extension discards the support and retains the simpler "REP"
construct to implement loops. If desired, a loop counter can be
implemented by manipulating an integer temporary. The "BRK"
instruction (conditional break) is retained, and a "CONT"
instruction (conditional continue) is added. Additionally, the loop
count need not be a constant.
- NV_vertex_program and ARB_vertex_program EXP and LOG instructions:
NV_vertex_program provided EXP and LOG instructions that computed a
rough approximation of 2^x or log_2(x) and provided some additional
values that could help refine the approximation. Those opcodes were
carried forward into ARB_vertex_program. Both ARB_vertex_program
and NV_vertex_program2 provided EX2 and LG2 instructions that
computed a better approximation. All fragment program extensions
also provided EX2 and LG2, but did not bother to include EXP and
LOG. On the hardware targeted by this extension, there is no
advantage to using EXP and LOG, so these opcodes have been removed
for simplicity.
- NV_vertex_program3 and NV_fragment_program2 provide the ability to
do indirect addressing of inputs/outputs when using bindings in
instructions -- for example:
MOV R0, vertex.attrib[A0.x+2]; # vertex
MOV result.texcoord[A0.y], R1; # vertex
MOV R2, fragment.texcoord[A0.x]; # fragment
This extension provides indexing capability, but using named array
variables instead.
ATTRIB attribs[] = { vertex.attrib[2..5] };
MOV R0, attribs[A0.x];
OUTPUT outcoords[] = { result.texcoord[0..3] };
MOV outcoords[A0.y], R1;
ATTRIB texcoords[] = { fragment.texcoord[0..2] };
MOV R2, texcoords[A0.x];
This approach makes the set of attribute and result bindings more
regular. Additionally, it helps the assembler determine which
vertex/fragment attributes are actually needed -- when the assembler
sees constructs like "fragment.texcoord[A0.x]", it must treat *all*
texture coordinates as live unless it can determine the range of
values used for indexing. The named array variable approach
explicitly identifies which attributes are needed when indexing is
used.
Functionality altered includes:
- The RSQ instruction in the original NV_vertex_program and
ARB_vertex_program extensions implicitly took the absolute value of
its operand. Since the ARB extensions don't have numerics
guarantees, computing the reciprocal square root of a negative value
was not meaningful. To allow for the possibility of taking the
reciprocal square root of a negative value (which should yield NaN
-- "not a number"), the RSQ instruction in this extension no
longer implicitly takes the absolute value of its operand.
Equivalent functionality can be achieved using the explicit |abs|
absolute value operator on the operand to RSQ.
- The results of texture lookups accessing inconsistent textures are
now undefined, instead of producing a fixed constant vector.
(3) What should this set of extensions be called?
RESOLVED: NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,
and NV_geometry_program4. Only NV_gpu_program4 will appear in the
extension string; the other three specifications exist simply to define
vertex, fragment, and geometry program-specific features.
The "gpu_program" name was chosen due to the common instruction set
intended to run on GPUs. On previous chip generations, the vertex and
fragment instruction sets were similar, but there were enough
differences to package them separately.
The choice of "4" indicates that this is the fourth generation of
programmable hardware from NVIDIA. The GeForce3 and GeForce4 series
supported NV_vertex_program. The GeForce FX series supported
NV_vertex_program2 and added fragment programmability with
NV_fragment_program. Around this time, the OpenGL Architecture Review
Board (ARB) approved ARB_vertex_program and ARB_fragment_program
extensions, and NVIDIA added NV_vertex_program2_option and
NV_fragment_program_option extensions exposing GeForce FX features using
the ARB extensions' instruction set. The GeForce6 and GeForce7 series
brought the NV_vertex_program3 and NV_fragment_program2 extensions,
which extend the ARB extensions further. This extension adds geometry
programs, and brings the "version number" for each of these extensions
up to "4".
(4) This extension adds integer data type support in programmable
shaders that were previously float-centric. Should applications be able
to pass integer values directly to the shaders, and if so, how does it
work?
RESOLVED: The diagram at the bottom of this issue depicts data flows in
the GL, as extended by this and related extensions.
This extension generalizes some state to be "typeless", instead of being
strongly typed (and almost invariably floating-point) as in the core
specification. We introduce a new set of functions to specify GL state
as signed or unsigned integer values, instead of floating point values.
These functions include:
* VertexAttribI*{i,ui}() -- Specify generic vertex attributes as
integers. This extension does not create "integer" versions for
fixed-function attribute functions (e.g., glColor, glTexCoord),
which remain fully floating-point.
* Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and
local parameters as integers.
* TexImage*() with EXT_texture_integer internal formats -- Specify
texture images as containing integer data whose values are not
converted to floating-point values.
* NV_parameter_buffer_object functions -- Bind (typeless) buffer
object data stores for use as program parameters. These buffer
objects can be loaded with either integer or floating-point data.
* EXT_texture_buffer_object functions -- Bind (typeless) buffer object
data stores for use as textures. These buffer objects can be loaded
with either integer or floating-point data.
Each type of program (using NV_gpu_program4 and related extensions) can
read attributes using any data type (float, signed integer, unsigned
integer) and write result values used by subsequent stages using any
data type.
Finally, there are several new places where integer data can be
consumed by the GL:
* NV_transform_feedback -- Stream transformed vertex attribute
components to a (typeless) buffer object. The transformed
attributes can be written as signed or unsigned integers in vertex
and geometry programs.
* EXT_texture_integer internal formats and framebuffer objects --
Provide support for rendering to integer texture formats, where
final fragment values are treated as signed or unsigned integers,
rather than floating-point values.
The diagram below represents a substantial portion of the GL pipeline.
Each line connecting blocks represents an interface where data is
"produced" from the GL state or by fixed-function or programmable
pipeline stages and "consumed" by another pipeline stage. Each producer
and consumer is labeled with a data type. For producers, the
"(typeless)" designation generally means that the state and/or output
can be written as floating-point values or as signed or unsigned
integers. "(float)" means that the outputs are always written as
floating-point. The same distinction applies to consumers --
"(typeless)" means that the consumer is capable of reading inputs using
any data type, and "(float)" means that consumer always reads inputs as
floating-point values.
To get sane results, applications must ensure that each value passed
between pipeline stages is produced and consumed using the same data
type. If a value is written in one stage as a floating-point value, it
must be read as a floating-point value as well. If such a value is read
as a signed or unsigned integer, its value is considered undefined. In
practice, the raw bits used to represent the floating-point value (IEEE
single-precision floating-point encoding in the initial implementation
of this spec) will be treated as an integer.
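As an illustration of the "in practice" behavior only (the specification itself leaves the value undefined), the following Python sketch reinterprets the IEEE single-precision encoding of a value as a 32-bit integer. The helper names are assumptions made for the example.

```python
import struct


def float_bits_as_uint(value):
    """Return the 32-bit pattern of an IEEE single-precision value, unsigned."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def float_bits_as_int(value):
    """Return the same 32-bit pattern read as a signed integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]
```

For example, a value written as the float 1.0 and read back as an unsigned integer would appear as 0x3F800000 rather than 1.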
Type matching between stages is not enforced by the GL, because the
overhead of doing so would be substantial. Such overhead would include:
* matching the inputs and outputs of each pipeline stage
(fixed-function or programmable) every time the program
configuration or fixed-function state changes,
* tracking the data type of each generic vertex attribute and checking
it against the vertex program's inputs,
* tracking the data type of each program parameter and checking it
against the manner the parameters were used in programs,
* matching color buffers against fragment program outputs.
Such error checking is certainly valuable, but the additional CPU
overhead cost is substantial. Given that current CPUs often have a hard
time keeping up with high-end GPUs, adding more overhead is a step in
the wrong direction. We expect developer tools, such as instrumented
drivers, to be able to provide type checking on most interfaces.
The diagram below depicts assembly programmability. Using vertex,
geometry, and fragment shaders provided by the OpenGL Shading Language
(GLSL) isn't substantially different from the assembly interface, except
that the interfaces between programmable pipeline stages are more
tightly coupled in GLSL (vertex, geometry, and fragment shaders are
linked together into a single program object), and that shader variables
are more strongly typed in GLSL than in the assembly interface.
In the figure below, the first programmable stage is vertex program
execution. All inputs read by the vertex program must be specified in
the GL vertex APIs (immediate mode or vertex arrays) using a data type
matching the data type read by the shader. Additionally,
vertex programs (and all other program types) can read program
parameters, parameter buffers, and textures. In all cases the
parameter, buffer, or texture data must be accessed in the shader using
the same data type used to specify the data. If vertex programs are
disabled, fixed-function vertex processing is used. Fixed-function
vertex processing is fully floating-point, and all the conventional
vertex attributes and state used by fixed-function are floating-point
values.
After vertex processing, an optional geometry program can be executed,
which reads attributes written by vertex programs (or fixed-function) and
writes out new vertex attributes. The vertex attributes it reads must
have been written by the vertex program (or fixed-function) using a
matching data type.
After geometry program execution, vertex attributes can optionally be
written out to buffer objects using the NV_transform_feedback extension.
The vertex attributes are written by the GL to the buffer objects using
the same data type used to write the attribute in the geometry program
(or vertex program if geometry programs are disabled).
Then, rasterization generates fragments based on transformed vertices.
Most attributes written by vertex or geometry programs can be read by
fragment programs, after the rasterization hardware "interpolates" them.
This extension allows fragment programs to control how each attribute is
interpolated. If an attribute is flat-shaded, it will be taken from the
output attribute of the provoking vertex of the primitive using the same
data type. If an attribute is smooth-shaded, the per-vertex attributes
will be interpreted as floating-point values and interpolated to yield
a floating-point result. One necessary consequence of this is that any integer
per-fragment attributes must be flat-shaded. To prevent some
interpolation type errors, assembly and GLSL fragment shaders will not
compile if they declare an integer fragment attribute that is not flat
shaded. [NOTE: While point primitives generally have constant
attributes, any integer attributes must still be flat-shaded; point
rasterization may perform (degenerate) floating-point interpolation.]
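The interpolation rule above can be modeled as follows. This is a hedged Python sketch: the function names and the error string are hypothetical, and only the "INT", "UINT", and "FLAT" notions come from this specification.

```python
def check_fragment_attrib(data_type, flat):
    """Return None if the declaration is acceptable, else an error string."""
    # Integer attributes cannot be smoothly interpolated, so they must be flat.
    if data_type in ("INT", "UINT") and not flat:
        return "integer fragment attribute must be declared FLAT"
    return None


def flat_shade(per_vertex_values, provoking_vertex):
    """Flat shading copies the provoking vertex's value unchanged."""
    # No arithmetic is performed, so the original data type is preserved.
    return per_vertex_values[provoking_vertex]
```

A floating-point attribute passes the check whether or not it is flat-shaded; an integer attribute passes only when flat-shaded.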
Fragment programs must read attributes using data types matching the
outputs of the interpolation or flat-shading operations. They may write
one or more color outputs using any data type, but the data type used
must match the corresponding framebuffer attachments. Outputs directed
at signed or unsigned integer textures (EXT_texture_integer) must be
written using the appropriate integer data type; all other outputs must
be written as floating-point values. Note that some of the
fixed-function per-fragment operations (e.g., blending, alpha test) are
specified as floating-point operations and are skipped when directed at
signed or unsigned integer color buffers.
generic conventional
vertex vertex
attributes attributes
| (typeless) | (float)
| |
| |
| +----------------------+
program | | |
parameters ----+ | | |
(typeless) | | | (typeless) | (float)
| V V V
constant +-+----------> vertex fixed-function
buffers ----+ |(typeless) program vertex
(typeless) | | | |
| | | (typeless) | (float)
textures ----+ | V |
(typeless) | |<----------------------+
| | |
| | +---------------+
| | | |
| | | (typeless) |
| | V |
| +---------> geometry |
| |(typeless) program |
| | | |
| | | (typeless) |
| | V |
| | |<--------------+
| | |
| | |
| | +-----------------+
| | | |(typeless)
| | | v
| | | transform
| | | feedback
| | | buffers
| | |
| | |
| | +-----------------------+
| | | |
| | | (float) | (typeless)
| | V V
| | interpolated flat
| | attributes attributes
| | | |
| | | (float) | (typeless)
| | V |
| | |<----------------------+
| | |
| | +-----------------------+
| | | |
| | | (typeless) | (float)
| |(typeless) V V
| +---------> fragment +------> fixed-function
| program |(float) fragment
| | | |
+--------------------------/|/--------+ |
| |
| (typeless) | (float)
V |
|<----------------------+
|
+-----------------------+------ ....
| |
| (typeless) | (typeless)
V V
color color
attachment attachment
0 1
(5) Instructions can operate on signed integer, unsigned integer, and
floating-point values. Some operations make sense on all three data
types. How is this supported, and what type checking support is provided
by the assembler?
RESOLVED: One important property of the instruction set is that the
data type for all operands and the result is fully specified by the
instructions themselves. For instructions (such as ADD) that make sense
for both integer and floating-point values, an optional data type
modifier is provided to indicate which type of operation should be
performed. For example, "ADD.S", "ADD.U", and "ADD.F", add signed
integers, unsigned integers, or floating-point values, respectively. If
no data type modifier is provided, ".F" is assumed if the instruction
can apply to floating-point values and ".S" is assumed otherwise.
To help identify errors where the wrong data type is used -- for
example, adding integer values in an ADD instruction that omits a data
type modifier and thus defaults to "ADD.F" -- variables may be declared
with optional data type modifiers. In the following code:
INT TEMP a;
UINT TEMP b;
FLOAT TEMP c;
TEMP d;
"a", "b", "c", and "d" are declared as temporary variables holding
signed integer, unsigned integer, floating-point, and typeless values.
Since each instruction fully specifies the data type of each operand and
its result, these data types can be checked against the data type
assigned to the variables operated on. If the types don't match, and
the variable is not typeless, an error is reported. The opcode modifier
".NTC" can be used to ignore such errors on a per-opcode basis, if
required.
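The default-modifier and type-checking rules described in this issue can be sketched in Python. The opcode table below is a small hypothetical subset chosen for illustration, and the function names are assumptions; this is not the assembler's implementation.

```python
# Hypothetical subset: True means the opcode can apply to floating-point values.
SUPPORTS_FLOAT = {"ADD": True, "MUL": True, "AND": False, "SHL": False}

# Variable declaration modifier -> data type; None models a typeless variable.
VARIABLE_TYPE = {"INT": "S", "UINT": "U", "FLOAT": "F", None: None}


def effective_type(opcode, modifier=None):
    """Data type an instruction operates on, applying the default rule."""
    if modifier in ("F", "S", "U"):
        return modifier
    # No modifier: ".F" if the opcode can apply to floats, ".S" otherwise.
    return "F" if SUPPORTS_FLOAT[opcode] else "S"


def type_error(opcode, modifier, var_decl, ntc=False):
    """True if a type mismatch would be reported for this operand."""
    var_type = VARIABLE_TYPE[var_decl]
    if var_type is None or ntc:
        # Typeless variables and ".NTC" instructions are never flagged.
        return False
    return var_type != effective_type(opcode, modifier)
```

In this model, an unmodified ADD on an "INT" variable is flagged because it defaults to ".F", while "ADD.S" on the same variable is accepted.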
Note that when bindings are used directly in instructions, they are
always considered typeless for simplicity. Some fixed-function bindings
have an obvious data type, but other bindings (e.g., program parameters)
can hold either integer or floating-point values, depending on how they
were specified.
Variable data types are optional. Typeless variables are provided
because some programs may want to reuse the same variable in several
places with different data types.
(6) Should both signed (INT) and unsigned integer (UINT) data types be
provided?
RESOLVED: Yes. Signed and unsigned integer operations are supported.
Providing both "INT" and "UINT" variable modifiers distinguishes between
signed and unsigned values for type checking purposes, to ensure that
unsigned values aren't read as signed values and vice versa.
This specification says if a value is read as a signed integer, but was
written as an unsigned integer, the value returned is undefined.
However, signed and unsigned integers are interchangeable in practice,
except for very large unsigned integers (which can't be represented as
signed values of the equivalent size) or negative signed integers.
If programs know that they won't generate negative or very large values,
signed and unsigned integers can be used interchangeably. To avoid type
errors in the assembler in this case, typeless variables can be used.
Or the ".NTC" modifier can be used when appropriate.
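The ranges in which signed and unsigned values agree can be seen in a short Python sketch. It assumes 32-bit two's complement integers, which this issue does not state explicitly; the helper names are hypothetical.

```python
def as_unsigned32(value):
    """Read a 32-bit pattern as an unsigned integer."""
    return value & 0xFFFFFFFF


def as_signed32(value):
    """Read a 32-bit pattern as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value
```

Small non-negative values read the same either way; a negative signed value or a very large unsigned value does not.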
(7) Integer and floating-point constants are supported in the instruction
set. Integer constants might be interpreted to mean either "real integer"
values or floating-point values. How are they supported?
RESOLVED: When an obvious floating point constant is specified (e.g.,
"3.0"), the developers' intent is clear. If you try to use a
floating-point value in an instruction that wants an integer operand, or
a declaration of an integer parameter variable, the program will fail to
load. An integer constant used in an instruction isn't quite as clear.
But its meaning can be easily inferred because the operand types of
instructions are well-known at compile time. An integer multiply
involving the constant "2" will interpret the "2" as an integer. A
floating-point multiply involving the same constant "2" will interpret
it as a floating-point value.
The only real problem is for a parameter declaration that is typeless.
For typed variables, the intent is clear:
INT PARAM two = 2; # use integer 2
FLOAT PARAM twoPt0 = 2; # use floating-point 2.0
For typeless variables, there's no context to go on:
PARAM two = 2; # 2? 2.0?
This extension is intended to be largely upward-compatible with
ARB_vertex_program, ARB_fragment_program, and the other extensions built
on top of them. In all of these, the previous declaration is legal and
means "2.0". For compatibility, we choose to interpret integer
constants in this case as floating-point values. The assembler in the
NVIDIA implementation will issue a warning if this case ever occurs.
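The interpretation rule for constants in parameter declarations can be sketched in Python. The function name is hypothetical, only simple decimal literals such as "2" and "3.0" are handled, and treating "UINT" like "INT" is an assumption of this example.

```python
def interpret_param_constant(literal, decl_type=None):
    """Model how a constant in a PARAM declaration is interpreted."""
    is_float_literal = "." in literal
    if decl_type in ("INT", "UINT"):
        if is_float_literal:
            # A floating-point constant in an integer declaration fails to load.
            raise ValueError("floating-point constant in integer declaration")
        return int(literal)
    # FLOAT and typeless declarations both produce floating-point values.
    return float(literal)
```

Here a typeless declaration of "2" yields the float 2.0, matching the compatibility choice described above.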
This extension does not provide decoration of integer constant values --
we considered adding suffixed integers such as "2U" to mean "2, and
don't even think about converting me to a float!". We expect that it
will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate
effectively.
(8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?
RESOLVED: Yes.
(9) Should we provide data type modifiers with explicit component sizes?
For example, "INT8", "FLOAT16", or "INT32". If so, should we provide a
mechanism to query the size (in bits) of a variable, or of different
variable types/qualifiers?
RESOLVED: No.
(10) Should this extension provide better support for array variables?
RESOLVED: Yes; array variables of all types are allowed.
In ARB_vertex_program, program parameter (constant) variables could be
addressed as arrays. Temporary variables, vertex attributes, and vertex
results could not be declared as arrays.
In NV_vertex_program3 and NV_fragment_program2, relative addressing was
supported in program bindings:
MOV R0, vertex.attrib[A0.x]; # vertex
MOV result.texcoord[A0.x], R0; # vertex
MOV R0, fragment.texcoord[A0.x]; # fragment -- inside LOOP
Explicitly declared attribute or result arrays were not supported, and
temporaries could also not be arrays.
This extension allows users to declare attribute, result, and temporary
arrays such as:
ATTRIB attribs[] = { vertex.attrib[7..11] };
TEMP scratch[10];
RESULT texcoords[] = { result.texcoord[0..3] };
Additionally, the relative addressing mechanisms provided by
NV_vertex_program3 and NV_fragment_program2 are NOT supported in this
extension -- instead, declared array variables are the only way to get
relative addressing. Using declared arrays allows the assembler to
identify which attributes will actually be used. An expression like
"vertex.texcoord[A0.x]" doesn't identify which texture coordinates are
referenced, and the assembler must be conservative in this case and
assume that they all are.
(11) Is relative addressing of temporaries allowed?
RESOLVED: Yes. However, arrays of temporaries may end up being stored
in off-chip memory, and may be slower to access than non-array
temporaries.
(12) Should this extension add bindings to pass generic attributes between
vertex, geometry, and fragment programs, or are texture coordinates
sufficient?
RESOLVED: While texture coordinates have been used in the past, generic
attributes should be provided.
The assembler provides a large set of bindings and automatically
eliminates generic attributes or components that are unused. At each
interface between programs, there is an implementation-dependent limit
on the number of attribute components that can be passed.
There are several reasons that this approach was chosen. First, if the
number of attributes that can be passed between program stages exceeds
the number of existing texture coordinate sets supported when specifying
vertices, a second implementation-dependent number of texture coordinates
would need to be exposed to cover the number supported between stages.
Second, the mechanisms described above reduce or eliminate the need to
pack attributes into four component vectors. Third, "texture
coordinates" that have been historically used for texture lookups don't
need to be used to pass values that aren't used this way.
(13) The structured branching support in NV_fragment_program2 provides a
REP instruction that says to repeat a block of code <N> times, as well as
a LOOP instruction that does the same, but also provides a special loop
counter variable. What sort of looping mechanism should we provide here?
RESOLVED: Provide only the REP instruction. The functionality provided
by the LOOP instruction can be easily achieved by using an integer
temporary as the loop index. This avoids two annoyances of the old LOOP
models: (a) the loop index (A0.x) is a special variable name, while all
other variables are declared normally and (b) instructions can only
access the loop index of the innermost loop -- loop indices at higher
nesting levels are not accessible.
One other option was considered -- a "LOOPV" instruction (LOOP with a
variable), where the program specified a variable name and component to
hold the loop index, instead of using the implicit variable name "A0.x".
In the end, it was decided that using an integer temporary as a loop
counter was sufficient.
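The equivalence claimed here can be checked with a small Python model. It assumes the LOOP behavior summarized in issue (2): <N> iterations with a counter starting at <A> and stepping by <B>. The function names are hypothetical.

```python
def loop_counter_values(n, a, b):
    """Counter values seen by the body of a LOOP with <N>, <A>, <B>."""
    return [a + i * b for i in range(n)]


def rep_with_integer_temporary(n, a, b):
    """The same sequence produced by REP plus an explicit integer temporary."""
    seen = []
    index = a               # integer temporary initialized before REP
    for _ in range(n):      # REP <N>
        seen.append(index)  # loop body reads the temporary
        index += b          # explicit step at the end of the body
    return seen
```

Both functions produce the same counter sequence; unlike the implicit counter, the temporary of an outer loop stays accessible inside inner loops.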
(14) The structured branching support in NV_fragment_program2 provides a
REP instruction that requires a loop count. Some looping constructs may
not have a definite loop count, such as a "while" statement in C. Should
this construct be supported, and if so, how?
RESOLVED: The REP instruction is extended to make the loop count
optional. If no loop count is provided, the REP instruction specifies a
loop that can only be exited using the BRK (break) or RET instructions.
To avoid obvious infinite loops, an error will be reported if a
REP/ENDREP block contains no BRK instruction at the current nesting
level and no RET instruction at any nesting level.
To implement a loop like "while (value < 7.0) ...", code such as the
following can be used:
TEMP cc; # dummy variable
REP;
SLT.CC cc.x, value.x, 7.0; # compare value.x to 7.0, set CC0
BRK EQ.x; # break out if not true (result is zero)
...
... # presumably update value!
...
ENDREP;
(15) The structured branching support in NV_fragment_program2 provides a
BRK instruction that operates like C's "break" statement. Should we
provide something similar to C's "continue" statement, which skips to the
next iteration of the loop?
RESOLVED: Yes, a new CONT opcode is provided for this purpose.
(16) Can the BRK or CONT instructions break out of multiple levels of
nested loops at once?
RESOLVED: No. BRK and CONT only exit the current nesting level. To
break out of multiple levels of nested loops, multiple BRK/CONT
instructions are required.
(17) For REP instructions, is the loop counter reloaded on each iteration
of the loop?
RESOLVED: No. The loop counter is loaded once at the top of the loop,
compared to zero at the top of the loop, and decremented when each loop
iteration completes. A program may overwrite the variable used to
specify the initial value of the loop counter inside the loop without
affecting the number of times the loop body is executed.
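The counter behavior can be modeled in Python as follows. This sketch assumes the loop exits when the counter is less than or equal to zero; the function name and the register dictionary are hypothetical.

```python
def run_rep(registers, count_name, body):
    """Model of REP: returns how many times the loop body executed."""
    counter = registers[count_name]  # loaded once, before the first iteration
    iterations = 0
    while counter > 0:               # compared to zero at the top of the loop
        body(registers)              # the body may overwrite registers[count_name]
        iterations += 1
        counter -= 1                 # decremented when the iteration completes
    return iterations
```

A body that overwrites the variable holding the initial count changes that variable but not the number of iterations.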
(18) How are floating-point values represented in this extension? What
about floating-point arithmetic operations?
RESOLVED: In the initial hardware implementation of this extension,
floating-point values are represented using the standard 32-bit IEEE
single-precision encoding, consisting of a sign bit, 8 exponent bits,
and 23 mantissa bits. Special encodings for NaN (not a number), +/-INF
(infinity), and positive and negative zero are supported. Denorms
(nonzero values with magnitude less than 2^-126, which have an exponent encoding of "0" and no
implied leading one) are supported, but may be flushed to zero,
preserving the sign bit of the original value. Arithmetic operations
are carried out at single-precision using normal IEEE floating-point
rules, including special rules for generating infinities, NaNs, and
zeros of each sign.
Floating-point temporaries declared as "SHORT" may be, but are not
necessarily, stored as 16-bit "fp16" values (sign bit, five exponent
bits, ten mantissa bits), as specified in the NV_float_buffer and
ARB_half_float_pixel extensions.
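For reference, the 32-bit encoding described above can be inspected from C as follows. This is a sketch that assumes the host "float" type uses the same IEEE single-precision layout; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 bits */
} fp32_fields;

/* Split a float into its sign, exponent, and mantissa fields. */
fp32_fields fp32_decode(float f)
{
    uint32_t bits;
    fp32_fields out;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}

/* A denorm has an exponent encoding of zero and a nonzero mantissa. */
int fp32_is_denorm(float f)
{
    fp32_fields p = fp32_decode(f);
    return p.exponent == 0 && p.mantissa != 0;
}

/* One permitted behavior: flush a denorm to zero, preserving the sign bit. */
float fp32_flush_denorm(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    if (fp32_is_denorm(f)) {
        bits &= 0x80000000u;    /* keep only the sign bit */
        memcpy(&f, &bits, sizeof f);
    }
    return f;
}
```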
(19) Should we provide a method to declare how fragment attributes are
interpolated? It is possible to have flat-shaded attributes,
perspective-corrected attributes, and centroid-sampled attributes.
RESOLVED: Yes. Fragment program attribute variable declarations may
specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.
These modifiers are documented in detail in the NV_fragment_program4
specification.
(20) Should vertex and primitive identifiers be supported? If so, how?
RESOLVED: A vertex identifier is available as "vertex.id" in a vertex
program. The vertex ID is equal to the value effectively passed to
ArrayElement when the vertex is specified, and is defined only if vertex
arrays are used with buffer objects (VBOs).
A primitive identifier is available as "primitive.id" in a geometry or
fragment program. The primitive ID is equal to the number of primitives
processed since the last implicit or explicit call to glBegin().
See the NV_vertex_program4 spec for more information on vertex IDs, and
the NV_geometry_program4 or NV_fragment_program4 specs for more
information on primitive IDs.
(21) For integer opcodes, should a bitwise inversion operator "~" be
provided, analogous to the existing negation operator?
RESOLVED: No. If this operator were provided, it might allow a program
to evaluate the expression "a&(~b)" using a single instruction:
AND.U a, a, ~b;
Instead, it is necessary to do something like:
UINT TEMP t;
NOT.U t, b;
AND.U a, a, t;
If necessary, this functionality could be added in a subsequent
extension.
(22) What happens if you negate or take the absolute value of the
biggest-magnitude negative integer?
RESOLVED: Signed integers are represented using two's complement
representation. For 32-bit integers, the largest possible value is
2^31-1; the smallest possible value is -2^31. There is no way to
represent 2^31, which is what these operators "should" return. The
value returned in this case is the original value of -2^31.
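The wrap-around can be reproduced in C by operating on the 32-bit pattern with unsigned arithmetic, which avoids the undefined behavior of signed overflow in C. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negate of a 32-bit pattern: invert and add one. */
uint32_t neg_s32_bits(uint32_t bits)
{
    return (uint32_t)(~bits + 1u);
}

/* Absolute value: negate only if the sign bit is set. */
uint32_t abs_s32_bits(uint32_t bits)
{
    return (bits & 0x80000000u) ? neg_s32_bits(bits) : bits;
}
```

For the pattern 0x80000000 (-2^31), both functions return 0x80000000 again.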
(23) How do condition codes work? How are they different from those
provided in previous NVIDIA extensions?
RESOLVED: There are two condition codes -- CC0 and CC1 -- each of which
is a four-component vector. The condition codes are set based on the
result of an instruction that specifies a condition code update
modifier. Examples include:
ADD.S.CC R0, R1, R2; # add signed integers R1 and R2, update
# CC0 based on the result, write the
# final value to R0
ADD.F.CC1 R3, R4, R5; # add floats R4 and R5, update CC1 based
# on the result, write the final value
# to R3
ADD.U.CC0 R6.xy, R7, R8; # add unsigned integers R7 and R8, update
# CC0 (x and y components) based on the
# result, write the final value to R6
# (x and y components)
Condition codes can be used for conditional writes, conditional
branches, or other operations. The condition codes aren't used
directly, but are instead used with a condition code test such as "LT"
(less than) or "EQ" (equal to). Examples include:
MOV R0 (GT.x), R1; # move R1 to R0 only if the x component of
# CC0 indicates a result of ">0"
MOV R2 (NE1), R3; # component-wise move of R3 to R2 if the
# corresponding component of CC1
# indicates a result of "!=0"
IF LE0.xyxy; # execute the block of code if the x or
... # y components of CC0 indicate a result
ENDIF; # of "<=0"
REP;
...
BRK EQ1.xyzx; # break out of loop if the x, y, or z
ENDREP; # components of CC1 indicate a result of
# "==0".
Previous NVIDIA extensions provide eight tests, which are still
supported here. The tests "EQ" (equal), "GE" (greater/equal), "GT"
(greater than), "LE" (less/equal), "LT" (less than), and "NE" (not
equal) can be used to determine the relation of the result used to set
the condition code with zero. The tests "TR" (true) and "FL" (false)
are special tests that always evaluate to true or false, respectively.
For floating-point results, a NaN (not a number) encoding causes the
"NE" condition to evaluate to TRUE and all other conditions to evaluate
to FALSE. IEEE encodings for "negative" and "positive" zero are both
treated as equal to zero.
Condition codes are implemented as a set of flags, which are set
depending on the type of operation, as described in the spec.
For instructions that return floating-point or signed integer values,
the normal condition code tests reliably indicate the relationship of
the result to zero. For instructions that return unsigned values, the
condition codes are a bit more complicated. For example, the sign flag
is set if the most significant bit of the result written is set. As a
result, very large unsigned integer values (e.g., 0x80000000 -
0xFFFFFFFF) are effectively treated as negative values. Condition code
tests should be used with care with unsigned results -- to test if an
unsigned integer is ">0", use a sequence like:
MOV.U.CC R0, R1; # move R1 to R0, set condition code
IF NE; # test if the result is "!=0", a very
... # large value might fail "GT"!
ENDIF;
This extension provides a number of additional condition code tests
useful for different floating-point or integer operations:
* NAN (not a number) is true if a floating-point result is a NaN. LEG
(less, equal to, or greater) is the opposite of NAN.
* CF (carry flag) is true if an unsigned add overflows, or if an
unsigned subtract produces a non-negative value. NCF (no carry
flag) is the opposite of CF.
* OF (overflow flag) is true if a signed add or subtract overflows.
NOF (no overflow flag) is the opposite of OF.
* SF (sign flag) is true if the sign flag is set. NSF (no sign flag)
is the opposite of SF.
* AB (above) is true if an unsigned subtract produces a positive
result. BLE (below or equal) is the opposite of AB, and is true if
an unsigned subtract produces a negative result or zero. Note that
CF can be used to test if the result is greater than or equal to
zero, and NCF can be used to test if the result is less than zero.
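The flag behavior described for unsigned results can be sketched in C. The flag and test definitions below are derived from the prose in this issue, not copied from the condition code tables in the spec body. In particular, the "GT" test shown here is a simplified assumption that ignores the overflow flag. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int sf;  /* sign flag: most significant bit of the result */
    int zf;  /* zero flag: result is zero */
    int cf;  /* carry flag, as described for unsigned add/subtract */
} cc_flags;

cc_flags cc_from_uadd(uint32_t a, uint32_t b)
{
    uint32_t r = (uint32_t)(a + b);   /* wraps modulo 2^32 */
    cc_flags f;
    f.sf = (int)(r >> 31);
    f.zf = (r == 0);
    f.cf = (r < a);                   /* true if the add overflowed */
    return f;
}

cc_flags cc_from_usub(uint32_t a, uint32_t b)
{
    uint32_t r = (uint32_t)(a - b);
    cc_flags f;
    f.sf = (int)(r >> 31);
    f.zf = (r == 0);
    f.cf = (a >= b);                  /* true if a - b is non-negative */
    return f;
}

int test_ne(cc_flags f)  { return !f.zf; }
int test_gt(cc_flags f)  { return !f.sf && !f.zf; }  /* simplified assumption */
int test_ab(cc_flags f)  { return f.cf && !f.zf; }   /* a - b is positive */
int test_ble(cc_flags f) { return !test_ab(f); }
```

With these definitions, adding 0x7FFFFFFF and 1 yields a nonzero result that passes "NE" but fails "GT", which is the pitfall noted above.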
(24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work
with integer values and/or condition codes?
RESOLVED: "Set on" instructions comparing signed and unsigned values
return zero if the condition is false, and an integer with all bits set
if the condition is true. If the result is signed, it is interpreted as
-1. If the result is unsigned, it is interpreted as the largest unsigned
value (0xFFFFFFFF for 32-bit integers). This is different from the
floating-point "set on", which is defined to return 1.0.
This specific result encoding was chosen so that bitwise operators (NOT,
AND, OR, XOR) can be used to evaluate boolean expressions.
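A short C model of this encoding, with hypothetical function names, shows how the all-ones result composes with bitwise operators:

```c
#include <assert.h>
#include <stdint.h>

/* Integer "set on" results: all bits set if true, zero if false. */
uint32_t slt_s(int32_t a, int32_t b) { return (a < b)  ? 0xFFFFFFFFu : 0u; }
uint32_t sge_s(int32_t a, int32_t b) { return (a >= b) ? 0xFFFFFFFFu : 0u; }

/* Floating-point "set on" result: 1.0 if true, 0.0 if false. */
float slt_f(float a, float b) { return (a < b) ? 1.0f : 0.0f; }

/* Evaluates (a < b) && !(c >= d) using only AND and NOT on the masks. */
uint32_t lt_and_not_ge(int32_t a, int32_t b, int32_t c, int32_t d)
{
    return (uint32_t)(slt_s(a, b) & ~sge_s(c, d));
}
```

Because a TRUE mask is all ones, NOT of TRUE is exactly FALSE (zero), which would not hold if TRUE were encoded as 1.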
When performing condition code tests on the results of an integer "set
on" instruction, keep in mind that a TRUE result has the most
significant bit set and will be interpreted as a negative value. To
test if a condition is true, use "NE" (!=0). A condition code test of
"GT" will always fail if the condition code was written by an integer
"set on" instruction.
(25) What new texture functionality is provided?
RESOLVED: Several new features are provided.
First, the TXF (texel fetch) instruction allows programs to access a
texture map like a normal array. Integer coordinates identifying an
individual texel and LOD are provided, and the corresponding texture
data is returned without filtering of any type.
Second, the TXQ (texture size query) instruction allows programs to
query the size of a specified level of detail of a texture. This
feature allows programs to perform computations dependent on the size of
the texture without having to pass the size as a program parameter or
via some other mechanism.
Third, applications may specify a constant texel offset in a texture
instruction that moves the texture sample point by the specified number
of texels. This offset can be used to perform custom texture filtering,
and is also independent of the size of the texture LOD -- the same
offsets are applied, regardless of the mipmap level.
Fourth, shadow mapping is supported for cube map textures. The first
three coordinates are the normal (s,t,r) coordinates for a cube map
texture lookup, and the fourth component is a depth reference value that
can be compared to the depth value stored in the texture.
(26) What "consistency" requirements are in effect for textures accessed
via the TXF (texel fetch) instruction?
UNRESOLVED: The texture must be usable for regular texture mapping
operations -- if texture sizes or formats are inconsistent and a
mipmapped min filter is used, the results are undefined.
(27) How does the TXF instruction work with bordered textures?
RESOLVED: The entire image can be accessed, including the border
texels. For a 64x64 2D texture plus border (66x66 overall), the lower
left border texel is accessed using the coordinates (-1,-1); the upper
right border texel is accessed using the coordinates (64,64).
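As a rough illustration of the addressing in this example, the C sketch below maps TXF coordinates onto a row-major array that stores the border texels together with the image. The storage layout and the function name are assumptions made for illustration; implementations may store bordered textures differently.

```c
#include <assert.h>

/* Returns the array index of texel (x, y) for a width x height image
 * surrounded by "border" texels on each side (hypothetical layout). */
int bordered_texel_index(int x, int y, int width, int height, int border)
{
    int row_length = width + 2 * border;

    assert(x >= -border && x < width + border);
    assert(y >= -border && y < height + border);
    return (y + border) * row_length + (x + border);
}
```

For the 64x64 texture above with a one-texel border, (-1,-1) maps to index 0 and (64,64) maps to the last of the 66*66 texels.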
(28) What should TXQ (texture size query) return for "irrelevant" texture
sizes (e.g., height of a 1D texture)? Should it return any other
information at the same time?
RESOLVED: This specification leaves all "extra" components undefined.
(29) How do texture offsets interact with cubemap textures?
RESOLVED: They are not supported in this extension.
(30) How do texture offsets interact with mipmapped textures?
RESOLVED: The texture offsets are added after the (s,t,r) coordinates
have been divided by q (if applicable) and converted to (u,v,w)
coordinates by multiplying by the size of the selected texture level.
The offsets are added to the (u,v,w) coordinates, and always move the
sample point by an integral number of texel coordinates. If multiple
mipmaps are accessed, the sample point in each mipmap level is moved by
an identical offset. The applied offsets are independent of the
selected mipmap level.
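The ordering described above can be written out for one coordinate as follows. This is a sketch only: it ignores projection, wrap modes, and filtering, and the function names are hypothetical.

```c
#include <assert.h>

/* Convert normalized coordinate s to texel space for a level of the given
 * size, then add the constant texel offset. */
float offset_texel_coord(float s, int level_size, int offset)
{
    assert(level_size > 0);
    return s * (float)level_size + (float)offset;
}

/* The same offset expressed as a shift in normalized coordinates. */
float offset_in_normalized_units(int level_size, int offset)
{
    assert(level_size > 0);
    return (float)offset / (float)level_size;
}
```

An offset of 2 moves the sample point by two texels at both a 64-texel and a 32-texel level, so the equivalent shift in normalized coordinates doubles from 2/64 to 2/32.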
(31) How do shadow cube maps work?
UNRESOLVED: An application can define a cube map texture with a
DEPTH_COMPONENT internal format, and then render a scene using the cube
map faces as the depth buffer(s). When rendering, the projection should
be set up using the "center" of the cubemap as the eye, and using a
normal projection matrix. When applying the shadow map, the fragment
program should read the (x,y,z) eye coordinates, compute the length of
the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate to [0,1]
space using the same parameters used to derive Z in the projection
matrix. A 4-component vector consisting of x, y, z, and this computed
depth value should be passed to the texture lookup, and normal shadow
mapping operations will be performed.
This issue should include the math needed to do this computation and
sample code.
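One possible form of this computation is sketched below in C. It assumes that each cube face was rendered with a standard OpenGL perspective projection (as produced by Frustum) with near and far distances n and f, that the default depth range of [0,1] is in effect, and that (x,y,z) is the fragment position relative to the cube map center. The function name and these conventions are assumptions, not part of this specification.

```c
#include <assert.h>

/* Returns the depth reference value for a cube map shadow lookup. */
float cube_shadow_depth(float x, float y, float z, float n, float f)
{
    float ax = (x < 0.0f) ? -x : x;
    float ay = (y < 0.0f) ? -y : y;
    float az = (z < 0.0f) ? -z : z;
    float d = ax;               /* length along the major axis */
    float z_ndc;

    if (ay > d) d = ay;
    if (az > d) d = az;
    assert(n > 0.0f && f > n && d > 0.0f);

    /* Perspective projection of eye-space distance d, after the divide. */
    z_ndc = (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * d);
    return 0.5f * z_ndc + 0.5f; /* map [-1,1] to [0,1] */
}
```

The result would be passed as the fourth texture coordinate, with (x,y,z) as the first three.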
(32) Integer multiplies can overflow by a lot. Should there be some way
to return the high part of both unsigned and signed integer multiplies?
RESOLVED: Yes. The ".HI" modifier is provided to return the 32
MSBs of a 32x32 integer multiply. The instruction sequence:
INT TEMP R0, R1, R2, R3;
MUL.S R0, R2, R3;
MUL.S.HI R1, R2, R3;
will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs of
the 64-bit result in R0 and the 32 MSBs in R1.
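The intended results can be checked against a 64-bit multiply in C. The helper names are hypothetical, and results are returned as 32-bit patterns to avoid implementation-defined signed shifts.

```c
#include <assert.h>
#include <stdint.h>

/* 32 LSBs of the signed 32x32 product (models MUL.S). */
uint32_t mul_s_lo(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    return (uint32_t)p;
}

/* 32 MSBs of the signed 32x32 product (models MUL.S.HI). */
uint32_t mul_s_hi(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    return (uint32_t)(p >> 32);
}

/* 32 MSBs of the unsigned 32x32 product (models MUL.U.HI). */
uint32_t mul_u_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```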
(33) Should there be any other special multiplication modifiers?
RESOLVED: Yes. The ".S24" and ".U24" modifiers allow for signed and
unsigned integer multiplies where both operands are guaranteed to fit in
the least significant 24 bits. On some architectures supporting this
extension, ".S24" and ".U24" integer multiplies may be faster than
general-purpose ".S" and ".U" multiplies. If either value doesn't fit
in 24 bits, the results of the operation are undefined --
implementations may, but are not required to, ignore the MSBs of the
operands if ".S24" or ".U24" is specified.
(34) This extension provides subroutines, but doesn't provide a stack to
push and pop parameters. How do we deal with this? NV_vertex_program3
supported PUSHA/POPA instructions to push and pop address registers.
RESOLVED: No explicit stack is required. A program can implement a
stack by allocating a temporary array plus a single integer temporary to
use as the stack "pointer". For example:
TEMP stack[256]; # 256 4-component vectors
INT TEMP sp; # sp.x == stack pointer
INT TEMP cc; # condition code results
function:
SGE.S.CC cc.x, sp.x, 256; # compute stackPointer >= 256
RET NE.x; # return if TRUE
MOV stack[sp], R0; # push R0 onto the stack
ADD.S sp.x, sp.x, 1;
...
SUB.S sp.x, sp.x, 1; # pop R0 off the stack
MOV R0, stack[sp];
RET;
(35) Should we provide new vector semantics for previously-defined opcodes
(e.g., LG2 computes a component-wise logarithm)?
RESOLVED: Not in this extension. The instructions we define here are
compatible with the vector or scalar nature of previously defined
opcodes. This simplifies the implementation of an assembler that needs
to support both old and new instruction sets.
(36) Should it really be undefined to read from a register storing data of
one type with an instruction of the other type (e.g., to read the bits of
a floating-point number as an unsigned integer)?
RESOLVED: The spec describes undefined results for simplicity. In
practice, mixing data types can be done, where signed integers are
represented as two's complement integers and floating-point numbers are
represented using IEEE single-precision representation. For example:
TEMP R0, R1; # typeless
MOV.U R0, 0x3F800000; # R0 = 1.0
MOV.U R1, 0xBF800000; # R1 = -1.0
MUL.F R0, R0, R1; # R0 = -1 * 1 = -1 (0xBF800000)
XOR.U R0, R0, R1; # R0 = 0xBF800000 ^ 0xBF800000 = 0
NOT.U R0, R0; # R0 = 0xFFFFFFFF
I2F.S R0, R0; # R0 = -1.0 (0xFFFFFFFF = -1 signed)
SEQ.F R0, R0, R1; # R0 = 1.0 (-1.0 == -1.0)
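The same sequence can be traced on a host CPU in C, using memcpy to reinterpret bits. This sketch assumes the host uses IEEE single precision and two's complement integers; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

float bits_to_float(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
uint32_t float_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Runs the example above on 32-bit patterns and returns the final R0. */
uint32_t run_mixed_type_sequence(void)
{
    uint32_t r0 = 0x3F800000u;  /* MOV.U R0: bits of 1.0 */
    uint32_t r1 = 0xBF800000u;  /* MOV.U R1: bits of -1.0 */
    int32_t s;

    assert(sizeof(float) == sizeof(uint32_t));
    r0 = float_to_bits(bits_to_float(r0) * bits_to_float(r1)); /* MUL.F */
    r0 ^= r1;                                                  /* XOR.U */
    r0 = ~r0;                                                  /* NOT.U */
    memcpy(&s, &r0, sizeof s);          /* view the bits as signed (-1) */
    r0 = float_to_bits((float)s);                              /* I2F.S */
    r0 = float_to_bits(bits_to_float(r0) == bits_to_float(r1)
                       ? 1.0f : 0.0f);                         /* SEQ.F */
    return r0;
}
```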
(37) Buffer objects can be sourced as program parameters using the
NV_parameter_buffer_object extension. How are they accessed in a program?
RESOLVED: The instruction set and existing program environment and
local parameter bindings operate largely on four-component vectors.
However, NV_parameter_buffer_object exposes the ability to reach into
buffers consisting of user-generated data or data written to the buffer
object by the GPU. Such data sets may not consist entirely of
four-component floating-point vectors, so a four-component vector API
may be unnatural. An application might need to reformat its data set to
deal with this issue. Or it might generate odd code to compensate for
mis-alignment -- for example, reading an array of 3-component vectors by
doing two four-component vector accesses and then rotating based on
alignment. Neither approach is particularly satisfying.
Instead, this extension takes the approach of treating parameter buffers
as arrays of scalar words. When an individual buffer element is read,
the single word is replicated to produce a four-component vector. To
access an array of 3-component vectors, code like the following can be
used:
PARAM buffer[] = { program.buffer[0] };
INT TEMP index;
TEMP R0;
...
MUL.S index, index, 3; # to read "vec3" #X, compute 3*X
MOV R0.x, buffer[index+0];
MOV R0.y, buffer[index+1];
MOV R0.z, buffer[index+2];
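The replication behavior can be modeled in C as shown below. The vec4 type, the function names, and the sample data are hypothetical and exist only for illustration.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* Hypothetical buffer contents: two packed 3-component vectors. */
const float sample_buffer[6] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f };

/* Reading one buffer element replicates the scalar word to all components. */
vec4 buffer_read(const float *buffer, int index)
{
    vec4 v;

    assert(buffer != 0 && index >= 0);
    v.x = v.y = v.z = v.w = buffer[index];
    return v;
}

/* Reads "vec3" number n with three scalar reads, as in the code above. */
vec4 read_vec3(const float *buffer, int n)
{
    int index = 3 * n;
    vec4 r0;

    r0.x = buffer_read(buffer, index + 0).x;
    r0.y = buffer_read(buffer, index + 1).y;
    r0.z = buffer_read(buffer, index + 2).z;
    r0.w = 0.0f;  /* not written by the sequence; zeroed to stay defined */
    return r0;
}
```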
(38) Should recursion be allowed? If so, how is the total amount of
recursion limited?
RESOLVED: Recursion is allowed, and a call stack is provided by the
implementation. The size of the call stack is limited to the
implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
call stack is full, the results of further CAL instructions are
undefined. In the initial implementation of this extension, such
instructions will have no effect.
Note that no stack is provided to hold local registers; a program may
implement its own via a temporary array and integer stack "pointer".
(39) Variables are all four-component vectors in previous extensions.
Should scalar or small-vector variables be provided?
RESOLVED: It would be a useful feature, but it was left out for
simplicity. In practice, a variable where only the X component is used
will be equivalent to a scalar.
(40) The PK* (pack) and UP* (unpack) instructions allow packing multiple
components of data into a single component. The bit packing is
well-defined. Should we require specific data types (e.g., unsigned
integer) to hold packed values?
RESOLVED: No. Previous instruction sets only allowed programs to write
packed values to a floating-point variable (the only data type
provided). We will allow packed results to be written to a variable of
any data type. Integer instructions can be used to manipulate bits of
packed data in place.
(41) What happens when converting integers to floats or vice versa if
there is insufficient precision or range to represent the result?
RESOLVED: For integer-to-float conversions, the nearest representable
floating-point value is used, and the least significant bits of the
original integer value are lost. For float-to-integer conversions,
out-of-range values are clamped to the nearest representable integer.
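Both conversions can be approximated in C. This is a sketch: the function names are hypothetical, the truncation toward zero for in-range values is simply what a C cast does, and the NaN result is an arbitrary choice, since neither is covered by this issue.

```c
#include <assert.h>
#include <stdint.h>

/* Integer to float: the cast rounds to a nearby representable value. */
float i2f(int32_t v)
{
    return (float)v;
}

/* Float to integer with clamping of out-of-range values. */
int32_t f2i_clamp(float f)
{
    if (f != f)
        return 0;                   /* NaN: arbitrary choice for this sketch */
    if (f >= 2147483648.0f)
        return INT32_MAX;           /* clamp to 2^31-1 */
    if (f <= -2147483648.0f)
        return INT32_MIN;           /* clamp to -2^31 */
    return (int32_t)f;              /* in range: C truncates toward zero */
}
```

For example, 16777217 (2^24+1) has no exact single-precision encoding, so its least significant bit is lost in the conversion.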
(42) Why are some of the grammar rules so bizarre (e.g., attribUseD,
attribUseV, attribUseS, attribUseVNS)?
RESOLVED: This grammar is based upon the original ARB_vertex_program
grammar, which has a number of "interesting" characteristics. For
example, some of the bindings provided by ARB_vertex_program naturally
require some amount of lookahead. For example, a vertex program can
write an output color using any of the following:
MOV result.color, 0; # primary color
MOV result.color.primary, 0; # primary color again
MOV result.color.secondary, 0; # secondary color this time
The pieces of the color binding are separated by "." tokens. However,
writemasks are also supported, which also use "." before the write
mask. So, we could also have something like:
MOV result.color.xyz, 0; # primary color with W masked off
In this form, a parser needs to look at both the "." and the "xyz" to
determine that the binding being used is "result.color" (and not
"result.color.secondary").
Additionally, some checks that should probably be semantic errors (e.g.,
allowing different swizzle or scalar operand selectors per instruction,
or disallowing both in the case of SWZ) were specified in the original
grammar.
ARB_fragment_program and subsequent NVIDIA instructions built upon this,
and the grammar for this extension was rewritten in the current form so
it could be validated more easily.
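The lookahead needed for "result.color" can be illustrated with a small C classifier for the token that follows the "." separator. This is a sketch of the idea only, not the grammar used by this extension; the names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { SUFFIX_BINDING, SUFFIX_WRITEMASK, SUFFIX_OTHER } suffix_kind;

/* Classifies the token following "result.color." as either a continuation
 * of the binding name or a write mask (an ordered subset of "xyzw"). */
suffix_kind classify_color_suffix(const char *tok)
{
    const char *next = "xyzw";  /* components still allowed, in order */
    const char *c;

    assert(tok != 0);
    if (strcmp(tok, "primary") == 0 || strcmp(tok, "secondary") == 0)
        return SUFFIX_BINDING;
    if (*tok == '\0')
        return SUFFIX_OTHER;
    for (c = tok; *c != '\0'; c++) {
        const char *hit = strchr(next, *c);
        if (hit == 0)
            return SUFFIX_OTHER;
        next = hit + 1;         /* later components must come after this one */
    }
    return SUFFIX_WRITEMASK;
}
```

A parser has to examine this token before it can decide whether the binding is complete.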
(43) This is an NV extension (NV_gpu_program4). Why does the
MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?
RESOLVED: This token is shared between this extension and the
comparable high-level GLSL programmability extension (EXT_gpu_shader4).
Rather than provide a duplicate set of token names, we simply use the
EXT version here.
(44) For the purposes of determining the number of attribute and result
components, how are "scalar" attributes counted? For example, only
the x component of the "pointsize" per-vertex output is actually
relevant.
RESOLVED: Implementations are allowed to count all inputs and outputs
as full four-component vectors. To avoid this, apply appropriate write
masks or swizzles.
For example, writing to "result.pointsize" may count as four components.
Consistently writing to "result.pointsize.x" may only count as one.
Similarly, reading a fragment's fog coordinate as "fragment.fogcoord"
may count as four components; "fragment.fogcoord.x" will only count as
one.
Revision History
Rev.  Date      Author    Changes
----  --------  --------  --------------------------------------------
11    09/11/14  pbrown    Fix cut-and-paste error in PK2US section.
10    12/14/09  mgodse    Added GLX protocol.
9     10/29/09  pbrown    Add language for previously undocumented errors
                          when using "SHORT" and "LONG" modifiers on
                          variable declarations. They're allowed only on
                          "TEMP" statements, except that "SHORT" is
                          allowed for "OUTPUT" as well.
8     08/11/08  jbreton   Clarified that when a MOD instruction is
                          performed on negative operands the result is
                          undefined.
7     07/29/08  pbrown    Discovered additional issues with texture wrap
                          handling, replaced with logic that applies wrap
                          modes per sample. Add a few instruction
                          pseudo-code lines explicitly identifying
                          undefined components.
6     05/02/08  pbrown    Fix the prototype for the internal TexelFetch()
                          function used in the spec language; texel
                          coordinates are signed integers.
5     02/22/08  pbrown    Clarified that when counting attribute/result
                          components, irrelevant/undefined components
                          can still count against the limits.
4     02/04/08  pbrown    Fix errors in texture wrap mode handling.
                          Added a missing clamp to avoid sampling border
                          in REPEAT mode. Fixed incorrectly specified
                          weights for LINEAR filtering.
3     02/09/07  pbrown    Updated status section (now released).
2     10/19/06  pbrown    Change the token suffix for maximum texel offset
                          values from NV to EXT, since it is shared with
                          EXT_gpu_shader4. Clarify what happens on a
                          negate of an unsigned value. Fix typo in data
                          type modifier description. Add missing
                          description of the "BUFFER4" declaration
                          keyword.
1               pbrown    Internal spec development.