Name | |

NV_vertex_program2 | |

Name Strings | |

GL_NV_vertex_program2 | |

Contact | |

Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) | |

Mark Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com) | |

Notice | |

Copyright NVIDIA Corporation, 2000-2002. | |

IP Status | |

NVIDIA Proprietary. | |

Status | |

Implemented in CineFX (NV30) Emulation driver, August 2002. | |

Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003. | |

Version | |

Last Modified Date: 03/18/2008 | |

NVIDIA Revision: 33 | |

Number | |

287 | |

Dependencies | |

Written based on the wording of the OpenGL 1.3 Specification and requires | |

OpenGL 1.3. | |

Written based on the wording of the NV_vertex_program extension | |

specification, version 1.0. | |

NV_vertex_program is required. | |

Overview | |

This extension further enhances the concept of vertex programmability | |

introduced by the NV_vertex_program extension, and extended by | |

NV_vertex_program1_1. These extensions create a separate vertex program | |

mode where the configurable vertex transformation operations in unextended | |

OpenGL are replaced by a user-defined program. | |

This extension introduces the VP2 execution environment, which extends the | |

VP1 execution environment introduced in NV_vertex_program. The VP2 | |

environment provides several language features not present in previous | |

vertex programming execution environments: | |

* Branch instructions allow a program to jump to another instruction | |

specified in the program. | |

* Branching support allows for up to four levels of subroutine | |

calls/returns. | |

* A four-component condition code register allows an application to | |

compute a component-wise write mask at run time and apply that mask to | |

register writes. | |

* Conditional branches are supported, where the condition code register | |

is used to determine if a branch should be taken. | |

* Programmable user clipping is supported (via the CLP0-CLP5

clip distance registers). Primitives are clipped to the area where | |

the interpolated clip distances are greater than or equal to zero. | |

* Instructions can perform a component-wise absolute value operation on | |

any operand load. | |

The VP2 execution environment provides a number of new instructions, and | |

extends the semantics of several instructions already defined in | |

NV_vertex_program. | |

* ARR: Operates like ARL, except that float-to-int conversion is done | |

by rounding. Equivalent results could be achieved (less efficiently) | |

in NV_vertex_program using an ADD/ARL sequence and a program parameter

holding the value 0.5. | |

* BRA, CAL, RET: Branch, subroutine call, and subroutine return | |

instructions. | |

* COS, SIN: Adds support for high-precision sine and cosine | |

computations. | |

* FLR, FRC: Adds support for computing the floor and fractional portion | |

of floating-point vector components. Equivalent results could be | |

achieved (less efficiently) in NV_vertex_program using the EXP | |

instruction to compute the fractional portion of one component at a | |

time. | |

* EX2, LG2: Adds support for high-precision exponentiation and | |

logarithm computations. | |

* ARA: Adds pairs of components of an address register; useful for | |

looping and other operations. | |

* SEQ, SFL, SGT, SLE, SNE, STR: Add six new "set on" instructions, | |

similar to the SLT and SGE instructions defined in NV_vertex_program. | |

Equivalent results could be achieved (less efficiently) in | |

NV_vertex_program with multiple SLT, SGE, and arithmetic instructions. | |

* SSG: Adds a new "set sign" operation, which produces a vector holding | |

negative one for negative components, zero for components with a value | |

of zero, and positive one for positive components. Equivalent results | |

could be achieved (less efficiently) in NV_vertex_program with | |

multiple SLT, SGE, and arithmetic instructions. | |

* The ARL instruction is extended to operate on four components instead | |

of a single component. | |

* All instructions that produce integer or floating-point result vectors | |

have variants that update the condition code register based on the | |

result vector. | |

This extension also raises some of the resource limitations in the | |

NV_vertex_program extension. | |

* 256 program parameter registers (versus 96 in NV_vertex_program). | |

* 16 temporary registers (versus 12 in NV_vertex_program). | |

* Two four-component integer address registers (versus one | |

single-component register in NV_vertex_program). | |

* 256 total vertex program instructions (versus 128 in | |

NV_vertex_program). | |

* Including loops, programs can execute up to 64K instructions. | |

Issues | |

This extension builds upon the NV_vertex_program extension. Should this | |

specification contain selected edits to the NV_vertex_program | |

specification or should the specs be unified? | |

RESOLVED: Since NV_vertex_program and NV_vertex_program2 programs share | |

many features, the main section of this specification is unified and | |

describes both types of programs. Other sections containing | |

NV_vertex_program features that are unchanged by this extension will not | |

be edited. | |

How can a program use condition codes to avoid extra computations? | |

Consider the example of evaluating the OpenGL lighting model for a | |

given light. If the diffuse dot product is negative (roughly 1/2 the | |

time for random geometry), the only contribution to the light is | |

ambient. In this case, condition codes and branching can skip over a | |

number of unneeded instructions. | |

# R0 holds accumulated light color | |

# R2 holds normal | |

# R3 holds computed light vector | |

# R4 holds computed half vector | |

# c[0] holds ambient light/material product | |

# c[1] holds diffuse light/material product | |

# c[2].xyz holds specular light/material product | |

# c[2].w holds specular exponent | |

DP3C R1.x, R2, R3; # diffuse dot product | |

ADD R0, R0, c[0]; # accumulate ambient | |

BRA pointsAway (LT.x); # skip rest if diffuse dot < 0

MOV R1.w, c[2].w;

DP3 R1.y, R2, R4; # specular dot product

LIT R1, R1; # compute exponentiated specular

MAD R0, c[1], R1.y, R0; # accumulate diffuse

MAD R0, c[2], R1.z, R0; # accumulate specular

pointsAway: | |

... # continue execution | |

How can a program use subroutines? | |

With subroutines, a program can encapsulate a small piece of | |

functionality into a subroutine and call it multiple times, as in CPU | |

code. Applications will need to identify the registers used to pass | |

data to and from the subroutine. | |

Subroutines could be used for applications like evaluating lighting | |

equations for a single light. With conditional branching and | |

subroutines, a variable number of lights (which could even vary | |

per-vertex) can be easily supported. | |

accumulate: | |

# R0 holds the accumulated result | |

# R1 holds the value to add | |

ADD R0, R0, R1;

RET; | |

# Compute floor(A)*B by repeated addition using a subroutine. Yes, | |

# this is a stupid example. | |

# | |

# c[0] holds (A,B,0,1). | |

# R0 holds the accumulated result | |

# R1 holds B, the value to accumulate. | |

# R2 holds the number of iterations remaining. | |

MOV R0, c[0].z; # start with zero | |

MOV R1, c[0].y; | |

FLRC R2.x, c[0].x; | |

BRA done (LE.x); | |

top: | |

CAL accumulate; | |

ADDC R2.x, R2.x, -c[0].w; # decrement count | |

BRA top (GT.x); | |

done: | |

... | |

How can conventional OpenGL clip planes be supported in vertex programs? | |

The clip distance in the OpenGL specification can be evaluated with a | |

simple DP4 instruction that writes to one of the six clip distance | |

registers. Primitives will automatically be clipped to the half-space | |

where o[CLPx] >= 0, which matches the definition in the spec. | |

# R0 holds eye coordinates | |

# c[0] holds eye-space clip plane coefficients | |

DP4 o[CLP0].x, R0, c[0]; | |

Note that the clip plane or clip distance volume corresponding to the | |

o[CLPn] register used must be enabled, or no clipping will be performed. | |

The clip distance registers allow for clip distance volumes to be | |

computed more-or-less arbitrarily. To approximate clipping to a sphere | |

of radius <n>, the following code can be used. | |

# R0 holds eye coordinates | |

# c[0].xyz holds sphere center | |

# c[0].w holds the square of the sphere radius | |

SUB R1.xyz, R0, c[0]; # distance vector | |

DP3 R1.w, R1, R1; # compute distance squared | |

SUB o[CLP0].x, c[0].w, R1.w; # compute r^2 - d^2 | |

Since the clip distance is interpolated linearly over a primitive, the | |

clip distance evaluated at a point will represent a piecewise-linear | |

approximation of the true distance. The approximation will become | |

increasingly accurate as the primitive is tessellated more finely.
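
The effect of the linear interpolation can be seen in a small C sketch.
The names below are hypothetical, and the interpolation is assumed to be
plain linear interpolation between two vertices.

```c
typedef struct { float x, y, z; } Vec3;

/* Per-vertex clip distance r^2 - d^2, as computed by the listing above. */
float sphere_clip_distance(Vec3 p, Vec3 center, float radius)
{
    float dx = p.x - center.x;
    float dy = p.y - center.y;
    float dz = p.z - center.z;
    return radius * radius - (dx * dx + dy * dy + dz * dz);
}

/* Linear interpolation of two per-vertex clip distances at parameter t. */
float lerp_clip_distance(float d0, float d1, float t)
{
    return (1.0f - t) * d0 + t * d1;
}
```

Consider a unit sphere at the origin and a line from (-2,0,0) to (2,0,0).
Both endpoints have a clip distance of -3, so the interpolated distance at
the midpoint is also -3 and the whole line is clipped, even though the
true value of r^2 - d^2 at the midpoint is +1. Splitting the line at the
origin would recover the portion inside the sphere.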

How can looping be achieved in vertex programs? | |

Simple loops can be achieved using a general purpose floating-point | |

register component as a counter. The following code calls a function | |

named "function" <n> times, where <n> is specified in a program | |

parameter register component. | |

# c[0].x holds the number of iterations to execute. | |

# c[1].x holds the constant 1.0. | |

MOVC R15.x, c[0].x; | |

startLoop: | |

CAL function (GT.x); # if (counter > 0) function(); | |

SUBC R15.x, R15.x, c[1].x; # counter = counter - 1; | |

BRA startLoop (GT.x); # if (counter > 0) goto start; | |

endLoop: | |

... | |

More complex loops (where a separate index may be needed for indexed | |

addressing into the program parameter array) can be achieved using the | |

ARA instruction, which will add the x/z and y/w components of an address | |

register. | |

# c[0].x holds the number of iterations to execute | |

# c[0].y holds the initial index value | |

# c[0].z holds the constant -1.0 (used for the iteration count) | |

# c[0].w holds the index step value | |

ARLC A1, c[0]; | |

startLoop: | |

CAL function (GT.x); # if (counter > 0) function(); | |

# Note: A1.y can be used for | |

# indexing in function(). | |

ARAC A1.xy, A1; # counter = counter - 1; | |

# index += loopStep; | |

BRA startLoop (GT.x); # if (counter > 0) goto start; | |

endLoop: | |

... | |
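
The ARA-based loop can be modeled in C as shown below. This is a sketch
under the assumption that ARA with an .xy write mask performs x += z and
y += w; the type and function names are hypothetical.

```c
typedef struct { int x, y, z, w; } AddrReg;

/* ARA with an .xy write mask, as used in the listing: x += z, y += w. */
void ara_xy(AddrReg *a)
{
    int nx = a->x + a->z;
    int ny = a->y + a->w;
    a->x = nx;
    a->y = ny;
}

/* Runs the loop and records the index (A1.y) seen on each iteration.
 * Returns the number of iterations; out[] must hold `count` entries. */
int run_indexed_loop(int count, int first_index, int step, int *out)
{
    AddrReg a1 = { count, first_index, -1, step };   /* ARLC A1, c[0] */
    int n = 0;
    while (a1.x > 0) {       /* CAL and BRA are taken only if counter > 0 */
        out[n++] = a1.y;     /* function() may index with A1.y */
        ara_xy(&a1);         /* ARAC A1.xy, A1 */
    }
    return n;
}
```

With count = 3, first_index = 10, and step = 4, the recorded indices are
10, 14, and 18.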

Should this specification add support for vertex state programs beyond the | |

VP1 execution environment? | |

No. Vertex state programs are a little-used feature of | |

NV_vertex_program and don't perform particularly well. They are still | |

supported for compatibility with the original NV_vertex_program spec, | |

but they will not be extended to support new features. | |

How are NaNs handled in the "set on" instructions (SEQ, SGE, SGT, SLE,

SLT, SNE)? What about MIN, MAX? SSG? When doing condition code tests? | |

Any of these instructions involving a NaN operand will produce a NaN | |

result. This behavior differs from the NV_fragment_program extension. | |

There, SEQ, SGE, SGT, SLE, and SLT will produce 0.0 if either operand is | |

a NaN, and SNE will produce 1.0 if either operand is a NaN. | |

For condition code updates, NaN values will result in "UN" condition | |

codes. All conditionals using a "UN" condition code, except "TR" and

"NE", will evaluate to false. This behavior is identical to the

functionality in NV_fragment_program. | |
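
The rule for "UN" condition codes can be written as a small C predicate.
This is an illustrative sketch: the enum and function names are
hypothetical, and the set of test rules (EQ, GE, GT, LE, LT, NE, TR, FL)
is assumed here rather than taken from this section, which names only
some of them.

```c
#include <stdbool.h>

typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;
typedef enum { RULE_EQ, RULE_GE, RULE_GT, RULE_LE,
               RULE_LT, RULE_NE, RULE_TR, RULE_FL } CondRule;

/* Evaluates one condition code component against a test rule. */
bool cc_test(CondRule rule, CondCode cc)
{
    switch (rule) {
    case RULE_EQ: return cc == CC_EQ;
    case RULE_GE: return cc == CC_GT || cc == CC_EQ;
    case RULE_GT: return cc == CC_GT;
    case RULE_LE: return cc == CC_LT || cc == CC_EQ;
    case RULE_LT: return cc == CC_LT;
    case RULE_NE: return cc != CC_EQ;    /* true for UN as well */
    case RULE_TR: return true;
    case RULE_FL: return false;
    }
    return false;
}
```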

How can the various features of this extension be used to provide skinning | |

functionality similar to that in ARB_vertex_blend and ARB_matrix_palette? | |

And how can that functionality be extended? | |

Assume an implementation that allows application of up to 8 matrices at | |

once. Further assume that v[12].xyzw and v[13].xyzw hold the set of 8 | |

weights, and v[14].xyzw and v[15].xyzw hold the set of 8 matrix indices. | |

Furthermore, assume that the palette of matrices is stored/tracked at

c[0], c[4], c[8], and so on. As an additional optimization, an | |

application can specify that fewer than 8 matrices should be applied by | |

storing a negative palette index immediately after the last index is | |

applied. | |

Skinning support in this example can be provided by the following code: | |

ARLC A0, v[14]; # load 4 palette indices at once | |

DP4 R1.x, c[A0.x+0], v[0]; # 1st matrix transform | |

DP4 R1.y, c[A0.x+1], v[0]; | |

DP4 R1.z, c[A0.x+2], v[0]; | |

DP4 R1.w, c[A0.x+3], v[0]; | |

MUL R0, R1, v[12].x; # accumulate weighted sum in R0 | |

BRA end (LT.y); # stop on a negative matrix index | |

DP4 R1.x, c[A0.y+0], v[0]; # 2nd matrix transform | |

DP4 R1.y, c[A0.y+1], v[0]; | |

DP4 R1.z, c[A0.y+2], v[0]; | |

DP4 R1.w, c[A0.y+3], v[0]; | |

MAD R0, R1, v[12].y, R0; # accumulate weighted sum in R0 | |

BRA end (LT.z); # stop on a negative matrix index | |

... # 3rd and 4th matrix transform | |

ARLC A0, v[15]; # load next four palette indices | |

BRA end (LT.x); | |

DP4 R1.x, c[A0.x+0], v[0]; # 5th matrix transform | |

DP4 R1.y, c[A0.x+1], v[0]; | |

DP4 R1.z, c[A0.x+2], v[0]; | |

DP4 R1.w, c[A0.x+3], v[0]; | |

MAD R0, R1, v[13].x, R0; # accumulate weighted sum in R0 | |

BRA end (LT.y); # stop on a negative matrix index | |

... # 6th, 7th, and 8th matrix transform | |

end: | |

... # any additional instructions | |

The amount of code used by this example could further be reduced using a | |

subroutine performing four transformations at a time: | |

ARLC A0, v[14]; # load first four indices | |

CAL skin4; # do first four transformations | |

BRA end (LT); # end if any of the first 4 indices was < 0 | |

ARLC A0, v[15]; # load second four indices | |

CAL skin4; # do second four transformations | |

end: | |

... # any additional instructions | |
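
The arithmetic performed by the skinning code can be summarized by the
following C sketch. The names are hypothetical. Each palette index is
assumed to be the register number of the first row of a matrix (0, 4, 8,
and so on), and the loop stops at the first negative index after the
first matrix, as the branches in the listing do.

```c
typedef struct { float v[4]; } Vec4;

float dot4(Vec4 a, Vec4 b)
{
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] +
           a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

/* c[] models the program parameter array; position models v[0]. */
Vec4 skin_vertex(const Vec4 *c, Vec4 position,
                 const float weights[8], const int indices[8])
{
    Vec4 sum = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    for (int i = 0; i < 8; i++) {
        if (i > 0 && indices[i] < 0)
            break;                       /* BRA end (LT.*) */
        for (int row = 0; row < 4; row++) {
            /* DP4 with c[A0.* + row], then weighted accumulation */
            sum.v[row] += weights[i] * dot4(c[indices[i] + row], position);
        }
    }
    return sum;
}
```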

Why does the RCC instruction exist? | |

RESOLVED: To perform numeric operations that will avoid overflow and | |

underflow issues. | |

Should the specification provide more examples? | |

RESOLVED: It would be nice. | |

New Procedures and Functions | |

None. | |

New Tokens | |

None. | |

Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation) | |

Modify Section 2.11, Clipping (p. 39) | |

(modify last paragraph, p. 39) When the GL is not in vertex program mode | |

(section 2.14), this view volume may be further restricted by as many as n | |

client-defined clip planes to generate the clip volume. ... | |

(add before next-to-last paragraph, p. 40) When the GL is in vertex | |

program mode, the view volume may be restricted to the individual clip | |

distance volumes derived from the per-vertex clip distances (o[CLP0] - | |

o[CLP5]). Clip distance volumes are applied if and only if per-vertex | |

clip distances are supported in the vertex program execution

environment. A point P belonging to the primitive under consideration is | |

in the clip distance volume numbered n if and only if | |

c_n(P) >= 0, | |

where c_n(P) is the interpolated value of the clip distance CLPn at the | |

point P. For point primitives, c_n(P) is simply the clip distance for the | |

vertex in question. For line and triangle primitives, per-vertex clip | |

distances are interpolated using a weighted mean, with weights derived | |

according to the algorithms described in sections 3.4 and 3.5. | |
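
As an informal example for a line segment, the C sketch below finds the
portion of the segment where the interpolated clip distance is greater
than or equal to zero. It assumes plain linear interpolation between the
two endpoint distances; the function name is hypothetical.

```c
#include <stdbool.h>

/* For endpoint clip distances d0 (at t = 0) and d1 (at t = 1), computes
 * the parameter range [*t_in, *t_out] where the interpolated distance
 * d0 + t * (d1 - d0) is >= 0. Returns false if nothing survives. */
bool clip_segment(float d0, float d1, float *t_in, float *t_out)
{
    if (d0 < 0.0f && d1 < 0.0f)
        return false;
    *t_in = 0.0f;
    *t_out = 1.0f;
    if (d0 < 0.0f)
        *t_in = d0 / (d0 - d1);     /* segment enters the volume here */
    if (d1 < 0.0f)
        *t_out = d0 / (d0 - d1);    /* segment leaves the volume here */
    return true;
}
```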

(modify next-to-last paragraph, p.40) Client-defined clip planes or clip | |

distance volumes are enabled with the generic Enable command and disabled | |

with the Disable command. The value of the argument to either command is | |

CLIP_PLANEi where i is an integer between 0 and n-1; specifying a value of i

enables or disables the plane equation with index i. The constants obey

CLIP_PLANEi = CLIP_PLANE0 + i.

Add Section 2.14, Vertex Programs (p. 57). This section supersedes the | |

similar section added in the NV_vertex_program extension and extended in | |

the NV_vertex_program1_1 extension. | |

The conventional GL vertex transformation model described in sections 2.10 | |

through 2.13 is a configurable, but essentially hard-wired, sequence of | |

per-vertex computations based on a canonical set of per-vertex parameters | |

and vertex transformation related state such as transformation matrices, | |

lighting parameters, and texture coordinate generation parameters. | |

The general success and utility of the conventional GL vertex | |

transformation model reflects its basic correspondence to the typical | |

vertex transformation requirements of 3D applications. | |

However, when the conventional GL vertex transformation model is not

sufficient, the vertex program mode provides a substantially more flexible | |

model for vertex transformation. The vertex program mode permits | |

applications to define their own vertex programs. | |

Section 2.14.1, Vertex Program Execution Environment | |

The vertex program execution environment is an operational model that | |

defines how a program is executed. The execution environment includes a | |

set of instructions, a set of registers, and semantic rules defining how | |

operations are performed. There are three vertex program execution | |

environments, VP1, VP1.1, and VP2. The environment names are taken from | |

the mandatory program prefix strings found at the beginning of all vertex | |

programs. The VP1.1 execution environment is a minor addition to the VP1 | |

execution environment, so references to the VP1 execution environment | |

below apply to both VP1 and VP1.1 execution environments except where | |

otherwise noted. | |

The vertex program instruction set consists primarily of floating-point | |

4-component vector operations operating on per-vertex attributes and | |

program parameters. Vertex programs execute on a per-vertex basis and | |

operate on each vertex completely independently from the processing of | |

other vertices. Vertex programs execute without data hazards so results | |

computed in one operation can be used immediately afterwards. Vertex | |

programs produce a set of vertex result vectors that becomes the set of | |

transformed vertex parameters used by primitive assembly. | |

In the VP1 environment, vertex programs execute a finite fixed sequence of | |

instructions with no branching or looping. In the VP2 environment, vertex | |

programs support conditional and unconditional branches and four levels of | |

subroutine calls. | |

The vertex program register set consists of six types of registers | |

described in the following sections. | |

Section 2.14.1.1, Vertex Attribute Registers | |

The Vertex Attribute Registers are sixteen 4-component vector | |

floating-point registers containing the current vertex's per-vertex | |

attributes. These registers are numbered 0 through 15. These registers | |

are private to each vertex program invocation and are initialized at each | |

vertex program invocation by the current vertex attribute state specified | |

with VertexAttribNV commands. These registers are read-only during vertex | |

program execution. The VertexAttribNV commands used to update the vertex | |

attribute registers can be issued both outside and inside of Begin/End | |

pairs. Vertex program execution is provoked by updating vertex attribute | |

zero. Updating vertex attribute zero outside of a Begin/End pair is | |

ignored without generating any error (identical to the Vertex command | |

operation). | |

The commands | |

void VertexAttrib{1234}{sfd}NV(uint index, T coords); | |

void VertexAttrib{1234}{sfd}vNV(uint index, T coords); | |

void VertexAttrib4ubNV(uint index, T coords); | |

void VertexAttrib4ubvNV(uint index, T coords); | |

specify the particular current vertex attribute indicated by index. | |

The coordinates for each vertex attribute are named x, y, z, and w. | |

The VertexAttrib1NV family of commands sets the x coordinate to the | |

provided single argument while setting y and z to 0 and w to 1. | |

Similarly, VertexAttrib2NV sets x and y to the specified values, | |

z to 0 and w to 1; VertexAttrib3NV sets x, y, and z, with w set | |

to 1, and VertexAttrib4NV sets all four coordinates. The error | |

INVALID_VALUE is generated if index is greater than 15. | |

No conversions are applied to the vertex attributes specified as | |

type short, float, or double. However, vertex attributes specified | |

as type ubyte are converted as described by Table 2.6. | |
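
The default values for unspecified coordinates can be sketched in C as
follows. The function name is hypothetical; the sketch covers only the
float case and does not model the ubyte conversion.

```c
/* Expands 1 to 4 supplied coordinates into a full (x, y, z, w) attribute.
 * Missing y and z default to 0; missing w defaults to 1. */
void expand_attrib(float out[4], const float *coords, int count)
{
    const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    for (int i = 0; i < 4; i++)
        out[i] = (i < count) ? coords[i] : defaults[i];
}
```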

The commands | |

void VertexAttribs{1234}{sfd}vNV(uint index, sizei n, T coords[]); | |

void VertexAttribs4ubvNV(uint index, sizei n, GLubyte coords[]); | |

specify a contiguous set of n vertex attributes. The effect of | |

VertexAttribs{1234}{sfd}vNV(index, n, coords) | |

is the same (assuming no errors) as the command sequence | |

#define NUM k /* where k is 1, 2, 3, or 4 components */ | |

int i; | |

for (i=n-1; i>=0; i--) { | |

VertexAttrib{NUM}{sfd}vNV(i+index, &coords[i*NUM]); | |

} | |

VertexAttribs4ubvNV behaves similarly. | |

The VertexAttribNV calls equivalent to VertexAttribsNV are issued in | |

reverse order so that vertex program execution is provoked when index | |

is zero only after all the other vertex attributes have first been | |

specified. | |
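
The ordering can be demonstrated without a GL context by replacing the
per-attribute call with a recording stub. In the C sketch below,
record_vertex_attrib is a hypothetical stand-in for VertexAttrib4fvNV
that only logs the attribute index it was given.

```c
#define MAX_CALLS 16

int call_log[MAX_CALLS];    /* attribute indices, in the order they were set */
int call_count = 0;

/* Hypothetical stand-in for VertexAttrib4fvNV: records the index only. */
void record_vertex_attrib(unsigned index, const float *coords)
{
    (void)coords;
    if (call_count < MAX_CALLS)
        call_log[call_count++] = (int)index;
}

/* Expansion of VertexAttribs4fvNV(index, n, coords) as described above. */
void vertex_attribs4fv(unsigned index, int n, const float *coords)
{
    for (int i = n - 1; i >= 0; i--)
        record_vertex_attrib(index + (unsigned)i, &coords[i * 4]);
}
```

Calling vertex_attribs4fv(0, 3, coords) logs the indices 2, 1, 0, so
attribute zero, which provokes vertex program execution, is set last.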

The set and operation of vertex attribute registers are identical for both | |

VP1 and VP2 execution environments.

Section 2.14.1.2, Program Parameter Registers | |

The Program Parameter Registers are a set of 4-component floating-point | |

vector registers containing the vertex program parameters. In the VP1 | |

execution environment, there are 96 registers, numbered 0 through 95. In | |

the VP2 execution environment, there are 256 registers, numbered 0 through | |

255. This relatively large set of registers is intended to hold | |

parameters such as matrices, lighting parameters, and constants required | |

by vertex programs. Vertex program parameter registers can be updated in | |

one of two ways: by the ProgramParameterNV commands outside of a | |

Begin/End pair or by a vertex state program executed outside of a | |

Begin/End pair (vertex state programs are discussed in section 2.14.3). | |

The commands | |

void ProgramParameter4fNV(enum target, uint index, | |

float x, float y, float z, float w) | |

void ProgramParameter4dNV(enum target, uint index, | |

double x, double y, double z, double w) | |

specify the particular program parameter indicated by index. | |

The coordinates values x, y, z, and w are assigned to the respective | |

components of the particular program parameter. target must be | |

VERTEX_PROGRAM_NV. | |

The commands | |

void ProgramParameter4dvNV(enum target, uint index, double *params); | |

void ProgramParameter4fvNV(enum target, uint index, float *params); | |

operate identically to ProgramParameter4fNV and ProgramParameter4dNV | |

respectively except that the program parameters are passed as an | |

array of four components. | |

The error INVALID_VALUE is generated if the specified index is greater | |

than or equal to the number of program parameters in the execution | |

environment (96 for VP1, 256 for VP2). | |

The commands | |

void ProgramParameters4dvNV(enum target, uint index, | |

uint num, double *params); | |

void ProgramParameters4fvNV(enum target, uint index, | |

uint num, float *params); | |

specify a contiguous set of num program parameters. The effect is | |

the same (assuming no errors) as | |

for (i=index; i<index+num; i++) { | |

ProgramParameter4{fd}vNV(target, i, &params[(i-index)*4]);

} | |

The error INVALID_VALUE is generated if the sum of <index> and <num> is

greater than the number of program parameters in the execution environment | |

(96 for VP1, 256 for VP2). | |
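
The range check and the register update can be modeled in C as shown
below. This is a sketch only: the array params_store and the function
name are hypothetical, the VP2 limit of 256 is assumed, and an error is
assumed to leave the registers unchanged.

```c
#include <stdbool.h>
#include <string.h>

#define VP2_NUM_PARAMS 256

float params_store[VP2_NUM_PARAMS][4];   /* initially all (0,0,0,0) */

/* Sketch of ProgramParameters4fvNV. Returns false where the GL would
 * generate INVALID_VALUE; no register is written in that case. */
bool program_parameters4fv(unsigned index, unsigned num, const float *params)
{
    if ((unsigned long long)index + num > VP2_NUM_PARAMS)
        return false;
    for (unsigned i = 0; i < num; i++)
        memcpy(params_store[index + i], &params[i * 4], 4 * sizeof(float));
    return true;
}
```

For example, updating 2 registers starting at index 254 succeeds, while
updating 2 registers starting at index 255 is rejected.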

The program parameter registers are shared by all vertex program

invocations within a rendering context. ProgramParameterNV command | |

updates and vertex state program executions are serialized with respect to | |

vertex program invocations and other vertex state program executions. | |

Writes to the program parameter registers during vertex state program | |

execution can be masked on a per-component basis.

The initial value of all 96 (VP1) or 256 (VP2) program parameter registers | |

is (0,0,0,0). | |

Section 2.14.1.3, Address Registers | |

The Address Registers are 4-component vector registers with signed 10-bit | |

integer components. In the VP1 execution environment, there is only a | |

single address register (A0) and only the x component of the register is | |

accessible. In the VP2 execution environment, there are two address | |

registers (A0 and A1), of which all four components are accessible. The | |

address registers are private to each vertex program invocation and are | |

initialized to (0,0,0,0) at every vertex program invocation. These | |

registers can be written during vertex program execution (but not read) | |

and their values can be used as a relative offset for reading vertex

program parameter registers. Only the vertex program parameter registers | |

can be read using relative addressing (writes using relative addressing | |

are not supported). | |

See the discussion of relative addressing of program parameters in section | |

2.14.2.1 and the discussion of the ARL instruction in section 2.14.3.4. | |

Section 2.14.1.4, Temporary Registers | |

The Temporary Registers are 4-component floating-point vector registers | |

used to hold temporary results during vertex program execution. In the | |

VP1 execution environment, there are 12 temporary registers, numbered 0 | |

through 11. In the VP2 execution environment, there are 16 temporary | |

registers, numbered 0 through 15. These registers are private to each | |

vertex program invocation and initialized to (0,0,0,0) at every vertex | |

program invocation. These registers can be read and written during vertex | |

program execution. Writes to these registers can be masked on a

per-component basis. | |

In the VP2 execution environment, there is one additional temporary | |

pseudo-register, "CC". CC is treated as an unnumbered, write-only temporary

register, whose sole purpose is to allow instructions to modify the | |

condition code register (section 2.14.1.6) without overwriting the | |

contents of any temporary register. | |

Section 2.14.1.5, Vertex Result Registers | |

The Vertex Result Registers are 4-component floating-point vector | |

registers used to write the results of a vertex program. There are 15 | |

result registers in the VP1 execution environment, and 21 in the VP2 | |

execution environment. Each register value is initialized to (0,0,0,1) at | |

the invocation of each vertex program. Writes to the vertex result | |

registers can be masked on a per-component basis. These registers are

named in Table X.1 and further discussed below. | |

Vertex Result                                       Component
Register Name   Description                         Interpretation
--------------  ---------------------------------   --------------
HPOS            Homogeneous clip space position     (x,y,z,w)
COL0            Primary color (front-facing)        (r,g,b,a)
COL1            Secondary color (front-facing)      (r,g,b,a)
BFC0            Back-facing primary color           (r,g,b,a)
BFC1            Back-facing secondary color         (r,g,b,a)
FOGC            Fog coordinate                      (f,*,*,*)
PSIZ            Point size                          (p,*,*,*)
TEX0            Texture coordinate set 0            (s,t,r,q)
TEX1            Texture coordinate set 1            (s,t,r,q)
TEX2            Texture coordinate set 2            (s,t,r,q)
TEX3            Texture coordinate set 3            (s,t,r,q)
TEX4            Texture coordinate set 4            (s,t,r,q)
TEX5            Texture coordinate set 5            (s,t,r,q)
TEX6            Texture coordinate set 6            (s,t,r,q)
TEX7            Texture coordinate set 7            (s,t,r,q)
CLP0(*)         Clip distance 0                     (d,*,*,*)
CLP1(*)         Clip distance 1                     (d,*,*,*)
CLP2(*)         Clip distance 2                     (d,*,*,*)
CLP3(*)         Clip distance 3                     (d,*,*,*)
CLP4(*)         Clip distance 4                     (d,*,*,*)
CLP5(*)         Clip distance 5                     (d,*,*,*)

Table X.1: Vertex Result Registers. (*) Registers CLP0 through CLP5 are

available only in the VP2 execution environment. | |

HPOS is the transformed vertex's homogeneous clip space position. The | |

vertex's homogeneous clip space position is converted to normalized device | |

coordinates and transformed to window coordinates as described at the end | |

of section 2.10 and in section 2.11. Further processing (subsequent to | |

vertex program termination) is responsible for clipping primitives | |

assembled from vertex program-generated vertices as described in section | |

2.10 but all client-defined clip planes are treated as if they are | |

disabled when vertex program mode is enabled. | |

Four distinct color results can be generated for each vertex. COL0 is the | |

transformed vertex's front-facing primary color. COL1 is the transformed | |

vertex's front-facing secondary color. BFC0 is the transformed vertex's | |

back-facing primary color. BFC1 is the transformed vertex's back-facing | |

secondary color. | |

Primitive coloring may operate in two-sided color mode. This behavior is | |

enabled and disabled by calling Enable or Disable with the symbolic value | |

VERTEX_PROGRAM_TWO_SIDE_NV. The selection between the back-facing colors | |

and the front-facing colors depends on the primitive of which the vertex | |

is a part. If the primitive is a point or a line segment, the | |

front-facing colors are always selected. If the primitive is a polygon | |

and two-sided color mode is disabled, the front-facing colors are | |

selected. If it is a polygon and two-sided color mode is enabled, then | |

the selection is based on the sign of the (clipped or unclipped) polygon's | |

signed area computed in window coordinates. This facingness determination | |

is identical to the two-sided lighting facingness determination described | |

in section 2.13.1. | |

The selected primary and secondary colors for each primitive are clamped | |

to the range [0,1] and then interpolated across the assembled primitive | |

during rasterization with at least 8-bit accuracy for each color | |

component. | |

FOGC is the transformed vertex's fog coordinate. The register's first | |

floating-point component is interpolated across the assembled primitive | |

during rasterization and used as the fog distance to compute the

per-fragment fog factor when fog is enabled. However, if both fog and vertex

program mode are enabled, but the FOGC vertex result register is not | |

written, the fog factor is overridden to 1.0. The register's other three | |

components are ignored. | |

Point size determination may operate in program-specified point size mode. | |

This behavior is enabled and disabled by calling Enable or Disable with | |

the symbolic value VERTEX_PROGRAM_POINT_SIZE_NV. If the vertex is for a | |

point primitive and the mode is enabled and the PSIZ vertex result is | |

written, the point primitive's size is determined by the clamped x | |

component of the PSIZ register. Otherwise (because vertex program mode is | |

disabled, program-specified point size mode is disabled, or because the | |

vertex program did not write PSIZ), the point primitive's size is | |

determined by the point size state (the state specified using the | |

PointSize command). | |

The PSIZ register's x component is clamped to the range zero through | |

either the hi value of ALIASED_POINT_SIZE_RANGE if point smoothing is | |

disabled or the hi value of the SMOOTH_POINT_SIZE_RANGE if point smoothing | |

is enabled. The register's other three components are ignored. | |

If the vertex is not for a point primitive, the value of the PSIZ vertex | |

result register is ignored. | |
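
For a point primitive, the selection between the program-specified point
size and the point size state can be sketched in C as follows. All names
are hypothetical; range_hi stands for the hi value of whichever point
size range applies (aliased or smooth).

```c
#include <stdbool.h>

float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Returns the size used for a point primitive. */
float effective_point_size(bool vertex_program_enabled,
                           bool program_point_size_enabled,
                           bool psiz_written, float psiz_x,
                           float point_size_state, float range_hi)
{
    if (vertex_program_enabled && program_point_size_enabled && psiz_written)
        return clampf(psiz_x, 0.0f, range_hi);   /* clamped PSIZ.x */
    return point_size_state;                     /* PointSize state */
}
```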

TEX0 through TEX7 are the transformed vertex's texture coordinate sets for | |

texture units 0 through 7. These floating-point coordinates are | |

interpolated across the assembled primitive during rasterization and used | |

for accessing textures. If the number of texture units supported is less | |

than eight, the values of vertex result registers that do not correspond | |

to existent texture units are ignored. | |

CLP0 through CLP5, available only in the VP2 execution environment, are
the transformed vertex's clip distances. These floating-point coordinates
are used by the post-vertex program clipping process (see section 2.11).

Section 2.14.1.6, The Condition Code Register | |

The VP2 execution environment provides a single four-component vector | |

called the condition code register. Each component of this register is | |

one of four enumerated values: GT (greater than), EQ (equal), LT (less | |

than), or UN (unordered). The condition code register can be used to mask | |

writes to registers and to evaluate conditional branches. | |

Most vertex program instructions can optionally update the condition code | |

register. When a vertex program instruction updates the condition code | |

register, a condition code component is set to LT if the corresponding | |

component of the result is less than zero, EQ if it is equal to zero, GT | |

if it is greater than zero, and UN if it is NaN (not a number). | |

The condition code register is initialized to a vector of EQ values each | |

time a vertex program executes. | |

There is no condition code register available in the VP1 execution | |

environment. | |
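
As an illustration of the update rule in this section, the following C
sketch classifies each result component into one of the four condition
code values. The type and function names are hypothetical. Treating
negative zero as EQ is an assumption that follows from the ordinary
floating-point comparison with zero.

```c
#include <math.h>

/* The four condition code values named in the text. */
typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;

/* Hypothetical helper: classify one component of an instruction result. */
CondCode cc_from_result(float r)
{
    if (isnan(r))
        return CC_UN;          /* NaN is unordered */
    if (r < 0.0f)
        return CC_LT;
    if (r > 0.0f)
        return CC_GT;
    return CC_EQ;              /* r compares equal to zero */
}

/* Update all four components of a condition code register. */
void cc_update(CondCode cc[4], const float result[4])
{
    int i;
    for (i = 0; i < 4; ++i)
        cc[i] = cc_from_result(result[i]);
}

/* State at the start of each vertex program execution: all EQ. */
void cc_reset(CondCode cc[4])
{
    int i;
    for (i = 0; i < 4; ++i)
        cc[i] = CC_EQ;
}
```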

Section 2.14.1.7, Semantic Meaning for Vertex Attributes and Program | |

Parameters | |

One important distinction between the conventional GL vertex | |

transformation mode and the vertex program mode is that per-vertex | |

parameters and other state parameters in vertex program mode do not have | |

dedicated semantic interpretations the way that they do with the | |

conventional GL vertex transformation mode. | |

For example, in the conventional GL vertex transformation mode, the Normal | |

command specifies a per-vertex normal. The semantic that the Normal | |

command supplies a normal for lighting is established because that is how | |

the per-vertex attribute supplied by the Normal command is used by the | |

conventional GL vertex transformation mode. Similarly, other state | |

parameters such as a light source position have semantic interpretations | |

based on how the conventional GL vertex transformation model uses each | |

particular parameter. | |

In contrast, vertex attributes and program parameters for vertex programs | |

have no pre-defined semantic meanings. The meaning of a vertex attribute | |

or program parameter in vertex program mode is defined by how the vertex | |

attribute or program parameter is used by the current vertex program to | |

compute and write values to the Vertex Result Registers. This is the | |

reason that per-vertex attributes and program parameters for vertex | |

programs are numbered instead of named. | |

For convenience, however, the existing per-vertex parameters for the
conventional GL vertex transformation mode (vertices, normals,
colors, fog coordinates, vertex weights, and texture coordinates) are
aliased to numbered vertex attributes. This aliasing is specified in
Table X.2. The table includes how the various conventional components
map to the 4-component vertex attribute components.

  Vertex
  Attribute  Conventional                                           Conventional
  Register   Per-vertex       Conventional                          Component
  Number     Parameter        Per-vertex Parameter Command          Mapping
  ---------  ---------------  -----------------------------------   ------------
  0          vertex position  Vertex                                x,y,z,w
  1          vertex weights   VertexWeightEXT                       w,0,0,1
  2          normal           Normal                                x,y,z,1
  3          primary color    Color                                 r,g,b,a
  4          secondary color  SecondaryColorEXT                     r,g,b,1
  5          fog coordinate   FogCoordEXT                           fc,0,0,1
  6          -                -                                     -
  7          -                -                                     -
  8          texture coord 0  MultiTexCoord(GL_TEXTURE0_ARB, ...)   s,t,r,q
  9          texture coord 1  MultiTexCoord(GL_TEXTURE1_ARB, ...)   s,t,r,q
  10         texture coord 2  MultiTexCoord(GL_TEXTURE2_ARB, ...)   s,t,r,q
  11         texture coord 3  MultiTexCoord(GL_TEXTURE3_ARB, ...)   s,t,r,q
  12         texture coord 4  MultiTexCoord(GL_TEXTURE4_ARB, ...)   s,t,r,q
  13         texture coord 5  MultiTexCoord(GL_TEXTURE5_ARB, ...)   s,t,r,q
  14         texture coord 6  MultiTexCoord(GL_TEXTURE6_ARB, ...)   s,t,r,q
  15         texture coord 7  MultiTexCoord(GL_TEXTURE7_ARB, ...)   s,t,r,q

Table X.2: Aliasing of vertex attributes with conventional per-vertex
parameters.

Only vertex attribute zero is treated specially because it is | |

the attribute that provokes the execution of the vertex program; | |

this is the attribute that aliases to the Vertex command's vertex | |

coordinates. | |

The result of a vertex program is the set of post-transformation | |

vertex parameters written to the Vertex Result Registers. | |

All vertex programs must write a homogeneous clip space position, but | |

the other Vertex Result Registers can be optionally written. | |

Clipping and culling are not the responsibility of vertex programs because | |

these operations assume the assembly of multiple vertices into a | |

primitive. View frustum clipping is performed subsequent to vertex | |

program execution. Clip planes are not supported in the VP1 execution | |

environment. Clip planes are supported indirectly via the clip distance | |

(o[CLPx]) registers in the VP2 execution environment. | |

Section 2.14.1.8, Vertex Program Specification | |

Vertex programs are specified as an array of ubytes. The array is a | |

string of ASCII characters encoding the program. | |

The command

    LoadProgramNV(enum target, uint id, sizei len,
                  const ubyte *program);

loads a vertex program when the target parameter is VERTEX_PROGRAM_NV.
Multiple programs can be loaded with different names. id names the
program to load. The name space for programs is the positive integers
(zero is reserved). The error INVALID_VALUE occurs if a program is loaded
with an id of zero. The error INVALID_OPERATION is generated if a program
is loaded for an id that is currently loaded with a program of a different
program target. Managing the program name space and binding to vertex
programs is discussed later in section 2.14.1.9.

program is a pointer to an array of ubytes that represents the program | |

being loaded. The length of the array is indicated by len. | |
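
For illustration, the following C sketch holds a small VP2 program as an
array of characters and computes the value an application would pass as
len. The program text, the array name, and the helper function are
hypothetical. The sketch assumes that the application has placed the rows
of its transformation matrix in program parameters c[0] through c[3],
which is an application convention and not something this extension
requires. No GL command is actually issued here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example program: transforms the position and passes the
 * primary color through.  It writes o[HPOS], as every program must. */
static const char kPassThroughVP2[] =
    "!!VP2.0\n"
    "# Position times the matrix assumed to be in c[0]..c[3].\n"
    "DP4 o[HPOS].x, c[0], v[OPOS];\n"
    "DP4 o[HPOS].y, c[1], v[OPOS];\n"
    "DP4 o[HPOS].z, c[2], v[OPOS];\n"
    "DP4 o[HPOS].w, c[3], v[OPOS];\n"
    "MOV o[COL0], v[COL0];   # pass the primary color through\n"
    "END";

/* Length to pass as len: the character count without the trailing NUL
 * that C appends to string literals. */
size_t vp2_program_length(void)
{
    return strlen(kPassThroughVP2);
}

/* With a current GL context and the extension entry point available, the
 * program would be loaded with a call of this shape (not executed here):
 *
 *   glLoadProgramNV(GL_VERTEX_PROGRAM_NV, id,
 *                   (GLsizei) vp2_program_length(),
 *                   (const GLubyte *) kPassThroughVP2);
 */
```

Each instruction in the example sources at most one program parameter
register and one vertex attribute register, so it also satisfies the
per-instruction restrictions stated later in this section.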

A second program target type known as vertex state programs is discussed | |

in 2.14.4. | |

At program load time, the program is parsed into a set of tokens possibly | |

separated by white space. Spaces, tabs, newlines, carriage returns, and | |

comments are considered whitespace. Comments begin with the character "#" | |

and are terminated by a newline, a carriage return, or the end of the | |

program array. | |
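
The whitespace and comment rules above can be sketched as a small C
scanner helper. The function name and signature are hypothetical; an
actual implementation may tokenize differently.

```c
#include <stddef.h>

/* Return the offset of the first byte at or after pos that is neither
 * whitespace nor part of a comment, or len if no such byte exists. */
size_t skip_whitespace(const unsigned char *program, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned char c = program[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            ++pos;
        } else if (c == '#') {
            /* A comment runs up to a newline, a carriage return, or the
             * end of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                ++pos;
        } else {
            break;
        }
    }
    return pos;
}
```

The terminating newline or carriage return of a comment is itself
whitespace, so the outer loop consumes it on the next iteration.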

The Backus-Naur Form (BNF) grammar below specifies the syntactically valid | |

sequences for several types of vertex programs. The set of valid tokens | |

can be inferred from the grammar. The token "" represents an empty string | |

and is used to indicate optional rules. A program is invalid if it | |

contains any undefined tokens or characters. | |

The grammar provides for three different vertex program types, | |

corresponding to the three vertex program execution environments. VP1, | |

VP1.1, and VP2 programs match the grammar rules <vp1-program>, | |

<vp11-program>, and <vp2-program>, respectively. Some grammar rules | |

correspond to features or instruction forms available only in certain | |

execution environments. Rules beginning with the prefix "vp1-" are | |

available only to VP1 and VP1.1 programs. Rules beginning with the | |

prefixes "vp11-" and "vp2-" are available only to VP1.1 and VP2 programs, | |

respectively. | |
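
The grammar selects the execution environment by the program header token
("!!VP1.0", "!!VP1.1", or "!!VP2.0"). A minimal C sketch of that
selection follows. The enum and function names are hypothetical, and the
sketch assumes that the header is the first thing in the program array,
with no leading whitespace.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    VP_ENV_UNKNOWN,   /* no recognized header */
    VP_ENV_VP1,       /* "!!VP1.0" */
    VP_ENV_VP1_1,     /* "!!VP1.1" */
    VP_ENV_VP2        /* "!!VP2.0" */
} VpEnv;

/* Classify a program by its header token. */
VpEnv vp_env_from_header(const unsigned char *program, size_t len)
{
    static const struct { const char *header; VpEnv env; } known[] = {
        { "!!VP1.0", VP_ENV_VP1   },
        { "!!VP1.1", VP_ENV_VP1_1 },
        { "!!VP2.0", VP_ENV_VP2   },
    };
    size_t i;

    for (i = 0; i < sizeof known / sizeof known[0]; ++i) {
        size_t n = strlen(known[i].header);
        if (len >= n && memcmp(program, known[i].header, n) == 0)
            return known[i].env;
    }
    return VP_ENV_UNKNOWN;
}
```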

<program>               ::= <vp1-program>
                          | <vp11-program>
                          | <vp2-program>

<vp1-program>           ::= "!!VP1.0" <programBody> "END"

<vp11-program>          ::= "!!VP1.1" <programBody> "END"

<vp2-program>           ::= "!!VP2.0" <programBody> "END"

<programBody>           ::= <optionSequence> <programText>

<optionSequence>        ::= <option> <optionSequence>
                          | ""

<option>                ::= "OPTION" <vp11-option> ";"
                          | "OPTION" <vp2-option> ";"

<vp11-option>           ::= "NV_position_invariant"

<vp2-option>            ::= "NV_position_invariant"

<programText>           ::= <programTextItem> <programText>
                          | ""

<programTextItem>       ::= <instruction> ";"
                          | <vp2-instructionLabel>

<instruction>           ::= <ARL-instruction>
                          | <VECTORop-instruction>
                          | <SCALARop-instruction>
                          | <BINop-instruction>
                          | <TRIop-instruction>
                          | <vp2-BRA-instruction>
                          | <vp2-RET-instruction>
                          | <vp2-ARA-instruction>

<ARL-instruction>       ::= <vp1-ARL-instruction>
                          | <vp2-ARL-instruction>

<vp1-ARL-instruction>   ::= "ARL" <maskedAddrReg> "," <scalarSrc>

<vp2-ARL-instruction>   ::= <vp2-ARLop> <maskedAddrReg> "," <vectorSrc>

<vp2-ARLop>             ::= "ARL" | "ARLC"
                          | "ARR" | "ARRC"

<VECTORop-instruction>  ::= <VECTORop> <maskedDstReg> "," <vectorSrc>

<VECTORop>              ::= "LIT"
                          | "MOV"
                          | <vp11-VECTORop>
                          | <vp2-VECTORop>

<vp11-VECTORop>         ::= "ABS"

<vp2-VECTORop>          ::= "ABSC"
                          | "FLR" | "FLRC"
                          | "FRC" | "FRCC"
                          | "LITC"
                          | "MOVC"
                          | "SSG" | "SSGC"

<SCALARop-instruction>  ::= <SCALARop> <maskedDstReg> "," <scalarSrc>

<SCALARop>              ::= "EXP"
                          | "LOG"
                          | "RCP"
                          | "RSQ"
                          | <vp11-SCALARop>
                          | <vp2-SCALARop>

<vp11-SCALARop>         ::= "RCC"

<vp2-SCALARop>          ::= "COS" | "COSC"
                          | "EX2" | "EX2C"
                          | "LG2" | "LG2C"
                          | "EXPC"
                          | "LOGC"
                          | "RCCC"
                          | "RCPC"
                          | "RSQC"
                          | "SIN" | "SINC"

<BINop-instruction>     ::= <BINop> <maskedDstReg> "," <vectorSrc> ","
                            <vectorSrc>

<BINop>                 ::= "ADD"
                          | "DP3"
                          | "DP4"
                          | "DST"
                          | "MAX"
                          | "MIN"
                          | "MUL"
                          | "SGE"
                          | "SLT"
                          | <vp11-BINop>
                          | <vp2-BINop>

<vp11-BINop>            ::= "DPH"
                          | "SUB"

<vp2-BINop>             ::= "ADDC"
                          | "DP3C"
                          | "DP4C"
                          | "DPHC"
                          | "DSTC"
                          | "MAXC"
                          | "MINC"
                          | "MULC"
                          | "SEQ" | "SEQC"
                          | "SFL" | "SFLC"
                          | "SGEC"
                          | "SGT" | "SGTC"
                          | "SLTC"
                          | "SLE" | "SLEC"
                          | "SNE" | "SNEC"
                          | "STR" | "STRC"
                          | "SUBC"

<TRIop-instruction>     ::= <TRIop> <maskedDstReg> "," <vectorSrc> ","
                            <vectorSrc> "," <vectorSrc>

<TRIop>                 ::= "MAD"
                          | <vp2-TRIop>

<vp2-TRIop>             ::= "MADC"

<vp2-BRA-instruction>   ::= <vp2-BRANCHop> <vp2-branchLabel>
                            <vp2-branchCondition>

<vp2-BRANCHop>          ::= "BRA"
                          | "CAL"

<vp2-RET-instruction>   ::= "RET" <vp2-branchCondition>

<vp2-ARA-instruction>   ::= <vp2-ARAop> <maskedAddrReg> "," <addrRegister>

<vp2-ARAop>             ::= "ARA" | "ARAC"

<scalarSrc>             ::= <baseScalarSrc>
                          | <vp2-absScalarSrc>

<vp2-absScalarSrc>      ::= <optionalSign> "|" <baseScalarSrc> "|"

<baseScalarSrc>         ::= <optionalSign> <srcRegister> <scalarSuffix>

<vectorSrc>             ::= <baseVectorSrc>
                          | <vp2-absVectorSrc>

<vp2-absVectorSrc>      ::= <optionalSign> "|" <baseVectorSrc> "|"

<baseVectorSrc>         ::= <optionalSign> <srcRegister> <swizzleSuffix>

<srcRegister>           ::= <vtxAttribRegister>
                          | <progParamRegister>
                          | <tempRegister>

<maskedDstReg>          ::= <dstRegister> <optionalWriteMask>
                            <optionalCCMask>

<dstRegister>           ::= <vtxResultRegister>
                          | <tempRegister>
                          | <vp2-nullRegister>

<vp2-nullRegister>      ::= "CC"

<vp2-branchCondition>   ::= <optionalCCMask>

<vtxAttribRegister>     ::= "v" "[" <vtxAttribRegNum> "]"

<vtxAttribRegNum>       ::= decimal integer from 0 to 15 inclusive
                          | "OPOS"
                          | "WGHT"
                          | "NRML"
                          | "COL0"
                          | "COL1"
                          | "FOGC"
                          | "TEX0"
                          | "TEX1"
                          | "TEX2"
                          | "TEX3"
                          | "TEX4"
                          | "TEX5"
                          | "TEX6"
                          | "TEX7"

<progParamRegister>     ::= <absProgParamReg>
                          | <relProgParamReg>

<absProgParamReg>       ::= "c" "[" <progParamRegNum> "]"

<progParamRegNum>       ::= <vp1-progParamRegNum>
                          | <vp2-progParamRegNum>

<vp1-progParamRegNum>   ::= decimal integer from 0 to 95 inclusive

<vp2-progParamRegNum>   ::= decimal integer from 0 to 255 inclusive

<relProgParamReg>       ::= "c" "[" <scalarAddr> <relProgParamOffset> "]"

<relProgParamOffset>    ::= ""
                          | "+" <progParamPosOffset>
                          | "-" <progParamNegOffset>

<progParamPosOffset>    ::= <vp1-progParamPosOff>
                          | <vp2-progParamPosOff>

<vp1-progParamPosOff>   ::= decimal integer from 0 to 63 inclusive

<vp2-progParamPosOff>   ::= decimal integer from 0 to 255 inclusive

<progParamNegOffset>    ::= <vp1-progParamNegOff>
                          | <vp2-progParamNegOff>

<vp1-progParamNegOff>   ::= decimal integer from 0 to 64 inclusive

<vp2-progParamNegOff>   ::= decimal integer from 0 to 256 inclusive

<tempRegister>          ::= "R0"  | "R1"  | "R2"  | "R3"
                          | "R4"  | "R5"  | "R6"  | "R7"
                          | "R8"  | "R9"  | "R10" | "R11"
                          | <vp2-tempRegister>

<vp2-tempRegister>      ::= "R12" | "R13" | "R14" | "R15"

<vtxResultRegister>     ::= "o" "[" <vtxResultRegName> "]"

<vtxResultRegName>      ::= "HPOS"
                          | "COL0"
                          | "COL1"
                          | "BFC0"
                          | "BFC1"
                          | "FOGC"
                          | "PSIZ"
                          | "TEX0"
                          | "TEX1"
                          | "TEX2"
                          | "TEX3"
                          | "TEX4"
                          | "TEX5"
                          | "TEX6"
                          | "TEX7"
                          | <vp2-resultRegName>

<vp2-resultRegName>     ::= "CLP0"
                          | "CLP1"
                          | "CLP2"
                          | "CLP3"
                          | "CLP4"
                          | "CLP5"

<scalarAddr>            ::= <addrRegister> "." <addrRegisterComp>

<maskedAddrReg>         ::= <addrRegister> <addrWriteMask>

<addrRegister>          ::= "A0"
                          | <vp2-addrRegister>

<vp2-addrRegister>      ::= "A1"

<addrRegisterComp>      ::= "x"
                          | <vp2-addrRegisterComp>

<vp2-addrRegisterComp>  ::= "y"
                          | "z"
                          | "w"

<addrWriteMask>         ::= "." "x"
                          | <vp2-addrWriteMask>

<vp2-addrWriteMask>     ::= ""
                          | "." "y"
                          | "." "x" "y"
                          | "." "z"
                          | "." "x" "z"
                          | "." "y" "z"
                          | "." "x" "y" "z"
                          | "." "w"
                          | "." "x" "w"
                          | "." "y" "w"
                          | "." "x" "y" "w"
                          | "." "z" "w"
                          | "." "x" "z" "w"
                          | "." "y" "z" "w"
                          | "." "x" "y" "z" "w"

<optionalSign>          ::= ""
                          | "-"
                          | <vp2-optionalSign>

<vp2-optionalSign>      ::= "+"

<vp2-instructionLabel>  ::= <vp2-branchLabel> ":"

<vp2-branchLabel>       ::= <identifier>

<optionalWriteMask>     ::= ""
                          | "." "x"
                          | "." "y"
                          | "." "x" "y"
                          | "." "z"
                          | "." "x" "z"
                          | "." "y" "z"
                          | "." "x" "y" "z"
                          | "." "w"
                          | "." "x" "w"
                          | "." "y" "w"
                          | "." "x" "y" "w"
                          | "." "z" "w"
                          | "." "x" "z" "w"
                          | "." "y" "z" "w"
                          | "." "x" "y" "z" "w"

<optionalCCMask>        ::= ""
                          | <vp2-ccMask>

<vp2-ccMask>            ::= "(" <vp2-ccMaskRule> <swizzleSuffix> ")"

<vp2-ccMaskRule>        ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE"
                          | "TR" | "FL"

<scalarSuffix>          ::= "." <component>

<swizzleSuffix>         ::= ""
                          | "." <component>
                          | "." <component> <component>
                                <component> <component>

<component>             ::= "x"
                          | "y"
                          | "z"
                          | "w"

The <identifier> rule matches a sequence of one or more letters ("A"
through "Z", "a" through "z", and "_") and digits ("0" through "9"); the
first character must be a letter. The underscore ("_") counts as a
letter. Upper and lower case letters are different (names are
case-sensitive).
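
A direct C transcription of the <identifier> rule is sketched below. The
function names are hypothetical. The sketch checks only the character
classes described above; it does not check whether the text collides with
an instruction name or any other token of the grammar.

```c
#include <stddef.h>

/* Letters are "A".."Z", "a".."z", and the underscore. */
static int is_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Return 1 if the len characters at s match <identifier>, 0 otherwise. */
int is_identifier(const char *s, size_t len)
{
    size_t i;

    if (len == 0 || !is_letter(s[0]))
        return 0;                    /* first character must be a letter */
    for (i = 1; i < len; ++i) {
        if (!is_letter(s[i]) && !is_digit(s[i]))
            return 0;
    }
    return 1;
}
```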

The <vtxAttribRegNum> rule matches both register numbers 0 through 15
and a set of mnemonics that abbreviate the aliasing of conventional
per-vertex parameters to vertex attribute register numbers. Table X.3
shows the mapping from mnemonic to vertex attribute register number and
what the mnemonic abbreviates.

            Vertex Attribute
  Mnemonic  Register Number   Meaning
  --------  ----------------  --------------------
  "OPOS"    0                 object position
  "WGHT"    1                 vertex weight
  "NRML"    2                 normal
  "COL0"    3                 primary color
  "COL1"    4                 secondary color
  "FOGC"    5                 fog coordinate
  "TEX0"    8                 texture coordinate 0
  "TEX1"    9                 texture coordinate 1
  "TEX2"    10                texture coordinate 2
  "TEX3"    11                texture coordinate 3
  "TEX4"    12                texture coordinate 4
  "TEX5"    13                texture coordinate 5
  "TEX6"    14                texture coordinate 6
  "TEX7"    15                texture coordinate 7

Table X.3: The mapping between vertex attribute register numbers,
mnemonics, and meanings.
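
The mnemonic mapping of Table X.3 can be expressed as a lookup table. The
C sketch below is illustrative only; the function name is hypothetical,
and the return value of -1 for an unknown mnemonic is a convention of the
sketch, not of the extension.

```c
#include <stddef.h>
#include <string.h>

/* Return the vertex attribute register number for a mnemonic from
 * Table X.3, or -1 if the name is not one of the mnemonics. */
int vtx_attrib_from_mnemonic(const char *name)
{
    static const struct { const char *mnemonic; int reg; } table[] = {
        { "OPOS",  0 }, { "WGHT",  1 }, { "NRML",  2 }, { "COL0",  3 },
        { "COL1",  4 }, { "FOGC",  5 },
        { "TEX0",  8 }, { "TEX1",  9 }, { "TEX2", 10 }, { "TEX3", 11 },
        { "TEX4", 12 }, { "TEX5", 13 }, { "TEX6", 14 }, { "TEX7", 15 },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(name, table[i].mnemonic) == 0)   /* case-sensitive */
            return table[i].reg;
    }
    return -1;
}
```

Registers 6 and 7 have no mnemonic and can be named only by number.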

A vertex program fails to load if it does not write at least one component | |

of the HPOS register. | |

A vertex program fails to load in the VP1 execution environment if it | |

contains more than 128 instructions. A vertex program fails to load in | |

the VP2 execution environment if it contains more than 256 instructions. | |

Each block of text matching the <instruction> rule counts as an | |

instruction. | |

A vertex program fails to load if any instruction sources more than one | |

unique program parameter register. An instruction can match the | |

<progParamRegister> rule more than once only if all such matches are | |

identical. | |

A vertex program fails to load if any instruction sources more than one | |

unique vertex attribute register. An instruction can match the | |

<vtxAttribRegister> rule more than once only if all such matches refer to | |

the same register. | |

The error INVALID_OPERATION is generated if a vertex program fails to load | |

because it is not syntactically correct or for one of the semantic | |

restrictions listed above. | |

The error INVALID_OPERATION is generated if a program is loaded for id | |

when id is currently loaded with a program of a different target. | |

A successfully loaded vertex program is parsed into a sequence of | |

instructions. Each instruction is identified by its tokenized name. The | |

operation of these instructions when executed is defined in section | |

2.14.1.10. | |

A successfully loaded program replaces the program previously assigned to | |

the name specified by id. If the OUT_OF_MEMORY error is generated by | |

LoadProgramNV, no change is made to the previous contents of the named | |

program. | |

Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
into the last loaded program string indicating where the first error in
the program occurs. If the program fails to load because of a semantic
restriction that cannot be determined until the program is fully scanned,
the error position will be len, the length of the program. If the program
loads successfully, the value of PROGRAM_ERROR_POSITION_NV is assigned the
value negative one.

Section 2.14.1.9, Vertex Program Binding and Program Management | |

The current vertex program is invoked whenever vertex attribute zero is
updated (whether by a VertexAttribNV or Vertex command). The current
vertex program is updated by

BindProgramNV(enum target, uint id); | |

where target must be VERTEX_PROGRAM_NV. This binds the vertex program | |

named by id as the current vertex program. The error INVALID_OPERATION | |

is generated if id names a program that is not a vertex program | |

(for example, if id names a vertex state program as described in | |

section 2.14.4). | |

Binding to a nonexistent program id does not generate an error. | |

In particular, binding to program id zero does not generate an error. | |

However, because program zero cannot be loaded, program zero is | |

always nonexistent. If a program id is successfully loaded with a | |

new vertex program and id is also the currently bound vertex program, | |

the new program is considered the currently bound vertex program. | |

The INVALID_OPERATION error is generated when both vertex program | |

mode is enabled and Begin is called (or when a command that performs | |

an implicit Begin is called) if the current vertex program is | |

nonexistent or not valid. A vertex program may not be valid for | |

reasons explained in section 2.14.5. | |

Programs are deleted by calling | |

void DeleteProgramsNV(sizei n, const uint *ids); | |

ids contains n names of programs to be deleted. After a program | |

is deleted, it becomes nonexistent, and its name is again unused. | |

If a program that is currently bound is deleted, it is as though | |

BindProgramNV has been executed with the same target as the deleted | |

program and program zero. Unused names in ids are silently ignored, | |

as is the value zero. | |

The command | |

void GenProgramsNV(sizei n, uint *ids); | |

returns n previously unused program names in ids. These names
are marked as used, for the purposes of GenProgramsNV only,
but they become existent programs only when they are first loaded
using LoadProgramNV. The error INVALID_VALUE is generated if n
is negative.

An implementation may choose to establish a working set of programs on | |

which binding and ExecuteProgramNV operations (execute programs are | |

explained in section 2.14.4) are performed with higher performance. | |

A program that is currently part of this working set is said to | |

be resident. | |

The command | |

boolean AreProgramsResidentNV(sizei n, const uint *ids, | |

boolean *residences); | |

returns TRUE if all of the n programs named in ids are resident, | |

or if the implementation does not distinguish a working set. If at | |

least one of the programs named in ids is not resident, then FALSE is | |

returned, and the residence of each program is returned in residences. | |

Otherwise the contents of residences are not changed. If any of | |

the names in ids are nonexistent or zero, FALSE is returned, the | |

error INVALID_VALUE is generated, and the contents of residences | |

are indeterminate. The residence status of a single named program | |

can also be queried by calling GetProgramivNV with id set to the | |

name of the program and pname set to PROGRAM_RESIDENT_NV. | |

AreProgramsResidentNV indicates only whether a program is
currently resident, not whether it could be made resident.
An implementation may choose to make a program resident only on
first use, for example. The client may guide the GL implementation
in determining which programs should be resident by requesting a
set of programs to make resident.

The command | |

void RequestResidentProgramsNV(sizei n, const uint *ids); | |

requests that the n programs named in ids should be made resident. | |

While all the programs are not guaranteed to become resident, | |

the implementation should make a best effort to make as many of | |

the programs resident as possible. As a result of making the | |

requested programs resident, program names not among the requested | |

programs may become non-resident. Higher priority for residency | |

should be given to programs listed earlier in the ids array. | |

RequestResidentProgramsNV silently ignores attempts to make resident | |

nonexistent program names or zero. AreProgramsResidentNV can be | |

called after RequestResidentProgramsNV to determine which programs | |

actually became resident. | |

Section 2.14.2, Vertex Program Operation | |

In the VP1 execution environment, there are twenty-one vertex program | |

instructions. Four instructions (ABS, DPH, RCC, and SUB) are available | |

only in the VP1.1 execution environment. The instructions and their | |

respective input and output parameters are summarized in Table X.4. | |

  Instruction  Inputs  Output  Description
  -----------  ------  ------  --------------------------------
  ABS(*)       v       v       absolute value
  ADD          v,v     v       add
  ARL          v       as      address register load
  DP3          v,v     ssss    3-component dot product
  DP4          v,v     ssss    4-component dot product
  DPH(*)       v,v     ssss    homogeneous dot product
  DST          v,v     v       distance vector
  EXP          s       v       exponential base 2 (approximate)
  LIT          v       v       compute light coefficients
  LOG          s       v       logarithm base 2 (approximate)
  MAD          v,v,v   v       multiply and add
  MAX          v,v     v       maximum
  MIN          v,v     v       minimum
  MOV          v       v       move
  MUL          v,v     v       multiply
  RCC(*)       s       ssss    reciprocal (clamped)
  RCP          s       ssss    reciprocal
  RSQ          s       ssss    reciprocal square root
  SGE          v,v     v       set on greater than or equal
  SLT          v,v     v       set on less than
  SUB(*)       v,v     v       subtract

Table X.4: Summary of vertex program instructions in the VP1 execution
environment. "v" indicates a floating-point vector input or output, "s"
indicates a floating-point scalar input, "ssss" indicates a scalar output
replicated across a 4-component vector, "as" indicates a single component
of an address register.

In the VP2 execution environment, there are thirty-nine vertex program
instructions. Vertex program instructions may have an optional suffix of
"C" to allow an update of the condition code register (section 2.14.1.6).
For example, there are two instructions to perform vector addition, "ADD"
and "ADDC". The vertex program instructions available in the VP2
execution environment and their respective input and output parameters are
summarized in Table X.5.

  Instruction  Inputs  Output  Description
  -----------  ------  ------  --------------------------------
  ABS[C]       v       v       absolute value
  ADD[C]       v,v     v       add
  ARA[C]       av      av      address register add
  ARL[C]       v       av      address register load
  ARR[C]       v       av      address register load (with round)
  BRA          as      none    branch
  CAL          as      none    subroutine call
  COS[C]       s       ssss    cosine
  DP3[C]       v,v     ssss    3-component dot product
  DP4[C]       v,v     ssss    4-component dot product
  DPH[C]       v,v     ssss    homogeneous dot product
  DST[C]       v,v     v       distance vector
  EX2[C]       s       ssss    exponential base 2
  EXP[C]       s       v       exponential base 2 (approximate)
  FLR[C]       v       v       floor
  FRC[C]       v       v       fraction
  LG2[C]       s       ssss    logarithm base 2
  LIT[C]       v       v       compute light coefficients
  LOG[C]       s       v       logarithm base 2 (approximate)
  MAD[C]       v,v,v   v       multiply and add
  MAX[C]       v,v     v       maximum
  MIN[C]       v,v     v       minimum
  MOV[C]       v       v       move
  MUL[C]       v,v     v       multiply
  RCC[C]       s       ssss    reciprocal (clamped)
  RCP[C]       s       ssss    reciprocal
  RET          none    none    subroutine call return
  RSQ[C]       s       ssss    reciprocal square root
  SEQ[C]       v,v     v       set on equal
  SFL[C]       v,v     v       set on false
  SGE[C]       v,v     v       set on greater than or equal
  SGT[C]       v,v     v       set on greater than
  SIN[C]       s       ssss    sine
  SLE[C]       v,v     v       set on less than or equal
  SLT[C]       v,v     v       set on less than
  SNE[C]       v,v     v       set on not equal
  SSG[C]       v       v       set sign
  STR[C]       v,v     v       set on true
  SUB[C]       v,v     v       subtract

Table X.5: Summary of vertex program instructions in the VP2 execution
environment. "v" indicates a floating-point vector input or output, "s"
indicates a floating-point scalar input, "ssss" indicates a scalar output
replicated across a 4-component vector, "av" indicates a full address
register, "as" indicates a single component of an address register.

Section 2.14.2.1, Vertex Program Operands | |

Most vertex program instructions operate on floating-point vectors,
floating-point scalars, or integer scalars, as indicated in the grammar
(see section 2.14.1.8) by the rules <vectorSrc>, <scalarSrc>, and
<scalarAddr>, respectively.

The basic set of floating-point scalar operands is defined by the grammar | |

rule <baseScalarSrc>. Scalar operands are single components of vertex | |

attribute, program parameter, or temporary registers, as allowed by the | |

<srcRegister> rule. A vector component is selected by the <scalarSuffix> | |

rule, where the characters "x", "y", "z", and "w" select the x, y, z, and | |

w components, respectively, of the vector. | |

The basic set of floating-point vector operands is defined by the grammar | |

rule <baseVectorSrc>. Vector operands can be obtained from vertex | |

attribute, program parameter, or temporary registers as allowed by the | |

<srcRegister> rule. | |

Basic vector operands can be swizzled according to the <swizzleSuffix> | |

rule. In its most general form, the <swizzleSuffix> rule matches the | |

pattern ".????" where each question mark is replaced with one of "x", "y", | |

"z", or "w". For such patterns, the x, y, z, and w components of the | |

operand are taken from the vector components named by the first, second, | |

third, and fourth character of the pattern, respectively. For example, if | |

the swizzle suffix is ".yzzx" and the specified source contains {2,8,9,0}, | |

the swizzled operand used by the instruction is {8,9,9,2}. | |
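
The swizzle operation can be illustrated with a short C sketch that
reproduces the ".yzzx" example. The type and function names are
hypothetical. The sketch also accepts the two shorthand forms that the
<swizzleSuffix> rule permits: an empty suffix, treated as ".xyzw", and a
single component, which is replicated to all four positions.

```c
#include <stddef.h>
#include <string.h>

typedef struct { float v[4]; } Vec4;     /* v[0..3] hold x, y, z, w */

/* Map a component character to its index, or -1 if it is not one. */
static int component_index(char c)
{
    switch (c) {
    case 'x': return 0;
    case 'y': return 1;
    case 'z': return 2;
    case 'w': return 3;
    default:  return -1;
    }
}

/* Apply a swizzle to *src and store the result in *out.  suffix is the
 * text after the ".", for example "yzzx", "w", or "" when no suffix is
 * present.  Returns 0 on success and -1 if the suffix is malformed. */
int apply_swizzle(const Vec4 *src, const char *suffix, Vec4 *out)
{
    size_t n = strlen(suffix);
    int idx[4] = { 0, 1, 2, 3 };         /* "" behaves like ".xyzw" */
    Vec4 tmp;
    int k;

    if (n == 1) {                        /* ".x" behaves like ".xxxx" */
        int i = component_index(suffix[0]);
        if (i < 0)
            return -1;
        idx[0] = idx[1] = idx[2] = idx[3] = i;
    } else if (n == 4) {
        for (k = 0; k < 4; ++k) {
            idx[k] = component_index(suffix[k]);
            if (idx[k] < 0)
                return -1;
        }
    } else if (n != 0) {
        return -1;                       /* two or three components */
    }

    for (k = 0; k < 4; ++k)
        tmp.v[k] = src->v[idx[k]];
    *out = tmp;                          /* safe even if out == src */
    return 0;
}
```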

If the <swizzleSuffix> rule matches "", it is treated as though it were | |

".xyzw". If the <swizzleSuffix> rule matches (ignoring whitespace) ".x", | |

".y", ".z", or ".w", these are treated the same as ".xxxx", ".yyyy", | |

".zzzz", and ".wwww" respectively. | |

Floating-point scalar or vector operands can optionally be negated
according to the <optionalSign> rules in <baseScalarSrc> and
<baseVectorSrc>. If the <optionalSign> rule matches "-", each operand or
operand component is negated.

In the VP2 execution environment, a component-wise absolute value
operation is performed on an operand if the <scalarSrc> or <vectorSrc>
rules match <vp2-absScalarSrc> or <vp2-absVectorSrc>. In this case, the
absolute value of each component of the operand is taken. In addition, if
the <optionalSign> rule in <vp2-absScalarSrc> or <vp2-absVectorSrc>
matches "-", each component is subsequently negated.

Integer scalar operands are single components of one of the address
register vectors, as identified by the <addrRegister> rule. A vector
component is selected by the <addrRegisterComp> rule in the same manner as
the <scalarSuffix> rule selects a component for floating-point scalar
operands. Negation and absolute value operations are not available for
integer scalar operands.

The following pseudo-code spells out the operand generation process. In
the pseudo-code, "float" and "int" are floating-point and integer scalar
types, while "floatVec" and "intVec" are four-component vectors. "source"
is the register used for the operand, matching the <srcRegister> or
<addrRegister> rules. "absolute" is TRUE if the operand matches the
<vp2-absScalarSrc> or <vp2-absVectorSrc> rules, and FALSE otherwise.
"negateBase" is TRUE if the <optionalSign> rule in <baseScalarSrc> or
<baseVectorSrc> matches "-" and FALSE otherwise. "negateAbs" is TRUE if
the <optionalSign> rule in <vp2-absScalarSrc> or <vp2-absVectorSrc>
matches "-" and FALSE otherwise. The ".c***", ".*c**", ".**c*", ".***c"
modifiers refer to the x, y, z, and w components obtained by the swizzle
operation.

    floatVec VectorLoad(floatVec source)
    {
        floatVec operand;

        /* Apply the swizzle. */
        operand.x = source.c***;
        operand.y = source.*c**;
        operand.z = source.**c*;
        operand.w = source.***c;

        /* Negate, take the absolute value, then negate again, as
           selected by the operand modifiers. */
        if (negateBase) {
            operand.x = -operand.x;
            operand.y = -operand.y;
            operand.z = -operand.z;
            operand.w = -operand.w;
        }
        if (absolute) {
            operand.x = abs(operand.x);
            operand.y = abs(operand.y);
            operand.z = abs(operand.z);
            operand.w = abs(operand.w);
        }
        if (negateAbs) {
            operand.x = -operand.x;
            operand.y = -operand.y;
            operand.z = -operand.z;
            operand.w = -operand.w;
        }

        return operand;
    }

    float ScalarLoad(floatVec source)
    {
        float operand;

        /* Select the single component named by the scalar suffix. */
        operand = source.c***;
        if (negateBase) {
            operand = -operand;
        }
        if (absolute) {
            operand = abs(operand);
        }
        if (negateAbs) {
            operand = -operand;
        }

        return operand;
    }

    intVec AddrVectorLoad(intVec source)
    {
        intVec operand;

        operand.x = source.c***;
        operand.y = source.*c**;
        operand.z = source.**c*;
        operand.w = source.***c;

        return operand;
    }

    int AddrScalarLoad(intVec source)
    {
        return source.c***;
    }

If an operand is obtained from a program parameter register, by matching | |

the <progParamRegister> rule, the register number can be obtained by | |

absolute or relative addressing. | |

When absolute addressing is used, by matching the <absProgParamReg> rule, | |

the program parameter register number is the number matching the | |

<progParamRegNum>. | |

When relative addressing is used, by matching the <relProgParamReg> rule, | |

the program parameter register number is computed during program | |

execution. An index is computed by adding the integer scalar operand | |

specified by the <scalarAddr> rule to the positive or negative offset | |

specified by the <progParamOffset> rule. If <progParamOffset> matches "", | |

an offset of zero is used. | |

The following pseudo-code spells out the process of loading a program | |

parameter. "addrReg" refers to the address register used for relative | |

addressing, "absolute" is TRUE if the operand uses absolute addressing and | |

FALSE otherwise. "paramNumber" is the program parameter number for | |

absolute addressing; "paramOffset" is the program parameter offset for | |

relative addressing. "paramRegister" is an array holding the complete set | |

of program parameter registers. | |

floatVec ProgramParameterLoad(intVec addrReg) | |

{ | |

int index; | |

if (absolute) { | |

index = paramNumber; | |

} else { | |

index = AddrScalarLoad(addrReg) + paramOffset; | |

} | |

return paramRegister[index]; | |

} | |
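
A minimal Python sketch of the two addressing modes follows. The function and argument names are hypothetical. The explicit range check is an assumption of this sketch, added only so that a negative index does not silently wrap around a Python list. This section does not say what an out-of-range index does.

```python
def program_parameter_load(param_registers, absolute,
                           param_number=0, addr_component=0, param_offset=0):
    """Select a program parameter register by absolute or relative addressing."""
    if absolute:
        index = param_number
    else:
        # Relative addressing: address register component plus constant offset.
        index = addr_component + param_offset
    # Assumption of this sketch: reject indices outside the register array.
    if not 0 <= index < len(param_registers):
        raise IndexError("program parameter index out of range: %d" % index)
    return param_registers[index]
```

With an address register component of 3 and an offset of -1, an operand such as "c[A0.x-1]" would read register 2 in this model.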

Section 2.14.2.2, Vertex Program Destination Register Update | |

Most vertex program instructions write a 4-component result vector to a | |

single temporary, vertex result, or address register. Writes to | |

individual components of the destination register are controlled by | |

individual component write masks specified as part of the instruction. In | |

the VP2 execution environment, writes are additionally controlled by a | |

condition code write mask, which is computed at run time. | |

The component write mask is specified by the <optionalWriteMask> rule | |

found in the <maskedDstReg> or <maskedAddrReg> rule. If the optional mask | |

is "", all components are enabled. Otherwise, the optional mask names the | |

individual components to enable. The characters "x", "y", "z", and "w" | |

match the x, y, z, and w components respectively. For example, an | |

optional mask of ".xzw" indicates that the x, z, and w components should | |

be enabled for writing but the y component should not. The grammar | |

requires that the destination register mask components must be listed in | |

"xyzw" order. | |

In the VP2 execution environment, the condition code write mask is | |

specified by the <optionalCCMask> rule found in the <maskedDstReg> and | |

<maskedAddrReg> rules. If the condition code mask matches "", all | |

components are enabled. Otherwise, the condition code register is loaded | |

and swizzled according to the swizzle codes specified by <swizzleSuffix>. | |

Each component of the swizzled condition code is tested according to the | |

rule given by <ccMaskRule>. <ccMaskRule> may have the values "EQ", "NE", | |

"LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding | |

condition code field evaluates to equal, not equal, less than, greater | |

than or equal, less than or equal, or greater than, respectively. | |

Comparisons involving condition codes of "UN" (unordered) evaluate to true | |

for "NE" and false otherwise. For example, if the condition code is | |

(GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle | |

operation will load (EQ,LT,GT,GT) and the mask will thus enable | |

writes on the y, z, and w components. In addition, "TR" always enables | |

writes and "FL" always disables writes, regardless of the condition code. | |

Each component of the destination register is updated with the result of | |

the vertex program instruction if and only if the component is enabled for | |

writes by the component write mask, and the optional condition code mask | |

(if applicable). Otherwise, the component of the destination register | |

remains unchanged. | |

In the VP2 execution environment, a vertex program instruction can also | |

optionally update the condition code register. The condition code is | |

updated if the condition code register update suffix "C" is present in the | |

instruction. The instruction "ADDC" will update the condition code; the | |

otherwise equivalent instruction "ADD" will not. If condition code | |

updates are enabled, each component of the destination register enabled | |

for writes is compared to zero. The corresponding component of the | |

condition code is set to "LT", "EQ", or "GT", if the written component is | |

less than, equal to, or greater than zero, respectively. Condition code | |

components are set to "UN" if the written component is NaN. Values of | |

-0.0 and +0.0 both evaluate to "EQ". If a component of the destination | |

register is not enabled for writes, the corresponding condition code | |

component is also unchanged. | |

In the following example code, | |

# R1=(-2, 0, 2, NaN) R0 CC | |

MOVC R0, R1; # ( -2, 0, 2, NaN) (LT,EQ,GT,UN) | |

MOVC R0.xyz, R1.yzwx; # ( 0, 2, NaN, NaN) (EQ,GT,UN,UN) | |

MOVC R0 (NE), R1.zywx; # ( 0, 0, NaN, -2) (EQ,EQ,UN,LT) | |

the first instruction writes (-2,0,2,NaN) to R0 and updates the condition | |

code to (LT,EQ,GT,UN). In the second instruction, only the "x", "y", and "z" | |

components of R0 and the condition code are updated, so R0 ends up with | |

(0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the | |

third instruction, the condition code mask disables writes to the x | |

component (its condition code field is "EQ"), so R0 ends up with | |

(0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT). | |

The following pseudocode illustrates the process of writing a result | |

vector to the destination register. In the pseudocode, "instrMask" refers | |

to the component write mask given by the <optionalWriteMask> rule. In the | |

VP1 execution environment, "ccMaskRule" is always "" and "updatecc" is | |

always FALSE. In the VP2 execution environment, "ccMaskRule" refers to | |

the condition code mask rule given by <vp2-optionalCCMask> and "updatecc" | |

is TRUE if and only if condition code updates are enabled. "result", | |

"destination", and "cc" refer to the result vector, the register selected | |

by <dstRegister> and the condition code, respectively. Condition codes do | |

not exist in the VP1 execution environment. | |

boolean TestCC(CondCode field) { | |

switch (ccMaskRule) { | |

case "EQ": return (field == "EQ"); | |

case "NE": return (field != "EQ"); | |

case "LT": return (field == "LT"); | |

case "GE": return (field == "GT" || field == "EQ"); | |

case "LE": return (field == "LT" || field == "EQ"); | |

case "GT": return (field == "GT"); | |

case "TR": return TRUE; | |

case "FL": return FALSE; | |

case "": return TRUE; | |

} | |

} | |

enum GenerateCC(float value) { | |

if (isnan(value)) { | |

return UN; | |

} else if (value < 0) { | |

return LT; | |

} else if (value == 0) { | |

return EQ; | |

} else { | |

return GT; | |

} | |

} | |

void UpdateDestination(floatVec destination, floatVec result) | |

{ | |

floatVec merged; | |

ccVec mergedCC; | |

// Merge the converted result into the destination register, under | |

// control of the compile- and run-time write masks. | |

merged = destination; | |

mergedCC = cc; | |

if (instrMask.x && TestCC(cc.c***)) { | |

merged.x = result.x; | |

if (updatecc) mergedCC.x = GenerateCC(result.x); | |

} | |

if (instrMask.y && TestCC(cc.*c**)) { | |

merged.y = result.y; | |

if (updatecc) mergedCC.y = GenerateCC(result.y); | |

} | |

if (instrMask.z && TestCC(cc.**c*)) { | |

merged.z = result.z; | |

if (updatecc) mergedCC.z = GenerateCC(result.z); | |

} | |

if (instrMask.w && TestCC(cc.***c)) { | |

merged.w = result.w; | |

if (updatecc) mergedCC.w = GenerateCC(result.w); | |

} | |

// Write out the new destination register and condition code. | |

destination = merged; | |

cc = mergedCC; | |

} | |

Section 2.14.2.3, Vertex Program Execution | |

In the VP1 execution environment, vertex programs consist of a sequence of | |

instructions with no support for branching. Vertex programs begin by | |

executing the first instruction in the program, and execute instructions | |

in the order specified in the program until the last instruction is | |

reached. | |

VP2 vertex programs can contain one or more instruction labels, matching | |

the grammar rule <vp2-instructionLabel>. An instruction label can be | |

referred to explicitly in branch (BRA) or subroutine call (CAL) | |

instructions. Instruction labels can be defined or used at any point in | |

the body of a program, and can be used in instructions before being | |

defined in the program string. | |

VP2 vertex program branching instructions can be conditional. The branch | |

condition is specified by the <vp2-conditionMask> and may depend on the | |

contents of the condition code register. Branch conditions are evaluated | |

by evaluating a condition code write mask in exactly the same manner as | |

done for register writes (section 2.14.2.2). If any of the four | |

components of the condition code write mask are enabled, the branch is | |

taken and execution continues with the instruction following the label | |

specified in the instruction. Otherwise, the instruction is ignored and | |

vertex program execution continues with the next instruction. In the | |

following example code, | |

MOVC CC, c[0]; # c[0]=(-2, 0, 2, NaN), CC gets (LT,EQ,GT,UN) | |

BRA label1 (LT.xyzw); | |

MOV R0,R1; # not executed | |

label1: | |

BRA label2 (LT.wyzw); | |

MOV R0,R2; # executed | |

label2: | |

the first BRA instruction loads a condition code of (LT,EQ,GT,UN) while | |

the second BRA instruction loads a condition code of (UN,EQ,GT,UN). The | |

first branch will be taken because the "x" component evaluates to LT; the | |

second branch will not be taken because no component evaluates to LT. | |
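
The branch test in this example can be checked with a short Python sketch. The helper names are hypothetical. The condition code is modelled as a list of the strings "LT", "EQ", "GT", and "UN".

```python
# Condition code fields that satisfy each mask rule ("UN" passes only "NE").
PASSING = {
    "EQ": {"EQ"}, "NE": {"LT", "GT", "UN"}, "LT": {"LT"},
    "GE": {"GT", "EQ"}, "LE": {"LT", "EQ"}, "GT": {"GT"},
    "TR": {"LT", "EQ", "GT", "UN"}, "FL": set(),
}


def branch_taken(cc, rule, swizzle="xyzw"):
    """A branch is taken if any swizzled condition code field passes the rule."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    swizzled = [cc[index[c]] for c in swizzle]
    return any(field in PASSING[rule] for field in swizzled)


cc = ["LT", "EQ", "GT", "UN"]          # state after "MOVC CC, c[0]"
first = branch_taken(cc, "LT", "xyzw")   # True: the x field is LT
second = branch_taken(cc, "LT", "wyzw")  # False: (UN,EQ,GT,UN) has no LT
```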

VP2 vertex programs can specify subroutine calls. When a subroutine call | |

(CAL) instruction is executed, a reference to the instruction immediately | |

following the CAL instruction is pushed onto the call stack. When a | |

subroutine return (RET) instruction is executed, an instruction reference | |

is popped off the call stack and program execution continues with the | |

popped instruction. A vertex program will terminate if a CAL instruction | |

is executed with four entries already in the call stack or if a RET | |

instruction is executed with an empty call stack. | |

If a VP2 vertex program has an instruction label "main", program execution | |

begins with the instruction immediately following the instruction label. | |

Otherwise, program execution begins with the first instruction of the | |

program. Instructions will be executed sequentially in the order | |

specified in the program, although branch instructions will affect the | |

instruction execution order, as described above. A vertex program will | |

terminate after executing a RET instruction with an empty call stack. A | |

vertex program will also terminate after executing the last instruction in | |

the program, unless that instruction was a taken branch. | |

A vertex program will fail to load if an instruction refers to a label | |

that is not defined in the program string. | |

A vertex program will terminate abnormally if a subroutine call | |

instruction produces a call stack overflow. Additionally, a vertex | |

program will terminate abnormally after executing 65536 instructions to | |

prevent hangs caused by infinite loops in the program. | |

When a vertex program terminates, normally or abnormally, it will emit a | |

vertex whose attributes are taken from the final values of the vertex | |

result registers (section 2.14.1.5). | |
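
The control-flow rules of this section can be exercised with a toy interpreter. The sketch below is only a model under stated assumptions: the program representation (a list of `(opcode, argument)` pairs plus a label table) and the returned status strings are hypothetical, BRA and CAL are treated as unconditional, and every other opcode just records its argument.

```python
MAX_CALL_DEPTH = 4
MAX_INSTRUCTIONS = 65536


def run(program, labels):
    """Execute a toy program; labels map a name to the index that follows it."""
    pc = labels.get("main", 0)      # start after "main" if it is defined
    call_stack, trace, executed = [], [], 0
    while pc < len(program):
        if executed == MAX_INSTRUCTIONS:
            return trace, "abnormal: instruction limit"
        opcode, arg = program[pc]
        executed += 1
        if opcode == "CAL":
            if len(call_stack) >= MAX_CALL_DEPTH:
                return trace, "abnormal: call stack overflow"
            call_stack.append(pc + 1)   # return to the instruction after CAL
            pc = labels[arg]
        elif opcode == "RET":
            if not call_stack:
                return trace, "normal"  # RET with an empty stack ends the program
            pc = call_stack.pop()
        elif opcode == "BRA":
            pc = labels[arg]
        else:
            trace.append(arg)
            pc += 1
    return trace, "normal"              # ran past the last instruction
```

In this model a self-recursive subroutine overflows on its fifth nested CAL, and a branch that targets itself stops after 65536 executed instructions.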

Section 2.14.3, Vertex Program Instruction Set | |

The following sections describe the set of supported vertex program | |

instructions. Instructions available only in the VP1.1 or VP2 execution | |

environment will be noted in the instruction description. | |

Each section will contain pseudocode describing the instruction. | |

Instructions will have up to three operands, referred to as "op0", "op1", | |

and "op2". The operands are loaded using the mechanisms specified in | |

section 2.14.2.1. Most instructions will generate a result vector called | |

"result". The result vector is then written to the destination register | |

specified in the instruction using the mechanisms specified in section | |

2.14.2.2. | |

Operands and results are represented as 32-bit single-precision | |

floating-point numbers according to the IEEE 754 floating-point | |

specification. IEEE denorm encodings, used to represent numbers smaller | |

than 2^-126, are not supported. All such numbers are flushed to zero. | |

There are three special encodings referred to in this section: +INF means | |

"positive infinity", -INF means "negative infinity", and NaN refers to | |

"not a number". | |

Arithmetic operations are typically carried out in single precision | |

according to the rules specified in the IEEE 754 specification. Any | |

exceptions and special cases will be noted in the instruction description. | |

Section 2.14.3.1, ABS: Absolute Value | |

The ABS instruction performs a component-wise absolute value operation on | |

the single operand to yield a result vector. | |

tmp = VectorLoad(op0); | |

result.x = abs(tmp.x); | |

result.y = abs(tmp.y); | |

result.z = abs(tmp.z); | |

result.w = abs(tmp.w); | |

The following special-case rules apply to the absolute value operation: | |

1. abs(NaN) = NaN. | |

2. abs(-INF) = abs(+INF) = +INF. | |

3. abs(-0.0) = abs(+0.0) = +0.0. | |

The ABS instruction is available only in the VP1.1 and VP2 execution | |

environments. | |

In the VP1.0 execution environment, the same functionality can be achieved | |

with "MAX result, src, -src". | |

In the VP2 execution environment, the ABS instruction is effectively | |

obsolete, since instructions can take the absolute value of each operand | |

at no cost. | |

Section 2.14.3.2, ADD: Add | |

The ADD instruction performs a component-wise add of the two operands to | |

yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x + tmp1.x; | |

result.y = tmp0.y + tmp1.y; | |

result.z = tmp0.z + tmp1.z; | |

result.w = tmp0.w + tmp1.w; | |

The following special-case rules apply to addition: | |

1. "A+B" is always equivalent to "B+A". | |

2. NaN + <x> = NaN, for all <x>. | |

3. +INF + <x> = +INF, for all <x> except NaN and -INF. | |

4. -INF + <x> = -INF, for all <x> except NaN and +INF. | |

5. +INF + -INF = NaN. | |

6. -0.0 + <x> = <x>, for all <x>. | |

7. +0.0 + <x> = <x>, for all <x> except -0.0. | |

Section 2.14.3.3, ARA: Address Register Add | |

The ARA instruction adds two pairs of components of a vector address | |

register operand to produce an integer result vector. The "x" and "z" | |

components of the result vector contain the sum of the "x" and "z" | |

components of the operand; the "y" and "w" components of the result vector | |

contain the sum of the "y" and "w" components of the operand. Each | |

component of the result vector is clamped to [-512, +511], the range of | |

representable address register components. | |

itmp = AddrVectorLoad(op0); | |

iresult.x = itmp.x + itmp.z; | |

iresult.y = itmp.y + itmp.w; | |

iresult.z = itmp.x + itmp.z; | |

iresult.w = itmp.y + itmp.w; | |

if (iresult.x < -512) iresult.x = -512; | |

if (iresult.x > 511) iresult.x = 511; | |

if (iresult.y < -512) iresult.y = -512; | |

if (iresult.y > 511) iresult.y = 511; | |

if (iresult.z < -512) iresult.z = -512; | |

if (iresult.z > 511) iresult.z = 511; | |

if (iresult.w < -512) iresult.w = -512; | |

if (iresult.w > 511) iresult.w = 511; | |

Component swizzling is not supported when the operand is loaded. | |

The ARA instruction is available only in the VP2 execution environment. | |
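
As an illustration, the ARA computation can be written in a few lines of Python. The names `clamp_addr` and `ara` are hypothetical. The address register is modelled as a list of four integers.

```python
def clamp_addr(value):
    """Clamp to the representable address register range [-512, +511]."""
    return max(-512, min(511, value))


def ara(addr):
    """Pairwise add: x and z receive x+z, y and w receive y+w, then clamp."""
    x, y, z, w = addr
    xz = clamp_addr(x + z)
    yw = clamp_addr(y + w)
    return [xz, yw, xz, yw]
```

For example, `ara([10, 500, 5, 100])` gives `[15, 511, 15, 511]`, because the y+w sum of 600 is clamped to 511.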

Section 2.14.3.4, ARL: Address Register Load | |

In the VP1 execution environment, the ARL instruction loads a single | |

scalar operand and performs a floor operation to generate an integer | |

scalar to be written to the address register. | |

tmp = ScalarLoad(op0); | |

iresult.x = floor(tmp); | |

In the VP2 execution environment, the ARL instruction loads a single | |

vector operand and performs a component-wise floor operation to generate | |

an integer result vector. Each component of the result vector is clamped | |

to [-512, +511], the range of representable address register components. | |

The ARL instruction applies all masking operations to address register | |

writes as are described in section 2.14.2.2. | |

tmp = VectorLoad(op0); | |

iresult.x = floor(tmp.x); | |

iresult.y = floor(tmp.y); | |

iresult.z = floor(tmp.z); | |

iresult.w = floor(tmp.w); | |

if (iresult.x < -512) iresult.x = -512; | |

if (iresult.x > 511) iresult.x = 511; | |

if (iresult.y < -512) iresult.y = -512; | |

if (iresult.y > 511) iresult.y = 511; | |

if (iresult.z < -512) iresult.z = -512; | |

if (iresult.z > 511) iresult.z = 511; | |

if (iresult.w < -512) iresult.w = -512; | |

if (iresult.w > 511) iresult.w = 511; | |

The following special-case rules apply to floor computation: | |

1. floor(NaN) = NaN. | |

2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the | |

sign of the result is equal to the sign of the operand. | |

Section 2.14.3.5, ARR: Address Register Load (with round) | |

The ARR instruction loads a single vector operand and performs a | |

component-wise round operation to generate an integer result vector. Each | |

component of the result vector is clamped to [-512, +511], the range of | |

representable address register components. The ARR instruction applies | |

all masking operations to address register writes as described in section | |

2.14.2.2. | |

tmp = VectorLoad(op0); | |

iresult.x = round(tmp.x); | |

iresult.y = round(tmp.y); | |

iresult.z = round(tmp.z); | |

iresult.w = round(tmp.w); | |

if (iresult.x < -512) iresult.x = -512; | |

if (iresult.x > 511) iresult.x = 511; | |

if (iresult.y < -512) iresult.y = -512; | |

if (iresult.y > 511) iresult.y = 511; | |

if (iresult.z < -512) iresult.z = -512; | |

if (iresult.z > 511) iresult.z = 511; | |

if (iresult.w < -512) iresult.w = -512; | |

if (iresult.w > 511) iresult.w = 511; | |

The rounding function, round(x), returns the nearest integer to <x>. If | |

the fractional portion of <x> is 0.5, round(x) selects the nearest even | |

integer. | |

The ARR instruction is available only in the VP2 execution environment. | |
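
The round-to-nearest-even rule matches the behavior of Python's built-in `round` for floats, which makes a short sketch possible. The function name `arr` is hypothetical, and the sketch assumes finite inputs: Python's `round` raises an exception for NaN and infinities, and this section does not specify their handling.

```python
def arr(operand):
    """Round each component to the nearest integer (ties to even), then clamp."""
    return [max(-512, min(511, round(v))) for v in operand]


ties = arr([0.5, 1.5, 2.5, -3.5])             # [0, 2, 2, -4]
clamped = arr([600.2, -700.9, 511.5, -512.5])  # [511, -512, 511, -512]
```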

Section 2.14.3.6, BRA: Branch | |

The BRA instruction conditionally transfers control to the instruction | |

following the label specified in the instruction. The following | |

pseudocode describes the operation of the instruction: | |

if (TestCC(cc.c***) || TestCC(cc.*c**) || | |

TestCC(cc.**c*) || TestCC(cc.***c)) { | |

// continue execution at instruction following <branchLabel> | |

} else { | |

// do nothing | |

} | |

In the pseudocode, <branchLabel> is the label specified in the instruction | |

matching the <vp2-branchLabel> grammar rule. | |

The BRA instruction is available only in the VP2 execution environment. | |

Section 2.14.3.7, CAL: Subroutine Call | |

The CAL instruction conditionally transfers control to the instruction | |

following the label specified in the instruction. It also pushes a | |

reference to the instruction immediately following the CAL instruction | |

onto the call stack, where execution will continue after executing the | |

matching RET instruction. The following pseudocode describes the | |

operation of the instruction: | |

if (TestCC(cc.c***) || TestCC(cc.*c**) || | |

TestCC(cc.**c*) || TestCC(cc.***c)) { | |

if (callStackDepth >= 4) { | |

// terminate vertex program | |

} else { | |

callStack[callStackDepth] = nextInstruction; | |

callStackDepth++; | |

} | |

// continue execution at instruction following <branchLabel> | |

} else { | |

// do nothing | |

} | |

In the pseudocode, <branchLabel> is the label specified in the instruction | |

matching the <vp2-branchLabel> grammar rule, <callStackDepth> is the | |

current depth of the call stack, <callStack> is an array holding the call | |

stack, and <nextInstruction> is a reference to the instruction immediately | |

following the present one in the program string. | |

The CAL instruction is available only in the VP2 execution environment. | |

Section 2.14.3.8, COS: Cosine | |

The COS instruction approximates the cosine of the angle specified by the | |

scalar operand and replicates the approximation to all four components of | |

the result vector. The angle is specified in radians and does not have to | |

be in the range [0,2*PI]. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxCosine(tmp); | |

result.y = ApproxCosine(tmp); | |

result.z = ApproxCosine(tmp); | |

result.w = ApproxCosine(tmp); | |

The approximation function ApproxCosine is accurate to at least 22 bits | |

with an angle in the range [0,2*PI]. | |

| ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. | |

The error in the approximation will typically increase with the absolute | |

value of the angle when the angle falls outside the range [0,2*PI]. | |

The following special-case rules apply to cosine approximation: | |

1. ApproxCosine(NaN) = NaN. | |

2. ApproxCosine(+/-INF) = NaN. | |

3. ApproxCosine(+/-0.0) = +1.0. | |

The COS instruction is available only in the VP2 execution environment. | |

Section 2.14.3.9, DP3: 3-component Dot Product | |

The DP3 instruction computes a three component dot product of the two | |

operands (using the x, y, and z components) and replicates the dot product | |

to all four components of the result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z); | |

result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z); | |

result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z); | |

result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z); | |

Section 2.14.3.10, DP4: 4-component Dot Product | |

The DP4 instruction computes a four component dot product of the two | |

operands and replicates the dot product to all four components of the | |

result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); | |

result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); | |

result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); | |

result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); | |

Section 2.14.3.11, DPH: Homogeneous Dot Product | |

The DPH instruction computes a four-component dot product of the two | |

operands, except that the W component of the first operand is assumed to | |

be 1.0. The instruction replicates the dot product to all four components | |

of the result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + tmp1.w; | |

result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + tmp1.w; | |

result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + tmp1.w; | |

result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + | |

(tmp0.z * tmp1.z) + tmp1.w; | |

The DPH instruction is available only in the VP1.1 and VP2 execution | |

environments. | |
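
A small Python sketch of DPH follows, with a hypothetical function name and vectors modelled as four-element lists. It shows that the w component of the first operand is ignored.

```python
def dph(op0, op1):
    """Homogeneous dot product: op0.w is treated as 1.0."""
    dot = op0[0] * op1[0] + op0[1] * op1[1] + op0[2] * op1[2] + op1[3]
    return [dot, dot, dot, dot]


# op0.w = 99.0 has no effect: 1*4 + 2*5 + 3*6 + 7 = 39
result = dph([1.0, 2.0, 3.0, 99.0], [4.0, 5.0, 6.0, 7.0])
```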

Section 2.14.3.12, DST: Distance Vector | |

The DST instruction computes a distance vector from two specially- | |

formatted operands. The first operand should be of the form [NA, d^2, | |

d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], | |

where NA values are not relevant to the calculation and d is a vector | |

length. If both vectors satisfy these conditions, the result vector will | |

be of the form [1.0, d, d^2, 1/d]. | |

The exact behavior is specified in the following pseudo-code: | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = 1.0; | |

result.y = tmp0.y * tmp1.y; | |

result.z = tmp0.z; | |

result.w = tmp1.w; | |

Given an arbitrary vector, d^2 can be obtained using the DP3 instruction | |

(using the same vector for both operands) and 1/d can be obtained from d^2 | |

using the RSQ instruction. | |

This distance vector is useful for per-vertex light attenuation | |

calculations: a DP3 operation using the distance vector and an | |

attenuation constants vector as operands will yield the attenuation | |

factor. | |
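
The DP3, RSQ, DST sequence described above can be traced numerically. The Python below is a sketch with hypothetical helper names and exact arithmetic in place of the hardware approximations. The attenuation constants are made-up illustration values. The final DP3 evaluates k_c + k_l*d + k_q*d^2. In the standard OpenGL lighting equations that polynomial is the denominator of the attenuation term, so a reciprocal would normally follow.

```python
import math


def dp3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dst(op0, op1):
    """op0 = [NA, d^2, d^2, NA], op1 = [NA, 1/d, NA, 1/d]."""
    return [1.0, op0[1] * op1[1], op0[2], op1[3]]


v = [3.0, 4.0, 0.0, 0.0]              # vector of length d = 5
d2 = dp3(v, v)                        # d^2, as DP3 would compute
inv_d = 1.0 / math.sqrt(d2)           # 1/d, as RSQ would approximate
dist = dst([0.0, d2, d2, 0.0], [0.0, inv_d, 0.0, inv_d])  # ~[1, d, d^2, 1/d]

k = [1.0, 0.5, 0.25, 0.0]             # hypothetical (k_c, k_l, k_q) constants
poly = dp3(dist, k)                   # k_c + k_l*d + k_q*d^2
```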

Section 2.14.3.13, EX2: Exponential Base 2 | |

The EX2 instruction approximates 2 raised to the power of the scalar | |

operand and replicates it to all four components of the result vector. | |

tmp = ScalarLoad(op0); | |

result.x = Approx2ToX(tmp); | |

result.y = Approx2ToX(tmp); | |

result.z = Approx2ToX(tmp); | |

result.w = Approx2ToX(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0, | |

and, in general, | |

| Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)). | |

The following special-case rules apply to exponential approximation: | |

1. Approx2ToX(NaN) = NaN. | |

2. Approx2ToX(-INF) = +0.0. | |

3. Approx2ToX(+INF) = +INF. | |

4. Approx2ToX(+/-0.0) = +1.0. | |

The EX2 instruction is available only in the VP2 execution environment. | |

Section 2.14.3.14, EXP: Exponential Base 2 (approximate) | |

The EXP instruction computes a rough approximation of 2 raised to the | |

power of the scalar operand. The approximation is returned in the "z" | |

component of the result vector. A vertex program can also use the "x" and | |

"y" components of the result vector to generate a more accurate | |

approximation by evaluating | |

result.x * f(result.y), | |

where f(x) is a user-defined function that approximates 2^x over the | |

domain [0.0, 1.0). The "w" component of the result vector is always 1.0. | |

The exact behavior is specified in the following pseudo-code: | |

tmp = ScalarLoad(op0); | |

result.x = 2^floor(tmp); | |

result.y = tmp - floor(tmp); | |

result.z = RoughApprox2ToX(tmp); | |

result.w = 1.0; | |

The approximation function is accurate to at least 11 bits: | |

| RoughApprox2ToX(x) - 2^x | < 1.0 / 2^11, if 0.0 <= x < 1.0, | |

and, in general, | |

| RoughApprox2ToX(x) - 2^x | < (1.0 / 2^11) * (2^floor(x)). | |

The following special cases apply to the EXP instruction: | |

1. RoughApprox2ToX(NaN) = NaN. | |

2. RoughApprox2ToX(-INF) = +0.0. | |

3. RoughApprox2ToX(+INF) = +INF. | |

4. RoughApprox2ToX(+/-0.0) = +1.0. | |

The EXP instruction is present for compatibility with the original | |

NV_vertex_program instruction set; it is recommended that applications | |

using NV_vertex_program2 use the EX2 instruction instead. | |
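
The refinement "result.x * f(result.y)" can be illustrated in Python. Everything here is an assumption made for illustration: the function names are hypothetical, inputs are assumed finite, and the user-defined f is taken to be a degree-5 Taylor polynomial of 2^t, which is only one of many possible choices.

```python
import math


def exp_parts(x):
    """Return the x and y result components of EXP for a finite operand."""
    whole = math.floor(x)
    return 2.0 ** whole, x - whole      # 2^floor(x) and the fraction in [0, 1)


def f(t):
    """Hypothetical user function approximating 2**t for t in [0, 1)."""
    u = t * math.log(2.0)
    # Horner form of 1 + u + u^2/2 + u^3/6 + u^4/24 + u^5/120.
    return 1.0 + u * (1.0 + u / 2.0 * (1.0 + u / 3.0 * (1.0 + u / 4.0 * (1.0 + u / 5.0))))


def refined_exp2(x):
    scale, fraction = exp_parts(x)
    return scale * f(fraction)
```

Because the scale factor 2^floor(x) is exact, the relative error of the refined value is the relative error of f over [0, 1).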

Section 2.14.3.15, FLR: Floor | |

The FLR instruction performs a component-wise floor operation on the | |

operand to generate a result vector. The floor of a value is defined as | |

the largest integer less than or equal to the value. The floor of 2.3 is | |

2.0; the floor of -3.6 is -4.0. | |

tmp = VectorLoad(op0); | |

result.x = floor(tmp.x); | |

result.y = floor(tmp.y); | |

result.z = floor(tmp.z); | |

result.w = floor(tmp.w); | |

The following special-case rules apply to floor computation: | |

1. floor(NaN) = NaN. | |

2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the | |

sign of the result is equal to the sign of the operand. | |

The FLR instruction is available only in the VP2 execution environment. | |

Section 2.14.3.16, FRC: Fraction | |

The FRC instruction extracts the fractional portion of each component of | |

the operand to generate a result vector. The fractional portion of a | |

component is defined as the result after subtracting off the floor of the | |

component (see FLR), and is always in the range [0.00, 1.00). | |

For negative values, the fractional portion is NOT the number written to | |

the right of the decimal point -- the fractional portion of -1.7 is not | |

0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) | |

from -1.7. | |

tmp = VectorLoad(op0); | |

result.x = tmp.x - floor(tmp.x); | |

result.y = tmp.y - floor(tmp.y); | |

result.z = tmp.z - floor(tmp.z); | |

result.w = tmp.w - floor(tmp.w); | |

The following special-case rules, which can be derived from the rules for | |

FLR and ADD, apply to fraction computation: | |

1. fraction(NaN) = NaN. | |

2. fraction(+/-INF) = NaN. | |

3. fraction(+/-0.0) = +0.0. | |

The FRC instruction is available only in the VP2 execution environment. | |
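
A short Python check of the negative-value case described above. The function name `frc` is hypothetical, and the sketch assumes finite inputs. It does not model the NaN, infinity, or signed-zero special cases.

```python
import math


def frc(operand):
    """Component-wise fraction: value minus its floor, always in [0, 1)."""
    return [v - math.floor(v) for v in operand]


parts = frc([-1.7, 1.7, 2.0, 0.25])   # approximately [0.3, 0.7, 0.0, 0.25]
```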

Section 2.14.3.17, LG2: Logarithm Base 2 | |

The LG2 instruction approximates the base 2 logarithm of the scalar | |

operand and replicates it to all four components of the result vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxLog2(tmp); | |

result.y = ApproxLog2(tmp); | |

result.z = ApproxLog2(tmp); | |

result.w = ApproxLog2(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxLog2(x) - log_2(x) | < 1.0 / 2^22. | |

Note that for large values of x, there are not enough bits in the | |

floating-point storage format to represent a result that precisely. | |

The following special-case rules apply to logarithm approximation: | |

1. ApproxLog2(NaN) = NaN. | |

2. ApproxLog2(+INF) = +INF. | |

3. ApproxLog2(+/-0.0) = -INF. | |

4. ApproxLog2(x) = NaN, -INF < x < -0.0. | |

5. ApproxLog2(-INF) = NaN. | |

The LG2 instruction is available only in the VP2 execution environment. | |

Section 2.14.3.18, LIT: Compute Light Coefficients | |

The LIT instruction accelerates per-vertex lighting by computing lighting | |

coefficients for ambient, diffuse, and specular light contributions. The | |

"x" component of the operand is assumed to hold a diffuse dot product (n | |

dot VP_pli, as in the vertex lighting equations in Section 2.13.1). The | |

"y" component of the operand is assumed to hold a specular dot product (n | |

dot h_i). The "w" component of the operand is assumed to hold the | |

specular exponent of the material (s_rm), and is clamped to the range | |

(-128, +128) exclusive. | |

The "x" component of the result vector receives the value that should be | |

multiplied by the ambient light/material product (always 1.0). The "y" | |

component of the result vector receives the value that should be | |

multiplied by the diffuse light/material product (n dot VP_pli). The "z" | |

component of the result vector receives the value that should be | |

multiplied by the specular light/material product (f_i * (n dot h_i) ^ | |

s_rm). The "w" component of the result is the constant 1.0. | |

Negative diffuse and specular dot products are clamped to 0.0, as is done | |

in the standard per-vertex lighting operations. In addition, if the | |

diffuse dot product is zero or negative, the specular coefficient is | |

forced to zero. | |

tmp = VectorLoad(op0); | |

if (tmp.x < 0) tmp.x = 0; | |

if (tmp.y < 0) tmp.y = 0; | |

if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon); | |

else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon; | |

result.x = 1.0; | |

result.y = tmp.x; | |

result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0; | |

result.w = 1.0; | |

The exponentiation approximation function is defined in terms of the base | |

2 exponentiation and logarithm approximation operations in the EXP and LOG | |

instructions, including errors and the processing of any special cases. | |

In particular, | |

RoughApproxPower(a,b) = RoughApproxExp2(b * RoughApproxLog2(a)). | |

The following special-case rules, which can be derived from the rules in | |

the LOG, MUL, and EXP instructions, apply to exponentiation: | |

1. RoughApproxPower(NaN, <x>) = NaN, | |

2. RoughApproxPower(<x>, <y>) = NaN, if x < -0.0,

3. RoughApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0, or | |

+INF, if x < -0.0, | |

4. RoughApproxPower(+1.0, <x>) = +1.0, if x is not NaN, | |

5. RoughApproxPower(+INF, <x>) = +INF, if x > +0.0, or | |

+0.0, if x < -0.0, | |

6. RoughApproxPower(<x>, +/-0.0) = +1.0, if x >= -0.0 | |

7. RoughApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0, | |

+INF, if x > +1.0, | |

8. RoughApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,

+0.0, if x > +1.0, | |

9. RoughApproxPower(<x>, +1.0) = <x>, if x >= +0.0, and | |

10. RoughApproxPower(<x>, NaN) = NaN. | |
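The LIT pseudo-code above can be mirrored in a few lines of Python. This is only an illustrative sketch for finite operands: the function name `lit` and the value chosen for `EPSILON` are assumptions (the specification does not give a value for epsilon), and exact `math.pow` stands in for RoughApproxPower, so the precision of the real instruction is not reproduced.

```python
import math

# Assumed stand-in for the unspecified "epsilon" in the pseudo-code.
EPSILON = 2.0 ** -16

def lit(op):
    """Sketch of LIT on a finite (x, y, z, w) operand; returns (x, y, z, w)."""
    x, y, _, w = op
    x = max(x, 0.0)                     # clamp negative diffuse dot product
    y = max(y, 0.0)                     # clamp negative specular dot product
    limit = 128.0 - EPSILON
    w = min(max(w, -limit), limit)      # keep the exponent inside (-128, +128)
    if x > 0.0:
        if y == 0.0 and w < 0.0:
            spec = math.inf             # rule 3: zero base, negative exponent
        else:
            spec = math.pow(y, w)
    else:
        spec = 0.0                      # no diffuse light, so no specular light
    return (1.0, x, spec, 1.0)
```

For example, `lit((0.5, 0.5, 0.0, 2.0))` yields a diffuse coefficient of 0.5 and a specular coefficient of 0.25, while any operand with a non-positive "x" component yields a specular coefficient of 0.0.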

Section 2.14.3.19, LOG: Logarithm Base 2 (Approximate) | |

The LOG instruction computes a rough approximation of the base 2 logarithm | |

of the absolute value of the scalar operand. The approximation is | |

returned in the "z" component of the result vector. A vertex program can | |

also use the "x" and "y" components of the result vector to generate a | |

more accurate approximation by evaluating | |

result.x + f(result.y), | |

where f(x) is a user-defined function that approximates log2(x) over the
domain [1.0, 2.0). The "w" component of the result vector is always 1.0.

The exact behavior is specified in the following pseudo-code: | |

tmp = fabs(ScalarLoad(op0)); | |

result.x = floor(log2(tmp)); | |

result.y = tmp / (2^floor(log2(tmp))); | |

result.z = RoughApproxLog2(tmp); | |

result.w = 1.0; | |

The approximation function is accurate to at least 11 bits: | |

| RoughApproxLog2(x) - log_2(x) | < 1.0 / 2^11. | |

The following special-case rules apply to the LOG instruction: | |

1. RoughApproxLog2(NaN) = NaN. | |

2. RoughApproxLog2(+INF) = +INF. | |

3. RoughApproxLog2(+0.0) = -INF. | |

The LOG instruction is present for compatibility with the original | |

NV_vertex_program instruction set; it is recommended that applications | |

using NV_vertex_program2 use the LG2 instruction instead. | |
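The relationship between the "x", "y", and "z" components of the LOG result can be checked with a short Python sketch. The names `log_instruction` and `refined_log2` are hypothetical, exact `math.log2` stands in for RoughApproxLog2, and only finite, nonzero operands are handled.

```python
import math

def log_instruction(value):
    """Sketch of LOG for a finite, nonzero scalar; returns (x, y, z, w)."""
    tmp = abs(value)
    # frexp gives tmp == mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(tmp)
    x = float(exponent - 1)      # floor(log2(tmp))
    y = mantissa * 2.0           # tmp / 2^floor(log2(tmp)), in [1.0, 2.0)
    z = math.log2(tmp)           # exact stand-in for RoughApproxLog2
    return (x, y, z, 1.0)

def refined_log2(value, f=math.log2):
    """Evaluate result.x + f(result.y) with a caller-supplied f."""
    x, y, _, _ = log_instruction(value)
    return x + f(y)
```

Because "y" always lies in [1.0, 2.0), a program only needs a polynomial or table for that narrow interval to obtain a more accurate logarithm than the "z" component provides.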

Section 2.14.3.20, MAD: Multiply And Add | |

The MAD instruction performs a component-wise multiply of the first two | |

operands, and then does a component-wise add of the product to the third | |

operand to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

tmp2 = VectorLoad(op2); | |

result.x = tmp0.x * tmp1.x + tmp2.x; | |

result.y = tmp0.y * tmp1.y + tmp2.y; | |

result.z = tmp0.z * tmp1.z + tmp2.z; | |

result.w = tmp0.w * tmp1.w + tmp2.w; | |

All special case rules applicable to the ADD and MUL instructions apply to | |

the individual components of the MAD operation as well. | |

Section 2.14.3.21, MAX: Maximum | |

The MAX instruction computes component-wise maximums of the values in the | |

two operands to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = max(tmp0.x, tmp1.x); | |

result.y = max(tmp0.y, tmp1.y); | |

result.z = max(tmp0.z, tmp1.z); | |

result.w = max(tmp0.w, tmp1.w); | |

The following special cases apply to the maximum operation: | |

1. max(A,B) is always equivalent to max(B,A). | |

2. max(NaN, <x>) == NaN, for all <x>. | |
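The NaN rule matters when emulating MAX (or MIN) on a host CPU, because many library maximum functions are not symmetric in the presence of NaN. The following Python sketch, with the hypothetical names `nv_max` and `max_instruction`, shows one way to obtain the commutative, NaN-propagating behavior described above; the same pattern applies to MIN.

```python
import math

def nv_max(a, b):
    """Scalar maximum that returns NaN if either argument is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)

def max_instruction(op0, op1):
    """Component-wise maximum of two 4-component operands."""
    return tuple(nv_max(a, b) for a, b in zip(op0, op1))
```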

Section 2.14.3.22, MIN: Minimum | |

The MIN instruction computes component-wise minimums of the values in the | |

two operands to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = min(tmp0.x, tmp1.x); | |

result.y = min(tmp0.y, tmp1.y); | |

result.z = min(tmp0.z, tmp1.z); | |

result.w = min(tmp0.w, tmp1.w); | |

The following special cases apply to the minimum operation: | |

1. min(A,B) is always equivalent to min(B,A). | |

2. min(NaN, <x>) == NaN, for all <x>. | |

Section 2.14.3.23, MOV: Move | |

The MOV instruction copies the value of the operand to yield a result | |

vector. | |

result = VectorLoad(op0); | |

Section 2.14.3.24, MUL: Multiply | |

The MUL instruction performs a component-wise multiply of the two operands | |

to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x * tmp1.x; | |

result.y = tmp0.y * tmp1.y; | |

result.z = tmp0.z * tmp1.z; | |

result.w = tmp0.w * tmp1.w; | |

The following special-case rules apply to multiplication: | |

1. "A*B" is always equivalent to "B*A". | |

2. NaN * <x> = NaN, for all <x>. | |

3. +/-0.0 * +/-INF = NaN. | |

4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The | |

sign of the result is positive if the signs of the two operands match | |

and negative otherwise. | |

5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The | |

sign of the result is positive if the signs of the two operands match | |

and negative otherwise. | |

6. +1.0 * <x> = <x>, for all <x>. | |

Section 2.14.3.25, RCC: Reciprocal (Clamped) | |

The RCC instruction approximates the reciprocal of the scalar operand, | |

clamps the result to one of two ranges, and replicates the clamped result | |

to all four components of the result vector. | |

If the approximate reciprocal is greater than 0.0, the result is clamped | |

to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater | |

than zero, the result is clamped to the range [-2^+64, -2^-64]. | |

tmp = ScalarLoad(op0); | |

result.x = ClampApproxReciprocal(tmp); | |

result.y = ClampApproxReciprocal(tmp); | |

result.z = ClampApproxReciprocal(tmp); | |

result.w = ClampApproxReciprocal(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ClampApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0. | |

The following special-case rules apply to reciprocation: | |

1. ClampApproxReciprocal(NaN) = NaN. | |

2. ClampApproxReciprocal(+INF) = +2^-64. | |

3. ClampApproxReciprocal(-INF) = -2^-64. | |

4. ClampApproxReciprocal(+0.0) = +2^64. | |

5. ClampApproxReciprocal(-0.0) = -2^64. | |

6. ClampApproxReciprocal(x) = +2^-64, if +2^64 < x < +INF. | |

7. ClampApproxReciprocal(x) = -2^-64, if -INF < x < -2^64.

8. ClampApproxReciprocal(x) = +2^64, if +0.0 < x < +2^-64. | |

9. ClampApproxReciprocal(x) = -2^64, if -2^-64 < x < -0.0. | |

The RCC instruction is available only in the VP1.1 and VP2 execution | |

environments. | |
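The clamping rules for RCC can be summarized in a small Python sketch. The name `clamp_reciprocal` is hypothetical, and an exact division stands in for the hardware approximation, so only the clamping and special-case behavior is illustrated.

```python
import math

CLAMP_MIN = 2.0 ** -64
CLAMP_MAX = 2.0 ** 64

def clamp_reciprocal(x):
    """Sketch of ClampApproxReciprocal using an exact reciprocal."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        # 1/(+/-0.0) would be +/-INF, which clamps to +/-2^64.
        return math.copysign(CLAMP_MAX, x)
    reciprocal = 1.0 / x
    # Clamp the magnitude, then restore the sign of the operand.
    magnitude = min(max(abs(reciprocal), CLAMP_MIN), CLAMP_MAX)
    return math.copysign(magnitude, x)
```

The sign of the result always follows the sign of the operand, including for signed zeros and infinities, so the result is never zero and never infinite.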

Section 2.14.3.26, RCP: Reciprocal | |

The RCP instruction approximates the reciprocal of the scalar operand and | |

replicates it to all four components of the result vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxReciprocal(tmp); | |

result.y = ApproxReciprocal(tmp); | |

result.z = ApproxReciprocal(tmp); | |

result.w = ApproxReciprocal(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0. | |

The following special-case rules apply to reciprocation: | |

1. ApproxReciprocal(NaN) = NaN. | |

2. ApproxReciprocal(+INF) = +0.0. | |

3. ApproxReciprocal(-INF) = -0.0. | |

4. ApproxReciprocal(+0.0) = +INF. | |

5. ApproxReciprocal(-0.0) = -INF. | |

Section 2.14.3.27, RET: Subroutine Call Return | |

The RET instruction conditionally returns from a subroutine initiated by a | |

CAL instruction by popping an instruction reference off the top of the | |

call stack and transferring control to the referenced instruction. The | |

following pseudocode describes the operation of the instruction: | |

if (TestCC(cc.c***) || TestCC(cc.*c**) || | |

TestCC(cc.**c*) || TestCC(cc.***c)) { | |

if (callStackDepth <= 0) { | |

// terminate vertex program | |

} else { | |

callStackDepth--; | |

instruction = callStack[callStackDepth]; | |

} | |

// continue execution at <instruction> | |

} else { | |

// do nothing | |

} | |

In the pseudocode, <callStackDepth> is the depth of the call stack, | |

<callStack> is an array holding the call stack, and <instruction> is a | |

reference to an instruction previously pushed onto the call stack. | |

The RET instruction is available only in the VP2 execution environment. | |
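The control flow of RET can be modeled with an ordinary list acting as the call stack. In this Python sketch the name `ret`, the `"terminate"` sentinel, and the representation of the four condition code tests as booleans are all assumptions made for illustration.

```python
def ret(call_stack, cc_tests):
    """Sketch of RET.

    call_stack: list of instruction references pushed by earlier CAL
                instructions (top of stack is the last element).
    cc_tests:   four booleans, one per condition code component test.

    Returns the instruction reference to continue at, the string
    "terminate" if the call stack is empty, or None if the condition
    fails and execution simply falls through to the next instruction.
    """
    if not any(cc_tests):
        return None                # condition failed: do nothing
    if not call_stack:
        return "terminate"         # RET at depth zero ends the program
    return call_stack.pop()        # resume at the saved instruction
```

A RET executed outside any subroutine therefore acts as a conditional program exit.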

Section 2.14.3.28, RSQ: Reciprocal Square Root | |

The RSQ instruction approximates the reciprocal of the square root of the | |

scalar operand and replicates it to all four components of the result | |

vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxRSQRT(tmp); | |

result.y = ApproxRSQRT(tmp); | |

result.z = ApproxRSQRT(tmp); | |

result.w = ApproxRSQRT(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

The following special-case rules apply to reciprocal square roots: | |

1. ApproxRSQRT(NaN) = NaN. | |

2. ApproxRSQRT(+INF) = +0.0. | |

3. ApproxRSQRT(-INF) = NaN. | |

4. ApproxRSQRT(+0.0) = +INF. | |

5. ApproxRSQRT(-0.0) = -INF. | |

6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0. | |
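The special cases above differ from what a naive `1.0 / sqrt(x)` produces on many hosts, particularly for signed zeros and negative inputs. A minimal Python sketch, using the hypothetical name `approx_rsqrt` and an exact square root in place of the hardware approximation, might look like this:

```python
import math

def approx_rsqrt(x):
    """Sketch of ApproxRSQRT following the listed special-case rules."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)   # +0.0 -> +INF, -0.0 -> -INF
    if x < 0.0:
        return math.nan                     # negative values, including -INF
    if math.isinf(x):
        return 0.0                          # +INF -> +0.0
    return 1.0 / math.sqrt(x)
```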

Section 2.14.3.29, SEQ: Set on Equal | |

The SEQ instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is equal to that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SEQ: | |

1. (<x> == <y>) and (<y> == <x>) always produce the same result.
2. (NaN == <x>) is FALSE for all <x>, including NaN.
3. (+INF == +INF) and (-INF == -INF) are TRUE.
4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.

The SEQ instruction is available only in the VP2 execution environment. | |
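All of the "Set on" comparison instructions share the same shape: a component-wise test producing 1.0 or 0.0, followed by the NaN override from the pseudo-code. The Python sketch below follows the pseudo-code literally; the helper names `set_on` and `seq` are hypothetical.

```python
import math
import operator

def set_on(compare, op0, op1):
    """Apply a two-argument comparison to each component pair.

    Yields 1.0 or 0.0 per component, except that a NaN in either
    operand component yields NaN, as in the pseudo-code.
    """
    result = []
    for a, b in zip(op0, op1):
        if math.isnan(a) or math.isnan(b):
            result.append(math.nan)
        else:
            result.append(1.0 if compare(a, b) else 0.0)
    return tuple(result)

def seq(op0, op1):
    """Sketch of SEQ; other "Set on" instructions swap the operator."""
    return set_on(operator.eq, op0, op1)
```

Passing `operator.ge`, `operator.gt`, `operator.le`, `operator.lt`, or `operator.ne` to `set_on` gives corresponding sketches of SGE, SGT, SLE, SLT, and SNE.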

Section 2.14.3.30, SFL: Set on False | |

The SFL instruction is a degenerate case of the other "Set on" | |

instructions that sets all components of the result vector to | |

0.0. | |

result.x = 0.0; | |

result.y = 0.0; | |

result.z = 0.0; | |

result.w = 0.0; | |

The SFL instruction is available only in the VP2 execution environment. | |

Section 2.14.3.31, SGE: Set on Greater Than or Equal | |

The SGE instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is greater than or equal to that of the
second, and 0.0 otherwise.

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SGE: | |

1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>. | |

2. (+INF >= +INF) and (-INF >= -INF) are TRUE. | |

3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE. | |

Section 2.14.3.32, SGT: Set on Greater Than | |

The SGT instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is greater than that of the second, and
0.0 otherwise.

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SGT: | |

1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>. | |

2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE. | |

The SGT instruction is available only in the VP2 execution environment. | |

Section 2.14.3.33, SIN: Sine | |

The SIN instruction approximates the sine of the angle specified by the | |

scalar operand and replicates it to all four components of the result | |

vector. The angle is specified in radians and does not have to be in the | |

range [0,2*PI]. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxSine(tmp); | |

result.y = ApproxSine(tmp); | |

result.z = ApproxSine(tmp); | |

result.w = ApproxSine(tmp); | |

The approximation function is accurate to at least 22 bits with an angle | |

in the range [0,2*PI]. | |

| ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. | |

The error in the approximation will typically increase with the absolute | |

value of the angle when the angle falls outside the range [0,2*PI]. | |

The following special-case rules apply to sine approximation:

1. ApproxSine(NaN) = NaN. | |

2. ApproxSine(+/-INF) = NaN. | |

3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the | |

sign of the single operand. | |

The SIN instruction is available only in the VP2 execution environment. | |

Section 2.14.3.34, SLE: Set on Less Than or Equal | |

The SLE instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is less than or equal to that of the | |

second, and 0.0 otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SLE: | |

1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>. | |

2. (+INF <= +INF) and (-INF <= -INF) are TRUE. | |

3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE. | |

The SLE instruction is available only in the VP2 execution environment. | |

Section 2.14.3.35, SLT: Set on Less Than | |

The SLT instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is less than that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SLT: | |

1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>. | |

2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE. | |

Section 2.14.3.36, SNE: Set on Not Equal | |

The SNE instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is not equal to that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0; | |

if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN; | |

if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN; | |

if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN; | |

if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN; | |

The following special-case rules apply to SNE: | |

1. (<x> != <y>) and (<y> != <x>) always produce the same result. | |

2. (NaN != <x>) is TRUE for all <x>, including NaN. | |

3. (+INF != +INF) and (-INF != -INF) are FALSE. | |

4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.

The SNE instruction is available only in the VP2 execution environment. | |

Section 2.14.3.37, SSG: Set Sign | |

The SSG instruction generates a result vector containing the signs of each | |

component of the single operand. Each component of the result vector is | |

1.0 if the corresponding component of the operand is greater than zero, | |

0.0 if the corresponding component of the operand is equal to zero, and | |

-1.0 if the corresponding component of the operand is less than zero. | |

tmp = VectorLoad(op0); | |

result.x = SetSign(tmp.x); | |

result.y = SetSign(tmp.y); | |

result.z = SetSign(tmp.z); | |

result.w = SetSign(tmp.w); | |

The following special-case rules apply to SSG: | |

1. SetSign(NaN) = NaN. | |

2. SetSign(-0.0) = SetSign(+0.0) = 0.0. | |

3. SetSign(-INF) = -1.0. | |

4. SetSign(+INF) = +1.0. | |

5. SetSign(x) = -1.0, if -INF < x < -0.0. | |

6. SetSign(x) = +1.0, if +0.0 < x < +INF. | |

The SSG instruction is available only in the VP2 execution environment. | |
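The SetSign rules translate directly into a few comparisons. In this Python sketch the names `set_sign` and `ssg` are hypothetical; note that both signed zeros map to 0.0, unlike a copysign-style operation.

```python
import math

def set_sign(x):
    """Sketch of SetSign for one component."""
    if math.isnan(x):
        return math.nan
    if x > 0.0:
        return 1.0       # positive values, including +INF
    if x < 0.0:
        return -1.0      # negative values, including -INF
    return 0.0           # +0.0 and -0.0

def ssg(op):
    """Apply SetSign to each component of a 4-component operand."""
    return tuple(set_sign(c) for c in op)
```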

Section 2.14.3.38, STR: Set on True | |

The STR instruction is a degenerate case of the other "Set on" | |

instructions that sets all components of the result vector to 1.0. | |

result.x = 1.0; | |

result.y = 1.0; | |

result.z = 1.0; | |

result.w = 1.0; | |

The STR instruction is available only in the VP2 execution environment. | |

Section 2.14.3.39, SUB: Subtract | |

The SUB instruction performs a component-wise subtraction of the second | |

operand from the first to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x - tmp1.x; | |

result.y = tmp0.y - tmp1.y; | |

result.z = tmp0.z - tmp1.z; | |

result.w = tmp0.w - tmp1.w; | |

The SUB instruction is completely equivalent to an identical ADD | |

instruction in which the negate operator on the second operand is | |

reversed: | |

1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2". | |

2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2". | |

3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|". | |

4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|". | |

The SUB instruction is available only in the VP1.1 and VP2 execution | |

environments. | |

2.14.4 Vertex Arrays for Vertex Attributes | |

Data for vertex attributes in vertex program mode may be specified | |

using vertex array commands. The client may specify and enable any | |

of sixteen vertex attribute arrays. | |

The vertex attribute arrays are ignored when vertex program mode | |

is disabled. When vertex program mode is enabled, vertex attribute | |

arrays are used. | |

The command | |

void VertexAttribPointerNV(uint index, int size, enum type, | |

sizei stride, const void *pointer); | |

describes the locations and organizations of the sixteen vertex | |

attribute arrays. index specifies the particular vertex attribute | |

to be described. size indicates the number of values per vertex | |

that are stored in the array; size must be one of 1, 2, 3, or 4. | |

type specifies the data type of the values stored in the array. | |

type must be one of SHORT, FLOAT, DOUBLE, or UNSIGNED_BYTE; these
values correspond to the array types short, float, double, and
ubyte, respectively. The INVALID_OPERATION error is generated if

type is UNSIGNED_BYTE and size is not 4. The INVALID_VALUE error | |

is generated if index is greater than 15. The INVALID_VALUE error | |

is generated if stride is negative. | |

The one, two, three, or four values in an array that correspond to a | |

single vertex attribute comprise an array element. The values within | |

each array element are stored sequentially in memory. If the stride
is specified as zero, then array elements are stored sequentially
as well. Otherwise pointers to the ith and (i+1)st elements of an array

differ by stride basic machine units (typically unsigned bytes), | |

the pointer to the (i+1)st element being greater. pointer specifies | |

the location in memory of the first value of the first element of | |

the array being specified. | |
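The stride rule amounts to a simple address computation. The Python sketch below is illustrative only: the function name `element_offset` is hypothetical, and the byte sizes in `TYPE_SIZES` are assumptions reflecting common platforms rather than values taken from this specification.

```python
# Assumed sizes, in basic machine units, of each permitted array type.
TYPE_SIZES = {"SHORT": 2, "FLOAT": 4, "DOUBLE": 8, "UNSIGNED_BYTE": 1}

def element_offset(i, size, type_name, stride):
    """Offset of array element i relative to <pointer>.

    Raises ValueError for argument combinations that the text above
    describes as errors.
    """
    if size not in (1, 2, 3, 4):
        raise ValueError("size must be 1, 2, 3, or 4")
    if stride < 0:
        raise ValueError("INVALID_VALUE: stride is negative")
    if type_name == "UNSIGNED_BYTE" and size != 4:
        raise ValueError("INVALID_OPERATION: UNSIGNED_BYTE requires size 4")
    # A stride of zero means the elements are tightly packed.
    effective_stride = stride if stride != 0 else size * TYPE_SIZES[type_name]
    return i * effective_stride
```

With tightly packed three-component FLOAT data, element 3 starts 36 units past the pointer; with an explicit stride of 32 (for example, when attributes are interleaved), it starts 96 units past the pointer.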

Vertex attribute arrays are enabled with the EnableClientState command | |

and disabled with the DisableClientState command. The value of the | |

argument to either command is VERTEX_ATTRIB_ARRAYi_NV where i is an | |

integer between 0 and 15; specifying a value of i enables or | |

disables the vertex attribute array with index i. The constants | |

obey VERTEX_ATTRIB_ARRAYi_NV = VERTEX_ATTRIB_ARRAY0_NV + i. | |

When vertex program mode is enabled, the ArrayElement command operates | |

as described in this section in contrast to the behavior described | |

in section 2.8. Likewise, any vertex array transfer commands that | |

are defined in terms of ArrayElement (DrawArrays, DrawElements, and | |

DrawRangeElements) assume the operation of ArrayElement described | |

in this section when vertex program mode is enabled. | |

When vertex program mode is enabled, the ArrayElement command | |

transfers the ith element of particular enabled vertex arrays as | |

described below. For each enabled vertex attribute array, it is | |

as though the corresponding command from section 2.14.1.1 were | |

called with a pointer to element i. For each vertex attribute, | |

the corresponding command is VertexAttrib[size][type]v, where size | |

is one of [1,2,3,4], and type is one of [s,f,d,ub], corresponding
to the array types short, float, double, and ubyte, respectively.

However, if a given vertex attribute array is disabled, but its | |

corresponding aliased conventional per-vertex parameter's vertex | |

array (as described in section 2.14.1.6) is enabled, then it is | |

as though the corresponding command from section 2.7 or section | |

2.6.2 were called with a pointer to element i. In this case, the | |

corresponding command is determined as described in section 2.8's | |

description of ArrayElement. | |

If the vertex attribute array 0 is enabled, it is as though | |

VertexAttrib[size][type]v(0, ...) is executed last, after the | |

executions of other corresponding commands. If the vertex attribute | |

array 0 is disabled but the vertex array is enabled, it is as though | |

Vertex[size][type]v is executed last, after the executions of other | |

corresponding commands. | |

2.14.5 Vertex State Programs | |

Vertex state programs share the same instruction set as and a similar | |

execution model to vertex programs. While vertex programs are executed | |

implicitly when a vertex transformation is provoked, vertex state programs | |

are executed explicitly, independently of any vertices. Vertex state | |

programs can write program parameter registers, but may not write vertex | |

result registers. Vertex state programs have not been extended beyond the
VP1.0 execution environment, and are offered solely for compatibility
with that execution environment.

The purpose of a vertex state program is to update program parameter | |

registers by means of an application-defined program. Typically, an | |

application will load a set of program parameters and then execute a | |

vertex state program that reads and updates the program parameter | |

registers. For example, a vertex state program might normalize a set of | |

unnormalized vectors previously loaded as program parameters. The | |

expectation is that subsequently executed vertex programs would use the | |

normalized program parameters. | |

Vertex state programs are loaded with the same LoadProgramNV command (see | |

section 2.14.1.8) used to load vertex programs except that the target must | |

be VERTEX_STATE_PROGRAM_NV when loading a vertex state program. | |

Vertex state programs must conform to a more limited grammar than the | |

grammar for vertex programs. The vertex state program grammar for | |

syntactically valid sequences is the same as the grammar defined in | |

section 2.14.1.8 with the following modified rules: | |

<program> ::= <vp1-program> | |

<vp1-program> ::= "!!VSP1.0" <programBody> "END" | |

<dstReg> ::= <absProgParamReg> | |

| <temporaryReg> | |

<vertexAttribReg> ::= "v" "[" "0" "]" | |

A vertex state program fails to load if it does not write at least | |

one program parameter register. | |

A vertex state program fails to load if it contains more than 128 | |

instructions. | |

A vertex state program fails to load if any instruction sources more | |

than one unique program parameter register. | |

A vertex state program fails to load if any instruction sources | |

more than one unique vertex attribute register (this is necessarily | |

true because only vertex attribute 0 is available in vertex state | |

programs). | |

The error INVALID_OPERATION is generated if a vertex state program | |

fails to load because it is not syntactically correct or for one | |

of the other reasons listed above. | |

A successfully loaded vertex state program is parsed into a sequence | |

of instructions. Each instruction is identified by its tokenized | |

name. The operation of these instructions when executed is defined | |

in section 2.14.1.10. | |

Executing vertex state programs is legal only outside a Begin/End | |

pair. A vertex state program may not read any vertex attribute | |

register other than register zero. A vertex state program may not | |

write any vertex result register. | |

The command | |

ExecuteProgramNV(enum target, uint id, const float *params); | |

executes the vertex state program named by id. The target must be | |

VERTEX_STATE_PROGRAM_NV and the id must be the name of a program loaded

with a target type of VERTEX_STATE_PROGRAM_NV. params points to | |

an array of four floating-point values that are loaded into vertex | |

attribute register zero (the only vertex attribute readable from a | |

vertex state program). | |

The INVALID_OPERATION error is generated if the named program is | |

nonexistent, is invalid, or the program is not a vertex state | |

program. A vertex state program may not be valid for reasons | |

explained in section 2.14.5. | |

2.14.6 Program Options

In the VP1.1 and VP2.0 execution environments, vertex programs may specify

one or more program options that modify the execution environment, | |

according to the <option> grammar rule. The set of options available to | |

the program is described below. | |

Section 2.14.6.1, Position-Invariant Vertex Program Option | |

If <vp11-option> or <vp2-option> matches "NV_position_invariant", the | |

vertex program is presumed to be position-invariant. By default, vertex | |

programs are not position-invariant. Even if programs emulate the | |

conventional OpenGL transformation model, they may still not produce the | |

exact same transform results, due to rounding errors or different | |

operation orders. Such programs may not work well for multi-pass | |

rendering algorithms where the second and subsequent passes use an EQUAL | |

depth test. | |

Position-invariant vertex programs do not compute a final vertex position; | |

instead, the GL computes vertex coordinates as described in section 2.10. | |

This computation should produce exactly the same results as the | |

conventional OpenGL transformation model, assuming vertex weighting and | |

vertex blending are disabled. | |

A vertex program that specifies the position-invariant option will fail to | |

load if it writes to the HPOS result register. | |

Additionally, in the VP1.1 execution environment, position-invariant
programs cannot use relative addressing for program parameters. Any
position-invariant VP1.1 program that matches the grammar rule
<relProgParamReg> will fail to load. No such restriction exists for
VP2.0 programs.

For position-invariant programs, the limit on the number of instructions | |

allowed in a program is reduced by four: position-invariant VP1.1 and | |

VP2.0 programs may have no more than 124 or 252 instructions, | |

respectively. | |

2.14.7 Tracking Matrices | |

As a convenience to applications, standard GL matrix state can be | |

tracked into program parameter vectors. This permits vertex programs | |

to access matrices specified through GL matrix commands. | |

In addition to GL's conventional matrices, several additional matrices | |

are available for tracking. These matrices have names of the form | |

MATRIXi_NV where i is between zero and n-1 where n is the value | |

of the MAX_TRACK_MATRICES_NV implementation dependent constant. | |

The MATRIXi_NV constants obey MATRIXi_NV = MATRIX0_NV + i. The value | |

of MAX_TRACK_MATRICES_NV must be at least eight. The maximum
stack depth for tracking matrices is defined by the
MAX_TRACK_MATRIX_STACK_DEPTH_NV implementation-dependent constant, which
must be at least 1.

The command | |

TrackMatrixNV(enum target, uint address, enum matrix, enum transform); | |

tracks a given transformed version of a particular matrix into | |

a contiguous sequence of four vertex program parameter registers | |

beginning at address. target must be VERTEX_PROGRAM_NV (though | |

tracked matrices apply to vertex state programs as well because both | |

vertex state programs and vertex programs share the same program

parameter registers). matrix must be one of NONE, MODELVIEW, | |

PROJECTION, TEXTURE, TEXTUREi_ARB (where i is between 0 and n-1 | |

where n is the number of texture units supported), COLOR (if | |

the ARB_imaging subset is supported), MODELVIEW_PROJECTION_NV, | |

or MATRIXi_NV. transform must be one of IDENTITY_NV, INVERSE_NV, | |

TRANSPOSE_NV, or INVERSE_TRANSPOSE_NV. The INVALID_VALUE error is | |

generated if address is not a multiple of four. | |

The MODELVIEW_PROJECTION_NV matrix represents the concatenation of | |

the current modelview and projection matrices. If M is the current | |

modelview matrix and P is the current projection matrix, then the | |

MODELVIEW_PROJECTION_NV matrix is C and computed as | |

C = P M | |

Matrix tracking for the specified program parameter register and the | |

next consecutive three registers is disabled when NONE is supplied | |

for matrix. When tracking is disabled the previously tracked program | |

parameter registers retain the state of their last tracked values. | |

Otherwise, the specified transformed version of matrix is tracked into | |

the specified program parameter register and the next three registers. | |

Whenever the matrix changes, the transformed version of the matrix | |

is updated in the specified range of program parameter registers. | |

If TEXTURE is specified for matrix, the texture matrix for the current | |

active texture unit is tracked. If TEXTUREi_ARB is specified for | |

matrix, the <i>th texture matrix is tracked. | |

Matrices are tracked row-wise meaning that the top row of the | |

transformed matrix is loaded into the program parameter address, | |

the second from the top row of the transformed matrix is loaded into | |

the program parameter address+1, the third from the top row of the | |

transformed matrix is loaded into the program parameter address+2, | |

and the bottom row of the transformed matrix is loaded into the | |

program parameter address+3. The transformed matrix may be identical | |

to the specified matrix, the inverse of the specified matrix, the | |

transpose of the specified matrix, or the inverse transpose of the | |

specified matrix, depending on the value of transform. | |
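The row-wise layout described above can be sketched in C as follows. This
is a minimal model, not driver code: the ParamFile and Mat4 types, the
Transform enum, and the function name are hypothetical, and only the
identity and transpose cases are shown. INVERSE_NV and
INVERSE_TRANSPOSE_NV are omitted to keep the sketch short.

```c
#include <assert.h>

#define NUM_PROGRAM_PARAMETERS 256  /* register count from section 2.14.8 */

/* Hypothetical stand-ins for IDENTITY_NV and TRANSPOSE_NV. */
typedef enum { XFORM_IDENTITY, XFORM_TRANSPOSE } Transform;

/* Hypothetical 4x4 matrix type, indexed as m[row][column]. */
typedef struct { float m[4][4]; } Mat4;

/* Program parameter registers: four floats (x, y, z, w) each. */
typedef struct { float reg[NUM_PROGRAM_PARAMETERS][4]; } ParamFile;

/*
 * Copies the transformed version of matrix into registers
 * address .. address+3, one row per register, top row first.
 */
void load_tracked_matrix(ParamFile *params, unsigned address,
                         const Mat4 *matrix, Transform transform)
{
    assert(params != 0 && matrix != 0);
    assert(address % 4 == 0);                     /* multiple of four */
    assert(address + 3 < NUM_PROGRAM_PARAMETERS); /* stays in range   */

    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            /* Row r of the transpose is column r of the original. */
            params->reg[address + row][col] =
                (transform == XFORM_TRANSPOSE) ? matrix->m[col][row]
                                               : matrix->m[row][col];
        }
    }
}
```

With this layout, a vertex program can transform a position with four
dot products against the registers address through address+3.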

When matrix tracking is enabled for a particular program parameter | |

register sequence, updates to the program parameter using | |

ProgramParameterNV commands, a vertex program, or a vertex state | |

program are not possible. The INVALID_OPERATION error is generated | |

if a ProgramParameterNV command is used to update a program parameter | |

register currently tracking a matrix. | |

The INVALID_OPERATION error is generated by ExecuteProgramNV when | |

the vertex state program requested for execution writes to a program | |

parameter register that is currently tracking a matrix because the | |

program is considered invalid. | |
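The interaction between matrix tracking and explicit parameter updates
can be modeled with a per-register flag, as in the C sketch below. The
ParamState type, the Result codes, and both function names are
hypothetical. Returning an error for an out-of-range tracking address is
an assumption of this sketch, since the text above only specifies the
multiple-of-four check for TrackMatrixNV.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PROGRAM_PARAMETERS 256

/* Hypothetical result codes mirroring the GL error names in the text. */
typedef enum {
    RESULT_NO_ERROR,
    RESULT_INVALID_VALUE,
    RESULT_INVALID_OPERATION
} Result;

typedef struct {
    float reg[NUM_PROGRAM_PARAMETERS][4];
    bool  tracked[NUM_PROGRAM_PARAMETERS]; /* true while tracking a matrix */
} ParamState;

/*
 * Models enabling (tracking == true) or disabling (NONE) matrix tracking
 * for registers address .. address+3. Register contents are not touched,
 * so previously tracked values are retained when tracking is disabled.
 */
Result set_tracking(ParamState *s, unsigned address, bool tracking)
{
    assert(s != 0);
    if (address % 4 != 0)
        return RESULT_INVALID_VALUE;
    if (address >= NUM_PROGRAM_PARAMETERS)  /* assumption, see lead-in */
        return RESULT_INVALID_VALUE;
    for (unsigned i = 0; i < 4; i++)
        s->tracked[address + i] = tracking;
    return RESULT_NO_ERROR;
}

/* Models a ProgramParameterNV-style update of a single register. */
Result set_program_parameter(ParamState *s, unsigned index,
                             const float value[4])
{
    assert(s != 0 && value != 0);
    if (index >= NUM_PROGRAM_PARAMETERS)
        return RESULT_INVALID_VALUE;
    if (s->tracked[index])
        return RESULT_INVALID_OPERATION;   /* register tracks a matrix */
    for (int i = 0; i < 4; i++)
        s->reg[index][i] = value[i];
    return RESULT_NO_ERROR;
}
```

A real implementation would also have to reject vertex programs and
vertex state programs that write tracked registers; that check depends on
program analysis and is not modeled here.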

2.14.8 Required Vertex Program State | |

The state required for vertex programs consists of: | |

a bit indicating whether or not program mode is enabled; | |

a bit indicating whether or not two-sided color mode is enabled; | |

a bit indicating whether or not program-specified point size mode | |

is enabled; | |

256 4-component floating-point program parameter registers; | |

16 4-component vertex attribute registers (though this state is | |

aliased with the current normal, primary color, secondary color, | |

fog coordinate, weights, and texture coordinate sets); | |

24 sets of matrix tracking state for each set of four sequential
program parameter registers, consisting of an n-valued integer
indicating the tracked matrix or GL_NONE (where n is 5 + the number
of texture units supported + the number of tracking matrices
supported) and a four-valued integer indicating the transformation
of the tracked matrix;

an unsigned integer naming the currently bound vertex program;

and the state that must be maintained to indicate which integers
are currently in use as program names.

Each existent program object consists of a target, a boolean indicating | |

whether the program is resident, an array of type ubyte containing the | |

program string, and the length of the program string array. Initially, | |

no program objects exist. | |

Program mode, two-sided color mode, and program-specified point size | |

mode are all initially disabled. | |

The initial state of all 256 program parameter registers is (0,0,0,0). | |

The initial state of the 16 vertex attribute registers is (0,0,0,1) | |

except in cases where a vertex attribute register aliases to a | |

conventional GL transform mode vertex parameter in which case | |

the initial state is the initial state of the respective aliased | |

conventional vertex parameter. | |

The initial state of the 24 sets of matrix tracking state is NONE | |

for the tracked matrix and IDENTITY_NV for the transformation of the | |

tracked matrix. | |

The initial currently bound program is zero. | |

The client state required to implement the 16 vertex attribute | |

arrays consists of 16 boolean values, 16 memory pointers, 16 integer | |

stride values, 16 symbolic constants representing array types, | |

and 16 integers representing values per element. Initially, the | |

boolean values are each disabled, the memory pointers are each null, | |

the strides are each zero, the array types are each FLOAT, and the | |

integers representing values per element are each four." | |
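The client state for the vertex attribute arrays described above could be
represented as in the following C sketch. The struct layout, field names,
and function name are assumptions for illustration only. GL_FLOAT is
defined locally with its standard OpenGL value so that the example needs
no GL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VERTEX_ATTRIBS 16
#define GL_FLOAT 0x1406  /* standard OpenGL enum value */

/* Hypothetical per-array client state. */
typedef struct {
    bool        enabled;  /* whether the array is enabled    */
    const void *pointer;  /* memory pointer to the data      */
    int         stride;   /* byte stride between elements    */
    unsigned    type;     /* symbolic constant for the type  */
    int         size;     /* number of values per element    */
} AttribArrayState;

/* Sets every array to the initial state listed in the text above. */
void init_attrib_arrays(AttribArrayState arrays[NUM_VERTEX_ATTRIBS])
{
    assert(arrays != NULL);
    for (int i = 0; i < NUM_VERTEX_ATTRIBS; i++) {
        arrays[i].enabled = false;
        arrays[i].pointer = NULL;
        arrays[i].stride  = 0;
        arrays[i].type    = GL_FLOAT;
        arrays[i].size    = 4;
    }
}
```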

Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization) | |

None. | |

Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment | |

Operations and the Frame Buffer) | |

None. | |

Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions) | |

None. | |

Additions to Chapter 6 of the OpenGL 1.3 Specification (State and | |

State Requests) | |

None. | |

Additions to Appendix A of the OpenGL 1.3 Specification (Invariance) | |

None. | |

Additions to the AGL/GLX/WGL Specifications | |

None. | |

GLX Protocol | |

All relevant protocol is defined in the NV_vertex_program extension. | |

Errors | |

This list includes the errors specified in the NV_vertex_program | |

extension, modified as appropriate. | |

The error INVALID_VALUE is generated if VertexAttribNV is called where | |

index is greater than 15. | |

The error INVALID_VALUE is generated if any ProgramParameterNV command is
called where index is greater than 255 (was 95 in NV_vertex_program).

The error INVALID_VALUE is generated if VertexAttribPointerNV is called | |

where index is greater than 15. | |

The error INVALID_VALUE is generated if VertexAttribPointerNV is called | |

where size is not one of 1, 2, 3, or 4. | |

The error INVALID_VALUE is generated if VertexAttribPointerNV is called | |

where stride is negative. | |

The error INVALID_OPERATION is generated if VertexAttribPointerNV is | |

called where type is UNSIGNED_BYTE and size is not 4. | |
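The four VertexAttribPointerNV checks listed above can be summarized in a
small C sketch. The function name is hypothetical, and the order in which
the conditions are tested is an assumption, since the text does not say
which error takes precedence when several apply. The enum values are the
standard OpenGL ones, defined locally so the example needs no GL headers.

```c
#include <assert.h>

/* Standard OpenGL enum values, repeated here for self-containment. */
#define GL_NO_ERROR          0
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502
#define GL_UNSIGNED_BYTE     0x1401
#define GL_FLOAT             0x1406

/*
 * Returns the error that the listed rules assign to a
 * VertexAttribPointerNV call, or GL_NO_ERROR if none of them applies.
 */
unsigned check_vertex_attrib_pointer(unsigned index, int size,
                                     unsigned type, int stride)
{
    if (index > 15)
        return GL_INVALID_VALUE;
    if (size < 1 || size > 4)
        return GL_INVALID_VALUE;
    if (stride < 0)
        return GL_INVALID_VALUE;
    if (type == GL_UNSIGNED_BYTE && size != 4)
        return GL_INVALID_OPERATION;
    return GL_NO_ERROR;
}

/* Unsigned byte arrays are accepted only with four components. */
void check_vertex_attrib_pointer_example(void)
{
    assert(check_vertex_attrib_pointer(3, 4, GL_UNSIGNED_BYTE, 0)
           == GL_NO_ERROR);
    assert(check_vertex_attrib_pointer(3, 3, GL_UNSIGNED_BYTE, 0)
           == GL_INVALID_OPERATION);
}
```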

The error INVALID_VALUE is generated if LoadProgramNV is used to load a | |

program with an id of zero. | |

The error INVALID_OPERATION is generated if LoadProgramNV is used to load | |

an id that is currently loaded with a program of a different program | |

target. | |

The error INVALID_OPERATION is generated if the program passed to | |

LoadProgramNV fails to load because it is not syntactically correct based | |

on the specified target. The value of PROGRAM_ERROR_POSITION_NV is still | |

updated when this error is generated. | |

The error INVALID_OPERATION is generated if LoadProgramNV has a target of | |

VERTEX_PROGRAM_NV and the specified program fails to load because it does | |

not write the HPOS register at least once. The value of | |

PROGRAM_ERROR_POSITION_NV is still updated when this error is generated. | |

The error INVALID_OPERATION is generated if LoadProgramNV has a target of | |

VERTEX_STATE_PROGRAM_NV and the specified program fails to load because it | |

does not write at least one program parameter register. The value of | |

PROGRAM_ERROR_POSITION_NV is still updated when this error is generated. | |

The error INVALID_OPERATION is generated if the vertex program or vertex | |

state program passed to LoadProgramNV fails to load because it contains | |

more than 128 instructions (VP1 programs) or 256 instructions (VP2 | |

programs). The value of PROGRAM_ERROR_POSITION_NV is still updated when | |

this error is generated. | |

The error INVALID_OPERATION is generated if a program is loaded with | |

LoadProgramNV for id when id is currently loaded with a program of a | |

different target. | |

The error INVALID_OPERATION is generated if BindProgramNV attempts to bind | |

to a program name that is not a vertex program (for example, if the | |

program is a vertex state program). | |

The error INVALID_VALUE is generated if GenProgramsNV is called where n is | |

negative. | |

The error INVALID_VALUE is generated if AreProgramsResidentNV is called | |

and any of the queried programs are zero or do not exist. | |

The error INVALID_OPERATION is generated if ExecuteProgramNV executes a | |

program that does not exist. | |

The error INVALID_OPERATION is generated if ExecuteProgramNV executes a | |

program that is not a vertex state program. | |

The error INVALID_OPERATION is generated if Begin, RasterPos, or a command | |

that performs an explicit Begin is called when vertex program mode is | |

enabled and the currently bound vertex program writes program parameters | |

that are currently being tracked. | |

The error INVALID_OPERATION is generated if ExecuteProgramNV is called and | |

the vertex state program to execute writes program parameters that are | |

currently being tracked. | |

The error INVALID_VALUE is generated if TrackMatrixNV has a target of
VERTEX_PROGRAM_NV and attempts to track an address that is not a multiple
of four.

The error INVALID_VALUE is generated if GetProgramParameterNV is called to | |

query an index greater than 255 (was 95 in NV_vertex_program). | |

The error INVALID_VALUE is generated if GetVertexAttribNV is called to | |

query an <index> greater than 15, or if <index> is zero and <pname> is | |

CURRENT_ATTRIB_NV. | |

The error INVALID_VALUE is generated if GetVertexAttribPointervNV is | |

called to query an index greater than 15. | |

The error INVALID_OPERATION is generated if GetProgramivNV is called and | |

the program named id does not exist. | |

The error INVALID_OPERATION is generated if GetProgramStringNV is called | |

and the program named <program> does not exist. | |

The error INVALID_VALUE is generated if GetTrackMatrixivNV is called with | |

an <address> that is not divisible by four or greater than or equal to 256 | |

(was 96 in NV_vertex_program). | |

The error INVALID_VALUE is generated if AreProgramsResidentNV, | |

DeleteProgramsNV, GenProgramsNV, or RequestResidentProgramsNV are called | |

where <n> is negative. | |

The error INVALID_VALUE is generated if LoadProgramNV is called where | |

<len> is negative. | |

The error INVALID_VALUE is generated if ProgramParameters4dvNV or | |

ProgramParameters4fvNV are called where <count> is negative. | |

The error INVALID_VALUE is generated if VertexAttribs{1,2,3,4}{d,f,s}vNV | |

is called where <count> is negative. | |

The error INVALID_ENUM is generated if BindProgramNV, | |

GetProgramParameterfvNV, GetProgramParameterdvNV, GetTrackMatrixivNV, | |

ProgramParameter4fNV, ProgramParameter4dNV, ProgramParameter4fvNV, | |

ProgramParameter4dvNV, ProgramParameters4fvNV, ProgramParameters4dvNV, | |

or TrackMatrixNV are called where <target> is not VERTEX_PROGRAM_NV. | |

The error INVALID_ENUM is generated if LoadProgramNV or | |

ExecuteProgramNV are called where <target> is not either | |

VERTEX_PROGRAM_NV or VERTEX_STATE_PROGRAM_NV. | |

New State | |

(Modify Table X.5, New State Introduced by NV_vertex_program from the | |

NV_vertex_program specification.) | |

Get Value              Type    Get Command            Initial Value  Description         Sec       Attribute
---------------------  ------  ---------------------  -------------  ------------------  --------  ---------
PROGRAM_PARAMETER_NV   256xR4  GetProgramParameterNV  (0,0,0,0)      program parameters  2.14.1.2  -

(Modify Table X.7. Vertex Program Per-vertex Execution State. "VP1" and | |

"VP2" refer to the VP1 and VP2 execution environments, respectively.) | |

Get Value  Type   Get Command  Initial Value  Description              Sec       Attribute
---------  -----  -----------  -------------  -----------------------  --------  ---------
-          12xR4  -            (0,0,0,0)      VP1 temporary registers  2.14.1.4  -
-          16xR4  -            (0,0,0,0)      VP2 temporary registers  2.14.1.4  -
-          15xR4  -            (0,0,0,1)      vertex result registers  2.14.1.4  -
-          Z4     -            (0,0,0,0)      VP1 address register     2.14.1.3  -
-          2xZ4   -            (0,0,0,0)      VP2 address registers    2.14.1.3  -

Revision History | |

Rev.  Date      Author   Changes
----  --------  -------  --------------------------------------------
33    03/18/08  pbrown   Fixed incorrectly documented clamp in the RCC
                         instruction.

32    05/16/04  pbrown   Documented that it's not possible to get results
                         from LG2 that are any more precise than what is
                         available in the fp32 storage format.

31    08/17/03  pbrown   Added several overlooked opcodes (RCC, SUB, SIN)
                         to the grammar.  They are documented in the spec
                         body, however.

30    02/28/03  pbrown   Fixed incorrect condition code example.

29    12/08/02  pbrown   Fixed minor bug where "ABS" and "DPH" were listed
                         twice in the grammar.

28    10/29/02  pbrown   Removed support for indirect branching.  Added
                         missing o[CLPx] outputs to the grammar.  Minor
                         typo fixes.

25    07/19/02  pbrown   Fixed several miscellaneous errors in the spec.

24    06/28/02  pbrown   Fixed several erroneous resource limitations.

23    06/07/02  pbrown   Removed stray and erroneous abs() from the
                         documentation of the LG2 instruction.

22    06/06/02  pbrown   Added missing items from NV_vertex_program1_1, in
                         particular, program options.  Documented that
                         VP2.0 position-invariant programs have no
                         restrictions on indirect addressing.

21    06/19/02  pbrown   Cleaned up miscellaneous errors and issues
                         in the spec.

20    05/17/02  pbrown   Documented LOG instruction as taking the
                         absolute value of the operand, as in VP1.0.
                         Fixed special-case rules for MUL.  Added clamps
                         to special-case clamping rules for RCC.

18    05/09/02  pbrown   Clarified the handling of NaN/UN in certain
                         instructions and conditional operations.

17    04/26/02  pbrown   Fixed incorrectly specified algorithm for
                         computing the y result in the LOG instruction.

16    04/21/02  pbrown   Added example for "paletted skinning".
                         Documented size limitation (10 bits) on the
                         address register and ARA, ARL, and ARR
                         instructions.  The limit needs to be exposed
                         because of the ARA instruction.  Cleaned up
                         documentation on absolute value on input
                         operations.  Added examples for masked writes and
                         CC updates, and for branching.  Fixed
                         out-of-range indexed branch language and
                         pseudocode to clamp to the actual table size
                         (rather than the theoretical maximum).
                         Documented ABS as semi-deprecated in VP2.  Fixed
                         special cases for MIN, MAX, SEQ, SGE, SGT, SLE,
                         SLT, and SNE.  Fixed completely botched
                         description of RET.

15    04/05/02  pbrown   Updated introduction to indicate that
                         ARL/ARR/ARA all can update condition code.
                         Minor fixes and optimizations to the looping
                         examples.  Added missing "set on" opcodes to the
                         grammar.  Fixed spec to clamp branch table
                         indices to [0,15].  Added a couple caveats to
                         the "ABS" pseudo-instruction.  Documented
                         "ARR" as using IEEE round to nearest even
                         mode.  Documented special cases for "SSG".