
Name | |

NV_fragment_program | |

Name Strings | |

GL_NV_fragment_program | |

Contact | |

Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) | |

Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com) | |

Notice | |

Copyright NVIDIA Corporation, 2001-2002. | |

IP Status | |

NVIDIA Proprietary. | |

Status | |

Implemented in CineFX (NV30) Emulation driver, August 2002. | |

Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003. | |

Version | |

Last Modified Date: 2005/05/24 | |

NVIDIA Revision: 73 | |

Number | |

282 | |

Dependencies | |

Written based on the wording of the OpenGL 1.2.1 specification and | |

requires OpenGL 1.2.1. | |

Requires support for the ARB_multitexture extension with at least | |

two texture units. | |

NV_vertex_program affects the definition of this extension. The only | |

dependency is that both extensions use the same mechanisms for defining | |

and binding programs. | |

NV_texture_shader trivially affects the definition of this extension. | |

NV_texture_rectangle trivially affects the definition of this extension. | |

ARB_texture_cube_map trivially affects the definition of this extension. | |

EXT_fog_coord trivially affects the definition of this extension. | |

NV_depth_clamp affects the definition of this extension. | |

ARB_depth_texture and SGIX_depth_texture affect the definition of this | |

extension. | |

NV_float_buffer affects the definition of this extension. | |

ARB_vertex_program affects the definition of this extension. | |

ARB_fragment_program affects the definition of this extension. | |

Overview | |

OpenGL mandates a certain set of configurable per-fragment computations | |

defining texture lookup, texture environment, color sum, and fog | |

operations. Each of these areas provides a useful but limited set of fixed

operations. For example, unextended OpenGL 1.2.1 provides only four | |

texture environment modes, color sum, and three fog modes. Many OpenGL | |

extensions have either improved existing functionality or introduced new | |

configurable fragment operations. While these extensions have enabled new | |

and interesting rendering effects, the set of effects is limited by the | |

set of special modes introduced by the extension. This lack of | |

flexibility is in contrast to the high level of programmability of

general-purpose CPUs and other (frequently software-based) shading | |

languages. The purpose of this extension is to expose to the OpenGL | |

application writer an unprecedented degree of programmability in the | |

computation of final fragment colors and depth values. | |

This extension provides a mechanism for defining fragment program | |

instruction sequences for application-defined fragment programs. When in | |

fragment program mode, a program is executed each time a fragment is | |

produced by rasterization. The inputs for the program are the attributes | |

(position, colors, texture coordinates) associated with the fragment and a | |

set of constant registers. A fragment program can perform mathematical | |

computations and texture lookups using arbitrary texture coordinates. The | |

results of a fragment program are new color and depth values for the | |

fragment. | |

This extension defines a programming model including a 4-component vector | |

instruction set, 16- and 32-bit floating-point data types, and a | |

relatively large set of temporary registers. The programming model also | |

includes a condition code vector which can be used to mask register writes | |

at run-time or kill fragments altogether. The syntax, program | |

instructions, and general semantics are similar to those in the | |

NV_vertex_program and NV_vertex_program2 extensions, which provide for the | |

execution of an arbitrary program each time the GL receives a vertex. | |

The fragment program execution environment is designed for efficient | |

hardware implementation and to support a wide variety of programs. By | |

design, the entire set of existing fragment programs defined by existing | |

OpenGL per-fragment computation extensions can be implemented using the | |

extension's programming model. | |

The fragment program execution environment accesses textures via | |

arbitrarily computed texture coordinates. As such, there is no necessary | |

correspondence between the texture coordinates and texture maps previously | |

lumped into a single "texture unit". This extension separates the notion | |

of "texture coordinate sets" and "texture image units" (texture maps and | |

associated parameters), allowing implementations with a different number | |

of each. The initial implementation of this extension will support 8 | |

texture coordinate sets and 16 texture image units. | |

Issues | |

What limitations exist in this extension? | |

RESOLVED: Very few. Programs can not exceed a maximum program length | |

(which is no less than 1024 instructions), and can use no more than | |

32-64 temporary registers. Programs can not access more than one | |

fragment attribute or program parameter (constant) per instruction, | |

but can work around this restriction using temporaries. The number of | |

textures that can be used by a program is limited to the number of | |

texture image units provided by the implementation (16 in the initial | |

implementation of this extension). | |

These limits are fairly high. Additionally, there is no limit on the | |

total number of texture lookups that can be performed by a program. | |

There is no limit on the length of a texture dependency chain -- one | |

can write a program that performs over 1000 consecutive dependent | |

texture lookups. There are no restrictions on dependencies between

texture mapping instructions and arithmetic instructions. Texture | |

lookups can be performed using arbitrarily computed texture | |

coordinates. Applications can carry out their calculations with full | |

32-bit single precision, although two lower-precision modes are also | |

available. | |

How does texture mapping work with fragment programs? | |

RESOLVED: This extension provides three instructions used to perform | |

texture lookups. | |

The "TEX" instruction performs a lookup with the (s,t,r) values taken | |

from an interpolated texture coordinate, an arbitrarily computed | |

vector, or even a program constant. The "TXP" instruction performs a | |

similar lookup, except that it uses the fourth component of the source | |

vector to perform a perspective divide, using (s/q, t/q, r/q). In

both cases, the GL will automatically compute partial derivatives used | |

for filter and LOD selection. | |
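
As a rough illustration of the coordinate handling described above, the following C sketch shows the perspective divide that "TXP" applies to its source vector before the lookup. The type and function names are hypothetical, and the case q == 0 is simply excluded by an assertion rather than modeled.

```c
#include <assert.h>

/* Hypothetical 4-component vector standing in for a source operand. */
typedef struct { float s, t, r, q; } TexCoord4;

/* TXP-style projection: divide s, t, and r by the fourth component.
 * The returned q is set to 1 only for convenience in this sketch. */
TexCoord4 txp_coords(TexCoord4 c)
{
    TexCoord4 out;
    assert(c.q != 0.0f);   /* behavior for q == 0 is not modeled here */
    out.s = c.s / c.q;
    out.t = c.t / c.q;
    out.r = c.r / c.q;
    out.q = 1.0f;
    return out;
}
```

A "TEX" lookup would use (s, t, r) unchanged, so the two instructions differ only in this divide.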

The "TXD" instruction operates like "TEX", except that it allows the | |

program to explicitly specify two additional vectors containing the | |

partial derivatives of the texture coordinate with respect to x and y | |

window coordinates. | |

All three instructions write a filtered texel value to a temporary or | |

output register. Other than the computation of texture coordinates | |

and partial derivatives, texture lookups are not performed any differently

in fragment program mode. In particular, any applicable LOD biases, | |

wrap modes, minification and magnification filters, and anisotropic | |

filtering controls are still applied in fragment program mode. | |

The results of the texture lookup are available to be used arbitrarily | |

by subsequent fragment program instructions. Fragment programs are | |

allowed to access any texture map arbitrarily many times. | |

Can fragment programs be used to compute depth values? | |

RESOLVED: Yes. A fragment program can perform arbitrary | |

computations to compute a final value for the fragment, which it | |

should write to the "z" component of the o[DEPR] register. The "z" | |

value written should be in the range [0,1], regardless of the size of | |

the depth buffer. | |

To assist in the computation of the final Z value, a fragment program | |

can access the interpolated depth of the fragment (prior to any | |

displacement) by reading the "z" component of the f[WPOS] attribute | |

register. | |

How should near and far plane clipping work in fragment program mode if | |

the current fragment program computes a depth value? | |

RESOLVED: Geometric clipping to the near and far clip plane should be | |

disabled. Clipping should be done based on the depth values computed | |

per-fragment. The rationale is that per-fragment depth displacement | |

operations may effectively move portions of a primitive initially | |

outside the clip volume inside, and vice versa. | |

Note that under the NV_depth_clamp extension, geometric clipping to | |

the near and far clip planes is also disabled, and the fragment depth | |

values are clamped to the depth range. If depth clamp mode is enabled | |

when using a fragment program that computes a depth value, the | |

computed depth value will be clamped to the depth range. | |

Should fragment programs be allowed to use multiple precisions for | |

operands and operations? | |

RESOLVED: Yes. Low-precision operands are generally adequate for | |

representing colors. Allowing low-precision registers also allows for | |

a larger number of temporary registers (at lower precision). | |

Low-precision operations also provide the opportunity for a higher | |

level of performance. | |

Applications are free to use only high-precision operations or mix | |

high- and low-precision operations as necessary. | |

What levels of precision are supported in arithmetic operations? | |

RESOLVED: Arithmetic operations can be performed at three different | |

precisions. 32-bit floating point precision (fp32) uses the IEEE | |

single-precision standard with a sign bit, 8 exponent bits, and 23 | |

mantissa bits. 16-bit floating-point precision (fp16) uses a similar | |

floating-point representation, but with 5 exponent bits and 10 | |

mantissa bits. Additionally, many arithmetic operations can also be | |

carried out at 12-bit fixed point precision (fx12), where values in | |

the range [-2,+2) are represented as signed values with 10 fraction | |

bits. | |
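
The fx12 format described above can be sketched in C as a signed integer code with 10 fraction bits. This is only an illustration: the function names are hypothetical, and round-to-nearest and the treatment of NaN are assumptions, since the text above does not fix them.

```c
/* fx12 as described: signed, 12 bits total, 10 fraction bits,
 * covering the range [-2, +2). */
#define FX12_ONE       1024      /* 1.0 expressed in fx12 codes (2^10) */
#define FX12_MIN_CODE  (-2048)   /* represents -2.0 */
#define FX12_MAX_CODE  2047      /* represents 2047/1024, just under +2.0 */

/* Convert a float to an fx12 code, clamping out-of-range values.
 * Rounding to nearest is an assumption of this sketch. */
int fx12_from_float(float x)
{
    float scaled;
    if (x != x)
        return 0;                /* NaN: mapped to 0 here by assumption */
    scaled = x * (float)FX12_ONE;
    if (scaled <= (float)FX12_MIN_CODE)
        return FX12_MIN_CODE;
    if (scaled >= (float)FX12_MAX_CODE)
        return FX12_MAX_CODE;
    return (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Convert an fx12 code back to a float; this direction is exact. */
float fx12_to_float(int code)
{
    return (float)code / (float)FX12_ONE;
}
```

With these definitions, adjacent fx12 values are 1/1024 apart, which is coarser than fp16 near zero but uniform across the whole range.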

How should the precision with which operations are carried out be | |

specified? Should we infer the precision from the types of the operands | |

or result vectors? Or should it be an attribute of the instruction? | |

RESOLVED: Applications can optionally specify the precision of | |

individual instructions by adding a suffix of "R", "H", and "X" to | |

instruction names to select fp32, fp16, and fx12 precision, | |

respectively. | |

By default, instructions will be carried out using the precision of | |

the destination register. Always inferring the precision from the | |

operands has a number of issues. First, there are a number of | |

operations (e.g., TEX/TXP/TXD) where result type has little to no | |

correspondence to the type of the operands. In these cases, precision

suffixes are not supported. Second, one could have instructions | |

automatically cast operands and compute results using the type of the | |

highest precision operand or result. This behavior would be | |

problematic since all fragment attribute registers and program | |

parameters are kept at full precision, but full precision may not be | |

needed by the operation. | |

The choice of precision level allows programs to trade off precision | |

for potentially higher performance. Giving the program explicit | |

control over the precision also allows it to dictate precision | |

explicitly and eliminate any uncertainty over type casting. | |

For instructions whose specified precision is different than the precision | |

of the operands or the result registers, how are the operations performed? | |

How are the condition codes updated? | |

RESOLVED: Operations are performed with operands and results at the | |

precision specified by the instruction. After the operation is | |

complete, the result is converted to the precision of the destination | |

register, after which the condition code is generated. | |

In an alternate approach, the condition code could be generated from | |

the result. However, in some cases, the register contents would not | |

match the condition code. In such cases, it may not be reliable to | |

use the condition code to prevent division by zero or other special | |

cases. | |
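
The ordering described above (operate, convert to the destination precision, then generate the condition code) can be illustrated with a small C model. Everything here is a sketch under assumptions: the names are hypothetical, an fx12 destination is modeled by rounding to steps of 1/1024, and the unordered (NaN) case is left out.

```c
typedef enum { CC_LT, CC_EQ, CC_GT } CondCode;

/* Stand-in for storing into an fx12 destination: clamp to [-2, 2047/1024]
 * and round to the nearest multiple of 1/1024 (rounding is an assumption). */
float quantize_fx12(float x)
{
    float scaled = x * 1024.0f;
    int code;
    if (scaled <= -2048.0f) return -2.0f;
    if (scaled >= 2047.0f)  return 2047.0f / 1024.0f;
    code = (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
    return (float)code / 1024.0f;
}

/* Condition code for a single component (NaN handling omitted). */
CondCode cond_code(float v)
{
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;
}

/* Multiply at fp32, store to an fx12 destination, then derive the
 * condition code from the value that was actually stored. */
CondCode mul_to_fx12(float a, float b, float *dest)
{
    float result = a * b;            /* operation at instruction precision */
    *dest = quantize_fx12(result);   /* conversion to destination precision */
    return cond_code(*dest);         /* code matches the register contents */
}
```

For inputs 0.01 and 0.01 the fp32 product is small but positive, while the stored fx12 value is zero; the resulting code is EQ, which agrees with what a later instruction would read from the register.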

How does this extension interact with the ARB_multisample extension? In | |

the ARB_multisample extension, each fragment has multiple depth values. | |

In this extension, a single interpolated depth value may be modified by a | |

fragment program. | |

RESOLVED: The depth values for the extra samples are generated by | |

computing partials of the computed depth value and using these | |

partials to derive the depth values for each of the extra samples. | |
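
One plausible reading of this resolution, shown as a C sketch, is a first-order extrapolation from the computed depth using its partial derivatives. The function name, the parameterization of sample positions as offsets from the fragment center, and the linear form itself are assumptions for illustration; the text above says only that partials are used.

```c
/* Depth at a sample located (dx, dy) pixels from the fragment center,
 * extrapolated linearly from the program-computed depth z_center and
 * its partial derivatives with respect to window x and y. */
float sample_depth(float z_center, float dzdx, float dzdy,
                   float dx, float dy)
{
    return z_center + dzdx * dx + dzdy * dy;
}
```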

How does this extension interact with polygon offset? Both extensions | |

modify fragment depth values. | |

RESOLVED: As in the base OpenGL spec, the depth offset generated by | |

polygon offset is added during polygon rasterization. The depth value | |

provided to programs in f[WPOS].z already includes polygon offset, if | |

enabled. If the depth value is replaced by a fragment program, the | |

polygon offset value will NOT be recomputed and added back after | |

program execution. | |

This is probably not desirable for fragment programs that modify depth | |

values since the partials used to generate the offset may not match | |

the partials of the computed depth value. Polygon offset for filled | |

polygons can be approximated in a fragment program using the depth | |

partials obtained by the DDX and DDY instructions. This will not work | |

properly for line- and point-mode polygons, since the partials used | |

for offset are computed over the polygon, while the partials resulting | |

from the DDX and DDY instructions are computed along the line (or are | |

zero for point-mode polygons). In addition, separate treatment of | |

points, line segments, and polygons is not possible in a fragment | |

program. | |
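
The approximation mentioned above can be sketched in C using the base OpenGL polygon offset formula, o = m * factor + r * units, with m approximated by the larger of the two depth slopes. The function name is hypothetical, and r (the minimum resolvable depth difference) is implementation-dependent, so it is passed in as a parameter here.

```c
/* Approximate polygon offset from per-fragment depth partials, as a
 * fragment program might compute it from DDX/DDY results.
 *   dzdx, dzdy : partial derivatives of the computed depth
 *   factor     : polygon offset factor
 *   units      : polygon offset units
 *   r          : minimum resolvable depth difference (assumed known) */
float approx_polygon_offset(float dzdx, float dzdy,
                            float factor, float units, float r)
{
    float ax = dzdx < 0.0f ? -dzdx : dzdx;
    float ay = dzdy < 0.0f ? -dzdy : dzdy;
    float m  = ax > ay ? ax : ay;    /* maximum depth slope approximation */
    return m * factor + r * units;
}
```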

Should depth component replacement be a property of the fragment program

or a separate enable? | |

RESOLVED: It should be a program property. Using the output register | |

notation simplifies matters: depth components are replaced if and | |

only if the DEPR register is written to. This alleviates the | |

application and driver burden of maintaining separate state. | |

How does this extension affect the handling of q texture coordinates in | |

the OpenGL spec? | |

RESOLVED: Fragment programs are allowed to access an associated q | |

texture coordinate, so this attribute must be produced by | |

rasterization. In unextended OpenGL 1.2, the q coordinate is | |

eliminated in the rasterization portions of the spec after dividing | |

each of s, t, and r by it. This extension updates the specification | |

to pass q coordinates through at least to conventional texture | |

mapping. When fragment program mode is disabled, q coordinates will

be eliminated there in an identical manner. This modification has the | |

added benefit of simplifying the equations used for attribute | |

interpolation. | |

How should clip w coordinates be handled by this extension? | |

RESOLVED: Fragment programs are allowed to access the reciprocal of | |

the clip w coordinate, so this attribute must be produced by | |

rasterization. The OpenGL 1.2 spec doesn't explicitly enumerate the

attributes associated with the fragment, but we add treatment of the w | |

clip coordinate in the appropriate locations. | |

The reciprocal of the clip w coordinate in traditional graphics | |

hardware is produced by screen-space linear interpolation of the | |

reciprocals of the clip w coordinates of the vertices. However, this | |

spec says the clip w coordinate is produced by perspective-correct | |

interpolation of the (non-reciprocated) clip w vertex coordinates. | |

These two formulations turn out to be equivalent, and the latter is | |

more convenient since the core OpenGL spec already contains formulas | |

for perspective-correct interpolation of vertex attributes. | |
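
The equivalence claimed above can be checked numerically. With screen-space weights a and b (a + b = 1) along an edge, perspective-correct interpolation of an attribute f gives (a*f0/w0 + b*f1/w1) / (a/w0 + b/w1); substituting f = w makes the numerator a + b = 1, so the result is the reciprocal of the linearly interpolated 1/w. The C sketch below restricts itself to two vertices for brevity, which is a simplification, and its function names are hypothetical.

```c
/* Screen-space linear interpolation of 1/w between two vertices;
 * t is the screen-space interpolation parameter in [0, 1]. */
float recip_w_linear(float w0, float w1, float t)
{
    return (1.0f - t) / w0 + t / w1;
}

/* Perspective-correct interpolation of attribute values f0 and f1. */
float perspective_interp(float f0, float f1,
                         float w0, float w1, float t)
{
    float a = (1.0f - t) / w0;
    float b = t / w1;
    return (a * f0 + b * f1) / (a + b);
}
```

Calling perspective_interp with f0 = w0 and f1 = w1 yields the interpolated clip w; its reciprocal should agree with recip_w_linear up to floating-point rounding.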

What is produced by the TEX/TXP/TXD instructions if the requested texture | |

image is inconsistent? | |

RESOLVED: The result vector is specified to be (0,0,0,0). This | |

behavior is consistent with the NV_texture_shader extension. Note | |

that like in NV_texture_shader, these instructions ignore the standard | |

hierarchy of texture enables and programs can access textures that are | |

not specifically "enabled". | |

Should a minimum precision be specified for certain fragment attribute | |

registers (in particular COL0, COL1) that may not be generated with full | |

fp32 precision? | |

RESOLVED: No. It is expected that the precision of COL0/COL1 should | |

generally be at least as high as that of the frame buffer. | |

Fragment color components (f[COL0] and f[COL1]) are generally | |

low-precision fixed-point values in the range [0,1]. Is it possible to | |

pass unclamped or high-precision color components to fragment programs? | |

RESOLVED: Yes, although you can't exactly call them "colors". | |

High-precision per-vertex color values can be written into any unused | |

texture coordinate set, either via a MultiTexCoord call or using a | |

vertex program. These "texture coordinates" will be interpolated | |

during rasterization, and can be used arbitrarily by a fragment | |

program. | |

In particular, there is no requirement that per-fragment attributes | |

called "texture coordinates" be used for texture mapping. | |

Should this specification guarantee that temporary registers are | |

initialized to zero? | |

RESOLVED: Yes. This will allow for the modular construction of | |

programs that accumulate results in registers. For example, | |

per-fragment lighting may use MAD instructions to accumulate color | |

contributions at each light. Without zero-initialization, the program | |

would require an explicit MOV instruction to load 0 or the use of the | |

MUL instruction for the first light. | |

Should this specification support Unicode program strings? | |

RESOLVED: Not necessary. | |

Programs defined by NV_vertex_program begin with "!!VP1.0". Should | |

fragment programs have a similar identifier? | |

RESOLVED: Yes, "!!FP1.0", identifying the first revision of this | |

fragment program language. | |

Should per-fragment attributes have equivalent integer names in the | |

program language, as per-vertex attributes do in NV_vertex_program? | |

RESOLVED: No. In NV_vertex_program, "generic" vertex attributes | |

could be specified directly by an application using only an attribute | |

number. Those numbers may have no necessary correlation with the | |

conventional attribute names, although conventional vertex attributes | |

are mapped to attribute numbers. However, conventional attributes are | |

the only outputs of vertex programs and of rasterization. Therefore, | |

there is no need for a similar input-by-number functionality for | |

fragment programs. | |

Should we provide the ability to issue instructions that do not update | |

temporary or output registers? | |

RESOLVED: Yes. Programs may issue instructions whose only purpose is | |

to update the condition code register, and requiring such instructions | |

to write to a temporary may require the use of an additional temporary | |

and/or defeat possible program optimizations. We accomplish this by | |

adding two write-only temporary pseudo-registers ("RC" and "HC") that | |

can be specified as destination registers. | |

Do the packing and unpacking instructions in this extension make any | |

sense? | |

RESOLVED: Yes. They are useful for packing and unpacking multiple | |

components in a single channel of a floating-point frame buffer. For | |

example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities | |

or 8 16-bit quantities, all of which could be used in later | |

rasterization passes. See the NV_float_buffer extension for more | |

information. | |
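
To make the packing idea concrete, here is a C sketch that packs four [0,1] values into one 32-bit channel as unsigned bytes and unpacks them again. It is written in the spirit of the packing instructions rather than as their definition: the function names are hypothetical, and the byte order (first component in the low byte) and rounding rule are assumptions of this sketch.

```c
#include <stdint.h>

/* Clamp to [0,1] and convert to an 8-bit unsigned value.
 * NaN falls through the first test and becomes 0. */
static uint32_t to_ub(float v)
{
    if (!(v > 0.0f)) return 0u;
    if (v >= 1.0f)   return 255u;
    return (uint32_t)(v * 255.0f + 0.5f);
}

/* Pack four components into one 32-bit word, x in the low byte. */
uint32_t pack4ub(float x, float y, float z, float w)
{
    return to_ub(x) | (to_ub(y) << 8) | (to_ub(z) << 16) | (to_ub(w) << 24);
}

/* Unpack a 32-bit word back into four floats in [0,1]. */
void unpack4ub(uint32_t bits, float out[4])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
}
```

Applied to each of the four 32-bit channels of a 128-bit frame buffer, this scheme gives the 16 8-bit quantities mentioned above.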

Should we provide a method for specifying a fp16 depth component output | |

value? | |

RESOLVED: No. There is no good reason for supporting half-precision | |

Z outputs. Even with 16-bit Z buffers, the 10-bit mantissa of the | |

half-precision float is rather limiting. There would effectively be | |

only 11 good bits in the back half of the Z buffer. | |

Should RequestResidentProgramsNV (or a new equivalent function) take a | |

target? Dealing with working sets of different program types is a bit | |

messy. Should we document some limitation if we get programs of different | |

types? | |

RESOLVED: In retrospect, it may have been a good idea to attach a | |

target to this command, but there isn't a good reason to mess with | |

something that already works for vertex programs. The driver is | |

responsible for ensuring consistent results when the program types | |

specified are mixed. | |

What happens on data type conversions where the original value is not | |

exactly representable in the new data type, either due to overflow or | |

insufficient precision in the destination type? | |

RESOLVED: In case of overflow, the original value is clamped to

+/-INF (fp16 or fp32) or the nearest representable value (fx12). In

case of imprecision, the value is either rounded or truncated to

the nearest representable value.

Should this extension support IEEE-style denorms? For 32-bit IEEE | |

floating point, denorms are numbers smaller in absolute value than 2^-126. | |

For 16-bit floats used by this extension, denorms are numbers smaller in | |

absolute value than 2^-14. | |

RESOLVED: For 32-bit data types, hardware support for denorms was | |

considered too expensive relative to the benefit provided. | |

Computational results that would otherwise produce denorms are flushed | |

to zero. For 16-bit data types, hardware denorm support will be | |

present. The expense of hardware denorm support is lower and the | |

potential precision benefit is greater for 16-bit data types. | |
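
The 32-bit flush-to-zero behavior described above can be modeled in C as follows. This is a sketch only: the function name is hypothetical, and returning positive zero regardless of the sign of the input is an assumption.

```c
#include <float.h>

/* Model of fp32 results with no denorm support: any value whose
 * magnitude is below the smallest normal number (FLT_MIN, 2^-126)
 * is replaced by zero. NaN and normal values pass through unchanged. */
float flush_denorm_fp32(float x)
{
    float mag = x < 0.0f ? -x : x;
    if (mag < FLT_MIN)
        return 0.0f;   /* sign of the flushed zero is an assumption */
    return x;
}
```

No such step applies to the 16-bit type, where values below 2^-14 remain representable as denorms.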

OpenGL provides a hierarchy of texture enables. The texture lookup | |

operations in NV_texture_shader effectively override the texture enable | |

hierarchy and select a specific texture to enable. What should be done by | |

this extension? | |

RESOLVED: This extension will build upon NV_texture_shader and reduce | |

the driver overhead of validating the texture enables. Texture | |

lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2, | |

3D", which would indicate to use texture coordinate set number 2 to do | |

a lookup in the texture object bound to the TEXTURE_3D target in | |

texture image unit 2. | |

Each texture unit can have only one "active" target. Programs are not | |

allowed to reference different texture targets in the same texture | |

image unit. In the example above, any other texture instructions | |

using texture image unit 2 must specify the 3D texture target. | |
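
The one-target-per-image-unit rule lends itself to a simple check while a program is being parsed. The C sketch below is a hypothetical validator, not a description of any driver: all names are invented for illustration, and the unit count of 16 is taken from the initial implementation mentioned in the Overview.

```c
#define NUM_IMAGE_UNITS 16   /* initial implementation, per the Overview */

typedef enum {
    TARGET_NONE = 0,         /* unit not yet referenced by the program */
    TARGET_1D, TARGET_2D, TARGET_3D, TARGET_CUBE, TARGET_RECT
} TexTarget;

typedef struct {
    TexTarget active[NUM_IMAGE_UNITS];
} TargetTracker;

/* Record that a texture instruction uses 'target' on image unit 'unit'.
 * Returns 1 if the use is consistent with earlier instructions, and 0
 * if the unit is out of range or was already used with another target. */
int note_texture_use(TargetTracker *t, unsigned unit, TexTarget target)
{
    if (unit >= NUM_IMAGE_UNITS || target == TARGET_NONE)
        return 0;
    if (t->active[unit] == TARGET_NONE) {
        t->active[unit] = target;    /* first reference fixes the target */
        return 1;
    }
    return t->active[unit] == target;
}
```

In the example above, a first "TEX" on unit 2 with the 3D target would succeed, and a later instruction on unit 2 naming the 2D target would be rejected.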

What is the interaction with NV_register_combiners? | |

RESOLVED: Register combiners are not available when fragment programs | |

are enabled. | |

Previous versions of this specification supported the notion of

combiner programs, where the result of fragment program execution was | |

a set of four "texture lookup" values that fed the register combiners. | |

For convenience, should we include pseudo-instructions not present in the | |

hardware instruction set that are trivially implementable? For example, | |

absolute value and subtract instructions could fall in this category. An | |

"ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB | |

R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

RESOLVED: In general, yes. A SUB instruction is provided for | |

convenience. This extension does not provide a separate ABS | |

instruction because it supports absolute value operations of each | |

operand. | |

Should there be a '+' in the <optionalSign> portion of the grammar? There | |

isn't one in the GL_NV_vertex_program spec. | |

RESOLVED: Yes, for orthogonality/readability. A '+' obviously adds | |

no functionality. In NV_vertex_program, an <optionalSign> of "-" was | |

always a negation operator. However, in fragment programs, it can | |

also be used as a sign for a constant value. | |

Can the same fragment attribute register, program parameter register, or | |

constants be used for multiple operands in the same instruction? If so, | |

can it be used with different swizzle patterns? | |

RESOLVED: Yes and yes. | |

This extension allows different limits for the number of texture | |

coordinate sets and the number of texture image units (i.e., texture maps | |

and associated data). The state in ActiveTextureARB affects both | |

coordinate sets (TexGen, matrix operations) and image units (TexParameter, | |

TexEnv). How should we deal with this? | |

RESOLVED: Continue to use ActiveTextureARB and emit an | |

INVALID_OPERATION if the active texture refers to an unsupported | |

coordinate set/image unit. Other options included creating dummy | |

(unusable) state for unsupported coordinate sets/image units and | |

continue to use ActiveTextureARB normally, or creating separate state | |

and state-setting commands for coordinate sets and image units. | |

Separate state is the cleanest solution, but would add more calls and | |

potentially cause more programmer confusion. Dummy state would avoid | |

additional error checks, but the demands of dummy state could grow if | |

the number of texture image units and texture coordinate sets | |

increases. | |

The current OpenGL spec is vague as to what state is affected by the | |

active texture selector and has no distinction between

coordinate-related and image-related state. The state tables could | |

use a good clean-up in this area. | |

The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2" | |

is R0*R1+(1-R0)*R2. There are conflicting precedents here. The | |

definition here matches the "lrp" instruction in the DirectX 8.0 pixel | |

shader language. However, an equivalent RenderMan lerp operation would | |

yield a result of (1-R0)*R1+R0*R2. Which ordering should be implemented? | |

RESOLVED: NVIDIA hardware implements the former operand ordering, and | |

there is no good reason to specify a different ordering. To convert a | |

"LRP" using the latter ordering to NV_fragment_program, swap the third | |

and fourth arguments. | |
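
The two operand orderings can be compared directly with a scalar C sketch. The function names are hypothetical, and the destination register is omitted, so the "third and fourth arguments" of the instruction correspond to the second and third parameters here.

```c
/* Ordering used by this extension: f*a + (1-f)*b. */
float lrp_nv(float f, float a, float b)
{
    return f * a + (1.0f - f) * b;
}

/* The other ordering cited above: (1-f)*a + f*b. */
float lerp_alt(float f, float a, float b)
{
    return (1.0f - f) * a + f * b;
}
```

Swapping the last two arguments converts one form into the other, so lerp_alt(f, a, b) gives the same value as lrp_nv(f, b, a).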

Should this extension provide tracking of matrices or any other state, | |

similar to that provided in NV_vertex_program? | |

RESOLVED: No. | |

Should this extension provide global program parameters -- values shared | |

between multiple fragment programs? | |

RESOLVED: No. | |

Should this extension provide program parameters specific to a program? | |

If so, how? | |

RESOLVED: Yes. These parameters will be called "local parameters". | |

This extension will provide both named and numbered local parameters. | |

Local parameters can be managed by the driver and eliminate the need | |

for applications to manage a global name space. | |

Named local parameters work much like standard variable names in most | |

programming languages. They are created using the "DECLARE" | |

instruction within the fragment program itself. For example: | |

DECLARE color = {1,0,0,1}; | |

Named local parameters are used simply by referencing the variable | |

name. They do not require the array syntax like the global parameters | |

in the NV_vertex_program extension. They can be updated using the | |

commands ProgramNamedParameter4[f,fv]NV. | |

Numbered local parameters are not declared. They are used by simply | |

referencing an element of an array called "p". For example, | |

MOV R0, p[12]; | |

loads the value of numbered local parameter 12 into register R0. | |

Numbered local parameters can be updated using the commands | |

ProgramLocalParameter4[d,dv,f,fv]ARB. | |

The numbered local parameter APIs were added to this extension late in | |

its development, and are provided for compatibility with the | |

ARB_vertex_program extension, and what will likely be supported in | |

ARB_fragment_program as well. Providing this mechanism allows | |

programs to use the same mechanisms to set local parameters in both | |

extensions.

Why are the APIs for setting named and numbered local parameters | |

different? | |

RESOLVED: The named parameter API was created prior to | |

ARB_vertex_program (and the possible future ARB_fragment_program) and | |

uses conventions borrowed from NV_vertex_program. A slightly | |

different API was chosen during the ARB standardization process; see | |

the ARB_vertex_program specification for more details. | |

The named parameter API takes a program ID and a parameter name, and | |

sets the parameter for the program with the specified ID. The | |

specified program does not need to be bound (via BindProgramNV) in | |

order to modify the values of its named parameters. The numbered | |

parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a | |

parameter number and modifies the corresponding numbered parameter of | |

the currently bound program. | |

What should be the initial value of uninitialized local parameters? | |

RESOLVED: (0,0,0,0). This choice is somewhat arbitrary, but matches | |

previous extensions (e.g., NV_vertex_program). | |

Should this extension support program parameter arrays? | |

RESOLVED: No hardware support is present. Note that from the point | |

of view of a fragment program, a texture map can be used as a 1-, 2-, | |

or 3-dimensional array of constants. | |

Should this extension support constants in fragment programs? If

so, how? | |

RESOLVED: Yes. Scalar or vector constants can be defined inline | |

(e.g., "1.0" or "{1,2,3,4}"). In addition, named constants are | |

supported using the "DEFINE" instruction, which allows programmers to

change the values of constants used in multiple instructions simply by

changing the value assigned to the named constant. | |

Note that because this extension uses program strings, the | |

floating-point value of any constants generated on the fly must be | |

printed to the program string. An alternate method that avoids the | |

need to print constants is to declare a named local program parameter | |

and initialize it with the ProgramNamedParameter4[f,fv]() calls. | |

Should named constants be allowed to be redefined? | |

RESOLVED: No. If you want to redefine the values of constants, you | |

can create an equivalent named program parameter by changing the | |

"DEFINE" keyword to "DECLARE". | |

Should functions used to update or query named local parameters take a | |

zero-terminated string (as with most strings in the C programming | |

language), or should they require an explicit string length? If the | |

former, should we create a version of LoadProgramNV that does not require | |

a string length?

RESOLVED: Stick with explicit string length. Strings that are | |

defined as constants can have the length computed at compile-time. | |

Strings read from files will have the length known in advance. | |

Programs that build strings at run-time also likely keep the length

up-to-date. Passing an explicit length saves time, since the driver | |

doesn't have to do a strlen(). | |
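
For a program string defined as a constant, the compile-time length mentioned above can be obtained with sizeof, as in the following C sketch. The program text is illustrative only and has not been validated against a driver, and the identifier names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative fragment program text (an assumption of this sketch). */
const char kProgram[] =
    "!!FP1.0\n"
    "MOV o[COLR], f[COL0];\n"
    "END\n";

/* sizeof includes the terminating NUL, so subtract one. The value is
 * a compile-time constant and needs no strlen() call at run time. */
enum { kProgramLen = sizeof(kProgram) - 1 };

/* Run-time equivalent, shown only for comparison. */
size_t program_length_runtime(const char *s)
{
    return strlen(s);
}
```

The constant kProgramLen is the value that would be passed as the explicit length when loading the program.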

What is the deal with the alpha of the secondary color? | |

RESOLVED: In unextended OpenGL 1.2, the alpha component of the | |

secondary color is forced to 0.0. In the EXT_secondary_color | |

extension, the alpha of the per-vertex secondary colors is defined to | |

be 0.0. NV_vertex_program allows vertex programs to produce a | |

per-vertex alpha component, but it is forced to zero for the purposes | |

of the color sum. In the NV_register_combiners extension, the alpha | |

component of the secondary color is undefined. What a mess. | |

In this extension, the alpha of the secondary color is well-defined | |

and can be used normally. When in vertex program mode | |

Why are fragment program instructions involving f[FOGC] or f[TEX0] through | |

f[TEX7] automatically carried out at full precision? | |

RESOLVED: This is an artifact of the method by which these interpolants
are generated in the NVIDIA graphics hardware. If such instructions
absolutely must be carried out at lower precision, the requirement can
be met by first loading the interpolants into a temporary register.

With a different number of texture coordinate sets and texture image | |

units, how many copies of each kind of texture state are there? | |

RESOLVED: The intention is that texture state be broken into three
groups. (1) There are MAX_TEXTURE_COORDS_NV copies of texture
coordinate set state, which includes current texture coordinates,
TexGen state, and texture matrices. (2) There are
MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which
includes texture maps, texture parameters, and LOD bias parameters. (3)
There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit
state (e.g., texture enables, TexEnv blending state), all of which are
unused when in fragment program mode.

It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum
of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --
implementations may choose not to extend fixed-function OpenGL texture
mapping modes beyond a certain point.

The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end
up with programs larger than 64KB. This will overflow the limits of the
GLX Render protocol, resulting in the need to use the RenderLarge path.
This is an issue with vertex programs, also.

RESOLVED: Yes, it is. | |

Should textures used by fragment programs be declared? For example, | |

"TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all | |

accesses to texture unit 3. The dimension could be dropped from the TEX | |

family of instructions, and some of the compile-time error checking could | |

be dropped. | |

RESOLVED: Maybe it should be, but for better or worse, it isn't. | |

It is not all that uncommon to have negative q values with projective | |

texture mapping, but results are undefined if any q values are negative in | |

this specification. Why? | |

RESOLVED: This restriction carries on a similar one in the initial
OpenGL specification. The motivation for this restriction is that
when interpolating, it is possible for a fragment to have an
interpolated q coordinate at or near 0.0. Since the texture
coordinates used for projective texture mapping are s/q, t/q, and r/q,
this will result in a divide-by-zero error or in significant
numerical instability. Results will be inaccurate for such fragments.

Other than the numerical stability issue above, NVIDIA hardware should | |

have no problems with negative q coordinates. | |

Should programs that replace depth have their own special program type,
such as "!!FPD1.0" and "!!FPDC1.0"?

RESOLVED: No. If a program has an instruction that writes to | |

o[DEPR], the final fragment depth value is taken from o[DEPR].z. | |

Otherwise, the fragment's original depth value is used. | |

What fx12 value should NaN map to? | |

RESOLVED: For the lack of any better choice, 0.0. | |

How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for | |

arithmetic and comparison operations? | |

RESOLVED: The special cases for all floating-point operations are | |

designed to match the IEEE specification for floating-point numbers as | |

closely as possible. The results produced by special cases should be | |

enumerated in the sections of this spec describing the operations. | |

There are some cases where the implemented fragment program behavior | |

does not match IEEE conventions, and these cases should be noted in | |

this specification. | |

How can condition codes be used to mask out register writes? How about | |

killing fragments? What other things can you do? | |

RESOLVED: The following example computes a component-wise |R1-R2|:

SUBC R0, R1, R2; # "C" suffix means update condition code | |

MOV R0 (LT), -R0; # Conditional write mask in parentheses | |

The first instruction computes a component-wise difference between R1
and R2, storing R1-R2 in register R0. The "C" suffix in the
instruction means to update the condition code based on the sign of
the result vector components. The second instruction inverts the sign
of the components of R0. However, the "(LT)" portion says that the
destination register should be updated only if the corresponding
condition code component is LT (negative). This means that only those
components of R0 that are negative have their sign inverted, leaving
the component-wise absolute value of R1-R2 in R0.

To kill a fragment if the red (x) component of a texture lookup | |

returns zero: | |

TEXC R0, f[TEX0], TEX0, 2D; | |

KIL EQ.x; | |

To kill based on the green (y) component, use "EQ.y" instead. To kill | |

if any of the four components is zero, use "EQ.xyzw" or just "EQ". | |

Fragment programs do not support boolean expressions. These can
generally be achieved using conditional write masks.

To evaluate the expression "(R0.x == 0) && (R1.x == 0)": | |

MOVC RC.x, R0.x; | |

MOVC RC.x (EQ), R1.x; | |

To evaluate the expression "(R0.x == 0) || (R1.x == 0)": | |

MOVC RC.x, R0.x; | |

MOVC RC.x (NE), R1.x; | |

In both cases, the x component of the condition code will contain "EQ" | |

if and only if the condition is TRUE. | |
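The two MOVC sequences above can be checked with a small Python model of a single condition code component. This is only a sketch: the function names are hypothetical, NaN inputs are not modeled, and it assumes that a masked-out MOVC leaves the condition code unchanged and that "NE" passes for any code other than EQ.

```python
def classify(value):
    """Condition code produced by a (non-NaN) result component."""
    if value < 0:
        return "LT"
    if value > 0:
        return "GT"
    return "EQ"

def movc_rc_x(cc_x, src, rule=None):
    """Model "MOVC RC.x (rule), src;" acting on CC.x only."""
    passes = rule is None or (cc_x == "EQ") == (rule == "EQ")
    # When the mask fails, neither RC nor the condition code is written.
    return classify(src) if passes else cc_x

def both_zero(r0_x, r1_x):
    cc = movc_rc_x("EQ", r0_x)           # MOVC RC.x, R0.x;
    cc = movc_rc_x(cc, r1_x, "EQ")       # MOVC RC.x (EQ), R1.x;
    return cc == "EQ"

def either_zero(r0_x, r1_x):
    cc = movc_rc_x("EQ", r0_x)           # MOVC RC.x, R0.x;
    cc = movc_rc_x(cc, r1_x, "NE")       # MOVC RC.x (NE), R1.x;
    return cc == "EQ"
```

Running both helpers over zero and non-zero inputs reproduces the truth tables of "&&" and "||" for the "== 0" tests.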

How can fragment programs be used to implement non-standard texture | |

filtering modes? | |

RESOLVED: As one example, consider a case where you want to do linear | |

filtering in a 2D texture map, but only horizontally. To achieve | |

this, first set the texture filtering mode to NEAREST. For a 16 x n | |

texture, you might do something like: | |

DEFINE halfTexel = { 0.03125, 0 }; # 1/32 (1/2 a texel) | |

ADD R2, f[TEX0], -halfTexel; # coords of left sample | |

ADD R1, f[TEX0], +halfTexel; # coords of right sample | |

TEX R0, R2, TEX0, 2D; # lookup left sample | |

TEX R1, R1, TEX0, 2D; # lookup right sample | |

MUL R2.x, R2.x, 16; # scale X coords to texels | |

FRC R2.x, R2.x; # get fraction, filter weight | |

LRP R0, R2.x, R1, R0; # blend samples based on weight | |

There are plenty of other interesting things that can be done. | |
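To see why the instruction sequence above amounts to horizontal linear filtering, the following Python sketch mirrors it on a single row of texels. The function names are hypothetical, and clamping out-of-range lookups to the edge texel is an assumption; the real behavior depends on the texture wrap mode.

```python
import math

def nearest_lookup(row, s):
    """NEAREST lookup into one texture row; clamps to the edge texels."""
    i = int(math.floor(s * len(row)))
    return row[min(max(i, 0), len(row) - 1)]

def horizontal_linear(row, s):
    """Mirror the ADD/TEX/MUL/FRC/LRP sequence for a row len(row) wide."""
    width = len(row)
    half_texel = 0.5 / width                  # 1/32 for a 16-texel row
    left_s = s - half_texel                   # ADD R2, f[TEX0], -halfTexel
    left = nearest_lookup(row, left_s)        # TEX R0, R2, ...
    right = nearest_lookup(row, s + half_texel)   # TEX R1, R1, ...
    scaled = left_s * width                   # MUL R2.x, R2.x, 16
    weight = scaled - math.floor(scaled)      # FRC R2.x, R2.x
    return weight * right + (1.0 - weight) * left   # LRP R0, R2.x, R1, R0
```

With a 16-texel ramp whose texel i holds the value i, sampling at a texel center returns that texel's value, and sampling halfway between two centers returns their average.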

Should this specification provide more examples? | |

RESOLVED: Yes, it should. | |

Is the OpenGL ARB working on a multi-vendor standard for fragment | |

programmability? Will there be an ARB_fragment_program extension? If so, | |

how will this extension interact with the ARB standard? | |

RESOLVED: Yes, as of July 2002, there was a multi-vendor working | |

group and a draft specification. The ARB extension is expected to | |

have several features not present in this extension, such as state | |

tracking and global parameters (called "program environment | |

parameters"). It will also likely lack certain features found in this | |

extension. | |

Why does the HEMI mapping apply to the third component of signed HILO | |

textures, but not to unsigned HILO textures? | |

RESOLVED: This behavior matches the behavior of NV_texture_shader | |

(e.g., the DOT_PRODUCT_NV mode). The HEMI mapping will construct the | |

third component of a unit vector whose first two components are | |

encoded in the HILO texture. | |

New Procedures and Functions | |

void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name, | |

float x, float y, float z, float w); | |

void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name, | |

double x, double y, double z, double w); | |

void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name, | |

const float v[]); | |

void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name, | |

const double v[]); | |

void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name, | |

float *params); | |

void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name, | |

double *params); | |

void ProgramLocalParameter4dARB(enum target, uint index, | |

double x, double y, double z, double w); | |

void ProgramLocalParameter4dvARB(enum target, uint index, | |

const double *params); | |

void ProgramLocalParameter4fARB(enum target, uint index, | |

float x, float y, float z, float w); | |

void ProgramLocalParameter4fvARB(enum target, uint index, | |

const float *params); | |

void GetProgramLocalParameterdvARB(enum target, uint index, | |

double *params); | |

void GetProgramLocalParameterfvARB(enum target, uint index, | |

float *params); | |

New Tokens | |

Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the | |

<pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev, | |

and by the <target> parameter of BindProgramNV, LoadProgramNV, | |

ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB, | |

ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB, | |

GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB: | |

FRAGMENT_PROGRAM_NV 0x8870 | |

Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, | |

and GetDoublev: | |

MAX_TEXTURE_COORDS_NV 0x8871 | |

MAX_TEXTURE_IMAGE_UNITS_NV 0x8872 | |

FRAGMENT_PROGRAM_BINDING_NV 0x8873 | |

MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV 0x8868 | |

Accepted by the <name> parameter of GetString: | |

PROGRAM_ERROR_STRING_NV 0x8874 | |

Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation) | |

Modify Section 2.11, Clipping (p.39) | |

(replace the first paragraph of the section, p. 39) Primitives are clipped | |

to the clip volume. In clip coordinates, the view volume is defined by | |

-w_c <= x_c <= w_c, | |

-w_c <= y_c <= w_c, and | |

-w_c <= z_c <= w_c. | |

Clipping to the near and far clip planes is ignored if fragment program
mode (section 3.11) or texture shaders (see NV_texture_shader
specification) are enabled and the current fragment program or texture
shader computes per-fragment depth values. In this case, the view volume
is defined by:

-w_c <= x_c <= w_c and | |

-w_c <= y_c <= w_c. | |

Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization) | |

Modify Chapter 3 introduction (p. 57) | |

(p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization | |

process. The color value assigned to a fragment is initially determined | |

by the rasterization operations (Sections 3.3 through 3.7) and modified by | |

either the execution of the texturing, color sum, and fog operations as | |

defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined | |

in Section 3.11. The final depth value is initially determined by the | |

rasterization operations and may be modified by a fragment program. | |

note: Antialiasing Application is renumbered from Section 3.11 to Section | |

3.12. | |

Modify Figure 3.1 (p.58) | |

                           Primitive Assembly
                                    |
            +-----------+-----------+-----------+-----------+
            |           |           |           |           |
            |           |           |         Pixel         |
          Point       Line       Polygon    Rectangle    Bitmap
         Raster-     Raster-     Raster-     Raster-     Raster-
         ization     ization     ization     ization     ization
            |           |           |           |           |
            +-----------+-----------+-----------+-----------+
                                    |
                                    |
                  +-----------------+-----------------+
                  |                 |                 |
            Conventional         Texture          Fragment
            Texture Fetch        Shaders          Programs
                  |                 |                 |
                  |  +--------------+                 |
                  |  |                                |
        TEXTURE_  o  o                                |
       SHADER_NV                                      |
          enable    o                                 |
                    |                                 |
                +-------------+                       |
                |             |                       |
          Conventional    Register                    |
             TexEnv       Combiners                   |
                |             |                       |
            Color Sum         |                       |
                |             |                       |
               Fog            |                       |
                |             |                       |
                |  +----------+                       |
                |  |                                  |
     REGISTER_  o  o                                  |
    COMBINERS_                                        |
     NV enable    o                                   |
                  |                                   |
                  +-----------------+  +--------------+
                                    |  |
                         FRAGMENT_  o  o
                          PROGRAM_
                         NV enable    o
                                      |
                                      |
                                  Coverage
                                 Application
                                      |
                                      v
                           to fragment processing

Modify Section 3.3, Points (p.61) | |

All fragments produced in rasterizing a non-antialiased point are assigned | |

the same associated data, which are those of the vertex corresponding to | |

the point. (delete reference to divide by q). | |

If antialiasing is enabled, then ... The data associated with each
fragment are otherwise the data associated with the point being
rasterized. (delete reference to divide by q)

Modify Section 3.4.1, Basic Line Segment Rasterization (p.66) | |

(Note that t=0 at p_a and t=1 at p_b). The value of an associated datum f
for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color
index (in color index mode), the s, t, r, or q texture coordinate, or the
clip w coordinate (the depth value, window z, must be found using equation
3.3, below), is found as

        (1-t) * f_a / w_a + t * f_b / w_b
    f = ---------------------------------      (3.2)
              (1-t) / w_a + t / w_b

where f_a and f_b are the data associated with the starting and ending
endpoints of the segment, respectively; w_a and w_b are the clip
w coordinates of the starting and ending endpoints of the segment,
respectively. Note that linear interpolation would use

f = (1-t) * f_a + t * f_b. (3.3) | |

... A GL implementation may choose to approximate equation 3.2 with 3.3, | |

but this will normally lead to unacceptable distortion effects when | |

interpolating texture coordinates or clip w coordinates. | |
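The difference between equations 3.2 and 3.3 can be seen numerically with a short Python sketch. The function names are hypothetical; the arithmetic follows the two equations directly.

```python
def interpolate_perspective(t, f_a, f_b, w_a, w_b):
    """Equation 3.2: perspective-correct interpolation along a segment."""
    numerator = (1.0 - t) * f_a / w_a + t * f_b / w_b
    denominator = (1.0 - t) / w_a + t / w_b
    return numerator / denominator

def interpolate_linear(t, f_a, f_b):
    """Equation 3.3: plain linear interpolation."""
    return (1.0 - t) * f_a + t * f_b
```

When w_a equals w_b the two agree. With w_a = 1 and w_b = 3, the midpoint of a datum running from 0 to 1 evaluates to 0.25 under equation 3.2 rather than 0.5, which is the kind of distortion the approximation introduces for texture coordinates.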

Modify Section 3.5.1, Basic Polygon Rasterization (p.71) | |

Denote a datum at p_a, p_b, or p_c ... is given by | |

        a * f_a / w_a + b * f_b / w_b + c * f_c / w_c
    f = ---------------------------------------------      (3.4)
                 a / w_a + b / w_b + c / w_c

where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c, | |

respectively. a, b, and c are the barycentric coordinates of the fragment | |

for which the data are produced. a, b, and c must correspond precisely to | |

the exact coordinates ... at the fragment's center. | |

Just as with line segment rasterization, equation 3.4 may be approximated | |

by | |

f = a * f_a + b * f_b + c * f_c; (3.5) | |

this may yield ... for texture coordinates or clip w coordinates. | |
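Equation 3.4 extends the same idea to the three barycentric weights of a triangle. The Python sketch below is a direct transcription; the function name and the tuple-based argument layout are assumptions made for illustration.

```python
def interpolate_triangle(bary, f, w):
    """Equation 3.4: perspective-correct interpolation over a triangle.

    bary -- barycentric coordinates (a, b, c) of the fragment center
    f    -- datum values (f_a, f_b, f_c) at the three vertices
    w    -- clip w coordinates (w_a, w_b, w_c) of the three vertices
    """
    numerator = sum(k * fk / wk for k, fk, wk in zip(bary, f, w))
    denominator = sum(k / wk for k, wk in zip(bary, w))
    return numerator / denominator
```

With equal clip w at all three vertices the result reduces to the linear form of equation 3.5; with unequal w the result is pulled toward the vertices with smaller w.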

Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100) | |

A fragment arising from a group ... are given by those associated with the | |

current raster position. (delete reference to divide by q) | |

Modify Section 3.7, Bitmaps (p.111) | |

Otherwise, a rectangular array ... The associated data for each fragment | |

are those associated with the current raster position. (delete reference | |

to divide by q) Once the fragments have been produced ... | |

Modify Section 3.8, Texturing (p.112) | |

... an image at the location indicated by a fragment's texture coordinates
to modify the fragment's primary RGBA color. Texturing does not affect the
secondary color.

Texturing is specified only for RGBA mode; its use in color index mode is | |

undefined. | |

Except when in fragment program mode (Section 3.11), the (s,t,r) texture | |

coordinates used for texturing are the values s/q, t/q, and r/q, | |

respectively, where s, t, r, and q are the texture coordinates associated | |

with the fragment. When in fragment program mode, the (s,t,r) texture | |

coordinates are specified by the program. If q is less than or equal to | |

zero, the results of texturing are undefined. | |

Add new Section 3.11, Fragment Programs (p.140) | |

Fragment program mode is enabled and disabled with the Enable and Disable | |

commands using the symbolic constant FRAGMENT_PROGRAM_NV. When fragment | |

program mode is enabled, standard and extended texturing, color sum, and | |

fog application stages are ignored and a general purpose program is | |

executed instead. | |

A fragment program is a sequence of instructions that execute on a | |

per-fragment basis. In fragment program mode, the currently bound | |

fragment program is executed as each fragment is generated by the | |

rasterization operations. Fragment programs execute a finite fixed | |

sequence of instructions with no branching or looping, and operate | |

independently from the processing of other fragments. Fragment programs | |

are used to compute new color values to be associated with each fragment, | |

and can optionally compute a new depth value for each fragment as well. | |

Fragment program mode is not available in color index mode and is
considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV. When
fragment program mode is enabled, texture shaders and register combiners
(NV_texture_shader and NV_register_combiners extensions) are disabled,
regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.

Section 3.11.1, Fragment Program Registers | |

Fragment programs operate on a set of program registers. Each program | |

register is a 4-component vector, whose components are referred to as "x", | |

"y", "z", and "w" respectively. The components of a fragment register are | |

always referred to in this manner, regardless of the meaning of their | |

contents. | |

The four components of each fragment program register have one of two | |

different representations: 32-bit floating-point (fp32) or 16-bit | |

floating-point (fp16). More details on these representations can be found | |

in Section 3.11.4.1. | |
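To get a feel for the precision difference between the two representations, the Python sketch below rounds a value through 32-bit and 16-bit storage using the standard struct module. It assumes that fp32 and fp16 follow the usual IEEE single and half layouts; special cases (overflow, denormals, NaN) may be handled differently by the hardware, and struct raises OverflowError for values too large for the format. The function names are hypothetical.

```python
import struct

def to_fp32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fp16(x):
    """Round a Python float to the nearest 16-bit float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

For example, 0.1 stored as fp16 reads back as 0.0999755859375, while the fp32 value differs from 0.1 by less than 1e-8.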

There are several different classes of program registers. Attribute | |

registers (Table X.1) correspond to the fragment's associated data | |

produced by rasterization. Temporary registers (Table X.2) hold | |

intermediate results generated by the fragment program. Output registers | |

(Table X.3) hold the final results of a fragment program. The single | |

condition code register is used to mask writes to other registers or to | |

determine if a fragment should be discarded. | |

Section 3.11.1.1, Fragment Program Attribute Registers | |

The fragment program attribute registers (Table X.1) hold the location of | |

the fragment and the data associated with the fragment produced by | |

rasterization. | |

Fragment Attribute                                        Component
Register Name       Description                           Interpretation
------------------  -----------------------------------  --------------
f[WPOS]             Position of the fragment center.     (x,y,z,1/w)
f[COL0]             Interpolated primary color           (r,g,b,a)
f[COL1]             Interpolated secondary color         (r,g,b,a)
f[FOGC]             Interpolated fog distance/coord      (z,0,0,0)
f[TEX0]             Texture coordinate (unit 0)          (s,t,r,q)
f[TEX1]             Texture coordinate (unit 1)          (s,t,r,q)
f[TEX2]             Texture coordinate (unit 2)          (s,t,r,q)
f[TEX3]             Texture coordinate (unit 3)          (s,t,r,q)
f[TEX4]             Texture coordinate (unit 4)          (s,t,r,q)
f[TEX5]             Texture coordinate (unit 5)          (s,t,r,q)
f[TEX6]             Texture coordinate (unit 6)          (s,t,r,q)
f[TEX7]             Texture coordinate (unit 7)          (s,t,r,q)

Table X.1: Fragment Attribute Registers. The component interpretation | |

column describes the mapping of attribute values to register components. | |

For example, the "x" component of f[COL0] holds the red color component, | |

and the "x" component of f[TEX0] holds the "s" texture coordinate for | |

texture unit 0. The entries "0" and "1" indicate that the attribute | |

register components hold the constants 0 and 1, respectively. | |

f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment
center, relative to the lower left corner of the window. f[WPOS].z
holds the associated z window coordinate, normally in the range [0,1].
f[WPOS].w holds the reciprocal of the associated clip w coordinate.

f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors | |

of the fragment, respectively. | |

f[FOGC] holds the associated eye distance or fog coordinate normally used | |

for fog computations. | |

f[TEX0] through f[TEX7] hold the associated texture coordinates for | |

texture coordinate sets 0 through 7, respectively. | |

All attribute register components are treated as 32-bit floats. However, | |

the components of primary and secondary colors (f[COL0] and f[COL1]) may | |

be generated with reduced precision. | |

The contents of the fragment attribute registers may not be modified by a | |

fragment program. In addition, each fragment program instruction can use | |

at most one unique attribute register. | |

Section 3.11.1.2, Fragment Program Temporary Registers | |

The fragment temporary registers (Table X.2) hold intermediate values used | |

during the execution of a fragment program. There are 96 temporary | |

register names, but not all can be used simultaneously. | |

Fragment Temporary
Register Name       Description
------------------  -----------------------------------------------------
R0-R31              Four 32-bit (fp32) floating point values (s.e8.m23)
H0-H63              Four 16-bit (fp16) floating point values (s.e5.m10)

Table X.2: Fragment Temporary Registers. | |

In addition to the normal temporary registers, there are two temporary | |

pseudo-registers, "RC" and "HC". RC and HC are treated as unnumbered, | |

write-only temporary registers. The components of RC have a fp32 data | |

type; the components of HC have a fp16 data type. The sole purpose of | |

these registers is to permit instructions to modify the condition code | |

register (section 3.11.1.4) without overwriting the values in any | |

temporary register. | |

Fragment program instructions can read and write temporary registers. | |

There is no restriction on the number of temporary registers that can be | |

accessed by any given instruction. | |

All temporary registers are initialized to (0,0,0,0) each time a fragment | |

program executes. | |

Section 3.11.1.3, Fragment Program Output Registers | |

The fragment program output registers hold the final results of the | |

fragment program. The possible final results of a fragment program are a | |

high- or low-precision RGBA fragment color, and a fragment depth value. | |

Output
Register Name  Description
-------------  -------------------------------------------------------
o[COLR]        Final RGBA fragment color, fp32 format
o[COLH]        Final RGBA fragment color, fp16 format
o[DEPR]        Final fragment depth value, fp32 format

Table X.3: Fragment Program Output Registers. | |

o[COLR] and o[COLH] specify the color of a fragment. These two registers
are identical, except for the associated data type of the components. The
R, G, B, and A components of the fragment color are taken from the x, y,
z, and w components, respectively, of o[COLR] or o[COLH]. A fragment
program will fail to load if it writes to both o[COLR] and o[COLH].

o[DEPR] can be used to replace the associated depth value of a fragment. | |

The new depth value is taken from the z component of o[DEPR]. If a | |

fragment program does not write to o[DEPR], the associated depth value is | |

unmodified. | |

A fragment program will fail to load if it does not write to at least one | |

output register. | |

The fragment program output registers may not be read by a fragment | |

program, but may be written to multiple times. | |

The values of all fragment program output registers are initially | |

undefined. | |

Section 3.11.1.4, Fragment Program Condition Code Register | |

The condition code register (CC) is a single four-component vector. Each | |

component of this register is one of four enumerated values: GT (greater | |

than), EQ (equal), LT (less than), or UN (unordered). The condition code | |

register can be used to mask writes to fragment data register components | |

or to terminate processing of a fragment altogether (via the KIL | |

instruction). | |

Most fragment program instructions can optionally update the condition | |

code register. When a fragment program instruction updates the condition | |

code register, a condition code component is set to LT if the | |

corresponding component of the result vector is less than zero, EQ if it | |

is equal to zero, GT if it is greater than zero, and UN if it is NaN (not | |

a number). | |

The condition code register is initialized to a vector of EQ values each | |

time a fragment program executes. | |
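The update rule and its use as a write mask can be modeled in a few lines of Python. This is a sketch with hypothetical function names; abs_difference mirrors the "SUBC R0, R1, R2; MOV R0 (LT), -R0;" example from the issues section.

```python
import math

def update_condition_code(result):
    """Classify each component of a result vector as LT, EQ, GT, or UN."""
    codes = []
    for value in result:
        if math.isnan(value):
            codes.append("UN")
        elif value < 0:
            codes.append("LT")
        elif value > 0:
            codes.append("GT")
        else:
            codes.append("EQ")
    return tuple(codes)

def abs_difference(r1, r2):
    """Model "SUBC R0, R1, R2; MOV R0 (LT), -R0;" component by component."""
    r0 = tuple(a - b for a, b in zip(r1, r2))
    cc = update_condition_code(r0)      # the "C" suffix updates the code
    # Only components whose condition code is LT are overwritten with -R0.
    return tuple(-v if c == "LT" else v for v, c in zip(r0, cc))
```

A result of (-1, 0, 2, NaN) yields the condition code (LT, EQ, GT, UN), and abs_difference leaves the component-wise |R1-R2| in the returned vector.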

Section 3.11.2, Fragment Program Parameters | |

In addition to using the registers defined in Section 3.11.1, fragment | |

programs may also use fragment program parameters in their computation. | |

Fragment program parameters are constant during the execution of fragment | |

programs, but some parameters may be modified outside the execution of a | |

fragment program. | |

There are five different types of program parameters: embedded scalar | |

constants, embedded vector constants, named constants, named local | |

parameters, and numbered local parameters. | |

Embedded scalar constants are written as standard floating-point numbers | |

with an optional sign designator ("+" or "-") and optional scientific | |

notation (e.g., "E+06", meaning "times 10^6"). | |

Embedded vector constants are written as a comma-separated array of one to
four scalar constants, surrounded by braces (like a C/C++ array
initializer). Vector constants are always treated as 4-component vectors:
constants with fewer than four components are expanded to four components
by filling missing y and z components with 0.0 and missing w components
with 1.0. Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
"{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to
"{5,6,7,1}".

Named constants allow fragment program instructions to define scalar or | |

vector constants that can be referenced by name. Named constants are | |

created using the DEFINE instruction: | |

DEFINE pi = 3.1415926535; | |

DEFINE color = {0.2, 0.5, 0.8, 1.0}; | |

The DEFINE instruction associates a constant name with a scalar or vector | |

constant value. Subsequent fragment program instructions that use the | |

constant name are equivalent to those using the corresponding constant | |

value. | |
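The expansion rule for vector constants with fewer than four components, which applies equally to the values given in DEFINE, can be sketched in Python as follows. The function name is hypothetical.

```python
def expand_vector_constant(values):
    """Expand a 1- to 4-component vector constant to four components.

    Missing y and z components become 0.0; a missing w becomes 1.0.
    """
    if not 1 <= len(values) <= 4:
        raise ValueError("vector constants have one to four components")
    defaults = (0.0, 0.0, 0.0, 1.0)
    return tuple(float(v) for v in values) + defaults[len(values):]
```

For instance, expand_vector_constant([3, 4]) gives (3.0, 4.0, 0.0, 1.0), matching the "{3,4}" example.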

Named local parameters are similar to named vector constants, but their | |

values can be modified after the program is loaded. Local parameters are | |

created using the DECLARE instruction: | |

DECLARE fog_color1; | |

DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1}; | |

The DECLARE instruction creates a 4-component vector associated with the | |

local parameter name. Subsequent fragment program instructions | |

referencing the local parameter name are processed as though the current | |

value of the local parameter vector were specified instead of the | |

parameter name. A DECLARE instruction can optionally specify an initial | |

value for the local parameter, which can be either a scalar or vector | |

constant. Scalar constants are expanded to 4-component vectors by | |

replicating the scalar value in each component. The initial value of | |

local parameters not initialized by the program is (0,0,0,0). | |

A named local parameter for a specific program can be updated using the
calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section
5.7). Named local parameters are accessible only by the program in which
they are defined. Modifying a local parameter affects only the
associated program and does not affect local parameters with the same name
that are found in any other fragment program.

Numbered local parameters are similar to named local parameters, except
that they are referred to by number and are not declared in fragment
programs. Each fragment program object has an array of four-component
floating-point vectors that can be used by the program. The number of
vectors is given by the implementation-dependent constant
MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
Numbered local parameters are accessed by a fragment program as members of
an array called "p". For example, the instruction

MOV R0, p[31]; | |

copies the contents of numbered local parameter 31 into temporary register | |

R0. | |

Constant and local parameter names can be arbitrary strings consisting of
letters (upper or lower-case), numbers, underscores ("_"), and dollar
signs ("$"). Keywords defined in the grammar (including instruction
names) cannot be used as constant names, nor can strings that start with
numbers, or strings that specify valid temporary register or texture
numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15"). A fragment
program will fail to load if a DEFINE or DECLARE instruction specifies an
invalid constant or local parameter name.
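The naming rules above can be approximated with a short Python check. This is a sketch only: the function name is hypothetical, and KEYWORDS is a deliberately partial, illustrative list, since the full set of reserved words comes from the grammar.

```python
import re

# Partial, illustrative keyword list; the grammar defines the full set.
KEYWORDS = {"DEFINE", "DECLARE", "END", "MOV", "ADD", "MUL", "TEX", "KIL"}

_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")
_REGISTER = re.compile(r"(R|H|TEX)(\d+)\Z")
_REGISTER_LIMITS = {"R": 31, "H": 63, "TEX": 15}

def is_valid_name(name):
    """Return True if `name` looks usable as a constant or parameter name."""
    if not _NAME.match(name):
        return False                  # bad character or leading digit
    if name in KEYWORDS:
        return False                  # reserved by the grammar
    reg = _REGISTER.match(name)
    if reg and int(reg.group(2)) <= _REGISTER_LIMITS[reg.group(1)]:
        return False                  # names a register or texture number
    return True
```

Names such as "fog_color1" or "$tmp" pass, while "1abc", "R0", "H63", "TEX15", and "DEFINE" are rejected.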

A fragment program will fail to load if an instruction contains a named | |

parameter not specified in a previous DEFINE or DECLARE instruction. A | |

fragment program will also fail to load if a DEFINE or DECLARE instruction | |

attempts to re-define a named parameter specified in a previous DEFINE or | |

DECLARE instruction. | |

The contents of the fragment program parameters may not be modified by a | |

fragment program. In addition, each fragment program instruction can | |

normally use at most one unique program parameter. The only exception to | |

this rule is if all program parameter references specify named or embedded | |

constants that taken together contain no more than four unique scalar | |

values. For such instructions, the GL will automatically generate an | |

equivalent instruction that references a single merged vector constant. | |

This merging allows programs to specify instructions like the following: | |

Instruction             Equivalent Instruction
---------------------   ---------------------------------------
MAD R0, R1, 2, -1;      MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
ADD R0, {1,2,3,4}, 4;   ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;

Before counting the number of unique values, any named constants are first | |

converted to the equivalent embedded constants. When generating a | |

combined vector constant, the GL does not perform swizzling, component | |

selection, negation, or absolute value operations. The following | |

instructions are invalid, as they contain more than four unique scalar | |

values. | |

Invalid Instructions | |

----------------------------------- | |

ADD R0, {1,2,3,4}, -4; | |

ADD R0, {1,2,3,4}, |-4|; | |

ADD R0, {1,2,3,4}, -{-1,-2,-3,-4}; | |

ADD R0, {1,2,3,4}, {4,5,6,7}.x; | |
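One way to picture the merging rule is the Python sketch below, which collects the unique scalar values from an instruction's constant operands, rejects more than four, and reports the swizzle each operand would use on the merged vector. The function name is hypothetical, operands are assumed to be either a scalar or an already-expanded 4-component tuple, and padding unused components with 0.0 follows the first example in the table above. Operand modifiers (negation, absolute value, swizzles) are deliberately ignored, as the text says the GL does not apply them when merging.

```python
def merge_constants(operands):
    """Merge scalar/vector constant operands into one 4-component vector.

    Returns (merged_vector, swizzles), or raises ValueError if the
    operands contain more than four unique scalar values.
    """
    unique = []
    for op in operands:
        values = [op] if isinstance(op, (int, float)) else list(op)
        for v in values:
            if v not in unique:
                unique.append(v)
    if len(unique) > 4:
        raise ValueError("more than four unique scalar values")
    merged = tuple(unique + [0.0] * (4 - len(unique)))
    swizzles = []
    for op in operands:
        values = [op] if isinstance(op, (int, float)) else list(op)
        swizzles.append("".join("xyzw"[unique.index(v)] for v in values))
    return merged, swizzles
```

For "MAD R0, R1, 2, -1" this yields the vector (2, -1, 0, 0) with swizzles "x" and "y"; for "ADD R0, {1,2,3,4}, -4" it raises, because -4 is a fifth unique value.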

Section 3.11.3, Fragment Program Specification | |

Fragment programs are specified as an array of ubytes. The array is a | |

string of ASCII characters encoding the program. The command | |

LoadProgramNV loads a fragment program when the target parameter is | |

FRAGMENT_PROGRAM_NV. The command BindProgramNV enables a fragment program | |

for execution. | |

At program load time, the program is parsed into a set of tokens possibly | |

separated by white space. Spaces, tabs, newlines, carriage returns, and | |

comments are considered whitespace. Comments begin with the character "#" | |

and are terminated by a newline, a carriage return, or the end of the | |

program array. Fragment programs are case-sensitive -- upper and lower | |

case letters are treated differently. The proper choice of case can be | |

inferred from the grammar. | |
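As a rough illustration of the lexical rules just described, the Python sketch below strips comments, checks for the "!!FP1.0" prefix and "END" terminator, and splits the body into ";"-terminated statements. It is not a parser for the grammar: the function name is hypothetical, and individual statements are not validated.

```python
import re

def split_statements(program):
    """Return the list of statements in a fragment program string."""
    # A comment runs from "#" to a newline, a carriage return, or the end.
    text = re.sub(r"#[^\r\n]*", "", program).strip()
    if not text.startswith("!!FP1.0") or not text.endswith("END"):
        raise ValueError("expected '!!FP1.0 ... END' (case-sensitive)")
    body = text[len("!!FP1.0"):-len("END")]
    parts = body.split(";")
    if parts[-1].strip():
        raise ValueError("statement before END is missing ';'")
    # Collapse runs of whitespace inside each statement.
    return [" ".join(part.split()) for part in parts[:-1]]
```

Because matching is case-sensitive, a program beginning with "!!fp1.0" is rejected by this sketch.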

The Backus-Naur Form (BNF) grammar below specifies the syntactically valid | |

sequences for fragment programs. The set of valid tokens can be inferred | |

from the grammar. The token "" represents an empty string and is used to | |

indicate optional rules. A program is invalid if it contains any | |

undefined tokens or characters. | |

<program> ::= <progPrefix> <instructionSequence> "END" | |

<progPrefix> ::= "!!FP1.0" | |

<instructionSequence> ::= <instructionSequence> <instructionStatement> | |

| <instructionStatement> | |

<instructionStatement> ::= <instruction> ";" | |

| <constantDefinition> ";" | |

| <localDeclaration> ";" | |

<instruction> ::= <VECTORop-instruction> | |

| <SCALARop-instruction> | |

| <BINSCop-instruction> | |

| <BINop-instruction> | |

| <TRIop-instruction> | |

| <KILop-instruction> | |

| <TEXop-instruction> | |

| <TXDop-instruction> | |

<VECTORop-instruction> ::= <VECTORop> <maskedDstReg> "," | |

<vectorSrc> | |

<VECTORop> ::= "DDX" | "DDX_SAT" | |

| "DDXR" | "DDXR_SAT" | |

| "DDXH" | "DDXH_SAT" | |

| "DDXC" | "DDXC_SAT" | |

| "DDXRC" | "DDXRC_SAT" | |

| "DDXHC" | "DDXHC_SAT" | |

| "DDY" | "DDY_SAT" | |

| "DDYR" | "DDYR_SAT" | |

| "DDYH" | "DDYH_SAT" | |

| "DDYC" | "DDYC_SAT" | |

| "DDYRC" | "DDYRC_SAT" | |

| "DDYHC" | "DDYHC_SAT" | |

| "FLR" | "FLR_SAT" | |

| "FLRR" | "FLRR_SAT" | |

| "FLRH" | "FLRH_SAT" | |

| "FLRX" | "FLRX_SAT" | |

| "FLRC" | "FLRC_SAT" | |

| "FLRRC" | "FLRRC_SAT" | |

| "FLRHC" | "FLRHC_SAT" | |

| "FLRXC" | "FLRXC_SAT" | |

| "FRC" | "FRC_SAT" | |

| "FRCR" | "FRCR_SAT" | |

| "FRCH" | "FRCH_SAT" | |

| "FRCX" | "FRCX_SAT" | |

| "FRCC" | "FRCC_SAT" | |

| "FRCRC" | "FRCRC_SAT" | |

| "FRCHC" | "FRCHC_SAT" | |

| "FRCXC" | "FRCXC_SAT" | |

| "LIT" | "LIT_SAT" | |

| "LITR" | "LITR_SAT" | |

| "LITH" | "LITH_SAT" | |

| "LITC" | "LITC_SAT" | |

| "LITRC" | "LITRC_SAT" | |

| "LITHC" | "LITHC_SAT" | |

| "MOV" | "MOV_SAT" | |

| "MOVR" | "MOVR_SAT" | |

| "MOVH" | "MOVH_SAT" | |

| "MOVX" | "MOVX_SAT" | |

| "MOVC" | "MOVC_SAT" | |

| "MOVRC" | "MOVRC_SAT" | |

| "MOVHC" | "MOVHC_SAT" | |

| "MOVXC" | "MOVXC_SAT" | |

| "PK2H" | |

| "PK2US" | |

| "PK4B" | |

| "PK4UB" | |

<SCALARop-instruction> ::= <SCALARop> <maskedDstReg> "," | |

<scalarSrc> | |

<SCALARop> ::= "COS" | "COS_SAT" | |

| "COSR" | "COSR_SAT" | |

| "COSH" | "COSH_SAT" | |

| "COSC" | "COSC_SAT" | |

| "COSRC" | "COSRC_SAT" | |

| "COSHC" | "COSHC_SAT" | |

| "EX2" | "EX2_SAT" | |

| "EX2R" | "EX2R_SAT" | |

| "EX2H" | "EX2H_SAT" | |

| "EX2C" | "EX2C_SAT" | |

| "EX2RC" | "EX2RC_SAT" | |

| "EX2HC" | "EX2HC_SAT" | |

| "LG2" | "LG2_SAT" | |

| "LG2R" | "LG2R_SAT" | |

| "LG2H" | "LG2H_SAT" | |

| "LG2C" | "LG2C_SAT" | |

| "LG2RC" | "LG2RC_SAT" | |

| "LG2HC" | "LG2HC_SAT" | |

| "RCP" | "RCP_SAT" | |

| "RCPR" | "RCPR_SAT" | |

| "RCPH" | "RCPH_SAT" | |

| "RCPC" | "RCPC_SAT" | |

| "RCPRC" | "RCPRC_SAT" | |

| "RCPHC" | "RCPHC_SAT" | |

| "RSQ" | "RSQ_SAT" | |

| "RSQR" | "RSQR_SAT" | |

| "RSQH" | "RSQH_SAT" | |

| "RSQC" | "RSQC_SAT" | |

| "RSQRC" | "RSQRC_SAT" | |

| "RSQHC" | "RSQHC_SAT" | |

| "SIN" | "SIN_SAT" | |

| "SINR" | "SINR_SAT" | |

| "SINH" | "SINH_SAT" | |

| "SINC" | "SINC_SAT" | |

| "SINRC" | "SINRC_SAT" | |

| "SINHC" | "SINHC_SAT" | |

| "UP2H" | "UP2H_SAT" | |

| "UP2HC" | "UP2HC_SAT" | |

| "UP2US" | "UP2US_SAT" | |

| "UP2USC" | "UP2USC_SAT" | |

| "UP4B" | "UP4B_SAT" | |

| "UP4BC" | "UP4BC_SAT" | |

| "UP4UB" | "UP4UB_SAT" | |

| "UP4UBC" | "UP4UBC_SAT" | |

<BINSCop-instruction> ::= <BINSCop> <maskedDstReg> "," | |

<scalarSrc> "," <scalarSrc> | |

<BINSCop> ::= "POW" | "POW_SAT" | |

| "POWR" | "POWR_SAT" | |

| "POWH" | "POWH_SAT" | |

| "POWC" | "POWC_SAT" | |

| "POWRC" | "POWRC_SAT" | |

| "POWHC" | "POWHC_SAT" | |

<BINop-instruction> ::= <BINop> <maskedDstReg> "," | |

<vectorSrc> "," <vectorSrc> | |

<BINop> ::= "ADD" | "ADD_SAT" | |

| "ADDR" | "ADDR_SAT" | |

| "ADDH" | "ADDH_SAT" | |

| "ADDX" | "ADDX_SAT" | |

| "ADDC" | "ADDC_SAT" | |

| "ADDRC" | "ADDRC_SAT" | |

| "ADDHC" | "ADDHC_SAT" | |

| "ADDXC" | "ADDXC_SAT" | |

| "DP3" | "DP3_SAT" | |

| "DP3R" | "DP3R_SAT" | |

| "DP3H" | "DP3H_SAT" | |

| "DP3X" | "DP3X_SAT" | |

| "DP3C" | "DP3C_SAT" | |

| "DP3RC" | "DP3RC_SAT" | |

| "DP3HC" | "DP3HC_SAT" | |

| "DP3XC" | "DP3XC_SAT" | |

| "DP4" | "DP4_SAT" | |

| "DP4R" | "DP4R_SAT" | |

| "DP4H" | "DP4H_SAT" | |

| "DP4X" | "DP4X_SAT" | |

| "DP4C" | "DP4C_SAT" | |

| "DP4RC" | "DP4RC_SAT" | |

| "DP4HC" | "DP4HC_SAT" | |

| "DP4XC" | "DP4XC_SAT" | |

| "DST" | "DST_SAT" | |

| "DSTR" | "DSTR_SAT" | |

| "DSTH" | "DSTH_SAT" | |

| "DSTC" | "DSTC_SAT" | |

| "DSTRC" | "DSTRC_SAT" | |

| "DSTHC" | "DSTHC_SAT" | |

| "MAX" | "MAX_SAT" | |

| "MAXR" | "MAXR_SAT" | |

| "MAXH" | "MAXH_SAT" | |

| "MAXX" | "MAXX_SAT" | |

| "MAXC" | "MAXC_SAT" | |

| "MAXRC" | "MAXRC_SAT" | |

| "MAXHC" | "MAXHC_SAT" | |

| "MAXXC" | "MAXXC_SAT" | |

| "MIN" | "MIN_SAT" | |

| "MINR" | "MINR_SAT" | |

| "MINH" | "MINH_SAT" | |

| "MINX" | "MINX_SAT" | |

| "MINC" | "MINC_SAT" | |

| "MINRC" | "MINRC_SAT" | |

| "MINHC" | "MINHC_SAT" | |

| "MINXC" | "MINXC_SAT" | |

| "MUL" | "MUL_SAT" | |

| "MULR" | "MULR_SAT" | |

| "MULH" | "MULH_SAT" | |

| "MULX" | "MULX_SAT" | |

| "MULC" | "MULC_SAT" | |

| "MULRC" | "MULRC_SAT" | |

| "MULHC" | "MULHC_SAT" | |

| "MULXC" | "MULXC_SAT" | |

| "RFL" | "RFL_SAT" | |

| "RFLR" | "RFLR_SAT" | |

| "RFLH" | "RFLH_SAT" | |

| "RFLC" | "RFLC_SAT" | |

| "RFLRC" | "RFLRC_SAT" | |

| "RFLHC" | "RFLHC_SAT" | |

| "SEQ" | "SEQ_SAT" | |

| "SEQR" | "SEQR_SAT" | |

| "SEQH" | "SEQH_SAT" | |

| "SEQX" | "SEQX_SAT" | |

| "SEQC" | "SEQC_SAT" | |

| "SEQRC" | "SEQRC_SAT" | |

| "SEQHC" | "SEQHC_SAT" | |

| "SEQXC" | "SEQXC_SAT" | |

| "SFL" | "SFL_SAT" | |

| "SFLR" | "SFLR_SAT" | |

| "SFLH" | "SFLH_SAT" | |

| "SFLX" | "SFLX_SAT" | |

| "SFLC" | "SFLC_SAT" | |

| "SFLRC" | "SFLRC_SAT" | |

| "SFLHC" | "SFLHC_SAT" | |

| "SFLXC" | "SFLXC_SAT" | |

| "SGE" | "SGE_SAT" | |

| "SGER" | "SGER_SAT" | |

| "SGEH" | "SGEH_SAT" | |

| "SGEX" | "SGEX_SAT" | |

| "SGEC" | "SGEC_SAT" | |

| "SGERC" | "SGERC_SAT" | |

| "SGEHC" | "SGEHC_SAT" | |

| "SGEXC" | "SGEXC_SAT" | |

| "SGT" | "SGT_SAT" | |

| "SGTR" | "SGTR_SAT" | |

| "SGTH" | "SGTH_SAT" | |

| "SGTX" | "SGTX_SAT" | |

| "SGTC" | "SGTC_SAT" | |

| "SGTRC" | "SGTRC_SAT" | |

| "SGTHC" | "SGTHC_SAT" | |

| "SGTXC" | "SGTXC_SAT" | |

| "SLE" | "SLE_SAT" | |

| "SLER" | "SLER_SAT" | |

| "SLEH" | "SLEH_SAT" | |

| "SLEX" | "SLEX_SAT" | |

| "SLEC" | "SLEC_SAT" | |

| "SLERC" | "SLERC_SAT" | |

| "SLEHC" | "SLEHC_SAT" | |

| "SLEXC" | "SLEXC_SAT" | |

| "SLT" | "SLT_SAT" | |

| "SLTR" | "SLTR_SAT" | |

| "SLTH" | "SLTH_SAT" | |

| "SLTX" | "SLTX_SAT" | |

| "SLTC" | "SLTC_SAT" | |

| "SLTRC" | "SLTRC_SAT" | |

| "SLTHC" | "SLTHC_SAT" | |

| "SLTXC" | "SLTXC_SAT" | |

| "SNE" | "SNE_SAT" | |

| "SNER" | "SNER_SAT" | |

| "SNEH" | "SNEH_SAT" | |

| "SNEX" | "SNEX_SAT" | |

| "SNEC" | "SNEC_SAT" | |

| "SNERC" | "SNERC_SAT" | |

| "SNEHC" | "SNEHC_SAT" | |

| "SNEXC" | "SNEXC_SAT" | |

| "STR" | "STR_SAT" | |

| "STRR" | "STRR_SAT" | |

| "STRH" | "STRH_SAT" | |

| "STRX" | "STRX_SAT" | |

| "STRC" | "STRC_SAT" | |

| "STRRC" | "STRRC_SAT" | |

| "STRHC" | "STRHC_SAT" | |

| "STRXC" | "STRXC_SAT" | |

| "SUB" | "SUB_SAT" | |

| "SUBR" | "SUBR_SAT" | |

| "SUBH" | "SUBH_SAT" | |

| "SUBX" | "SUBX_SAT" | |

| "SUBC" | "SUBC_SAT" | |

| "SUBRC" | "SUBRC_SAT" | |

| "SUBHC" | "SUBHC_SAT" | |

| "SUBXC" | "SUBXC_SAT" | |

<TRIop-instruction> ::= <TRIop> <maskedDstReg> "," | |

<vectorSrc> "," <vectorSrc> "," | |

<vectorSrc> | |

<TRIop> ::= "MAD" | "MAD_SAT" | |

| "MADR" | "MADR_SAT" | |

| "MADH" | "MADH_SAT" | |

| "MADX" | "MADX_SAT" | |

| "MADC" | "MADC_SAT" | |

| "MADRC" | "MADRC_SAT" | |

| "MADHC" | "MADHC_SAT" | |

| "MADXC" | "MADXC_SAT" | |

| "LRP" | "LRP_SAT" | |

| "LRPR" | "LRPR_SAT" | |

| "LRPH" | "LRPH_SAT" | |

| "LRPX" | "LRPX_SAT" | |

| "LRPC" | "LRPC_SAT" | |

| "LRPRC" | "LRPRC_SAT" | |

| "LRPHC" | "LRPHC_SAT" | |

| "LRPXC" | "LRPXC_SAT" | |

| "X2D" | "X2D_SAT" | |

| "X2DR" | "X2DR_SAT" | |

| "X2DH" | "X2DH_SAT" | |

| "X2DC" | "X2DC_SAT" | |

| "X2DRC" | "X2DRC_SAT" | |

| "X2DHC" | "X2DHC_SAT" | |

<KILop-instruction> ::= <KILop> <ccMask> | |

<KILop> ::= "KIL" | |

<TEXop-instruction> ::= <TEXop> <maskedDstReg> "," | |

<vectorSrc> "," <texImageId> | |

<TEXop> ::= "TEX" | "TEX_SAT" | |

| "TEXC" | "TEXC_SAT" | |

| "TXP" | "TXP_SAT" | |

| "TXPC" | "TXPC_SAT" | |

<TXDop-instruction> ::= <TXDop> <maskedDstReg> "," | |

<vectorSrc> "," <vectorSrc> "," | |

<vectorSrc> "," <texImageId> | |

<TXDop> ::= "TXD" | "TXD_SAT" | |

| "TXDC" | "TXDC_SAT" | |

<scalarSrc> ::= <absScalarSrc> | |

| <baseScalarSrc> | |

<absScalarSrc> ::= <negate> "|" <baseScalarSrc> "|" | |

<baseScalarSrc> ::= <signedScalarConstant> | |

| <negate> <namedScalarConstant> | |

| <negate> <vectorConstant> <scalarSuffix> | |

| <negate> <namedLocalParameter> <scalarSuffix> | |

| <negate> <numberedLocal> <scalarSuffix> | |

| <negate> <srcRegister> <scalarSuffix> | |

<vectorSrc> ::= <absVectorSrc> | |

| <baseVectorSrc> | |

<absVectorSrc> ::= <negate> "|" <baseVectorSrc> "|" | |

<baseVectorSrc> ::= <signedScalarConstant> | |

| <negate> <namedScalarConstant> | |

| <negate> <vectorConstant> <scalarSuffix> | |

| <negate> <vectorConstant> <swizzleSuffix> | |

| <negate> <namedLocalParameter> <scalarSuffix> | |

| <negate> <namedLocalParameter> <swizzleSuffix> | |

| <negate> <numberedLocal> <scalarSuffix> | |

| <negate> <numberedLocal> <swizzleSuffix> | |

| <negate> <srcRegister> <scalarSuffix> | |

| <negate> <srcRegister> <swizzleSuffix> | |

<maskedDstReg> ::= <dstRegister> <optionalWriteMask> | |

<optionalCCMask> | |

<dstRegister> ::= <fragTempReg> | |

| <fragOutputReg> | |

| "RC" | |

| "HC" | |

<optionalCCMask> ::= "(" <ccMask> ")" | |

| "" | |

<ccMask> ::= <ccMaskRule> <swizzleSuffix> | |

| <ccMaskRule> <scalarSuffix> | |

<ccMaskRule> ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" | | |

"TR" | "FL" | |

<optionalWriteMask> ::= "" | |

| "." "x" | |

| "." "y" | |

| "." "x" "y" | |

| "." "z" | |

| "." "x" "z" | |

| "." "y" "z" | |

| "." "x" "y" "z" | |

| "." "w" | |

| "." "x" "w" | |

| "." "y" "w" | |

| "." "x" "y" "w" | |

| "." "z" "w" | |

| "." "x" "z" "w" | |

| "." "y" "z" "w" | |

| "." "x" "y" "z" "w" | |

<srcRegister> ::= <fragAttribReg> | |

| <fragTempReg> | |

<fragAttribReg> ::= "f" "[" <fragAttribRegId> "]" | |

<fragAttribRegId> ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0" | |

| "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5" | |

| "TEX6" | "TEX7" | |

<fragTempReg> ::= <fragF32Reg> | |

| <fragF16Reg> | |

<fragF32Reg> ::= "R0" | "R1" | "R2" | "R3" | |

| "R4" | "R5" | "R6" | "R7" | |

| "R8" | "R9" | "R10" | "R11" | |

| "R12" | "R13" | "R14" | "R15" | |

| "R16" | "R17" | "R18" | "R19" | |

| "R20" | "R21" | "R22" | "R23" | |

| "R24" | "R25" | "R26" | "R27" | |

| "R28" | "R29" | "R30" | "R31" | |

<fragF16Reg> ::= "H0" | "H1" | "H2" | "H3" | |

| "H4" | "H5" | "H6" | "H7" | |

| "H8" | "H9" | "H10" | "H11" | |

| "H12" | "H13" | "H14" | "H15" | |

| "H16" | "H17" | "H18" | "H19" | |

| "H20" | "H21" | "H22" | "H23" | |

| "H24" | "H25" | "H26" | "H27" | |

| "H28" | "H29" | "H30" | "H31" | |

| "H32" | "H33" | "H34" | "H35" | |

| "H36" | "H37" | "H38" | "H39" | |

| "H40" | "H41" | "H42" | "H43" | |

| "H44" | "H45" | "H46" | "H47" | |

| "H48" | "H49" | "H50" | "H51" | |

| "H52" | "H53" | "H54" | "H55" | |

| "H56" | "H57" | "H58" | "H59" | |

| "H60" | "H61" | "H62" | "H63" | |

<fragOutputReg> ::= "o" "[" <fragOutputRegName> "]" | |

<fragOutputRegName> ::= "COLR" | "COLH" | "DEPR" | |

<numberedLocal> ::= "p" "[" <localNumber> "]" | |

<localNumber> ::= <integer> from 0 to | |

MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1 | |

<scalarSuffix> ::= "." <component> | |

<swizzleSuffix> ::= "" | |

| "." <component> <component> | |

<component> <component> | |

<component> ::= "x" | "y" | "z" | "w" | |

<texImageId> ::= <texImageUnit> "," <texImageTarget> | |

<texImageUnit> ::= "TEX0" | "TEX1" | "TEX2" | "TEX3" | |

| "TEX4" | "TEX5" | "TEX6" | "TEX7" | |

| "TEX8" | "TEX9" | "TEX10" | "TEX11" | |

| "TEX12" | "TEX13" | "TEX14" | "TEX15" | |

<texImageTarget> ::= "1D" | "2D" | "3D" | "CUBE" | "RECT" | |

<constantDefinition> ::= "DEFINE" <namedVectorConstant> "=" | |

<vectorConstant> | |

| "DEFINE" <namedScalarConstant> "=" | |

<scalarConstant> | |

<localDeclaration> ::= "DECLARE" <namedLocalParameter> | |

<optionalLocalValue> | |

<optionalLocalValue> ::= "" | |

| "=" <vectorConstant> | |

| "=" <scalarConstant> | |

<vectorConstant> ::= "{" <vectorConstantList> "}" | |

| <namedVectorConstant> | |

<vectorConstantList> ::= <scalarConstant> | |

| <scalarConstant> "," <scalarConstant> | |

| <scalarConstant> "," <scalarConstant> "," | |

<scalarConstant> | |

| <scalarConstant> "," <scalarConstant> "," | |

<scalarConstant> "," <scalarConstant> | |

<scalarConstant> ::= <signedScalarConstant> | |

| <namedScalarConstant> | |

<signedScalarConstant> ::= <optionalSign> <floatConstant> | |

<namedScalarConstant> ::= <identifier> ((name of a scalar constant | |

in a DEFINE instruction)) | |

<namedVectorConstant> ::= <identifier> ((name of a vector constant | |

in a DEFINE instruction)) | |

<namedLocalParameter> ::= <identifier> ((name of a local parameter | |

in a DECLARE instruction)) | |

<negate> ::= "-" | "+" | "" | |

<optionalSign> ::= "-" | "+" | "" | |

<identifier> ::= see text below | |

<floatConstant> ::= see text below | |

The <identifier> rule matches a sequence of one or more letters ("A" | |

through "Z", "a" through "z", "_", and "$") and digits ("0" through "9"); | |

the first character must be a letter. The underscore ("_") and dollar | |

sign ("$") count as letters. Upper and lower case letters are different | |

(names are case-sensitive). | |

The <floatConstant> rule matches a floating-point constant consisting | |

of an integer part, a decimal point, a fraction part, an "e" or | |

"E", and an optionally signed integer exponent. The integer and | |

fraction parts both consist of a sequence of one or more digits ("0" | |

through "9"). Either the integer part or the fraction part (not | |

both) may be missing; either the decimal point or the "e" (or "E") | |

and the exponent (not both) may be missing. | |
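
The two lexical rules above can be written as regular expressions. The Python sketch below follows the text literally and is illustrative only; the names IDENTIFIER, FLOAT_CONSTANT, is_identifier, and is_float_constant are hypothetical. Read literally, a bare integer such as "2" has neither a decimal point nor an exponent and does not match <floatConstant>, although examples elsewhere in this specification use such constants, so an implementation may accept more than this sketch does.

```python
import re

# A letter ("A"-"Z", "a"-"z", "_", "$") followed by letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

FLOAT_CONSTANT = re.compile(
    # With a decimal point: integer or fraction part may be missing (not both),
    # and the exponent is optional.
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    # Without a decimal point: the exponent is required.
    r"|[0-9]+[eE][+-]?[0-9]+"
)

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_float_constant(text):
    return FLOAT_CONSTANT.fullmatch(text) is not None
```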

A fragment program fails to load if it contains more than the maximum | |

number of executable instructions. If ARB_fragment_program is supported, | |

this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the | |

FRAGMENT_PROGRAM_ARB target. Otherwise, the limit is 1024. Executable | |

instructions are those matching the <instruction> rule in the grammar, and | |

do not include DEFINE or DECLARE instructions. | |

A fragment program fails to load if its total temporary and output | |

register count exceeds 64. Each fp32 temporary or output register used by | |

the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each | |

fp16 temporary or output register used by the program (H0-H63 and o[COLH]) | |

counts as a single register. | |
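
The register accounting above can be sketched in Python as follows. This is an illustration, not part of the extension; register_cost and fits_register_limit are hypothetical names, and the caller is assumed to pass only temporary and output register names (not "RC" or "HC").

```python
import re

def register_cost(name):
    """Return how many of the 64 register slots one register consumes."""
    if name in ("o[COLR]", "o[DEPR]") or re.fullmatch(r"R([0-9]|[12][0-9]|3[01])", name):
        return 2  # fp32 temporaries and outputs count as two registers
    if name == "o[COLH]" or re.fullmatch(r"H([0-9]|[1-5][0-9]|6[0-3])", name):
        return 1  # fp16 temporaries and outputs count as one register
    raise ValueError("not a temporary or output register: " + name)

def fits_register_limit(used_registers, limit=64):
    """True if the distinct registers used stay within the limit."""
    return sum(register_cost(r) for r in set(used_registers)) <= limit
```

For example, a program using R0 through R30 and o[COLR] totals exactly 64 and loads; also using H0 raises the total to 65 and the program fails to load.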

A fragment program fails to load if any instruction sources more than one | |

unique fragment attribute register. Instructions sourcing the same | |

attribute register multiple times are acceptable. | |

A fragment program fails to load if any instruction sources more than one | |

unique program parameter register. Instructions sourcing the same program | |

parameter multiple times are acceptable. | |

A fragment program fails to load if multiple texture lookup instructions | |

reference different targets for the same texture image unit. | |

A fragment program fails to load if it writes to both the o[COLR] and | |

o[COLH] output registers. | |

The error INVALID_OPERATION is generated by LoadProgramNV if a fragment | |

program fails to load because it is not syntactically correct or for one | |

of the semantic restrictions listed above. | |

The error INVALID_OPERATION is generated by LoadProgramNV if a program is | |

loaded for id when id is currently loaded with a program of a different | |

target. | |

A successfully loaded fragment program is parsed into a sequence of | |

instructions. Each instruction is identified by its tokenized name. The | |

operation of these instructions when executed is defined in Sections | |

3.11.4 and 3.11.5. | |

Section 3.11.4, Fragment Program Operation | |

There are forty-five fragment program instructions. Fragment program | |

instructions may have up to sixteen variants, including a suffix of "R", | |

"H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix | |

of "C" to allow an update of the condition code register (section | |

3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to | |

the range [0,1] (section 3.11.4.4). For example, the sixteen forms of the | |

"ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC", | |

"ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT", | |

"ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT". | |

Some mathematical instructions that support precision suffixes, typically | |

those that involve complicated floating-point computations, do not support | |

the "X" precision suffix. | |
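
The variant names follow mechanically from the three optional suffixes. The Python sketch below enumerates them; the helper name opcode_variants is hypothetical. Instructions without the "X" suffix can be modeled by passing a shorter precision tuple.

```python
from itertools import product

def opcode_variants(base, precisions=("", "R", "H", "X")):
    """List every spelling of an opcode: precision, then "C", then "_SAT"."""
    return [base + p + c + s
            for s, c, p in product(("", "_SAT"), ("", "C"), precisions)]
```

With the default precisions this yields the sixteen forms of "ADD" in the order listed above; opcode_variants("COS", ("", "R", "H")) yields twelve forms.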

The fragment program instructions and their respective input and output | |

parameters are summarized in Table X.4. | |

      Instruction          Inputs  Output  Description
      -----------------    ------  ------  --------------------------------
      ADD[RHX][C][_SAT]    v,v     v       add
      COS[RH ][C][_SAT]    s       ssss    cosine
      DDX[RH ][C][_SAT]    v       v       derivative relative to x
      DDY[RH ][C][_SAT]    v       v       derivative relative to y
      DP3[RHX][C][_SAT]    v,v     ssss    3-component dot product
      DP4[RHX][C][_SAT]    v,v     ssss    4-component dot product
      DST[RH ][C][_SAT]    v,v     v       distance vector
      EX2[RH ][C][_SAT]    s       ssss    exponential base 2
      FLR[RHX][C][_SAT]    v       v       floor
      FRC[RHX][C][_SAT]    v       v       fraction
      KIL                  none    none    conditionally discard fragment
      LG2[RH ][C][_SAT]    s       ssss    logarithm base 2
      LIT[RH ][C][_SAT]    v       v       compute light coefficients
      LRP[RHX][C][_SAT]    v,v,v   v       linear interpolation
      MAD[RHX][C][_SAT]    v,v,v   v       multiply and add
      MAX[RHX][C][_SAT]    v,v     v       maximum
      MIN[RHX][C][_SAT]    v,v     v       minimum
      MOV[RHX][C][_SAT]    v       v       move
      MUL[RHX][C][_SAT]    v,v     v       multiply
      PK2H                 v       ssss    pack two 16-bit floats
      PK2US                v       ssss    pack two unsigned 16-bit scalars
      PK4B                 v       ssss    pack four signed 8-bit scalars
      PK4UB                v       ssss    pack four unsigned 8-bit scalars
      POW[RH ][C][_SAT]    s,s     ssss    exponentiation (x^y)
      RCP[RH ][C][_SAT]    s       ssss    reciprocal
      RFL[RH ][C][_SAT]    v,v     v       reflection vector
      RSQ[RH ][C][_SAT]    s       ssss    reciprocal square root
      SEQ[RHX][C][_SAT]    v,v     v       set on equal
      SFL[RHX][C][_SAT]    v,v     v       set on false
      SGE[RHX][C][_SAT]    v,v     v       set on greater than or equal
      SGT[RHX][C][_SAT]    v,v     v       set on greater than
      SIN[RH ][C][_SAT]    s       ssss    sine
      SLE[RHX][C][_SAT]    v,v     v       set on less than or equal
      SLT[RHX][C][_SAT]    v,v     v       set on less than
      SNE[RHX][C][_SAT]    v,v     v       set on not equal
      STR[RHX][C][_SAT]    v,v     v       set on true
      SUB[RHX][C][_SAT]    v,v     v       subtract
      TEX[C][_SAT]         v       v       texture lookup
      TXD[C][_SAT]         v,v,v   v       texture lookup w/partials
      TXP[C][_SAT]         v       v       projective texture lookup
      UP2H[C][_SAT]        s       v       unpack two 16-bit floats
      UP2US[C][_SAT]       s       v       unpack two unsigned 16-bit scalars
      UP4B[C][_SAT]        s       v       unpack four signed 8-bit scalars
      UP4UB[C][_SAT]       s       v       unpack four unsigned 8-bit scalars
      X2D[RH ][C][_SAT]    v,v,v   v       2D coordinate transformation

Table X.4: Summary of fragment program instructions. "[RHX]" indicates | |

an optional arithmetic precision suffix. "[C]" indicates an optional | |

condition code update suffix. "[_SAT]" indicates an optional clamp of | |

result vector components to [0,1]. "v" indicates a 4-component vector | |

input or output, "s" indicates a scalar input, and "ssss" indicates a | |

scalar output replicated across a 4-component vector. | |

Section 3.11.4.1: Fragment Program Storage Precision | |

Registers in fragment programs are stored in two different representations: | |

16-bit floating-point (fp16) and 32-bit floating-point (fp32). There is | |

an additional 12-bit fixed-point representation (fx12) used only as an | |

internal representation for instructions with the "X" precision qualifier. | |

In the 32-bit float (fp32) representation, each component is represented | |

in floating-point with eight exponent and twenty-three mantissa bits, as | |

in the standard IEEE single-precision format. If S represents the sign (0 | |

or 1), E represents the exponent in the range [0,255], and M represents | |

the mantissa in the range [0,2^23-1], then an fp32 float is decoded as: | |

      (-1)^S * 0.0,                        if E == 0,
      (-1)^S * 2^(E-127) * (1 + M/2^23),   if 0 < E < 255,
      (-1)^S * INF,                        if E == 255 and M == 0,
      NaN,                                 if E == 255 and M != 0.

INF (Infinity) is a special representation indicating numerical overflow. | |

NaN (Not a Number) is a special representation indicating the result of | |

illegal arithmetic operations, such as computing the square root or | |

logarithm of a negative number. Note that all normal fp32 values, zero, | |

and INF have an associated sign. -0.0 and +0.0 are considered equivalent | |

for the purposes of comparisons. | |

This representation is identical to the IEEE single-precision | |

floating-point standard, except that no special representation is provided | |

for denorms -- numbers in the range (-2^-126, +2^-126). All such numbers | |

are flushed to zero. | |
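
The fp32 decoding rule can be transcribed directly. The Python sketch below is illustrative; the name decode_fp32 and the choice of a 32-bit integer as input are assumptions. It reproduces the flush-to-zero behavior by treating every encoding with E == 0 as a signed zero.

```python
import math

def decode_fp32(bits):
    """Decode a 32-bit pattern using the fp32 rules given above."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return sign * 0.0  # zero and would-be denorms both give signed zero
    if exponent == 255:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
```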

In a 16-bit float (fp16) register, each component is represented | |

similarly, except with only five exponent and ten mantissa bits. If S | |

represents the sign (0 or 1), E represents the exponent in the range | |

[0,31], and M represents the mantissa in the range [0,2^10-1], then an | |

fp16 float is decoded as: | |

      (-1)^S * 0.0,                        if E == 0 and M == 0,
      (-1)^S * 2^-14 * M/2^10,             if E == 0 and M != 0,
      (-1)^S * 2^(E-15) * (1 + M/2^10),    if 0 < E < 31,
      (-1)^S * INF,                        if E == 31 and M == 0, or
      NaN,                                 if E == 31 and M != 0.

One important difference is that the fp16 representation, unlike fp32, | |

supports denorms to maximize the limited precision of the 16-bit floating | |

point encodings. | |
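
The fp16 decoding rule, including denorms, can be sketched the same way. The Python function below is illustrative; the name decode_fp16 and the 16-bit integer input are assumptions.

```python
import math

def decode_fp16(bits):
    """Decode a 16-bit pattern using the fp16 rules given above."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Covers both zero (M == 0) and denorms (M != 0).
        return sign * 2.0 ** -14 * (mantissa / 2 ** 10)
    if exponent == 31:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 15) * (1 + mantissa / 2 ** 10)
```

The smallest positive denorm, bit pattern 0x0001, decodes to 2^-24, and the largest finite value, 0x7BFF, decodes to 65504.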

In the 12-bit fixed-point (fx12) format, numbers are represented as signed | |

12-bit two's complement integers with 10 fraction bits. The range of | |

representable values is [-2048/1024, +2047/1024]. | |

Section 3.11.4.2: Fragment Program Operation Precision | |

Fragment program instructions frequently perform mathematical operations. | |

Such operations may be performed at one of three different precisions. | |

Fragment programs can specify the precision of each instruction by using | |

the precision suffix. If an instruction has a suffix of "R", calculations | |

are carried out with 32-bit floating point operands and results. If an | |

instruction has a suffix of "H", calculations are carried out using 16-bit | |

floating point operands and results. If an instruction has a suffix of | |

"X", calculations are carried out using 12-bit fixed point operands and | |

results. For example, the instruction "MULR" performs a 32-bit | |

floating-point multiply, "MULH" performs a 16-bit floating-point multiply, | |

and "MULX" performs a 12-bit fixed-point multiply. If no precision suffix | |

is specified, calculations are carried out using the precision of the | |

temporary register receiving the result. | |

Fragment program instructions may source registers or constants whose | |

precisions differ from the precision specified with the instruction. | |

Instructions may also generate intermediate results with a different | |

precision than that of the destination register. In these cases, the | |

values sourced are converted to the precision specified by the | |

instruction. | |

When converting to fx12 format, -INF and any values less than -2048/1024 | |

become -2048/1024. +INF and any values greater than +2047/1024 become | |

+2047/1024. NaN becomes 0. | |
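
The fx12 conversion rule can be sketched in Python as follows. The name to_fx12 is hypothetical, and rounding in-range values to the nearest multiple of 1/1024 is an assumption; the text above specifies only the clamping and NaN behavior.

```python
import math

FX12_MIN = -2048 / 1024
FX12_MAX = 2047 / 1024

def to_fx12(value):
    """Convert a float to the nearest fx12 value, returned as a float."""
    if math.isnan(value):
        return 0.0
    if value < FX12_MIN:   # includes -INF
        return FX12_MIN
    if value > FX12_MAX:   # includes +INF
        return FX12_MAX
    # Assumption: round to the nearest multiple of 1/1024.
    return math.floor(value * 1024 + 0.5) / 1024
```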

When converting to fp16 format, any values less than or equal to -2^16 are | |

converted to -INF. Any values greater than or equal to +2^16 are | |

converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any | |

other values that are not exactly representable in fp16 format are | |

converted to one of the two nearest representable values. | |

When converting to fp32 format, any values less than or equal to -2^128 | |

are converted to -INF. Any values greater than or equal to +2^128 are | |

converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any | |

other values that are not exactly representable in fp32 format are | |

converted to one of the two nearest representable values. | |

Fragment program instructions using the fragment attribute registers | |

f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32 | |

precision, regardless of the precision specified by the instruction. | |

Section 3.11.4.3: Fragment Program Operands | |

Except for KIL, fragment program instructions operate on either vector or | |

scalar operands, indicated in the grammar (see section 3.11.3) by the | |

rules <vectorSrc> and <scalarSrc> respectively. | |

The basic set of scalar operands is defined by the grammar rule | |

<baseScalarSrc>. Scalar operands can be scalar constants (embedded or | |

named), or single components of vector constants, local parameters, or | |

registers allowed by the <srcRegister> rule. A vector component is | |

selected by the <scalarSuffix> rule, where the characters "x", "y", "z", | |

and "w" select the x, y, z, and w components, respectively, of the vector. | |

The basic set of vector operands is defined by the grammar rule | |

<baseVectorSrc>. Vector operands can include vector constants, local | |

parameters, or registers allowed by the <srcRegister> rule. | |

Basic vector operands can be swizzled according to the <swizzleSuffix> | |

rule. In its most general form, the <swizzleSuffix> rule matches the | |

pattern ".????" where each question mark is one of "x", "y", "z", or "w". | |

For such patterns, the x, y, z, and w components of the operand are taken | |

from the vector components named by the first, second, third, and fourth | |

character of the pattern, respectively. For example, if the swizzle | |

suffix is ".yzzx" and the specified source contains {2,8,9,0}, the | |

swizzled operand used by the instruction is {8,9,9,2}. If the | |

<swizzleSuffix> rule matches "", it is treated as though it were ".xyzw". | |

Operands can optionally be negated according to the <negate> rule in | |

<baseScalarSrc> or <baseVectorSrc>. If the <negate> matches "-", each | |

value is negated. | |

The absolute value of operands can be taken if the <vectorSrc> or | |

<scalarSrc> rules match <absScalarSrc> or <absVectorSrc>. In this case, | |

the absolute value of each component is taken. In addition, if the | |

<negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result | |

is then negated. | |
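
The swizzle, negate, and absolute value steps can be sketched in Python as follows. This is an illustration only; the name load_operand and its parameters are hypothetical, and type conversion is omitted.

```python
def load_operand(vector, swizzle="", negate_base=False,
                 absolute=False, negate_abs=False):
    """Apply swizzle, base negation, absolute value, and outer negation."""
    if swizzle == "":
        swizzle = ".xyzw"  # an empty suffix is treated as ".xyzw"
    if len(swizzle) != 5 or swizzle[0] != ".":
        raise ValueError("expected a suffix of the form '.????'")
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    values = [vector[index[c]] for c in swizzle[1:]]
    if negate_base:
        values = [-v for v in values]
    if absolute:
        values = [abs(v) for v in values]
    if negate_abs:
        values = [-v for v in values]
    return values
```

For instance, load_operand([2, 8, 9, 0], ".yzzx") gives [8, 9, 9, 2], matching the example above, and an operand written as -|c| for c = {-1,2,-3,4} corresponds to absolute=True with negate_abs=True, giving [-1, -2, -3, -4].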

Instructions requiring vector operands can also use scalar operands in the | |

case where the <vectorSrc> rule matches <scalarSrc>. In such cases, a | |

4-component vector is produced by replicating the scalar. | |

After operands are loaded, they are converted to a data type corresponding | |

to the operation precision specified in the fragment program instruction. | |

The following pseudo-code spells out the operand generation process. | |

"SrcT" and "InstT" refer to the data types of the specified register or | |

constant and the instruction, respectively. "VecSrcT" and "VecInstT" | |

refer to 4-component vectors of the corresponding type. "absolute" is | |

TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules, | |

and FALSE otherwise. "negateBase" is TRUE if the <negate> rule in | |

<baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise. | |

"negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or | |

<absVectorSrc> matches "-" and FALSE otherwise. The ".c***", ".*c**", | |

".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained | |

by the swizzle operation. TypeConvert() is assumed to convert a scalar of | |

type SrcT to a scalar of type InstT using the type conversion process | |

specified above. | |

VecInstT VectorLoad(VecSrcT source) | |

{ | |

VecSrcT srcVal; | |

VecInstT convertedVal; | |

srcVal.x = source.c***; | |

srcVal.y = source.*c**; | |

srcVal.z = source.**c*; | |

srcVal.w = source.***c; | |

if (negateBase) { | |

srcVal.x = -srcVal.x; | |

srcVal.y = -srcVal.y; | |

srcVal.z = -srcVal.z; | |

srcVal.w = -srcVal.w; | |

} | |

if (absolute) { | |

srcVal.x = abs(srcVal.x); | |

srcVal.y = abs(srcVal.y); | |

srcVal.z = abs(srcVal.z); | |

srcVal.w = abs(srcVal.w); | |

} | |

if (negateAbs) { | |

srcVal.x = -srcVal.x; | |

srcVal.y = -srcVal.y; | |

srcVal.z = -srcVal.z; | |

srcVal.w = -srcVal.w; | |

} | |

convertedVal.x = TypeConvert(srcVal.x); | |

convertedVal.y = TypeConvert(srcVal.y); | |

convertedVal.z = TypeConvert(srcVal.z); | |

convertedVal.w = TypeConvert(srcVal.w); | |

return convertedVal; | |

} | |

InstT ScalarLoad(VecSrcT source) | |

{ | |

SrcT srcVal; | |

InstT convertedVal; | |

srcVal = source.c***; | |

if (negateBase) { | |

srcVal = -srcVal; | |

} | |

if (absolute) { | |

srcVal = abs(srcVal); | |

} | |

if (negateAbs) { | |

srcVal = -srcVal; | |

} | |

convertedVal = TypeConvert(srcVal); | |

return convertedVal; | |

} | |

Section 3.11.4.4, Fragment Program Destination Register Update | |

Each fragment program instruction, except for KIL, writes a 4-component | |

result vector to a single temporary or output register. | |

The four components of the result vector are first optionally clamped to | |

the range [0,1]. The components will be clamped if and only if the result | |

clamp suffix "_SAT" is present in the instruction name. The instruction | |

"ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent | |

instruction "ADD" will not. | |

Since the instruction may be carried out at a different precision than the | |

destination register, the components of the results vector are then | |

converted to the data type corresponding to the destination register. | |

Writes to individual components of the temporary register are controlled | |

by two sets of enables: individual component write masks specified as part | |

of the instruction and the optional condition code mask. | |

The component write mask is specified by the <optionalWriteMask> rule | |

found in the <maskedDstReg> rule. If the optional mask is "", all | |

components are enabled. Otherwise, the optional mask names the individual | |

components to enable. The characters "x", "y", "z", and "w" match the x, | |

y, z, and w components respectively. For example, an optional mask of | |

".xzw" indicates that the x, z, and w components should be enabled for | |

writing but the y component should not. The grammar requires that the | |

destination register mask components must be listed in "xyzw" order. | |

The optional condition code mask is specified by the <optionalCCMask> rule | |

found in the <maskedDstReg> rule. If <optionalCCMask> matches "", all | |

components are enabled. Otherwise, the condition code register is loaded | |

and swizzled according to the swizzling specified by <swizzleSuffix>. | |

Each component of the swizzled condition code is tested according to the | |

rule given by <ccMaskRule>. <ccMaskRule> may have the values "EQ", "NE", | |

"LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding | |

condition code field evaluates to equal, not equal, less than, greater | |

than or equal, less than or equal, or greater than, respectively. | |

Comparisons involving condition codes of "UN" (unordered) evaluate to true | |

for "NE" and false otherwise. For example, if the condition code is | |

(GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle | |

operation will load (EQ,LT,GT,GT) and the mask will thus enable

writes on the y, z, and w components. In addition, "TR" always enables | |

writes and "FL" always disables writes, regardless of the condition code. | |

Each component of the destination register is updated with the result of | |

the fragment program if and only if the component is enabled for writes by | |

both the component write mask and the optional condition code mask. | |

Otherwise, the component of the destination register remains unchanged. | |

A fragment program instruction can also optionally update the condition | |

code register. The condition code is updated if the condition code | |

register update suffix "C" is present in the instruction name. The | |

instruction "ADDC" will update the condition code; the otherwise | |

equivalent instruction "ADD" will not. If condition code updates are | |

enabled, each component of the destination register enabled for writes is | |

compared to zero. The corresponding component of the condition code is | |

set to "LT", "EQ", or "GT", if the written component is less than, equal | |

to, or greater than zero, respectively. Condition code components are set | |

to "UN" if the written component is NaN. Note that values of -0.0 and | |

+0.0 both evaluate to "EQ". If a component of the destination register is | |

not enabled for writes, the corresponding condition code component is | |

unchanged. | |

In the following example code, | |

# R1=(-2, 0, 2, NaN) R0 CC | |

MOVC R0, R1; # ( -2, 0, 2, NaN) (LT,EQ,GT,UN) | |

MOVC R0.xyz, R1.yzwx; # ( 0, 2, NaN, NaN) (EQ,GT,UN,UN) | |

MOVC R0 (NE), R1.zywx; # ( 0, 0, NaN, -2) (EQ,EQ,UN,LT) | |

the first instruction writes (-2,0,2,NaN) to R0 and updates the condition | |

code to (LT,EQ,GT,UN). In the second instruction, only the "x", "y", and "z"

components of R0 and the condition code are updated, so R0 ends up with | |

(0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the | |

third instruction, the condition code mask disables writes to the x | |

component (its condition code field is "EQ"), so R0 ends up with | |

(0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT). | |
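The three-instruction example can be replayed with a small Python model. This is a sketch under stated assumptions: the names movc, swizzle, and generate_cc are hypothetical, only the "NE" mask rule is modeled, and the initial contents of R0 and the condition code are arbitrary choices.

```python
import math

def generate_cc(value):
    """Classify a written component as UN, LT, EQ, or GT."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value == 0:          # both -0.0 and +0.0 compare equal to zero
        return "EQ"
    return "GT"

def swizzle(v, pattern):
    """Reorder the components of v according to a pattern such as "yzwx"."""
    return [v["xyzw".index(c)] for c in pattern]

def movc(dst, cc, src, write_mask="xyzw", cc_rule=None):
    """Model MOVC: masked move plus condition code update.

    Returns new (dst, cc) lists. A cc_rule of None means no condition
    code mask; only "NE" is modeled in this sketch.
    """
    if cc_rule not in (None, "NE"):
        raise ValueError("only the NE rule is modeled here")
    new_dst, new_cc = list(dst), list(cc)
    for i, name in enumerate("xyzw"):
        passes_cc = cc_rule is None or cc[i] != "EQ"
        if name in write_mask and passes_cc:
            new_dst[i] = src[i]
            new_cc[i] = generate_cc(src[i])
    return new_dst, new_cc

nan = float("nan")
r1 = [-2.0, 0.0, 2.0, nan]
r0, cc = [0.0] * 4, ["EQ"] * 4   # initial state is an assumption

r0, cc = movc(r0, cc, r1)                                 # MOVC R0, R1;
r0, cc = movc(r0, cc, swizzle(r1, "yzwx"), "xyz")         # MOVC R0.xyz, R1.yzwx;
r0, cc = movc(r0, cc, swizzle(r1, "zywx"), cc_rule="NE")  # MOVC R0 (NE), R1.zywx;
print(cc)  # ['EQ', 'EQ', 'UN', 'LT']
```

Note that the condition code mask is tested against the condition code as it was before the instruction executed, which is why the x component is left untouched by the third instruction.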

The following pseudocode illustrates the process of writing a result | |

vector to the destination register. In the example, "ccMaskRule" refers | |

to the condition code mask rule given by <ccMaskRule> (or "" if no rule is | |

specified), "instrMask" refers to the component write mask given by the

<optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are | |

enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled. | |

"destination" and "cc" refer to the register selected by <dstRegister> and | |

the condition code, respectively. | |

boolean TestCC(CondCode field) { | |

switch (ccMaskRule) { | |

case "EQ": return (field == "EQ"); | |

case "NE": return (field != "EQ"); | |

case "LT": return (field == "LT"); | |

case "GE": return (field == "GT" || field == "EQ"); | |

case "LE": return (field == "LT" || field == "EQ"); | |

case "GT": return (field == "GT"); | |

case "TR": return TRUE; | |

case "FL": return FALSE; | |

case "": return TRUE; | |

}
}

enum GenerateCC(DstT value) { | |

if (isnan(value)) {

return UN; | |

} else if (value < 0) { | |

return LT; | |

} else if (value == 0) { | |

return EQ; | |

} else { | |

return GT; | |

} | |

} | |

void UpdateDestination(VecDstT destination, VecInstT result) | |

{ | |

// Load the original destination register and condition code. | |

VecDstT resultDst; | |

VecDstT merged; | |

VecCC mergedCC; | |

// Clamp the result vector components to [0,1], if requested. | |

if (clamp01) { | |

if (result.x < 0) result.x = 0; | |

else if (result.x > 1) result.x = 1; | |

if (result.y < 0) result.y = 0; | |

else if (result.y > 1) result.y = 1; | |

if (result.z < 0) result.z = 0; | |

else if (result.z > 1) result.z = 1; | |

if (result.w < 0) result.w = 0; | |

else if (result.w > 1) result.w = 1; | |

} | |

// Convert the result to the type of the destination register. | |

resultDst.x = TypeConvert(result.x); | |

resultDst.y = TypeConvert(result.y); | |

resultDst.z = TypeConvert(result.z); | |

resultDst.w = TypeConvert(result.w); | |

// Merge the converted result into the destination register, under | |

// control of the compile- and run-time write masks. | |

merged = destination; | |

mergedCC = cc; | |

if (instrMask.x && TestCC(cc.c***)) {
    merged.x = resultDst.x;
    if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
}
if (instrMask.y && TestCC(cc.*c**)) {
    merged.y = resultDst.y;
    if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
}
if (instrMask.z && TestCC(cc.**c*)) {
    merged.z = resultDst.z;
    if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
}
if (instrMask.w && TestCC(cc.***c)) {
    merged.w = resultDst.w;
    if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
}

// Write out the new destination register and condition code.

destination = merged; | |

cc = mergedCC; | |

} | |

Section 3.11.5, Fragment Program Instruction Set | |

The following sections describe the instruction set available to fragment | |

programs. | |

Section 3.11.5.1, ADD: Add | |

The ADD instruction performs a component-wise add of the two operands to | |

yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x + tmp1.x; | |

result.y = tmp0.y + tmp1.y; | |

result.z = tmp0.z + tmp1.z; | |

result.w = tmp0.w + tmp1.w; | |

The following special-case rules apply to addition: | |

1. "A+B" is always equivalent to "B+A". | |

2. NaN + <x> = NaN, for all <x>. | |

3. +INF + <x> = +INF, for all <x> except NaN and -INF. | |

4. -INF + <x> = -INF, for all <x> except NaN and +INF. | |

5. +INF + -INF = NaN. | |

6. -0.0 + <x> = <x>, for all <x>. | |

7. +0.0 + <x> = <x>, for all <x> except -0.0. | |
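These special-case rules coincide with IEEE 754 addition, so they can be observed with ordinary Python floats. The sketch below assumes the host uses IEEE 754 doubles (the case for CPython on common platforms); it says nothing about the precision of the fragment program data types, and the helper describe is hypothetical.

```python
import math

INF = float("inf")
NAN = float("nan")

def describe(v):
    """Render a float so that NaN, infinities, and signed zeros are visible."""
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "+INF" if v > 0 else "-INF"
    if v == 0.0:
        return "-0.0" if math.copysign(1.0, v) < 0 else "+0.0"
    return repr(v)

cases = [
    (NAN, 1.0),     # rule 2: NaN
    (INF, 5.0),     # rule 3: +INF
    (-INF, 5.0),    # rule 4: -INF
    (INF, -INF),    # rule 5: NaN
    (-0.0, -0.0),   # rule 6: -0.0
    (-0.0, 0.0),    # rule 6: +0.0
    (0.0, -0.0),    # exception in rule 7: +0.0
]
for a, b in cases:
    print(describe(a), "+", describe(b), "=", describe(a + b))
```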

Section 3.11.5.2, COS: Cosine | |

The COS instruction approximates the cosine of the angle specified by the | |

scalar operand and replicates the approximation to all four components of | |

the result vector. The angle is specified in radians and does not have to | |

be in the range [0,2*PI]. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxCosine(tmp); | |

result.y = ApproxCosine(tmp); | |

result.z = ApproxCosine(tmp); | |

result.w = ApproxCosine(tmp); | |

The approximation function ApproxCosine is accurate to at least 22 bits | |

with an angle in the range [0,2*PI]. | |

| ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. | |

The error in the approximation will typically increase with the absolute | |

value of the angle when the angle falls outside the range [0,2*PI]. | |

The following special-case rules apply to cosine approximation: | |

1. ApproxCosine(NaN) = NaN. | |

2. ApproxCosine(+/-INF) = NaN. | |

3. ApproxCosine(+/-0.0) = +1.0. | |

Section 3.11.5.3, DDX: Derivative Relative to X | |

The DDX instruction computes approximate partial derivatives of the four | |

components of the single operand with respect to the X window coordinate | |

to yield a result vector. The partial derivative is evaluated at the | |

center of the pixel. | |

f = VectorLoad(op0); | |

result = ComputePartialX(f); | |

Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may not

yield accurate second derivatives. | |

For components with partial derivatives that overflow (including +/-INF | |

inputs), the resulting partials may be encoded as large floating-point | |

numbers instead of +/-INF. | |

Section 3.11.5.4, DDY: Derivative Relative to Y | |

The DDY instruction computes approximate partial derivatives of the four | |

components of the single operand with respect to the Y window coordinate | |

to yield a result vector. The partial derivative is evaluated at the | |

center of the pixel. | |

f = VectorLoad(op0); | |

result = ComputePartialY(f); | |

Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may not

yield accurate second derivatives. | |

For components with partial derivatives that overflow (including +/-INF | |

inputs), the resulting partials may be encoded as large floating-point | |

numbers instead of +/-INF. | |

Section 3.11.5.5, DP3: 3-Component Dot Product | |

The DP3 instruction computes a three component dot product of the two | |

operands (using the x, y, and z components) and replicates the dot product | |

to all four components of the result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1);
result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z);
result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z);
result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z);
result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z);

Section 3.11.5.6, DP4: 4-Component Dot Product | |

The DP4 instruction computes a four component dot product of the two | |

operands and replicates the dot product to all four components of the | |

result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1);
result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
           (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);

Section 3.11.5.7, DST: Distance Vector | |

The DST instruction computes a distance vector from two specially- | |

formatted operands. The first operand should be of the form [NA, d^2, | |

d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], | |

where NA values are not relevant to the calculation and d is a vector | |

length. If both vectors satisfy these conditions, the result vector will | |

be of the form [1.0, d, d^2, 1/d]. | |

The exact behavior is specified in the following pseudo-code: | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = 1.0; | |

result.y = tmp0.y * tmp1.y; | |

result.z = tmp0.z; | |

result.w = tmp1.w; | |

Given an arbitrary vector, d^2 can be obtained using the DP3 instruction

(using the same vector for both operands) and 1/d can be obtained from d^2 | |

using the RSQ instruction. | |

This distance vector is useful for per-fragment light attenuation | |

calculations: a DP3 operation involving the distance vector and an

attenuation constants vector will yield the attenuation factor. | |
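The DP3/RSQ/DST sequence described above can be sketched in Python as follows. The helper names dp3 and dst and the attenuation constants are hypothetical, and an exact reciprocal square root stands in for the RSQ approximation.

```python
import math

def dp3(a, b):
    """Three-component dot product (x, y, z only)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dst(op0, op1):
    """DST as in the pseudocode: (1.0, op0.y * op1.y, op0.z, op1.w)."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

v = (3.0, 4.0, 0.0, 0.0)          # some vector of length d = 5
d2 = dp3(v, v)                    # d^2 via DP3 of the vector with itself
inv_d = 1.0 / math.sqrt(d2)       # 1/d, standing in for RSQ
dist = dst((d2, d2, d2, d2), (inv_d, inv_d, inv_d, inv_d))
print(dist)                       # approximately (1.0, 5.0, 25.0, 0.2)

# Hypothetical attenuation constants (constant, linear, quadratic).
k = (1.0, 0.5, 0.25)
denominator = dp3(dist, k)        # k.x + k.y * d + k.z * d^2
```

In the standard OpenGL lighting model the attenuation factor is the reciprocal of this sum, so an additional RCP would be needed if that form of attenuation is intended.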

Section 3.11.5.8, EX2: Exponential Base 2 | |

The EX2 instruction approximates 2 raised to the power of the scalar | |

operand and replicates it to all four components of the result | |

vector. | |

tmp = ScalarLoad(op0); | |

result.x = Approx2ToX(tmp); | |

result.y = Approx2ToX(tmp); | |

result.z = Approx2ToX(tmp); | |

result.w = Approx2ToX(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0, | |

and, in general, | |

| Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)). | |

The following special-case rules apply to exponential approximation: | |

1. Approx2ToX(NaN) = NaN. | |

2. Approx2ToX(-INF) = +0.0. | |

3. Approx2ToX(+INF) = +INF. | |

4. Approx2ToX(+/-0.0) = +1.0. | |

Section 3.11.5.9, FLR: Floor | |

The FLR instruction performs a component-wise floor operation on the | |

operand to generate a result vector. The floor of a value is defined as | |

the largest integer less than or equal to the value. The floor of 2.3 is | |

2.0; the floor of -3.6 is -4.0. | |

tmp = VectorLoad(op0); | |

result.x = floor(tmp.x); | |

result.y = floor(tmp.y); | |

result.z = floor(tmp.z); | |

result.w = floor(tmp.w); | |

The following special-case rules apply to floor computation: | |

1. floor(NaN) = NaN. | |

2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the | |

sign of the result is equal to the sign of the operand. | |

Section 3.11.5.10, FRC: Fraction | |

The FRC instruction extracts the fractional portion of each component of | |

the operand to generate a result vector. The fractional portion of a | |

component is defined as the result after subtracting off the floor of the | |

component (see FLR), and is always in the range [0.00, 1.00). | |

For negative values, the fractional portion is NOT the number written to | |

the right of the decimal point -- the fractional portion of -1.7 is not | |

0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) | |

from -1.7. | |

tmp = VectorLoad(op0); | |

result.x = tmp.x - floor(tmp.x); | |

result.y = tmp.y - floor(tmp.y); | |

result.z = tmp.z - floor(tmp.z); | |

result.w = tmp.w - floor(tmp.w); | |

The following special-case rules, which can be derived from the rules for | |

FLR and ADD, apply to fraction computation:

1. fraction(NaN) = NaN. | |

2. fraction(+/-INF) = NaN. | |

3. fraction(+/-0.0) = +0.0. | |
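A minimal Python sketch of FLR and FRC for a single component follows. The names flr and frc are hypothetical. Python's math.floor raises an exception for NaN and infinities, so the special cases are handled explicitly to match the rules listed for FLR.

```python
import math

def flr(x):
    """Floor that follows the FLR special-case rules."""
    if math.isnan(x) or math.isinf(x):
        return x                      # floor(NaN) = NaN, floor(+/-INF) = +/-INF
    r = float(math.floor(x))
    # Preserve the sign of a zero result (floor(-0.0) = -0.0).
    return math.copysign(0.0, x) if r == 0.0 else r

def frc(x):
    """Fractional part, computed as x - floor(x)."""
    return x - flr(x)

print(frc(2.3))    # approximately 0.3
print(frc(-1.7))   # approximately 0.3, not 0.7
```

With these definitions, frc of an infinity evaluates INF - INF and yields NaN, and frc(-0.0) evaluates (-0.0) - (-0.0) and yields +0.0, consistent with the rules above.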

Section 3.11.5.11, KIL: Conditionally Discard Fragment | |

The KIL instruction is unlike any other instruction in the instruction | |

set. This instruction evaluates components of a swizzled condition code | |

using a test expression identical to that used to evaluate condition code | |

write masks (Section 3.11.4.4). If any condition code component evaluates | |

to TRUE, the fragment is discarded. Otherwise, the instruction has no | |

effect. The condition code components are specified, swizzled, and | |

evaluated in the same manner as the condition code write mask. | |

if (TestCC(rc.c***) || TestCC(rc.*c**) || | |

TestCC(rc.**c*) || TestCC(rc.***c)) { | |

// Discard the fragment. | |

} else { | |

// Do nothing. | |

} | |

If the fragment is discarded, it is treated as though it were not produced | |

by rasterization. In particular, none of the per-fragment operations | |

(such as stencil tests, blends, stencil, depth, or color buffer writes) | |

are performed on the fragment. | |

Section 3.11.5.12, LG2: Logarithm Base 2 | |

The LG2 instruction approximates the base 2 logarithm of the scalar | |

operand and replicates it to all four components of the result vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxLog2(tmp); | |

result.y = ApproxLog2(tmp); | |

result.z = ApproxLog2(tmp); | |

result.w = ApproxLog2(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxLog2(x) - log_2(x) | < 1.0 / 2^22. | |

Note that for large values of x, there are not enough bits in the | |

floating-point storage format to represent a result that precisely. | |

The following special-case rules apply to logarithm approximation: | |

1. ApproxLog2(NaN) = NaN. | |

2. ApproxLog2(+INF) = +INF. | |

3. ApproxLog2(+/-0.0) = -INF. | |

4. ApproxLog2(x) = NaN, -INF < x < -0.0. | |

5. ApproxLog2(-INF) = NaN. | |

Section 3.11.5.13, LIT: Compute Light Coefficients | |

The LIT instruction accelerates per-fragment lighting by computing | |

lighting coefficients for ambient, diffuse, and specular light | |

contributions. The "x" component of the operand is assumed to hold a | |

diffuse dot product (n dot VP_pli, as in the vertex lighting equations in | |

Section 2.13.1). The "y" component of the operand is assumed to hold a | |

specular dot product (n dot h_i). The "w" component of the operand is | |

assumed to hold the specular exponent of the material (s_rm). | |

The "x" component of the result vector receives the value that should be | |

multiplied by the ambient light/material product (always 1.0). The "y" | |

component of the result vector receives the value that should be | |

multiplied by the diffuse light/material product (n dot VP_pli). The "z" | |

component of the result vector receives the value that should be | |

multiplied by the specular light/material product (f_i * (n dot h_i) ^ | |

s_rm). The "w" component of the result is the constant 1.0. | |

Negative diffuse and specular dot products are clamped to 0.0, as is done | |

in the standard per-vertex lighting operations. In addition, if the | |

diffuse dot product is zero or negative, the specular coefficient is | |

forced to zero. | |

tmp = VectorLoad(op0);
if (tmp.x < 0) tmp.x = 0;
if (tmp.y < 0) tmp.y = 0;
result.x = 1.0;
result.y = tmp.x;
result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
result.w = 1.0;

The exponentiation approximation used to compute result.z is identical to

that used in the POW instruction, including errors and the processing of | |

any special cases. | |

Section 3.11.5.14, LRP: Linear Interpolation | |

The LRP instruction performs a component-wise linear interpolation to | |

yield a result vector. It interpolates between the components of the | |

second and third operands, using the first operand as a weight. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

tmp2 = VectorLoad(op2); | |

result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x; | |

result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y; | |

result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z; | |

result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w; | |

Section 3.11.5.15, MAD: Multiply and Add | |

The MAD instruction performs a component-wise multiply of the first two | |

operands, and then does a component-wise add of the product to the third | |

operand to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

tmp2 = VectorLoad(op2); | |

result.x = tmp0.x * tmp1.x + tmp2.x; | |

result.y = tmp0.y * tmp1.y + tmp2.y; | |

result.z = tmp0.z * tmp1.z + tmp2.z; | |

result.w = tmp0.w * tmp1.w + tmp2.w; | |

Section 3.11.5.16, MAX: Maximum

The MAX instruction computes component-wise maximums of the values in the | |

two operands to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = max(tmp0.x, tmp1.x); | |

result.y = max(tmp0.y, tmp1.y); | |

result.z = max(tmp0.z, tmp1.z); | |

result.w = max(tmp0.w, tmp1.w); | |

The following special cases apply to the maximum operation: | |

1. max(A,B) is always equivalent to max(B,A). | |

2. max(NaN, <x>) == NaN, for all <x>. | |

Section 3.11.5.17, MIN: Minimum

The MIN instruction computes component-wise minimums of the values in the | |

two operands to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = min(tmp0.x, tmp1.x); | |

result.y = min(tmp0.y, tmp1.y); | |

result.z = min(tmp0.z, tmp1.z); | |

result.w = min(tmp0.w, tmp1.w); | |

The following special cases apply to the minimum operation: | |

1. min(A,B) is always equivalent to min(B,A). | |

2. min(NaN, <x>) == NaN, for all <x>. | |

Section 3.11.5.18, MOV: Move | |

The MOV instruction copies the value of the operand to yield a result | |

vector. | |

result = VectorLoad(op0); | |

Section 3.11.5.19, MUL: Multiply | |

The MUL instruction performs a component-wise multiply of the two operands | |

to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x * tmp1.x; | |

result.y = tmp0.y * tmp1.y; | |

result.z = tmp0.z * tmp1.z; | |

result.w = tmp0.w * tmp1.w; | |

The following special-case rules apply to multiplication: | |

1. "A*B" is always equivalent to "B*A". | |

2. NaN * <x> = NaN, for all <x>. | |

3. +/-0.0 * +/-INF = NaN. | |

4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The | |

sign of the result is positive if the signs of the two operands match | |

and negative otherwise. | |

5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The | |

sign of the result is positive if the signs of the two operands match | |

and negative otherwise. | |

6. +1.0 * <x> = <x>, for all <x>. | |

Section 3.11.5.20, PK2H: Pack Two 16-bit Floats | |

The PK2H instruction converts the "x" and "y" components of the single | |

operand into 16-bit floating-point format, packs the bit representation of | |

these two floats into a 32-bit value, and replicates that value to all | |

four components of the result vector. The PK2H instruction can be | |

reversed by the UP2H instruction below. | |

tmp0 = VectorLoad(op0); | |

/* result obtained by combining raw bits of tmp0.x, tmp0.y */ | |

result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); | |

result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); | |

result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); | |

result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); | |

The result must be written to a register with 32-bit components (an "R" | |

register, o[COLR], or o[DEPR]). A fragment program will fail to load if | |

any other register type is specified. | |
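The packing can be sketched with Python's struct module. This assumes that the 16-bit floating-point format has the same bit layout as IEEE 754 binary16, which is what the struct format code "e" implements. The function names pk2h and up2h are hypothetical, and up2h is simply the inverse of the bit layout in the pseudocode above.

```python
import struct

def pk2h(x, y):
    """Pack x and y as 16-bit floats into one 32-bit integer (x in the low bits)."""
    lo, = struct.unpack("<H", struct.pack("<e", x))
    hi, = struct.unpack("<H", struct.pack("<e", y))
    return lo | (hi << 16)

def up2h(packed):
    """Inverse of pk2h: recover the two 16-bit floats as Python floats."""
    x, = struct.unpack("<e", struct.pack("<H", packed & 0xFFFF))
    y, = struct.unpack("<e", struct.pack("<H", (packed >> 16) & 0xFFFF))
    return x, y

packed = pk2h(1.0, -2.0)
print(hex(packed))      # 0xc0003c00
print(up2h(packed))     # (1.0, -2.0)
```

Values that are not exactly representable in 16 bits are rounded during packing, so a pack/unpack round trip is lossless only for values already representable in the 16-bit format. struct.pack raises OverflowError for magnitudes too large for the 16-bit format; how the hardware conversion treats such values is not modeled here.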

Section 3.11.5.21, PK2US: Pack Two Unsigned 16-bit Scalars | |

The PK2US instruction converts the "x" and "y" components of the single | |

operand into a packed pair of 16-bit unsigned scalars. The scalars are | |

represented in a bit pattern where all '0' bits correspond to 0.0 and all
'1' bits correspond to 1.0. The bit representations of the two converted

components are packed into a 32-bit value, and that value is replicated to | |

all four components of the result vector. The PK2US instruction can be | |

reversed by the UP2US instruction below. | |

tmp0 = VectorLoad(op0); | |

if (tmp0.x < 0.0) tmp0.x = 0.0; | |

if (tmp0.x > 1.0) tmp0.x = 1.0; | |

if (tmp0.y < 0.0) tmp0.y = 0.0; | |

if (tmp0.y > 1.0) tmp0.y = 1.0; | |

us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */ | |

us.y = round(65535.0 * tmp0.y); | |

/* result obtained by combining raw bits of us. */ | |

result.x = ((us.x) | (us.y << 16)); | |

result.y = ((us.x) | (us.y << 16)); | |

result.z = ((us.x) | (us.y << 16)); | |

result.w = ((us.x) | (us.y << 16)); | |

The result must be written to a register with 32-bit components (an "R" | |

register, o[COLR], or o[DEPR]). A fragment program will fail to load if | |

any other register type is specified. | |
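A Python sketch of the same clamp, scale, and pack steps follows. The name pk2us is hypothetical, and round-half-up is an assumption, since the pseudocode does not state the rounding mode of round(). NaN inputs are not handled.

```python
import math

def pk2us(x, y):
    """Clamp x and y to [0,1], scale to 16 bits, and pack (x in the low bits)."""
    def to_ushort(v):
        v = min(max(v, 0.0), 1.0)
        return int(math.floor(65535.0 * v + 0.5))   # assumed rounding mode
    return to_ushort(x) | (to_ushort(y) << 16)

print(hex(pk2us(0.0, 1.0)))   # 0xffff0000
print(hex(pk2us(0.5, 2.0)))   # 0xffff8000 (y is clamped to 1.0)
```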

Section 3.11.5.22, PK4B: Pack Four Signed 8-bit Scalars | |

The PK4B instruction converts the four components of the single operand | |

into 8-bit signed quantities. The signed quantities are represented in a | |

bit pattern where all '0' bits correspond to -128/127 and all '1' bits
correspond to +127/127. The bit representations of the four converted

components are packed into a 32-bit value, and that value is replicated to | |

all four components of the result vector. The PK4B instruction can be | |

reversed by the UP4B instruction below. | |

tmp0 = VectorLoad(op0); | |

if (tmp0.x < -128/127) tmp0.x = -128/127; | |

if (tmp0.y < -128/127) tmp0.y = -128/127; | |

if (tmp0.z < -128/127) tmp0.z = -128/127; | |

if (tmp0.w < -128/127) tmp0.w = -128/127; | |

if (tmp0.x > +127/127) tmp0.x = +127/127; | |

if (tmp0.y > +127/127) tmp0.y = +127/127; | |

if (tmp0.z > +127/127) tmp0.z = +127/127; | |

if (tmp0.w > +127/127) tmp0.w = +127/127; | |

ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */ | |

ub.y = round(127.0 * tmp0.y + 128.0); | |

ub.z = round(127.0 * tmp0.z + 128.0); | |

ub.w = round(127.0 * tmp0.w + 128.0); | |

/* result obtained by combining raw bits of ub. */ | |

result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

The result must be written to a register with 32-bit components (an "R" | |

register, o[COLR], or o[DEPR]). A fragment program will fail to load if | |

any other register type is specified. | |
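The biased encoding can be sketched in Python as follows. The name pk4b is hypothetical, and round-half-up is an assumption, since the pseudocode does not state the rounding mode of round(). NaN inputs are not handled.

```python
import math

def pk4b(x, y, z, w):
    """Clamp to [-128/127, 1], bias into 0..255, and pack (x in the low bits)."""
    def to_ubyte(v):
        v = min(max(v, -128.0 / 127.0), 1.0)
        return int(math.floor(127.0 * v + 128.0 + 0.5))   # assumed rounding mode
    b = [to_ubyte(c) for c in (x, y, z, w)]
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)

print(hex(pk4b(0.0, 0.0, 0.0, 0.0)))    # 0x80808080
print(hex(pk4b(-2.0, -1.0, 0.0, 1.0)))  # 0xff800100
```

Under this encoding 0.0 maps to the byte value 128, -1.0 maps to 1, and the most negative representable value, -128/127, maps to 0.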

Section 3.11.5.23, PK4UB: Pack Four Unsigned 8-bit Scalars | |

The PK4UB instruction converts the four components of the single operand | |

into a packed grouping of 8-bit unsigned scalars. The scalars are | |

represented in a bit pattern where all '0' bits correspond to 0.0 and all
'1' bits correspond to 1.0. The bit representations of the four

converted components are packed into a 32-bit value, and that value is | |

replicated to all four components of the result vector. The PK4UB | |

instruction can be reversed by the UP4UB instruction below. | |

tmp0 = VectorLoad(op0); | |

if (tmp0.x < 0.0) tmp0.x = 0.0; | |

if (tmp0.x > 1.0) tmp0.x = 1.0; | |

if (tmp0.y < 0.0) tmp0.y = 0.0; | |

if (tmp0.y > 1.0) tmp0.y = 1.0; | |

if (tmp0.z < 0.0) tmp0.z = 0.0; | |

if (tmp0.z > 1.0) tmp0.z = 1.0; | |

if (tmp0.w < 0.0) tmp0.w = 0.0; | |

if (tmp0.w > 1.0) tmp0.w = 1.0; | |

ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */ | |

ub.y = round(255.0 * tmp0.y); | |

ub.z = round(255.0 * tmp0.z); | |

ub.w = round(255.0 * tmp0.w); | |

/* result obtained by combining raw bits of ub. */ | |

result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); | |

The result must be written to a register with 32-bit components (an "R" | |

register, o[COLR], or o[DEPR]). A fragment program will fail to load if | |

any other register type is specified. | |

Section 3.11.5.24, POW: Exponentiation | |

The POW instruction approximates the value of the first scalar operand | |

raised to the power of the second scalar operand and replicates it to all | |

four components of the result vector. | |

tmp0 = ScalarLoad(op0); | |

tmp1 = ScalarLoad(op1); | |

result.x = ApproxPower(tmp0, tmp1); | |

result.y = ApproxPower(tmp0, tmp1); | |

result.z = ApproxPower(tmp0, tmp1); | |

result.w = ApproxPower(tmp0, tmp1); | |

The exponentiation approximation function is defined in terms of the base | |

2 exponentiation and logarithm approximation operations in the EX2 and LG2 | |

instructions, including errors and the processing of any special cases. | |

In particular, | |

ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).

The following special-case rules, which can be derived from the rules in | |

the LG2, MUL, and EX2 instructions, apply to exponentiation: | |

1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.

2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN. | |

3. ApproxPower(+/-0.0, +/-0.0) = NaN. | |

4. ApproxPower(+INF, +/-0.0) = NaN. | |

5. ApproxPower(+1.0, +/-INF) = NaN. | |

6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0. | |

7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0. | |

8. ApproxPower(+1.0, <x>) = +1.0, if -INF < x < +INF. | |

9. ApproxPower(+INF, <x>) = +INF, if x > +0.0. | |

10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.

11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF. | |

12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0. | |

13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                             +INF, if x > +1.0.
14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                             +0.0, if x > +1.0.

Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and | |

0*(-INF) = NaN. In many other applications, including the standard C | |

pow() function, 0^0 is defined as 1.0. This behavior can be emulated | |

using additional instructions in much the same way that the pow()

function is implemented on many CPUs. | |

Note that a logarithm is involved even if the exponent is an integer. | |

This means that any exponentiating with a negative base will produce NaN. | |

In contrast, it is possible in a "normal" mathematical formulation to
raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
(-0.5)^-2 == 4).
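The composition of logarithm, multiply, and exponential can be sketched in Python to show where the NaN results come from. The names log2_spec, exp2_spec, and power_spec are hypothetical, and exact math library functions stand in for the hardware approximations; only the special-case behavior is modeled.

```python
import math

INF, NAN = float("inf"), float("nan")

def log2_spec(x):
    """log2 with the LG2 special cases."""
    if math.isnan(x) or x < 0.0:
        return NAN                 # NaN, negative values, and -INF
    if x == 0.0:
        return -INF                # both +0.0 and -0.0
    if math.isinf(x):
        return INF
    return math.log2(x)

def exp2_spec(x):
    """2^x with the EX2 special cases."""
    if math.isnan(x):
        return NAN
    if math.isinf(x):
        return INF if x > 0 else 0.0
    try:
        return 2.0 ** x
    except OverflowError:          # result too large for a Python float
        return INF

def power_spec(a, b):
    """Power computed as 2^(b * log2(a))."""
    return exp2_spec(b * log2_spec(a))

print(power_spec(0.0, 0.0))    # nan, because 0 * (-INF) is NaN
print(power_spec(-3.0, 2.0))   # nan, because log2 of a negative number is NaN
print(power_spec(2.0, 3.0))    # 8.0
```

By comparison, Python's math.pow(0.0, 0.0) and math.pow(-3.0, 2.0) return 1.0 and 9.0, following the C pow() convention mentioned above.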

Section 3.11.5.25, RCP: Reciprocal | |

The RCP instruction approximates the reciprocal of the scalar operand and | |

replicates it to all four components of the result vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxReciprocal(tmp); | |

result.y = ApproxReciprocal(tmp); | |

result.z = ApproxReciprocal(tmp); | |

result.w = ApproxReciprocal(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0. | |

The following special-case rules apply to reciprocation: | |

1. ApproxReciprocal(NaN) = NaN. | |

2. ApproxReciprocal(+INF) = +0.0. | |

3. ApproxReciprocal(-INF) = -0.0. | |

4. ApproxReciprocal(+0.0) = +INF. | |

5. ApproxReciprocal(-0.0) = -INF. | |

Section 3.11.5.26, RFL: Reflection Vector | |

The RFL instruction computes the reflection of the second vector operand | |

(the "direction" vector) about the vector specified by the first vector | |

operand (the "axis" vector). Both operands are treated as 3D vectors (the | |

w components are ignored). The result vector is another 3D vector (the | |

"reflected direction" vector). The length of the result vector, ignoring | |

rounding errors, should equal that of the second operand. | |

axis = VectorLoad(op0); | |

direction = VectorLoad(op1); | |

tmp.w = (axis.x * axis.x + axis.y * axis.y + | |

axis.z * axis.z); | |

tmp.x = (axis.x * direction.x + axis.y * direction.y + | |

axis.z * direction.z); | |

tmp.x = 2.0 * tmp.x; | |

tmp.x = tmp.x / tmp.w; | |

result.x = tmp.x * axis.x - direction.x; | |

result.y = tmp.x * axis.y - direction.y; | |

result.z = tmp.x * axis.z - direction.z; | |

A fragment program will fail to load if the w component of the result is | |

enabled in the component write mask (see the <optionalWriteMask> rule in | |

the grammar). | |
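A direct Python transcription of the pseudocode is shown below as a sketch; the names rfl and length3 are hypothetical. Note that the axis vector does not need to be normalized, because the computation divides by the squared length of the axis.

```python
import math

def rfl(axis, direction):
    """Reflect `direction` about `axis`; both are treated as 3D vectors."""
    ax, ay, az = axis[:3]
    dx, dy, dz = direction[:3]
    w = ax * ax + ay * ay + az * az
    s = 2.0 * (ax * dx + ay * dy + az * dz) / w
    return (s * ax - dx, s * ay - dy, s * az - dz)

def length3(v):
    """Euclidean length of the x, y, z components."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

direction = (1.0, 1.0, 0.0, 0.0)
reflected = rfl((0.0, 2.0, 0.0, 0.0), direction)
print(reflected)                                # (-1.0, 1.0, 0.0)
print(length3(reflected), length3(direction))   # equal up to rounding
```

A zero-length axis raises ZeroDivisionError in this sketch; the corresponding behavior of the instruction is not modeled here.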

Section 3.11.5.27, RSQ: Reciprocal Square Root | |

The RSQ instruction approximates the reciprocal of the square root of the | |

scalar operand and replicates it to all four components of the result | |

vector. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxRSQRT(tmp); | |

result.y = ApproxRSQRT(tmp); | |

result.z = ApproxRSQRT(tmp); | |

result.w = ApproxRSQRT(tmp); | |

The approximation function is accurate to at least 22 bits: | |

| ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

The following special-case rules apply to reciprocal square roots: | |

1. ApproxRSQRT(NaN) = NaN. | |

2. ApproxRSQRT(+INF) = +0.0. | |

3. ApproxRSQRT(-INF) = NaN. | |

4. ApproxRSQRT(+0.0) = +INF. | |

5. ApproxRSQRT(-0.0) = -INF. | |

6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0. | |

Section 3.11.5.28, SEQ: Set on Equal To | |

The SEQ instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is equal to that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SEQ: | |

1. (<x> == <y>) and (<y> == <x>) always produce the same result.
2. (NaN == <x>) is FALSE for all <x>, including NaN.
3. (+INF == +INF) and (-INF == -INF) are TRUE.
4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
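These comparison rules match IEEE 754 equality, so they can be checked with Python floats. The function seq below is a hypothetical component-wise model of the instruction.

```python
def seq(a, b):
    """Component-wise SEQ: 1.0 where the components compare equal, else 0.0."""
    return [1.0 if x == y else 0.0 for x, y in zip(a, b)]

nan, inf = float("nan"), float("inf")
print(seq([nan, inf, -0.0, 1.0], [nan, inf, 0.0, 2.0]))  # [0.0, 1.0, 1.0, 0.0]
```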

Section 3.11.5.29, SFL: Set on False | |

The SFL instruction is a degenerate case of the other "Set on" | |

instructions that sets all components of the result vector to | |

0.0. | |

result.x = 0.0; | |

result.y = 0.0; | |

result.z = 0.0; | |

result.w = 0.0; | |

Section 3.11.5.30, SGE: Set on Greater Than or Equal | |

The SGE instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is greater than or equal to that of the | |

second, and 0.0 otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SGE: | |

1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>. | |

2. (+INF >= +INF) and (-INF >= -INF) are TRUE. | |

3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE. | |

Section 3.11.5.31, SGT: Set on Greater Than | |

The SGT instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is greater than that of the second, and | |

0.0 otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SGT: | |

1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>. | |

2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE. | |

Section 3.11.5.32, SIN: Sine | |

The SIN instruction approximates the sine of the angle specified by the | |

scalar operand and replicates it to all four components of the result | |

vector. The angle is specified in radians and does not have to be in the | |

range [0,2*PI]. | |

tmp = ScalarLoad(op0); | |

result.x = ApproxSine(tmp); | |

result.y = ApproxSine(tmp); | |

result.z = ApproxSine(tmp); | |

result.w = ApproxSine(tmp); | |

The approximation function is accurate to at least 22 bits with an angle | |

in the range [0,2*PI]. | |

| ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. | |

The error in the approximation will typically increase with the absolute | |

value of the angle when the angle falls outside the range [0,2*PI]. | |

The following special-case rules apply to sine approximation: | |

1. ApproxSine(NaN) = NaN. | |

2. ApproxSine(+/-INF) = NaN. | |

3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the | |

sign of the single operand. | |

Section 3.11.5.33, SLE: Set on Less Than or Equal | |

The SLE instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is less than or equal to that of the | |

second, and 0.0 otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SLE: | |

1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>. | |

2. (+INF <= +INF) and (-INF <= -INF) are TRUE. | |

3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE. | |

Section 3.11.5.34, SLT: Set on Less Than | |

The SLT instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is less than that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SLT: | |

1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>. | |

2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE. | |
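
One consequence of the NaN rules is that SGE and SLT are not exact complements: when either operand is NaN, both produce 0.0. The following scalar C sketch illustrates this under the assumption of IEEE 754 comparison semantics; the function names are hypothetical.

```c
#include <math.h>   /* NAN, used when calling these helpers */

/* Scalar versions of the per-component SGE and SLT tests. */
static float sge(float a, float b) { return (a >= b) ? 1.0f : 0.0f; }
static float slt(float a, float b) { return (a <  b) ? 1.0f : 0.0f; }

/* Returns 1 when exactly one of SGE and SLT is set for (a, b).
 * This holds for all ordered operands, including signed zeros,
 * and fails only when an operand is NaN. */
static int sge_slt_complementary(float a, float b)
{
    return sge(a, b) + slt(a, b) == 1.0f;
}
```

For example, sge_slt_complementary(1.0f, 2.0f) is 1, while sge_slt_complementary(NAN, 2.0f) is 0 because both comparisons are FALSE.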

Section 3.11.5.35, SNE: Set on Not Equal | |

The SNE instruction performs a component-wise comparison of the two | |

operands. Each component of the result vector is 1.0 if the corresponding | |

component of the first operand is not equal to that of the second, and 0.0 | |

otherwise. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0; | |

result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0; | |

result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0; | |

result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0; | |

The following special-case rules apply to SNE: | |

1. (<x> != <y>) and (<y> != <x>) always produce the same result. | |

2. (NaN != <x>) is TRUE for all <x>, including NaN. | |

3. (+INF != +INF) and (-INF != -INF) are FALSE. | |

4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE. | |

Section 3.11.5.36, STR: Set on True | |

The STR instruction is a degenerate case of the other "Set on" | |

instructions that sets all components of the result vector to 1.0. | |

result.x = 1.0; | |

result.y = 1.0; | |

result.z = 1.0; | |

result.w = 1.0; | |

Section 3.11.5.37, SUB: Subtract | |

The SUB instruction performs a component-wise subtraction of the second | |

operand from the first to yield a result vector. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

result.x = tmp0.x - tmp1.x; | |

result.y = tmp0.y - tmp1.y; | |

result.z = tmp0.z - tmp1.z; | |

result.w = tmp0.w - tmp1.w; | |

The SUB instruction is completely equivalent to an identical ADD | |

instruction in which the negate operator on the second operand is | |

reversed: | |

1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2". | |

2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2". | |

3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|". | |

4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|". | |

Section 3.11.5.38, TEX: Texture Lookup | |

The TEX instruction performs a filtered texture lookup using the texture | |

target given by <texImageTarget> belonging to the texture image unit given | |

by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", | |

and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, | |

TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. | |

The (s,t,r) texture coordinates used for the lookup are the x, y, and z | |

components of the single operand. | |

The texture lookup is performed as specified in Section 3.8. The LOD | |

calculations in Section 3.8.5 are performed using an implementation | |

dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. | |

The mapping of filtered texture components to the components of the result | |

vector is dependent on the base internal format of the texture and is | |

specified in Table X.5. | |

                          Result Vector Components   | |

Base Internal Format      X      Y      Z      W     | |

--------------------      -----  -----  -----  ----- | |

ALPHA                     0.0    0.0    0.0    At    | |

LUMINANCE                 Lt     Lt     Lt     1.0   | |

LUMINANCE_ALPHA           Lt     Lt     Lt     At    | |

INTENSITY                 It     It     It     It    | |

RGB                       Rt     Gt     Bt     1.0   | |

RGBA                      Rt     Gt     Bt     At    | |

HILO_NV (signed)          HIt    LOt    HEMI   1.0   | |

HILO_NV (unsigned)        HIt    LOt    1.0    1.0   | |

DSDT_NV                   DSt    DTt    0.0    1.0   | |

DSDT_MAG_NV               DSt    DTt    MAGt   1.0   | |

DSDT_MAG_INTENSITY_NV     DSt    DTt    MAGt   It    | |

FLOAT_R_NV                Rt     0.0    0.0    1.0   | |

FLOAT_RG_NV               Rt     Gt     0.0    1.0   | |

FLOAT_RGB_NV              Rt     Gt     Bt     1.0   | |

FLOAT_RGBA_NV             Rt     Gt     Bt     At    | |

Table X.5: Mapping of filtered texel components to result vector | |

components for the TEX instruction. 0.0 and 1.0 indicate that the | |

corresponding constant value is written to the result vector. | |

DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY, | |

as specified in the texture's depth texture mode. | |

For HILO_NV textures with signed components, "HEMI" is defined as | |

sqrt(MAX(0, 1-(HIt^2+LOt^2))). | |

This instruction specifies a particular texture target, ignoring the | |

standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, | |

TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended | |

OpenGL. If the specified texture target has a consistent set of images, a | |

lookup is performed. Otherwise, the result of the instruction is the | |

vector (0,0,0,0). | |

Although this instruction allows the selection of any texture target, a | |

fragment program can not use more than one texture target for any given | |

texture image unit. | |

Section 3.11.5.39, TXD: Texture Lookup with Derivatives | |

The TXD instruction performs a filtered texture lookup using the texture | |

target given by <texImageTarget> belonging to the texture image unit given | |

by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", | |

and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, | |

TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. | |

The (s,t,r) texture coordinates used for the lookup are the x, y, and z | |

components of the first operand. The partial derivatives in the X | |

direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z | |

components of the second operand. The partial derivatives in the Y | |

direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z | |

components of the third operand. | |

The texture lookup is performed as specified in Section 3.8. The LOD | |

calculations in Section 3.8.5 are performed using the specified partial | |

derivatives. The mapping of filtered texture components to the components | |

of the result vector is dependent on the base internal format of the | |

texture and is specified in Table X.5. | |

This instruction specifies a particular texture target, ignoring the | |

standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, | |

TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended | |

OpenGL. If the specified texture target has a consistent set of images, a | |

lookup is performed. Otherwise, the result of the instruction is the | |

vector (0,0,0,0). | |

Although this instruction allows the selection of any texture target, a | |

fragment program can not use more than one texture target for any given | |

texture image unit. | |

Section 3.11.5.40, TXP: Projective Texture Lookup | |

The TXP instruction performs a filtered texture lookup using the texture | |

target given by <texImageTarget> belonging to the texture image unit given | |

by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", | |

and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, | |

TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. | |

For cube map textures, the (s,t,r) texture coordinates used for the lookup | |

are given by x, y, and z, respectively. For all other textures, the | |

(s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and | |

z/w, respectively, where x, y, z, and w are the corresponding components | |

of the operand. | |
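
The coordinate selection described above can be sketched in C as follows. The vec4 and texcoord types, the function name txp_coords, and the is_cube_map flag are hypothetical names introduced for this example.

```c
typedef struct { float x, y, z, w; } vec4;   /* hypothetical operand vector */
typedef struct { float s, t, r; } texcoord;  /* hypothetical lookup coordinates */

/* Derive the (s,t,r) lookup coordinates for TXP. Cube map lookups use
 * x, y, z directly; all other targets divide by the w component. */
static texcoord txp_coords(vec4 op, int is_cube_map)
{
    texcoord c;
    if (is_cube_map) {
        c.s = op.x;
        c.t = op.y;
        c.r = op.z;
    } else {
        c.s = op.x / op.w;
        c.t = op.y / op.w;
        c.r = op.z / op.w;
    }
    return c;
}
```

For the operand (2, 4, 6, 2), a 2D lookup uses (1, 2, 3) while a cube map lookup uses (2, 4, 6).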

The texture lookup is performed as specified in Section 3.8. The LOD | |

calculations in Section 3.8.5 are performed using an implementation | |

dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. | |

The mapping of filtered texture components to the components of the result | |

vector is dependent on the base internal format of the texture and is | |

specified in Table X.5. | |

This instruction specifies a particular texture target, ignoring the | |

standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, | |

TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended | |

OpenGL. If the specified texture target has a consistent set of images, a | |

lookup is performed. Otherwise, the result of the instruction is the | |

vector (0,0,0,0). | |

Although this instruction allows the selection of any texture target, a | |

fragment program can not use more than one texture target for any given | |

texture image unit. | |

Section 3.11.5.41, UP2H: Unpack Two 16-Bit Floats | |

The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit | |

scalar operand. The first 16-bit float (stored in the 16 least | |

significant bits) is written into the "x" and "z" components of the result | |

vector; the second is written into the "y" and "w" components of the | |

result vector. | |

This operation undoes the type conversion and packing performed by the | |

PK2H instruction. | |

tmp = ScalarLoad(op0); | |

result.x = (fp16) (RawBits(tmp) & 0xFFFF); | |

result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); | |

result.z = (fp16) (RawBits(tmp) & 0xFFFF); | |

result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); | |

Since the source operand must be a 32-bit scalar, a fragment program will | |

fail to load if the operand is not obtained from a register with 32-bit | |

components or from a program parameter. | |
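
The unpacking step can be emulated on the CPU as sketched below. This example assumes that the 16-bit float format has 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits, with IEEE-style denormals, infinities, and NaNs; the type and function names are hypothetical, and the operand is passed as its raw 32-bit pattern.

```c
#include <math.h>
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical 4-component vector */

/* Convert one 16-bit float (assumed s1e5m10, bias 15) to a 32-bit float. */
static float half_to_float(uint16_t h)
{
    int sign = (h >> 15) & 1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float v;
    if (exponent == 0)            /* zero or denormal: mantissa * 2^-24 */
        v = ldexpf((float)mantissa, -24);
    else if (exponent == 31)      /* infinity or NaN */
        v = mantissa ? NAN : INFINITY;
    else                          /* (1 + mantissa/1024) * 2^(exponent-15) */
        v = ldexpf((float)(mantissa + 1024), exponent - 25);
    return sign ? -v : v;
}

/* UP2H on a raw 32-bit pattern: low half to x and z, high half to y and w. */
static vec4 up2h(uint32_t raw)
{
    float lo = half_to_float((uint16_t)(raw & 0xFFFFu));
    float hi = half_to_float((uint16_t)(raw >> 16));
    vec4 r = { lo, hi, lo, hi };
    return r;
}
```

For example, the pattern 0xC0003C00 holds 1.0 in its low half and -2.0 in its high half, so up2h returns (1.0, -2.0, 1.0, -2.0).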

Section 3.11.5.42, UP2US: Unpack Two Unsigned 16-Bit Scalars | |

The UP2US instruction unpacks two 16-bit unsigned values packed together | |

in a 32-bit scalar operand. The unsigned quantities are encoded where a | |

bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' | |

bits corresponds to 1.0. The "x" and "z" components of the result vector | |

are obtained from the 16 least significant bits of the operand; the "y" | |

and "w" components are obtained from the 16 most significant bits. | |

This operation undoes the type conversion and packing performed by the | |

PK2US instruction. | |

tmp = ScalarLoad(op0); | |

result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; | |

result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; | |

result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; | |

result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; | |

Since the source operand must be a 32-bit scalar, a fragment program will | |

fail to load if the operand is not obtained from a register with 32-bit | |

components or from a program parameter. | |
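
The following C sketch emulates UP2US, including one way to express RawBits() portably. It assumes that float is a 32-bit IEEE 754 type; the names raw_bits, up2us, and vec4 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical 4-component vector */

/* RawBits(): reinterpret the 32 bits of a float as an unsigned integer.
 * memcpy avoids the aliasing problems of a pointer cast. */
static uint32_t raw_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* UP2US: low 16 bits to x and z, high 16 bits to y and w, each mapped
 * from [0, 65535] to [0.0, 1.0]. */
static vec4 up2us(float operand)
{
    uint32_t bits = raw_bits(operand);
    float lo = (float)(bits & 0xFFFFu) / 65535.0f;
    float hi = (float)(bits >> 16) / 65535.0f;
    vec4 r = { lo, hi, lo, hi };
    return r;
}
```

As a worked case, the float 1.0 has the bit pattern 0x3F800000, so its low half unpacks to 0.0 and its high half to 0x3F80 / 65535, approximately 0.248.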

Section 3.11.5.43, UP4B: Unpack Four Signed 8-Bit Values | |

The UP4B instruction unpacks four 8-bit signed values packed together in a | |

32-bit scalar operand. The signed quantities are encoded where a bit | |

pattern of all '0' bits corresponds to -128/127 and a pattern of all '1' | |

bits corresponds to +127/127. The "x" component of the result vector is | |

the converted value corresponding to the 8 least significant bits of the | |

operand; the "w" component corresponds to the 8 most significant bits. | |

This operation undoes the type conversion and packing performed by the | |

PK4B instruction. | |

tmp = ScalarLoad(op0); | |

result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0; | |

result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0; | |

result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0; | |

result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0; | |

Since the source operand must be a 32-bit scalar, a fragment program will | |

fail to load if the operand is not obtained from a register with 32-bit | |

components or from a program parameter. | |
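
A CPU-side emulation of UP4B might look like the following C sketch, which takes the operand as its raw 32-bit pattern. The names unpack_snorm8, up4b, and vec4 are hypothetical.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical 4-component vector */

/* Convert the byte starting at bit position 'shift' using the biased
 * encoding of UP4B: 0x00 -> -128/127, 0x80 -> 0.0, 0xFF -> +127/127. */
static float unpack_snorm8(uint32_t bits, int shift)
{
    return ((float)((bits >> shift) & 0xFFu) - 128.0f) / 127.0f;
}

/* UP4B: least significant byte to x, most significant byte to w. */
static vec4 up4b(uint32_t raw)
{
    vec4 r = {
        unpack_snorm8(raw, 0),
        unpack_snorm8(raw, 8),
        unpack_snorm8(raw, 16),
        unpack_snorm8(raw, 24)
    };
    return r;
}
```

Note that the byte 0x00 unpacks to a value slightly below -1.0, while 0x01 unpacks to exactly -1.0 and 0x80 to exactly 0.0.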

Section 3.11.5.44, UP4UB: Unpack Four Unsigned 8-Bit Scalars | |

The UP4UB instruction unpacks four 8-bit unsigned values packed together | |

in a 32-bit scalar operand. The unsigned quantities are encoded where a | |

bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' | |

bits corresponds to 1.0. The "x" component of the result vector is | |

obtained from the 8 least significant bits of the operand; the "w" | |

component is obtained from the 8 most significant bits. | |

This operation undoes the type conversion and packing performed by the | |

PK4UB instruction. | |

tmp = ScalarLoad(op0); | |

result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0; | |

result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0; | |

result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0; | |

result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0; | |

Since the source operand must be a 32-bit scalar, a fragment program will | |

fail to load if the operand is not obtained from a register with 32-bit | |

components or from a program parameter. | |

Section 3.11.5.45, X2D: 2D Coordinate Transformation | |

The X2D instruction multiplies the 2D offset vector specified by the "x" | |

and "y" components of the second vector operand by the 2x2 matrix | |

specified by the four components of the third vector operand, and adds the | |

transformed offset vector to the 2D vector specified by the "x" and "y" | |

components of the first vector operand. The first component of the sum is | |

written to the "x" and "z" components of the result; the second component | |

is written to the "y" and "w" components of the result. | |

The X2D instruction can be used to displace texture coordinates in the | |

same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader | |

extension. | |

tmp0 = VectorLoad(op0); | |

tmp1 = VectorLoad(op1); | |

tmp2 = VectorLoad(op2); | |

result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; | |

result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; | |

result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; | |

result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; | |
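
For experimentation outside a fragment program, the X2D arithmetic can be written as the following C sketch. The vec4 type, the function name x2d, and the parameter names are assumptions of this example.

```c
typedef struct { float x, y, z, w; } vec4;   /* hypothetical 4-component vector */

/* base:   first operand, whose x and y give the 2D vector to displace.
 * offset: second operand, whose x and y give the 2D offset.
 * m:      third operand, a 2x2 matrix with rows (m.x, m.y) and (m.z, m.w). */
static vec4 x2d(vec4 base, vec4 offset, vec4 m)
{
    float u = base.x + offset.x * m.x + offset.y * m.y;
    float v = base.y + offset.x * m.z + offset.y * m.w;
    vec4 r = { u, v, u, v };   /* sum replicated to (x,z) and (y,w) */
    return r;
}
```

With the identity matrix (1, 0, 0, 1), the result is simply the base vector plus the offset; with (0, -1, 1, 0), the offset is rotated by 90 degrees before being added.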

Section 3.11.6, Fragment Program Outputs | |

Upon completion of fragment program execution, the output registers are | |

used to replace the fragment's associated data. | |

The RGBA color of the fragment is taken from the color output register | |

used by the program (COLR or COLH). The R, G, B, and A color components | |

are extracted from the "x", "y", "z", and "w" components, respectively, of | |

the output register and are clamped to the range [0,1]. | |

If the DEPR output register is written by the fragment program, the depth | |

value of the fragment is taken from the z component of the DEPR output | |

register. If depth clamping is enabled, the depth value is clamped to the | |

range [min(n,f), max(n,f)], where n and f are the near and far depth range | |

values. If depth clamping is disabled, the fragment is discarded if its | |

depth value is outside the range [min(n,f), max(n,f)]. | |
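
The output handling described in this section can be sketched in C as follows. All type and function names are hypothetical, and the sketch covers only the clamping and discard rules stated above.

```c
typedef struct { float x, y, z, w; } vec4;             /* hypothetical vector */
typedef struct { int kept; float depth; } depth_result; /* hypothetical result */

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* RGBA is taken from (x, y, z, w) of COLR or COLH and clamped to [0,1]. */
static vec4 clamp_color(vec4 c)
{
    vec4 r = { clampf(c.x, 0.0f, 1.0f), clampf(c.y, 0.0f, 1.0f),
               clampf(c.z, 0.0f, 1.0f), clampf(c.w, 0.0f, 1.0f) };
    return r;
}

/* depr_z is the z component of DEPR; n and f are the depth range values.
 * kept is 0 when the fragment is discarded (depth is then meaningless). */
static depth_result resolve_depth(float depr_z, float n, float f,
                                  int depth_clamp_enabled)
{
    float lo = n < f ? n : f;
    float hi = n < f ? f : n;
    depth_result r = { 1, depr_z };
    if (depth_clamp_enabled)
        r.depth = clampf(depr_z, lo, hi);
    else if (depr_z < lo || depr_z > hi)
        r.kept = 0;
    return r;
}
```

With a depth range of (0, 1), a program depth of 1.5 is clamped to 1.0 when depth clamping is enabled and causes the fragment to be discarded when it is disabled.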

Section 3.11.7, Required Fragment Program State | |

The state required for managing fragment programs consists of: | |

a bit indicating whether or not fragment program mode is enabled; | |

an unsigned integer naming the currently bound fragment program; | |

and the state that must be maintained to indicate which integers are | |

currently in use as fragment program names. | |

Fragment program mode is initially disabled. The initial state of all 128 | |

fragment program parameter registers is (0,0,0,0). The initial currently | |

bound fragment program is zero. | |

Each fragment program object consists of: | |

an enumerant giving the program target (FRAGMENT_PROGRAM_NV); | |

a boolean indicating whether the program is resident; | |

an array of type ubyte containing the program string; | |

an integer representing the length of the program string array; | |

one four-component floating-point vector for each named local | |

parameter in the program; | |

and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component | |

floating-point vectors to hold numbered local parameters, each initially | |

set to (0,0,0,0). | |

Initially, no program objects exist. | |

Additionally, the state required during the execution of a fragment | |

program consists of: twelve 4-component floating-point fragment attribute | |

registers, thirty-two 128-bit physical temporary registers, and a single | |

4-component condition code, whose components have one of four values (LT, | |

EQ, GT, or UN). | |

Each time a fragment program is executed, the fragment attribute registers | |

are initialized with the fragment's location and associated data, all | |

temporary register components are initialized to zero, and all condition | |

code components are initialized to EQ. | |
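
A software model of the per-execution state might be laid out as in the following C sketch. The structure and all names are hypothetical; in particular, each 128-bit temporary register is modeled here as four 32-bit floats, which is only one of the ways such a register can be viewed.

```c
#include <string.h>

enum { NUM_ATTRIBS = 12, NUM_TEMPS = 32, NUM_CC = 4 };

typedef enum { CC_LT, CC_EQ, CC_GT, CC_UN } cond_code;
typedef struct { float x, y, z, w; } vec4;

typedef struct {
    vec4      attrib[NUM_ATTRIBS]; /* fragment attribute registers */
    vec4      temp[NUM_TEMPS];     /* 128-bit temporaries, viewed as 4 floats */
    cond_code cc[NUM_CC];          /* one condition code per component */
} frag_exec_state;

/* Build the state a fragment program sees when it starts executing:
 * attributes loaded from the fragment, temporaries zeroed, condition
 * codes set to EQ. */
static frag_exec_state frag_exec_begin(const vec4 attribs[NUM_ATTRIBS])
{
    frag_exec_state st;
    int i;
    memcpy(st.attrib, attribs, sizeof st.attrib);
    memset(st.temp, 0, sizeof st.temp);  /* all-zero bits are 0.0 in IEEE 754 */
    for (i = 0; i < NUM_CC; i++)
        st.cc[i] = CC_EQ;
    return st;
}
```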

Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140). | |

No changes to the text of the section. | |

Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment | |

Operations and the Framebuffer) | |

None | |

Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions) | |