Name
NV_fragment_program
Name Strings
GL_NV_fragment_program
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)
Notice
Copyright NVIDIA Corporation, 2001-2002.
IP Status
NVIDIA Proprietary.
Status
Implemented in CineFX (NV30) Emulation driver, August 2002.
Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.
Version
Last Modified Date: 2005/05/24
NVIDIA Revision: 73
Number
282
Dependencies
Written based on the wording of the OpenGL 1.2.1 specification and
requires OpenGL 1.2.1.
Requires support for the ARB_multitexture extension with at least
two texture units.
NV_vertex_program affects the definition of this extension. The only
dependency is that both extensions use the same mechanisms for defining
and binding programs.
NV_texture_shader trivially affects the definition of this extension.
NV_texture_rectangle trivially affects the definition of this extension.
ARB_texture_cube_map trivially affects the definition of this extension.
EXT_fog_coord trivially affects the definition of this extension.
NV_depth_clamp affects the definition of this extension.
ARB_depth_texture and SGIX_depth_texture affect the definition of this
extension.
NV_float_buffer affects the definition of this extension.
ARB_vertex_program affects the definition of this extension.
ARB_fragment_program affects the definition of this extension.
Overview
OpenGL mandates a certain set of configurable per-fragment computations
defining texture lookup, texture environment, color sum, and fog
operations. Each of these areas provides a useful but limited set of fixed
operations. For example, unextended OpenGL 1.2.1 provides only four
texture environment modes, color sum, and three fog modes. Many OpenGL
extensions have either improved existing functionality or introduced new
configurable fragment operations. While these extensions have enabled new
and interesting rendering effects, the set of effects is limited by the
set of special modes introduced by the extension. This lack of
flexibility is in contrast to the high level of programmability of
general-purpose CPUs and other (frequently software-based) shading
languages. The purpose of this extension is to expose to the OpenGL
application writer an unprecedented degree of programmability in the
computation of final fragment colors and depth values.
This extension provides a mechanism for defining fragment program
instruction sequences for application-defined fragment programs. When in
fragment program mode, a program is executed each time a fragment is
produced by rasterization. The inputs for the program are the attributes
(position, colors, texture coordinates) associated with the fragment and a
set of constant registers. A fragment program can perform mathematical
computations and texture lookups using arbitrary texture coordinates. The
results of a fragment program are new color and depth values for the
fragment.
This extension defines a programming model including a 4-component vector
instruction set, 16- and 32-bit floating-point data types, and a
relatively large set of temporary registers. The programming model also
includes a condition code vector which can be used to mask register writes
at run-time or kill fragments altogether. The syntax, program
instructions, and general semantics are similar to those in the
NV_vertex_program and NV_vertex_program2 extensions, which provide for the
execution of an arbitrary program each time the GL receives a vertex.
The fragment program execution environment is designed for efficient
hardware implementation and to support a wide variety of programs. By
design, the entire set of existing fragment programs defined by existing
OpenGL per-fragment computation extensions can be implemented using the
extension's programming model.
The fragment program execution environment accesses textures via
arbitrarily computed texture coordinates. As such, there is no necessary
correspondence between the texture coordinates and texture maps previously
lumped into a single "texture unit". This extension separates the notion
of "texture coordinate sets" and "texture image units" (texture maps and
associated parameters), allowing implementations with a different number
of each. The initial implementation of this extension will support 8
texture coordinate sets and 16 texture image units.
Issues
What limitations exist in this extension?
RESOLVED: Very few. Programs cannot exceed a maximum program length
(which is no less than 1024 instructions), and can use no more than
32-64 temporary registers. Programs cannot access more than one
fragment attribute or program parameter (constant) per instruction,
but can work around this restriction using temporaries. The number of
textures that can be used by a program is limited to the number of
texture image units provided by the implementation (16 in the initial
implementation of this extension).
These limits are fairly high. Additionally, there is no limit on the
total number of texture lookups that can be performed by a program.
There is no limit on the length of a texture dependency chain -- one
can write a program that performs over 1000 consecutive dependent
texture lookups. There are no restrictions on dependencies between
texture mapping instructions and arithmetic instructions. Texture
lookups can be performed using arbitrarily computed texture
coordinates. Applications can carry out their calculations with full
32-bit single precision, although two lower-precision modes are also
available.
How does texture mapping work with fragment programs?
RESOLVED: This extension provides three instructions used to perform
texture lookups.
The "TEX" instruction performs a lookup with the (s,t,r) values taken
from an interpolated texture coordinate, an arbitrarily computed
vector, or even a program constant. The "TXP" instruction performs a
similar lookup, except that it uses the fourth component of the source
vector to perform a perspective divide, using (s/q, t/q, r/q). In
both cases, the GL will automatically compute partial derivatives used
for filter and LOD selection.
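The projective divide performed by "TXP" can be sketched in Python as follows. This is only an illustration of the arithmetic described above; the function name txp_coords is hypothetical and is not part of this extension.

```python
def txp_coords(src):
    """Return the (s/q, t/q, r/q) coordinates a TXP-style lookup would use.

    src is a 4-component (s, t, r, q) source vector. Hypothetical helper
    for illustration only; the GL performs this divide internally.
    """
    s, t, r, q = src
    return (s / q, t / q, r / q)


# A source vector of (2, 4, 6, 2) selects the texel at (1, 2, 3).
print(txp_coords((2.0, 4.0, 6.0, 2.0)))
```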
The "TXD" instruction operates like "TEX", except that it allows the
program to explicitly specify two additional vectors containing the
partial derivatives of the texture coordinate with respect to x and y
window coordinates.
All three instructions write a filtered texel value to a temporary or
output register. Other than the computation of texture coordinates
and partial derivatives, texture lookups are not performed any differently
in fragment program mode. In particular, any applicable LOD biases,
wrap modes, minification and magnification filters, and anisotropic
filtering controls are still applied in fragment program mode.
The results of the texture lookup are available to be used arbitrarily
by subsequent fragment program instructions. Fragment programs are
allowed to access any texture map arbitrarily many times.
Can fragment programs be used to compute depth values?
RESOLVED: Yes. A fragment program can perform arbitrary
computations to compute a final value for the fragment, which it
should write to the "z" component of the o[DEPR] register. The "z"
value written should be in the range [0,1], regardless of the size of
the depth buffer.
To assist in the computation of the final Z value, a fragment program
can access the interpolated depth of the fragment (prior to any
displacement) by reading the "z" component of the f[WPOS] attribute
register.
How should near and far plane clipping work in fragment program mode if
the current fragment program computes a depth value?
RESOLVED: Geometric clipping to the near and far clip plane should be
disabled. Clipping should be done based on the depth values computed
per-fragment. The rationale is that per-fragment depth displacement
operations may effectively move portions of a primitive initially
outside the clip volume inside, and vice versa.
Note that under the NV_depth_clamp extension, geometric clipping to
the near and far clip planes is also disabled, and the fragment depth
values are clamped to the depth range. If depth clamp mode is enabled
when using a fragment program that computes a depth value, the
computed depth value will be clamped to the depth range.
Should fragment programs be allowed to use multiple precisions for
operands and operations?
RESOLVED: Yes. Low-precision operands are generally adequate for
representing colors. Allowing low-precision registers also allows for
a larger number of temporary registers (at lower precision).
Low-precision operations also provide the opportunity for a higher
level of performance.
Applications are free to use only high-precision operations or mix
high- and low-precision operations as necessary.
What levels of precision are supported in arithmetic operations?
RESOLVED: Arithmetic operations can be performed at three different
precisions. 32-bit floating-point precision (fp32) uses the IEEE
single-precision standard with a sign bit, 8 exponent bits, and 23
mantissa bits. 16-bit floating-point precision (fp16) uses a similar
floating-point representation, but with 5 exponent bits and 10
mantissa bits. Additionally, many arithmetic operations can also be
carried out at 12-bit fixed-point precision (fx12), where values in
the range [-2,+2) are represented as signed values with 10 fraction
bits.
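The bit layouts described above can be explored with the following Python sketch. It rests on two assumptions that this specification does not state: that fp16 matches the IEEE binary16 format exposed by Python's struct module ('e' format), and that fx12 values are stored in two's complement. The helper names fp16_fields and fx12_value are hypothetical.

```python
import struct


def fp16_fields(x):
    """Split x, rounded to binary16, into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack('<H', struct.pack('<e', x))[0]
    return (bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF)


def fx12_value(code):
    """Decode a 12-bit pattern as a signed value with 10 fraction bits.

    Assumption: two's complement encoding, so codes cover [-2, +2).
    """
    code &= 0xFFF
    if code & 0x800:
        code -= 0x1000
    return code / 1024.0


print(fp16_fields(1.0))      # 1 sign bit, 5 exponent bits, 10 mantissa bits
print(fx12_value(0x400))     # 1.0
print(fx12_value(0x7FF))     # largest value, 2 - 1/1024
print(fx12_value(0x800))     # smallest value, -2.0
```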
How should the precision with which operations are carried out be
specified? Should we infer the precision from the types of the operands
or result vectors? Or should it be an attribute of the instruction?
RESOLVED: Applications can optionally specify the precision of
individual instructions by adding a suffix of "R", "H", and "X" to
instruction names to select fp32, fp16, and fx12 precision,
respectively.
By default, instructions will be carried out using the precision of
the destination register. Always inferring the precision from the
operands has a number of issues. First, there are a number of
operations (e.g., TEX/TXP/TXD) where result type has little to no
correspondence to the type of the operands. In these cases, precision
suffixes are not supported. Second, one could have instructions
automatically cast operands and compute results using the type of the
highest precision operand or result. This behavior would be
problematic since all fragment attribute registers and program
parameters are kept at full precision, but full precision may not be
needed by the operation.
The choice of precision level allows programs to trade off precision
for potentially higher performance. Giving the program explicit
control over the precision also allows it to dictate precision
explicitly and eliminate any uncertainty over type casting.
For instructions whose specified precision is different than the precision
of the operands or the result registers, how are the operations performed?
How are the condition codes updated?
RESOLVED: Operations are performed with operands and results at the
precision specified by the instruction. After the operation is
complete, the result is converted to the precision of the destination
register, after which the condition code is generated.
In an alternate approach, the condition code could be generated from
the result. However, in some cases, the register contents would not
match the condition code. In such cases, it may not be reliable to
use the condition code to prevent division by zero or other special
cases.
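The difference between the two approaches can be seen in a small Python sketch. It assumes fp16 behaves like IEEE binary16 as provided by Python's struct module, and it models only the LT/EQ/GT outcomes of a scalar condition code; the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round x to the nearest value representable in binary16."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def condition(value):
    """Classify a scalar as LT, EQ, or GT relative to zero."""
    if value < 0.0:
        return "LT"
    if value > 0.0:
        return "GT"
    return "EQ"


raw = 1e-10            # result computed at the instruction's precision
stored = to_fp16(raw)  # value actually written to an fp16 destination

print(condition(raw))     # condition code under the alternate approach
print(condition(stored))  # condition code as resolved above
```

In this example the stored value underflows to zero, so a condition code taken from the unconverted result would report GT for a register that holds 0.0. A later reciprocal guarded by that condition code would then divide by zero.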
How does this extension interact with the ARB_multisample extension? In
the ARB_multisample extension, each fragment has multiple depth values.
In this extension, a single interpolated depth value may be modified by a
fragment program.
RESOLVED: The depth values for the extra samples are generated by
computing partials of the computed depth value and using these
partials to derive the depth values for each of the extra samples.
How does this extension interact with polygon offset? Both extensions
modify fragment depth values.
RESOLVED: As in the base OpenGL spec, the depth offset generated by
polygon offset is added during polygon rasterization. The depth value
provided to programs in f[WPOS].z already includes polygon offset, if
enabled. If the depth value is replaced by a fragment program, the
polygon offset value will NOT be recomputed and added back after
program execution.
This is probably not desirable for fragment programs that modify depth
values since the partials used to generate the offset may not match
the partials of the computed depth value. Polygon offset for filled
polygons can be approximated in a fragment program using the depth
partials obtained by the DDX and DDY instructions. This will not work
properly for line- and point-mode polygons, since the partials used
for offset are computed over the polygon, while the partials resulting
from the DDX and DDY instructions are computed along the line (or are
zero for point-mode polygons). In addition, separate treatment of
points, line segments, and polygons is not possible in a fragment
program.
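The approximation for filled polygons can be sketched in Python as follows, using the core OpenGL offset formula o = m * factor + r * units with m approximated by the larger of the two depth partials. The function name is hypothetical, and the value of r (the smallest resolvable depth difference) is an assumed, implementation-dependent constant.

```python
def approx_polygon_offset(z, dzdx, dzdy, factor, units, r):
    """Return z displaced by an approximate polygon offset.

    dzdx and dzdy stand in for the partials a program would obtain with
    DDX and DDY. r is an assumed minimum resolvable depth difference.
    """
    m = max(abs(dzdx), abs(dzdy))
    return z + factor * m + units * r


# Assumed r for a 24-bit depth buffer; chosen for illustration only.
R_24BIT = 2.0 ** -24
print(approx_polygon_offset(0.5, 0.001, -0.002, 1.0, 1.0, R_24BIT))
```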
Should depth component replacement be a property of the fragment program
or a separate enable?
RESOLVED: It should be a program property. Using the output register
notation simplifies matters: depth components are replaced if and
only if the DEPR register is written to. This alleviates the
application and driver burden of maintaining separate state.
How does this extension affect the handling of q texture coordinates in
the OpenGL spec?
RESOLVED: Fragment programs are allowed to access an associated q
texture coordinate, so this attribute must be produced by
rasterization. In unextended OpenGL 1.2, the q coordinate is
eliminated in the rasterization portions of the spec after dividing
each of s, t, and r by it. This extension updates the specification
to pass q coordinates through at least to conventional texture
mapping. When fragment program mode is disabled, q coordinates will
be eliminated there in an identical manner. This modification has the
added benefit of simplifying the equations used for attribute
interpolation.
How should clip w coordinates be handled by this extension?
RESOLVED: Fragment programs are allowed to access the reciprocal of
the clip w coordinate, so this attribute must be produced by
rasterization. The OpenGL 1.2 spec doesn't explicitly enumerate the
attributes associated with the fragment, but we add treatment of the w
clip coordinate in the appropriate locations.
The reciprocal of the clip w coordinate in traditional graphics
hardware is produced by screen-space linear interpolation of the
reciprocals of the clip w coordinates of the vertices. However, this
spec says the clip w coordinate is produced by perspective-correct
interpolation of the (non-reciprocated) clip w vertex coordinates.
These two formulations turn out to be equivalent, and the latter is
more convenient since the core OpenGL spec already contains formulas
for perspective-correct interpolation of vertex attributes.
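The equivalence can be checked numerically along a line segment. The Python sketch below uses the core OpenGL perspective-correct interpolation formula for a segment with screen-space parameter t; the helper names and the sample values are chosen only for illustration.

```python
def lerp(a, b, t):
    """Screen-space linear interpolation between a and b."""
    return (1.0 - t) * a + t * b


def perspective_correct(fa, fb, wa, wb, t):
    """Perspective-correct interpolation of attribute f at parameter t."""
    return lerp(fa / wa, fb / wb, t) / lerp(1.0 / wa, 1.0 / wb, t)


wa, wb, t = 1.0, 4.0, 0.25

# Traditional formulation: interpolate the reciprocals linearly.
recip_linear = lerp(1.0 / wa, 1.0 / wb, t)

# Formulation used here: interpolate w itself, perspective-correctly.
w_persp = perspective_correct(wa, wb, wa, wb, t)

print(recip_linear, 1.0 / w_persp)
```

When the attribute being interpolated is w itself, the numerator reduces to lerp(1, 1, t) = 1, so the result is exactly the reciprocal of the linearly interpolated 1/w.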
What is produced by the TEX/TXP/TXD instructions if the requested texture
image is inconsistent?
RESOLVED: The result vector is specified to be (0,0,0,0). This
behavior is consistent with the NV_texture_shader extension. Note
that like in NV_texture_shader, these instructions ignore the standard
hierarchy of texture enables and programs can access textures that are
not specifically "enabled".
Should a minimum precision be specified for certain fragment attribute
registers (in particular COL0, COL1) that may not be generated with full
fp32 precision?
RESOLVED: No. It is expected that the precision of COL0/COL1 should
generally be at least as high as that of the frame buffer.
Fragment color components (f[COL0] and f[COL1]) are generally
low-precision fixed-point values in the range [0,1]. Is it possible to
pass unclamped or high-precision color components to fragment programs?
RESOLVED: Yes, although you can't exactly call them "colors".
High-precision per-vertex color values can be written into any unused
texture coordinate set, either via a MultiTexCoord call or using a
vertex program. These "texture coordinates" will be interpolated
during rasterization, and can be used arbitrarily by a fragment
program.
In particular, there is no requirement that per-fragment attributes
called "texture coordinates" be used for texture mapping.
Should this specification guarantee that temporary registers are
initialized to zero?
RESOLVED: Yes. This will allow for the modular construction of
programs that accumulate results in registers. For example,
per-fragment lighting may use MAD instructions to accumulate color
contributions at each light. Without zero-initialization, the program
would require an explicit MOV instruction to load 0 or the use of the
MUL instruction for the first light.
Should this specification support Unicode program strings?
RESOLVED: Not necessary.
Programs defined by NV_vertex_program begin with "!!VP1.0". Should
fragment programs have a similar identifier?
RESOLVED: Yes, "!!FP1.0", identifying the first revision of this
fragment program language.
Should per-fragment attributes have equivalent integer names in the
program language, as per-vertex attributes do in NV_vertex_program?
RESOLVED: No. In NV_vertex_program, "generic" vertex attributes
could be specified directly by an application using only an attribute
number. Those numbers may have no necessary correlation with the
conventional attribute names, although conventional vertex attributes
are mapped to attribute numbers. However, conventional attributes are
the only outputs of vertex programs and of rasterization. Therefore,
there is no need for a similar input-by-number functionality for
fragment programs.
Should we provide the ability to issue instructions that do not update
temporary or output registers?
RESOLVED: Yes. Programs may issue instructions whose only purpose is
to update the condition code register, and requiring such instructions
to write to a temporary may require the use of an additional temporary
and/or defeat possible program optimizations. We accomplish this by
adding two write-only temporary pseudo-registers ("RC" and "HC") that
can be specified as destination registers.
Do the packing and unpacking instructions in this extension make any
sense?
RESOLVED: Yes. They are useful for packing and unpacking multiple
components in a single channel of a floating-point frame buffer. For
example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities
or 8 16-bit quantities, all of which could be used in later
rasterization passes. See the NV_float_buffer extension for more
information.
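The idea of packing several quantities into one 32-bit channel can be sketched in Python as follows. The functions pack_4ub and unpack_4ub are hypothetical helpers; the rounding rule and byte order used here are assumptions and should not be read as the definition of this extension's pack and unpack instructions.

```python
def pack_4ub(rgba):
    """Pack four values in [0,1] into one 32-bit word, 8 bits each.

    Assumptions: round to nearest, first component in the low byte.
    """
    word = 0
    for i, c in enumerate(rgba):
        c = min(max(c, 0.0), 1.0)
        word |= int(round(c * 255.0)) << (8 * i)
    return word


def unpack_4ub(word):
    """Recover four values in [0,1] from a word built by pack_4ub."""
    return tuple(((word >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


word = pack_4ub((1.0, 0.0, 1.0, 0.0))
print(hex(word))
print(unpack_4ub(word))
```

With four such 32-bit channels per pixel, a 128-bit frame buffer holds the 16 8-bit quantities mentioned above.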
Should we provide a method for specifying an fp16 depth component output
value?
RESOLVED: No. There is no good reason for supporting half-precision
Z outputs. Even with 16-bit Z buffers, the 10-bit mantissa of the
half-precision float is rather limiting. There would effectively be
only 11 good bits in the back half of the Z buffer.
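One way to read the "11 good bits" figure is to look at the spacing of half-precision values in [0.5, 1.0). The Python sketch below assumes fp16 matches IEEE binary16 as exposed by the struct module; the helper names are hypothetical.

```python
import struct


def fp16_bits(x):
    """Return the 16-bit pattern of x rounded to binary16."""
    return struct.unpack('<H', struct.pack('<e', x))[0]


def fp16_from_bits(bits):
    """Return the value encoded by a 16-bit binary16 pattern."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]


# Number of distinct fp16 values in [0.5, 1.0).
values_in_back_half = fp16_bits(1.0) - fp16_bits(0.5)

# Distance from 0.5 to the next representable fp16 value.
step = fp16_from_bits(fp16_bits(0.5) + 1) - 0.5

print(values_in_back_half, step)
```

The spacing of 2^-11 over the back half of the depth range corresponds to 11 bits of fixed-point resolution, which is coarse even for a 16-bit Z buffer.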
Should RequestResidentProgramsNV (or a new equivalent function) take a
target? Dealing with working sets of different program types is a bit
messy. Should we document some limitation if we get programs of different
types?
RESOLVED: In retrospect, it may have been a good idea to attach a
target to this command, but there isn't a good reason to mess with
something that already works for vertex programs. The driver is
responsible for ensuring consistent results when the program types
specified are mixed.
What happens on data type conversions where the original value is not
exactly representable in the new data type, either due to overflow or
insufficient precision in the destination type?
RESOLVED: In case of overflow, the original value is clamped to
+/-INF (fp16 or fp32) or to the nearest representable value (fx12). In
case of imprecision, the conversion may either round or truncate to
the nearest representable value.
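The overflow behavior can be sketched in Python as follows. The sketch assumes fp16 matches IEEE binary16, picks round-to-nearest where this specification allows either rounding or truncation, and does not handle NaN inputs. The function names are hypothetical.

```python
import math
import struct


def convert_to_fp16(x):
    """Convert x to fp16, mapping overflow to +/-INF."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        # struct refuses values too large for binary16; model the clamp.
        return math.copysign(math.inf, x)


def convert_to_fx12(x):
    """Convert x to fx12, clamping overflow to the nearest representable value."""
    clamped = max(-2.0, min(x, 2047.0 / 1024.0))
    return round(clamped * 1024.0) / 1024.0


print(convert_to_fp16(1.0e6), convert_to_fp16(-1.0e6))
print(convert_to_fx12(5.0), convert_to_fx12(-5.0))
print(convert_to_fp16(0.1))   # not exactly 0.1: insufficient precision
```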
Should this extension support IEEE-style denorms? For 32-bit IEEE
floating point, denorms are numbers smaller in absolute value than 2^-126.
For 16-bit floats used by this extension, denorms are numbers smaller in
absolute value than 2^-14.
RESOLVED: For 32-bit data types, hardware support for denorms was
considered too expensive relative to the benefit provided.
Computational results that would otherwise produce denorms are flushed
to zero. For 16-bit data types, hardware denorm support will be
present. The expense of hardware denorm support is lower and the
potential precision benefit is greater for 16-bit data types.
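The two behaviors can be contrasted with a short Python sketch. It assumes fp16 matches IEEE binary16, and it returns +0.0 for every flushed fp32 result because the sign of a flushed zero is not stated above. The function names are hypothetical.

```python
import struct

FP32_MIN_NORMAL = 2.0 ** -126   # smallest normalized fp32 magnitude


def fp32_result(x):
    """Round x to fp32, flushing would-be denorms to zero."""
    y = struct.unpack('<f', struct.pack('<f', x))[0]
    if abs(y) < FP32_MIN_NORMAL:
        return 0.0
    return y


def fp16_result(x):
    """Round x to fp16, keeping denorms."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


print(fp32_result(2.0 ** -130))   # below 2^-126: flushed to zero
print(fp16_result(2.0 ** -20))    # below 2^-14: preserved as a denorm
```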
OpenGL provides a hierarchy of texture enables. The texture lookup
operations in NV_texture_shader effectively override the texture enable
hierarchy and select a specific texture to enable. What should be done by
this extension?
RESOLVED: This extension will build upon NV_texture_shader and reduce
the driver overhead of validating the texture enables. Texture
lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,
3D", which would indicate to use texture coordinate set number 2 to do
a lookup in the texture object bound to the TEXTURE_3D target in
texture image unit 2.
Each texture unit can have only one "active" target. Programs are not
allowed to reference different texture targets in the same texture
image unit. In the example above, any other texture instructions
using texture image unit 2 must specify the 3D texture target.
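The one-target-per-image-unit rule lends itself to a simple consistency check, sketched below in Python. The function check_texture_targets and its input format are hypothetical; they only illustrate the kind of validation a program loader might perform.

```python
def check_texture_targets(lookups):
    """Verify that each texture image unit is used with a single target.

    lookups is an iterable of (image_unit, target) pairs, one per texture
    instruction. Returns a dict mapping each unit to its target, or raises
    ValueError if a unit is referenced with two different targets.
    """
    active = {}
    for unit, target in lookups:
        previous = active.setdefault(unit, target)
        if previous != target:
            raise ValueError(
                "texture image unit %d used with both %s and %s"
                % (unit, previous, target))
    return active


# Two 3D lookups on unit 2 and one 2D lookup on unit 0 are consistent.
print(check_texture_targets([(2, "3D"), (0, "2D"), (2, "3D")]))
```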
What is the interaction with NV_register_combiners?
RESOLVED: Register combiners are not available when fragment programs
are enabled.
Previous versions of this specification supported the notion of
combiner programs, where the result of fragment program execution was
a set of four "texture lookup" values that fed the register combiners.
For convenience, should we include pseudo-instructions not present in the
hardware instruction set that are trivially implementable? For example,
absolute value and subtract instructions could fall in this category. An
"ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB
R2,R0,R1" would be equivalent to "ADD R2,R0,-R1"
RESOLVED: In general, yes. A SUB instruction is provided for
convenience. This extension does not provide a separate ABS
instruction because it supports absolute value operations of each
operand.
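The two equivalences mentioned in this issue can be verified component-wise with a short Python sketch. Registers are modeled as 4-tuples, and the helper names are hypothetical.

```python
def neg(v):
    """Component-wise negation, as applied by a "-" operand modifier."""
    return tuple(-x for x in v)


def vmax(a, b):
    """Component-wise maximum, modeling MAX."""
    return tuple(max(x, y) for x, y in zip(a, b))


def vadd(a, b):
    """Component-wise sum, modeling ADD."""
    return tuple(x + y for x, y in zip(a, b))


r0 = (1.5, -2.0, 0.0, -0.25)
r1 = (0.5, 1.0, -3.0, 0.25)

print(vmax(r0, neg(r0)))   # same as a component-wise absolute value of r0
print(vadd(r0, neg(r1)))   # same as a component-wise r0 - r1
```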
Should there be a '+' in the <optionalSign> portion of the grammar? There
isn't one in the GL_NV_vertex_program spec.
RESOLVED: Yes, for orthogonality/readability. A '+' obviously adds
no functionality. In NV_vertex_program, an <optionalSign> of "-" was
always a negation operator. However, in fragment programs, it can
also be used as a sign for a constant value.
Can the same fragment attribute register, program parameter register, or
constants be used for multiple operands in the same instruction? If so,
can it be used with different swizzle patterns?
RESOLVED: Yes and yes.
This extension allows different limits for the number of texture
coordinate sets and the number of texture image units (i.e., texture maps
and associated data). The state in ActiveTextureARB affects both
coordinate sets (TexGen, matrix operations) and image units (TexParameter,
TexEnv). How should we deal with this?
RESOLVED: Continue to use ActiveTextureARB and emit an
INVALID_OPERATION if the active texture refers to an unsupported
coordinate set/image unit. Other options included creating dummy
(unusable) state for unsupported coordinate sets/image units and
continue to use ActiveTextureARB normally, or creating separate state
and state-setting commands for coordinate sets and image units.
Separate state is the cleanest solution, but would add more calls and
potentially cause more programmer confusion. Dummy state would avoid
additional error checks, but the demands of dummy state could grow if
the number of texture image units and texture coordinate sets
increases.
The current OpenGL spec is vague as to what state is affected by the
active texture selector and has no distinction between
coordinate-related and image-related state. The state tables could
use a good clean-up in this area.
The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"
is R0*R1+(1-R0)*R2. There are conflicting precedents here. The
definition here matches the "lrp" instruction in the DirectX 8.0 pixel
shader language. However, an equivalent RenderMan lerp operation would
yield a result of (1-R0)*R1+R0*R2. Which ordering should be implemented?
RESOLVED: NVIDIA hardware implements the former operand ordering, and
there is no good reason to specify a different ordering. To convert a
"LRP" using the latter ordering to NV_fragment_program, swap the third
and fourth arguments.
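The two operand orderings, and the conversion between them, can be sketched in Python. Registers are modeled as 4-tuples; lrp_nv follows the definition given in this issue, while lerp_other is a hypothetical name for the alternative ordering.

```python
def lrp_nv(a, b, c):
    """LRP as defined here: a*b + (1-a)*c, per component."""
    return tuple(x * y + (1.0 - x) * z for x, y, z in zip(a, b, c))


def lerp_other(a, b, c):
    """The alternative ordering: (1-a)*b + a*c, per component."""
    return tuple((1.0 - x) * y + x * z for x, y, z in zip(a, b, c))


a = (0.25, 0.25, 0.25, 0.25)
b = (1.0, 2.0, 3.0, 4.0)
c = (5.0, 6.0, 7.0, 8.0)

# Swapping the last two operands converts one ordering into the other.
print(lrp_nv(a, b, c))
print(lerp_other(a, c, b))
```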
Should this extension provide tracking of matrices or any other state,
similar to that provided in NV_vertex_program?
RESOLVED: No.
Should this extension provide global program parameters -- values shared
between multiple fragment programs?
RESOLVED: No.
Should this extension provide program parameters specific to a program?
If so, how?
RESOLVED: Yes. These parameters will be called "local parameters".
This extension will provide both named and numbered local parameters.
Local parameters can be managed by the driver and eliminate the need
for applications to manage a global name space.
Named local parameters work much like standard variable names in most
programming languages. They are created using the "DECLARE"
instruction within the fragment program itself. For example:
DECLARE color = {1,0,0,1};
Named local parameters are used simply by referencing the variable
name. They do not require the array syntax like the global parameters
in the NV_vertex_program extension. They can be updated using the
commands ProgramNamedParameter4[f,fv]NV.
Numbered local parameters are not declared. They are used by simply
referencing an element of an array called "p". For example,
MOV R0, p[12];
loads the value of numbered local parameter 12 into register R0.
Numbered local parameters can be updated using the commands
ProgramLocalParameter4[d,dv,f,fv]ARB.
The numbered local parameter APIs were added to this extension late in
its development, and are provided for compatibility with the
ARB_vertex_program extension, and what will likely be supported in
ARB_fragment_program as well. Providing this mechanism allows
programs to use the same mechanisms to set local parameters in both
extensions.
Why are the APIs for setting named and numbered local parameters
different?
RESOLVED: The named parameter API was created prior to
ARB_vertex_program (and the possible future ARB_fragment_program) and
uses conventions borrowed from NV_vertex_program. A slightly
different API was chosen during the ARB standardization process; see
the ARB_vertex_program specification for more details.
The named parameter API takes a program ID and a parameter name, and
sets the parameter for the program with the specified ID. The
specified program does not need to be bound (via BindProgramNV) in
order to modify the values of its named parameters. The numbered
parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a
parameter number and modifies the corresponding numbered parameter of
the currently bound program.
What should be the initial value of uninitialized local parameters?
RESOLVED: (0,0,0,0). This choice is somewhat arbitrary, but matches
previous extensions (e.g., NV_vertex_program).
Should this extension support program parameter arrays?
RESOLVED: No hardware support is present. Note that from the point
of view of a fragment program, a texture map can be used as a 1-, 2-,
or 3-dimensional array of constants.
Should this extension support constants in fragment programs? If
so, how?
RESOLVED: Yes. Scalar or vector constants can be defined inline
(e.g., "1.0" or "{1,2,3,4}"). In addition, named constants are
supported using the "DEFINE" instruction, which allows programmers to
change the values of constants used in multiple instructions simply by
changing the value assigned to the named constant.
Note that because this extension uses program strings, the
floating-point value of any constants generated on the fly must be
printed to the program string. An alternate method that avoids the
need to print constants is to declare a named local program parameter
and initialize it with the ProgramNamedParameter4[f,fv]NV calls.
Should named constants be allowed to be redefined?
RESOLVED: No. If you want to redefine the values of constants, you
can create an equivalent named program parameter by changing the
"DEFINE" keyword to "DECLARE".
Should functions used to update or query named local parameters take a
zero-terminated string (as with most strings in the C programming
language), or should they require an explicit string length? If the
former, should we create a version of LoadProgramNV that does not require
a string length?
RESOLVED: Stick with explicit string length. Strings that are
defined as constants can have the length computed at compile-time.
Strings read from files will have the length known in advance.
Programs that build strings at run-time also likely keep the length
up-to-date. Passing an explicit length saves time, since the driver
doesn't have to do a strlen().
What is the deal with the alpha of the secondary color?
RESOLVED: In unextended OpenGL 1.2, the alpha component of the
secondary color is forced to 0.0. In the EXT_secondary_color
extension, the alpha of the per-vertex secondary colors is defined to
be 0.0. NV_vertex_program allows vertex programs to produce a
per-vertex alpha component, but it is forced to zero for the purposes
of the color sum. In the NV_register_combiners extension, the alpha
component of the secondary color is undefined. What a mess.
In this extension, the alpha of the secondary color is well-defined
and can be used normally. When in vertex program mode
Why are fragment program instructions involving f[FOGC] or f[TEX0] through
f[TEX7] automatically carried out at full precision?
RESOLVED: This is an artifact of the method by which these interpolants
are generated in the NVIDIA graphics hardware. If such instructions
absolutely must be carried out at lower precision, the requirement can
be met by first loading the interpolants into a temporary register.
With a different number of texture coordinate sets and texture image
units, how many copies of each kind of texture state are there?
RESOLVED: The intention is that texture state be broken into three
groups. (1) There are MAX_TEXTURE_COORDS_NV copies of texture
coordinate set state, which includes current texture coordinates,
TexGen state, and texture matrices. (2) There are
MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which
includes texture maps, texture parameters, and LOD bias parameters. (3)
There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit
state (e.g., texture enables, TexEnv blending state), all of which are
unused when in fragment program mode.
It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum
of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --
implementations may choose not to extend fixed-function OpenGL texture
mapping modes beyond a certain point.
The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end
up with programs >64KB. This will overflow the limits of the GLX Render
protocol, resulting in the need to use the RenderLarge path. This is an issue
with vertex programs, also.
RESOLVED: Yes, it is.
Should textures used by fragment programs be declared? For example,
"TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all
accesses to texture unit 3. The dimension could be dropped from the TEX
family of instructions, and some of the compile-time error checking could
be dropped.
RESOLVED: Maybe it should be, but for better or worse, it isn't.
It is not all that uncommon to have negative q values with projective
texture mapping, but results are undefined if any q values are negative in
this specification. Why?
RESOLVED: This restriction carries on a similar one in the initial
OpenGL specification. The motivation for this restriction is that
when interpolating, it is possible for a fragment to have an
interpolated q coordinate at or near 0.0. Since the texture
coordinates used for projective texture mapping are s/q, t/q, and r/q,
this will result in a divide-by-zero error or significant numerical
instability. Results will be inaccurate for such fragments.
Other than the numerical stability issue above, NVIDIA hardware should
have no problems with negative q coordinates.
Should programs that replace depth have their own special program type,
such as "!!FPD1.0" and "!!FPDC1.0"?
RESOLVED: No. If a program has an instruction that writes to
o[DEPR], the final fragment depth value is taken from o[DEPR].z.
Otherwise, the fragment's original depth value is used.
What fx12 value should NaN map to?
RESOLVED: For the lack of any better choice, 0.0.
How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for
arithmetic and comparison operations?
RESOLVED: The special cases for all floating-point operations are
designed to match the IEEE specification for floating-point numbers as
closely as possible. The results produced by special cases should be
enumerated in the sections of this spec describing the operations.
There are some cases where the implemented fragment program behavior
does not match IEEE conventions, and these cases should be noted in
this specification.
How can condition codes be used to mask out register writes? How about
killing fragments? What other things can you do?
RESOLVED: The following example computes a component-wise |R1-R2|:
SUBC R0, R1, R2; # "C" suffix means update condition code
MOV R0 (LT), -R0; # Conditional write mask in parentheses
The first instruction computes a component-wise difference between R1
and R2, storing R1-R2 in register R0. The "C" suffix in the
instruction means to update the condition code based on the sign of
the result vector components. The second instruction inverts the sign
of the components of R0. However, the "(LT)" portion says that the
destination register should be updated only if the corresponding
condition code component is LT (negative). This means that only those
components of R0 that are negative are negated, leaving |R1-R2| in R0.
To kill a fragment if the red (x) component of a texture lookup
returns zero:
TEXC R0, f[TEX0], TEX0, 2D;
KIL EQ.x;
To kill based on the green (y) component, use "EQ.y" instead. To kill
if any of the four components is zero, use "EQ.xyzw" or just "EQ".
Fragment programs do not support boolean expressions directly. These can
generally be achieved using conditional write masks.
To evaluate the expression "(R0.x == 0) && (R1.x == 0)":
MOVC RC.x, R0.x;
MOVC RC.x (EQ), R1.x;
To evaluate the expression "(R0.x == 0) || (R1.x == 0)":
MOVC RC.x, R0.x;
MOVC RC.x (NE), R1.x;
In both cases, the x component of the condition code will contain "EQ"
if and only if the condition is TRUE.
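As an illustration only, the conditional-write examples above can be
modeled on the CPU. The following Python sketch is not part of this
extension. The helper names (cc_of, rule_passes, write) are hypothetical.
It assumes that a masked-out component leaves both the destination and the
condition code unchanged, which the boolean examples rely on, and that NE
passes for any code other than EQ.

```python
import math

def cc_of(value):
    """Classify one result component as LT, EQ, GT, or UN (NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value > 0:
        return "GT"
    return "EQ"

def rule_passes(rule, cc):
    """Evaluate the write-mask rules used above (LT, EQ, NE)."""
    if rule == "NE":
        return cc != "EQ"  # assumption: NE passes for LT, GT, and UN
    return cc == rule

def write(dst, cc, src, rule=None, update_cc=False):
    """Per-component masked write; update_cc models the "C" suffix."""
    for i, value in enumerate(src):
        if rule is None or rule_passes(rule, cc[i]):
            dst[i] = value
            if update_cc:
                cc[i] = cc_of(value)

def abs_diff(r1, r2):
    r0, cc = [0.0] * 4, ["EQ"] * 4
    write(r0, cc, [a - b for a, b in zip(r1, r2)], update_cc=True)  # SUBC R0, R1, R2;
    write(r0, cc, [-v for v in r0], rule="LT")                      # MOV R0 (LT), -R0;
    return r0

def both_zero(a, b):
    rc, cc = [0.0], ["EQ"]                         # rc stands in for RC.x
    write(rc, cc, [a], update_cc=True)             # MOVC RC.x, R0.x;
    write(rc, cc, [b], rule="EQ", update_cc=True)  # MOVC RC.x (EQ), R1.x;
    return cc[0] == "EQ"

def either_zero(a, b):
    rc, cc = [0.0], ["EQ"]
    write(rc, cc, [a], update_cc=True)             # MOVC RC.x, R0.x;
    write(rc, cc, [b], rule="NE", update_cc=True)  # MOVC RC.x (NE), R1.x;
    return cc[0] == "EQ"
```

In this model, abs_diff([1, 5, 3, 0], [4, 2, 3, 1]) leaves [3, 3, 0, 1] in
the list standing in for R0.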
How can fragment programs be used to implement non-standard texture
filtering modes?
RESOLVED: As one example, consider a case where you want to do linear
filtering in a 2D texture map, but only horizontally. To achieve
this, first set the texture filtering mode to NEAREST. For a 16 x n
texture, you might do something like:
DEFINE halfTexel = { 0.03125, 0 }; # 1/32 (1/2 a texel)
ADD R2, f[TEX0], -halfTexel; # coords of left sample
ADD R1, f[TEX0], +halfTexel; # coords of right sample
TEX R0, R2, TEX0, 2D; # lookup left sample
TEX R1, R1, TEX0, 2D; # lookup right sample
MUL R2.x, R2.x, 16; # scale X coords to texels
FRC R2.x, R2.x; # get fraction, filter weight
LRP R0, R2.x, R1, R0; # blend samples based on weight
There are plenty of other interesting things that can be done.
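The arithmetic of the filtering program above can be checked with a small
CPU-side model. This Python sketch is illustrative only: it treats a
single row of texels, the function names are hypothetical, and the
clamping in nearest() is an assumption about edge handling that the
example program does not specify.

```python
import math

def nearest(row, s):
    """NEAREST lookup in one texel row (assumes clamped edges)."""
    i = min(max(math.floor(s * len(row)), 0), len(row) - 1)
    return row[i]

def horizontal_linear(row, s):
    """Model of the ADD/ADD/TEX/TEX/MUL/FRC/LRP sequence for one row."""
    half_texel = 0.5 / len(row)            # 0.03125 for a 16-texel row
    left_s = s - half_texel                # ADD R2, f[TEX0], -halfTexel;
    right_s = s + half_texel               # ADD R1, f[TEX0], +halfTexel;
    left = nearest(row, left_s)            # TEX R0, R2, TEX0, 2D;
    right = nearest(row, right_s)          # TEX R1, R1, TEX0, 2D;
    u = left_s * len(row)                  # MUL R2.x, R2.x, 16;
    weight = u - math.floor(u)             # FRC R2.x, R2.x;
    return weight * right + (1.0 - weight) * left  # LRP R0, R2.x, R1, R0;

row = [float(i) for i in range(16)]        # texel i holds the value i
```

With this row, sampling at a texel center (s = 3.5/16) returns the texel
value 3.0, and sampling on the boundary between texels 3 and 4 (s = 0.25)
returns the even blend 3.5.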
Should this specification provide more examples?
RESOLVED: Yes, it should.
Is the OpenGL ARB working on a multi-vendor standard for fragment
programmability? Will there be an ARB_fragment_program extension? If so,
how will this extension interact with the ARB standard?
RESOLVED: Yes, as of July 2002, there was a multi-vendor working
group and a draft specification. The ARB extension is expected to
have several features not present in this extension, such as state
tracking and global parameters (called "program environment
parameters"). It will also likely lack certain features found in this
extension.
Why does the HEMI mapping apply to the third component of signed HILO
textures, but not to unsigned HILO textures?
RESOLVED: This behavior matches the behavior of NV_texture_shader
(e.g., the DOT_PRODUCT_NV mode). The HEMI mapping will construct the
third component of a unit vector whose first two components are
encoded in the HILO texture.
New Procedures and Functions
void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
float x, float y, float z, float w);
void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
double x, double y, double z, double w);
void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
const float v[]);
void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
const double v[]);
void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,
float *params);
void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,
double *params);
void ProgramLocalParameter4dARB(enum target, uint index,
double x, double y, double z, double w);
void ProgramLocalParameter4dvARB(enum target, uint index,
const double *params);
void ProgramLocalParameter4fARB(enum target, uint index,
float x, float y, float z, float w);
void ProgramLocalParameter4fvARB(enum target, uint index,
const float *params);
void GetProgramLocalParameterdvARB(enum target, uint index,
double *params);
void GetProgramLocalParameterfvARB(enum target, uint index,
float *params);
New Tokens
Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the
<pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev,
and by the <target> parameter of BindProgramNV, LoadProgramNV,
ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB,
ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB,
GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:
FRAGMENT_PROGRAM_NV 0x8870
Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
and GetDoublev:
MAX_TEXTURE_COORDS_NV 0x8871
MAX_TEXTURE_IMAGE_UNITS_NV 0x8872
FRAGMENT_PROGRAM_BINDING_NV 0x8873
MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV 0x8868
Accepted by the <name> parameter of GetString:
PROGRAM_ERROR_STRING_NV 0x8874
Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)
Modify Section 2.11, Clipping (p.39)
(replace the first paragraph of the section, p. 39) Primitives are clipped
to the clip volume. In clip coordinates, the view volume is defined by
-w_c <= x_c <= w_c,
-w_c <= y_c <= w_c, and
-w_c <= z_c <= w_c.
Clipping to the near and far clip planes is ignored if fragment program
mode (section 3.11) or texture shaders (see NV_texture_shader
specification) are enabled and the current fragment program or texture
shader computes per-fragment depth values. In this case, the view volume
is defined by:
-w_c <= x_c <= w_c and
-w_c <= y_c <= w_c.
Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)
Modify Chapter 3 introduction (p. 57)
(p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization
process. The color value assigned to a fragment is initially determined
by the rasterization operations (Sections 3.3 through 3.7) and modified by
either the execution of the texturing, color sum, and fog operations as
defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined
in Section 3.11. The final depth value is initially determined by the
rasterization operations and may be modified by a fragment program.
Note: Antialiasing Application is renumbered from Section 3.11 to Section
3.12.
Modify Figure 3.1 (p.58)
                         Primitive Assembly
                                  |
          +-----------+-----------+-----------+-----------+
          |           |           |           |           |
          |           |           |         Pixel         |
        Point       Line       Polygon    Rectangle    Bitmap
       Raster-     Raster-     Raster-     Raster-     Raster-
       ization     ization     ization     ization     ization
          |           |           |           |           |
          +-----------+-----------+-----------+-----------+
                                  |
                                  |
                +-----------------+-----------------+
                |                 |                 |
          Conventional         Texture          Fragment
          Texture Fetch        Shaders          Programs
                |                 |                 |
                |  +--------------+                 |
                |  |                                |
       TEXTURE_ o  o                                |
      SHADER_NV                                     |
         enable   o                                 |
                  |                                 |
                  +-------------+                   |
                  |             |                   |
            Conventional    Register                |
               TexEnv       Combiners               |
                  |             |                   |
              Color Sum         |                   |
                  |             |                   |
                 Fog            |                   |
                  |             |                   |
                  |  +----------+                   |
                  |  |                              |
        REGISTER_ o  o                              |
       COMBINERS_                                   |
        NV enable   o                               |
                    |                               |
                    +-----------------+  +----------+
                                      |  |
                            FRAGMENT_ o  o
                             PROGRAM_
                            NV enable   o
                                        |
                                        |
                                    Coverage
                                   Application
                                        |
                                        v
                             to fragment processing
Modify Section 3.3, Points (p.61)
All fragments produced in rasterizing a non-antialiased point are assigned
the same associated data, which are those of the vertex corresponding to
the point. (delete reference to divide by q).
If antialiasing is enabled, then ... The data associated with each
fragment are otherwise the data associated with the point being
rasterized. (delete reference to divide by q)
Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)
(Note that t=0 at p_a and t=1 at p_b). The value of an associated datum f
from the fragment, whether it be R, G, B, or A (in RGBA mode) or a color
index (in color index mode), the s, t, r, or q texture coordinate, or the
clip w coordinate (the depth value, window z, must be found using equation
3.3, below), is found as
    (1-t) * f_a / w_a + t * f_b / w_b
f = ---------------------------------      (3.2)
          (1-t) / w_a + t / w_b
where f_a and f_b are the data associated with the starting and ending
endpoints of the segment, respectively; w_a and w_b are the clip
w coordinates of the starting and ending endpoints of the segment,
respectively. Note that linear interpolation would use
f = (1-t) * f_a + t * f_b. (3.3)
... A GL implementation may choose to approximate equation 3.2 with 3.3,
but this will normally lead to unacceptable distortion effects when
interpolating texture coordinates or clip w coordinates.
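The difference between equations 3.2 and 3.3 can be seen numerically. The
following Python sketch is illustrative only and the function names are
hypothetical. It evaluates both forms for a scalar datum.

```python
def interp_perspective(t, f_a, f_b, w_a, w_b):
    """Equation 3.2: perspective-correct interpolation along a segment."""
    numerator = (1 - t) * f_a / w_a + t * f_b / w_b
    denominator = (1 - t) / w_a + t / w_b
    return numerator / denominator

def interp_linear(t, f_a, f_b):
    """Equation 3.3: plain linear interpolation."""
    return (1 - t) * f_a + t * f_b
```

When w_a equals w_b the two forms agree. For w_a = 1 and w_b = 3, the
midpoint (t = 0.5) of a datum running from 0 to 1 evaluates to about 0.25
under equation 3.2, against 0.5 under equation 3.3.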
Modify Section 3.5.1, Basic Polygon Rasterization (p.71)
Denote a datum at p_a, p_b, or p_c ... is given by
    a * f_a / w_a + b * f_b / w_b + c * f_c / w_c
f = ---------------------------------------------      (3.4)
             a / w_a + b / w_b + c / w_c
where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c,
respectively. a, b, and c are the barycentric coordinates of the fragment
for which the data are produced. a, b, and c must correspond precisely to
the exact coordinates ... at the fragment's center.
Just as with line segment rasterization, equation 3.4 may be approximated
by
f = a * f_a + b * f_b + c * f_c; (3.5)
this may yield ... for texture coordinates or clip w coordinates.
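Equation 3.4 generalizes directly to code. The following Python sketch is
illustrative only. The function name and the tuple-based argument layout
are assumptions made for the example.

```python
def interp_barycentric(bary, data, clip_w):
    """Equation 3.4 for one datum.

    bary   -- barycentric coordinates (a, b, c) of the fragment
    data   -- datum values (f_a, f_b, f_c) at the three vertices
    clip_w -- clip w coordinates (w_a, w_b, w_c) of the three vertices
    """
    numerator = sum(k * f / w for k, f, w in zip(bary, data, clip_w))
    denominator = sum(k / w for k, w in zip(bary, clip_w))
    return numerator / denominator
```

With all three clip w coordinates equal, the result reduces to the
approximation of equation 3.5.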
Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)
A fragment arising from a group ... are given by those associated with the
current raster position. (delete reference to divide by q)
Modify Section 3.7, Bitmaps (p.111)
Otherwise, a rectangular array ... The associated data for each fragment
are those associated with the current raster position. (delete reference
to divide by q) Once the fragments have been produced ...
Modify Section 3.8, Texturing (p.112)
... an image at the location indicated by a fragment's texture coordinates
to modify the fragment's primary RGBA color. Texturing does not affect the
secondary color.
Texturing is specified only for RGBA mode; its use in color index mode is
undefined.
Except when in fragment program mode (Section 3.11), the (s,t,r) texture
coordinates used for texturing are the values s/q, t/q, and r/q,
respectively, where s, t, r, and q are the texture coordinates associated
with the fragment. When in fragment program mode, the (s,t,r) texture
coordinates are specified by the program. If q is less than or equal to
zero, the results of texturing are undefined.
Add new Section 3.11, Fragment Programs (p.140)
Fragment program mode is enabled and disabled with the Enable and Disable
commands using the symbolic constant FRAGMENT_PROGRAM_NV. When fragment
program mode is enabled, standard and extended texturing, color sum, and
fog application stages are ignored and a general purpose program is
executed instead.
A fragment program is a sequence of instructions that execute on a
per-fragment basis. In fragment program mode, the currently bound
fragment program is executed as each fragment is generated by the
rasterization operations. Fragment programs execute a finite fixed
sequence of instructions with no branching or looping, and operate
independently from the processing of other fragments. Fragment programs
are used to compute new color values to be associated with each fragment,
and can optionally compute a new depth value for each fragment as well.
Fragment program mode is not available in color index mode and is
considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV. When
fragment program mode is enabled, texture shaders and register combiners
(NV_texture_shader and NV_register_combiners extension) are disabled,
regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.
Section 3.11.1, Fragment Program Registers
Fragment programs operate on a set of program registers. Each program
register is a 4-component vector, whose components are referred to as "x",
"y", "z", and "w" respectively. The components of a fragment register are
always referred to in this manner, regardless of the meaning of their
contents.
The four components of each fragment program register have one of two
different representations: 32-bit floating-point (fp32) or 16-bit
floating-point (fp16). More details on these representations can be found
in Section 3.11.4.1.
There are several different classes of program registers. Attribute
registers (Table X.1) correspond to the fragment's associated data
produced by rasterization. Temporary registers (Table X.2) hold
intermediate results generated by the fragment program. Output registers
(Table X.3) hold the final results of a fragment program. The single
condition code register is used to mask writes to other registers or to
determine if a fragment should be discarded.
Section 3.11.1.1, Fragment Program Attribute Registers
The fragment program attribute registers (Table X.1) hold the location of
the fragment and the data associated with the fragment produced by
rasterization.
Fragment Attribute                                      Component
Register Name       Description                         Interpretation
--------------      ----------------------------------- --------------
f[WPOS]             Position of the fragment center.    (x,y,z,1/w)
f[COL0]             Interpolated primary color          (r,g,b,a)
f[COL1]             Interpolated secondary color        (r,g,b,a)
f[FOGC]             Interpolated fog distance/coord     (z,0,0,0)
f[TEX0]             Texture coordinate (unit 0)         (s,t,r,q)
f[TEX1]             Texture coordinate (unit 1)         (s,t,r,q)
f[TEX2]             Texture coordinate (unit 2)         (s,t,r,q)
f[TEX3]             Texture coordinate (unit 3)         (s,t,r,q)
f[TEX4]             Texture coordinate (unit 4)         (s,t,r,q)
f[TEX5]             Texture coordinate (unit 5)         (s,t,r,q)
f[TEX6]             Texture coordinate (unit 6)         (s,t,r,q)
f[TEX7]             Texture coordinate (unit 7)         (s,t,r,q)
Table X.1: Fragment Attribute Registers. The component interpretation
column describes the mapping of attribute values to register components.
For example, the "x" component of f[COL0] holds the red color component,
and the "x" component of f[TEX0] holds the "s" texture coordinate for
texture unit 0. The entries "0" and "1" indicate that the attribute
register components hold the constants 0 and 1, respectively.
f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment
center, relative to the lower-left corner of the window. f[WPOS].z
holds the associated z window coordinate, normally in the range [0,1].
f[WPOS].w holds the reciprocal of the associated clip w coordinate.
f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors
of the fragment, respectively.
f[FOGC] holds the associated eye distance or fog coordinate normally used
for fog computations.
f[TEX0] through f[TEX7] hold the associated texture coordinates for
texture coordinate sets 0 through 7, respectively.
All attribute register components are treated as 32-bit floats. However,
the components of primary and secondary colors (f[COL0] and f[COL1]) may
be generated with reduced precision.
The contents of the fragment attribute registers may not be modified by a
fragment program. In addition, each fragment program instruction can use
at most one unique attribute register.
Section 3.11.1.2, Fragment Program Temporary Registers
The fragment temporary registers (Table X.2) hold intermediate values used
during the execution of a fragment program. There are 96 temporary
register names, but not all can be used simultaneously.
Fragment Temporary
Register Name       Description
------------------  -----------------------------------------------------
R0-R31              Four 32-bit (fp32) floating point values (s.e8.m23)
H0-H63              Four 16-bit (fp16) floating point values (s.e5.m10)
Table X.2: Fragment Temporary Registers.
In addition to the normal temporary registers, there are two temporary
pseudo-registers, "RC" and "HC". RC and HC are treated as unnumbered,
write-only temporary registers. The components of RC have an fp32 data
type; the components of HC have an fp16 data type. The sole purpose of
these registers is to permit instructions to modify the condition code
register (section 3.11.1.4) without overwriting the values in any
temporary register.
Fragment program instructions can read and write temporary registers.
There is no restriction on the number of temporary registers that can be
accessed by any given instruction.
All temporary registers are initialized to (0,0,0,0) each time a fragment
program executes.
Section 3.11.1.3, Fragment Program Output Registers
The fragment program output registers hold the final results of the
fragment program. The possible final results of a fragment program are a
high- or low-precision RGBA fragment color, and a fragment depth value.
Output
Register Name  Description
-------------  -------------------------------------------------------
o[COLR]        Final RGBA fragment color, fp32 format
o[COLH]        Final RGBA fragment color, fp16 format
o[DEPR]        Final fragment depth value, fp32 format
Table X.3: Fragment Program Output Registers.
o[COLR] and o[COLH] specify the color of a fragment. These two registers
are identical, except for the associated data type of the components. The
R, G, B, and A components of the fragment color are taken from the x, y,
z, and w components, respectively, of o[COLR] or o[COLH]. A fragment
program will fail to load if it writes to both o[COLR] and o[COLH].
o[DEPR] can be used to replace the associated depth value of a fragment.
The new depth value is taken from the z component of o[DEPR]. If a
fragment program does not write to o[DEPR], the associated depth value is
unmodified.
A fragment program will fail to load if it does not write to at least one
output register.
The fragment program output registers may not be read by a fragment
program, but may be written to multiple times.
The values of all fragment program output registers are initially
undefined.
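The two load-time rules on output registers can be expressed as a simple
check. The following Python sketch is illustrative only and is not how any
implementation parses programs. The function name is hypothetical, and it
assumes that comments have already been removed. Because output registers
may not be read, every occurrence of an output register name is taken to
be a write.

```python
import re

def output_load_error(program_text):
    """Return a reason the program would fail to load, or None."""
    written = set(re.findall(r"\bo\[(COLR|COLH|DEPR)\]", program_text))
    if not written:
        return "program does not write to any output register"
    if {"COLR", "COLH"} <= written:
        return "program writes to both o[COLR] and o[COLH]"
    return None
```

For example, a program consisting of "MOV o[COLH], R0; MOV o[DEPR].z, R1;"
passes both checks, while one that only writes R0 does not.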
Section 3.11.1.4, Fragment Program Condition Code Register
The condition code register (CC) is a single four-component vector. Each
component of this register is one of four enumerated values: GT (greater
than), EQ (equal), LT (less than), or UN (unordered). The condition code
register can be used to mask writes to fragment data register components
or to terminate processing of a fragment altogether (via the KIL
instruction).
Most fragment program instructions can optionally update the condition
code register. When a fragment program instruction updates the condition
code register, a condition code component is set to LT if the
corresponding component of the result vector is less than zero, EQ if it
is equal to zero, GT if it is greater than zero, and UN if it is NaN (not
a number).
The condition code register is initialized to a vector of EQ values each
time a fragment program executes.
Section 3.11.2, Fragment Program Parameters
In addition to using the registers defined in Section 3.11.1, fragment
programs may also use fragment program parameters in their computation.
Fragment program parameters are constant during the execution of fragment
programs, but some parameters may be modified outside the execution of a
fragment program.
There are five different types of program parameters: embedded scalar
constants, embedded vector constants, named constants, named local
parameters, and numbered local parameters.
Embedded scalar constants are written as standard floating-point numbers
with an optional sign designator ("+" or "-") and optional scientific
notation (e.g., "E+06", meaning "times 10^6").
Embedded vector constants are written as a comma-separated array of one to
four scalar constants, surrounded by braces (like a C/C++ array
initializer). Vector constants are always treated as 4-component vectors:
constants with fewer than four components are expanded to four components by
filling missing y and z components with 0.0 and missing w components with
1.0. Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
"{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to
"{5,6,7,1}".
Named constants allow fragment program instructions to define scalar or
vector constants that can be referenced by name. Named constants are
created using the DEFINE instruction:
DEFINE pi = 3.1415926535;
DEFINE color = {0.2, 0.5, 0.8, 1.0};
The DEFINE instruction associates a constant name with a scalar or vector
constant value. Subsequent fragment program instructions that use the
constant name are equivalent to those using the corresponding constant
value.
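Whether a vector constant is embedded or named through DEFINE, it is
always treated as four components. The expansion rule for shorter vector
constants can be sketched as follows. This Python fragment is illustrative
only and the function name is hypothetical.

```python
def expand_vector_constant(components):
    """Expand a 1- to 4-component vector constant to four components.

    Missing y and z components become 0.0; a missing w becomes 1.0.
    """
    if not 1 <= len(components) <= 4:
        raise ValueError("a vector constant has one to four components")
    defaults = [0.0, 0.0, 0.0, 1.0]
    return list(components) + defaults[len(components):]
```

Under this rule, expand_vector_constant([3, 4]) yields [3, 4, 0.0, 1.0],
matching the "{3,4}" example above.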
Named local parameters are similar to named vector constants, but their
values can be modified after the program is loaded. Local parameters are
created using the DECLARE instruction:
DECLARE fog_color1;
DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};
The DECLARE instruction creates a 4-component vector associated with the
local parameter name. Subsequent fragment program instructions
referencing the local parameter name are processed as though the current
value of the local parameter vector were specified instead of the
parameter name. A DECLARE instruction can optionally specify an initial
value for the local parameter, which can be either a scalar or vector
constant. Scalar constants are expanded to 4-component vectors by
replicating the scalar value in each component. The initial value of
local parameters not initialized by the program is (0,0,0,0).
A named local parameter for a specific program can be updated using the
calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section
5.7). Named local parameters are accessible only by the program in which
they are defined. Modifying a local parameter affects only the
associated program and does not affect local parameters with the same name
that are found in any other fragment program.
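The per-program scoping of named local parameters can be pictured with a
toy model. The following Python sketch is illustrative only. ProgramModel
and its methods are hypothetical stand-ins, not GL bindings. The sketch
assumes vector initializers are given with all four components, and
raising KeyError for an unknown name is a modeling choice rather than
specified GL behavior.

```python
class ProgramModel:
    """Toy stand-in for the named local parameters of one program."""

    def __init__(self):
        self.named = {}

    def declare(self, name, initial=None):
        """Model of DECLARE: optional scalar or 4-component initializer."""
        if name in self.named:
            raise ValueError(name + " is already defined")
        if initial is None:
            value = [0.0, 0.0, 0.0, 0.0]          # default initial value
        elif isinstance(initial, (int, float)):
            value = [float(initial)] * 4          # scalar is replicated
        else:
            value = [float(c) for c in initial]
        self.named[name] = value

    def set_named_parameter(self, name, x, y, z, w):
        """Model of a ProgramNamedParameter4fNV update for this program."""
        if name not in self.named:
            raise KeyError(name)
        self.named[name] = [x, y, z, w]
```

Two ProgramModel instances that both declare "fog_color1" hold
independent values: updating one leaves the other at its initial value.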
Numbered local parameters are similar to named local parameters, except
that they are referred to by number and are not declared in fragment
programs. Each fragment program object has an array of four-component
floating-point vectors that can be used by the program. The number of
vectors is given by the implementation-dependent constant
MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
Numbered local parameters are accessed by a fragment program as members of
an array called "p". For example, the instruction
MOV R0, p[31];
copies the contents of numbered local parameter 31 into temporary register
R0.
Constant and local parameter names can be arbitrary strings consisting of
letters (upper or lower-case), numbers, underscores ("_"), and dollar
signs ("$"). Keywords defined in the grammar (including instruction
names) cannot be used as constant names, nor can strings that start with
numbers, or strings that specify valid temporary register or texture
numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15"). A fragment
program will fail to load if a DEFINE or DECLARE instruction specifies an
invalid constant or local parameter name.
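The naming rules above can be approximated by a short validity check. The
following Python sketch is illustrative only. The function name is
hypothetical, and SAMPLE_KEYWORDS is a deliberately incomplete stand-in
for the full keyword set defined by the grammar.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Temporary register and texture names listed in the text above.
RESERVED_NUMBERED = (
    {"R%d" % i for i in range(32)}
    | {"H%d" % i for i in range(64)}
    | {"TEX%d" % i for i in range(16)}
)

# Assumption: a small sample only; the grammar defines many more keywords.
SAMPLE_KEYWORDS = {"DEFINE", "DECLARE", "END", "MOV", "ADD", "KIL", "TEX"}

def is_valid_parameter_name(name):
    """Check a DEFINE/DECLARE name against the rules sketched above."""
    return (
        NAME_PATTERN.fullmatch(name) is not None
        and name not in RESERVED_NUMBERED
        and name not in SAMPLE_KEYWORDS
    )
```

Names such as "fog_color1" and "$tmp" pass this check, while "2fast",
"R0", and "TEX15" do not.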
A fragment program will fail to load if an instruction contains a named
parameter not specified in a previous DEFINE or DECLARE instruction. A
fragment program will also fail to load if a DEFINE or DECLARE instruction
attempts to re-define a named parameter specified in a previous DEFINE or
DECLARE instruction.
The contents of the fragment program parameters may not be modified by a
fragment program. In addition, each fragment program instruction can
normally use at most one unique program parameter. The only exception to
this rule is if all program parameter references specify named or embedded
constants that taken together contain no more than four unique scalar
values. For such instructions, the GL will automatically generate an
equivalent instruction that references a single merged vector constant.
This merging allows programs to specify instructions like the following:
Instruction            Equivalent Instruction
---------------------  ---------------------------------------
MAD R0, R1, 2, -1;     MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
ADD R0, {1,2,3,4}, 4;  ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;
Before counting the number of unique values, any named constants are first
converted to the equivalent embedded constants. When generating a
combined vector constant, the GL does not perform swizzling, component
selection, negation, or absolute value operations. The following
instructions are invalid, as they contain more than four unique scalar
values.
Invalid Instructions
-----------------------------------
ADD R0, {1,2,3,4}, -4;
ADD R0, {1,2,3,4}, |-4|;
ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};
ADD R0, {1,2,3,4}, {4,5,6,7}.x;
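The counting rule behind these valid and invalid cases can be sketched as
follows. This Python fragment is illustrative only. The function name is
hypothetical, operands are given as the scalar values exactly as written
(before any swizzle, negation, or absolute value operator is applied), and
padding the merged vector with zeros is an assumption taken from the first
row of the equivalence table above.

```python
def merge_constants(operands):
    """Merge constant operands into one vector, or return None.

    operands -- list of lists of scalar values as written in the
                instruction, one list per constant operand
    """
    unique = []
    for operand in operands:
        for value in operand:
            if value not in unique:
                unique.append(value)
    if len(unique) > 4:
        return None                      # more than four unique scalars
    return unique + [0] * (4 - len(unique))
```

For "MAD R0, R1, 2, -1;" the operands [[2], [-1]] merge to [2, -1, 0, 0].
For "ADD R0, {1,2,3,4}, -4;" the five unique values 1, 2, 3, 4, and -4
cannot be merged, so the function returns None.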
Section 3.11.3, Fragment Program Specification
Fragment programs are specified as an array of ubytes. The array is a
string of ASCII characters encoding the program. The command
LoadProgramNV loads a fragment program when the target parameter is
FRAGMENT_PROGRAM_NV. The command BindProgramNV enables a fragment program
for execution.
At program load time, the program is parsed into a set of tokens possibly
separated by white space. Spaces, tabs, newlines, carriage returns, and
comments are considered whitespace. Comments begin with the character "#"
and are terminated by a newline, a carriage return, or the end of the
program array. Fragment programs are case-sensitive -- upper and lower
case letters are treated differently. The proper choice of case can be
inferred from the grammar.
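The comment and whitespace rules can be illustrated with a small
preprocessing step. The following Python sketch is illustrative only. The
function names are hypothetical, and real implementations tokenize
according to the grammar rather than by collapsing whitespace.

```python
import re

def strip_comments(program_text):
    """Remove "#" comments, which end at a newline, a carriage return,
    or the end of the program text."""
    return re.sub(r"#[^\n\r]*", "", program_text)

def normalize_whitespace(program_text):
    """Strip comments, then collapse spaces, tabs, newlines, and
    carriage returns into single spaces."""
    return " ".join(strip_comments(program_text).split())
```

For example, normalize_whitespace("MOV R0, R1; # copy\nEND") yields
"MOV R0, R1; END".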
The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
sequences for fragment programs. The set of valid tokens can be inferred
from the grammar. The token "" represents an empty string and is used to
indicate optional rules. A program is invalid if it contains any
undefined tokens or characters.
<program> ::= <progPrefix> <instructionSequence> "END"
<progPrefix> ::= "!!FP1.0"
<instructionSequence> ::= <instructionSequence> <instructionStatement>
| <instructionStatement>
<instructionStatement> ::= <instruction> ";"
| <constantDefinition> ";"
| <localDeclaration> ";"
<instruction> ::= <VECTORop-instruction>
| <SCALARop-instruction>
| <BINSCop-instruction>
| <BINop-instruction>
| <TRIop-instruction>
| <KILop-instruction>
| <TEXop-instruction>
| <TXDop-instruction>
<VECTORop-instruction> ::= <VECTORop> <maskedDstReg> ","
<vectorSrc>
<VECTORop> ::= "DDX" | "DDX_SAT"
| "DDXR" | "DDXR_SAT"
| "DDXH" | "DDXH_SAT"
| "DDXC" | "DDXC_SAT"
| "DDXRC" | "DDXRC_SAT"
| "DDXHC" | "DDXHC_SAT"
| "DDY" | "DDY_SAT"
| "DDYR" | "DDYR_SAT"
| "DDYH" | "DDYH_SAT"
| "DDYC" | "DDYC_SAT"
| "DDYRC" | "DDYRC_SAT"
| "DDYHC" | "DDYHC_SAT"
| "FLR" | "FLR_SAT"
| "FLRR" | "FLRR_SAT"
| "FLRH" | "FLRH_SAT"
| "FLRX" | "FLRX_SAT"
| "FLRC" | "FLRC_SAT"
| "FLRRC" | "FLRRC_SAT"
| "FLRHC" | "FLRHC_SAT"
| "FLRXC" | "FLRXC_SAT"
| "FRC" | "FRC_SAT"
| "FRCR" | "FRCR_SAT"
| "FRCH" | "FRCH_SAT"
| "FRCX" | "FRCX_SAT"
| "FRCC" | "FRCC_SAT"
| "FRCRC" | "FRCRC_SAT"
| "FRCHC" | "FRCHC_SAT"
| "FRCXC" | "FRCXC_SAT"
| "LIT" | "LIT_SAT"
| "LITR" | "LITR_SAT"
| "LITH" | "LITH_SAT"
| "LITC" | "LITC_SAT"
| "LITRC" | "LITRC_SAT"
| "LITHC" | "LITHC_SAT"
| "MOV" | "MOV_SAT"
| "MOVR" | "MOVR_SAT"
| "MOVH" | "MOVH_SAT"
| "MOVX" | "MOVX_SAT"
| "MOVC" | "MOVC_SAT"
| "MOVRC" | "MOVRC_SAT"
| "MOVHC" | "MOVHC_SAT"
| "MOVXC" | "MOVXC_SAT"
| "PK2H"
| "PK2US"
| "PK4B"
| "PK4UB"
<SCALARop-instruction> ::= <SCALARop> <maskedDstReg> ","
<scalarSrc>
<SCALARop> ::= "COS" | "COS_SAT"
| "COSR" | "COSR_SAT"
| "COSH" | "COSH_SAT"
| "COSC" | "COSC_SAT"
| "COSRC" | "COSRC_SAT"
| "COSHC" | "COSHC_SAT"
| "EX2" | "EX2_SAT"
| "EX2R" | "EX2R_SAT"
| "EX2H" | "EX2H_SAT"
| "EX2C" | "EX2C_SAT"
| "EX2RC" | "EX2RC_SAT"
| "EX2HC" | "EX2HC_SAT"
| "LG2" | "LG2_SAT"
| "LG2R" | "LG2R_SAT"
| "LG2H" | "LG2H_SAT"
| "LG2C" | "LG2C_SAT"
| "LG2RC" | "LG2RC_SAT"
| "LG2HC" | "LG2HC_SAT"
| "RCP" | "RCP_SAT"
| "RCPR" | "RCPR_SAT"
| "RCPH" | "RCPH_SAT"
| "RCPC" | "RCPC_SAT"
| "RCPRC" | "RCPRC_SAT"
| "RCPHC" | "RCPHC_SAT"
| "RSQ" | "RSQ_SAT"
| "RSQR" | "RSQR_SAT"
| "RSQH" | "RSQH_SAT"
| "RSQC" | "RSQC_SAT"
| "RSQRC" | "RSQRC_SAT"
| "RSQHC" | "RSQHC_SAT"
| "SIN" | "SIN_SAT"
| "SINR" | "SINR_SAT"
| "SINH" | "SINH_SAT"
| "SINC" | "SINC_SAT"
| "SINRC" | "SINRC_SAT"
| "SINHC" | "SINHC_SAT"
| "UP2H" | "UP2H_SAT"
| "UP2HC" | "UP2HC_SAT"
| "UP2US" | "UP2US_SAT"
| "UP2USC" | "UP2USC_SAT"
| "UP4B" | "UP4B_SAT"
| "UP4BC" | "UP4BC_SAT"
| "UP4UB" | "UP4UB_SAT"
| "UP4UBC" | "UP4UBC_SAT"
<BINSCop-instruction> ::= <BINSCop> <maskedDstReg> ","
<scalarSrc> "," <scalarSrc>
<BINSCop> ::= "POW" | "POW_SAT"
| "POWR" | "POWR_SAT"
| "POWH" | "POWH_SAT"
| "POWC" | "POWC_SAT"
| "POWRC" | "POWRC_SAT"
| "POWHC" | "POWHC_SAT"
<BINop-instruction> ::= <BINop> <maskedDstReg> ","
<vectorSrc> "," <vectorSrc>
<BINop> ::= "ADD" | "ADD_SAT"
| "ADDR" | "ADDR_SAT"
| "ADDH" | "ADDH_SAT"
| "ADDX" | "ADDX_SAT"
| "ADDC" | "ADDC_SAT"
| "ADDRC" | "ADDRC_SAT"
| "ADDHC" | "ADDHC_SAT"
| "ADDXC" | "ADDXC_SAT"
| "DP3" | "DP3_SAT"
| "DP3R" | "DP3R_SAT"
| "DP3H" | "DP3H_SAT"
| "DP3X" | "DP3X_SAT"
| "DP3C" | "DP3C_SAT"
| "DP3RC" | "DP3RC_SAT"
| "DP3HC" | "DP3HC_SAT"
| "DP3XC" | "DP3XC_SAT"
| "DP4" | "DP4_SAT"
| "DP4R" | "DP4R_SAT"
| "DP4H" | "DP4H_SAT"
| "DP4X" | "DP4X_SAT"
| "DP4C" | "DP4C_SAT"
| "DP4RC" | "DP4RC_SAT"
| "DP4HC" | "DP4HC_SAT"
| "DP4XC" | "DP4XC_SAT"
| "DST" | "DST_SAT"
| "DSTR" | "DSTR_SAT"
| "DSTH" | "DSTH_SAT"
| "DSTC" | "DSTC_SAT"
| "DSTRC" | "DSTRC_SAT"
| "DSTHC" | "DSTHC_SAT"
| "MAX" | "MAX_SAT"
| "MAXR" | "MAXR_SAT"
| "MAXH" | "MAXH_SAT"
| "MAXX" | "MAXX_SAT"
| "MAXC" | "MAXC_SAT"
| "MAXRC" | "MAXRC_SAT"
| "MAXHC" | "MAXHC_SAT"
| "MAXXC" | "MAXXC_SAT"
| "MIN" | "MIN_SAT"
| "MINR" | "MINR_SAT"
| "MINH" | "MINH_SAT"
| "MINX" | "MINX_SAT"
| "MINC" | "MINC_SAT"
| "MINRC" | "MINRC_SAT"
| "MINHC" | "MINHC_SAT"
| "MINXC" | "MINXC_SAT"
| "MUL" | "MUL_SAT"
| "MULR" | "MULR_SAT"
| "MULH" | "MULH_SAT"
| "MULX" | "MULX_SAT"
| "MULC" | "MULC_SAT"
| "MULRC" | "MULRC_SAT"
| "MULHC" | "MULHC_SAT"
| "MULXC" | "MULXC_SAT"
| "RFL" | "RFL_SAT"
| "RFLR" | "RFLR_SAT"
| "RFLH" | "RFLH_SAT"
| "RFLC" | "RFLC_SAT"
| "RFLRC" | "RFLRC_SAT"
| "RFLHC" | "RFLHC_SAT"
| "SEQ" | "SEQ_SAT"
| "SEQR" | "SEQR_SAT"
| "SEQH" | "SEQH_SAT"
| "SEQX" | "SEQX_SAT"
| "SEQC" | "SEQC_SAT"
| "SEQRC" | "SEQRC_SAT"
| "SEQHC" | "SEQHC_SAT"
| "SEQXC" | "SEQXC_SAT"
| "SFL" | "SFL_SAT"
| "SFLR" | "SFLR_SAT"
| "SFLH" | "SFLH_SAT"
| "SFLX" | "SFLX_SAT"
| "SFLC" | "SFLC_SAT"
| "SFLRC" | "SFLRC_SAT"
| "SFLHC" | "SFLHC_SAT"
| "SFLXC" | "SFLXC_SAT"
| "SGE" | "SGE_SAT"
| "SGER" | "SGER_SAT"
| "SGEH" | "SGEH_SAT"
| "SGEX" | "SGEX_SAT"
| "SGEC" | "SGEC_SAT"
| "SGERC" | "SGERC_SAT"
| "SGEHC" | "SGEHC_SAT"
| "SGEXC" | "SGEXC_SAT"
| "SGT" | "SGT_SAT"
| "SGTR" | "SGTR_SAT"
| "SGTH" | "SGTH_SAT"
| "SGTX" | "SGTX_SAT"
| "SGTC" | "SGTC_SAT"
| "SGTRC" | "SGTRC_SAT"
| "SGTHC" | "SGTHC_SAT"
| "SGTXC" | "SGTXC_SAT"
| "SLE" | "SLE_SAT"
| "SLER" | "SLER_SAT"
| "SLEH" | "SLEH_SAT"
| "SLEX" | "SLEX_SAT"
| "SLEC" | "SLEC_SAT"
| "SLERC" | "SLERC_SAT"
| "SLEHC" | "SLEHC_SAT"
| "SLEXC" | "SLEXC_SAT"
| "SLT" | "SLT_SAT"
| "SLTR" | "SLTR_SAT"
| "SLTH" | "SLTH_SAT"
| "SLTX" | "SLTX_SAT"
| "SLTC" | "SLTC_SAT"
| "SLTRC" | "SLTRC_SAT"
| "SLTHC" | "SLTHC_SAT"
| "SLTXC" | "SLTXC_SAT"
| "SNE" | "SNE_SAT"
| "SNER" | "SNER_SAT"
| "SNEH" | "SNEH_SAT"
| "SNEX" | "SNEX_SAT"
| "SNEC" | "SNEC_SAT"
| "SNERC" | "SNERC_SAT"
| "SNEHC" | "SNEHC_SAT"
| "SNEXC" | "SNEXC_SAT"
| "STR" | "STR_SAT"
| "STRR" | "STRR_SAT"
| "STRH" | "STRH_SAT"
| "STRX" | "STRX_SAT"
| "STRC" | "STRC_SAT"
| "STRRC" | "STRRC_SAT"
| "STRHC" | "STRHC_SAT"
| "STRXC" | "STRXC_SAT"
| "SUB" | "SUB_SAT"
| "SUBR" | "SUBR_SAT"
| "SUBH" | "SUBH_SAT"
| "SUBX" | "SUBX_SAT"
| "SUBC" | "SUBC_SAT"
| "SUBRC" | "SUBRC_SAT"
| "SUBHC" | "SUBHC_SAT"
| "SUBXC" | "SUBXC_SAT"
<TRIop-instruction> ::= <TRIop> <maskedDstReg> ","
<vectorSrc> "," <vectorSrc> ","
<vectorSrc>
<TRIop> ::= "MAD" | "MAD_SAT"
| "MADR" | "MADR_SAT"
| "MADH" | "MADH_SAT"
| "MADX" | "MADX_SAT"
| "MADC" | "MADC_SAT"
| "MADRC" | "MADRC_SAT"
| "MADHC" | "MADHC_SAT"
| "MADXC" | "MADXC_SAT"
| "LRP" | "LRP_SAT"
| "LRPR" | "LRPR_SAT"
| "LRPH" | "LRPH_SAT"
| "LRPX" | "LRPX_SAT"
| "LRPC" | "LRPC_SAT"
| "LRPRC" | "LRPRC_SAT"
| "LRPHC" | "LRPHC_SAT"
| "LRPXC" | "LRPXC_SAT"
| "X2D" | "X2D_SAT"
| "X2DR" | "X2DR_SAT"
| "X2DH" | "X2DH_SAT"
| "X2DC" | "X2DC_SAT"
| "X2DRC" | "X2DRC_SAT"
| "X2DHC" | "X2DHC_SAT"
<KILop-instruction> ::= <KILop> <ccMask>
<KILop> ::= "KIL"
<TEXop-instruction> ::= <TEXop> <maskedDstReg> ","
<vectorSrc> "," <texImageId>
<TEXop> ::= "TEX" | "TEX_SAT"
| "TEXC" | "TEXC_SAT"
| "TXP" | "TXP_SAT"
| "TXPC" | "TXPC_SAT"
<TXDop-instruction> ::= <TXDop> <maskedDstReg> ","
<vectorSrc> "," <vectorSrc> ","
<vectorSrc> "," <texImageId>
<TXDop> ::= "TXD" | "TXD_SAT"
| "TXDC" | "TXDC_SAT"
<scalarSrc> ::= <absScalarSrc>
| <baseScalarSrc>
<absScalarSrc> ::= <negate> "|" <baseScalarSrc> "|"
<baseScalarSrc> ::= <signedScalarConstant>
| <negate> <namedScalarConstant>
| <negate> <vectorConstant> <scalarSuffix>
| <negate> <namedLocalParameter> <scalarSuffix>
| <negate> <numberedLocal> <scalarSuffix>
| <negate> <srcRegister> <scalarSuffix>
<vectorSrc> ::= <absVectorSrc>
| <baseVectorSrc>
<absVectorSrc> ::= <negate> "|" <baseVectorSrc> "|"
<baseVectorSrc> ::= <signedScalarConstant>
| <negate> <namedScalarConstant>
| <negate> <vectorConstant> <scalarSuffix>
| <negate> <vectorConstant> <swizzleSuffix>
| <negate> <namedLocalParameter> <scalarSuffix>
| <negate> <namedLocalParameter> <swizzleSuffix>
| <negate> <numberedLocal> <scalarSuffix>
| <negate> <numberedLocal> <swizzleSuffix>
| <negate> <srcRegister> <scalarSuffix>
| <negate> <srcRegister> <swizzleSuffix>
<maskedDstReg> ::= <dstRegister> <optionalWriteMask>
<optionalCCMask>
<dstRegister> ::= <fragTempReg>
| <fragOutputReg>
| "RC"
| "HC"
<optionalCCMask> ::= "(" <ccMask> ")"
| ""
<ccMask> ::= <ccMaskRule> <swizzleSuffix>
| <ccMaskRule> <scalarSuffix>
<ccMaskRule> ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" |
"TR" | "FL"
<optionalWriteMask> ::= ""
| "." "x"
| "." "y"
| "." "x" "y"
| "." "z"
| "." "x" "z"
| "." "y" "z"
| "." "x" "y" "z"
| "." "w"
| "." "x" "w"
| "." "y" "w"
| "." "x" "y" "w"
| "." "z" "w"
| "." "x" "z" "w"
| "." "y" "z" "w"
| "." "x" "y" "z" "w"
<srcRegister> ::= <fragAttribReg>
| <fragTempReg>
<fragAttribReg> ::= "f" "[" <fragAttribRegId> "]"
<fragAttribRegId> ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0"
| "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5"
| "TEX6" | "TEX7"
<fragTempReg> ::= <fragF32Reg>
| <fragF16Reg>
<fragF32Reg> ::= "R0" | "R1" | "R2" | "R3"
| "R4" | "R5" | "R6" | "R7"
| "R8" | "R9" | "R10" | "R11"
| "R12" | "R13" | "R14" | "R15"
| "R16" | "R17" | "R18" | "R19"
| "R20" | "R21" | "R22" | "R23"
| "R24" | "R25" | "R26" | "R27"
| "R28" | "R29" | "R30" | "R31"
<fragF16Reg> ::= "H0" | "H1" | "H2" | "H3"
| "H4" | "H5" | "H6" | "H7"
| "H8" | "H9" | "H10" | "H11"
| "H12" | "H13" | "H14" | "H15"
| "H16" | "H17" | "H18" | "H19"
| "H20" | "H21" | "H22" | "H23"
| "H24" | "H25" | "H26" | "H27"
| "H28" | "H29" | "H30" | "H31"
| "H32" | "H33" | "H34" | "H35"
| "H36" | "H37" | "H38" | "H39"
| "H40" | "H41" | "H42" | "H43"
| "H44" | "H45" | "H46" | "H47"
| "H48" | "H49" | "H50" | "H51"
| "H52" | "H53" | "H54" | "H55"
| "H56" | "H57" | "H58" | "H59"
| "H60" | "H61" | "H62" | "H63"
<fragOutputReg> ::= "o" "[" <fragOutputRegName> "]"
<fragOutputRegName> ::= "COLR" | "COLH" | "DEPR"
<numberedLocal> ::= "p" "[" <localNumber> "]"
<localNumber> ::= <integer> from 0 to
MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1
<scalarSuffix> ::= "." <component>
<swizzleSuffix> ::= ""
| "." <component> <component>
<component> <component>
<component> ::= "x" | "y" | "z" | "w"
<texImageId> ::= <texImageUnit> "," <texImageTarget>
<texImageUnit> ::= "TEX0" | "TEX1" | "TEX2" | "TEX3"
| "TEX4" | "TEX5" | "TEX6" | "TEX7"
| "TEX8" | "TEX9" | "TEX10" | "TEX11"
| "TEX12" | "TEX13" | "TEX14" | "TEX15"
<texImageTarget> ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"
<constantDefinition> ::= "DEFINE" <namedVectorConstant> "="
<vectorConstant>
| "DEFINE" <namedScalarConstant> "="
<scalarConstant>
<localDeclaration> ::= "DECLARE" <namedLocalParameter>
<optionalLocalValue>
<optionalLocalValue> ::= ""
| "=" <vectorConstant>
| "=" <scalarConstant>
<vectorConstant> ::= "{" <vectorConstantList> "}"
| <namedVectorConstant>
<vectorConstantList> ::= <scalarConstant>
| <scalarConstant> "," <scalarConstant>
| <scalarConstant> "," <scalarConstant> ","
<scalarConstant>
| <scalarConstant> "," <scalarConstant> ","
<scalarConstant> "," <scalarConstant>
<scalarConstant> ::= <signedScalarConstant>
| <namedScalarConstant>
<signedScalarConstant> ::= <optionalSign> <floatConstant>
<namedScalarConstant> ::= <identifier> ((name of a scalar constant
in a DEFINE instruction))
<namedVectorConstant> ::= <identifier> ((name of a vector constant
in a DEFINE instruction))
<namedLocalParameter> ::= <identifier> ((name of a local parameter
in a DECLARE instruction))
<negate> ::= "-" | "+" | ""
<optionalSign> ::= "-" | "+" | ""
<identifier> ::= see text below
<floatConstant> ::= see text below
The <identifier> rule matches a sequence of one or more letters ("A"
through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
the first character must be a letter. The underscore ("_") and dollar
sign ("$") count as letters. Upper and lower case letters are different
(names are case-sensitive).
The <floatConstant> rule matches a floating-point constant consisting
of an integer part, a decimal point, a fraction part, an "e" or
"E", and an optionally signed integer exponent. The integer and
fraction parts both consist of a sequence of one or more digits ("0"
through "9"). Either the integer part or the fraction part (not
both) may be missing; either the decimal point or the "e" (or "E")
and the exponent (not both) may be missing.
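The two lexical rules above can be sketched as regular expressions. This is a
minimal sketch in Python, not part of the extension; the names IDENTIFIER,
FLOAT_CONSTANT, is_identifier, and is_float_constant are hypothetical. It
follows a literal reading of the text, under which a bare integer such as "1"
(no decimal point and no exponent) is not matched by <floatConstant>.

```python
import re

# Letters are A-Z, a-z, "_" and "$"; the first character must be a letter.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Either the integer or the fraction part may be missing (not both), and
# either the decimal point or the exponent may be missing (not both).
FLOAT_CONSTANT = re.compile(
    r"(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?"  # has a decimal point
    r"|\d+[eE][+-]?\d+"                      # no decimal point: exponent required
)

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_float_constant(text):
    return FLOAT_CONSTANT.fullmatch(text) is not None
```

For instance, "1.", ".5", "1e5", and "1.5E-3" all match, while ".", "1e", and
"e5" do not.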
A fragment program fails to load if it contains more than the maximum
number of executable instructions. If ARB_fragment_program is supported,
this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the
FRAGMENT_PROGRAM_ARB target. Otherwise, the limit is 1024. Executable
instructions are those matching the <instruction> rule in the grammar, and
do not include DEFINE or DECLARE instructions.
A fragment program fails to load if its total temporary and output
register count exceeds 64. Each fp32 temporary or output register used by
the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each
fp16 temporary or output register used by the program (H0-H63 and o[COLH])
counts as a single register.
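The register budget above can be sketched as a small counting function. This
is an illustrative Python sketch; the function names are hypothetical, and it
assumes that each distinct register is counted once no matter how many
instructions use it.

```python
import re

REGISTER_LIMIT = 64  # limit stated in the text above

def register_cost(name):
    """Return the number of register slots one temporary/output register uses."""
    if re.fullmatch(r"R\d+", name) or name in ("o[COLR]", "o[DEPR]"):
        return 2  # fp32 registers count double
    if re.fullmatch(r"H\d+", name) or name == "o[COLH]":
        return 1  # fp16 registers count once
    raise ValueError(f"not a temporary or output register: {name}")

def total_register_count(used_registers):
    # Assumption: each distinct register is counted once.
    return sum(register_cost(name) for name in set(used_registers))

def fits_register_limit(used_registers):
    return total_register_count(used_registers) <= REGISTER_LIMIT
```

A program using R0, R1, H0, and o[COLR] would have a count of 2 + 2 + 1 + 2 = 7
under these assumptions.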
A fragment program fails to load if any instruction sources more than one
unique fragment attribute register. Instructions sourcing the same
attribute register multiple times are acceptable.
A fragment program fails to load if any instruction sources more than one
unique program parameter register. Instructions sourcing the same program
parameter multiple times are acceptable.
A fragment program fails to load if multiple texture lookup instructions
reference different targets for the same texture image unit.
A fragment program fails to load if it writes to both the o[COLR] and
o[COLH] output registers.
The error INVALID_OPERATION is generated by LoadProgramNV if a fragment
program fails to load because it is not syntactically correct or because
it violates one of the semantic restrictions listed above.
The error INVALID_OPERATION is generated by LoadProgramNV if a program is
loaded for id when id is currently loaded with a program of a different
target.
A successfully loaded fragment program is parsed into a sequence of
instructions. Each instruction is identified by its tokenized name. The
operation of these instructions when executed is defined in Sections
3.11.4 and 3.11.5.
Section 3.11.4, Fragment Program Operation
There are forty-five fragment program instructions. Fragment program
instructions may have up to sixteen variants, including a suffix of "R",
"H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix
of "C" to allow an update of the condition code register (section
3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to
the range [0,1] (section 3.11.4.4). For example, the sixteen forms of the
"ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
"ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
"ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".
Some mathematical instructions that support precision suffixes, typically
those that involve complicated floating-point computations, do not support
the "X" precision suffix.
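The way the suffixes combine can be illustrated by enumerating them. The
following Python sketch is illustrative only; instruction_variants is a
hypothetical helper, and the caller is assumed to pass the precision suffixes
that a given instruction actually supports.

```python
def instruction_variants(base, precisions=("", "R", "H", "X")):
    """List every spelling of an instruction: precision, then "C", then "_SAT"."""
    return [
        base + precision + cc + sat
        for sat in ("", "_SAT")
        for cc in ("", "C")
        for precision in precisions
    ]
```

With the default precisions, instruction_variants("ADD") produces the forms of
"ADD" in the order listed above. An instruction without the "X" suffix, such
as COS, would be enumerated with precisions=("", "R", "H").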
The fragment program instructions and their respective input and output
parameters are summarized in Table X.4.
Instruction Inputs Output Description
----------------- ------ ------ --------------------------------
ADD[RHX][C][_SAT] v,v v add
COS[RH ][C][_SAT] s ssss cosine
DDX[RH ][C][_SAT] v v derivative relative to x
DDY[RH ][C][_SAT] v v derivative relative to y
DP3[RHX][C][_SAT] v,v ssss 3-component dot product
DP4[RHX][C][_SAT] v,v ssss 4-component dot product
DST[RH ][C][_SAT] v,v v distance vector
EX2[RH ][C][_SAT] s ssss exponential base 2
FLR[RHX][C][_SAT] v v floor
FRC[RHX][C][_SAT] v v fraction
KIL none none conditionally discard fragment
LG2[RH ][C][_SAT] s ssss logarithm base 2
LIT[RH ][C][_SAT] v v compute light coefficients
LRP[RHX][C][_SAT] v,v,v v linear interpolation
MAD[RHX][C][_SAT] v,v,v v multiply and add
MAX[RHX][C][_SAT] v,v v maximum
MIN[RHX][C][_SAT] v,v v minimum
MOV[RHX][C][_SAT] v v move
MUL[RHX][C][_SAT] v,v v multiply
PK2H v ssss pack two 16-bit floats
PK2US v ssss pack two unsigned 16-bit scalars
PK4B v ssss pack four signed 8-bit scalars
PK4UB v ssss pack four unsigned 8-bit scalars
POW[RH ][C][_SAT] s,s ssss exponentiation (x^y)
RCP[RH ][C][_SAT] s ssss reciprocal
RFL[RH ][C][_SAT] v,v v reflection vector
RSQ[RH ][C][_SAT] s ssss reciprocal square root
SEQ[RHX][C][_SAT] v,v v set on equal
SFL[RHX][C][_SAT] v,v v set on false
SGE[RHX][C][_SAT] v,v v set on greater than or equal
SGT[RHX][C][_SAT] v,v v set on greater than
SIN[RH ][C][_SAT] s ssss sine
SLE[RHX][C][_SAT] v,v v set on less than or equal
SLT[RHX][C][_SAT] v,v v set on less than
SNE[RHX][C][_SAT] v,v v set on not equal
STR[RHX][C][_SAT] v,v v set on true
SUB[RHX][C][_SAT] v,v v subtract
TEX[C][_SAT] v v texture lookup
TXD[C][_SAT] v,v,v v texture lookup w/partials
TXP[C][_SAT] v v projective texture lookup
UP2H[C][_SAT] s v unpack two 16-bit floats
UP2US[C][_SAT] s v unpack two unsigned 16-bit scalars
UP4B[C][_SAT] s v unpack four signed 8-bit scalars
UP4UB[C][_SAT] s v unpack four unsigned 8-bit scalars
X2D[RH ][C][_SAT] v,v,v v 2D coordinate transformation
Table X.4: Summary of fragment program instructions. "[RHX]" indicates
an optional arithmetic precision suffix. "[C]" indicates an optional
condition code update suffix. "[_SAT]" indicates an optional clamp of
result vector components to [0,1]. "v" indicates a 4-component vector
input or output, "s" indicates a scalar input, and "ssss" indicates a
scalar output replicated across a 4-component vector.
Section 3.11.4.1: Fragment Program Storage Precision
Registers in fragment programs are stored in two different representations:
16-bit floating-point (fp16) and 32-bit floating-point (fp32). There is
an additional 12-bit fixed-point representation (fx12) used only as an
internal representation for instructions with the "X" precision qualifier.
In the 32-bit float (fp32) representation, each component is represented
in floating-point with eight exponent and twenty-three mantissa bits, as
in the standard IEEE single-precision format. If S represents the sign (0
or 1), E represents the exponent in the range [0,255], and M represents
the mantissa in the range [0,2^23-1], then an fp32 float is decoded as:
(-1)^S * 0.0, if E == 0,
(-1)^S * 2^(E-127) * (1 + M/2^23), if 0 < E < 255,
(-1)^S * INF, if E == 255 and M == 0,
NaN, if E == 255 and M != 0.
INF (Infinity) is a special representation indicating numerical overflow.
NaN (Not a Number) is a special representation indicating the result of
illegal arithmetic operations, such as computing the square root or
logarithm of a negative number. Note that all normal fp32 values, zero,
and INF have an associated sign. -0.0 and +0.0 are considered equivalent
for the purposes of comparisons.
This representation is identical to the IEEE single-precision
floating-point standard, except that no special representation is provided
for denorms -- numbers in the range (-2^-126, +2^-126). All such numbers
are flushed to zero.
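The fp32 decoding rules above can be sketched directly from the bit fields.
This Python sketch is illustrative; decode_fp32 is a hypothetical name, and the
result is held in a Python float (a double), which represents every fp32 value
exactly.

```python
import math

def decode_fp32(bits):
    """Decode a 32-bit pattern using the rules above (denorms flush to zero)."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        # Zero and flushed denorms keep their sign.
        return math.copysign(0.0, sign)
    if exponent == 255:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
```

For example, the pattern 0x3F800000 decodes to 1.0, while the denorm pattern
0x00000001 decodes to +0.0.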
In a 16-bit float (fp16) register, each component is represented
similarly, except with only five exponent and ten mantissa bits. If S
represents the sign (0 or 1), E represents the exponent in the range
[0,31], and M represents the mantissa in the range [0,2^10-1], then an
fp16 float is decoded as:
(-1)^S * 0.0, if E == 0 and M == 0,
(-1)^S * 2^-14 * M/2^10, if E == 0 and M != 0,
(-1)^S * 2^(E-15) * (1 + M/2^10), if 0 < E < 31,
(-1)^S * INF, if E == 31 and M == 0, or
NaN, if E == 31 and M != 0.
One important difference is that the fp16 representation, unlike fp32,
supports denorms to maximize the limited precision of the 16-bit floating
point encodings.
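The 16-bit decoding rules, including the denorm case, can be sketched in the
same way. This is an illustrative Python sketch; decode_fp16 is a hypothetical
name.

```python
import math

def decode_fp16(bits):
    """Decode a 16-bit pattern using the rules above (denorms are supported)."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Covers both zero (mantissa == 0) and denorms (mantissa != 0).
        return sign * 2.0 ** -14 * (mantissa / 2 ** 10)
    if exponent == 31:
        return sign * math.inf if mantissa == 0 else math.nan
    return sign * 2.0 ** (exponent - 15) * (1 + mantissa / 2 ** 10)
```

The smallest positive denorm, 0x0001, decodes to 2^-24, and the largest finite
pattern, 0x7BFF, decodes to 65504.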
In the 12-bit fixed-point (fx12) format, numbers are represented as signed
12-bit two's complement integers with 10 fraction bits. The range of
representable values is [-2048/1024, +2047/1024].
Section 3.11.4.2: Fragment Program Operation Precision
Fragment program instructions frequently perform mathematical operations.
Such operations may be performed at one of three different precisions.
Fragment programs can specify the precision of each instruction by using
the precision suffix. If an instruction has a suffix of "R", calculations
are carried out with 32-bit floating point operands and results. If an
instruction has a suffix of "H", calculations are carried out using 16-bit
floating point operands and results. If an instruction has a suffix of
"X", calculations are carried out using 12-bit fixed point operands and
results. For example, the instruction "MULR" performs a 32-bit
floating-point multiply, "MULH" performs a 16-bit floating-point multiply,
and "MULX" performs a 12-bit fixed-point multiply. If no precision suffix
is specified, calculations are carried out using the precision of the
temporary register receiving the result.
Fragment program instructions may source registers or constants whose
precisions differ from the precision specified with the instruction.
Instructions may also generate intermediate results with a different
precision than that of the destination register. In these cases, the
values sourced are converted to the precision specified by the
instruction.
When converting to fx12 format, -INF and any values less than -2048/1024
become -2048/1024. +INF, and any values greater than +2047/1024 become
+2047/1024. NaN becomes 0.
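The fx12 conversion rules above can be sketched as follows. This Python sketch
is illustrative; to_fx12 is a hypothetical name. The text does not say how
in-range values that are not exactly representable are rounded, so the sketch
assumes rounding to the nearest multiple of 1/1024.

```python
import math

FX12_MIN = -2048 / 1024
FX12_MAX = 2047 / 1024

def to_fx12(value):
    """Convert to an fx12 value, applying the clamping rules above."""
    if math.isnan(value):
        return 0.0
    if value < FX12_MIN:          # also catches -INF
        return FX12_MIN
    if value > FX12_MAX:          # also catches +INF
        return FX12_MAX
    # Assumption: in-range values round to the nearest multiple of 1/1024.
    return math.floor(value * 1024 + 0.5) / 1024
```

Under this assumption, 3.0 becomes 2047/1024 and 1/3 becomes 341/1024.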
When converting to fp16 format, any values less than or equal to -2^16 are
converted to -INF. Any values greater than or equal to +2^16 are
converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any
other values that are not exactly representable in fp16 format are
converted to one of the two nearest representable values.
When converting to fp32 format, any values less than or equal to -2^128
are converted to -INF. Any values greater than or equal to +2^128 are
converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any
other values that are not exactly representable in fp32 format are
converted to one of the two nearest representable values.
Fragment program instructions using the fragment attribute registers
f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32
precision, regardless of the precision specified by the instruction.
Section 3.11.4.3: Fragment Program Operands
Except for KIL, fragment program instructions operate on either vector or
scalar operands, indicated in the grammar (see section 3.11.3) by the
rules <vectorSrc> and <scalarSrc> respectively.
The basic set of scalar operands is defined by the grammar rule
<baseScalarSrc>. Scalar operands can be scalar constants (embedded or
named), or single components of vector constants, local parameters, or
registers allowed by the <srcRegister> rule. A vector component is
selected by the <scalarSuffix> rule, where the characters "x", "y", "z",
and "w" select the x, y, z, and w components, respectively, of the vector.
The basic set of vector operands is defined by the grammar rule
<baseVectorSrc>. Vector operands can include vector constants, local
parameters, or registers allowed by the <srcRegister> rule.
Basic vector operands can be swizzled according to the <swizzleSuffix>
rule. In its most general form, the <swizzleSuffix> rule matches the
pattern ".????" where each question mark is one of "x", "y", "z", or "w".
For such patterns, the x, y, z, and w components of the operand are taken
from the vector components named by the first, second, third, and fourth
character of the pattern, respectively. For example, if the swizzle
suffix is ".yzzx" and the specified source contains {2,8,9,0}, the
swizzled operand used by the instruction is {8,9,9,2}. If the
<swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".
Operands can optionally be negated according to the <negate> rule in
<baseScalarSrc> or <baseVectorSrc>. If the <negate> matches "-", each
value is negated.
The absolute value of operands can be taken if the <vectorSrc> or
<scalarSrc> rules match <absScalarSrc> or <absVectorSrc>. In this case,
the absolute value of each component is taken. In addition, if the
<negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result
is then negated.
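The swizzle, negate, and absolute value modifiers described above can be
sketched together. This Python sketch is illustrative; load_vector and its
parameter names are hypothetical, and type conversion is left out.

```python
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def load_vector(source, swizzle="", negate_base=False,
                absolute=False, negate_abs=False):
    """Apply swizzle, base negate, absolute value, then outer negate."""
    pattern = swizzle[1:] if swizzle else "xyzw"   # "" behaves like ".xyzw"
    if len(pattern) != 4:
        raise ValueError("a swizzle suffix names exactly four components")
    values = [source[COMPONENT_INDEX[c]] for c in pattern]
    if negate_base:
        values = [-v for v in values]
    if absolute:
        values = [abs(v) for v in values]
    if negate_abs:
        values = [-v for v in values]
    return tuple(values)
```

With the source {2,8,9,0} and the suffix ".yzzx", load_vector returns
(8, 9, 9, 2), matching the example in the text.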
Instructions requiring vector operands can also use scalar operands in the
case where the <vectorSrc> rule matches <scalarSrc>. In such cases, a
4-component vector is produced by replicating the scalar.
After operands are loaded, they are converted to a data type corresponding
to the operation precision specified in the fragment program instruction.
The following pseudo-code spells out the operand generation process.
"SrcT" and "InstT" refer to the data types of the specified register or
constant and the instruction, respectively. "VecSrcT" and "VecInstT"
refer to 4-component vectors of the corresponding type. "absolute" is
TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,
and FALSE otherwise. "negateBase" is TRUE if the <negate> rule in
<baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
"negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or
<absVectorSrc> matches "-" and FALSE otherwise. The ".c***", ".*c**",
".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained
by the swizzle operation. TypeConvert() is assumed to convert a scalar of
type SrcT to a scalar of type InstT using the type conversion process
specified above.
VecInstT VectorLoad(VecSrcT source)
{
VecSrcT srcVal;
VecInstT convertedVal;
srcVal.x = source.c***;
srcVal.y = source.*c**;
srcVal.z = source.**c*;
srcVal.w = source.***c;
if (negateBase) {
srcVal.x = -srcVal.x;
srcVal.y = -srcVal.y;
srcVal.z = -srcVal.z;
srcVal.w = -srcVal.w;
}
if (absolute) {
srcVal.x = abs(srcVal.x);
srcVal.y = abs(srcVal.y);
srcVal.z = abs(srcVal.z);
srcVal.w = abs(srcVal.w);
}
if (negateAbs) {
srcVal.x = -srcVal.x;
srcVal.y = -srcVal.y;
srcVal.z = -srcVal.z;
srcVal.w = -srcVal.w;
}
convertedVal.x = TypeConvert(srcVal.x);
convertedVal.y = TypeConvert(srcVal.y);
convertedVal.z = TypeConvert(srcVal.z);
convertedVal.w = TypeConvert(srcVal.w);
return convertedVal;
}
InstT ScalarLoad(VecSrcT source)
{
SrcT srcVal;
InstT convertedVal;
srcVal = source.c***;
if (negateBase) {
srcVal = -srcVal;
}
if (absolute) {
srcVal = abs(srcVal);
}
if (negateAbs) {
srcVal = -srcVal;
}
convertedVal = TypeConvert(srcVal);
return convertedVal;
}
Section 3.11.4.4, Fragment Program Destination Register Update
Each fragment program instruction, except for KIL, writes a 4-component
result vector to a single temporary or output register.
The four components of the result vector are first optionally clamped to
the range [0,1]. The components will be clamped if and only if the result
clamp suffix "_SAT" is present in the instruction name. The instruction
"ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent
instruction "ADD" will not.
Since the instruction may be carried out at a different precision than the
destination register, the components of the results vector are then
converted to the data type corresponding to the destination register.
Writes to individual components of the destination register are controlled
by two sets of enables: individual component write masks specified as part
of the instruction and the optional condition code mask.
The component write mask is specified by the <optionalWriteMask> rule
found in the <maskedDstReg> rule. If the optional mask is "", all
components are enabled. Otherwise, the optional mask names the individual
components to enable. The characters "x", "y", "z", and "w" match the x,
y, z, and w components respectively. For example, an optional mask of
".xzw" indicates that the x, z, and w components should be enabled for
writing but the y component should not. The grammar requires that the
destination register mask components must be listed in "xyzw" order.
The optional condition code mask is specified by the <optionalCCMask> rule
found in the <maskedDstReg> rule. If <optionalCCMask> matches "", all
components are enabled. Otherwise, the condition code register is loaded
and swizzled according to the swizzling specified by <swizzleSuffix>.
Each component of the swizzled condition code is tested according to the
rule given by <ccMaskRule>. <ccMaskRule> may have the values "EQ", "NE",
"LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
condition code field evaluates to equal, not equal, less than, greater
than or equal, less than or equal, or greater than, respectively.
Comparisons involving condition codes of "UN" (unordered) evaluate to true
for "NE" and false otherwise. For example, if the condition code is
(GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
operation will load (EQ,LT,GT,GT) and the mask will thus enable
writes on the y, z, and w components. In addition, "TR" always enables
writes and "FL" always disables writes, regardless of the condition code.
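The condition code mask test can be sketched as a lookup of which condition
code fields pass each rule. This Python sketch is illustrative; cc_passes and
cc_write_enables are hypothetical names, and condition code fields are modeled
as the strings "LT", "EQ", "GT", and "UN".

```python
def cc_passes(rule, field):
    """Return True if one condition code field passes the <ccMaskRule>."""
    if rule == "TR":
        return True
    if rule == "FL":
        return False
    accepted = {
        "EQ": {"EQ"},
        "NE": {"LT", "GT", "UN"},   # "UN" passes only "NE"
        "LT": {"LT"},
        "GE": {"GT", "EQ"},
        "LE": {"LT", "EQ"},
        "GT": {"GT"},
    }
    return field in accepted[rule]

def cc_write_enables(cc, rule, swizzle="xyzw"):
    """Swizzle the condition code, then test each component against the rule."""
    swizzled = tuple(cc["xyzw".index(c)] for c in swizzle)
    return tuple(cc_passes(rule, field) for field in swizzled)
```

For the example in the text, cc_write_enables(("GT","LT","EQ","GT"), "NE",
"zyxw") enables writes on the y, z, and w components only.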
Each component of the destination register is updated with the result of
the fragment program if and only if the component is enabled for writes by
both the component write mask and the optional condition code mask.
Otherwise, the component of the destination register remains unchanged.
A fragment program instruction can also optionally update the condition
code register. The condition code is updated if the condition code
register update suffix "C" is present in the instruction name. The
instruction "ADDC" will update the condition code; the otherwise
equivalent instruction "ADD" will not. If condition code updates are
enabled, each component of the destination register enabled for writes is
compared to zero. The corresponding component of the condition code is
set to "LT", "EQ", or "GT", if the written component is less than, equal
to, or greater than zero, respectively. Condition code components are set
to "UN" if the written component is NaN. Note that values of -0.0 and
+0.0 both evaluate to "EQ". If a component of the destination register is
not enabled for writes, the corresponding condition code component is
unchanged.
In the following example code,
# R1=(-2, 0, 2, NaN) R0 CC
MOVC R0, R1; # ( -2, 0, 2, NaN) (LT,EQ,GT,UN)
MOVC R0.xyz, R1.yzwx; # ( 0, 2, NaN, NaN) (EQ,GT,UN,UN)
MOVC R0 (NE), R1.zywx; # ( 0, 0, NaN, -2) (EQ,EQ,UN,LT)
the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
code to (LT,EQ,GT,UN). In the second instruction, only the "x", "y", and
"z" components of R0 and the condition code are updated, so R0 ends up with
(0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the
third instruction, the condition code mask disables writes to the x
component (its condition code field is "EQ"), so R0 ends up with
(0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
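The three-instruction example can be replayed with a small simulation. This
Python sketch is illustrative only; movc, generate_cc, and cc_passes are
hypothetical helpers that model just the features the example uses (a write
mask, a source swizzle, and an unswizzled "NE" condition code mask), and the
starting contents of R0 and the condition code are arbitrary.

```python
import math

def generate_cc(value):
    if math.isnan(value):
        return "UN"
    return "LT" if value < 0 else "EQ" if value == 0 else "GT"

def cc_passes(rule, field):
    # Only the rules needed here: "" (no mask) and "NE".
    return True if rule == "" else field != "EQ"

def movc(dst, cc, src, write_mask="xyzw", cc_rule="", swizzle="xyzw"):
    """Return the new (register, condition code) after one MOVC."""
    value = [src["xyzw".index(c)] for c in swizzle]
    old_cc = tuple(cc)                     # test against the incoming codes
    dst, cc = list(dst), list(cc)
    for i, name in enumerate("xyzw"):
        if name in write_mask and cc_passes(cc_rule, old_cc[i]):
            dst[i] = value[i]
            cc[i] = generate_cc(value[i])
    return dst, cc

R1 = [-2.0, 0.0, 2.0, math.nan]
R0, CC = [0.0] * 4, ["EQ"] * 4             # arbitrary starting state
R0, CC = movc(R0, CC, R1)                                   # MOVC R0, R1;
R0, CC = movc(R0, CC, R1, write_mask="xyz", swizzle="yzwx") # MOVC R0.xyz, R1.yzwx;
R0, CC = movc(R0, CC, R1, cc_rule="NE", swizzle="zywx")     # MOVC R0 (NE), R1.zywx;
```

After the last call, R0 holds (0, 0, NaN, -2) and CC holds (EQ, EQ, UN, LT),
as in the example.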
The following pseudocode illustrates the process of writing a result
vector to the destination register. In the example, "ccMaskRule" refers
to the condition code mask rule given by <ccMaskRule> (or "" if no rule is
specified), "instrMask" refers to the component write mask given by the
<optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are
enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.
"destination" and "cc" refer to the register selected by <dstRegister> and
the condition code, respectively.
boolean TestCC(CondCode field) {
switch (ccMaskRule) {
case "EQ": return (field == "EQ");
case "NE": return (field != "EQ");
case "LT": return (field == "LT");
case "GE": return (field == "GT" || field == "EQ");
case "LE": return (field == "LT" || field == "EQ");
case "GT": return (field == "GT");
case "TR": return TRUE;
case "FL": return FALSE;
case "": return TRUE;
}
}
enum GenerateCC(DstT value) {
if (value == NaN) {
return UN;
} else if (value < 0) {
return LT;
} else if (value == 0) {
return EQ;
} else {
return GT;
}
}
void UpdateDestination(VecDstT destination, VecInstT result)
{
// Load the original destination register and condition code.
VecDstT resultDst;
VecDstT merged;
VecCC mergedCC;
// Clamp the result vector components to [0,1], if requested.
if (clamp01) {
if (result.x < 0) result.x = 0;
else if (result.x > 1) result.x = 1;
if (result.y < 0) result.y = 0;
else if (result.y > 1) result.y = 1;
if (result.z < 0) result.z = 0;
else if (result.z > 1) result.z = 1;
if (result.w < 0) result.w = 0;
else if (result.w > 1) result.w = 1;
}
// Convert the result to the type of the destination register.
resultDst.x = TypeConvert(result.x);
resultDst.y = TypeConvert(result.y);
resultDst.z = TypeConvert(result.z);
resultDst.w = TypeConvert(result.w);
// Merge the converted result into the destination register, under
// control of the compile- and run-time write masks.
merged = destination;
mergedCC = cc;
if (instrMask.x && TestCC(cc.c***)) {
merged.x = resultDst.x;
if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
}
if (instrMask.y && TestCC(cc.*c**)) {
merged.y = resultDst.y;
if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
}
if (instrMask.z && TestCC(cc.**c*)) {
merged.z = resultDst.z;
if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
}
if (instrMask.w && TestCC(cc.***c)) {
merged.w = resultDst.w;
if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
}
// Write out the new destination register and condition code.
destination = merged;
cc = mergedCC;
}
Section 3.11.5, Fragment Program Instruction Set
The following sections describe the instruction set available to fragment
programs.
Section 3.11.5.1, ADD: Add
The ADD instruction performs a component-wise add of the two operands to
yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x + tmp1.x;
result.y = tmp0.y + tmp1.y;
result.z = tmp0.z + tmp1.z;
result.w = tmp0.w + tmp1.w;
The following special-case rules apply to addition:
1. "A+B" is always equivalent to "B+A".
2. NaN + <x> = NaN, for all <x>.
3. +INF + <x> = +INF, for all <x> except NaN and -INF.
4. -INF + <x> = -INF, for all <x> except NaN and +INF.
5. +INF + -INF = NaN.
6. -0.0 + <x> = <x>, for all <x>.
7. +0.0 + <x> = <x>, for all <x> except -0.0.
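These special-case rules match ordinary IEEE 754 addition, so they can be
checked on a host CPU. The Python sketch below is illustrative only: Python
floats are doubles rather than fp16 or fp32 values, so it demonstrates the
special cases, not the precision. The names checks and is_neg_zero are
hypothetical.

```python
import math

INF, NAN = math.inf, math.nan

def is_neg_zero(value):
    return value == 0.0 and math.copysign(1.0, value) < 0

checks = {
    "rule 2: NaN + x": math.isnan(NAN + 1.0),
    "rule 3: +INF + x": INF + -5.0 == INF,
    "rule 4: -INF + x": -INF + 5.0 == -INF,
    "rule 5: +INF + -INF": math.isnan(INF + -INF),
    "rule 6: -0.0 + -0.0": is_neg_zero(-0.0 + -0.0),
    "rule 6: -0.0 + +0.0": not is_neg_zero(-0.0 + 0.0),
    "rule 7: +0.0 + -0.0": not is_neg_zero(0.0 + -0.0),
}
```

Each entry in checks evaluates to True when the host arithmetic follows the
corresponding rule.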
Section 3.11.5.2, COS: Cosine
The COS instruction approximates the cosine of the angle specified by the
scalar operand and replicates the approximation to all four components of
the result vector. The angle is specified in radians and does not have to
be in the range [0,2*PI].
tmp = ScalarLoad(op0);
result.x = ApproxCosine(tmp);
result.y = ApproxCosine(tmp);
result.z = ApproxCosine(tmp);
result.w = ApproxCosine(tmp);
The approximation function ApproxCosine is accurate to at least 22 bits
with an angle in the range [0,2*PI].
| ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
The error in the approximation will typically increase with the absolute
value of the angle when the angle falls outside the range [0,2*PI].
The following special-case rules apply to cosine approximation:
1. ApproxCosine(NaN) = NaN.
2. ApproxCosine(+/-INF) = NaN.
3. ApproxCosine(+/-0.0) = +1.0.
Section 3.11.5.3, DDX: Derivative Relative to X
The DDX instruction computes approximate partial derivatives of the four
components of the single operand with respect to the X window coordinate
to yield a result vector. The partial derivative is evaluated at the
center of the pixel.
f = VectorLoad(op0);
result = ComputePartialX(f);
Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may not
yield accurate second derivatives.
For components with partial derivatives that overflow (including +/-INF
inputs), the resulting partials may be encoded as large floating-point
numbers instead of +/-INF.
Section 3.11.5.4, DDY: Derivative Relative to Y
The DDY instruction computes approximate partial derivatives of the four
components of the single operand with respect to the Y window coordinate
to yield a result vector. The partial derivative is evaluated at the
center of the pixel.
f = VectorLoad(op0);
result = ComputePartialY(f);
Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may not
yield accurate second derivatives.
For components with partial derivatives that overflow (including +/-INF
inputs), the resulting partials may be encoded as large floating-point
numbers instead of +/-INF.
Section 3.11.5.5, DP3: 3-Component Dot Product
The DP3 instruction computes a three component dot product of the two
operands (using the x, y, and z components) and replicates the dot product
to all four components of the result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z);
Section 3.11.5.6, DP4: 4-Component Dot Product
The DP4 instruction computes a four component dot product of the two
operands and replicates the dot product to all four components of the
result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
(tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
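The DP3 and DP4 computations can be sketched compactly by computing the dot
product once and replicating it. This Python sketch is illustrative; dp3 and
dp4 are hypothetical names, operands are plain 4-tuples, and operand loading
and precision conversion are left out.

```python
def dp3(a, b):
    """3-component dot product replicated to all four result components."""
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return (dot,) * 4

def dp4(a, b):
    """4-component dot product replicated to all four result components."""
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
    return (dot,) * 4
```

For the operands (1,2,3,4) and (5,6,7,8), dp3 yields 38 in every component and
dp4 yields 70.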
Section 3.11.5.7, DST: Distance Vector
The DST instruction computes a distance vector from two specially-
formatted operands. The first operand should be of the form [NA, d^2,
d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
where NA values are not relevant to the calculation and d is a vector
length. If both vectors satisfy these conditions, the result vector will
be of the form [1.0, d, d^2, 1/d].
The exact behavior is specified in the following pseudo-code:
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = 1.0;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z;
result.w = tmp1.w;
Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
(using the same vector for both operands) and 1/d can be obtained from d^2
using the RSQ instruction.
This distance vector is useful for per-fragment light attenuation
calculations: a DP3 operation involving the distance vector and an
attenuation constants vector will yield the attenuation factor.
Section 3.11.5.8, EX2: Exponential Base 2
The EX2 instruction approximates 2 raised to the power of the scalar
operand and replicates it to all four components of the result
vector.
tmp = ScalarLoad(op0);
result.x = Approx2ToX(tmp);
result.y = Approx2ToX(tmp);
result.z = Approx2ToX(tmp);
result.w = Approx2ToX(tmp);
The approximation function is accurate to at least 22 bits:
| Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,
and, in general,
| Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).
The following special-case rules apply to exponential approximation:
1. Approx2ToX(NaN) = NaN.
2. Approx2ToX(-INF) = +0.0.
3. Approx2ToX(+INF) = +INF.
4. Approx2ToX(+/-0.0) = +1.0.
Section 3.11.5.9, FLR: Floor
The FLR instruction performs a component-wise floor operation on the
operand to generate a result vector. The floor of a value is defined as
the largest integer less than or equal to the value. The floor of 2.3 is
2.0; the floor of -3.6 is -4.0.
tmp = VectorLoad(op0);
result.x = floor(tmp.x);
result.y = floor(tmp.y);
result.z = floor(tmp.z);
result.w = floor(tmp.w);
The following special-case rules apply to floor computation:
1. floor(NaN) = NaN.
2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the
sign of the result is equal to the sign of the operand.
Section 3.11.5.10, FRC: Fraction
The FRC instruction extracts the fractional portion of each component of
the operand to generate a result vector. The fractional portion of a
component is defined as the result after subtracting off the floor of the
component (see FLR), and is always in the range [0.00, 1.00).
For negative values, the fractional portion is NOT the number written to
the right of the decimal point -- the fractional portion of -1.7 is not
0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0)
from -1.7.
tmp = VectorLoad(op0);
result.x = tmp.x - floor(tmp.x);
result.y = tmp.y - floor(tmp.y);
result.z = tmp.z - floor(tmp.z);
result.w = tmp.w - floor(tmp.w);
The following special-case rules, which can be derived from the rules for
FLR and ADD, apply to fraction computation:
1. fraction(NaN) = NaN.
2. fraction(+/-INF) = NaN.
3. fraction(+/-0.0) = +0.0.
Section 3.11.5.11, KIL: Conditionally Discard Fragment
The KIL instruction is unlike any other instruction in the instruction
set. This instruction evaluates components of a swizzled condition code
using a test expression identical to that used to evaluate condition code
write masks (Section 3.11.4.4). If any condition code component evaluates
to TRUE, the fragment is discarded. Otherwise, the instruction has no
effect. The condition code components are specified, swizzled, and
evaluated in the same manner as the condition code write mask.
if (TestCC(rc.c***) || TestCC(rc.*c**) ||
TestCC(rc.**c*) || TestCC(rc.***c)) {
// Discard the fragment.
} else {
// Do nothing.
}
If the fragment is discarded, it is treated as though it were not produced
by rasterization. In particular, none of the per-fragment operations
(such as stencil tests, blends, stencil, depth, or color buffer writes)
are performed on the fragment.
Section 3.11.5.12, LG2: Logarithm Base 2
The LG2 instruction approximates the base 2 logarithm of the scalar
operand and replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxLog2(tmp);
result.y = ApproxLog2(tmp);
result.z = ApproxLog2(tmp);
result.w = ApproxLog2(tmp);
The approximation function is accurate to at least 22 bits:
| ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.
Note that for large values of x, there are not enough bits in the
floating-point storage format to represent a result that precisely.
The following special-case rules apply to logarithm approximation:
1. ApproxLog2(NaN) = NaN.
2. ApproxLog2(+INF) = +INF.
3. ApproxLog2(+/-0.0) = -INF.
4. ApproxLog2(x) = NaN, -INF < x < -0.0.
5. ApproxLog2(-INF) = NaN.
Section 3.11.5.13, LIT: Compute Light Coefficients
The LIT instruction accelerates per-fragment lighting by computing
lighting coefficients for ambient, diffuse, and specular light
contributions. The "x" component of the operand is assumed to hold a
diffuse dot product (n dot VP_pli, as in the vertex lighting equations in
Section 2.13.1). The "y" component of the operand is assumed to hold a
specular dot product (n dot h_i). The "w" component of the operand is
assumed to hold the specular exponent of the material (s_rm).
The "x" component of the result vector receives the value that should be
multiplied by the ambient light/material product (always 1.0). The "y"
component of the result vector receives the value that should be
multiplied by the diffuse light/material product (n dot VP_pli). The "z"
component of the result vector receives the value that should be
multiplied by the specular light/material product (f_i * (n dot h_i) ^
s_rm). The "w" component of the result is the constant 1.0.
Negative diffuse and specular dot products are clamped to 0.0, as is done
in the standard per-vertex lighting operations. In addition, if the
diffuse dot product is zero or negative, the specular coefficient is
forced to zero.
tmp = VectorLoad(op0);
if (tmp.x < 0) tmp.x = 0;
if (tmp.y < 0) tmp.y = 0;
result.x = 1.0;
result.y = tmp.x;
result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
result.w = 1.0;
The exponentiation approximation used to compute result.z is identical to
that used in the POW instruction, including errors and the processing of
any special cases.
Section 3.11.5.14, LRP: Linear Interpolation
The LRP instruction performs a component-wise linear interpolation to
yield a result vector. It interpolates between the components of the
second and third operands, using the first operand as a weight.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
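The operand order of LRP is easy to misread, so a one-component C sketch
may help. The function name lrp is illustrative. Note that a weight of 1.0
selects the second operand and a weight of 0.0 selects the third.

```c
/* One component of LRP: weight * a + (1 - weight) * b. */
static float lrp(float weight, float a, float b) {
    return weight * a + (1.0f - weight) * b;
}
```

For example, lrp(0.25f, 10.0f, 20.0f) evaluates to 17.5.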
Section 3.11.5.15, MAD: Multiply and Add
The MAD instruction performs a component-wise multiply of the first two
operands, and then does a component-wise add of the product to the third
operand to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x * tmp1.x + tmp2.x;
result.y = tmp0.y * tmp1.y + tmp2.y;
result.z = tmp0.z * tmp1.z + tmp2.z;
result.w = tmp0.w * tmp1.w + tmp2.w;
Section 3.11.5.16, MAX: Maximum
The MAX instruction computes component-wise maximums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = max(tmp0.x, tmp1.x);
result.y = max(tmp0.y, tmp1.y);
result.z = max(tmp0.z, tmp1.z);
result.w = max(tmp0.w, tmp1.w);
The following special cases apply to the maximum operation:
1. max(A,B) is always equivalent to max(B,A).
2. max(NaN, <x>) == NaN, for all <x>.
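The NaN rule above differs from the C library function fmaxf, which returns
the non-NaN operand when exactly one operand is NaN. A CPU-side emulation
therefore needs an explicit check. The following sketch uses the
illustrative name max_nv and does not address the ordering of -0.0 and
+0.0, which is not stated above.

```c
#include <math.h>

/* Maximum that propagates NaN, as required by special case 2. */
static float max_nv(float a, float b) {
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return (a > b) ? a : b;
}
```

The same approach applies to MIN with the comparison reversed.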
Section 3.11.5.17, MIN: Minimum
The MIN instruction computes component-wise minimums of the values in the
two operands to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = min(tmp0.x, tmp1.x);
result.y = min(tmp0.y, tmp1.y);
result.z = min(tmp0.z, tmp1.z);
result.w = min(tmp0.w, tmp1.w);
The following special cases apply to the minimum operation:
1. min(A,B) is always equivalent to min(B,A).
2. min(NaN, <x>) == NaN, for all <x>.
Section 3.11.5.18, MOV: Move
The MOV instruction copies the value of the operand to yield a result
vector.
result = VectorLoad(op0);
Section 3.11.5.19, MUL: Multiply
The MUL instruction performs a component-wise multiply of the two operands
to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x * tmp1.x;
result.y = tmp0.y * tmp1.y;
result.z = tmp0.z * tmp1.z;
result.w = tmp0.w * tmp1.w;
The following special-case rules apply to multiplication:
1. "A*B" is always equivalent to "B*A".
2. NaN * <x> = NaN, for all <x>.
3. +/-0.0 * +/-INF = NaN.
4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The
sign of the result is positive if the signs of the two operands match
and negative otherwise.
5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The
sign of the result is positive if the signs of the two operands match
and negative otherwise.
6. +1.0 * <x> = <x>, for all <x>.
Section 3.11.5.20, PK2H: Pack Two 16-bit Floats
The PK2H instruction converts the "x" and "y" components of the single
operand into 16-bit floating-point format, packs the bit representation of
these two floats into a 32-bit value, and replicates that value to all
four components of the result vector. The PK2H instruction can be
reversed by the UP2H instruction below.
tmp0 = VectorLoad(op0);
/* result obtained by combining raw bits of tmp0.x, tmp0.y */
result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
The result must be written to a register with 32-bit components (an "R"
register, o[COLR], or o[DEPR]). A fragment program will fail to load if
any other register type is specified.
Section 3.11.5.21, PK2US: Pack Two Unsigned 16-bit Scalars
The PK2US instruction converts the "x" and "y" components of the single
operand into a packed pair of 16-bit unsigned scalars. The scalars are
represented in a bit pattern where all '0' bits correspond to 0.0 and all
'1' bits correspond to 1.0. The bit representations of the two converted
components are packed into a 32-bit value, and that value is replicated to
all four components of the result vector. The PK2US instruction can be
reversed by the UP2US instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
us.y = round(65535.0 * tmp0.y);
/* result obtained by combining raw bits of us. */
result.x = ((us.x) | (us.y << 16));
result.y = ((us.x) | (us.y << 16));
result.z = ((us.x) | (us.y << 16));
result.w = ((us.x) | (us.y << 16));
The result must be written to a register with 32-bit components (an "R"
register, o[COLR], or o[DEPR]). A fragment program will fail to load if
any other register type is specified.
Section 3.11.5.22, PK4B: Pack Four Signed 8-bit Scalars
The PK4B instruction converts the four components of the single operand
into 8-bit signed quantities. The signed quantities are represented in a
bit pattern where all '0' bits correspond to -128/127 and all '1' bits
correspond to +127/127. The bit representations of the four converted
components are packed into a 32-bit value, and that value is replicated to
all four components of the result vector. The PK4B instruction can be
reversed by the UP4B instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < -128/127) tmp0.x = -128/127;
if (tmp0.y < -128/127) tmp0.y = -128/127;
if (tmp0.z < -128/127) tmp0.z = -128/127;
if (tmp0.w < -128/127) tmp0.w = -128/127;
if (tmp0.x > +127/127) tmp0.x = +127/127;
if (tmp0.y > +127/127) tmp0.y = +127/127;
if (tmp0.z > +127/127) tmp0.z = +127/127;
if (tmp0.w > +127/127) tmp0.w = +127/127;
ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
ub.y = round(127.0 * tmp0.y + 128.0);
ub.z = round(127.0 * tmp0.z + 128.0);
ub.w = round(127.0 * tmp0.w + 128.0);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
The result must be written to a register with 32-bit components (an "R"
register, o[COLR], or o[DEPR]). A fragment program will fail to load if
any other register type is specified.
Section 3.11.5.23, PK4UB: Pack Four Unsigned 8-bit Scalars
The PK4UB instruction converts the four components of the single operand
into a packed grouping of 8-bit unsigned scalars. The scalars are
represented in a bit pattern where all '0' bits correspond to 0.0 and all
'1' bits correspond to 1.0. The bit representations of the four
converted components are packed into a 32-bit value, and that value is
replicated to all four components of the result vector. The PK4UB
instruction can be reversed by the UP4UB instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
if (tmp0.z < 0.0) tmp0.z = 0.0;
if (tmp0.z > 1.0) tmp0.z = 1.0;
if (tmp0.w < 0.0) tmp0.w = 0.0;
if (tmp0.w > 1.0) tmp0.w = 1.0;
ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
ub.y = round(255.0 * tmp0.y);
ub.z = round(255.0 * tmp0.z);
ub.w = round(255.0 * tmp0.w);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
The result must be written to a register with 32-bit components (an "R"
register, o[COLR], or o[DEPR]). A fragment program will fail to load if
any other register type is specified.
Section 3.11.5.24, POW: Exponentiation
The POW instruction approximates the value of the first scalar operand
raised to the power of the second scalar operand and replicates it to all
four components of the result vector.
tmp0 = ScalarLoad(op0);
tmp1 = ScalarLoad(op1);
result.x = ApproxPower(tmp0, tmp1);
result.y = ApproxPower(tmp0, tmp1);
result.z = ApproxPower(tmp0, tmp1);
result.w = ApproxPower(tmp0, tmp1);
The exponentiation approximation function is defined in terms of the base
2 exponentiation and logarithm approximation operations in the EX2 and LG2
instructions, including errors and the processing of any special cases.
In particular,
ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).
The following special-case rules, which can be derived from the rules in
the LG2, MUL, and EX2 instructions, apply to exponentiation:
1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.
3. ApproxPower(+/-0.0, +/-0.0) = NaN.
4. ApproxPower(+INF, +/-0.0) = NaN.
5. ApproxPower(+1.0, +/-INF) = NaN.
6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.
7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.
8. ApproxPower(+1.0, <x>) = +1.0, if -INF < x < +INF.
9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.
10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.
12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.
13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
+INF, if x > +1.0.
14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
+0.0, if x > +1.0.
Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and
0*(-INF) = NaN. In many other applications, including the standard C
pow() function, 0^0 is defined as 1.0. This behavior can be emulated
using additional instructions in much the same way that the pow()
function is implemented on many CPUs.
Note that a logarithm is involved even if the exponent is an integer.
This means that any exponentiation with a negative base will produce NaN.
In contrast, it is possible in a "normal" mathematical formulation to
raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
(-0.5)^-2 == 4).
Section 3.11.5.25, RCP: Reciprocal
The RCP instruction approximates the reciprocal of the scalar operand and
replicates it to all four components of the result vector.
tmp = ScalarLoad(op0);
result.x = ApproxReciprocal(tmp);
result.y = ApproxReciprocal(tmp);
result.z = ApproxReciprocal(tmp);
result.w = ApproxReciprocal(tmp);
The approximation function is accurate to at least 22 bits:
| ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.
The following special-case rules apply to reciprocation:
1. ApproxReciprocal(NaN) = NaN.
2. ApproxReciprocal(+INF) = +0.0.
3. ApproxReciprocal(-INF) = -0.0.
4. ApproxReciprocal(+0.0) = +INF.
5. ApproxReciprocal(-0.0) = -INF.
Section 3.11.5.26, RFL: Reflection Vector
The RFL instruction computes the reflection of the second vector operand
(the "direction" vector) about the vector specified by the first vector
operand (the "axis" vector). Both operands are treated as 3D vectors (the
w components are ignored). The result vector is another 3D vector (the
"reflected direction" vector). The length of the result vector, ignoring
rounding errors, should equal that of the second operand.
axis = VectorLoad(op0);
direction = VectorLoad(op1);
tmp.w = (axis.x * axis.x + axis.y * axis.y +
axis.z * axis.z);
tmp.x = (axis.x * direction.x + axis.y * direction.y +
axis.z * direction.z);
tmp.x = 2.0 * tmp.x;
tmp.x = tmp.x / tmp.w;
result.x = tmp.x * axis.x - direction.x;
result.y = tmp.x * axis.y - direction.y;
result.z = tmp.x * axis.z - direction.z;
A fragment program will fail to load if the w component of the result is
enabled in the component write mask (see the <optionalWriteMask> rule in
the grammar).
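The reflection computation can be sketched in C as follows. The Vec3 type
and the function name are illustrative. The division by the squared length
of the axis means that the axis does not need to be normalized.

```c
typedef struct { float x, y, z; } Vec3;

/* RFL: reflect direction about axis, following the pseudo-code above. */
static Vec3 rfl(Vec3 axis, Vec3 direction) {
    float len2 = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
    float dot  = axis.x * direction.x + axis.y * direction.y +
                 axis.z * direction.z;
    float scale = 2.0f * dot / len2;
    Vec3 r = { scale * axis.x - direction.x,
               scale * axis.y - direction.y,
               scale * axis.z - direction.z };
    return r;
}
```

For an axis of (0, 0, 2) and a direction of (1, 0, 1), the result is
(-1, 0, 1), which has the same length as the direction vector.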
Section 3.11.5.27, RSQ: Reciprocal Square Root
The RSQ instruction approximates the reciprocal of the square root of the
scalar operand and replicates it to all four components of the result
vector.
tmp = ScalarLoad(op0);
result.x = ApproxRSQRT(tmp);
result.y = ApproxRSQRT(tmp);
result.z = ApproxRSQRT(tmp);
result.w = ApproxRSQRT(tmp);
The approximation function is accurate to at least 22 bits:
| ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.
The following special-case rules apply to reciprocal square roots:
1. ApproxRSQRT(NaN) = NaN.
2. ApproxRSQRT(+INF) = +0.0.
3. ApproxRSQRT(-INF) = NaN.
4. ApproxRSQRT(+0.0) = +INF.
5. ApproxRSQRT(-0.0) = -INF.
6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.
Section 3.11.5.28, SEQ: Set on Equal To
The SEQ instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is equal to that of the second, and 0.0
otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SEQ:
1. (<x> == <y>) and (<y> == <x>) always produce the same result.
2. (NaN == <x>) is FALSE for all <x>, including NaN.
3. (+INF == +INF) and (-INF == -INF) are TRUE.
4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
Section 3.11.5.29, SFL: Set on False
The SFL instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to
0.0.
result.x = 0.0;
result.y = 0.0;
result.z = 0.0;
result.w = 0.0;
Section 3.11.5.30, SGE: Set on Greater Than or Equal
The SGE instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is greater than or equal to that of the
second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SGE:
1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.
Section 3.11.5.31, SGT: Set on Greater Than
The SGT instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is greater than that of the second, and
0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SGT:
1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.
Section 3.11.5.32, SIN: Sine
The SIN instruction approximates the sine of the angle specified by the
scalar operand and replicates it to all four components of the result
vector. The angle is specified in radians and does not have to be in the
range [0,2*PI].
tmp = ScalarLoad(op0);
result.x = ApproxSine(tmp);
result.y = ApproxSine(tmp);
result.z = ApproxSine(tmp);
result.w = ApproxSine(tmp);
The approximation function is accurate to at least 22 bits with an angle
in the range [0,2*PI].
| ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
The error in the approximation will typically increase with the absolute
value of the angle when the angle falls outside the range [0,2*PI].
The following special-case rules apply to sine approximation:
1. ApproxSine(NaN) = NaN.
2. ApproxSine(+/-INF) = NaN.
3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the
sign of the single operand.
Section 3.11.5.33, SLE: Set on Less Than or Equal
The SLE instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is less than or equal to that of the
second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SLE:
1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.
Section 3.11.5.34, SLT: Set on Less Than
The SLT instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is less than that of the second, and 0.0
otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SLT:
1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.
Section 3.11.5.35, SNE: Set on Not Equal
The SNE instruction performs a component-wise comparison of the two
operands. Each component of the result vector is 1.0 if the corresponding
component of the first operand is not equal to that of the second, and 0.0
otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;
The following special-case rules apply to SNE:
1. (<x> != <y>) and (<y> != <x>) always produce the same result.
2. (NaN != <x>) is TRUE for all <x>, including NaN.
3. (+INF != +INF) and (-INF != -INF) are FALSE.
4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.
Section 3.11.5.36, STR: Set on True
The STR instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to 1.0.
result.x = 1.0;
result.y = 1.0;
result.z = 1.0;
result.w = 1.0;
Section 3.11.5.37, SUB: Subtract
The SUB instruction performs a component-wise subtraction of the second
operand from the first to yield a result vector.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = tmp0.x - tmp1.x;
result.y = tmp0.y - tmp1.y;
result.z = tmp0.z - tmp1.z;
result.w = tmp0.w - tmp1.w;
The SUB instruction is completely equivalent to an identical ADD
instruction in which the negate operator on the second operand is
reversed:
1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".
Section 3.11.5.38, TEX: Texture Lookup
The TEX instruction performs a filtered texture lookup using the texture
target given by <texImageTarget> belonging to the texture image unit given
by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
The (s,t,r) texture coordinates used for the lookup are the x, y, and z
components of the single operand.
The texture lookup is performed as specified in Section 3.8. The LOD
calculations in Section 3.8.5 are performed using an implementation
dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
The mapping of filtered texture components to the components of the result
vector is dependent on the base internal format of the texture and is
specified in Table X.5.
                         Result Vector Components
Base Internal Format     X      Y      Z      W
---------------------    -----  -----  -----  -----
ALPHA                    0.0    0.0    0.0    At
LUMINANCE                Lt     Lt     Lt     1.0
LUMINANCE_ALPHA          Lt     Lt     Lt     At
INTENSITY                It     It     It     It
RGB                      Rt     Gt     Bt     1.0
RGBA                     Rt     Gt     Bt     At
HILO_NV (signed)         HIt    LOt    HEMI   1.0
HILO_NV (unsigned)       HIt    LOt    1.0    1.0
DSDT_NV                  DSt    DTt    0.0    1.0
DSDT_MAG_NV              DSt    DTt    MAGt   1.0
DSDT_MAG_INTENSITY_NV    DSt    DTt    MAGt   It
FLOAT_R_NV               Rt     0.0    0.0    1.0
FLOAT_RG_NV              Rt     Gt     0.0    1.0
FLOAT_RGB_NV             Rt     Gt     Bt     1.0
FLOAT_RGBA_NV            Rt     Gt     Bt     At
Table X.5: Mapping of filtered texel components to result vector
components for the TEX instruction. 0.0 and 1.0 indicate that the
corresponding constant value is written to the result vector.
DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,
as specified in the texture's depth texture mode.
For HILO_NV textures with signed components, "HEMI" is defined as
sqrt(MAX(0, 1-(HIt^2+LOt^2))).
This instruction specifies a particular texture target, ignoring the
standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
OpenGL. If the specified texture target has a consistent set of images, a
lookup is performed. Otherwise, the result of the instruction is the
vector (0,0,0,0).
Although this instruction allows the selection of any texture target, a
fragment program can not use more than one texture target for any given
texture image unit.
Section 3.11.5.39, TXD: Texture Lookup with Derivatives
The TXD instruction performs a filtered texture lookup using the texture
target given by <texImageTarget> belonging to the texture image unit given
by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
The (s,t,r) texture coordinates used for the lookup are the x, y, and z
components of the first operand. The partial derivatives in the X
direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z
components of the second operand. The partial derivatives in the Y
direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z
components of the third operand.
The texture lookup is performed as specified in Section 3.8. The LOD
calculations in Section 3.8.5 are performed using the specified partial
derivatives. The mapping of filtered texture components to the components
of the result vector is dependent on the base internal format of the
texture and is specified in Table X.5.
This instruction specifies a particular texture target, ignoring the
standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
OpenGL. If the specified texture target has a consistent set of images, a
lookup is performed. Otherwise, the result of the instruction is the
vector (0,0,0,0).
Although this instruction allows the selection of any texture target, a
fragment program can not use more than one texture target for any given
texture image unit.
Section 3.11.5.40, TXP: Projective Texture Lookup
The TXP instruction performs a filtered texture lookup using the texture
target given by <texImageTarget> belonging to the texture image unit given
by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
For cube map textures, the (s,t,r) texture coordinates used for the lookup
are given by x, y, and z, respectively. For all other textures, the
(s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and
z/w, respectively, where x, y, z, and w are the corresponding components
of the operand.
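The coordinate selection described above can be sketched in C. The TexCoord
type, the function name, and the is_cube_map flag are illustrative and are
not part of this specification.

```c
typedef struct { float s, t, r; } TexCoord;

/* Coordinates used by TXP: no projection for cube maps, division by
   w for all other texture targets. */
static TexCoord txp_coords(const float op[4], int is_cube_map) {
    TexCoord c;
    if (is_cube_map) {
        c.s = op[0];
        c.t = op[1];
        c.r = op[2];
    } else {
        c.s = op[0] / op[3];
        c.t = op[1] / op[3];
        c.r = op[2] / op[3];
    }
    return c;
}
```

For an operand of (2, 4, 6, 2), a 2D lookup uses (1, 2, 3), while a cube
map lookup uses (2, 4, 6) unchanged.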
The texture lookup is performed as specified in Section 3.8. The LOD
calculations in Section 3.8.5 are performed using an implementation
dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
The mapping of filtered texture components to the components of the result
vector is dependent on the base internal format of the texture and is
specified in Table X.5.
This instruction specifies a particular texture target, ignoring the
standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
OpenGL. If the specified texture target has a consistent set of images, a
lookup is performed. Otherwise, the result of the instruction is the
vector (0,0,0,0).
Although this instruction allows the selection of any texture target, a
fragment program can not use more than one texture target for any given
texture image unit.
Section 3.11.5.41, UP2H: Unpack Two 16-Bit Floats
The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
scalar operand. The first 16-bit float (stored in the 16 least
significant bits) is written into the "x" and "z" components of the result
vector; the second is written into the "y" and "w" components of the
result vector.
This operation undoes the type conversion and packing performed by the
PK2H instruction.
tmp = ScalarLoad(op0);
result.x = (fp16) (RawBits(tmp) & 0xFFFF);
result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
result.z = (fp16) (RawBits(tmp) & 0xFFFF);
result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
Since the source operand must be a 32-bit scalar, a fragment program will
fail to load if the operand is not obtained from a register with 32-bit
components or from a program parameter.
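The (fp16) conversion in the pseudo-code above reinterprets 16 raw bits as
a floating-point value. The following C sketch assumes a layout of 1 sign
bit, 5 exponent bits with a bias of 15, and 10 mantissa bits, with
IEEE-style denormals, infinities, and NaNs. That layout is an assumption of
this sketch and should be checked against the fp16 definition elsewhere in
this specification. The function names are illustrative.

```c
#include <math.h>
#include <stdint.h>

/* Decode 16 raw bits as a half-precision float (assumed s10e5 layout). */
static float half_to_float(uint16_t h) {
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float v;
    if (exponent == 0) {
        v = ldexpf((float)mantissa, -24);                 /* zero, denormal */
    } else if (exponent == 31) {
        v = mantissa ? NAN : INFINITY;                    /* NaN, infinity  */
    } else {
        v = ldexpf((float)(mantissa | 0x400), exponent - 25);
    }
    return sign ? -v : v;
}

/* UP2H: low half to x and z, high half to y and w. */
static void up2h(uint32_t bits, float result[4]) {
    float lo = half_to_float((uint16_t)(bits & 0xFFFF));
    float hi = half_to_float((uint16_t)((bits >> 16) & 0xFFFF));
    result[0] = lo;
    result[1] = hi;
    result[2] = lo;
    result[3] = hi;
}
```

Under these assumptions, the packed value 0xC0003C00 unpacks to
(1.0, -2.0, 1.0, -2.0).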
Section 3.11.5.42, UP2US: Unpack Two Unsigned 16-Bit Scalars
The UP2US instruction unpacks two 16-bit unsigned values packed together
in a 32-bit scalar operand. The unsigned quantities are encoded where a
bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
bits corresponds to 1.0. The "x" and "z" components of the result vector
are obtained from the 16 least significant bits of the operand; the "y"
and "w" components are obtained from the 16 most significant bits.
This operation undoes the type conversion and packing performed by the
PK2US instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
Since the source operand must be a 32-bit scalar, a fragment program will
fail to load if the operand is not obtained from a register with 32-bit
components or from a program parameter.
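As an illustration only, the UP2US pseudocode above can be written as a small C function. The name up2us is hypothetical, and the operand's raw bits are assumed to be available as a uint32_t.

```c
#include <stdint.h>

/* UP2US: each 16-bit field maps 0x0000 to 0.0 and 0xFFFF to 1.0.
 * The low field goes to x and z, the high field goes to y and w. */
static void up2us(uint32_t raw, float result[4])
{
    float lo = (float)(raw & 0xFFFFu) / 65535.0f;
    float hi = (float)((raw >> 16) & 0xFFFFu) / 65535.0f;

    result[0] = lo;
    result[1] = hi;
    result[2] = lo;
    result[3] = hi;
}
```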
Section 3.11.5.43, UP4B: Unpack Four Signed 8-Bit Values
The UP4B instruction unpacks four 8-bit signed values packed together in a
32-bit scalar operand. The signed quantities are encoded where a bit
pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'
bits corresponds to +127/127. The "x" component of the result vector is
the converted value corresponding to the 8 least significant bits of the
operand; the "w" component corresponds to the 8 most significant bits.
This operation undoes the type conversion and packing performed by the
PK4B instruction.
tmp = ScalarLoad(op0);
result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;
Since the source operand must be a 32-bit scalar, a fragment program will
fail to load if the operand is not obtained from a register with 32-bit
components or from a program parameter.
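The sketch below restates the UP4B pseudocode in C, purely as an illustration; the name up4b is hypothetical. It makes the biased encoding visible: the byte 0x80 decodes to exactly 0.0, 0xFF decodes to 1.0, and 0x00 decodes to -128/127, which lies slightly below -1.0.

```c
#include <stdint.h>

/* UP4B: byte i of the operand (counting from the least significant byte)
 * is biased by 128 and scaled by 1/127 to produce component i. */
static void up4b(uint32_t raw, float result[4])
{
    for (int i = 0; i < 4; i++) {
        int byte = (int)((raw >> (8 * i)) & 0xFFu);
        result[i] = (float)(byte - 128) / 127.0f;
    }
}
```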
Section 3.11.5.44, UP4UB: Unpack Four Unsigned 8-Bit Scalars
The UP4UB instruction unpacks four 8-bit unsigned values packed together
in a 32-bit scalar operand. The unsigned quantities are encoded where a
bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
bits corresponds to 1.0. The "x" component of the result vector is
obtained from the 8 least significant bits of the operand; the "w"
component is obtained from the 8 most significant bits.
This operation undoes the type conversion and packing performed by the
PK4UB instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0;
result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0;
result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;
Since the source operand must be a 32-bit scalar, a fragment program will
fail to load if the operand is not obtained from a register with 32-bit
components or from a program parameter.
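A minimal C rendering of the UP4UB pseudocode follows, for illustration only; the name up4ub is hypothetical. One plausible use is decoding a color that was packed as four 8-bit channels, with the "x" component in the least significant byte.

```c
#include <stdint.h>

/* UP4UB: byte i of the operand (counting from the least significant byte)
 * maps 0x00 to 0.0 and 0xFF to 1.0 and becomes component i. */
static void up4ub(uint32_t raw, float result[4])
{
    for (int i = 0; i < 4; i++) {
        result[i] = (float)((raw >> (8 * i)) & 0xFFu) / 255.0f;
    }
}
```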
Section 3.11.5.45, X2D: 2D Coordinate Transformation
The X2D instruction multiplies the 2D offset vector specified by the "x"
and "y" components of the second vector operand by the 2x2 matrix
specified by the four components of the third vector operand, and adds the
transformed offset vector to the 2D vector specified by the "x" and "y"
components of the first vector operand. The first component of the sum is
written to the "x" and "z" components of the result; the second component
is written to the "y" and "w" components of the result.
The X2D instruction can be used to displace texture coordinates in the
same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader
extension.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
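The X2D pseudocode above amounts to a 2x2 matrix-vector product plus an offset, with the matrix stored row by row in the third operand. The C sketch below is illustrative only; the function name x2d and the use of plain float arrays for operands are assumptions.

```c
/* X2D: result.xy = a.xy + M * b.xy, where M = | m.x m.y |
 *                                              | m.z m.w |
 * The two sums are replicated into (z, w). */
static void x2d(const float a[4], const float b[4], const float m[4],
                float result[4])
{
    float sx = a[0] + b[0] * m[0] + b[1] * m[1];
    float sy = a[1] + b[0] * m[2] + b[1] * m[3];

    result[0] = sx;
    result[1] = sy;
    result[2] = sx;
    result[3] = sy;
}
```

With the identity matrix (1,0,0,1) in the third operand, the instruction reduces to adding the "x" and "y" components of the first two operands.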
Section 3.11.6, Fragment Program Outputs
Upon completion of fragment program execution, the output registers are
used to replace the fragment's associated data.
The RGBA color of the fragment is taken from the color output register
used by the program (COLR or COLH). The R, G, B, and A color components
are extracted from the "x", "y", "z", and "w" components, respectively, of
the output register and are clamped to the range [0,1].
If the DEPR output register is written by the fragment program, the depth
value of the fragment is taken from the "z" component of the DEPR output
register. If depth clamping is enabled, the depth value is clamped to the
range [min(n,f), max(n,f)], where n and f are the near and far depth range
values. If depth clamping is disabled, the fragment is discarded if its
depth value is outside the range [min(n,f), max(n,f)].
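The color and depth rules above can be sketched in C as follows. This is an illustration under stated assumptions, not a description of any implementation: the names clampf, resolve_color, and resolve_depth are hypothetical, the depth function covers only the case where the program wrote DEPR, and NaN inputs are not considered.

```c
#include <stdbool.h>

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Color: (x, y, z, w) of COLR or COLH become (R, G, B, A), clamped to [0,1]. */
static void resolve_color(const float out[4], float rgba[4])
{
    for (int i = 0; i < 4; i++) {
        rgba[i] = clampf(out[i], 0.0f, 1.0f);
    }
}

/* Depth: z is the "z" component of DEPR; n and f are the depth range values.
 * Returns false if the fragment is discarded, true otherwise. */
static bool resolve_depth(float z, float n, float f, bool depth_clamp,
                          float *depth)
{
    float lo = n < f ? n : f;
    float hi = n < f ? f : n;

    if (depth_clamp) {
        *depth = clampf(z, lo, hi);
        return true;
    }
    if (z < lo || z > hi) {
        return false;               /* outside [min(n,f), max(n,f)] */
    }
    *depth = z;
    return true;
}
```

Taking the minimum and maximum of n and f keeps the test correct when the depth range is reversed (n greater than f).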
Section 3.11.7, Required Fragment Program State
The state required for managing fragment programs consists of:
a bit indicating whether or not fragment program mode is enabled;
an unsigned integer naming the currently bound fragment program;
and the state that must be maintained to indicate which integers are
currently in use as fragment program names.
Fragment program mode is initially disabled. The initial state of all 128
fragment program parameter registers is (0,0,0,0). The initial currently
bound fragment program is zero.
Each fragment program object consists of:
an enumerant giving the program target (FRAGMENT_PROGRAM_NV);
a boolean indicating whether the program is resident;
an array of type ubyte containing the program string;
an integer representing the length of the program string array;
one four-component floating-point vector for each named local
parameter in the program;
and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component
floating-point vectors to hold numbered local parameters, each initially
set to (0,0,0,0).
Initially, no program objects exist.
Additionally, the state required during the execution of a fragment
program consists of: twelve 4-component floating-point fragment attribute
registers, thirty-two 128-bit physical temporary registers, and a single
4-component condition code, each component of which has one of four values
(LT, EQ, GT, or UN).
Each time a fragment program is executed, the fragment attribute registers
are initialized with the fragment's location and associated data, all
temporary register components are initialized to zero, and all condition
code components are initialized to EQ.
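One way to picture the per-execution state and its initialization is the C sketch below. It is illustrative only: the type and function names are hypothetical, and each 128-bit temporary register is modeled as four raw 32-bit words, which is an assumption about layout that the text above does not make.

```c
#include <stdint.h>
#include <string.h>

enum cond_code { CC_LT, CC_EQ, CC_GT, CC_UN };

struct frag_exec_state {
    float          attrib[12][4];  /* twelve 4-component attribute registers */
    uint32_t       temp[32][4];    /* thirty-two 128-bit temporaries (raw bits) */
    enum cond_code cc[4];          /* 4-component condition code */
};

/* Prepare the state for one fragment: load the fragment's location and
 * associated data, zero all temporaries, and set every condition code
 * component to EQ. */
static void frag_exec_begin(struct frag_exec_state *s,
                            const float attribs[12][4])
{
    memcpy(s->attrib, attribs, sizeof s->attrib);
    memset(s->temp, 0, sizeof s->temp);
    for (int i = 0; i < 4; i++) {
        s->cc[i] = CC_EQ;
    }
}
```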
Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).
No changes to the text of the section.
Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
Operations and the Framebuffer)
None
Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)
Add new section 5.7, Programs (after "Flush and Finish")