| Name |
| |
| NV_fragment_program |
| |
| Name Strings |
| |
| GL_NV_fragment_program |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com) |
| |
| Notice |
| |
| Copyright NVIDIA Corporation, 2001-2002. |
| |
| IP Status |
| |
| NVIDIA Proprietary. |
| |
| Status |
| |
| Implemented in CineFX (NV30) Emulation driver, August 2002. |
| Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003. |
| |
| Version |
| |
| Last Modified Date: 2005/05/24 |
| NVIDIA Revision: 73 |
| |
| Number |
| |
| 282 |
| |
| Dependencies |
| |
| Written based on the wording of the OpenGL 1.2.1 specification and |
| requires OpenGL 1.2.1. |
| |
| Requires support for the ARB_multitexture extension with at least |
| two texture units. |
| |
| NV_vertex_program affects the definition of this extension. The only |
| dependency is that both extensions use the same mechanisms for defining |
| and binding programs. |
| |
| NV_texture_shader trivially affects the definition of this extension. |
| |
| NV_texture_rectangle trivially affects the definition of this extension. |
| |
| ARB_texture_cube_map trivially affects the definition of this extension. |
| |
| EXT_fog_coord trivially affects the definition of this extension. |
| |
| NV_depth_clamp affects the definition of this extension. |
| |
| ARB_depth_texture and SGIX_depth_texture affect the definition of this |
| extension. |
| |
| NV_float_buffer affects the definition of this extension. |
| |
| ARB_vertex_program affects the definition of this extension. |
| |
| ARB_fragment_program affects the definition of this extension. |
| |
| Overview |
| |
| OpenGL mandates a certain set of configurable per-fragment computations |
| defining texture lookup, texture environment, color sum, and fog |
| operations. Each of these areas provides a useful but limited set of fixed |
| operations. For example, unextended OpenGL 1.2.1 provides only four |
| texture environment modes, color sum, and three fog modes. Many OpenGL |
| extensions have either improved existing functionality or introduced new |
| configurable fragment operations. While these extensions have enabled new |
| and interesting rendering effects, the set of effects is limited by the |
| set of special modes introduced by the extension. This lack of |
| flexibility is in contrast to the high level of programmability of |
| general-purpose CPUs and other (frequently software-based) shading |
| languages. The purpose of this extension is to expose to the OpenGL |
| application writer an unprecedented degree of programmability in the |
| computation of final fragment colors and depth values. |
| |
| This extension provides a mechanism for defining fragment program |
| instruction sequences for application-defined fragment programs. When in |
| fragment program mode, a program is executed each time a fragment is |
| produced by rasterization. The inputs for the program are the attributes |
| (position, colors, texture coordinates) associated with the fragment and a |
| set of constant registers. A fragment program can perform mathematical |
| computations and texture lookups using arbitrary texture coordinates. The |
| results of a fragment program are new color and depth values for the |
| fragment. |
| |
| This extension defines a programming model including a 4-component vector |
| instruction set, 16- and 32-bit floating-point data types, and a |
| relatively large set of temporary registers. The programming model also |
| includes a condition code vector which can be used to mask register writes |
| at run-time or kill fragments altogether. The syntax, program |
| instructions, and general semantics are similar to those in the |
| NV_vertex_program and NV_vertex_program2 extensions, which provide for the |
| execution of an arbitrary program each time the GL receives a vertex. |
| |
| The fragment program execution environment is designed for efficient |
| hardware implementation and to support a wide variety of programs. By |
| design, the entire set of existing fragment programs defined by existing |
| OpenGL per-fragment computation extensions can be implemented using the |
| extension's programming model. |
| |
| The fragment program execution environment accesses textures via |
| arbitrarily computed texture coordinates. As such, there is no necessary |
| correspondence between the texture coordinates and texture maps previously |
| lumped into a single "texture unit". This extension separates the notion |
| of "texture coordinate sets" and "texture image units" (texture maps and |
| associated parameters), allowing implementations with a different number |
| of each. The initial implementation of this extension will support 8 |
| texture coordinate sets and 16 texture image units. |
| |
| Issues |
| |
| What limitations exist in this extension? |
| |
| RESOLVED: Very few. Programs can not exceed a maximum program length |
| (which is no less than 1024 instructions), and can use no more than |
| 32-64 temporary registers. Programs can not access more than one |
| fragment attribute or program parameter (constant) per instruction, |
| but can work around this restriction using temporaries. The number of |
| textures that can be used by a program is limited to the number of |
| texture image units provided by the implementation (16 in the initial |
| implementation of this extension). |
| |
| These limits are fairly high. Additionally, there is no limit on the |
| total number of texture lookups that can be performed by a program. |
| There is no limit on the length of a texture dependency chain -- one |
| can write a program that performs over 1000 consecutive dependent |
| texture lookups. There are no restrictions on dependencies between |
| texture mapping instructions and arithmetic instructions. Texture |
| lookups can be performed using arbitrarily computed texture |
| coordinates. Applications can carry out their calculations with full |
| 32-bit single precision, although two lower-precision modes are also |
| available. |
| |
| How does texture mapping work with fragment programs? |
| |
| RESOLVED: This extension provides three instructions used to perform |
| texture lookups. |
| |
| The "TEX" instruction performs a lookup with the (s,t,r) values taken |
| from an interpolated texture coordinate, an arbitrarily computed |
| vector, or even a program constant. The "TXP" instruction performs a |
| similar lookup, except that it uses the fourth component of the source |
| vector to perform a perspective divide, using (s/q, t/q, r/q). In |
| both cases, the GL will automatically compute partial derivatives used |
| for filter and LOD selection. |
| |
| The "TXD" instruction operates like "TEX", except that it allows the |
| program to explicitly specify two additional vectors containing the |
| partial derivatives of the texture coordinate with respect to x and y |
| window coordinates. |
| |
| All three instructions write a filtered texel value to a temporary or |
| output register. Other than the computation of texture coordinates |
| and partial derivatives, texture lookups are not performed any differently |
| in fragment program mode. In particular, any applicable LOD biases, |
| wrap modes, minification and magnification filters, and anisotropic |
| filtering controls are still applied in fragment program mode. |
| |
| The results of the texture lookup are available to be used arbitrarily |
| by subsequent fragment program instructions. Fragment programs are |
| allowed to access any texture map arbitrarily many times. |
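| |
| As a rough illustration of the coordinate handling described above, the |
| following C sketch contrasts the coordinates used by "TEX" with those |
| used by "TXP". The TexCoord4 type and both function names are |
| hypothetical and exist only for this example; filtering, LOD selection, |
| and the q == 0 case are not modeled. |
| |
```c
/* Hypothetical 4-component source vector for this sketch. */
typedef struct { float s, t, r, q; } TexCoord4;

/* TEX: the lookup uses (s, t, r) as given; q plays no role. */
static TexCoord4 tex_coords(TexCoord4 v)
{
    TexCoord4 out = { v.s, v.t, v.r, 1.0f };
    return out;
}

/* TXP: the fourth component divides the other three first,
 * giving (s/q, t/q, r/q). Assumes q is non-zero. */
static TexCoord4 txp_coords(TexCoord4 v)
{
    TexCoord4 out = { v.s / v.q, v.t / v.q, v.r / v.q, 1.0f };
    return out;
}
```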
| |
| Can fragment programs be used to compute depth values? |
| |
| RESOLVED: Yes. A fragment program can perform arbitrary |
| computations to compute a final value for the fragment, which it |
| should write to the "z" component of the o[DEPR] register. The "z" |
| value written should be in the range [0,1], regardless of the size of |
| the depth buffer. |
| |
| To assist in the computation of the final Z value, a fragment program |
| can access the interpolated depth of the fragment (prior to any |
| displacement) by reading the "z" component of the f[WPOS] attribute |
| register. |
| |
| How should near and far plane clipping work in fragment program mode if |
| the current fragment program computes a depth value? |
| |
| RESOLVED: Geometric clipping to the near and far clip plane should be |
| disabled. Clipping should be done based on the depth values computed |
| per-fragment. The rationale is that per-fragment depth displacement |
| operations may effectively move portions of a primitive initially |
| outside the clip volume inside, and vice versa. |
| |
| Note that under the NV_depth_clamp extension, geometric clipping to |
| the near and far clip planes is also disabled, and the fragment depth |
| values are clamped to the depth range. If depth clamp mode is enabled |
| when using a fragment program that computes a depth value, the |
| computed depth value will be clamped to the depth range. |
| |
| Should fragment programs be allowed to use multiple precisions for |
| operands and operations? |
| |
| RESOLVED: Yes. Low-precision operands are generally adequate for |
| representing colors. Allowing low-precision registers also allows for |
| a larger number of temporary registers (at lower precision). |
| Low-precision operations also provide the opportunity for a higher |
| level of performance. |
| |
| Applications are free to use only high-precision operations or mix |
| high- and low-precision operations as necessary. |
| |
| What levels of precision are supported in arithmetic operations? |
| |
| RESOLVED: Arithmetic operations can be performed at three different |
| precisions. 32-bit floating point precision (fp32) uses the IEEE |
| single-precision standard with a sign bit, 8 exponent bits, and 23 |
| mantissa bits. 16-bit floating-point precision (fp16) uses a similar |
| floating-point representation, but with 5 exponent bits and 10 |
| mantissa bits. Additionally, many arithmetic operations can also be |
| carried out at 12-bit fixed point precision (fx12), where values in |
| the range [-2,+2) are represented as signed values with 10 fraction |
| bits. |
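| |
| The fx12 format can be sketched in C as a signed integer code with 10 |
| fraction bits, so that codes -2048..2047 cover the range [-2,+2). The |
| function names are hypothetical. Round-to-nearest and the mapping of NaN |
| to zero are assumptions of this sketch; the extension permits either |
| rounding or truncation. |
| |
```c
/* Encode a float as an fx12 code in [-2048, 2047] (value = code / 1024).
 * Out-of-range inputs clamp to the nearest representable value. */
static int fx12_encode(float x)
{
    float scaled;
    if (x != x)                 /* NaN: mapped to 0 (assumption) */
        return 0;
    scaled = x * 1024.0f;
    if (scaled >= 2047.0f)
        return 2047;
    if (scaled <= -2048.0f)
        return -2048;
    /* Round half away from zero using only an integer cast. */
    return (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Decode an fx12 code back to a float. */
static float fx12_decode(int code)
{
    return (float)code / 1024.0f;
}
```
| |
| Under these assumptions the largest representable value is 2047/1024, |
| just below +2, and the smallest is exactly -2. |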
| |
| How should the precision with which operations are carried out be |
| specified? Should we infer the precision from the types of the operands |
| or result vectors? Or should it be an attribute of the instruction? |
| |
| RESOLVED: Applications can optionally specify the precision of |
| individual instructions by adding a suffix of "R", "H", or "X" to |
| instruction names to select fp32, fp16, and fx12 precision, |
| respectively. |
| |
| By default, instructions will be carried out using the precision of |
| the destination register. Always inferring the precision from the |
| operands has a number of issues. First, there are a number of |
| operations (e.g., TEX/TXP/TXD) where result type has little to no |
| correspondence to the type of the operands. In these cases, precision |
| suffixes are not supported. Second, one could have instructions |
| automatically cast operands and compute results using the type of the |
| highest precision operand or result. This behavior would be |
| problematic since all fragment attribute registers and program |
| parameters are kept at full precision, but full precision may not be |
| needed by the operation. |
| |
| The choice of precision level allows programs to trade off precision |
| for potentially higher performance. Giving the program explicit |
| control over the precision also allows it to dictate precision |
| explicitly and eliminate any uncertainty over type casting. |
| |
| For instructions whose specified precision is different than the precision |
| of the operands or the result registers, how are the operations performed? |
| How are the condition codes updated? |
| |
| RESOLVED: Operations are performed with operands and results at the |
| precision specified by the instruction. After the operation is |
| complete, the result is converted to the precision of the destination |
| register, after which the condition code is generated. |
| |
| In an alternate approach, the condition code could be generated from |
| the result. However, in some cases, the register contents would not |
| match the condition code. In such cases, it may not be reliable to |
| use the condition code to prevent division by zero or other special |
| cases. |
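| |
| The ordering above (convert first, then generate the condition code) can |
| be sketched in C. The example multiplies at fp32 and writes to an fx12 |
| destination. All names here are hypothetical, the fx12 conversion assumes |
| round-to-nearest, and the unordered (NaN) case is left out. |
| |
```c
typedef enum { CC_LT, CC_EQ, CC_GT } CondCode;

/* Convert to fx12 (10 fraction bits, range [-2,+2)) and back to float. */
static float to_fx12(float x)
{
    float scaled = x * 1024.0f;
    int code;
    if (scaled >= 2047.0f)
        code = 2047;
    else if (scaled <= -2048.0f)
        code = -2048;
    else
        code = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    return (float)code / 1024.0f;
}

/* Condition code derived from the sign of a value. */
static CondCode cc_from(float v)
{
    return v < 0.0f ? CC_LT : (v > 0.0f ? CC_GT : CC_EQ);
}

/* Multiply at fp32, store to an fx12 register, then set the condition
 * code from the stored (converted) value, not the fp32 intermediate. */
static float mul_to_fx12(float a, float b, CondCode *cc)
{
    float full = a * b;
    float stored = to_fx12(full);
    *cc = cc_from(stored);
    return stored;
}
```
| |
| With a = b = 0.01, the fp32 product is positive but the stored fx12 value |
| is zero, so the condition code reports EQ and matches the register. |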
| |
| How does this extension interact with the ARB_multisample extension? In |
| the ARB_multisample extension, each fragment has multiple depth values. |
| In this extension, a single interpolated depth value may be modified by a |
| fragment program. |
| |
| RESOLVED: The depth values for the extra samples are generated by |
| computing partials of the computed depth value and using these |
| partials to derive the depth values for each of the extra samples. |
| |
| How does this extension interact with polygon offset? Both extensions |
| modify fragment depth values. |
| |
| RESOLVED: As in the base OpenGL spec, the depth offset generated by |
| polygon offset is added during polygon rasterization. The depth value |
| provided to programs in f[WPOS].z already includes polygon offset, if |
| enabled. If the depth value is replaced by a fragment program, the |
| polygon offset value will NOT be recomputed and added back after |
| program execution. |
| |
| This is probably not desirable for fragment programs that modify depth |
| values since the partials used to generate the offset may not match |
| the partials of the computed depth value. Polygon offset for filled |
| polygons can be approximated in a fragment program using the depth |
| partials obtained by the DDX and DDY instructions. This will not work |
| properly for line- and point-mode polygons, since the partials used |
| for offset are computed over the polygon, while the partials resulting |
| from the DDX and DDY instructions are computed along the line (or are |
| zero for point-mode polygons). In addition, separate treatment of |
| points, line segments, and polygons is not possible in a fragment |
| program. |
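| |
| A minimal C sketch of the filled-polygon approximation mentioned above, |
| using the core OpenGL offset formula o = m * factor + r * units with m |
| approximated as max(|dz/dx|, |dz/dy|). The function names are |
| hypothetical; dzdx and dzdy stand in for the values a program would |
| obtain from DDX and DDY, and r (the smallest resolvable depth difference) |
| is implementation-dependent and must be supplied by the caller. |
| |
```c
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* Approximate polygon offset from per-fragment depth partials.
 * factor and units correspond to the PolygonOffset parameters. */
static float approx_polygon_offset(float dzdx, float dzdy,
                                   float factor, float units, float r)
{
    float ax = abs_f(dzdx);
    float ay = abs_f(dzdy);
    float m = ax > ay ? ax : ay;   /* max depth slope approximation */
    return m * factor + r * units;
}
```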
| |
| Should depth component replacement be a property of the fragment program |
| or a separate enable? |
| |
| RESOLVED: It should be a program property. Using the output register |
| notation simplifies matters: depth components are replaced if and |
| only if the DEPR register is written to. This alleviates the |
| application and driver burden of maintaining separate state. |
| |
| How does this extension affect the handling of q texture coordinates in |
| the OpenGL spec? |
| |
| RESOLVED: Fragment programs are allowed to access an associated q |
| texture coordinate, so this attribute must be produced by |
| rasterization. In unextended OpenGL 1.2, the q coordinate is |
| eliminated in the rasterization portions of the spec after dividing |
| each of s, t, and r by it. This extension updates the specification |
| to pass q coordinates through at least to conventional texture |
| mapping. When fragment program mode is disabled, q coordinates will |
| be eliminated there in an identical manner. This modification has the |
| added benefit of simplifying the equations used for attribute |
| interpolation. |
| |
| How should clip w coordinates be handled by this extension? |
| |
| RESOLVED: Fragment programs are allowed to access the reciprocal of |
| the clip w coordinate, so this attribute must be produced by |
| rasterization. The OpenGL 1.2 spec doesn't explicitly enumerate the |
| attributes associated with the fragment, but we add treatment of the w |
| clip coordinate in the appropriate locations. |
| |
| The reciprocal of the clip w coordinate in traditional graphics |
| hardware is produced by screen-space linear interpolation of the |
| reciprocals of the clip w coordinates of the vertices. However, this |
| spec says the clip w coordinate is produced by perspective-correct |
| interpolation of the (non-reciprocated) clip w vertex coordinates. |
| These two formulations turn out to be equivalent, and the latter is |
| more convenient since the core OpenGL spec already contains formulas |
| for perspective-correct interpolation of vertex attributes. |
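| |
| The equivalence can be checked numerically. The C sketch below uses two |
| vertices with screen-space weights a and b (a + b = 1) and the usual |
| perspective-correct form (a*fa/wa + b*fb/wb) / (a/wa + b/wb). Setting the |
| attribute f to w itself makes the numerator a + b = 1, so the result is |
| the reciprocal of the linearly interpolated 1/w. Function names are |
| hypothetical. |
| |
```c
/* Screen-space linear interpolation of the reciprocals of clip w. */
static float inv_w_linear(float a, float wa, float b, float wb)
{
    return a / wa + b / wb;
}

/* Perspective-correct interpolation of an attribute with values fa, fb
 * at two vertices whose clip w coordinates are wa, wb. */
static float perspective_interp(float a, float fa, float wa,
                                float b, float fb, float wb)
{
    return (a * fa / wa + b * fb / wb) / (a / wa + b / wb);
}
```
| |
| For example, with a = 0.25, b = 0.75, wa = 2, and wb = 4, interpolating |
| w perspective-correctly gives 3.2, whose reciprocal is 0.3125, the same |
| value obtained by interpolating 1/2 and 1/4 linearly. |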
| |
| What is produced by the TEX/TXP/TXD instructions if the requested texture |
| image is inconsistent? |
| |
| RESOLVED: The result vector is specified to be (0,0,0,0). This |
| behavior is consistent with the NV_texture_shader extension. Note |
| that, as in NV_texture_shader, these instructions ignore the standard |
| hierarchy of texture enables and programs can access textures that are |
| not specifically "enabled". |
| |
| Should a minimum precision be specified for certain fragment attribute |
| registers (in particular COL0, COL1) that may not be generated with full |
| fp32 precision? |
| |
| RESOLVED: No. It is expected that the precision of COL0/COL1 should |
| generally be at least as high as that of the frame buffer. |
| |
| Fragment color components (f[COL0] and f[COL1]) are generally |
| low-precision fixed-point values in the range [0,1]. Is it possible to |
| pass unclamped or high-precision color components to fragment programs? |
| |
| RESOLVED: Yes, although you can't exactly call them "colors". |
| High-precision per-vertex color values can be written into any unused |
| texture coordinate set, either via a MultiTexCoord call or using a |
| vertex program. These "texture coordinates" will be interpolated |
| during rasterization, and can be used arbitrarily by a fragment |
| program. |
| |
| In particular, there is no requirement that per-fragment attributes |
| called "texture coordinates" be used for texture mapping. |
| |
| Should this specification guarantee that temporary registers are |
| initialized to zero? |
| |
| RESOLVED: Yes. This will allow for the modular construction of |
| programs that accumulate results in registers. For example, |
| per-fragment lighting may use MAD instructions to accumulate color |
| contributions at each light. Without zero-initialization, the program |
| would require an explicit MOV instruction to load 0 or the use of the |
| MUL instruction for the first light. |
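| |
| The accumulation pattern can be modeled in scalar C as follows. The |
| helper names are hypothetical, and "acc" stands in for a temporary |
| register that the extension guarantees to be zero at program start. |
| |
```c
/* Scalar stand-in for the MAD instruction: a * b + c. */
static float mad(float a, float b, float c)
{
    return a * b + c;
}

/* Sum per-light contributions; every light, including the first,
 * uses the same MAD step because the accumulator starts at zero. */
static float accumulate_lights(const float intensity[],
                               const float weight[], int n)
{
    float acc = 0.0f;   /* models a zero-initialized temporary */
    int i;
    for (i = 0; i < n; i++)
        acc = mad(intensity[i], weight[i], acc);
    return acc;
}
```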
| |
| Should this specification support Unicode program strings? |
| |
| RESOLVED: Not necessary. |
| |
| Programs defined by NV_vertex_program begin with "!!VP1.0". Should |
| fragment programs have a similar identifier? |
| |
| RESOLVED: Yes, "!!FP1.0", identifying the first revision of this |
| fragment program language. |
| |
| Should per-fragment attributes have equivalent integer names in the |
| program language, as per-vertex attributes do in NV_vertex_program? |
| |
| RESOLVED: No. In NV_vertex_program, "generic" vertex attributes |
| could be specified directly by an application using only an attribute |
| number. Those numbers may have no necessary correlation with the |
| conventional attribute names, although conventional vertex attributes |
| are mapped to attribute numbers. However, conventional attributes are |
| the only outputs of vertex programs and of rasterization. Therefore, |
| there is no need for a similar input-by-number functionality for |
| fragment programs. |
| |
| Should we provide the ability to issue instructions that do not update |
| temporary or output registers? |
| |
| RESOLVED: Yes. Programs may issue instructions whose only purpose is |
| to update the condition code register, and requiring such instructions |
| to write to a temporary may require the use of an additional temporary |
| and/or defeat possible program optimizations. We accomplish this by |
| adding two write-only temporary pseudo-registers ("RC" and "HC") that |
| can be specified as destination registers. |
| |
| Do the packing and unpacking instructions in this extension make any |
| sense? |
| |
| RESOLVED: Yes. They are useful for packing and unpacking multiple |
| components in a single channel of a floating-point frame buffer. For |
| example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities |
| or 8 16-bit quantities, all of which could be used in later |
| rasterization passes. See the NV_float_buffer extension for more |
| information. |
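| |
| As an illustration of the idea, the C sketch below packs four [0,1] |
| values into one 32-bit pattern as unsigned bytes and unpacks them again; |
| four such patterns fill a 128-bit RGBA pixel with 16 8-bit quantities. |
| The function names, the byte order (first component in the low byte), and |
| the rounding rule are assumptions of this sketch; the instruction |
| definitions in this specification are authoritative. NaN inputs are not |
| handled. |
| |
```c
#include <stdint.h>

/* Clamp each component to [0,1], scale to 0..255, and pack the four
 * bytes into one 32-bit word (x in bits 0-7, w in bits 24-31). */
static uint32_t pack_4ub(float x, float y, float z, float w)
{
    float c[4];
    uint32_t bits = 0;
    int i;
    c[0] = x; c[1] = y; c[2] = z; c[3] = w;
    for (i = 0; i < 4; i++) {
        float v = c[i] < 0.0f ? 0.0f : (c[i] > 1.0f ? 1.0f : c[i]);
        uint32_t ub = (uint32_t)(v * 255.0f + 0.5f);
        bits |= ub << (8 * i);
    }
    return bits;
}

/* Extract byte i (0..3) and map it back to [0,1]. */
static float unpack_ub(uint32_t bits, int i)
{
    return (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
}
```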
| |
| Should we provide a method for specifying a fp16 depth component output |
| value? |
| |
| RESOLVED: No. There is no good reason for supporting half-precision |
| Z outputs. Even with 16-bit Z buffers, the 10-bit mantissa of the |
| half-precision float is rather limiting. There would effectively be |
| only 11 good bits in the back half of the Z buffer. |
| |
| Should RequestResidentProgramsNV (or a new equivalent function) take a |
| target? Dealing with working sets of different program types is a bit |
| messy. Should we document some limitation if we get programs of different |
| types? |
| |
| RESOLVED: In retrospect, it may have been a good idea to attach a |
| target to this command, but there isn't a good reason to mess with |
| something that already works for vertex programs. The driver is |
| responsible for ensuring consistent results when the program types |
| specified are mixed. |
| |
| What happens on data type conversions where the original value is not |
| exactly representable in the new data type, either due to overflow or |
| insufficient precision in the destination type? |
| |
| RESOLVED: In case of overflow, the original value is clamped to the |
| +/-INF (fp16 or fp32) or the nearest representable value (fx12). In |
| case of imprecision, the conversion either rounds or truncates to the |
| nearest representable value. |
| |
| Should this extension support IEEE-style denorms? For 32-bit IEEE |
| floating point, denorms are numbers smaller in absolute value than 2^-126. |
| For 16-bit floats used by this extension, denorms are numbers smaller in |
| absolute value than 2^-14. |
| |
| RESOLVED: For 32-bit data types, hardware support for denorms was |
| considered too expensive relative to the benefit provided. |
| Computational results that would otherwise produce denorms are flushed |
| to zero. For 16-bit data types, hardware denorm support will be |
| present. The expense of hardware denorm support is lower and the |
| potential precision benefit is greater for 16-bit data types. |
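| |
| The fp32 flush-to-zero behavior can be modeled on a CPU as shown below. |
| This is only a sketch: the function name is hypothetical, and preserving |
| the sign of the flushed zero is an assumption. FLT_MIN is the smallest |
| normalized float, 2^-126. |
| |
```c
#include <float.h>

/* Return x unchanged unless it is a non-zero value smaller in
 * magnitude than 2^-126 (a denorm), in which case return zero. */
static float flush_denorm_fp32(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}
```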
| |
| OpenGL provides a hierarchy of texture enables. The texture lookup |
| operations in NV_texture_shader effectively override the texture enable |
| hierarchy and select a specific texture to enable. What should be done by |
| this extension? |
| |
| RESOLVED: This extension will build upon NV_texture_shader and reduce |
| the driver overhead of validating the texture enables. Texture |
| lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2, |
| 3D", which would indicate to use texture coordinate set number 2 to do |
| a lookup in the texture object bound to the TEXTURE_3D target in |
| texture image unit 2. |
| |
| Each texture unit can have only one "active" target. Programs are not |
| allowed to reference different texture targets in the same texture |
| image unit. In the example above, any other texture instructions |
| using texture image unit 2 must specify the 3D texture target. |
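| |
| The one-target-per-image-unit rule could be checked while parsing a |
| program roughly as follows. This C sketch is not the driver's actual |
| mechanism; the type and function names are hypothetical, and the unit |
| count of 16 is taken from the initial implementation described above. |
| |
```c
#define NUM_IMAGE_UNITS 16

typedef enum {
    TARGET_NONE = 0, TARGET_1D, TARGET_2D, TARGET_3D,
    TARGET_CUBE, TARGET_RECT
} TexTarget;

/* Target already referenced by the program for each image unit. */
typedef struct { TexTarget active[NUM_IMAGE_UNITS]; } TargetTable;

/* Record a texture instruction's (unit, target) pair. Returns 1 if the
 * reference is acceptable, 0 if the unit is out of range or the program
 * already uses a different target on that unit. */
static int use_texture(TargetTable *t, int unit, TexTarget target)
{
    if (unit < 0 || unit >= NUM_IMAGE_UNITS)
        return 0;
    if (t->active[unit] == TARGET_NONE) {
        t->active[unit] = target;
        return 1;
    }
    return t->active[unit] == target;
}
```
| |
| Starting from a zeroed table, a 3D reference on unit 2 succeeds, a second |
| 3D reference on unit 2 also succeeds, and a later 2D reference on unit 2 |
| is rejected. |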
| |
| What is the interaction with NV_register_combiners? |
| |
| RESOLVED: Register combiners are not available when fragment programs |
| are enabled. |
| |
| Previous versions of this specification supported the notion of |
| combiner programs, where the result of fragment program execution was |
| a set of four "texture lookup" values that fed the register combiners. |
| |
| For convenience, should we include pseudo-instructions not present in the |
| hardware instruction set that are trivially implementable? For example, |
| absolute value and subtract instructions could fall in this category. An |
| "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB |
| R2,R0,R1" would be equivalent to "ADD R2,R0,-R1" |
| |
| RESOLVED: In general, yes. A SUB instruction is provided for |
| convenience. This extension does not provide a separate ABS |
| instruction because it supports absolute value operations of each |
| operand. |
| |
| Should there be a '+' in the <optionalSign> portion of the grammar? There |
| isn't one in the GL_NV_vertex_program spec. |
| |
| RESOLVED: Yes, for orthogonality/readability. A '+' obviously adds |
| no functionality. In NV_vertex_program, an <optionalSign> of "-" was |
| always a negation operator. However, in fragment programs, it can |
| also be used as a sign for a constant value. |
| |
| Can the same fragment attribute register, program parameter register, or |
| constants be used for multiple operands in the same instruction? If so, |
| can it be used with different swizzle patterns? |
| |
| RESOLVED: Yes and yes. |
| |
| This extension allows different limits for the number of texture |
| coordinate sets and the number of texture image units (i.e., texture maps |
| and associated data). The state in ActiveTextureARB affects both |
| coordinate sets (TexGen, matrix operations) and image units (TexParameter, |
| TexEnv). How should we deal with this? |
| |
| RESOLVED: Continue to use ActiveTextureARB and emit an |
| INVALID_OPERATION if the active texture refers to an unsupported |
| coordinate set/image unit. Other options included creating dummy |
| (unusable) state for unsupported coordinate sets/image units and |
| continue to use ActiveTextureARB normally, or creating separate state |
| and state-setting commands for coordinate sets and image units. |
| Separate state is the cleanest solution, but would add more calls and |
| potentially cause more programmer confusion. Dummy state would avoid |
| additional error checks, but the demands of dummy state could grow if |
| the number of texture image units and texture coordinate sets |
| increases. |
| |
| The current OpenGL spec is vague as to what state is affected by the |
| active texture selector and has no distinction between |
| coordinate-related and image-related state. The state tables could |
| use a good clean-up in this area. |
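| |
| The chosen behavior can be sketched in C as a range check that depends |
| on which kind of state a command touches. The names below are |
| hypothetical, and the limits of 8 coordinate sets and 16 image units are |
| those of the initial implementation; a real implementation would use its |
| own queried limits. |
| |
```c
#define MAX_COORD_SETS  8    /* e.g., TexGen, texture matrix state */
#define MAX_IMAGE_UNITS 16   /* e.g., TexParameter, bound textures */

typedef enum { STATE_COORD_SET, STATE_IMAGE_UNIT } TexStateKind;
typedef enum { RESULT_OK, RESULT_INVALID_OPERATION } Result;

/* Decide whether a command touching the given kind of state is legal
 * for the current active texture index. */
static Result check_active_texture(unsigned active, TexStateKind kind)
{
    unsigned limit = (kind == STATE_COORD_SET) ? MAX_COORD_SETS
                                               : MAX_IMAGE_UNITS;
    return active < limit ? RESULT_OK : RESULT_INVALID_OPERATION;
}
```
| |
| With active texture 10, a coordinate-set command such as TexGen would |
| fail the check while an image-unit command such as TexParameter would |
| pass. |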
| |
| The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2" |
| is R0*R1+(1-R0)*R2. There are conflicting precedents here. The |
| definition here matches the "lrp" instruction in the DirectX 8.0 pixel |
| shader language. However, an equivalent RenderMan lerp operation would |
| yield a result of (1-R0)*R1+R0*R2. Which ordering should be implemented? |
| |
| RESOLVED: NVIDIA hardware implements the former operand ordering, and |
| there is no good reason to specify a different ordering. To convert a |
| "LRP" using the latter ordering to NV_fragment_program, swap the third |
| and fourth arguments. |
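| |
| The two orderings, and the swap that converts between them, can be |
| written out in scalar C. The function names are hypothetical; lrp_nv |
| follows the definition used by this extension and lerp_alt follows the |
| alternative ordering discussed above. |
| |
```c
/* Ordering used by this extension: f*a + (1-f)*b. */
static float lrp_nv(float f, float a, float b)
{
    return f * a + (1.0f - f) * b;
}

/* Alternative ordering: (1-f)*a + f*b. */
static float lerp_alt(float f, float a, float b)
{
    return (1.0f - f) * a + f * b;
}
```
| |
| For any inputs, lerp_alt(f, a, b) equals lrp_nv(f, b, a); for example, |
| with f = 0.25, a = 8, and b = 4, both yield 7. |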
| |
| Should this extension provide tracking of matrices or any other state, |
| similar to that provided in NV_vertex_program? |
| |
| RESOLVED: No. |
| |
| Should this extension provide global program parameters -- values shared |
| between multiple fragment programs? |
| |
| RESOLVED: No. |
| |
| Should this extension provide program parameters specific to a program? |
| If so, how? |
| |
| RESOLVED: Yes. These parameters will be called "local parameters". |
| This extension will provide both named and numbered local parameters. |
| Local parameters can be managed by the driver and eliminate the need |
| for applications to manage a global name space. |
| |
| Named local parameters work much like standard variable names in most |
| programming languages. They are created using the "DECLARE" |
| instruction within the fragment program itself. For example: |
| |
| DECLARE color = {1,0,0,1}; |
| |
| Named local parameters are used simply by referencing the variable |
| name. They do not require the array syntax like the global parameters |
| in the NV_vertex_program extension. They can be updated using the |
| commands ProgramNamedParameter4[f,fv]NV. |
| |
| Numbered local parameters are not declared. They are used by simply |
| referencing an element of an array called "p". For example, |
| |
| MOV R0, p[12]; |
| |
| loads the value of numbered local parameter 12 into register R0. |
| Numbered local parameters can be updated using the commands |
| ProgramLocalParameter4[d,dv,f,fv]ARB. |
| |
| The numbered local parameter APIs were added to this extension late in |
| its development, and are provided for compatibility with the |
| ARB_vertex_program extension, and what will likely be supported in |
| ARB_fragment_program as well. Providing this mechanism allows |
| programs to use the same mechanisms to set local parameters in both |
| extensions. |
| |
| Why are the APIs for setting named and numbered local parameters |
| different? |
| |
| RESOLVED: The named parameter API was created prior to |
| ARB_vertex_program (and the possible future ARB_fragment_program) and |
| uses conventions borrowed from NV_vertex_program. A slightly |
| different API was chosen during the ARB standardization process; see |
| the ARB_vertex_program specification for more details. |
| |
| The named parameter API takes a program ID and a parameter name, and |
| sets the parameter for the program with the specified ID. The |
| specified program does not need to be bound (via BindProgramNV) in |
| order to modify the values of its named parameters. The numbered |
| parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a |
| parameter number and modifies the corresponding numbered parameter of |
| the currently bound program. |
| |
| What should be the initial value of uninitialized local parameters? |
| |
| RESOLVED: (0,0,0,0). This choice is somewhat arbitrary, but matches |
| previous extensions (e.g., NV_vertex_program). |
| |
| Should this extension support program parameter arrays? |
| |
| RESOLVED: No hardware support is present. Note that from the point |
| of view of a fragment program, a texture map can be used as a 1-, 2-, |
| or 3-dimensional array of constants. |
| |
| Should this extension support constants in fragment programs? If |
| so, how? |
| |
| RESOLVED: Yes. Scalar or vector constants can be defined inline |
| (e.g., "1.0" or "{1,2,3,4}"). In addition, named constants are |
| supported using the "DEFINE" instruction, which allows programmers to |
| change the values of constants used in multiple instructions simply by |
| changing the value assigned to the named constant. |
| |
| Note that because this extension uses program strings, the |
| floating-point value of any constants generated on the fly must be |
| printed to the program string. An alternate method that avoids the |
| need to print constants is to declare a named local program parameter |
| and initialize it with the ProgramNamedParameter4[f,fv]() calls. |
| |
| Should named constants be allowed to be redefined? |
| |
| RESOLVED: No. If you want to redefine the values of constants, you |
| can create an equivalent named program parameter by changing the |
| "DEFINE" keyword to "DECLARE". |
| |
| Should functions used to update or query named local parameters take a |
| zero-terminated string (as with most strings in the C programming |
| language), or should they require an explicit string length? If the |
| former, should we create a version of LoadProgramNV that does not require |
| a string length? |
| |
| RESOLVED: Stick with explicit string length. Strings that are |
| defined as constants can have the length computed at compile-time. |
| Strings read from files will have the length known in advance. |
| Programs that build strings at run-time also likely keep the length |
| up-to-date. Passing an explicit length saves time, since the driver |
| doesn't have to do a strlen(). |
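| |
| For strings defined as constants, the length can come from sizeof, as in |
| this C sketch. The program text is purely illustrative and the |
| identifier names are hypothetical; the resulting lengths are the values |
| an application would pass as the length arguments of LoadProgramNV and |
| ProgramNamedParameter4fNV. |
| |
```c
#include <stddef.h>

/* Illustrative program text; sizeof counts the trailing NUL, so the
 * string length is sizeof - 1, computed at compile time. */
static const char fragment_program[] =
    "!!FP1.0\n"
    "DECLARE color = {1,0,0,1};\n"
    "MOV o[COLR], color;\n"
    "END\n";
enum { FRAGMENT_PROGRAM_LEN = sizeof(fragment_program) - 1 };

/* Name of the named local parameter declared in the program above. */
static const char param_name[] = "color";
enum { PARAM_NAME_LEN = sizeof(param_name) - 1 };
```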
| |
| What is the deal with the alpha of the secondary color? |
| |
| RESOLVED: In unextended OpenGL 1.2, the alpha component of the |
| secondary color is forced to 0.0. In the EXT_secondary_color |
| extension, the alpha of the per-vertex secondary colors is defined to |
| be 0.0. NV_vertex_program allows vertex programs to produce a |
| per-vertex alpha component, but it is forced to zero for the purposes |
| of the color sum. In the NV_register_combiners extension, the alpha |
| component of the secondary color is undefined. What a mess. |
| |
| In this extension, the alpha of the secondary color is well-defined |
| and can be used normally. When in vertex program mode |
| |
| Why are fragment program instructions involving f[FOGC] or f[TEX0] through |
| f[TEX7] automatically carried out at full precision? |
| |
| RESOLVED: This is an artifact of the way these interpolants are |
| generated by the NVIDIA graphics hardware. If such instructions |
| absolutely must be carried out at lower precision, the requirement can |
| be met by first loading the interpolants into a temporary register. |
| |
| With a different number of texture coordinate sets and texture image |
| units, how many copies of each kind of texture state are there? |
| |
| RESOLVED: The intention is that texture state be broken into three |
| groups. (1) There are MAX_TEXTURE_COORDS_NV copies of texture |
| coordinate set state, which includes current texture coordinates, |
| TexGen state, and texture matrices. (2) There are |
| MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which |
| includes texture maps, texture parameters, and LOD bias parameters. (3) |
| There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit |
| state (e.g., texture enables, TexEnv blending state), all of which are |
| unused when in fragment program mode. |
| |
| It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum |
| of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV -- |
| implementations may choose not to extend fixed-function OpenGL texture |
| mapping modes beyond a certain point. |
| |
| The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end |
| up with programs >64KB. This will overflow the limits of the GLX Render |
| protocol, resulting in the need to use RenderLarge path. This is an issue |
| with vertex programs, also. |
| |
| RESOLVED: Yes, it is. |
| |
| Should textures used by fragment programs be declared? For example, |
| "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all |
| accesses to texture unit 3. The dimension could be dropped from the TEX |
| family of instructions, and some of the compile-time error checking could |
| be dropped. |
| |
| RESOLVED: Maybe it should be, but for better or worse, it isn't. |
| |
| It is not all that uncommon to have negative q values with projective |
| texture mapping, but results are undefined if any q values are negative in |
| this specification. Why? |
| |
| RESOLVED: This restriction carries on a similar one in the initial |
| OpenGL specification. The motivation for this restriction is that |
| when interpolating, it is possible for a fragment to have an |
| interpolated q coordinate at or near 0.0. Since the texture |
| coordinates used for projective texture mapping are s/q, t/q, and r/q, |
| this will result in a divide-by-zero error or suffer from significant |
| numerical instability. Results will be inaccurate for such fragments. |
| |
| Other than the numerical stability issue above, NVIDIA hardware should |
| have no problems with negative q coordinates. |
| |
| Should programs that replace depth have their own special program type, |
| such as "!!FPD1.0" and "!!FPDC1.0"? |
| |
| RESOLVED: No. If a program has an instruction that writes to |
| o[DEPR], the final fragment depth value is taken from o[DEPR].z. |
| Otherwise, the fragment's original depth value is used. |
| |
| What fx12 value should NaN map to? |
| |
| RESOLVED: For the lack of any better choice, 0.0. |
| |
| How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for |
| arithmetic and comparison operations? |
| |
| RESOLVED: The special cases for all floating-point operations are |
| designed to match the IEEE specification for floating-point numbers as |
| closely as possible. The results produced by special cases should be |
| enumerated in the sections of this spec describing the operations. |
| There are some cases where the implemented fragment program behavior |
| does not match IEEE conventions, and these cases should be noted in |
| this specification. |
| |
| How can condition codes be used to mask out register writes? How about |
| killing fragments? What other things can you do? |
| |
| RESOLVED: The following example computes a component-wise |R1-R2|: |
| |
| SUBC R0, R1, R2; # "C" suffix means update condition code |
| MOV R0 (LT), -R0; # Conditional write mask in parentheses |
| |
| The first instruction computes a component-wise difference between R1 |
| and R2, storing R1-R2 in register R0. The "C" suffix in the |
| instruction means to update the condition code based on the sign of |
| the result vector components. The second instruction inverts the sign |
| of the components of R0. However, the "(LT)" portion says that the |
| destination register should be updated only if the corresponding |
| condition code component is LT (negative). This means that only those |
| components of R0 that are negative have their signs inverted, leaving |
| the component-wise absolute value of R1-R2 in R0. |
| |
| To kill a fragment if the red (x) component of a texture lookup |
| returns zero: |
| |
| TEXC R0, f[TEX0], TEX0, 2D; |
| KIL EQ.x; |
| |
| To kill based on the green (y) component, use "EQ.y" instead. To kill |
| if any of the four components is zero, use "EQ.xyzw" or just "EQ". |
| |
| Fragment programs do not support boolean expressions. These can |
| generally be achieved using conditional write masks. |
| |
| To evaluate the expression "(R0.x == 0) && (R1.x == 0)": |
| |
| MOVC RC.x, R0.x; |
| MOVC RC.x (EQ), R1.x; |
| |
| To evaluate the expression "(R0.x == 0) || (R1.x == 0)": |
| |
| MOVC RC.x, R0.x; |
| MOVC RC.x (NE), R1.x; |
| |
| In both cases, the x component of the condition code will contain "EQ" |
| if and only if the condition is TRUE. |
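| |
| The two sequences above can be modeled in C for a single component. |
| This is a sketch under stated assumptions, not a description of any |
| implementation: it assumes that a masked-out component leaves the |
| condition code unchanged (as the examples above imply) and that the |
| "NE" test passes for every value other than EQ. All function and type |
| names are hypothetical. |

```c
#include <math.h>

/* Condition code values from Section 3.11.1.4. */
typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;

/* Classify one result component as a "C"-suffixed instruction would. */
static CondCode cc_from_value(float v)
{
    if (isnan(v)) return CC_UN;
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;
}

/* MOVC RC.x, a;  MOVC RC.x (EQ), b;   models (a == 0) && (b == 0). */
static CondCode cc_and_zero(float a, float b)
{
    CondCode cc = cc_from_value(a);   /* unconditional update          */
    if (cc == CC_EQ)                  /* (EQ) mask passes: update      */
        cc = cc_from_value(b);
    return cc;                        /* assumed unchanged otherwise   */
}

/* MOVC RC.x, a;  MOVC RC.x (NE), b;   models (a == 0) || (b == 0). */
static CondCode cc_or_zero(float a, float b)
{
    CondCode cc = cc_from_value(a);   /* unconditional update          */
    if (cc != CC_EQ)                  /* (NE) mask passes: update      */
        cc = cc_from_value(b);
    return cc;
}
```

| In both functions the returned value is CC_EQ exactly when the modeled |
| boolean expression is true. |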
| |
| How can fragment programs be used to implement non-standard texture |
| filtering modes? |
| |
| RESOLVED: As one example, consider a case where you want to do linear |
| filtering in a 2D texture map, but only horizontally. To achieve |
| this, first set the texture filtering mode to NEAREST. For a 16 x n |
| texture, you might do something like: |
| |
| DEFINE halfTexel = { 0.03125, 0 }; # 1/32 (1/2 a texel) |
| ADD R2, f[TEX0], -halfTexel; # coords of left sample |
| ADD R1, f[TEX0], +halfTexel; # coords of right sample |
| TEX R0, R2, TEX0, 2D; # lookup left sample |
| TEX R1, R1, TEX0, 2D; # lookup right sample |
| MUL R2.x, R2.x, 16; # scale X coords to texels |
| FRC R2.x, R2.x; # get fraction, filter weight |
| LRP R0, R2.x, R1, R0; # blend samples based on weight |
| |
| There are plenty of other interesting things that can be done. |
| |
| Should this specification provide more examples? |
| |
| RESOLVED: Yes, it should. |
| |
| Is the OpenGL ARB working on a multi-vendor standard for fragment |
| programmability? Will there be an ARB_fragment_program extension? If so, |
| how will this extension interact with the ARB standard? |
| |
| RESOLVED: Yes, as of July 2002, there was a multi-vendor working |
| group and a draft specification. The ARB extension is expected to |
| have several features not present in this extension, such as state |
| tracking and global parameters (called "program environment |
| parameters"). It will also likely lack certain features found in this |
| extension. |
| |
| Why does the HEMI mapping apply to the third component of signed HILO |
| textures, but not to unsigned HILO textures? |
| |
| RESOLVED: This behavior matches the behavior of NV_texture_shader |
| (e.g., the DOT_PRODUCT_NV mode). The HEMI mapping will construct the |
| third component of a unit vector whose first two components are |
| encoded in the HILO texture. |
| |
| |
| New Procedures and Functions |
| |
| void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name, |
| float x, float y, float z, float w); |
| void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name, |
| double x, double y, double z, double w); |
| void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name, |
| const float v[]); |
| void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name, |
| const double v[]); |
| void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name, |
| float *params); |
| void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name, |
| double *params); |
| |
| void ProgramLocalParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramLocalParameter4dvARB(enum target, uint index, |
| const double *params); |
| void ProgramLocalParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramLocalParameter4fvARB(enum target, uint index, |
| const float *params); |
| void GetProgramLocalParameterdvARB(enum target, uint index, |
| double *params); |
| void GetProgramLocalParameterfvARB(enum target, uint index, |
| float *params); |
| |
| |
| New Tokens |
| |
| Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the |
| <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev, |
| and by the <target> parameter of BindProgramNV, LoadProgramNV, |
| ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB, |
| ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB, |
| GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB: |
| |
| FRAGMENT_PROGRAM_NV 0x8870 |
| |
| Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, |
| and GetDoublev: |
| |
| MAX_TEXTURE_COORDS_NV 0x8871 |
| MAX_TEXTURE_IMAGE_UNITS_NV 0x8872 |
| FRAGMENT_PROGRAM_BINDING_NV 0x8873 |
| MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV 0x8868 |
| |
| Accepted by the <name> parameter of GetString: |
| |
| PROGRAM_ERROR_STRING_NV 0x8874 |
| |
| |
| Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation) |
| |
| Modify Section 2.11, Clipping (p.39) |
| |
| (replace the first paragraph of the section, p. 39) Primitives are clipped |
| to the clip volume. In clip coordinates, the view volume is defined by |
| |
| -w_c <= x_c <= w_c, |
| -w_c <= y_c <= w_c, and |
| -w_c <= z_c <= w_c. |
| |
| Clipping to the near and far clip planes is ignored if fragment program |
| mode (section 3.11) or texture shaders (see NV_texture_shader |
| specification) are enabled and the current fragment program or texture |
| shader computes per-fragment depth values. In this case, the view volume |
| is defined by: |
| |
| -w_c <= x_c <= w_c and |
| -w_c <= y_c <= w_c. |
| |
| |
| Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization) |
| |
| Modify Chapter 3 introduction (p. 57) |
| |
| (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization |
| process. The color value assigned to a fragment is initially determined |
| by the rasterization operations (Sections 3.3 through 3.7) and modified by |
| either the execution of the texturing, color sum, and fog operations as |
| defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined |
| in Section 3.11. The final depth value is initially determined by the |
| rasterization operations and may be modified by a fragment program. |
| |
| note: Antialiasing Application is renumbered from Section 3.11 to Section |
| 3.12. |
| |
| Modify Figure 3.1 (p.58) |
| |
| Primitive Assembly |
| | |
| +-----------+-----------+-----------+-----------+ |
| | | | | | |
| | | | Pixel | |
| Point Line Polygon Rectangle Bitmap |
| Raster- Raster- Raster- Raster- Raster- |
| ization ization ization ization ization |
| | | | | | |
| +-----------+-----------+-----------+-----------+ |
| | |
| | |
| +-----------------+-----------------+ |
| | | | |
| Conventional Texture Fragment |
| Texture Fetch Shaders Programs |
| | | | |
| | +--------------+ | |
| | | | |
| TEXTURE_ o o | |
| SHADER_NV | |
| enable o | |
| | | |
| +-------------+ | |
| | | | |
| Conventional Register | |
| TexEnv Combiners | |
| | | | |
| Color Sum | | |
| | | | |
| Fog | | |
| | | | |
| | +----------+ | |
| | | | |
| REGISTER_ o o | |
| COMBINERS_ | |
| NV enable o | |
| | | |
| +-----------------+ +--------------+ |
| | | |
| FRAGMENT_ o o |
| PROGRAM_ |
| NV enable o |
| | |
| | |
| Coverage |
| Application |
| | |
| v |
| to fragment processing |
| |
| |
| Modify Section 3.3, Points (p.61) |
| |
| All fragments produced in rasterizing a non-antialiased point are assigned |
| the same associated data, which are those of the vertex corresponding to |
| the point. (delete reference to divide by q). |
| |
| If antialiasing is enabled, then ... The data associated with each |
| fragment are otherwise the data associated with the point being |
| rasterized. (delete reference to divide by q) |
| |
| Modify Section 3.4.1, Basic Line Segment Rasterization (p.66) |
| |
| (Note that t=0 at p_a and t=1 at p_b). The value of an associated datum f |
| from the fragment, whether it be R, G, B, or A (in RGBA mode) or a color |
| index (in color index mode), the s, t, r, or q texture coordinate, or the |
| clip w coordinate (the depth value, window z, must be found using equation |
| 3.3, below), is found as |
| |
| f = (1-t) * f_a / w_a + t * f_b / w_b (3.2) |
| --------------------------------- |
| (1-t) / w_a + t / w_b |
| |
| where f_a and f_b are the data associated with the starting and ending |
| endpoints of the segment, respectively; w_a and w_b are the clip |
| w coordinates of the starting and ending endpoints of the segment, |
| respectively. Note that linear interpolation would use |
| |
| f = (1-t) * f_a + t * f_b. (3.3) |
| |
| ... A GL implementation may choose to approximate equation 3.2 with 3.3, |
| but this will normally lead to unacceptable distortion effects when |
| interpolating texture coordinates or clip w coordinates. |
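| |
| For illustration, equations 3.2 and 3.3 can be written as two small C |
| functions. This is a sketch with hypothetical function names; it |
| assumes nonzero clip w values and is not meant to describe how any |
| implementation performs interpolation. |

```c
/* Equation 3.2: perspective-correct interpolation along a segment.
 * w_a and w_b are the clip w coordinates of the two endpoints and are
 * assumed to be nonzero. */
static double interp_perspective(double t, double f_a, double w_a,
                                 double f_b, double w_b)
{
    double num = (1.0 - t) * f_a / w_a + t * f_b / w_b;
    double den = (1.0 - t) / w_a + t / w_b;
    return num / den;
}

/* Equation 3.3: plain linear interpolation. */
static double interp_linear(double t, double f_a, double f_b)
{
    return (1.0 - t) * f_a + t * f_b;
}
```

| When w_a equals w_b the two functions agree. When the clip w values |
| differ, the perspective-correct result is pulled toward the endpoint |
| with the smaller w. |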
| |
| Modify Section 3.5.1, Basic Polygon Rasterization (p.71) |
| |
| Denote a datum at p_a, p_b, or p_c ... is given by |
| |
| f = a * f_a / w_a + b * f_b / w_b + c * f_c / w_c (3.4) |
| --------------------------------------------- |
| a / w_a + b / w_b + c / w_c |
| |
| where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c, |
| respectively. a, b, and c are the barycentric coordinates of the fragment |
| for which the data are produced. a, b, and c must correspond precisely to |
| the exact coordinates ... at the fragment's center. |
| |
| Just as with line segment rasterization, equation 3.4 may be approximated |
| by |
| |
| f = a * f_a + b * f_b + c * f_c; (3.5) |
| |
| this may yield ... for texture coordinates or clip w coordinates. |
| |
| Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100) |
| |
| A fragment arising from a group ... are given by those associated with the |
| current raster position. (delete reference to divide by q) |
| |
| Modify Section 3.7, Bitmaps (p.111) |
| |
| Otherwise, a rectangular array ... The associated data for each fragment |
| are those associated with the current raster position. (delete reference |
| to divide by q) Once the fragments have been produced ... |
| |
| Modify Section 3.8, Texturing (p.112) |
| |
| ... an image at the location indicated by a fragment's texture coordinates |
| to modify the fragment's primary RGBA color. Texturing does not affect the |
| secondary color. |
| |
| Texturing is specified only for RGBA mode; its use in color index mode is |
| undefined. |
| |
| Except when in fragment program mode (Section 3.11), the (s,t,r) texture |
| coordinates used for texturing are the values s/q, t/q, and r/q, |
| respectively, where s, t, r, and q are the texture coordinates associated |
| with the fragment. When in fragment program mode, the (s,t,r) texture |
| coordinates are specified by the program. If q is less than or equal to |
| zero, the results of texturing are undefined. |
| |
| Add new Section 3.11, Fragment Programs (p.140) |
| |
| Fragment program mode is enabled and disabled with the Enable and Disable |
| commands using the symbolic constant FRAGMENT_PROGRAM_NV. When fragment |
| program mode is enabled, standard and extended texturing, color sum, and |
| fog application stages are ignored and a general purpose program is |
| executed instead. |
| |
| A fragment program is a sequence of instructions that execute on a |
| per-fragment basis. In fragment program mode, the currently bound |
| fragment program is executed as each fragment is generated by the |
| rasterization operations. Fragment programs execute a finite fixed |
| sequence of instructions with no branching or looping, and operate |
| independently from the processing of other fragments. Fragment programs |
| are used to compute new color values to be associated with each fragment, |
| and can optionally compute a new depth value for each fragment as well. |
| |
| Fragment program mode is not available in color index mode and is |
| considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV. When |
| fragment program mode is enabled, texture shaders and register combiners |
| (NV_texture_shader and NV_register_combiners extension) are disabled, |
| regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV. |
| |
| Section 3.11.1, Fragment Program Registers |
| |
| Fragment programs operate on a set of program registers. Each program |
| register is a 4-component vector, whose components are referred to as "x", |
| "y", "z", and "w" respectively. The components of a fragment register are |
| always referred to in this manner, regardless of the meaning of their |
| contents. |
| |
| The four components of each fragment program register have one of two |
| different representations: 32-bit floating-point (fp32) or 16-bit |
| floating-point (fp16). More details on these representations can be found |
| in Section 3.11.4.1. |
| |
| There are several different classes of program registers. Attribute |
| registers (Table X.1) correspond to the fragment's associated data |
| produced by rasterization. Temporary registers (Table X.2) hold |
| intermediate results generated by the fragment program. Output registers |
| (Table X.3) hold the final results of a fragment program. The single |
| condition code register is used to mask writes to other registers or to |
| determine if a fragment should be discarded. |
| |
| |
| Section 3.11.1.1, Fragment Program Attribute Registers |
| |
| The fragment program attribute registers (Table X.1) hold the location of |
| the fragment and the data associated with the fragment produced by |
| rasterization. |
| |
| Fragment Attribute Component |
| Register Name Description Interpretation |
| -------------- ----------------------------------- -------------- |
| f[WPOS] Position of the fragment center. (x,y,z,1/w) |
| f[COL0] Interpolated primary color (r,g,b,a) |
| f[COL1] Interpolated secondary color (r,g,b,a) |
| f[FOGC] Interpolated fog distance/coord (z,0,0,0) |
| f[TEX0] Texture coordinate (unit 0) (s,t,r,q) |
| f[TEX1] Texture coordinate (unit 1) (s,t,r,q) |
| f[TEX2] Texture coordinate (unit 2) (s,t,r,q) |
| f[TEX3] Texture coordinate (unit 3) (s,t,r,q) |
| f[TEX4] Texture coordinate (unit 4) (s,t,r,q) |
| f[TEX5] Texture coordinate (unit 5) (s,t,r,q) |
| f[TEX6] Texture coordinate (unit 6) (s,t,r,q) |
| f[TEX7] Texture coordinate (unit 7) (s,t,r,q) |
| |
| Table X.1: Fragment Attribute Registers. The component interpretation |
| column describes the mapping of attribute values to register components. |
| For example, the "x" component of f[COL0] holds the red color component, |
| and the "x" component of f[TEX0] holds the "s" texture coordinate for |
| texture unit 0. The entries "0" and "1" indicate that the attribute |
| register components hold the constants 0 and 1, respectively. |
| |
| f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment |
| center, relative to the lower left corner of the window. f[WPOS].z |
| holds the associated z window coordinate, normally in the range [0,1]. |
| f[WPOS].w holds the reciprocal of the associated clip w coordinate. |
| |
| f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors |
| of the fragment, respectively. |
| |
| f[FOGC] holds the associated eye distance or fog coordinate normally used |
| for fog computations. |
| |
| f[TEX0] through f[TEX7] hold the associated texture coordinates for |
| texture coordinate sets 0 through 7, respectively. |
| |
| All attribute register components are treated as 32-bit floats. However, |
| the components of primary and secondary colors (f[COL0] and f[COL1]) may |
| be generated with reduced precision. |
| |
| The contents of the fragment attribute registers may not be modified by a |
| fragment program. In addition, each fragment program instruction can use |
| at most one unique attribute register. |
| |
| |
| Section 3.11.1.2, Fragment Program Temporary Registers |
| |
| The fragment temporary registers (Table X.2) hold intermediate values used |
| during the execution of a fragment program. There are 96 temporary |
| register names, but not all can be used simultaneously. |
| |
| Fragment Temporary |
| Register Name Description |
| ------------------ ----------------------------------------------------- |
| R0-R31 Four 32-bit (fp32) floating point values (s.e8.m23) |
| H0-H63 Four 16-bit (fp16) floating point values (s.e5.m10) |
| |
| Table X.2: Fragment Temporary Registers. |
| |
| In addition to the normal temporary registers, there are two temporary |
| pseudo-registers, "RC" and "HC". RC and HC are treated as unnumbered, |
| write-only temporary registers. The components of RC have a fp32 data |
| type; the components of HC have a fp16 data type. The sole purpose of |
| these registers is to permit instructions to modify the condition code |
| register (section 3.11.1.4) without overwriting the values in any |
| temporary register. |
| |
| Fragment program instructions can read and write temporary registers. |
| There is no restriction on the number of temporary registers that can be |
| accessed by any given instruction. |
| |
| All temporary registers are initialized to (0,0,0,0) each time a fragment |
| program executes. |
| |
| |
| Section 3.11.1.3, Fragment Program Output Registers |
| |
| The fragment program output registers hold the final results of the |
| fragment program. The possible final results of a fragment program are a |
| high- or low-precision RGBA fragment color, and a fragment depth value. |
| |
| Output |
| Register Name Description |
| ------------- ------------------------------------------------------- |
| o[COLR] Final RGBA fragment color, fp32 format |
| o[COLH] Final RGBA fragment color, fp16 format |
| o[DEPR] Final fragment depth value, fp32 format |
| |
| Table X.3: Fragment Program Output Registers. |
| |
| o[COLR] and o[COLH] specify the color of a fragment. These two registers |
| are identical, except for the associated data type of the components. The |
| R, G, B, and A components of the fragment color are taken from the x, y, |
| z, and w components, respectively, of o[COLR] or o[COLH]. A fragment |
| program will fail to load if it writes to both o[COLR] and o[COLH]. |
| |
| o[DEPR] can be used to replace the associated depth value of a fragment. |
| The new depth value is taken from the z component of o[DEPR]. If a |
| fragment program does not write to o[DEPR], the associated depth value is |
| unmodified. |
| |
| A fragment program will fail to load if it does not write to at least one |
| output register. |
| |
| The fragment program output registers may not be read by a fragment |
| program, but may be written to multiple times. |
| |
| The values of all fragment program output registers are initially |
| undefined. |
| |
| |
| Section 3.11.1.4, Fragment Program Condition Code Register |
| |
| The condition code register (CC) is a single four-component vector. Each |
| component of this register is one of four enumerated values: GT (greater |
| than), EQ (equal), LT (less than), or UN (unordered). The condition code |
| register can be used to mask writes to fragment data register components |
| or to terminate processing of a fragment altogether (via the KIL |
| instruction). |
| |
| Most fragment program instructions can optionally update the condition |
| code register. When a fragment program instruction updates the condition |
| code register, a condition code component is set to LT if the |
| corresponding component of the result vector is less than zero, EQ if it |
| is equal to zero, GT if it is greater than zero, and UN if it is NaN (not |
| a number). |
| |
| The condition code register is initialized to a vector of EQ values each |
| time a fragment program executes. |
| |
| |
| Section 3.11.2, Fragment Program Parameters |
| |
| In addition to using the registers defined in Section 3.11.1, fragment |
| programs may also use fragment program parameters in their computation. |
| Fragment program parameters are constant during the execution of fragment |
| programs, but some parameters may be modified outside the execution of a |
| fragment program. |
| |
| There are five different types of program parameters: embedded scalar |
| constants, embedded vector constants, named constants, named local |
| parameters, and numbered local parameters. |
| |
| Embedded scalar constants are written as standard floating-point numbers |
| with an optional sign designator ("+" or "-") and optional scientific |
| notation (e.g., "E+06", meaning "times 10^6"). |
| |
| Embedded vector constants are written as a comma-separated array of one to |
| four scalar constants, surrounded by braces (like a C/C++ array |
| initializer). Vector constants are always treated as 4-component vectors: |
| constants with fewer than four components are expanded to four components by |
| filling missing y and z components with 0.0 and missing w components with |
| 1.0. Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}", |
| "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to |
| "{5,6,7,1}". |
| |
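| The expansion rule for embedded vector constants can be sketched in C |
| as follows. The function name "expand_vector_constant" is hypothetical |
| and the code is illustrative only. |

```c
/* Expand an embedded vector constant of n (1..4) components to four
 * components: missing y and z become 0.0, missing w becomes 1.0. */
static void expand_vector_constant(const float *in, int n, float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    int i;

    for (i = 0; i < 4; ++i)
        out[i] = (i < n) ? in[i] : defaults[i];
}
```

| |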
| Named constants allow fragment program instructions to define scalar or |
| vector constants that can be referenced by name. Named constants are |
| created using the DEFINE instruction: |
| |
| DEFINE pi = 3.1415926535; |
| DEFINE color = {0.2, 0.5, 0.8, 1.0}; |
| |
| The DEFINE instruction associates a constant name with a scalar or vector |
| constant value. Subsequent fragment program instructions that use the |
| constant name are equivalent to those using the corresponding constant |
| value. |
| |
| Named local parameters are similar to named vector constants, but their |
| values can be modified after the program is loaded. Local parameters are |
| created using the DECLARE instruction: |
| |
| DECLARE fog_color1; |
| DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1}; |
| |
| The DECLARE instruction creates a 4-component vector associated with the |
| local parameter name. Subsequent fragment program instructions |
| referencing the local parameter name are processed as though the current |
| value of the local parameter vector were specified instead of the |
| parameter name. A DECLARE instruction can optionally specify an initial |
| value for the local parameter, which can be either a scalar or vector |
| constant. Scalar constants are expanded to 4-component vectors by |
| replicating the scalar value in each component. The initial value of |
| local parameters not initialized by the program is (0,0,0,0). |
| |
| A named local parameter for a specific program can be updated using the |
| calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section |
| 5.7). Named local parameters are accessible only by the program in which |
| they are defined. Modifying a local parameter affects only the |
| associated program and does not affect local parameters with the same name |
| that are found in any other fragment program. |
| |
| Numbered local parameters are similar to named local parameters, except |
| that they are referred to by number and are not declared in fragment |
| programs. Each fragment program object has an array of four-component |
| floating-point vectors that can be used by the program. The number of |
| vectors is given by the implementation-dependent constant |
| MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64. |
| Numbered local parameters are accessed by a fragment program as members |
| of an array called "p". For example, the instruction |
| |
| MOV R0, p[31]; |
| |
| copies the contents of numbered local parameter 31 into temporary register |
| R0. |
| |
| Constant and local parameter names can be arbitrary strings consisting of |
| letters (upper or lower-case), numbers, underscores ("_"), and dollar |
| signs ("$"). Keywords defined in the grammar (including instruction |
| names) cannot be used as constant names, nor can strings that start with |
| numbers, or strings that specify valid temporary register or texture |
| numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15"). A fragment |
| program will fail to load if a DEFINE or DECLARE instruction specifies an |
| invalid constant or local parameter name. |
| |
| A fragment program will fail to load if an instruction contains a named |
| parameter not specified in a previous DEFINE or DECLARE instruction. A |
| fragment program will also fail to load if a DEFINE or DECLARE instruction |
| attempts to re-define a named parameter specified in a previous DEFINE or |
| DECLARE instruction. |
| |
| The contents of the fragment program parameters may not be modified by a |
| fragment program. In addition, each fragment program instruction can |
| normally use at most one unique program parameter. The only exception to |
| this rule is if all program parameter references specify named or embedded |
| constants that taken together contain no more than four unique scalar |
| values. For such instructions, the GL will automatically generate an |
| equivalent instruction that references a single merged vector constant. |
| This merging allows programs to specify instructions like the following: |
| |
| Instruction Equivalent Instruction |
| --------------------- --------------------------------------- |
| MAD R0, R1, 2, -1; MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y; |
| ADD R0, {1,2,3,4}, 4; ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w; |
| |
| Before counting the number of unique values, any named constants are first |
| converted to the equivalent embedded constants. When generating a |
| combined vector constant, the GL does not perform swizzling, component |
| selection, negation, or absolute value operations. The following |
| instructions are invalid, as they contain more than four unique scalar |
| values. |
| |
| Invalid Instructions |
| ----------------------------------- |
| ADD R0, {1,2,3,4}, -4; |
| ADD R0, {1,2,3,4}, |-4|; |
| ADD R0, {1,2,3,4}, -{-1,-2,-3,-4}; |
| ADD R0, {1,2,3,4}, {4,5,6,7}.x; |
| |
| |
| Section 3.11.3, Fragment Program Specification |
| |
| Fragment programs are specified as an array of ubytes. The array is a |
| string of ASCII characters encoding the program. The command |
| LoadProgramNV loads a fragment program when the target parameter is |
| FRAGMENT_PROGRAM_NV. The command BindProgramNV enables a fragment program |
| for execution. |
| |
| At program load time, the program is parsed into a set of tokens possibly |
| separated by white space. Spaces, tabs, newlines, carriage returns, and |
| comments are considered whitespace. Comments begin with the character "#" |
| and are terminated by a newline, a carriage return, or the end of the |
| program array. Fragment programs are case-sensitive -- upper and lower |
| case letters are treated differently. The proper choice of case can be |
| inferred from the grammar. |
| |
| The Backus-Naur Form (BNF) grammar below specifies the syntactically valid |
| sequences for fragment programs. The set of valid tokens can be inferred |
| from the grammar. The token "" represents an empty string and is used to |
| indicate optional rules. A program is invalid if it contains any |
| undefined tokens or characters. |
| |
| <program> ::= <progPrefix> <instructionSequence> "END" |
| |
| <progPrefix> ::= "!!FP1.0" |
| |
| <instructionSequence> ::= <instructionSequence> <instructionStatement> |
| | <instructionStatement> |
| |
| <instructionStatement> ::= <instruction> ";" |
| | <constantDefinition> ";" |
| | <localDeclaration> ";" |
| |
| <instruction> ::= <VECTORop-instruction> |
| | <SCALARop-instruction> |
| | <BINSCop-instruction> |
| | <BINop-instruction> |
| | <TRIop-instruction> |
| | <KILop-instruction> |
| | <TEXop-instruction> |
| | <TXDop-instruction> |
| |
| <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> "," |
| <vectorSrc> |
| |
| <VECTORop> ::= "DDX" | "DDX_SAT" |
| | "DDXR" | "DDXR_SAT" |
| | "DDXH" | "DDXH_SAT" |
| | "DDXC" | "DDXC_SAT" |
| | "DDXRC" | "DDXRC_SAT" |
| | "DDXHC" | "DDXHC_SAT" |
| | "DDY" | "DDY_SAT" |
| | "DDYR" | "DDYR_SAT" |
| | "DDYH" | "DDYH_SAT" |
| | "DDYC" | "DDYC_SAT" |
| | "DDYRC" | "DDYRC_SAT" |
| | "DDYHC" | "DDYHC_SAT" |
| | "FLR" | "FLR_SAT" |
| | "FLRR" | "FLRR_SAT" |
| | "FLRH" | "FLRH_SAT" |
| | "FLRX" | "FLRX_SAT" |
| | "FLRC" | "FLRC_SAT" |
| | "FLRRC" | "FLRRC_SAT" |
| | "FLRHC" | "FLRHC_SAT" |
| | "FLRXC" | "FLRXC_SAT" |
| | "FRC" | "FRC_SAT" |
| | "FRCR" | "FRCR_SAT" |
| | "FRCH" | "FRCH_SAT" |
| | "FRCX" | "FRCX_SAT" |
| | "FRCC" | "FRCC_SAT" |
| | "FRCRC" | "FRCRC_SAT" |
| | "FRCHC" | "FRCHC_SAT" |
| | "FRCXC" | "FRCXC_SAT" |
| | "LIT" | "LIT_SAT" |
| | "LITR" | "LITR_SAT" |
| | "LITH" | "LITH_SAT" |
| | "LITC" | "LITC_SAT" |
| | "LITRC" | "LITRC_SAT" |
| | "LITHC" | "LITHC_SAT" |
| | "MOV" | "MOV_SAT" |
| | "MOVR" | "MOVR_SAT" |
| | "MOVH" | "MOVH_SAT" |
| | "MOVX" | "MOVX_SAT" |
| | "MOVC" | "MOVC_SAT" |
| | "MOVRC" | "MOVRC_SAT" |
| | "MOVHC" | "MOVHC_SAT" |
| | "MOVXC" | "MOVXC_SAT" |
| | "PK2H" |
| | "PK2US" |
| | "PK4B" |
| | "PK4UB" |
| |
| <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> "," |
| <scalarSrc> |
| |
| <SCALARop> ::= "COS" | "COS_SAT" |
| | "COSR" | "COSR_SAT" |
| | "COSH" | "COSH_SAT" |
| | "COSC" | "COSC_SAT" |
| | "COSRC" | "COSRC_SAT" |
| | "COSHC" | "COSHC_SAT" |
| | "EX2" | "EX2_SAT" |
| | "EX2R" | "EX2R_SAT" |
| | "EX2H" | "EX2H_SAT" |
| | "EX2C" | "EX2C_SAT" |
| | "EX2RC" | "EX2RC_SAT" |
| | "EX2HC" | "EX2HC_SAT" |
| | "LG2" | "LG2_SAT" |
| | "LG2R" | "LG2R_SAT" |
| | "LG2H" | "LG2H_SAT" |
| | "LG2C" | "LG2C_SAT" |
| | "LG2RC" | "LG2RC_SAT" |
| | "LG2HC" | "LG2HC_SAT" |
| | "RCP" | "RCP_SAT" |
| | "RCPR" | "RCPR_SAT" |
| | "RCPH" | "RCPH_SAT" |
| | "RCPC" | "RCPC_SAT" |
| | "RCPRC" | "RCPRC_SAT" |
| | "RCPHC" | "RCPHC_SAT" |
| | "RSQ" | "RSQ_SAT" |
| | "RSQR" | "RSQR_SAT" |
| | "RSQH" | "RSQH_SAT" |
| | "RSQC" | "RSQC_SAT" |
| | "RSQRC" | "RSQRC_SAT" |
| | "RSQHC" | "RSQHC_SAT" |
| | "SIN" | "SIN_SAT" |
| | "SINR" | "SINR_SAT" |
| | "SINH" | "SINH_SAT" |
| | "SINC" | "SINC_SAT" |
| | "SINRC" | "SINRC_SAT" |
| | "SINHC" | "SINHC_SAT" |
| | "UP2H" | "UP2H_SAT" |
| | "UP2HC" | "UP2HC_SAT" |
| | "UP2US" | "UP2US_SAT" |
| | "UP2USC" | "UP2USC_SAT" |
| | "UP4B" | "UP4B_SAT" |
| | "UP4BC" | "UP4BC_SAT" |
| | "UP4UB" | "UP4UB_SAT" |
| | "UP4UBC" | "UP4UBC_SAT" |
| |
| <BINSCop-instruction> ::= <BINSCop> <maskedDstReg> "," |
| <scalarSrc> "," <scalarSrc> |
| |
| <BINSCop> ::= "POW" | "POW_SAT" |
| | "POWR" | "POWR_SAT" |
| | "POWH" | "POWH_SAT" |
| | "POWC" | "POWC_SAT" |
| | "POWRC" | "POWRC_SAT" |
| | "POWHC" | "POWHC_SAT" |
| |
| <BINop-instruction> ::= <BINop> <maskedDstReg> "," |
| <vectorSrc> "," <vectorSrc> |
| |
| <BINop> ::= "ADD" | "ADD_SAT" |
| | "ADDR" | "ADDR_SAT" |
| | "ADDH" | "ADDH_SAT" |
| | "ADDX" | "ADDX_SAT" |
| | "ADDC" | "ADDC_SAT" |
| | "ADDRC" | "ADDRC_SAT" |
| | "ADDHC" | "ADDHC_SAT" |
| | "ADDXC" | "ADDXC_SAT" |
| | "DP3" | "DP3_SAT" |
| | "DP3R" | "DP3R_SAT" |
| | "DP3H" | "DP3H_SAT" |
| | "DP3X" | "DP3X_SAT" |
| | "DP3C" | "DP3C_SAT" |
| | "DP3RC" | "DP3RC_SAT" |
| | "DP3HC" | "DP3HC_SAT" |
| | "DP3XC" | "DP3XC_SAT" |
| | "DP4" | "DP4_SAT" |
| | "DP4R" | "DP4R_SAT" |
| | "DP4H" | "DP4H_SAT" |
| | "DP4X" | "DP4X_SAT" |
| | "DP4C" | "DP4C_SAT" |
| | "DP4RC" | "DP4RC_SAT" |
| | "DP4HC" | "DP4HC_SAT" |
| | "DP4XC" | "DP4XC_SAT" |
| | "DST" | "DST_SAT" |
| | "DSTR" | "DSTR_SAT" |
| | "DSTH" | "DSTH_SAT" |
| | "DSTC" | "DSTC_SAT" |
| | "DSTRC" | "DSTRC_SAT" |
| | "DSTHC" | "DSTHC_SAT" |
| | "MAX" | "MAX_SAT" |
| | "MAXR" | "MAXR_SAT" |
| | "MAXH" | "MAXH_SAT" |
| | "MAXX" | "MAXX_SAT" |
| | "MAXC" | "MAXC_SAT" |
| | "MAXRC" | "MAXRC_SAT" |
| | "MAXHC" | "MAXHC_SAT" |
| | "MAXXC" | "MAXXC_SAT" |
| | "MIN" | "MIN_SAT" |
| | "MINR" | "MINR_SAT" |
| | "MINH" | "MINH_SAT" |
| | "MINX" | "MINX_SAT" |
| | "MINC" | "MINC_SAT" |
| | "MINRC" | "MINRC_SAT" |
| | "MINHC" | "MINHC_SAT" |
| | "MINXC" | "MINXC_SAT" |
| | "MUL" | "MUL_SAT" |
| | "MULR" | "MULR_SAT" |
| | "MULH" | "MULH_SAT" |
| | "MULX" | "MULX_SAT" |
| | "MULC" | "MULC_SAT" |
| | "MULRC" | "MULRC_SAT" |
| | "MULHC" | "MULHC_SAT" |
| | "MULXC" | "MULXC_SAT" |
| | "RFL" | "RFL_SAT" |
| | "RFLR" | "RFLR_SAT" |
| | "RFLH" | "RFLH_SAT" |
| | "RFLC" | "RFLC_SAT" |
| | "RFLRC" | "RFLRC_SAT" |
| | "RFLHC" | "RFLHC_SAT" |
| | "SEQ" | "SEQ_SAT" |
| | "SEQR" | "SEQR_SAT" |
| | "SEQH" | "SEQH_SAT" |
| | "SEQX" | "SEQX_SAT" |
| | "SEQC" | "SEQC_SAT" |
| | "SEQRC" | "SEQRC_SAT" |
| | "SEQHC" | "SEQHC_SAT" |
| | "SEQXC" | "SEQXC_SAT" |
| | "SFL" | "SFL_SAT" |
| | "SFLR" | "SFLR_SAT" |
| | "SFLH" | "SFLH_SAT" |
| | "SFLX" | "SFLX_SAT" |
| | "SFLC" | "SFLC_SAT" |
| | "SFLRC" | "SFLRC_SAT" |
| | "SFLHC" | "SFLHC_SAT" |
| | "SFLXC" | "SFLXC_SAT" |
| | "SGE" | "SGE_SAT" |
| | "SGER" | "SGER_SAT" |
| | "SGEH" | "SGEH_SAT" |
| | "SGEX" | "SGEX_SAT" |
| | "SGEC" | "SGEC_SAT" |
| | "SGERC" | "SGERC_SAT" |
| | "SGEHC" | "SGEHC_SAT" |
| | "SGEXC" | "SGEXC_SAT" |
| | "SGT" | "SGT_SAT" |
| | "SGTR" | "SGTR_SAT" |
| | "SGTH" | "SGTH_SAT" |
| | "SGTX" | "SGTX_SAT" |
| | "SGTC" | "SGTC_SAT" |
| | "SGTRC" | "SGTRC_SAT" |
| | "SGTHC" | "SGTHC_SAT" |
| | "SGTXC" | "SGTXC_SAT" |
| | "SLE" | "SLE_SAT" |
| | "SLER" | "SLER_SAT" |
| | "SLEH" | "SLEH_SAT" |
| | "SLEX" | "SLEX_SAT" |
| | "SLEC" | "SLEC_SAT" |
| | "SLERC" | "SLERC_SAT" |
| | "SLEHC" | "SLEHC_SAT" |
| | "SLEXC" | "SLEXC_SAT" |
| | "SLT" | "SLT_SAT" |
| | "SLTR" | "SLTR_SAT" |
| | "SLTH" | "SLTH_SAT" |
| | "SLTX" | "SLTX_SAT" |
| | "SLTC" | "SLTC_SAT" |
| | "SLTRC" | "SLTRC_SAT" |
| | "SLTHC" | "SLTHC_SAT" |
| | "SLTXC" | "SLTXC_SAT" |
| | "SNE" | "SNE_SAT" |
| | "SNER" | "SNER_SAT" |
| | "SNEH" | "SNEH_SAT" |
| | "SNEX" | "SNEX_SAT" |
| | "SNEC" | "SNEC_SAT" |
| | "SNERC" | "SNERC_SAT" |
| | "SNEHC" | "SNEHC_SAT" |
| | "SNEXC" | "SNEXC_SAT" |
| | "STR" | "STR_SAT" |
| | "STRR" | "STRR_SAT" |
| | "STRH" | "STRH_SAT" |
| | "STRX" | "STRX_SAT" |
| | "STRC" | "STRC_SAT" |
| | "STRRC" | "STRRC_SAT" |
| | "STRHC" | "STRHC_SAT" |
| | "STRXC" | "STRXC_SAT" |
| | "SUB" | "SUB_SAT" |
| | "SUBR" | "SUBR_SAT" |
| | "SUBH" | "SUBH_SAT" |
| | "SUBX" | "SUBX_SAT" |
| | "SUBC" | "SUBC_SAT" |
| | "SUBRC" | "SUBRC_SAT" |
| | "SUBHC" | "SUBHC_SAT" |
| | "SUBXC" | "SUBXC_SAT" |
| |
| <TRIop-instruction> ::= <TRIop> <maskedDstReg> "," |
| <vectorSrc> "," <vectorSrc> "," |
| <vectorSrc> |
| |
| <TRIop> ::= "MAD" | "MAD_SAT" |
| | "MADR" | "MADR_SAT" |
| | "MADH" | "MADH_SAT" |
| | "MADX" | "MADX_SAT" |
| | "MADC" | "MADC_SAT" |
| | "MADRC" | "MADRC_SAT" |
| | "MADHC" | "MADHC_SAT" |
| | "MADXC" | "MADXC_SAT" |
| | "LRP" | "LRP_SAT" |
| | "LRPR" | "LRPR_SAT" |
| | "LRPH" | "LRPH_SAT" |
| | "LRPX" | "LRPX_SAT" |
| | "LRPC" | "LRPC_SAT" |
| | "LRPRC" | "LRPRC_SAT" |
| | "LRPHC" | "LRPHC_SAT" |
| | "LRPXC" | "LRPXC_SAT" |
| | "X2D" | "X2D_SAT" |
| | "X2DR" | "X2DR_SAT" |
| | "X2DH" | "X2DH_SAT" |
| | "X2DC" | "X2DC_SAT" |
| | "X2DRC" | "X2DRC_SAT" |
| | "X2DHC" | "X2DHC_SAT" |
| |
| <KILop-instruction> ::= <KILop> <ccMask> |
| |
| <KILop> ::= "KIL" |
| |
| <TEXop-instruction> ::= <TEXop> <maskedDstReg> "," |
| <vectorSrc> "," <texImageId> |
| |
| <TEXop> ::= "TEX" | "TEX_SAT" |
| | "TEXC" | "TEXC_SAT" |
| | "TXP" | "TXP_SAT" |
| | "TXPC" | "TXPC_SAT" |
| |
| <TXDop-instruction> ::= <TXDop> <maskedDstReg> "," |
| <vectorSrc> "," <vectorSrc> "," |
| <vectorSrc> "," <texImageId> |
| |
| <TXDop> ::= "TXD" | "TXD_SAT" |
| | "TXDC" | "TXDC_SAT" |
| |
| <scalarSrc> ::= <absScalarSrc> |
| | <baseScalarSrc> |
| |
| <absScalarSrc> ::= <negate> "|" <baseScalarSrc> "|" |
| |
| <baseScalarSrc> ::= <signedScalarConstant> |
| | <negate> <namedScalarConstant> |
| | <negate> <vectorConstant> <scalarSuffix> |
| | <negate> <namedLocalParameter> <scalarSuffix> |
| | <negate> <numberedLocal> <scalarSuffix> |
| | <negate> <srcRegister> <scalarSuffix> |
| |
| <vectorSrc> ::= <absVectorSrc> |
| | <baseVectorSrc> |
| |
| <absVectorSrc> ::= <negate> "|" <baseVectorSrc> "|" |
| |
| <baseVectorSrc> ::= <signedScalarConstant> |
| | <negate> <namedScalarConstant> |
| | <negate> <vectorConstant> <scalarSuffix> |
| | <negate> <vectorConstant> <swizzleSuffix> |
| | <negate> <namedLocalParameter> <scalarSuffix> |
| | <negate> <namedLocalParameter> <swizzleSuffix> |
| | <negate> <numberedLocal> <scalarSuffix> |
| | <negate> <numberedLocal> <swizzleSuffix> |
| | <negate> <srcRegister> <scalarSuffix> |
| | <negate> <srcRegister> <swizzleSuffix> |
| |
| <maskedDstReg> ::= <dstRegister> <optionalWriteMask> |
| <optionalCCMask> |
| |
| <dstRegister> ::= <fragTempReg> |
| | <fragOutputReg> |
| | "RC" |
| | "HC" |
| |
| <optionalCCMask> ::= "(" <ccMask> ")" |
| | "" |
| |
| <ccMask> ::= <ccMaskRule> <swizzleSuffix> |
| | <ccMaskRule> <scalarSuffix> |
| |
| <ccMaskRule> ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" | |
| "TR" | "FL" |
| |
| <optionalWriteMask> ::= "" |
| | "." "x" |
| | "." "y" |
| | "." "x" "y" |
| | "." "z" |
| | "." "x" "z" |
| | "." "y" "z" |
| | "." "x" "y" "z" |
| | "." "w" |
| | "." "x" "w" |
| | "." "y" "w" |
| | "." "x" "y" "w" |
| | "." "z" "w" |
| | "." "x" "z" "w" |
| | "." "y" "z" "w" |
| | "." "x" "y" "z" "w" |
| |
| <srcRegister> ::= <fragAttribReg> |
| | <fragTempReg> |
| |
| <fragAttribReg> ::= "f" "[" <fragAttribRegId> "]" |
| |
| <fragAttribRegId> ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0" |
| | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5" |
| | "TEX6" | "TEX7" |
| |
| <fragTempReg> ::= <fragF32Reg> |
| | <fragF16Reg> |
| |
| <fragF32Reg> ::= "R0" | "R1" | "R2" | "R3" |
| | "R4" | "R5" | "R6" | "R7" |
| | "R8" | "R9" | "R10" | "R11" |
| | "R12" | "R13" | "R14" | "R15" |
| | "R16" | "R17" | "R18" | "R19" |
| | "R20" | "R21" | "R22" | "R23" |
| | "R24" | "R25" | "R26" | "R27" |
| | "R28" | "R29" | "R30" | "R31" |
| |
| <fragF16Reg> ::= "H0" | "H1" | "H2" | "H3" |
| | "H4" | "H5" | "H6" | "H7" |
| | "H8" | "H9" | "H10" | "H11" |
| | "H12" | "H13" | "H14" | "H15" |
| | "H16" | "H17" | "H18" | "H19" |
| | "H20" | "H21" | "H22" | "H23" |
| | "H24" | "H25" | "H26" | "H27" |
| | "H28" | "H29" | "H30" | "H31" |
| | "H32" | "H33" | "H34" | "H35" |
| | "H36" | "H37" | "H38" | "H39" |
| | "H40" | "H41" | "H42" | "H43" |
| | "H44" | "H45" | "H46" | "H47" |
| | "H48" | "H49" | "H50" | "H51" |
| | "H52" | "H53" | "H54" | "H55" |
| | "H56" | "H57" | "H58" | "H59" |
| | "H60" | "H61" | "H62" | "H63" |
| |
| <fragOutputReg> ::= "o" "[" <fragOutputRegName> "]" |
| |
| <fragOutputRegName> ::= "COLR" | "COLH" | "DEPR" |
| |
| <numberedLocal> ::= "p" "[" <localNumber> "]" |
| |
| <localNumber> ::= <integer> from 0 to |
| MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1 |
| |
| <scalarSuffix> ::= "." <component> |
| |
| <swizzleSuffix> ::= "" |
| | "." <component> <component> |
| <component> <component> |
| |
| <component> ::= "x" | "y" | "z" | "w" |
| |
| <texImageId> ::= <texImageUnit> "," <texImageTarget> |
| |
| <texImageUnit> ::= "TEX0" | "TEX1" | "TEX2" | "TEX3" |
| | "TEX4" | "TEX5" | "TEX6" | "TEX7" |
| | "TEX8" | "TEX9" | "TEX10" | "TEX11" |
| | "TEX12" | "TEX13" | "TEX14" | "TEX15" |
| |
| <texImageTarget> ::= "1D" | "2D" | "3D" | "CUBE" | "RECT" |
| |
| <constantDefinition> ::= "DEFINE" <namedVectorConstant> "=" |
| <vectorConstant> |
| | "DEFINE" <namedScalarConstant> "=" |
| <scalarConstant> |
| |
| <localDeclaration> ::= "DECLARE" <namedLocalParameter> |
| <optionalLocalValue> |
| |
| <optionalLocalValue> ::= "" |
| | "=" <vectorConstant> |
| | "=" <scalarConstant> |
| |
| <vectorConstant> ::= "{" <vectorConstantList> "}" |
| | <namedVectorConstant> |
| |
| <vectorConstantList> ::= <scalarConstant> |
| | <scalarConstant> "," <scalarConstant> |
| | <scalarConstant> "," <scalarConstant> "," |
| <scalarConstant> |
| | <scalarConstant> "," <scalarConstant> "," |
| <scalarConstant> "," <scalarConstant> |
| |
| <scalarConstant> ::= <signedScalarConstant> |
| | <namedScalarConstant> |
| |
| <signedScalarConstant> ::= <optionalSign> <floatConstant> |
| |
| <namedScalarConstant> ::= <identifier> ((name of a scalar constant |
| in a DEFINE instruction)) |
| |
| <namedVectorConstant> ::= <identifier> ((name of a vector constant |
| in a DEFINE instruction)) |
| |
| <namedLocalParameter> ::= <identifier> ((name of a local parameter |
| in a DECLARE instruction)) |
| |
| <negate> ::= "-" | "+" | "" |
| |
| <optionalSign> ::= "-" | "+" | "" |
| |
| <identifier> ::= see text below |
| |
| <floatConstant> ::= see text below |
| |
| |
| The <identifier> rule matches a sequence of one or more letters ("A" |
| through "Z", "a" through "z", "_", and "$") and digits ("0" through "9"); |
| the first character must be a letter. The underscore ("_") and dollar |
| sign ("$") count as letters. Upper and lower case letters are different |
| (names are case-sensitive). |
| |
| The <floatConstant> rule matches a floating-point constant consisting |
| of an integer part, a decimal point, a fraction part, an "e" or |
| "E", and an optionally signed integer exponent. The integer and |
| fraction parts both consist of a sequence of one or more digits ("0" |
| through "9"). Either the integer part or the fraction part (not |
| both) may be missing; either the decimal point or the "e" (or "E") |
| and the exponent (not both) may be missing. |
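| |
| The two lexical rules above can be written as regular expressions. The |
| Python sketch below follows the text literally; the names IDENTIFIER, |
| FLOAT_CONSTANT, is_identifier, and is_float_constant are hypothetical. |
| Note that a literal reading rejects a bare integer such as "2", although |
| examples elsewhere in this specification write such constants, so the |
| pattern should be treated as a reading of the rule text only. |
| |
```python
import re

# Letters include "_" and "$"; the first character must be a letter.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Literal reading of the <floatConstant> text: a decimal point and/or an
# exponent must be present, and the point needs a digit on at least one side.
FLOAT_CONSTANT = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"  # has a decimal point
    r"|[0-9]+[eE][+-]?[0-9]+"                            # exponent only
)


def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None


def is_float_constant(text):
    return FLOAT_CONSTANT.fullmatch(text) is not None
```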
| |
| A fragment program fails to load if it contains more than the maximum |
| number of executable instructions. If ARB_fragment_program is supported, |
| this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the |
| FRAGMENT_PROGRAM_ARB target. Otherwise, the limit is 1024. Executable |
| instructions are those matching the <instruction> rule in the grammar, and |
| do not include DEFINE or DECLARE instructions. |
| |
| A fragment program fails to load if its total temporary and output |
| register count exceeds 64. Each fp32 temporary or output register used by |
| the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each |
| fp16 temporary or output register used by the program (H0-H63 and o[COLH]) |
| counts as a single register. |
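| |
| The register accounting can be sketched in Python as follows. The function |
| names and the REGISTER_LIMIT constant are hypothetical. The sketch counts |
| each distinct register once and does not check that register numbers are |
| in range. |
| |
```python
import re

REGISTER_LIMIT = 64
_FP32 = re.compile(r"R[0-9]+")
_FP16 = re.compile(r"H[0-9]+")


def register_count(used):
    """Count registers per the rule above; `used` is any iterable of names."""
    total = 0
    for name in set(used):  # each register counts once, however often it is used
        if _FP32.fullmatch(name) or name in ("o[COLR]", "o[DEPR]"):
            total += 2      # fp32 registers count double
        elif _FP16.fullmatch(name) or name == "o[COLH]":
            total += 1
        else:
            raise ValueError("not a temporary or output register: " + name)
    return total


def fits(used):
    return register_count(used) <= REGISTER_LIMIT
```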
| |
| A fragment program fails to load if any instruction sources more than one |
| unique fragment attribute register. Instructions sourcing the same |
| attribute register multiple times are acceptable. |
| |
| A fragment program fails to load if any instruction sources more than one |
| unique program parameter register. Instructions sourcing the same program |
| parameter multiple times are acceptable. |
| |
| A fragment program fails to load if multiple texture lookup instructions |
| reference different targets for the same texture image unit. |
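| |
| A load-time check for this restriction might look like the Python sketch |
| below. The function name and the (unit, target) pair representation are |
| assumptions made for illustration. |
| |
```python
def conflicting_texture_units(lookups):
    """Return the image units referenced with more than one target.

    `lookups` is a list of (unit, target) pairs, one per texture instruction,
    for example ("TEX0", "2D").
    """
    first_target = {}
    conflicts = set()
    for unit, target in lookups:
        if first_target.setdefault(unit, target) != target:
            conflicts.add(unit)
    return sorted(conflicts)
```
| |
| A program would fail to load whenever the returned list is non-empty. |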
| |
| A fragment program fails to load if it writes to both the o[COLR] and |
| o[COLH] output registers. |
| |
| The error INVALID_OPERATION is generated by LoadProgramNV if a fragment |
| program fails to load because it is not syntactically correct or for one |
| of the semantic restrictions listed above. |
| |
| The error INVALID_OPERATION is generated by LoadProgramNV if a program is |
| loaded for id when id is currently loaded with a program of a different |
| target. |
| |
| A successfully loaded fragment program is parsed into a sequence of |
| instructions. Each instruction is identified by its tokenized name. The |
| operation of these instructions when executed is defined in Sections |
| 3.11.4 and 3.11.5. |
| |
| |
| Section 3.11.4, Fragment Program Operation |
| |
| There are forty-five fragment program instructions. Fragment program |
| instructions may have up to sixteen variants, including a suffix of "R", |
| "H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix |
| of "C" to allow an update of the condition code register (section |
| 3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to |
| the range [0,1] (section 3.11.4.4). For example, the sixteen forms of the |
| "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC", |
| "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT", |
| "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT". |
| |
| Some mathematical instructions that support precision suffixes, typically |
| those that involve complicated floating-point computations, do not support |
| the "X" precision suffix. |
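| |
| The suffix combinations can be enumerated mechanically. The Python sketch |
| below uses a hypothetical helper, variants, whose second argument lists the |
| precision suffixes an instruction supports. |
| |
```python
def variants(base, precisions="RHX"):
    """List every suffixed form of an instruction name."""
    names = []
    for sat in ("", "_SAT"):
        for cc in ("", "C"):
            for precision in [""] + list(precisions):
                names.append(base + precision + cc + sat)
    return names


print(len(variants("ADD")))        # 16
print(len(variants("COS", "RH")))  # 12, since COS has no "X" forms
```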
| |
| The fragment program instructions and their respective input and output |
| parameters are summarized in Table X.4. |
| |
| Instruction        Inputs  Output  Description |
| -----------------  ------  ------  -------------------------------- |
| ADD[RHX][C][_SAT]  v,v     v       add |
| COS[RH ][C][_SAT]  s       ssss    cosine |
| DDX[RH ][C][_SAT]  v       v       derivative relative to x |
| DDY[RH ][C][_SAT]  v       v       derivative relative to y |
| DP3[RHX][C][_SAT]  v,v     ssss    3-component dot product |
| DP4[RHX][C][_SAT]  v,v     ssss    4-component dot product |
| DST[RH ][C][_SAT]  v,v     v       distance vector |
| EX2[RH ][C][_SAT]  s       ssss    exponential base 2 |
| FLR[RHX][C][_SAT]  v       v       floor |
| FRC[RHX][C][_SAT]  v       v       fraction |
| KIL                none    none    conditionally discard fragment |
| LG2[RH ][C][_SAT]  s       ssss    logarithm base 2 |
| LIT[RH ][C][_SAT]  v       v       compute light coefficients |
| LRP[RHX][C][_SAT]  v,v,v   v       linear interpolation |
| MAD[RHX][C][_SAT]  v,v,v   v       multiply and add |
| MAX[RHX][C][_SAT]  v,v     v       maximum |
| MIN[RHX][C][_SAT]  v,v     v       minimum |
| MOV[RHX][C][_SAT]  v       v       move |
| MUL[RHX][C][_SAT]  v,v     v       multiply |
| PK2H               v       ssss    pack two 16-bit floats |
| PK2US              v       ssss    pack two unsigned 16-bit scalars |
| PK4B               v       ssss    pack four signed 8-bit scalars |
| PK4UB              v       ssss    pack four unsigned 8-bit scalars |
| POW[RH ][C][_SAT]  s,s     ssss    exponentiation (x^y) |
| RCP[RH ][C][_SAT]  s       ssss    reciprocal |
| RFL[RH ][C][_SAT]  v,v     v       reflection vector |
| RSQ[RH ][C][_SAT]  s       ssss    reciprocal square root |
| SEQ[RHX][C][_SAT]  v,v     v       set on equal |
| SFL[RHX][C][_SAT]  v,v     v       set on false |
| SGE[RHX][C][_SAT]  v,v     v       set on greater than or equal |
| SGT[RHX][C][_SAT]  v,v     v       set on greater than |
| SIN[RH ][C][_SAT]  s       ssss    sine |
| SLE[RHX][C][_SAT]  v,v     v       set on less than or equal |
| SLT[RHX][C][_SAT]  v,v     v       set on less than |
| SNE[RHX][C][_SAT]  v,v     v       set on not equal |
| STR[RHX][C][_SAT]  v,v     v       set on true |
| SUB[RHX][C][_SAT]  v,v     v       subtract |
| TEX[C][_SAT]       v       v       texture lookup |
| TXD[C][_SAT]       v,v,v   v       texture lookup w/partials |
| TXP[C][_SAT]       v       v       projective texture lookup |
| UP2H[C][_SAT]      s       v       unpack two 16-bit floats |
| UP2US[C][_SAT]     s       v       unpack two unsigned 16-bit scalars |
| UP4B[C][_SAT]      s       v       unpack four signed 8-bit scalars |
| UP4UB[C][_SAT]     s       v       unpack four unsigned 8-bit scalars |
| X2D[RH ][C][_SAT]  v,v,v   v       2D coordinate transformation |
| |
| Table X.4: Summary of fragment program instructions. "[RHX]" indicates |
| an optional arithmetic precision suffix. "[C]" indicates an optional |
| condition code update suffix. "[_SAT]" indicates an optional clamp of |
| result vector components to [0,1]. "v" indicates a 4-component vector |
| input or output, "s" indicates a scalar input, and "ssss" indicates a |
| scalar output replicated across a 4-component vector. |
| |
| |
| Section 3.11.4.1: Fragment Program Storage Precision |
| |
| Registers in fragment programs are stored in two different representations: |
| 16-bit floating-point (fp16) and 32-bit floating-point (fp32). There is |
| an additional 12-bit fixed-point representation (fx12) used only as an |
| internal representation for instructions with the "X" precision qualifier. |
| |
| In the 32-bit float (fp32) representation, each component is represented |
| in floating-point with eight exponent and twenty-three mantissa bits, as |
| in the standard IEEE single-precision format. If S represents the sign (0 |
| or 1), E represents the exponent in the range [0,255], and M represents |
| the mantissa in the range [0,2^23-1], then an fp32 float is decoded as: |
| |
| (-1)^S * 0.0, if E == 0, |
| (-1)^S * 2^(E-127) * (1 + M/2^23), if 0 < E < 255, |
| (-1)^S * INF, if E == 255 and M == 0, |
| NaN, if E == 255 and M != 0. |
| |
| INF (Infinity) is a special representation indicating numerical overflow. |
| NaN (Not a Number) is a special representation indicating the result of |
| illegal arithmetic operations, such as computing the square root or |
| logarithm of a negative number. Note that all normal fp32 values, zero, |
| and INF have an associated sign. -0.0 and +0.0 are considered equivalent |
| for the purposes of comparisons. |
| |
| This representation is identical to the IEEE single-precision |
| floating-point standard, except that no special representation is provided |
| for denorms -- numbers in the range (-2^-126, +2^-126). All such numbers |
| are flushed to zero. |
| |
| In a 16-bit float (fp16) register, each component is represented |
| similarly, except with only five exponent and ten mantissa bits. If S |
| represents the sign (0 or 1), E represents the exponent in the range |
| [0,31], and M represents the mantissa in the range [0,2^10-1], then an |
| fp16 float is decoded as: |
| |
| (-1)^S * 0.0, if E == 0 and M == 0, |
| (-1)^S * 2^-14 * M/2^10, if E == 0 and M != 0, |
| (-1)^S * 2^(E-15) * (1 + M/2^10), if 0 < E < 31, |
| (-1)^S * INF, if E == 31 and M == 0, or |
| NaN, if E == 31 and M != 0. |
| |
| One important difference is that the fp16 representation, unlike fp32, |
| supports denorms to maximize the limited precision of the 16-bit floating |
| point encodings. |
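| |
| Both decoding rules can be expressed with one routine. The Python sketch |
| below is illustrative only; the function names are hypothetical, and the |
| bit layout (sign in the top bit, then the exponent, then the mantissa) is |
| assumed to follow the usual IEEE arrangement. |
| |
```python
import math


def decode_float(bits, exponent_bits, mantissa_bits, keep_denorms):
    """Decode a sign/exponent/mantissa bit pattern as described above."""
    m = bits & ((1 << mantissa_bits) - 1)
    e = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1)
    s = (bits >> (mantissa_bits + exponent_bits)) & 1
    sign = -1.0 if s else 1.0
    e_max = (1 << exponent_bits) - 1      # 255 for fp32, 31 for fp16
    bias = e_max >> 1                     # 127 for fp32, 15 for fp16
    scale = float(1 << mantissa_bits)
    if e == 0:
        if keep_denorms:                  # fp16: denorms (and zero) are kept
            return sign * 2.0 ** (1 - bias) * (m / scale)
        return sign * 0.0                 # fp32: denorms flush to a signed zero
    if e == e_max:
        return sign * math.inf if m == 0 else math.nan
    return sign * 2.0 ** (e - bias) * (1 + m / scale)


def decode_fp32(bits):
    return decode_float(bits, 8, 23, keep_denorms=False)


def decode_fp16(bits):
    return decode_float(bits, 5, 10, keep_denorms=True)
```
| |
| For example, decode_fp16(0x0001) yields the smallest fp16 denorm, 2^-24, |
| while decode_fp32(0x00000001) yields 0.0. |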
| |
| In the 12-bit fixed-point (fx12) format, numbers are represented as signed |
| 12-bit two's complement integers with 10 fraction bits. The range of |
| representable values is [-2048/1024, +2047/1024]. |
| |
| Section 3.11.4.2: Fragment Program Operation Precision |
| |
| Fragment program instructions frequently perform mathematical operations. |
| Such operations may be performed at one of three different precisions. |
| Fragment programs can specify the precision of each instruction by using |
| the precision suffix. If an instruction has a suffix of "R", calculations |
| are carried out with 32-bit floating point operands and results. If an |
| instruction has a suffix of "H", calculations are carried out using 16-bit |
| floating point operands and results. If an instruction has a suffix of |
| "X", calculations are carried out using 12-bit fixed point operands and |
| results. For example, the instruction "MULR" performs a 32-bit |
| floating-point multiply, "MULH" performs a 16-bit floating-point multiply, |
| and "MULX" performs a 12-bit fixed-point multiply. If no precision suffix |
| is specified, calculations are carried out using the precision of the |
| temporary register receiving the result. |
| |
| Fragment program instructions may source registers or constants whose |
| precisions differ from the precision specified with the instruction. |
| Instructions may also generate intermediate results with a different |
| precision than that of the destination register. In these cases, the |
| values sourced are converted to the precision specified by the |
| instruction. |
| |
| When converting to fx12 format, -INF and any values less than -2048/1024 |
| become -2048/1024. +INF and any values greater than +2047/1024 become |
| +2047/1024. NaN becomes 0. |
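| |
| A Python sketch of the fx12 conversion follows. The name to_fx12 is |
| hypothetical, and rounding in-range values to the nearest multiple of |
| 1/1024 is an assumption; the text above specifies only the clamping and |
| NaN behavior. |
| |
```python
import math

FX12_MIN = -2048 / 1024
FX12_MAX = 2047 / 1024


def to_fx12(value):
    """Convert a float to an fx12 value, following the rules above."""
    if math.isnan(value):
        return 0.0
    if value <= FX12_MIN:   # includes -INF
        return FX12_MIN
    if value >= FX12_MAX:   # includes +INF
        return FX12_MAX
    # Assumption: in-range values round to the nearest multiple of 1/1024.
    return round(value * 1024) / 1024
```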
| |
| When converting to fp16 format, any values less than or equal to -2^16 are |
| converted to -INF. Any values greater than or equal to +2^16 are |
| converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any |
| other values that are not exactly representable in fp16 format are |
| converted to one of the two nearest representable values. |
| |
| When converting to fp32 format, any values less than or equal to -2^128 |
| are converted to -INF. Any values greater than or equal to +2^128 are |
| converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any |
| other values that are not exactly representable in fp32 format are |
| converted to one of the two nearest representable values. |
| |
| Fragment program instructions using the fragment attribute registers |
| f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32 |
| precision, regardless of the precision specified by the instruction. |
| |
| Section 3.11.4.3: Fragment Program Operands |
| |
| Except for KIL, fragment program instructions operate on either vector or |
| scalar operands, indicated in the grammar (see section 3.11.3) by the |
| rules <vectorSrc> and <scalarSrc> respectively. |
| |
| The basic set of scalar operands is defined by the grammar rule |
| <baseScalarSrc>. Scalar operands can be scalar constants (embedded or |
| named), or single components of vector constants, local parameters, or |
| registers allowed by the <srcRegister> rule. A vector component is |
| selected by the <scalarSuffix> rule, where the characters "x", "y", "z", |
| and "w" select the x, y, z, and w components, respectively, of the vector. |
| |
| The basic set of vector operands is defined by the grammar rule |
| <baseVectorSrc>. Vector operands can include vector constants, local |
| parameters, or registers allowed by the <srcRegister> rule. |
| |
| Basic vector operands can be swizzled according to the <swizzleSuffix> |
| rule. In its most general form, the <swizzleSuffix> rule matches the |
| pattern ".????" where each question mark is one of "x", "y", "z", or "w". |
| For such patterns, the x, y, z, and w components of the operand are taken |
| from the vector components named by the first, second, third, and fourth |
| character of the pattern, respectively. For example, if the swizzle |
| suffix is ".yzzx" and the specified source contains {2,8,9,0}, the |
| swizzled operand used by the instruction is {8,9,9,2}. If the |
| <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw". |
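| |
| The swizzle rule can be checked with a few lines of Python. The helper |
| name swizzle is hypothetical; the sketch handles only the empty suffix and |
| the four-character form. |
| |
```python
_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(vector, suffix=""):
    """Apply a <swizzleSuffix> string such as ".yzzx" to a 4-component vector."""
    if suffix == "":
        suffix = ".xyzw"   # the empty suffix is treated as ".xyzw"
    if len(suffix) != 5 or suffix[0] != ".":
        raise ValueError("bad swizzle suffix: " + suffix)
    return tuple(vector[_INDEX[c]] for c in suffix[1:])


print(swizzle((2, 8, 9, 0), ".yzzx"))  # (8, 9, 9, 2)
```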
| |
| Operands can optionally be negated according to the <negate> rule in |
| <baseScalarSrc> or <baseVectorSrc>. If the <negate> matches "-", each |
| value is negated. |
| |
| The absolute value of operands can be taken if the <vectorSrc> or |
| <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>. In this case, |
| the absolute value of each component is taken. In addition, if the |
| <negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result |
| is then negated. |
| |
| Instructions requiring vector operands can also use scalar operands in the |
| case where the <vectorSrc> rule matches <scalarSrc>. In such cases, a |
| 4-component vector is produced by replicating the scalar. |
| |
| After operands are loaded, they are converted to a data type corresponding |
| to the operation precision specified in the fragment program instruction. |
| |
| The following pseudo-code spells out the operand generation process. |
| "SrcT" and "InstT" refer to the data types of the specified register or |
| constant and the instruction, respectively. "VecSrcT" and "VecInstT" |
| refer to 4-component vectors of the corresponding type. "absolute" is |
| TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules, |
| and FALSE otherwise. "negateBase" is TRUE if the <negate> rule in |
| <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise. |
| "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or |
| <absVectorSrc> matches "-" and FALSE otherwise. The ".c***", ".*c**", |
| ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained |
| by the swizzle operation. TypeConvert() is assumed to convert a scalar of |
| type SrcT to a scalar of type InstT using the type conversion process |
| specified above. |
| |
| VecInstT VectorLoad(VecSrcT source) |
| { |
|     VecSrcT srcVal; |
|     VecInstT convertedVal; |
| |
|     srcVal.x = source.c***; |
|     srcVal.y = source.*c**; |
|     srcVal.z = source.**c*; |
|     srcVal.w = source.***c; |
|     if (negateBase) { |
|         srcVal.x = -srcVal.x; |
|         srcVal.y = -srcVal.y; |
|         srcVal.z = -srcVal.z; |
|         srcVal.w = -srcVal.w; |
|     } |
|     if (absolute) { |
|         srcVal.x = abs(srcVal.x); |
|         srcVal.y = abs(srcVal.y); |
|         srcVal.z = abs(srcVal.z); |
|         srcVal.w = abs(srcVal.w); |
|     } |
|     if (negateAbs) { |
|         srcVal.x = -srcVal.x; |
|         srcVal.y = -srcVal.y; |
|         srcVal.z = -srcVal.z; |
|         srcVal.w = -srcVal.w; |
|     } |
| |
|     convertedVal.x = TypeConvert(srcVal.x); |
|     convertedVal.y = TypeConvert(srcVal.y); |
|     convertedVal.z = TypeConvert(srcVal.z); |
|     convertedVal.w = TypeConvert(srcVal.w); |
|     return convertedVal; |
| } |
| |
| InstT ScalarLoad(VecSrcT source) |
| { |
|     SrcT srcVal; |
|     InstT convertedVal; |
| |
|     srcVal = source.c***; |
|     if (negateBase) { |
|         srcVal = -srcVal; |
|     } |
|     if (absolute) { |
|         srcVal = abs(srcVal); |
|     } |
|     if (negateAbs) { |
|         srcVal = -srcVal; |
|     } |
| |
|     convertedVal = TypeConvert(srcVal); |
|     return convertedVal; |
| } |
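| |
| For experimentation, the modifier ordering in the pseudo-code above can be |
| mirrored in runnable Python. This is a sketch under stated assumptions: |
| load_operand is a hypothetical name, the input is taken to be already |
| swizzled, and the convert argument merely stands in for TypeConvert(). |
| |
```python
def load_operand(values, negate_base=False, absolute=False, negate_abs=False,
                 convert=float):
    """Apply operand modifiers in the order used by VectorLoad/ScalarLoad.

    `values` holds the already swizzled components (one entry for a scalar).
    `convert` stands in for TypeConvert(); float() is only a placeholder.
    """
    result = []
    for value in values:
        if negate_base:       # "-" inside the absolute value bars
            value = -value
        if absolute:
            value = abs(value)
        if negate_abs:        # "-" in front of the absolute value bars
            value = -value
        result.append(convert(value))
    return tuple(result)


# The operand -|R0| with R0 = (1, -2, 3, -4):
print(load_operand((1, -2, 3, -4), absolute=True, negate_abs=True))
# (-1.0, -2.0, -3.0, -4.0)
```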
| |
| |
| Section 3.11.4.4, Fragment Program Destination Register Update |
| |
| Each fragment program instruction, except for KIL, writes a 4-component |
| result vector to a single temporary or output register. |
| |
| The four components of the result vector are first optionally clamped to |
| the range [0,1]. The components will be clamped if and only if the result |
| clamp suffix "_SAT" is present in the instruction name. The instruction |
| "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent |
| instruction "ADD" will not. |
| |
| Since the instruction may be carried out at a different precision than the |
| destination register, the components of the result vector are then |
| converted to the data type corresponding to the destination register. |
| |
| Writes to individual components of the temporary register are controlled |
| by two sets of enables: individual component write masks specified as part |
| of the instruction and the optional condition code mask. |
| |
| The component write mask is specified by the <optionalWriteMask> rule |
| found in the <maskedDstReg> rule. If the optional mask is "", all |
| components are enabled. Otherwise, the optional mask names the individual |
| components to enable. The characters "x", "y", "z", and "w" match the x, |
| y, z, and w components respectively. For example, an optional mask of |
| ".xzw" indicates that the x, z, and w components should be enabled for |
| writing but the y component should not. The grammar requires that the |
| destination register mask components must be listed in "xyzw" order. |
| |
| The optional condition code mask is specified by the <optionalCCMask> rule |
| found in the <maskedDstReg> rule. If <optionalCCMask> matches "", all |
| components are enabled. Otherwise, the condition code register is loaded |
| and swizzled according to the swizzling specified by <swizzleSuffix>. |
| Each component of the swizzled condition code is tested according to the |
| rule given by <ccMaskRule>. <ccMaskRule> may have the values "EQ", "NE", |
| "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding |
| condition code field evaluates to equal, not equal, less than, greater |
| than or equal, less than or equal, or greater than, respectively. |
| Comparisons involving condition codes of "UN" (unordered) evaluate to true |
| for "NE" and false otherwise. For example, if the condition code is |
| (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle |
| operation will load (EQ,LT,GT,GT) and the mask will thus enable |
| writes on the y, z, and w components. In addition, "TR" always enables |
| writes and "FL" always disables writes, regardless of the condition code. |
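| |
| The masking rules above can be sketched in Python. This is a non-normative |
| illustration: the names cc_passes and cc_write_enables are hypothetical, |
| and condition code fields are modeled as plain strings. |

```python
# Hypothetical helper names; only the rules themselves come from the text.
SWIZZLE_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

# Condition code fields that satisfy each comparison rule.
PASSING = {
    "EQ": {"EQ"},
    "NE": {"LT", "GT", "UN"},  # "UN" evaluates to true only for "NE"
    "LT": {"LT"},
    "GE": {"GT", "EQ"},
    "LE": {"LT", "EQ"},
    "GT": {"GT"},
}


def cc_passes(rule, field):
    """Return True if the condition code field enables a write under rule."""
    if rule in ("", "TR"):
        return True
    if rule == "FL":
        return False
    return field in PASSING[rule]


def cc_write_enables(cc, rule, swizzle="xyzw"):
    """Swizzle the condition code, then test each swizzled component."""
    swizzled = [cc[SWIZZLE_INDEX[c]] for c in swizzle]
    return [cc_passes(rule, field) for field in swizzled]


# The "(NE.zyxw)" example from the text.
enables = cc_write_enables(("GT", "LT", "EQ", "GT"), "NE", "zyxw")
print(enables)  # [False, True, True, True]
```

| The result has writes disabled on x and enabled on y, z, and w, matching |
| the example in the text. |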
| |
| Each component of the destination register is updated with the result of |
| the instruction if and only if the component is enabled for writes by |
| both the component write mask and the optional condition code mask. |
| Otherwise, the component of the destination register remains unchanged. |
| |
| A fragment program instruction can also optionally update the condition |
| code register. The condition code is updated if the condition code |
| register update suffix "C" is present in the instruction name. The |
| instruction "ADDC" will update the condition code; the otherwise |
| equivalent instruction "ADD" will not. If condition code updates are |
| enabled, each component of the destination register enabled for writes is |
| compared to zero. The corresponding component of the condition code is |
| set to "LT", "EQ", or "GT", if the written component is less than, equal |
| to, or greater than zero, respectively. Condition code components are set |
| to "UN" if the written component is NaN. Note that values of -0.0 and |
| +0.0 both evaluate to "EQ". If a component of the destination register is |
| not enabled for writes, the corresponding condition code component is |
| unchanged. |
| |
| In the following example code, |
| |
| # R1=(-2, 0, 2, NaN) R0 CC |
| MOVC R0, R1; # ( -2, 0, 2, NaN) (LT,EQ,GT,UN) |
| MOVC R0.xyz, R1.yzwx; # ( 0, 2, NaN, NaN) (EQ,GT,UN,UN) |
| MOVC R0 (NE), R1.zywx; # ( 0, 0, NaN, -2) (EQ,EQ,UN,LT) |
| |
| the first instruction writes (-2,0,2,NaN) to R0 and updates the condition |
| code to (LT,EQ,GT,UN). In the second instruction, only the "x", "y", and "z" |
| components of R0 and the condition code are updated, so R0 ends up with |
| (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the |
| third instruction, the condition code mask disables writes to the x |
| component (its condition code field is "EQ"), so R0 ends up with |
| (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT). |
| |
| The following pseudocode illustrates the process of writing a result |
| vector to the destination register. In the example, "ccMaskRule" refers |
| to the condition code mask rule given by <ccMaskRule> (or "" if no rule is |
| specified), "instrMask" refers to the component write mask given by the |
| <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are |
| enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled. |
| "destination" and "cc" refer to the register selected by <dstRegister> and |
| the condition code, respectively. |
| |
| boolean TestCC(CondCode field) { |
| switch (ccMaskRule) { |
| case "EQ": return (field == "EQ"); |
| case "NE": return (field != "EQ"); |
| case "LT": return (field == "LT"); |
| case "GE": return (field == "GT" || field == "EQ"); |
| case "LE": return (field == "LT" || field == "EQ"); |
| case "GT": return (field == "GT"); |
| case "TR": return TRUE; |
| case "FL": return FALSE; |
| case "": return TRUE; |
| } |
| } |
| |
| enum GenerateCC(DstT value) { |
| if (IsNaN(value)) { // NaN never compares equal, so test it explicitly |
| return UN; |
| } else if (value < 0) { |
| return LT; |
| } else if (value == 0) { |
| return EQ; |
| } else { |
| return GT; |
| } |
| } |
| |
| void UpdateDestination(VecDstT destination, VecInstT result) |
| { |
| // Load the original destination register and condition code. |
| VecDstT resultDst; |
| VecDstT merged; |
| VecCC mergedCC; |
| |
| // Clamp the result vector components to [0,1], if requested. |
| if (clamp01) { |
| if (result.x < 0) result.x = 0; |
| else if (result.x > 1) result.x = 1; |
| if (result.y < 0) result.y = 0; |
| else if (result.y > 1) result.y = 1; |
| if (result.z < 0) result.z = 0; |
| else if (result.z > 1) result.z = 1; |
| if (result.w < 0) result.w = 0; |
| else if (result.w > 1) result.w = 1; |
| } |
| |
| // Convert the result to the type of the destination register. |
| resultDst.x = TypeConvert(result.x); |
| resultDst.y = TypeConvert(result.y); |
| resultDst.z = TypeConvert(result.z); |
| resultDst.w = TypeConvert(result.w); |
| |
| // Merge the converted result into the destination register, under |
| // control of the compile- and run-time write masks. |
| merged = destination; |
| mergedCC = cc; |
| if (instrMask.x && TestCC(cc.c***)) { |
| merged.x = resultDst.x; |
| if (updatecc) mergedCC.x = GenerateCC(resultDst.x); |
| } |
| if (instrMask.y && TestCC(cc.*c**)) { |
| merged.y = resultDst.y; |
| if (updatecc) mergedCC.y = GenerateCC(resultDst.y); |
| } |
| if (instrMask.z && TestCC(cc.**c*)) { |
| merged.z = resultDst.z; |
| if (updatecc) mergedCC.z = GenerateCC(resultDst.z); |
| } |
| if (instrMask.w && TestCC(cc.***c)) { |
| merged.w = resultDst.w; |
| if (updatecc) mergedCC.w = GenerateCC(resultDst.w); |
| } |
| |
| // Write out the new destination register and condition code. |
| destination = merged; |
| cc = mergedCC; |
| } |
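| |
| As a non-normative check, the three-instruction MOVC example can be |
| replayed with a small Python model of the write process. The function |
| names are hypothetical, clamping and type conversion are omitted, and only |
| the "" and "NE" mask rules needed by the example are modeled. |

```python
import math

NAN = float("nan")
COMPONENTS = "xyzw"


def generate_cc(value):
    """Condition code produced by comparing a written component to zero."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value == 0:
        return "EQ"
    return "GT"


def cc_passes(rule, field):
    """Only the rules used by the example are handled in this sketch."""
    if rule == "":
        return True
    if rule == "NE":
        return field != "EQ"
    raise ValueError("rule not modeled in this sketch: " + rule)


def movc(dst, cc, src, swizzle="xyzw", write_mask="xyzw", rule=""):
    """Model MOVC and return the new (destination, condition code) pair."""
    result = [src[COMPONENTS.index(c)] for c in swizzle]
    new_dst, new_cc = list(dst), list(cc)
    for i, name in enumerate(COMPONENTS):
        if name in write_mask and cc_passes(rule, cc[i]):
            new_dst[i] = result[i]
            new_cc[i] = generate_cc(result[i])
    return new_dst, new_cc


r1 = [-2.0, 0.0, 2.0, NAN]
r0, cc = [0.0] * 4, ["EQ"] * 4                       # arbitrary initial state
r0, cc = movc(r0, cc, r1)                            # MOVC R0, R1;
r0, cc = movc(r0, cc, r1, "yzwx", write_mask="xyz")  # MOVC R0.xyz, R1.yzwx;
r0, cc = movc(r0, cc, r1, "zywx", rule="NE")         # MOVC R0 (NE), R1.zywx;
print(cc)  # ['EQ', 'EQ', 'UN', 'LT']
```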
| |
| Section 3.11.5, Fragment Program Instruction Set |
| |
| The following sections describe the instruction set available to fragment |
| programs. |
| |
| |
| Section 3.11.5.1, ADD: Add |
| |
| The ADD instruction performs a component-wise add of the two operands to |
| yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x + tmp1.x; |
| result.y = tmp0.y + tmp1.y; |
| result.z = tmp0.z + tmp1.z; |
| result.w = tmp0.w + tmp1.w; |
| |
| The following special-case rules apply to addition: |
| |
| 1. "A+B" is always equivalent to "B+A". |
| 2. NaN + <x> = NaN, for all <x>. |
| 3. +INF + <x> = +INF, for all <x> except NaN and -INF. |
| 4. -INF + <x> = -INF, for all <x> except NaN and +INF. |
| 5. +INF + -INF = NaN. |
| 6. -0.0 + <x> = <x>, for all <x>. |
| 7. +0.0 + <x> = <x>, for all <x> except -0.0. |
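| |
| The infinity and signed-zero rules can be observed with ordinary IEEE 754 |
| arithmetic. The Python sketch below is illustrative only: it assumes that |
| Python floats are IEEE 754 doubles, whereas fragment programs operate on |
| their own data types, and sign_bit is a hypothetical helper. |

```python
import math

INF = float("inf")


def sign_bit(x):
    """Return -1.0 or +1.0 so that -0.0 and +0.0 can be told apart."""
    return math.copysign(1.0, x)


inf_sum = INF + -INF         # rule 5: the result is NaN
neg_zero_sum = -0.0 + -0.0   # rule 6 with <x> = -0.0: the result is -0.0
mixed_zero_sum = 0.0 + -0.0  # the case excluded from rule 7: +0.0

print(math.isnan(inf_sum), sign_bit(neg_zero_sum), sign_bit(mixed_zero_sum))
# True -1.0 1.0
```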
| |
| |
| Section 3.11.5.2, COS: Cosine |
| |
| The COS instruction approximates the cosine of the angle specified by the |
| scalar operand and replicates the approximation to all four components of |
| the result vector. The angle is specified in radians and does not have to |
| be in the range [0,2*PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxCosine(tmp); |
| result.y = ApproxCosine(tmp); |
| result.z = ApproxCosine(tmp); |
| result.w = ApproxCosine(tmp); |
| |
| The approximation function ApproxCosine is accurate to at least 22 bits |
| with an angle in the range [0,2*PI]. |
| |
| | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. |
| |
| The error in the approximation will typically increase with the absolute |
| value of the angle when the angle falls outside the range [0,2*PI]. |
| |
| The following special-case rules apply to cosine approximation: |
| |
| 1. ApproxCosine(NaN) = NaN. |
| 2. ApproxCosine(+/-INF) = NaN. |
| 3. ApproxCosine(+/-0.0) = +1.0. |
| |
| |
| Section 3.11.5.3, DDX: Derivative Relative to X |
| |
| The DDX instruction computes approximate partial derivatives of the four |
| components of the single operand with respect to the X window coordinate |
| to yield a result vector. The partial derivative is evaluated at the |
| center of the pixel. |
| |
| f = VectorLoad(op0); |
| result = ComputePartialX(f); |
| |
| Note that the partial derivatives obtained by this instruction are |
| approximate, and derivative-of-derivative instruction sequences may not |
| yield accurate second derivatives. |
| |
| For components with partial derivatives that overflow (including +/-INF |
| inputs), the resulting partials may be encoded as large floating-point |
| numbers instead of +/-INF. |
| |
| |
| Section 3.11.5.4, DDY: Derivative Relative to Y |
| |
| The DDY instruction computes approximate partial derivatives of the four |
| components of the single operand with respect to the Y window coordinate |
| to yield a result vector. The partial derivative is evaluated at the |
| center of the pixel. |
| |
| f = VectorLoad(op0); |
| result = ComputePartialY(f); |
| |
| Note that the partial derivatives obtained by this instruction are |
| approximate, and derivative-of-derivative instruction sequences may not |
| yield accurate second derivatives. |
| |
| For components with partial derivatives that overflow (including +/-INF |
| inputs), the resulting partials may be encoded as large floating-point |
| numbers instead of +/-INF. |
| |
| |
| Section 3.11.5.5, DP3: 3-Component Dot Product |
| |
| The DP3 instruction computes a three component dot product of the two |
| operands (using the x, y, and z components) and replicates the dot product |
| to all four components of the result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z); |
| |
| |
| Section 3.11.5.6, DP4: 4-Component Dot Product |
| |
| The DP4 instruction computes a four component dot product of the two |
| operands and replicates the dot product to all four components of the |
| result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + |
| (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w); |
| |
| |
| Section 3.11.5.7, DST: Distance Vector |
| |
| The DST instruction computes a distance vector from two specially- |
| formatted operands. The first operand should be of the form [NA, d^2, |
| d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], |
| where NA values are not relevant to the calculation and d is a vector |
| length. If both vectors satisfy these conditions, the result vector will |
| be of the form [1.0, d, d^2, 1/d]. |
| |
| The exact behavior is specified in the following pseudo-code: |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = 1.0; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z; |
| result.w = tmp1.w; |
| |
| Given an arbitrary vector, d^2 can be obtained using the DP3 instruction |
| (using the same vector for both operands) and 1/d can be obtained from d^2 |
| using the RSQ instruction. |
| |
| This distance vector is useful for per-fragment light attenuation |
| calculations: a DP3 operation involving the distance vector and an |
| attenuation constants vector will yield the attenuation factor. |
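| |
| The sequence described above (a three-component dot product, a reciprocal |
| square root, the distance vector, and a final dot product) can be sketched |
| in Python as follows. The input vector and the attenuation constants are |
| hypothetical values chosen for illustration, and exact math stands in for |
| the approximate hardware operations. |

```python
import math


def dp3(a, b):
    """Three-component dot product using x, y, and z."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dst(op0, op1):
    """Distance vector; op0 = [NA, d^2, d^2, NA], op1 = [NA, 1/d, NA, 1/d]."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])


v = (3.0, 4.0, 0.0)               # hypothetical fragment-to-light vector
d2 = dp3(v, v)                    # d^2 = 25.0
inv_d = 1.0 / math.sqrt(d2)       # 1/d, standing in for the RSQ result
dist = dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))

consts = (1.0, 0.1, 0.01)         # hypothetical constant, linear, quadratic
weighted = dp3(dist, consts)      # c0 + c1*d + c2*d^2
print(dist, weighted)
```

| Here dist is approximately (1.0, 5.0, 25.0, 0.2). In the fixed-function |
| OpenGL lighting model, the attenuation applied to a light is the |
| reciprocal of the weighted sum computed in the last step. |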
| |
| |
| Section 3.11.5.8, EX2: Exponential Base 2 |
| |
| The EX2 instruction approximates 2 raised to the power of the scalar |
| operand and replicates it to all four components of the result |
| vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = Approx2ToX(tmp); |
| result.y = Approx2ToX(tmp); |
| result.z = Approx2ToX(tmp); |
| result.w = Approx2ToX(tmp); |
| |
| The approximation function is accurate to at least 22 bits: |
| |
| | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0, |
| |
| and, in general, |
| |
| | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)). |
| |
| The following special-case rules apply to exponential approximation: |
| |
| 1. Approx2ToX(NaN) = NaN. |
| 2. Approx2ToX(-INF) = +0.0. |
| 3. Approx2ToX(+INF) = +INF. |
| 4. Approx2ToX(+/-0.0) = +1.0. |
| |
| |
| Section 3.11.5.9, FLR: Floor |
| |
| The FLR instruction performs a component-wise floor operation on the |
| operand to generate a result vector. The floor of a value is defined as |
| the largest integer less than or equal to the value. The floor of 2.3 is |
| 2.0; the floor of -3.6 is -4.0. |
| |
| tmp = VectorLoad(op0); |
| result.x = floor(tmp.x); |
| result.y = floor(tmp.y); |
| result.z = floor(tmp.z); |
| result.w = floor(tmp.w); |
| |
| The following special-case rules apply to floor computation: |
| |
| 1. floor(NaN) = NaN. |
| 2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the |
| sign of the result is equal to the sign of the operand. |
| |
| |
| Section 3.11.5.10, FRC: Fraction |
| |
| The FRC instruction extracts the fractional portion of each component of |
| the operand to generate a result vector. The fractional portion of a |
| component is defined as the result after subtracting off the floor of the |
| component (see FLR), and is always in the range [0.00, 1.00). |
| |
| For negative values, the fractional portion is NOT the number written to |
| the right of the decimal point -- the fractional portion of -1.7 is not |
| 0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) |
| from -1.7. |
| |
| tmp = VectorLoad(op0); |
| result.x = tmp.x - floor(tmp.x); |
| result.y = tmp.y - floor(tmp.y); |
| result.z = tmp.z - floor(tmp.z); |
| result.w = tmp.w - floor(tmp.w); |
| |
| The following special-case rules, which can be derived from the rules for |
| FLR and ADD, apply to fraction computation: |
| |
| 1. fraction(NaN) = NaN. |
| 2. fraction(+/-INF) = NaN. |
| 3. fraction(+/-0.0) = +0.0. |
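| |
| A minimal Python sketch of the fraction computation follows. It is |
| illustrative only; frc is a hypothetical name, and the special cases are |
| spelled out because Python's math.floor returns an integer and so does not |
| preserve the sign of zero or accept infinities. |

```python
import math


def frc(x):
    """Fractional portion of x, following the special-case rules above."""
    if math.isnan(x) or math.isinf(x):
        return float("nan")
    if x == 0.0:
        return 0.0  # fraction(+/-0.0) = +0.0
    return x - math.floor(x)


print(frc(2.25), frc(-1.75))  # 0.25 0.25
```

| The second value shows that the fractional portion of a negative number is |
| measured from its floor (-2.0 here), not from the digits written to the |
| right of the decimal point. |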
| |
| |
| Section 3.11.5.11, KIL: Conditionally Discard Fragment |
| |
| The KIL instruction is unlike any other instruction in the instruction |
| set. This instruction evaluates components of a swizzled condition code |
| using a test expression identical to that used to evaluate condition code |
| write masks (Section 3.11.4.4). If any condition code component evaluates |
| to TRUE, the fragment is discarded. Otherwise, the instruction has no |
| effect. The condition code components are specified, swizzled, and |
| evaluated in the same manner as the condition code write mask. |
| |
| if (TestCC(rc.c***) || TestCC(rc.*c**) || |
| TestCC(rc.**c*) || TestCC(rc.***c)) { |
| // Discard the fragment. |
| } else { |
| // Do nothing. |
| } |
| |
| If the fragment is discarded, it is treated as though it were not produced |
| by rasterization. In particular, none of the per-fragment operations |
| (such as stencil tests, blends, stencil, depth, or color buffer writes) |
| are performed on the fragment. |
| |
| |
| Section 3.11.5.12, LG2: Logarithm Base 2 |
| |
| The LG2 instruction approximates the base 2 logarithm of the scalar |
| operand and replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxLog2(tmp); |
| result.y = ApproxLog2(tmp); |
| result.z = ApproxLog2(tmp); |
| result.w = ApproxLog2(tmp); |
| |
| The approximation function is accurate to at least 22 bits: |
| |
| | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22. |
| |
| Note that for large values of x, there are not enough bits in the |
| floating-point storage format to represent a result that precisely. |
| |
| The following special-case rules apply to logarithm approximation: |
| |
| 1. ApproxLog2(NaN) = NaN. |
| 2. ApproxLog2(+INF) = +INF. |
| 3. ApproxLog2(+/-0.0) = -INF. |
| 4. ApproxLog2(x) = NaN, -INF < x < -0.0. |
| 5. ApproxLog2(-INF) = NaN. |
| |
| |
| Section 3.11.5.13, LIT: Compute Light Coefficients |
| |
| The LIT instruction accelerates per-fragment lighting by computing |
| lighting coefficients for ambient, diffuse, and specular light |
| contributions. The "x" component of the operand is assumed to hold a |
| diffuse dot product (n dot VP_pli, as in the vertex lighting equations in |
| Section 2.13.1). The "y" component of the operand is assumed to hold a |
| specular dot product (n dot h_i). The "w" component of the operand is |
| assumed to hold the specular exponent of the material (s_rm). |
| |
| The "x" component of the result vector receives the value that should be |
| multiplied by the ambient light/material product (always 1.0). The "y" |
| component of the result vector receives the value that should be |
| multiplied by the diffuse light/material product (n dot VP_pli). The "z" |
| component of the result vector receives the value that should be |
| multiplied by the specular light/material product (f_i * (n dot h_i) ^ |
| s_rm). The "w" component of the result is the constant 1.0. |
| |
| Negative diffuse and specular dot products are clamped to 0.0, as is done |
| in the standard per-vertex lighting operations. In addition, if the |
| diffuse dot product is zero or negative, the specular coefficient is |
| forced to zero. |
| |
| t = VectorLoad(op0); |
| if (t.x < 0) t.x = 0; |
| if (t.y < 0) t.y = 0; |
| result.x = 1.0; |
| result.y = t.x; |
| result.z = (t.x > 0) ? ApproxPower(t.y, t.w) : 0.0; |
| result.w = 1.0; |
| |
| The exponentiation approximation used to compute result.z is identical to |
| that used in the POW instruction, including errors and the processing of |
| any special cases. |
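| |
| The coefficient computation can be sketched in Python. This is a |
| simplified, non-normative model: lit and approx_power are hypothetical |
| names, exact math replaces the hardware approximation, and a positive |
| specular exponent is assumed so that a zero specular dot product yields a |
| zero coefficient. |

```python
import math


def approx_power(a, b):
    """2^(b * log2(a)); this sketch only calls it with a > 0."""
    return 2.0 ** (b * math.log2(a))


def lit(x, y, w):
    """Return (ambient, diffuse, specular, 1.0) lighting coefficients.

    x is the diffuse dot product, y the specular dot product, and w the
    specular exponent (assumed positive in this sketch).
    """
    x = max(x, 0.0)
    y = max(y, 0.0)
    if x > 0.0 and y > 0.0:
        specular = approx_power(y, w)
    else:
        specular = 0.0
    return (1.0, x, specular, 1.0)


print(lit(0.5, 0.5, 2.0))   # (1.0, 0.5, 0.25, 1.0)
print(lit(-0.2, 0.9, 8.0))  # (1.0, 0.0, 0.0, 1.0)
```

| The second call shows the specular coefficient being forced to zero when |
| the diffuse dot product is negative. |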
| |
| |
| Section 3.11.5.14, LRP: Linear Interpolation |
| |
| The LRP instruction performs a component-wise linear interpolation to |
| yield a result vector. It interpolates between the components of the |
| second and third operands, using the first operand as a weight. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x; |
| result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y; |
| result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z; |
| result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w; |
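| |
| For illustration, the interpolation can be written as a short Python |
| function (lrp is a hypothetical name). A weight of 1.0 selects the second |
| operand and a weight of 0.0 selects the third. |

```python
def lrp(weight, a, b):
    """Component-wise weight * a + (1 - weight) * b."""
    return tuple(w * p + (1.0 - w) * q for w, p, q in zip(weight, a, b))


weights = (0.0, 0.25, 0.5, 1.0)
print(lrp(weights, (8.0,) * 4, (4.0,) * 4))  # (4.0, 5.0, 6.0, 8.0)
```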
| |
| |
| Section 3.11.5.15, MAD: Multiply and Add |
| |
| The MAD instruction performs a component-wise multiply of the first two |
| operands, and then does a component-wise add of the product to the third |
| operand to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x * tmp1.x + tmp2.x; |
| result.y = tmp0.y * tmp1.y + tmp2.y; |
| result.z = tmp0.z * tmp1.z + tmp2.z; |
| result.w = tmp0.w * tmp1.w + tmp2.w; |
| |
| |
| Section 3.11.5.16, MAX: Maximum |
| |
| The MAX instruction computes component-wise maximums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = max(tmp0.x, tmp1.x); |
| result.y = max(tmp0.y, tmp1.y); |
| result.z = max(tmp0.z, tmp1.z); |
| result.w = max(tmp0.w, tmp1.w); |
| |
| The following special cases apply to the maximum operation: |
| |
| 1. max(A,B) is always equivalent to max(B,A). |
| 2. max(NaN, <x>) == NaN, for all <x>. |
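| |
| The NaN rule matters when emulating this instruction on a CPU, because |
| many library maximum functions do not propagate NaN symmetrically. The |
| Python sketch below is illustrative; nv_max is a hypothetical name. |

```python
import math


def nv_max(a, b):
    """Maximum that returns NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return float("nan")
    return a if a > b else b


nan = float("nan")
# Python's built-in max() depends on operand order when NaN is involved.
print(math.isnan(max(1.0, nan)), math.isnan(nv_max(1.0, nan)))  # False True
```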
| |
| |
| |
| Section 3.11.5.17, MIN: Minimum |
| |
| The MIN instruction computes component-wise minimums of the values in the |
| two operands to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = min(tmp0.x, tmp1.x); |
| result.y = min(tmp0.y, tmp1.y); |
| result.z = min(tmp0.z, tmp1.z); |
| result.w = min(tmp0.w, tmp1.w); |
| |
| The following special cases apply to the minimum operation: |
| |
| 1. min(A,B) is always equivalent to min(B,A). |
| 2. min(NaN, <x>) == NaN, for all <x>. |
| |
| |
| Section 3.11.5.18, MOV: Move |
| |
| The MOV instruction copies the value of the operand to yield a result |
| vector. |
| |
| result = VectorLoad(op0); |
| |
| |
| Section 3.11.5.19, MUL: Multiply |
| |
| The MUL instruction performs a component-wise multiply of the two operands |
| to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x * tmp1.x; |
| result.y = tmp0.y * tmp1.y; |
| result.z = tmp0.z * tmp1.z; |
| result.w = tmp0.w * tmp1.w; |
| |
| The following special-case rules apply to multiplication: |
| |
| 1. "A*B" is always equivalent to "B*A". |
| 2. NaN * <x> = NaN, for all <x>. |
| 3. +/-0.0 * +/-INF = NaN. |
| 4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The |
| sign of the result is positive if the signs of the two operands match |
| and negative otherwise. |
| 5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The |
| sign of the result is positive if the signs of the two operands match |
| and negative otherwise. |
| 6. +1.0 * <x> = <x>, for all <x>. |
| |
| |
| Section 3.11.5.20, PK2H: Pack Two 16-bit Floats |
| |
| The PK2H instruction converts the "x" and "y" components of the single |
| operand into 16-bit floating-point format, packs the bit representation of |
| these two floats into a 32-bit value, and replicates that value to all |
| four components of the result vector. The PK2H instruction can be |
| reversed by the UP2H instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| /* result obtained by combining raw bits of tmp0.x, tmp0.y */ |
| result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); |
| |
| The result must be written to a register with 32-bit components (an "R" |
| register, o[COLR], or o[DEPR]). A fragment program will fail to load if |
| any other register type is specified. |
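| |
| The packing step can be sketched in Python using the struct module, whose |
| "e" format converts a value to IEEE 754 binary16. This is an assumption |
| made for illustration: the 16-bit floating-point format used by fragment |
| programs is taken here to have the same bit layout, and half_bits and pk2h |
| are hypothetical names. |

```python
import struct


def half_bits(x):
    """Bit pattern of x after conversion to a 16-bit float (binary16)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pk2h(x, y):
    """Pack x into the low 16 bits and y into the high 16 bits."""
    return half_bits(x) | (half_bits(y) << 16)


print(hex(pk2h(1.0, -2.0)))  # 0xc0003c00
```

| Note that struct.pack raises an error for values outside the binary16 |
| range, which this sketch does not handle. |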
| |
| |
| Section 3.11.5.21, PK2US: Pack Two Unsigned 16-bit Scalars |
| |
| The PK2US instruction converts the "x" and "y" components of the single |
| operand into a packed pair of 16-bit unsigned scalars. The scalars are |
| represented in a bit pattern where all '0' bits correspond to 0.0 and all |
| '1' bits correspond to 1.0. The bit representations of the two converted |
| components are packed into a 32-bit value, and that value is replicated to |
| all four components of the result vector. The PK2US instruction can be |
| reversed by the UP2US instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */ |
| us.y = round(65535.0 * tmp0.y); |
| /* result obtained by combining raw bits of us. */ |
| result.x = ((us.x) | (us.y << 16)); |
| result.y = ((us.x) | (us.y << 16)); |
| result.z = ((us.x) | (us.y << 16)); |
| result.w = ((us.x) | (us.y << 16)); |
| |
| The result must be written to a register with 32-bit components (an "R" |
| register, o[COLR], or o[DEPR]). A fragment program will fail to load if |
| any other register type is specified. |
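| |
| A Python sketch of the clamp, scale, and pack steps follows. It is |
| illustrative only: pk2us and clamp01 are hypothetical names, and rounding |
| halfway cases upward is an assumption, since the pseudocode does not state |
| how round() treats ties. |

```python
import math


def clamp01(x):
    """Clamp x to the range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def pk2us(x, y):
    """Pack two [0,1] scalars as unsigned 16-bit values (x in the low half)."""
    us_x = int(math.floor(65535.0 * clamp01(x) + 0.5))
    us_y = int(math.floor(65535.0 * clamp01(y) + 0.5))
    return us_x | (us_y << 16)


print(hex(pk2us(0.0, 1.0)))   # 0xffff0000
print(hex(pk2us(2.0, -1.0)))  # 0xffff
```

| The second call shows out-of-range inputs being clamped before conversion. |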
| |
| |
| Section 3.11.5.22, PK4B: Pack Four Signed 8-bit Scalars |
| |
| The PK4B instruction converts the four components of the single operand |
| into 8-bit signed quantities. The signed quantities are represented in a |
| bit pattern where all '0' bits correspond to -128/127 and all '1' bits |
| correspond to +127/127. The bit representations of the four converted |
| components are packed into a 32-bit value, and that value is replicated to |
| all four components of the result vector. The PK4B instruction can be |
| reversed by the UP4B instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < -128/127) tmp0.x = -128/127; |
| if (tmp0.y < -128/127) tmp0.y = -128/127; |
| if (tmp0.z < -128/127) tmp0.z = -128/127; |
| if (tmp0.w < -128/127) tmp0.w = -128/127; |
| if (tmp0.x > +127/127) tmp0.x = +127/127; |
| if (tmp0.y > +127/127) tmp0.y = +127/127; |
| if (tmp0.z > +127/127) tmp0.z = +127/127; |
| if (tmp0.w > +127/127) tmp0.w = +127/127; |
| ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */ |
| ub.y = round(127.0 * tmp0.y + 128.0); |
| ub.z = round(127.0 * tmp0.z + 128.0); |
| ub.w = round(127.0 * tmp0.w + 128.0); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| The result must be written to a register with 32-bit components (an "R" |
| register, o[COLR], or o[DEPR]). A fragment program will fail to load if |
| any other register type is specified. |
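| |
| The signed 8-bit encoding can be sketched in Python as shown below. The |
| names to_byte and pk4b are hypothetical, and rounding halfway cases upward |
| is an assumption that the pseudocode does not specify. |

```python
import math

LOWEST = -128.0 / 127.0   # encoded as byte value 0
HIGHEST = 127.0 / 127.0   # encoded as byte value 255


def to_byte(x):
    """Clamp x to [-128/127, +127/127] and map it to the range 0..255."""
    x = min(max(x, LOWEST), HIGHEST)
    return int(math.floor(127.0 * x + 128.0 + 0.5))


def pk4b(x, y, z, w):
    """Pack four signed scalars, with x in the least significant byte."""
    ub = [to_byte(c) for c in (x, y, z, w)]
    return ub[0] | (ub[1] << 8) | (ub[2] << 16) | (ub[3] << 24)


print(hex(pk4b(0.0, 1.0, -1.0, -5.0)))  # 0x1ff80
```

| In this example 0.0 maps to 128, 1.0 to 255, -1.0 to 1, and the |
| out-of-range value -5.0 is clamped and maps to 0. |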
| |
| |
| Section 3.11.5.23, PK4UB: Pack Four Unsigned 8-bit Scalars |
| |
| The PK4UB instruction converts the four components of the single operand |
| into a packed grouping of 8-bit unsigned scalars. The scalars are |
| represented in a bit pattern where all '0' bits correspond to 0.0 and all |
| '1' bits correspond to 1.0. The bit representations of the four |
| converted components are packed into a 32-bit value, and that value is |
| replicated to all four components of the result vector. The PK4UB |
| instruction can be reversed by the UP4UB instruction below. |
| |
| tmp0 = VectorLoad(op0); |
| if (tmp0.x < 0.0) tmp0.x = 0.0; |
| if (tmp0.x > 1.0) tmp0.x = 1.0; |
| if (tmp0.y < 0.0) tmp0.y = 0.0; |
| if (tmp0.y > 1.0) tmp0.y = 1.0; |
| if (tmp0.z < 0.0) tmp0.z = 0.0; |
| if (tmp0.z > 1.0) tmp0.z = 1.0; |
| if (tmp0.w < 0.0) tmp0.w = 0.0; |
| if (tmp0.w > 1.0) tmp0.w = 1.0; |
| ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */ |
| ub.y = round(255.0 * tmp0.y); |
| ub.z = round(255.0 * tmp0.z); |
| ub.w = round(255.0 * tmp0.w); |
| /* result obtained by combining raw bits of ub. */ |
| result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); |
| |
| The result must be written to a register with 32-bit components (an "R" |
| register, o[COLR], or o[DEPR]). A fragment program will fail to load if |
| any other register type is specified. |
| |
| |
| Section 3.11.5.24, POW: Exponentiation |
| |
| The POW instruction approximates the value of the first scalar operand |
| raised to the power of the second scalar operand and replicates it to all |
| four components of the result vector. |
| |
| tmp0 = ScalarLoad(op0); |
| tmp1 = ScalarLoad(op1); |
| result.x = ApproxPower(tmp0, tmp1); |
| result.y = ApproxPower(tmp0, tmp1); |
| result.z = ApproxPower(tmp0, tmp1); |
| result.w = ApproxPower(tmp0, tmp1); |
| |
| The exponentiation approximation function is defined in terms of the base |
| 2 exponentiation and logarithm approximation operations in the EX2 and LG2 |
| instructions, including errors and the processing of any special cases. |
| In particular, |
| |
| ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)). |
| |
| The following special-case rules, which can be derived from the rules in |
| the LG2, MUL, and EX2 instructions, apply to exponentiation: |
| |
| 1. ApproxPower(<x>, <y>) = NaN, if x < -0.0. |
| 2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN. |
| 3. ApproxPower(+/-0.0, +/-0.0) = NaN. |
| 4. ApproxPower(+INF, +/-0.0) = NaN. |
| 5. ApproxPower(+1.0, +/-INF) = NaN. |
| 6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0. |
| 7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0. |
| 8. ApproxPower(+1.0, <x>) = +1.0, if -INF < x < +INF. |
| 9. ApproxPower(+INF, <x>) = +INF, if x > +0.0. |
| 10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0. |
| 11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF. |
| 12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0. |
| 13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0, |
| +INF, if x > +1.0. |
| 14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0, |
| +0.0, if x > +1.0. |
| |
| Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and |
| 0*(-INF) = NaN. In many other applications, including the standard C |
| pow() function, 0^0 is defined as 1.0. This behavior can be emulated |
| using additional instructions in much the same way that the pow() |
| function is implemented on many CPUs. |
| |
| Note that a logarithm is involved even if the exponent is an integer. |
| This means that any exponentiation with a negative base will produce NaN. |
| In contrast, it is possible in a "normal" mathematical formulation to |
| raise negative numbers to integral powers (e.g., (-3)^2 == 9, and |
| (-0.5)^-2 == 4). |
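| |
| The two notes above can be checked with a Python model that composes a |
| base 2 logarithm and a base 2 exponential. The sketch is non-normative: |
| the function names are hypothetical, exact math replaces the hardware |
| approximations, and the special cases of the logarithm and exponential are |
| written out by hand because Python's math functions raise exceptions for |
| some of them. |

```python
import math

INF = float("inf")
NAN = float("nan")


def approx_log2(x):
    """Base 2 logarithm with the LG2 special cases."""
    if math.isnan(x) or x < 0.0:
        return NAN
    if x == 0.0:
        return -INF  # applies to both +0.0 and -0.0
    if x == INF:
        return INF
    return math.log2(x)


def approx_2_to_x(x):
    """Base 2 exponential with the EX2 special cases."""
    if math.isnan(x):
        return NAN
    if x == INF:
        return INF
    if x == -INF:
        return 0.0
    try:
        return 2.0 ** x
    except OverflowError:
        return INF  # result too large for a Python float


def approx_power(a, b):
    """Power computed as 2^(b * log2(a))."""
    return approx_2_to_x(b * approx_log2(a))


print(approx_power(2.0, 3.0))               # 8.0
print(math.isnan(approx_power(0.0, 0.0)))   # True, since 0 * -INF is NaN
print(math.isnan(approx_power(-3.0, 2.0)))  # True, negative base
```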
| |
| |
| Section 3.11.5.25, RCP: Reciprocal |
| |
| The RCP instruction approximates the reciprocal of the scalar operand and |
| replicates it to all four components of the result vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxReciprocal(tmp); |
| result.y = ApproxReciprocal(tmp); |
| result.z = ApproxReciprocal(tmp); |
| result.w = ApproxReciprocal(tmp); |
| |
| The approximation function is accurate to at least 22 bits: |
| |
| | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0. |
| |
| The following special-case rules apply to reciprocation: |
| |
| 1. ApproxReciprocal(NaN) = NaN. |
| 2. ApproxReciprocal(+INF) = +0.0. |
| 3. ApproxReciprocal(-INF) = -0.0. |
| 4. ApproxReciprocal(+0.0) = +INF. |
| 5. ApproxReciprocal(-0.0) = -INF. |
| |
| |
| Section 3.11.5.26, RFL: Reflection Vector |
| |
| The RFL instruction computes the reflection of the second vector operand |
| (the "direction" vector) about the vector specified by the first vector |
| operand (the "axis" vector). Both operands are treated as 3D vectors (the |
| w components are ignored). The result vector is another 3D vector (the |
| "reflected direction" vector). The length of the result vector, ignoring |
| rounding errors, should equal that of the second operand. |
| |
| axis = VectorLoad(op0); |
| direction = VectorLoad(op1); |
| tmp.w = (axis.x * axis.x + axis.y * axis.y + |
| axis.z * axis.z); |
| tmp.x = (axis.x * direction.x + axis.y * direction.y + |
| axis.z * direction.z); |
| tmp.x = 2.0 * tmp.x; |
| tmp.x = tmp.x / tmp.w; |
| result.x = tmp.x * axis.x - direction.x; |
| result.y = tmp.x * axis.y - direction.y; |
| result.z = tmp.x * axis.z - direction.z; |
| |
| A fragment program will fail to load if the w component of the result is |
| enabled in the component write mask (see the <optionalWriteMask> rule in |
| the grammar). |
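| |
| A direct Python transcription of the reflection computation is shown below |
| for illustration (rfl is a hypothetical name). Because the computation |
| divides by the squared length of the axis, the axis vector does not need |
| to be normalized. |

```python
def rfl(axis, direction):
    """Reflect direction about axis; both are treated as 3D vectors."""
    ax, ay, az = axis[:3]
    dx, dy, dz = direction[:3]
    axis_len_sq = ax * ax + ay * ay + az * az
    scale = 2.0 * (ax * dx + ay * dy + az * dz) / axis_len_sq
    return (scale * ax - dx, scale * ay - dy, scale * az - dz)


# Reflect (1, 0, 1) about an unnormalized axis along +z.
print(rfl((0.0, 0.0, 2.0), (1.0, 0.0, 1.0)))  # (-1.0, 0.0, 1.0)
```

| The component along the axis is preserved, the perpendicular component is |
| negated, and the length of the result equals that of the direction vector. |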
| |
| |
| Section 3.11.5.27, RSQ: Reciprocal Square Root |
| |
| The RSQ instruction approximates the reciprocal of the square root of the |
| scalar operand and replicates it to all four components of the result |
| vector. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxRSQRT(tmp); |
| result.y = ApproxRSQRT(tmp); |
| result.z = ApproxRSQRT(tmp); |
| result.w = ApproxRSQRT(tmp); |
| |
| The approximation function is accurate to at least 22 bits: |
| |
| | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0. |
| |
| The following special-case rules apply to reciprocal square roots: |
| |
| 1. ApproxRSQRT(NaN) = NaN. |
| 2. ApproxRSQRT(+INF) = +0.0. |
| 3. ApproxRSQRT(-INF) = NaN. |
| 4. ApproxRSQRT(+0.0) = +INF. |
| 5. ApproxRSQRT(-0.0) = -INF. |
| 6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0. |
| |
| |
| Section 3.11.5.28, SEQ: Set on Equal To |
| |
| The SEQ instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is equal to that of the second, and 0.0 |
| otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SEQ: |
| |
| 1. (<x> == <y>) and (<y> == <x>) always produce the same result. |
| 2. (NaN == <x>) is FALSE for all <x>, including NaN. |
| 3. (+INF == +INF) and (-INF == -INF) are TRUE. |
| 4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE. |
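| |
| These rules match ordinary IEEE 754 equality, which the following Python |
| sketch demonstrates (seq is a hypothetical name, and Python floats are |
| assumed to be IEEE 754 doubles). |

```python
def seq(a, b):
    """Component-wise equality test producing 1.0 or 0.0."""
    return tuple(1.0 if x == y else 0.0 for x, y in zip(a, b))


nan = float("nan")
inf = float("inf")
print(seq((nan, inf, -0.0, 1.0), (nan, inf, 0.0, 2.0)))
# (0.0, 1.0, 1.0, 0.0)
```

| NaN compares unequal even to itself, matching infinities compare equal, |
| and the two signed zeros compare equal. |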
| |
| |
| Section 3.11.5.29, SFL: Set on False |
| |
| The SFL instruction is a degenerate case of the other "Set on" |
| instructions that sets all components of the result vector to |
| 0.0. |
| |
| result.x = 0.0; |
| result.y = 0.0; |
| result.z = 0.0; |
| result.w = 0.0; |
| |
| |
| Section 3.11.5.30, SGE: Set on Greater Than or Equal |
| |
| The SGE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is greater than or equal to that of the |
| second, and 0.0 otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SGE: |
| |
| 1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>. |
| 2. (+INF >= +INF) and (-INF >= -INF) are TRUE. |
| 3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE. |
| |
| |
| Section 3.11.5.31, SGT: Set on Greater Than |
| |
| The SGT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is greater than that of the second, and |
| 0.0 otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SGT: |
| |
| 1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>. |
| 2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE. |
| |
| |
| Section 3.11.5.32, SIN: Sine |
| |
| The SIN instruction approximates the sine of the angle specified by the |
| scalar operand and replicates it to all four components of the result |
| vector. The angle is specified in radians and does not have to be in the |
| range [0,2*PI]. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ApproxSine(tmp); |
| result.y = ApproxSine(tmp); |
| result.z = ApproxSine(tmp); |
| result.w = ApproxSine(tmp); |
| |
| The approximation function is accurate to at least 22 bits with an angle |
| in the range [0,2*PI]. |
| |
| | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. |
| |
| The error in the approximation will typically increase with the absolute |
| value of the angle when the angle falls outside the range [0,2*PI]. |
| |
| The following special-case rules apply to sine approximation: |
| |
| 1. ApproxSine(NaN) = NaN. |
| 2. ApproxSine(+/-INF) = NaN. |
| 3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the |
| sign of the single operand. |
| |
| |
| Section 3.11.5.33, SLE: Set on Less Than or Equal |
| |
| The SLE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is less than or equal to that of the |
| second, and 0.0 otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SLE: |
| |
| 1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>. |
| 2. (+INF <= +INF) and (-INF <= -INF) are TRUE. |
| 3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE. |
| |
| |
| Section 3.11.5.34, SLT: Set on Less Than |
| |
| The SLT instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is less than that of the second, and 0.0 |
| otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SLT: |
| |
| 1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>. |
| 2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE. |
| |
| |
| Section 3.11.5.35, SNE: Set on Not Equal |
| |
| The SNE instruction performs a component-wise comparison of the two |
| operands. Each component of the result vector is 1.0 if the corresponding |
| component of the first operand is not equal to that of the second, and 0.0 |
| otherwise. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0; |
| result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0; |
| result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0; |
| result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0; |
| |
| The following special-case rules apply to SNE: |
| |
| 1. (<x> != <y>) and (<y> != <x>) always produce the same result. |
| 2. (NaN != <x>) is TRUE for all <x>, including NaN. |
| 3. (+INF != +INF) and (-INF != -INF) are FALSE. |
| 4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE. |
| |
| |
| Section 3.11.5.36, STR: Set on True |
| |
| The STR instruction is a degenerate case of the other "Set on" |
| instructions that sets all components of the result vector to 1.0. |
| |
| result.x = 1.0; |
| result.y = 1.0; |
| result.z = 1.0; |
| result.w = 1.0; |
| |
| |
| Section 3.11.5.37, SUB: Subtract |
| |
| The SUB instruction performs a component-wise subtraction of the second |
| operand from the first to yield a result vector. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| result.x = tmp0.x - tmp1.x; |
| result.y = tmp0.y - tmp1.y; |
| result.z = tmp0.z - tmp1.z; |
| result.w = tmp0.w - tmp1.w; |
| |
| The SUB instruction is completely equivalent to an identical ADD |
| instruction in which the negate operator on the second operand is |
| reversed: |
| |
| 1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2". |
| 2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2". |
| 3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|". |
| 4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|". |
| |
| |
| Section 3.11.5.38, TEX: Texture Lookup |
| |
| The TEX instruction performs a filtered texture lookup using the texture |
| target given by <texImageTarget> belonging to the texture image unit given |
| by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", |
| and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, |
| TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. |
| |
| The (s,t,r) texture coordinates used for the lookup are the x, y, and z |
| components of the single operand. |
| |
| The texture lookup is performed as specified in Section 3.8. The LOD |
| calculations in Section 3.8.5 are performed using an implementation |
| dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. |
| The mapping of filtered texture components to the components of the result |
| vector is dependent on the base internal format of the texture and is |
| specified in Table X.5. |
| |
| Result Vector Components |
| Base Internal Format X Y Z W |
| -------------------- ----- ----- ----- ----- |
| ALPHA 0.0 0.0 0.0 At |
| LUMINANCE Lt Lt Lt 1.0 |
| LUMINANCE_ALPHA Lt Lt Lt At |
| INTENSITY It It It It |
| RGB Rt Gt Bt 1.0 |
| RGBA Rt Gt Bt At |
| HILO_NV (signed) HIt LOt HEMI 1.0 |
| HILO_NV (unsigned) HIt LOt 1.0 1.0 |
| DSDT_NV DSt DTt 0.0 1.0 |
| DSDT_MAG_NV DSt DTt MAGt 1.0 |
| DSDT_MAG_INTENSITY_NV DSt DTt MAGt It |
| FLOAT_R_NV Rt 0.0 0.0 1.0 |
| FLOAT_RG_NV Rt Gt 0.0 1.0 |
| FLOAT_RGB_NV Rt Gt Bt 1.0 |
| FLOAT_RGBA_NV Rt Gt Bt At |
| |
| Table X.5: Mapping of filtered texel components to result vector |
| components for the TEX instruction. 0.0 and 1.0 indicate that the |
| corresponding constant value is written to the result vector. |
| DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY, |
| as specified in the texture's depth texture mode. |
| |
| For HILO_NV textures with signed components, "HEMI" is defined as |
| sqrt(MAX(0, 1-(HIt^2+LOt^2))). |
| |
| This instruction specifies a particular texture target, ignoring the |
| standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, |
| TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended |
| OpenGL. If the specified texture target has a consistent set of images, a |
| lookup is performed. Otherwise, the result of the instruction is the |
| vector (0,0,0,0). |
| |
| Although this instruction allows the selection of any texture target, a |
| fragment program cannot use more than one texture target for any given |
| texture image unit. |
| |
| |
| Section 3.11.5.39, TXD: Texture Lookup with Derivatives |
| |
| The TXD instruction performs a filtered texture lookup using the texture |
| target given by <texImageTarget> belonging to the texture image unit given |
| by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", |
| and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, |
| TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. |
| |
| The (s,t,r) texture coordinates used for the lookup are the x, y, and z |
| components of the first operand. The partial derivatives in the X |
| direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z |
| components of the second operand. The partial derivatives in the Y |
| direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z |
| components of the third operand. |
| |
| The texture lookup is performed as specified in Section 3.8. The LOD |
| calculations in Section 3.8.5 are performed using the specified partial |
| derivatives. The mapping of filtered texture components to the components |
| of the result vector is dependent on the base internal format of the |
| texture and is specified in Table X.5. |
| |
| This instruction specifies a particular texture target, ignoring the |
| standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, |
| TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended |
| OpenGL. If the specified texture target has a consistent set of images, a |
| lookup is performed. Otherwise, the result of the instruction is the |
| vector (0,0,0,0). |
| |
| Although this instruction allows the selection of any texture target, a |
| fragment program cannot use more than one texture target for any given |
| texture image unit. |
| |
| |
| Section 3.11.5.40, TXP: Projective Texture Lookup |
| |
| The TXP instruction performs a filtered texture lookup using the texture |
| target given by <texImageTarget> belonging to the texture image unit given |
| by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE", |
| and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, |
| TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. |
| |
| For cube map textures, the (s,t,r) texture coordinates used for the lookup |
| are given by x, y, and z, respectively. For all other textures, the |
| (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and |
| z/w, respectively, where x, y, z, and w are the corresponding components |
| of the operand. |
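| |
| A minimal C sketch of the coordinate selection just described follows. |
| It is illustrative only: the function name txp_coords and the is_cube |
| flag are assumptions, and the behavior for w equal to zero is not |
| addressed here. |

```c
#include <assert.h>
#include <stddef.h>

/* Compute the (s,t,r) lookup coordinates for TXP from a four-component
   operand.  Cube map lookups use x, y, z directly; all other targets
   divide by w. */
static void txp_coords(const float op[4], int is_cube, float str[3])
{
    int i;

    assert(op != NULL && str != NULL);
    for (i = 0; i < 3; i++)
        str[i] = is_cube ? op[i] : op[i] / op[3];
}
```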
| |
| The texture lookup is performed as specified in Section 3.8. The LOD |
| calculations in Section 3.8.5 are performed using an implementation |
| dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. |
| The mapping of filtered texture components to the components of the result |
| vector is dependent on the base internal format of the texture and is |
| specified in Table X.5. |
| |
| This instruction specifies a particular texture target, ignoring the |
| standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, |
| TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended |
| OpenGL. If the specified texture target has a consistent set of images, a |
| lookup is performed. Otherwise, the result of the instruction is the |
| vector (0,0,0,0). |
| |
| Although this instruction allows the selection of any texture target, a |
| fragment program cannot use more than one texture target for any given |
| texture image unit. |
| |
| |
| Section 3.11.5.41, UP2H: Unpack Two 16-Bit Floats |
| |
| The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit |
| scalar operand. The first 16-bit float (stored in the 16 least |
| significant bits) is written into the "x" and "z" components of the result |
| vector; the second is written into the "y" and "w" components of the |
| result vector. |
| |
| This operation undoes the type conversion and packing performed by the |
| PK2H instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = (fp16) (RawBits(tmp) & 0xFFFF); |
| result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); |
| result.z = (fp16) (RawBits(tmp) & 0xFFFF); |
| result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); |
| |
| Since the source operand must be a 32-bit scalar, a fragment program will |
| fail to load if the operand is not obtained from a register with 32-bit |
| components or from a program parameter. |
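| |
| The unpacking can be sketched in C as follows. This is an illustration, |
| not part of the extension. It assumes that fp16 values use 1 sign bit, |
| 5 exponent bits (bias 15), and 10 mantissa bits, with denormals, |
| infinities, and NaNs; that layout should be checked against the fp16 |
| definition given elsewhere in this specification. The names |
| fp16_to_float and up2h are hypothetical, and the operand is passed as |
| its raw 32-bit pattern. |

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 16-bit float (assumed layout: s1 e5 m10, bias 15). */
static float fp16_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or denormal: mantissa * 2^-24. */
        value = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* Infinity when the mantissa is zero, NaN otherwise. */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + mantissa) * 2^(exponent - 25). */
        value = ldexpf((float)(mantissa | 0x400), exponent - 25);
    }
    return sign ? -value : value;
}

/* Unpack the low half into x and z, and the high half into y and w. */
static void up2h(uint32_t raw, float result[4])
{
    assert(result != NULL);
    result[0] = fp16_to_float((uint16_t)(raw & 0xFFFF));
    result[1] = fp16_to_float((uint16_t)((raw >> 16) & 0xFFFF));
    result[2] = result[0];
    result[3] = result[1];
}
```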
| |
| |
| Section 3.11.5.42, UP2US: Unpack Two Unsigned 16-Bit Scalars |
| |
| The UP2US instruction unpacks two 16-bit unsigned values packed together |
| in a 32-bit scalar operand. The unsigned quantities are encoded where a |
| bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' |
| bits corresponds to 1.0. The "x" and "z" components of the result vector |
| are obtained from the 16 least significant bits of the operand; the "y" |
| and "w" components are obtained from the 16 most significant bits. |
| |
| This operation undoes the type conversion and packing performed by the |
| PK2US instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; |
| result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; |
| result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; |
| result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; |
| |
| Since the source operand must be a 32-bit scalar, a fragment program will |
| fail to load if the operand is not obtained from a register with 32-bit |
| components or from a program parameter. |
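| |
| For illustration only, the UP2US pseudocode above translates directly |
| into C. The function name up2us is hypothetical, and the operand is |
| passed as its raw 32-bit pattern rather than as a float. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack two unsigned 16-bit values: 0x0000 maps to 0.0 and 0xFFFF
   maps to 1.0.  The low half feeds x and z, the high half y and w. */
static void up2us(uint32_t raw, float result[4])
{
    assert(result != NULL);
    result[0] = (float)((raw >> 0) & 0xFFFF) / 65535.0f;
    result[1] = (float)((raw >> 16) & 0xFFFF) / 65535.0f;
    result[2] = result[0];
    result[3] = result[1];
}
```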
| |
| |
| Section 3.11.5.43, UP4B: Unpack Four Signed 8-Bit Values |
| |
| The UP4B instruction unpacks four 8-bit signed values packed together in a |
| 32-bit scalar operand. The signed quantities are encoded where a bit |
| pattern of all '0' bits corresponds to -128/127 and a pattern of all '1' |
| bits corresponds to +127/127. The "x" component of the result vector is |
| the converted value corresponding to the 8 least significant bits of the |
| operand; the "w" component corresponds to the 8 most significant bits. |
| |
| This operation undoes the type conversion and packing performed by the |
| PK4B instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0; |
| result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0; |
| result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0; |
| result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0; |
| |
| Since the source operand must be a 32-bit scalar, a fragment program will |
| fail to load if the operand is not obtained from a register with 32-bit |
| components or from a program parameter. |
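| |
| A C sketch of the UP4B conversion follows; it is illustrative and not |
| part of the extension. The function name up4b is hypothetical. Note |
| that each byte is converted to a signed int before the bias of 128 is |
| subtracted, since the subtraction would wrap around in unsigned |
| arithmetic. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack four biased signed bytes: 0x00 maps to -128/127 and 0xFF maps
   to +127/127.  Component x comes from the least significant byte. */
static void up4b(uint32_t raw, float result[4])
{
    int i;

    assert(result != NULL);
    for (i = 0; i < 4; i++) {
        int byte = (int)((raw >> (8 * i)) & 0xFF);
        result[i] = (float)(byte - 128) / 127.0f;
    }
}
```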
| |
| |
| Section 3.11.5.44, UP4UB: Unpack Four Unsigned 8-Bit Scalars |
| |
| The UP4UB instruction unpacks four 8-bit unsigned values packed together |
| in a 32-bit scalar operand. The unsigned quantities are encoded where a |
| bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' |
| bits corresponds to 1.0. The "x" component of the result vector is |
| obtained from the 8 least significant bits of the operand; the "w" |
| component is obtained from the 8 most significant bits. |
| |
| This operation undoes the type conversion and packing performed by the |
| PK4UB instruction. |
| |
| tmp = ScalarLoad(op0); |
| result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0; |
| result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0; |
| result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0; |
| result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0; |
| |
| Since the source operand must be a 32-bit scalar, a fragment program will |
| fail to load if the operand is not obtained from a register with 32-bit |
| components or from a program parameter. |
| |
| |
| Section 3.11.5.45, X2D: 2D Coordinate Transformation |
| |
| The X2D instruction multiplies the 2D offset vector specified by the "x" |
| and "y" components of the second vector operand by the 2x2 matrix |
| specified by the four components of the third vector operand, and adds the |
| transformed offset vector to the 2D vector specified by the "x" and "y" |
| components of the first vector operand. The first component of the sum is |
| written to the "x" and "z" components of the result; the second component |
| is written to the "y" and "w" components of the result. |
| |
| The X2D instruction can be used to displace texture coordinates in the |
| same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader |
| extension. |
| |
| tmp0 = VectorLoad(op0); |
| tmp1 = VectorLoad(op1); |
| tmp2 = VectorLoad(op2); |
| result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; |
| result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; |
| result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; |
| result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; |
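| |
| The same computation can be sketched in C, for illustration only. The |
| function name x2d is hypothetical. As the pseudocode shows, the third |
| operand holds the 2x2 matrix in row-major order: (x, y) is the first |
| row and (z, w) is the second. |

```c
#include <assert.h>
#include <stddef.h>

/* result.xy = op0.xy + M * op1.xy, where M = [op2.x op2.y; op2.z op2.w].
   The two sums are replicated into result.zw. */
static void x2d(const float op0[4], const float op1[4],
                const float op2[4], float result[4])
{
    assert(op0 != NULL && op1 != NULL && op2 != NULL && result != NULL);
    result[0] = op0[0] + op1[0] * op2[0] + op1[1] * op2[1];
    result[1] = op0[1] + op1[0] * op2[2] + op1[1] * op2[3];
    result[2] = result[0];
    result[3] = result[1];
}
```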
| |
| |
| Section 3.11.6, Fragment Program Outputs |
| |
| Upon completion of fragment program execution, the output registers are |
| used to replace the fragment's associated data. |
| |
| The RGBA color of the fragment is taken from the color output register |
| used by the program (COLR or COLH). The R, G, B, and A color components |
| are extracted from the "x", "y", "z", and "w" components, respectively, of |
| the output register and are clamped to the range [0,1]. |
| |
| If the DEPR output register is written by the fragment program, the depth |
| value of the fragment is taken from the z component of the DEPR output |
| register. If depth clamping is enabled, the depth value is clamped to the |
| range [min(n,f), max(n,f)], where n and f are the near and far depth range |
| values. If depth clamping is disabled, the fragment is discarded if its |
| depth value is outside the range [min(n,f), max(n,f)]. |
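| |
| The output handling described above can be sketched in C as follows. |
| This is an illustration under stated assumptions, not part of the |
| extension: the names clamp01 and resolve_depth are hypothetical, and |
| the depth clamp enable is modeled as a plain integer flag. |

```c
#include <assert.h>
#include <stddef.h>

/* Clamp one color component to the range [0,1]. */
static float clamp01(float c)
{
    return c < 0.0f ? 0.0f : (c > 1.0f ? 1.0f : c);
}

/* Apply the depth rules to a value taken from the DEPR register.
   Returns 1 if the fragment survives and 0 if it is discarded.  With
   depth clamping enabled, *depth is clamped to [min(n,f), max(n,f)]. */
static int resolve_depth(float *depth, float n, float f,
                         int depth_clamp_enabled)
{
    float lo = n < f ? n : f;
    float hi = n < f ? f : n;

    assert(depth != NULL);
    if (depth_clamp_enabled) {
        if (*depth < lo) *depth = lo;
        if (*depth > hi) *depth = hi;
        return 1;
    }
    return (*depth >= lo && *depth <= hi) ? 1 : 0;
}
```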
| |
| |
| Section 3.11.7, Required Fragment Program State |
| |
| The state required for managing fragment programs consists of: |
| |
| a bit indicating whether or not fragment program mode is enabled; |
| |
| an unsigned integer naming the currently bound fragment program; |
| |
| and the state that must be maintained to indicate which integers are |
| currently in use as fragment program names. |
| |
| Fragment program mode is initially disabled. The initial state of all 128 |
| fragment program parameter registers is (0,0,0,0). The initial currently |
| bound fragment program is zero. |
| |
| Each fragment program object consists of: |
| |
| an enumerant giving the program target (FRAGMENT_PROGRAM_NV); |
| |
| a boolean indicating whether the program is resident; |
| |
| an array of type ubyte containing the program string; |
| |
| an integer representing the length of the program string array; |
| |
| one four-component floating-point vector for each named local |
| parameter in the program; |
| |
| and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component |
| floating-point vectors to hold numbered local parameters, each initially |
| set to (0,0,0,0). |
| |
| Initially, no program objects exist. |
| |
| Additionally, the state required during the execution of a fragment |
| program consists of: twelve 4-component floating-point fragment attribute |
| registers, thirty-two 128-bit physical temporary registers, and a single |
| 4-component condition code, whose components have one of four values (LT, |
| EQ, GT, or UN). |
| |
| Each time a fragment program is executed, the fragment attribute registers |
| are initialized with the fragment's location and associated data, all |
| temporary register components are initialized to zero, and all condition |
| code components are initialized to EQ. |
| |
| |
| Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140). |
| No changes to the text of the section. |
| |
| |
| Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment |
| Operations and the Framebuffer) |
| |
| None |
| |
| Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions) |
| |
| Add new section 5.7, Programs (after "Flush and Finish") |
| |
| Programs are specified as an array of ubytes used to control the operation |
| of portions of the GL. The array is a string of ASCII characters encoding |
| the program. |
| |
| The command |
| |
| LoadProgramNV(enum target, uint id, sizei len, const ubyte *program); |
| |
| loads a program. The target parameter specifies the type of program |
| loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or |
| FRAGMENT_PROGRAM_NV. VERTEX_PROGRAM_NV specifies a program to be executed |
| in vertex program mode as each vertex is specified. |
| VERTEX_STATE_PROGRAM_NV specifies a program to be run manually to update |
| vertex state. FRAGMENT_PROGRAM_NV specifies a program to be executed in |
| fragment program mode as each fragment is rasterized. |
| |
| Multiple programs can be loaded with different names. id names the |
| program to load. The name space for programs is the set of positive |
| integers (zero is reserved). The error INVALID_VALUE is generated by |
| LoadProgramNV if a program is loaded with an id of zero. The error |
| INVALID_OPERATION is generated by LoadProgramNV if a program is loaded |
| for an id that is currently loaded with a program of a different program |
| target. program is a pointer to an array of ubytes that represents the |
| program being loaded. The length of the array in ubytes is indicated by |
| len. |
| |
| At program load time, the program is parsed into a set of tokens possibly |
| separated by white space. Spaces, tabs, newlines, carriage returns, and |
| comments are considered whitespace. Comments begin with the character "#" |
| and are terminated by a newline, a carriage return, or the end of the |
| program array. Tokens are processed in a case-sensitive manner: upper |
| and lower-case letters are not considered equivalent. |
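| |
| As a hedged illustration of the comment rule above, the C sketch below |
| removes comments from a program array before any tokenization. It is |
| not the parser used by any implementation; the function name |
| strip_comments is hypothetical, and the sketch does not split the |
| remaining text into tokens. |

```c
#include <assert.h>
#include <stddef.h>

/* Copy src (len bytes) into dst, dropping each comment: everything from
   '#' up to, but not including, the newline or carriage return that
   ends it (or up to the end of the array).  dst must have room for
   len + 1 bytes.  Returns the number of bytes written, excluding the
   terminating NUL. */
static size_t strip_comments(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        if (src[i] == '#') {
            while (i < len && src[i] != '\n' && src[i] != '\r')
                i++;
        } else {
            dst[out++] = src[i++];
        }
    }
    dst[out] = '\0';
    return out;
}
```

| The line terminator is kept so that later stages still see it as |
| whitespace separating tokens. |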
| |
| Each program target has a corresponding Backus-Naur Form (BNF) grammar |
| specifying the syntactically valid sequences for programs of the specified |
| type. The set of valid tokens can be inferred from the grammar. The |
| token "" represents an empty string and is used to indicate optional |
| rules. A program is invalid if it contains any undefined tokens or |
| characters. |
| |
| The error INVALID_OPERATION is generated by LoadProgramNV if a program |
| fails to load because it is not syntactically correct or fails to satisfy |
| all of the semantic restrictions corresponding to the program target. |
| |
| A successfully loaded program is parsed into a sequence of instructions. |
| Each instruction is identified by its tokenized name. The operation of |
| these instructions is specific to the program target and is defined |
| elsewhere. |
| |
| A successfully loaded program replaces the program previously assigned to |
| the name specified by id. If the OUT_OF_MEMORY error is generated by |
| LoadProgramNV, no change is made to the previous contents of the named |
| program. |
| |
| Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset |
| into the program string most recently passed to LoadProgramNV indicating |
| the position of the first error, if any, in the program. If the program |
| fails to load because of a semantic restriction that cannot be determined |
| until the program is fully scanned, the error position will be len, the |
| length of the program. If the program loads successfully, the value of |
| PROGRAM_ERROR_POSITION_NV is assigned the value negative one. |
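| |
| An application may want to report the error position as a line and |
| column. The C sketch below is one hypothetical way to do so and is not |
| part of this extension. The function name error_location is an |
| assumption, and the sketch treats only the newline character as a line |
| terminator, which is a simplification. |

```c
#include <assert.h>
#include <stddef.h>

/* Translate a ubyte offset, such as the value queried through
   PROGRAM_ERROR_POSITION_NV, into a 1-based line and column.
   Returns 0 if pos is negative (no error recorded) or greater than len,
   and 1 otherwise.  pos equal to len is accepted, since that value is
   used for errors detected only after the full program is scanned. */
static int error_location(const unsigned char *program, size_t len,
                          long pos, int *line, int *column)
{
    size_t i;

    assert(program != NULL && line != NULL && column != NULL);
    if (pos < 0 || (size_t)pos > len)
        return 0;
    *line = 1;
    *column = 1;
    for (i = 0; i < (size_t)pos; i++) {
        if (program[i] == '\n') {
            (*line)++;
            *column = 1;
        } else {
            (*column)++;
        }
    }
    return 1;
}
```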
| |
| For targets whose programs are executed automatically (e.g., vertex and |
| fragment programs), there must be a current program. The current vertex |
| program is executed automatically in vertex program mode as vertices are |
| specified. The current fragment program is executed automatically in |
| fragment program mode as fragments are generated by rasterization. |
| Current programs for a program target are updated by |
| |
| BindProgramNV(enum target, uint id); |
| |
| where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV. The error |
| INVALID_OPERATION is generated by BindProgramNV if id names a program that |
| has a type different than target (for example, if id names a vertex state |
| program as described in section 2.14.4). |
| |
| Binding to a nonexistent program id does not generate an error. In |
| particular, binding to program id zero does not generate an error. |
| However, because program zero cannot be loaded, program zero is always |
| nonexistent. If a program id is successfully loaded with a new vertex |
| program and id is also the currently bound vertex program, the new program |
| is considered the currently bound vertex program. |
| |
| The INVALID_OPERATION error is generated when both vertex program mode is |
| enabled and Begin is called (or when a command that performs an implicit |
| Begin is called) if the current vertex program is nonexistent or not |
| valid. A vertex program may not be valid for reasons explained in section |
| 2.14.5. |
| |
| The INVALID_OPERATION error is generated when both fragment program mode |
| is enabled and Begin, another GL command that performs an implicit Begin, |
| or any other GL command that generates fragments is called, if the current |
| fragment program is nonexistent or not valid. A fragment program may be |
| invalid for reasons explained in Section 3.11.3. |
| |
| Programs are deleted by calling |
| |
| void DeleteProgramsNV(sizei n, const uint *ids); |
| |
| ids contains n names of programs to be deleted. After a program is |
| deleted, it becomes nonexistent, and its name is again unused. If a |
| program that is currently bound is deleted, it is as though BindProgramNV |
| has been executed with the same target as the deleted program and program |
| zero. Unused names in ids are silently ignored, as is the value zero. |
| |
| The command |
| |
| void GenProgramsNV(sizei n, uint *ids); |
| |
| returns n currently unused program names in ids. These names are marked |
| as used, for the purposes of GenProgramsNV only, but they become existent |
| programs only when they are first loaded using LoadProgramNV. |
| |
| An implementation may choose to establish a working set of programs on |
| which binding and/or manual execution are performed with higher |
| performance. A program that is currently part of this working set is said |
| to be resident. |
| |
| The command |
| |
| boolean AreProgramsResidentNV(sizei n, const uint *ids, |
| boolean *residences); |
| |
| returns TRUE if all of the n programs named in ids are resident, or if the |
| implementation does not distinguish a working set. If at least one of the |
| programs named in ids is not resident, then FALSE is returned, and the |
| residence of each program is returned in residences. Otherwise the |
| contents of residences are not changed. If any of the names in ids are |
| nonexistent or zero, FALSE is returned, the error INVALID_VALUE is |
| generated, and the contents of residences are indeterminate. The |
| residence status of a single named program can also be queried by calling |
| GetProgramivNV (Section 6.1.13) with id set to the name of the program and |
| pname set to PROGRAM_RESIDENT_NV. |
| |
| AreProgramsResidentNV indicates only whether a program is currently |
| resident, not whether it could be made resident. An implementation |
| may choose to make a program resident only on first use, for example. The |
| client may guide the GL implementation in determining which programs |
| should be resident by requesting a set of programs to make resident. |
| |
| The command |
| |
| void RequestResidentProgramsNV(sizei n, const uint *ids); |
| |
| requests that the n programs named in ids should be made resident. |
| While all the programs are not guaranteed to become resident, |
| the implementation should make a best effort to make as many of |
| the programs resident as possible. As a result of making the |
| requested programs resident, program names not among the requested |
| programs may become non-resident. Higher priority for residency |
| should be given to programs listed earlier in the ids array. |
| RequestResidentProgramsNV silently ignores attempts to make resident |
| nonexistent program names or zero. AreProgramsResidentNV can be |
| called after RequestResidentProgramsNV to determine which programs |
| actually became resident. |
| |
| The commands |
| |
| void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name, |
| float x, float y, float z, float w); |
| void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name, |
| double x, double y, double z, double w); |
| void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name, |
| const float v[]); |
| void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name, |
| const double v[]); |
| |
| specify a new value for the named program local parameter <name> belonging |
| to the fragment program specified by <id>. <name> is a pointer to an |
| array of ubytes holding the parameter name. <len> specifies the number of |
| ubytes in the array given by <name>. The new x, y, z, and w components of |
| the named local parameter are given by x, y, z, and w, respectively, for |
| ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0], |
| v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and |
| ProgramNamedParameter4dvNV. The error INVALID_OPERATION is generated if |
| <id> specifies a nonexistent program or a program whose type does not |
| support named local parameters. The error INVALID_VALUE is generated |
| if <name> does not specify the name of a local parameter in the program |
| corresponding to <id>. The error INVALID_VALUE is also generated if <len> |
| is zero. |
| |
| The commands |
| |
| void ProgramLocalParameter4fARB(enum target, uint index, |
| float x, float y, float z, float w); |
| void ProgramLocalParameter4fvARB(enum target, uint index, |
| const float *params); |
| void ProgramLocalParameter4dARB(enum target, uint index, |
| double x, double y, double z, double w); |
| void ProgramLocalParameter4dvARB(enum target, uint index, |
| const double *params); |
| |
| update the values of the numbered program local parameter <index> |
| belonging to the program object currently bound to <target>. For |
| ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four |
| components of the parameter are updated with the values of <x>, <y>, <z>, |
| and <w>, respectively. For ProgramLocalParameter4fvARB and |
| ProgramLocalParameter4dvARB, the four components of the parameter are |
| updated with the array of four values pointed to by <params>. The error |
| INVALID_VALUE is generated if <index> is greater than or equal to the |
| number of numbered program local parameters supported by <target>. |
| |
| |
| Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and |
| State Requests) |
| |
| Modify Section 6.1.11, Pointer and String Queries (p. 206) |
| |
| (modify last paragraph, p. 206) ... The possible values for <name> are |
| VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV. |
| |
| (add after last paragraph of section, p. 207) Queries of |
| PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent |
| program load error string. If the last call to LoadProgramNV failed to |
| load a program, the returned string describes a reason that the program |
| failed to load. Otherwise, a pointer to an empty string (containing only |
| a terminator) is returned. |
| |
| Rename and modify Section 6.1.13, Vertex and Fragment Program Queries |
| (from GL_NV_fragment_program). Portions of this section pertaining to |
| fragment programs are copied verbatim. |
| |
| (insert after discussion of GetProgramParameter[fd]vNV) |
| |
| The commands |
| |
| void GetProgramNamedParameterfvNV(uint id, sizei len, |
| const ubyte *name, float *params); |
| void GetProgramNamedParameterdvNV(uint id, sizei len, |
| const ubyte *name, double *params); |
| |
| obtain the current program named local parameter value for the parameter |
| named <name> belonging to the program given by <id>. <name> is a pointer |
| to an array of ubytes holding the parameter name. <len> specifies the |
| number of ubytes in the array given by <name>. The error |
| INVALID_OPERATION is generated if <id> specifies a nonexistent program or |
| a program whose type does not support named local parameters. The error |
| INVALID_VALUE is generated if <name> does not specify the name of a local |
| parameter in the program corresponding to <id>. The error INVALID_VALUE |
| is also generated if <len> is zero. Each named program local parameter is |
| an array of four values. |
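| |
| Because <name> is a counted array of ubytes rather than a NUL-terminated |
| string, the lookup has to compare exactly <len> bytes. The following C |
| sketch models the INVALID_VALUE cases described above. The np_param table |
| and np_get_named_parameter are hypothetical names, and the |
| INVALID_OPERATION case for a nonexistent <id> is left out because the |
| sketch takes the parameter table directly. |

```c
#include <stddef.h>
#include <string.h>

enum np_status { NP_OK, NP_INVALID_VALUE };

/* Hypothetical record for one named local parameter of a program. */
struct np_param {
    const char *name;      /* name as declared in the program text */
    float value[4];        /* each named parameter holds four values */
};

/* Looks up the parameter whose name matches the <len> bytes at <name>.
   On success the four values are copied to <params>. */
static enum np_status
np_get_named_parameter(const struct np_param *table, size_t count,
                       size_t len, const unsigned char *name,
                       float *params)
{
    if (len == 0)
        return NP_INVALID_VALUE;          /* zero-length name */
    for (size_t i = 0; i < count; ++i) {
        if (strlen(table[i].name) == len &&
            memcmp(table[i].name, name, len) == 0) {
            memcpy(params, table[i].value, sizeof table[i].value);
            return NP_OK;
        }
    }
    return NP_INVALID_VALUE;              /* no parameter with that name */
}
```

| Passing a longer buffer with a smaller <len> is fine: only the first |
| <len> bytes take part in the comparison. |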
| |
| The commands |
| |
| void GetProgramLocalParameterdvARB(enum target, uint index, |
| double *params); |
| void GetProgramLocalParameterfvARB(enum target, uint index, |
| float *params); |
| |
| obtain the current value for the numbered program local parameter <index> |
| belonging to the program object currently bound to <target>, and place |
| the information in the array <params>. The error INVALID_ENUM is |
| generated if <target> specifies a nonexistent program target or a program |
| target that does not support numbered program local parameters. The error |
| INVALID_VALUE is generated if <index> is greater than or equal to the |
| implementation-dependent number of supported numbered program local |
| parameters for the program target. |
| |
| When the program target type is FRAGMENT_PROGRAM_NV, each numbered program |
| local parameter returned is an array of four values. ... |
| |
| The command |
| |
| void GetProgramivNV(uint id, enum pname, int *params); |
| |
| obtains program state named by pname for the program named id in the array |
| params. pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or |
| PROGRAM_RESIDENT_NV. The error INVALID_OPERATION is generated if the |
| program named id does not exist. |
| |
| The command |
| |
| void GetProgramStringNV(uint id, enum pname, |
| ubyte *program); |
| |
| obtains the program string for program id. pname must be |
| PROGRAM_STRING_NV. n ubytes are returned into the array program |
| where n is the length of the program in ubytes. GetProgramivNV with |
| PROGRAM_LENGTH_NV can be used to query the length of a program's |
| string. The INVALID_OPERATION error is generated if the program |
| named id does not exist. |
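| |
| The two queries above are typically combined: first ask for the length, |
| then fetch the string into a buffer of that size. The C sketch below |
| shows the pattern under some assumptions. It takes the two entry points |
| as function pointers so that it can run without a GL context; the |
| simplified pointer types, fetch_program_string, and the fake_* stubs are |
| hypothetical (real code would use the entry point types from glext.h), |
| and the two enum values are quoted from glext.h and should be checked |
| against your own headers. |

```c
#include <stdlib.h>
#include <string.h>

/* Enum values as listed in glext.h; verify against your own headers. */
#define PROGRAM_LENGTH_NV 0x8627
#define PROGRAM_STRING_NV 0x8628

/* Simplified stand-ins for the GetProgramivNV / GetProgramStringNV types. */
typedef void (*get_program_iv_fn)(unsigned id, unsigned pname, int *params);
typedef void (*get_program_string_fn)(unsigned id, unsigned pname,
                                      unsigned char *program);

/* Returns a malloc'ed, NUL-terminated copy of the program string, or NULL.
   The caller frees the result. */
static char *
fetch_program_string(get_program_iv_fn get_iv,
                     get_program_string_fn get_string, unsigned id)
{
    int length = -1;
    get_iv(id, PROGRAM_LENGTH_NV, &length);
    if (length < 0)
        return NULL;                  /* query wrote nothing, e.g. bad id */

    char *text = (char *)malloc((size_t)length + 1);
    if (text == NULL)
        return NULL;
    get_string(id, PROGRAM_STRING_NV, (unsigned char *)text);
    text[length] = '\0';              /* only n ubytes are returned */
    return text;
}

/* Fake driver used to exercise the sketch without a GL context. */
static const char fake_source[] = "!!FP1.0\nMOVR o[COLR], f[COL0];\nEND";

static void fake_get_iv(unsigned id, unsigned pname, int *params)
{
    (void)id;
    if (pname == PROGRAM_LENGTH_NV)
        *params = (int)strlen(fake_source);
}

static void fake_get_string(unsigned id, unsigned pname,
                            unsigned char *program)
{
    (void)id;
    if (pname == PROGRAM_STRING_NV)
        memcpy(program, fake_source, strlen(fake_source)); /* no terminator */
}
```

| The extra byte in the allocation is there because the returned data is |
| exactly n ubytes long, so the caller adds its own terminator before |
| treating the result as a C string. |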
| |
| ... |
| |
| The command |
| |
| boolean IsProgramNV(uint id); |
| |
| returns TRUE if <id> is the name of a program object. If <id> |
| is zero or is a non-zero value that is not the name of a program |
| object, or if an error condition occurs, IsProgramNV returns FALSE. |
| A name returned by GenProgramsNV but not yet loaded with a program |
| is not the name of a program object." |
| |
| |
| Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions) |
| |
| Modify Section F.2.3 (Changes to Section 2.6), p.240 |
| |
| (modify last paragraph on p.240) ... Multiple sets of texture coordinates |
| may be used to specify how multiple texture images are mapped onto a |
| primitive. The number of texture coordinate sets supported is |
| implementation dependent, but must be at least 1. The number of texture |
| coordinate sets supported may be queried with the state |
| MAX_TEXTURE_COORDS_NV. |
| |
| Modify Section F.2.4 (Changes to Section 2.7), p.241 |
| |
| (modify the last paragraph on p.241, carrying over to p.243) |
| Implementations may support more than one set of texture coordinates. The |
| commands |
| |
| void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords) |
| void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords) |
| |
| take the coordinate set to be modified as the <texture> parameter. |
| <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that |
| texture coordinate set i is to be modified. The constants obey |
| TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is |
| the implementation dependent number of texture coordinate sets defined by |
| MAX_TEXTURE_COORDS_NV). |
| |
| |
| Modify Section F.2.5 (Changes to Section 2.8), p.243 |
| |
| (modify first and second paragraphs of section) ... The client may specify |
| up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store |
| vertex coordinates... |
| |
| In implementations which support more than one texture coordinate set, the |
| command |
| |
| void ClientActiveTextureARB(enum texture) |
| |
| is used to select the vertex array client state parameters to be modified |
| by the TexCoordPointer command and the array affected by EnableClientState |
| and DisableClientState with the parameter TEXTURE_COORD_ARRAY. This |
| command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB. Each texture |
| coordinate set has a client state vector which is selected when this |
| command is invoked. This state vector also includes the vertex array |
| state. This command also selects the texture coordinate set state used |
| for queries of client state. |
| |
| (modify first paragraph on p.244) If the number of supported texture |
| coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ... |
| |
| |
| Modify Section F.2.6 (Changes to Section 2.10.2), p.244 |
| |
| (modify first paragraph) For each texture coordinate set, a 4x4 matrix is |
| applied to the corresponding texture coordinates... |
| |
| (replace second and third paragraphs) The command |
| |
| void ActiveTextureARB(enum texture); |
| |
| specifies the active texture unit selector, ACTIVE_TEXTURE_ARB. Each |
| texture unit contains up to two distinct sub-units: a texture coordinate |
| processing unit (consisting of a texture matrix stack and texture |
| coordinate generation state) and a texture image unit (consisting of all |
| the texture state defined in Section 3.8). In implementations with a |
| different number of supported texture coordinate sets and texture image |
| units, some texture units may consist of only one of the two sub-units. |
| |
| The active texture unit selector specifies the texture unit accessed by |
| commands involving texture coordinate processing. Such commands include |
| those accessing the current matrix stack (if MATRIX_MODE is TEXTURE), |
| TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate |
| generation enum is selected), as well as queries of the current texture |
| coordinates and current raster texture coordinates. If the texture unit |
| number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater |
| than or equal to the implementation dependent constant |
| MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any |
| such command. |
| |
| The active texture unit selector also selects the texture unit accessed by |
| commands involving texture image processing (Section 3.8). Such commands |
| include all variants of TexEnv, TexParameter, and TexImage commands, |
| BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and |
| queries of all such state. If the texture unit number corresponding to |
| the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the |
| implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error |
| INVALID_OPERATION is generated by any such command. |
| |
| ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture> |
| is specified. <texture> is a symbolic constant of the form TEXTUREi_ARB, |
| indicating that texture unit i is to be modified. The constants obey |
| TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is |
| the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV). |
| For compatibility with old OpenGL specifications, the implementation |
| dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of |
| conventional texture units supported by the implementation. Its value |
| must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and |
| MAX_TEXTURE_IMAGE_UNITS_NV. |
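| |
| The three checks described in this section can be summarized in a small C |
| model. It is a sketch only: tu_limits and the tu_* functions are |
| hypothetical names, and the value of TEXTURE0_ARB is quoted from glext.h. |
| The limits used in the usage note (8 coordinate sets, 16 image units) are |
| illustrative values, not a claim about any particular implementation. |

```c
enum tu_error { TU_NO_ERROR, TU_INVALID_ENUM, TU_INVALID_OPERATION };

/* Implementation dependent limits, as queried from the GL. */
struct tu_limits {
    unsigned max_coords;       /* MAX_TEXTURE_COORDS_NV */
    unsigned max_image_units;  /* MAX_TEXTURE_IMAGE_UNITS_NV */
};

#define TU_TEXTURE0_ARB 0x84C0u   /* value from glext.h */

static unsigned tu_max(unsigned a, unsigned b) { return a > b ? a : b; }

/* ActiveTextureARB: accepts TEXTUREi_ARB for i in [0, k), where k is the
   larger of the two limits. On error the selector is left unchanged. */
static enum tu_error
tu_active_texture(const struct tu_limits *lim, unsigned texture,
                  unsigned *active_unit)
{
    unsigned k = tu_max(lim->max_coords, lim->max_image_units);
    if (texture < TU_TEXTURE0_ARB || texture - TU_TEXTURE0_ARB >= k)
        return TU_INVALID_ENUM;
    *active_unit = texture - TU_TEXTURE0_ARB;
    return TU_NO_ERROR;
}

/* Commands that touch coordinate processing state (texture matrix stack,
   TexGen, current texture coordinate queries). */
static enum tu_error
tu_check_coord_command(const struct tu_limits *lim, unsigned active_unit)
{
    return active_unit >= lim->max_coords ? TU_INVALID_OPERATION
                                          : TU_NO_ERROR;
}

/* Commands that touch texture image state (TexEnv, TexParameter, TexImage,
   BindTexture, texture target enables). */
static enum tu_error
tu_check_image_command(const struct tu_limits *lim, unsigned active_unit)
{
    return active_unit >= lim->max_image_units ? TU_INVALID_OPERATION
                                               : TU_NO_ERROR;
}
```

| With limits of 8 and 16, selecting unit 12 succeeds, image commands on |
| that unit are accepted, and coordinate processing commands report |
| INVALID_OPERATION. Selecting unit 16 fails with INVALID_ENUM. |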
| |
| Modify Section F.2.12 (Changes to Section 3.8.10), p.249 |
| |
| (modify next-to-last paragraph) Texturing is enabled and disabled |
| individually for each texture unit. If texturing is disabled for one of |
| the units, then the fragment resulting from the previous unit is passed |
| unaltered to the following unit. Individual texture units beyond those |
| specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always |
| treated as disabled. |
| |
| Modify Section F.2.15 (Changes to Section 6.1.2), p.251 |
| |
| (add to end of paragraph) Queries of texture state variables corresponding |
| to the texture coordinate processing unit (namely, TexGen state and |
| enables, and matrices) will produce an INVALID_OPERATION error if the value of |
| ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV. All |
| other texture state queries will result in an INVALID_OPERATION error if |
| the value of ACTIVE_TEXTURE_ARB is greater than or equal to |
| MAX_TEXTURE_IMAGE_UNITS_NV. |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| Program objects are shared between AGL/GLX/WGL rendering contexts if |
| and only if the rendering contexts share display lists. No change |
| is made to the AGL/GLX/WGL API. |
| |
| Dependencies on GL_NV_vertex_program |
| |
| If NV_vertex_program is supported, the description of LoadProgramNV in |
| Section 2.14.1.7 (up to the BNF description of vertex programs) is |
| deleted, as it is replaced by the contents of Section 5.7 in this |
| specification. The general error descriptions in Section 2.14.1.7 common |
| to Section 5.7 (like INVALID_OPERATION if the program fails to compile) |
| should also be deleted. Section 2.14.1.8 should also be deleted. Section |
| 6.1.13 is modified by this specification as described above. |
| |
| Dependencies on NV_texture_shader |
| |
| If NV_texture_shader is not supported, the comment about texture shaders |
| being disabled in fragment program mode is not applicable. |
| |
| Dependencies on NV_texture_rectangle |
| |
| If NV_texture_rectangle is not supported, the references to "RECT" in the |
| <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable. |
| |
| Dependencies on ARB_texture_cube_map |
| |
| If ARB_texture_cube_map is not supported, the references to "CUBE" in the |
| <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable. |
| |
| Dependencies on EXT_fog_coord |
| |
| If EXT_fog_coord is not supported, references to "fog coordinate" in the |
| definition of the "FOGC" fragment attribute register should be removed. |
| |
| Dependencies on NV_depth_clamp |
| |
| If NV_depth_clamp is not supported, section 3.11.6 is modified to remove |
| discussion of the depth clamp enable and instead indicate that fragments |
| with depth values outside [min(n,f), max(n,f)] are always discarded. |
| |
| Dependencies on ARB_depth_texture and SGIX_depth_texture |
| |
| If ARB_depth_texture is not supported, but SGIX_depth_texture is |
| supported, the discussion of Table X.5 is modified to indicate that |
| DEPTH_COMPONENT textures are treated as LUMINANCE. |
| |
| If neither extension is supported, the discussion of DEPTH_COMPONENT |
| textures in Table X.5 should be removed. |
| |
| Dependencies on NV_float_buffer |
| |
| If NV_float_buffer is not supported, references to FLOAT_R_NV, |
| FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in |
| Table X.5 should be removed. |
| |
| Dependencies on ARB_vertex_program |
| |
| This extension does not have any explicit dependencies, but the APIs for |
| setting and querying numbered local parameters (ProgramLocalParameter*ARB |
| and GetProgramLocalParameter*ARB) were taken directly from ARB_vertex_program. |
| |
| Dependencies on ARB_fragment_program |
| |
| If ARB_fragment_program is not supported, the maximum number of executable |
| instructions in any !!FP1.0 program is 1024. If ARB_fragment_program is |
| supported, the maximum number of executable instructions for an !!FP1.0 |
| program is at least 1024, but can be larger. The limit can be queried by |
| calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and |
| <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB. |
| |
| |
| GLX Protocol |
| |
| Most of the GLX protocol needed to implement this extension is described |
| in the GL_NV_vertex_program extension specification and will not be |
| repeated here. |
| |
| The following two rendering commands are potentially large, and hence can |
| be sent in a glXRender or glXRenderLarge request. |
| |
| ProgramNamedParameter4fvNV |
| 2 28+len+p rendering command length |
| 2 4218 rendering command opcode |
| 4 CARD32 id |
| 4 CARD32 len |
| 4 FLOAT32 params[0] |
| 4 FLOAT32 params[1] |
| 4 FLOAT32 params[2] |
| 4 FLOAT32 params[3] |
| len LISTofCARD8 name |
| p unused, p=pad(len) |
| |
| If the command is encoded in a glXRenderLarge request, the command |
| opcode and command length fields above are expanded to 4 bytes each: |
| |
| 4 32+len+p rendering command length |
| 4 4218 rendering command opcode |
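| |
| As a worked example of the layout above, the following C sketch packs the |
| glXRender form of ProgramNamedParameter4fvNV into a byte buffer. It is |
| not taken from any GLX implementation: glx_pad and |
| encode_named_parameter_4fv are hypothetical names, fields are written in |
| host byte order, and float is assumed to be a 32-bit type matching |
| FLOAT32. |

```c
#include <stdint.h>
#include <string.h>

#define OPCODE_PROGRAM_NAMED_PARAMETER_4FV_NV 4218

/* pad(n): number of bytes needed to round n up to a multiple of four. */
static size_t glx_pad(size_t n) { return (4 - (n & 3)) & 3; }

/* Writes the glXRender form of the command into <out> and returns the
   number of bytes written, or 0 if the command does not fit in <cap>
   bytes or in the 2-byte length field. */
static size_t
encode_named_parameter_4fv(uint8_t *out, size_t cap, uint32_t id,
                           const uint8_t *name, uint32_t len,
                           const float params[4])
{
    size_t total = 28 + (size_t)len + glx_pad(len);
    if (total > cap || total > UINT16_MAX)
        return 0;

    uint16_t length = (uint16_t)total;
    uint16_t opcode = OPCODE_PROGRAM_NAMED_PARAMETER_4FV_NV;
    memcpy(out + 0, &length, 2);              /* rendering command length */
    memcpy(out + 2, &opcode, 2);              /* rendering command opcode */
    memcpy(out + 4, &id, 4);                  /* CARD32 id */
    memcpy(out + 8, &len, 4);                 /* CARD32 len */
    memcpy(out + 12, params, 16);             /* four FLOAT32 values */
    memcpy(out + 28, name, len);              /* LISTofCARD8 name */
    memset(out + 28 + len, 0, glx_pad(len));  /* unused pad bytes */
    return total;
}
```

| For a five-byte name such as "color", p is 3 and the command length is |
| 28 + 5 + 3 = 36 bytes. |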
| |
| |
| ProgramNamedParameter4dvNV |
| 2 44+len+p rendering command length |
| 2 4219 rendering command opcode |
| 4 CARD32 id |
| 4 CARD32 len |
| 8 FLOAT64 params[0] |
| 8 FLOAT64 params[1] |
| 8 FLOAT64 params[2] |
| 8 FLOAT64 params[3] |
| len LISTofCARD8 name |
| p unused, p=pad(len) |
| |
| If the command is encoded in
|