| Name |
| |
| NV_gpu_program5_mem_extended |
| |
| Name Strings |
| |
| GL_NV_gpu_program5_mem_extended |
| |
| Contact |
| |
| Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com) |
| |
| Status |
| |
| Shipping. |
| |
| Version |
| |
| Last Modified Date: October 30, 2012 |
| NVIDIA Revision: 1 |
| |
| Number |
| |
| OpenGL Extension #434 |
| |
| Dependencies |
| |
| NV_gpu_program5 is required. |
| |
| This extension is written against the NV_gpu_program5 extension |
| specification, which itself is written against the NV_gpu_program4 and |
| OpenGL 2.0 Specifications. |
| |
| This extension interacts trivially with EXT_shader_image_load_store, |
| NV_shader_storage_buffer_object, and NV_compute_program5. |
| |
| Overview |
| |
| This extension provides a new set of storage modifiers that can be used by |
| NV_gpu_program5 assembly program instructions loading from or storing to |
| various forms of GPU memory. In particular, we provide support for loads |
| and stores using the storage modifiers: |
| |
| .F16X2 .F16X4 .F16 (for 16-bit floating-point scalars/vectors) |
| .S8X2 .S8X4 (for 8-bit signed integer vectors) |
| .S16X2 .S16X4 (for 16-bit signed integer vectors) |
| .U8X2 .U8X4 (for 8-bit unsigned integer vectors) |
| .U16X2 .U16X4 (for 16-bit unsigned integer vectors) |
| |
| These modifiers are allowed for the following load/store instructions: |
| |
| LDC Load from constant buffer |
| |
| LOAD Global load |
| STORE Global store |
| |
| LOADIM Image load (via EXT_shader_image_load_store) |
| STOREIM Image store (via EXT_shader_image_load_store) |
| |
| LDB Load from storage buffer (via |
| NV_shader_storage_buffer_object) |
| STB Store to storage buffer (via |
| NV_shader_storage_buffer_object) |
| |
| LDS Load from shared memory (via NV_compute_program5) |
| STS Store to shared memory (via NV_compute_program5) |
| |
| For assembly programs prior to this extension, it was necessary to access |
| memory using packed types and then unpack with additional shader |
| instructions. |
| |
| Similar capabilities have already been provided in the OpenGL Shading |
| Language (GLSL) via the NV_gpu_shader5 extension, using the extended data |
| types provided there (e.g., "float16_t", "u8vec4", "s16vec2"). |
| |
| New Procedures and Functions |
| |
| None. |
| |
| New Tokens |
| |
| None. |
| |
| Additions to Chapter 2 of the OpenGL 2.0 Specification (OpenGL Operation) |
| |
| (All modifications are relative to Section 2.X, GPU Programs, from the |
| NV_gpu_program4 specification.) |
| |
| Modify Section 2.X.2, Program Grammar |
| |
| (add after the long list of grammar rules) If a program specifies the |
| NV_gpu_program5_mem_extended program option, the following rules are added |
| to the NV_gpu_program5 base program grammar: |
| |
| <opModifier> ::= "F16X2" |
| | "F16X4" |
| | "S8X2" |
| | "S8X4" |
| | "S16X2" |
| | "S16X4" |
| | "U8X2" |
| | "U8X4" |
| | "U16X2" |
| | "U16X4" |
| |
| (Note: This extension also provides new capabilities for the "F16" |
| modifier. Since it was already supported in NV_gpu_program5, it isn't |
| being added to the grammar here.) |
| |
| |
| Modify Section 2.X.4.1, Program Instruction Modifiers |
| |
| (add to Table X.14 of the NV_gpu_program4 specification.) |
| |
| Modifier Description |
| -------- --------------------------------------------------- |
| F16 Convert to or from one 16-bit floating-point value, |
| or access one 16-bit floating-point value |
| |
| F16X2 Access two 16-bit floating-point values |
| F16X4 Access four 16-bit floating-point values |
| S8X2 Access two 8-bit signed integer values |
| S8X4 Access four 8-bit signed integer values |
| S16X2 Access two 16-bit signed integer values |
| S16X4 Access four 16-bit signed integer values |
| U8X2 Access two 8-bit unsigned integer values |
| U8X4 Access four 8-bit unsigned integer values |
| U16X2 Access two 16-bit unsigned integer values |
| U16X4 Access four 16-bit unsigned integer values |
| |
| (modify discussion of storage modifiers for load and store operations, |
| adding the entries added to the table above) |
| |
| For load and store operations, the "F32", "F32X2", "F32X4", "F64", |
| "F64X2", "F64X4", "S8", "S8X2", "S8X4", "S16", "S16X2", "S16X4", "S32", |
| "S32X2", "S32X4", "S64", "S64X2", "S64X4", "U8", "U8X2", "U8X4", "U16", |
| "U16X2", "U16X4", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4", "F16", |
| "F16X2", and "F16X4" storage modifiers control how data are loaded from or |
| stored to memory. ... |
| |
| |
| Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5 |
| |
| (update pseudocode for BufferMemoryLoad) |
| |
| result_t_vec BufferMemoryLoad(char *address, OpModifier modifier) |
| { |
| result_t_vec result = { 0, 0, 0, 0 }; |
| switch (modifier) { |
| |
| /* Existing cases and code from NV_gpu_program5 unchanged. */ |
| |
| case F16: |
| result.x = ((float16_t *)address)[0]; |
| break; |
| case F16X2: |
| result.x = ((float16_t *)address)[0]; |
| result.y = ((float16_t *)address)[1]; |
| break; |
| case S8X2: |
| result.x = ((int8_t *)address)[0]; |
| result.y = ((int8_t *)address)[1]; |
| break; |
| case S8X4: |
| result.x = ((int8_t *)address)[0]; |
| result.y = ((int8_t *)address)[1]; |
| result.z = ((int8_t *)address)[2]; |
| result.w = ((int8_t *)address)[3]; |
| break; |
| case S16X2: |
| result.x = ((int16_t *)address)[0]; |
| result.y = ((int16_t *)address)[1]; |
| break; |
| case S16X4: |
| result.x = ((int16_t *)address)[0]; |
| result.y = ((int16_t *)address)[1]; |
| result.z = ((int16_t *)address)[2]; |
| result.w = ((int16_t *)address)[3]; |
| break; |
| case U8X2: |
| result.x = ((uint8_t *)address)[0]; |
| result.y = ((uint8_t *)address)[1]; |
| break; |
| case U8X4: |
| result.x = ((uint8_t *)address)[0]; |
| result.y = ((uint8_t *)address)[1]; |
| result.z = ((uint8_t *)address)[2]; |
| result.w = ((uint8_t *)address)[3]; |
| break; |
| case U16X2: |
| result.x = ((uint16_t *)address)[0]; |
| result.y = ((uint16_t *)address)[1]; |
| break; |
| case U16X4: |
| result.x = ((uint16_t *)address)[0]; |
| result.y = ((uint16_t *)address)[1]; |
| result.z = ((uint16_t *)address)[2]; |
| result.w = ((uint16_t *)address)[3]; |
| break; |
| } |
| return result; |
| } |
| |
| (update pseudocode for BufferMemoryStore) |
| |
| void BufferMemoryStore(char *address, operand_t_vec operand, |
| OpModifier modifier) |
| { |
| switch (modifier) { |
| |
| /* Existing cases and code from NV_gpu_program5 unchanged. */ |
| |
| case F16: |
| ((float16_t *)address)[0] = operand.x; |
| break; |
| case F16X2: |
| ((float16_t *)address)[0] = operand.x; |
| ((float16_t *)address)[1] = operand.y; |
| break; |
| case S8X2: |
| ((int8_t *)address)[0] = operand.x; |
| ((int8_t *)address)[1] = operand.y; |
| break; |
| case S8X4: |
| ((int8_t *)address)[0] = operand.x; |
| ((int8_t *)address)[1] = operand.y; |
| ((int8_t *)address)[2] = operand.z; |
| ((int8_t *)address)[3] = operand.w; |
| break; |
| case S16X2: |
| ((int16_t *)address)[0] = operand.x; |
| ((int16_t *)address)[1] = operand.y; |
| break; |
| case S16X4: |
| ((int16_t *)address)[0] = operand.x; |
| ((int16_t *)address)[1] = operand.y; |
| ((int16_t *)address)[2] = operand.z; |
| ((int16_t *)address)[3] = operand.w; |
| break; |
| case U8X2: |
| ((uint8_t *)address)[0] = operand.x; |
| ((uint8_t *)address)[1] = operand.y; |
| break; |
| case U8X4: |
| ((uint8_t *)address)[0] = operand.x; |
| ((uint8_t *)address)[1] = operand.y; |
| ((uint8_t *)address)[2] = operand.z; |
| ((uint8_t *)address)[3] = operand.w; |
| break; |
| case U16X2: |
| ((uint16_t *)address)[0] = operand.x; |
| ((uint16_t *)address)[1] = operand.y; |
| break; |
| case U16X4: |
| ((uint16_t *)address)[0] = operand.x; |
| ((uint16_t *)address)[1] = operand.y; |
| ((uint16_t *)address)[2] = operand.z; |
| ((uint16_t *)address)[3] = operand.w; |
| break; |
| } |
| } |
| |
| (modify paragraph to indicate the alignment requirement for new storage |
| modifiers) The address used for global memory loads or stores or offset |
| used for constant buffer loads must be aligned to the fetch size |
| corresponding to the storage opcode modifier. For S8 and U8, the offset |
| has no alignment requirements. For F16, S8X2, S16, U8X2, and U16, the |
| offset must be a multiple of two basic machine units. For F32, S32, and |
| U32, F16X2, S16X2, U16X2, S8X4, and U8X4, the offset must be a multiple of |
| four. For F32X2, F64, S32X2, S64, U32X2, U64, S16X4, and U16X4, the |
| offset must be a multiple of eight. ... If an offset is not correctly |
| aligned, the values returned by a buffer memory load will be undefined, |
| and the effects of a buffer memory store will also be undefined. |
| |
| |
| Modify Section 2.X.6, Program Options |
| |
| + Extended Memory Format Support (NV_gpu_program5_mem_extended) |
| |
| If a program specifies the "NV_gpu_program5_mem_extended" option, it may |
| use the "F16", "F16X2", "F16X4", "S8X2", "S8X4", "S16X2", "S16X4", "U8X2", |
| "U8X4", "U16X2", and "U16X4" storage modifiers on instructions loading |
| values from memory or storing values to memory (LDC, LOAD, STORE, LOADIM, |
| STOREIM, LDB, STB, LDS, STS). |
| |
| |
| Additions to Chapter 3 of the OpenGL 2.0 Specification (Rasterization) |
| |
| None. |
| |
| Additions to Chapter 4 of the OpenGL 2.0 Specification (Per-Fragment |
| Operations and the Frame Buffer) |
| |
| None. |
| |
| Additions to Chapter 5 of the OpenGL 2.0 Specification (Special Functions) |
| |
| None. |
| |
| Additions to Chapter 6 of the OpenGL 2.0 Specification (State and |
| State Requests) |
| |
| None. |
| |
| Additions to Appendix A of the OpenGL 2.0 Specification (Invariance) |
| |
| None. |
| |
| Additions to the AGL/GLX/WGL Specifications |
| |
| None. |
| |
| Dependencies on EXT_shader_image_load_store, NV_shader_storage_buffer_object, |
| and NV_compute_program5 |
| |
| If EXT_shader_image_load_store is not supported, references to the LOADIM |
| and STOREIM opcodes should be removed. |
| |
| If NV_shader_storage_buffer_object is not supported, references to the LDB |
| and STB opcodes should be removed. |
| |
| If NV_compute_program5 is not supported, references to the LDS and STS |
| opcodes should be removed. |
| |
| Errors |
| |
| None. |
| |
| New State |
| |
| None. |
| |
| New Implementation Dependent State |
| |
| None. |
| |
| Issues |
| |
| (1) Should this extension have its own extension string entry, or should |
| its existence be inferred from the NV_gpu_program5 extension or some |
| other extension? |
| |
| RESOLVED: Provide a separate extension string entry, since this |
| functionality was added after NV_gpu_program5 was published and may not |
| be available on older drivers supporting NV_gpu_program5. |
| |
| Revision History |
| |
| Revision 1, October 30, 2012 (pbrown): Initial revision. |