| Name |
| |
| QCOM_tiled_rendering |
| |
| Name Strings |
| |
| GL_QCOM_tiled_rendering |
| |
| Contributors |
| |
| Colin Sharp |
| Jeff Leger |
| |
| Contacts |
| |
| Chuck Smith, Qualcomm (chucks 'at' qualcomm.com) |
| Maurice Ribble, Qualcomm (mribble 'at' qualcomm.com) |
| |
| Notice |
| |
| Copyright Qualcomm 2009. |
| |
| IP Status |
| |
| Qualcomm Proprietary. |
| |
| Status |
| |
| Complete. |
| |
| Version |
| |
| Last Modified Date: August 20, 2009 |
| Revision: #1.6 |
| |
| Number |
| |
| OpenGL ES Extension #70 |
| |
| Dependencies |
| |
| OpenGL ES 1.0 or higher is required. |
| |
| This extension interacts with QCOM_write_only_rendering. |
| |
| This extension is written based on the wording of the OpenGL ES 2.0 |
| specification. |
| |
| Overview |
| |
| In the handheld graphics space, a typical challenge is achieving efficient |
| rendering performance given the different characteristics of the various |
| types of graphics memory. Some types of memory ("slow" memory) are less |
| expensive but have low bandwidth, higher latency, and/or higher power |
| consumption, while other types ("fast" memory) are more expensive but have |
| higher bandwidth, lower latency, and/or lower power consumption. In many |
| cases, it is more efficient for a graphics processing unit (GPU) to render |
| directly to fast memory, but at most common display resolutions it is not |
| practical for a device to contain enough fast memory to accommodate both the |
| full color and depth/stencil buffers (the frame buffer). In some devices, |
| this problem can be addressed by providing both types of memory: a large |
| amount of slow memory that is sufficient to store the entire frame buffer, |
| and a small, dedicated amount of fast memory that allows the GPU to render |
| with optimal performance. The challenge lies in finding a way for the GPU |
| to render to fast memory when it is not large enough to contain the actual |
| frame buffer. |
| |
| One approach to solving this problem is to design the GPU and/or driver |
| using a tiled rendering architecture. With this approach the render target |
| is subdivided into a number of individual tiles, which are sized to fit |
| within the available amount of fast memory. Under normal operation, the |
| entire scene will be rendered to each individual tile using a multi-pass |
| technique, in which primitives that lie entirely outside of the tile being |
| rendered are trivially discarded. After each tile has been rendered, its |
| contents are saved out to the actual frame buffer in slow memory (a process |
| referred to as the "resolve"). The resolve introduces significant overhead, |
| both for the CPU and the GPU. However, even with this additional overhead, |
| rendering using this method is usually more efficient than rendering |
| directly to slow memory. |
| |
| This extension allows the application to specify a rectangular tile |
| rendering area and have full control over the resolves for that area. The |
| information given to the driver through this API can be used to perform |
| various optimizations in the driver and hardware. One example optimization |
| is being able to reduce the size or number of the resolves. Another |
| optimization might be to reduce the number of passes needed in the tiling |
| approach mentioned above. Even traditional rendering GPUs that don't use |
| tiles may benefit from this extension depending on their implementation of |
| certain common GPU operations. |
| |
| One typical use case could involve an application only rendering to select |
| portions of the render target using this technique (which shall be referred |
| to as "application tiling"), leaving all other portions of the render target |
| untouched. Therefore, in order to preserve the contents of the untouched |
| portions of the render target, the application must request an EGL (or other |
| context management API) configuration with a non-destructive swap. A |
| destructive swap may only be used safely if the application renders to the |
| entire area of the render target during each frame (otherwise the contents |
| of the untouched portions of the frame buffer will be undefined). |
| |
| Additionally, care must be taken to avoid the cost of mixing rendering with |
| and without application tiling within a single frame. Rendering without |
| application tiling ("normal" rendering) is most efficient when all of the |
| rendering for the entire scene can be encompassed within a single resolve. |
| If any portions of the scene are rendered prior to that resolve (such as via |
| a prior resolve, or via application tiling), then that resolve becomes much |
| more heavyweight. When this occurs, prior to rendering each tile the fast |
| memory must be populated with the existing contents of the frame buffer |
| region corresponding to that tile. This operation can double the cost of |
| resolves, so it is recommended that applications avoid mixing application |
| tiling and normal rendering within a single frame. If both rendering |
| methods must be used in the same frame, then the most efficient approach is |
| to perform all normal rendering first, followed by rendering done with |
| application tiling. An implicit resolve will occur (if needed) at the start |
| of application tiling, so any pending normal rendering operations will be |
| flushed at the time application tiling is initiated. This extension |
| provides interfaces for the application to communicate to the driver whether |
| or not rendering done with application tiling depends on the existing |
| contents of the specified tile, and whether or not the rendered contents of |
| the specified tile need to be preserved upon completion. This mechanism can |
| be used to obtain optimal performance, e.g. when the application knows that |
| every pixel in a tile will be completely rendered or when the resulting |
| contents of the depth/stencil buffers do not need to be preserved. |
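| |
| The choice of preserve masks described above can be sketched as a small |
| helper. This is an illustration only: PreserveMasks and |
| choose_preserve_masks are hypothetical names that are not part of this |
| extension, the token values are copied from the New Tokens section, and a |
| single render target (index 0) is assumed. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Token values copied from the New Tokens section of this extension. */
#define GL_COLOR_BUFFER_BIT0_QCOM   0x00000001u
#define GL_DEPTH_BUFFER_BIT0_QCOM   0x00000100u
#define GL_STENCIL_BUFFER_BIT0_QCOM 0x00010000u

/* Hypothetical helper type: the two masks an application would pass. */
typedef struct {
    uint32_t start_mask; /* StartTilingQCOM: buffers to initialize from memory */
    uint32_t end_mask;   /* EndTilingQCOM: buffers to preserve on completion   */
} PreserveMasks;

/* Hypothetical helper: pick masks from what the application knows about
 * its own rendering. Assumes the color result is always wanted. */
static PreserveMasks choose_preserve_masks(bool overwrites_every_pixel,
                                           bool depth_stencil_needed_later)
{
    PreserveMasks m;

    /* Existing contents are loaded only when the tile depends on them.
     * A mask of 0 corresponds to NONE. */
    m.start_mask = overwrites_every_pixel
        ? 0u
        : (GL_COLOR_BUFFER_BIT0_QCOM |
           GL_DEPTH_BUFFER_BIT0_QCOM |
           GL_STENCIL_BUFFER_BIT0_QCOM);

    /* Depth/stencil are written back only if later rendering needs them. */
    m.end_mask = GL_COLOR_BUFFER_BIT0_QCOM;
    if (depth_stencil_needed_later) {
        m.end_mask |= GL_DEPTH_BUFFER_BIT0_QCOM | GL_STENCIL_BUFFER_BIT0_QCOM;
    }

    /* Sanity check of the assumption stated above. */
    assert((m.end_mask & GL_COLOR_BUFFER_BIT0_QCOM) != 0u);
    return m;
}
```
| |
| An application that clears and fully redraws a tile, and does not need the |
| depth/stencil results afterwards, would pass a start mask of NONE and an end |
| mask containing only the color bit. |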
| |
| Issues |
| |
| (1) How do Viewport and Scissor interact with this extension? |
| |
| RESOLVED: They don't. When application tiling is used, the viewport and |
| scissor retain their existing values, relative to the render target, not the |
| specified tile. Therefore, all rendering commands issued between |
| StartTilingQCOM and EndTilingQCOM will be subject to the same scissor, and |
| will undergo the same viewport transformation, as normal rendering commands. |
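| |
| One consequence, sketched below under stated assumptions: with the scissor |
| test enabled, the pixels that can be updated are those inside both the |
| scissor rectangle and the active tile, since both are expressed relative to |
| the render target. Rect and intersect_rects are hypothetical names used |
| only for this illustration, and coordinates are assumed small enough that |
| the additions do not overflow. |
| |
```c
#include <assert.h>

/* Hypothetical rectangle type: origin at the lower left corner, in render
 * target coordinates (the same space used by Scissor and StartTilingQCOM). */
typedef struct {
    int x, y, width, height;
} Rect;

/* Returns the overlap of two rectangles; width and height are 0 when the
 * rectangles do not overlap. */
static Rect intersect_rects(Rect a, Rect b)
{
    assert(a.width >= 0 && a.height >= 0);
    assert(b.width >= 0 && b.height >= 0);

    int x0 = a.x > b.x ? a.x : b.x;
    int y0 = a.y > b.y ? a.y : b.y;
    int ax1 = a.x + a.width,  bx1 = b.x + b.width;
    int ay1 = a.y + a.height, by1 = b.y + b.height;
    int x1 = ax1 < bx1 ? ax1 : bx1;
    int y1 = ay1 < by1 ? ay1 : by1;

    Rect r;
    r.x = x0;
    r.y = y0;
    r.width  = x1 > x0 ? x1 - x0 : 0;
    r.height = y1 > y0 ? y1 - y0 : 0;
    return r;
}
```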
| |
| (2) How do Flush and Finish interact with this extension? |
| |
| RESOLVED: When Flush or Finish is called while application tiling is |
| active, the behavior will be as if EndTilingQCOM was called, except that the |
| application tiling state will remain unchanged (meaning the active tile will |
| not be reset). This means that any pending rendering commands will be |
| performed to the active tile, and application tiling will continue to be |
| active for any following rendering commands. |
| |
| (3) How does SwapBuffers interact with this extension? |
| |
| RESOLVED: It doesn't. If SwapBuffers is called while application tiling is |
| active, the contents of the entire back buffer will be copied to the visible |
| window, ignoring the active tile. SwapBuffers will have no effect on the |
| application tiling state. |
| |
| (4) What happens if the render target is changed while application tiling |
| is active? |
| |
| RESOLVED: If the current render target is changed, either by binding a new |
| framebuffer object or changing the write surface of the active framebuffer |
| (either explicitly or by deleting the currently bound framebuffer or write |
| surface), an implicit EndTilingQCOM will occur. The active tile will be |
| reset and application tiling will be deactivated. This is necessary because |
| the active tile may not be valid for the new render target. |
| |
| (5) Should this extension provide a query mechanism for determining things |
| such as tile offset, alignment, and size requirements so a developer |
| can intelligently choose tile regions? |
| |
| RESOLVED: No. This information is very device-dependent and difficult to |
| present in an easily understood manner. Instead, this extension will let |
| developers specify an arbitrary rectangular tile region and all these |
| requirements, including subdividing the given tile into multiple tiles if |
| necessary, will be handled by the driver and hardware. |
| |
| (6) Should this extension allow multiple tiles? |
| |
| RESOLVED: No. While earlier versions of this extension allowed for this, |
| after support for arbitrary tile sizes was added the benefit of multiple |
| tiles became negligible. Allowing multiple tiles complicated the API and |
| made it much more difficult for traditional rendering and some tile-based |
| rendering GPUs to support this extension. |
| |
| (7) Should multiple render targets be supported? They are not supported |
| by either the OpenGL ES core specification or any existing OpenGL ES |
| extensions. Support could be added with some new bitmasks for the |
| <preserveMask> parameter. Should this be added now, or deferred for |
| inclusion in any possible future MRT extension? |
| |
| RESOLVED: Yes, support is added now. It is not difficult to add and doing |
| so makes supporting MRTs in the future easier. |
| |
| New Procedures and Functions |
| |
| void StartTilingQCOM(uint x, uint y, uint width, uint height, |
| bitfield preserveMask); |
| |
| void EndTilingQCOM(bitfield preserveMask); |
| |
| New Tokens |
| |
| Accepted by the <preserveMask> parameter of StartTilingQCOM and |
| EndTilingQCOM: |
| |
| GL_COLOR_BUFFER_BIT0_QCOM 0x00000001 |
| GL_COLOR_BUFFER_BIT1_QCOM 0x00000002 |
| GL_COLOR_BUFFER_BIT2_QCOM 0x00000004 |
| GL_COLOR_BUFFER_BIT3_QCOM 0x00000008 |
| GL_COLOR_BUFFER_BIT4_QCOM 0x00000010 |
| GL_COLOR_BUFFER_BIT5_QCOM 0x00000020 |
| GL_COLOR_BUFFER_BIT6_QCOM 0x00000040 |
| GL_COLOR_BUFFER_BIT7_QCOM 0x00000080 |
| GL_DEPTH_BUFFER_BIT0_QCOM 0x00000100 |
| GL_DEPTH_BUFFER_BIT1_QCOM 0x00000200 |
| GL_DEPTH_BUFFER_BIT2_QCOM 0x00000400 |
| GL_DEPTH_BUFFER_BIT3_QCOM 0x00000800 |
| GL_DEPTH_BUFFER_BIT4_QCOM 0x00001000 |
| GL_DEPTH_BUFFER_BIT5_QCOM 0x00002000 |
| GL_DEPTH_BUFFER_BIT6_QCOM 0x00004000 |
| GL_DEPTH_BUFFER_BIT7_QCOM 0x00008000 |
| GL_STENCIL_BUFFER_BIT0_QCOM 0x00010000 |
| GL_STENCIL_BUFFER_BIT1_QCOM 0x00020000 |
| GL_STENCIL_BUFFER_BIT2_QCOM 0x00040000 |
| GL_STENCIL_BUFFER_BIT3_QCOM 0x00080000 |
| GL_STENCIL_BUFFER_BIT4_QCOM 0x00100000 |
| GL_STENCIL_BUFFER_BIT5_QCOM 0x00200000 |
| GL_STENCIL_BUFFER_BIT6_QCOM 0x00400000 |
| GL_STENCIL_BUFFER_BIT7_QCOM 0x00800000 |
| GL_MULTISAMPLE_BUFFER_BIT0_QCOM 0x01000000 |
| GL_MULTISAMPLE_BUFFER_BIT1_QCOM 0x02000000 |
| GL_MULTISAMPLE_BUFFER_BIT2_QCOM 0x04000000 |
| GL_MULTISAMPLE_BUFFER_BIT3_QCOM 0x08000000 |
| GL_MULTISAMPLE_BUFFER_BIT4_QCOM 0x10000000 |
| GL_MULTISAMPLE_BUFFER_BIT5_QCOM 0x20000000 |
| GL_MULTISAMPLE_BUFFER_BIT6_QCOM 0x40000000 |
| GL_MULTISAMPLE_BUFFER_BIT7_QCOM 0x80000000 |
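| |
| The values above follow a regular layout: each buffer class occupies eight |
| consecutive bits, one per render target index. A minimal sketch of helpers |
| that exploit this layout follows. The helper names are hypothetical and |
| not part of this extension; only the four BIT0 values are taken from the |
| list above (an application would normally get them from its GL extension |
| header). |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BIT0 token values copied from the list above. */
#define GL_COLOR_BUFFER_BIT0_QCOM       0x00000001u
#define GL_DEPTH_BUFFER_BIT0_QCOM       0x00000100u
#define GL_STENCIL_BUFFER_BIT0_QCOM     0x00010000u
#define GL_MULTISAMPLE_BUFFER_BIT0_QCOM 0x01000000u

/* Hypothetical helpers: the bit for render target <index> (0-7) is the
 * BIT0 token shifted left by <index>. */
static uint32_t color_bit(unsigned index)
{
    assert(index < 8u);
    return (uint32_t)GL_COLOR_BUFFER_BIT0_QCOM << index;
}

static uint32_t depth_bit(unsigned index)
{
    assert(index < 8u);
    return (uint32_t)GL_DEPTH_BUFFER_BIT0_QCOM << index;
}

static uint32_t stencil_bit(unsigned index)
{
    assert(index < 8u);
    return (uint32_t)GL_STENCIL_BUFFER_BIT0_QCOM << index;
}

static uint32_t multisample_bit(unsigned index)
{
    assert(index < 8u);
    return (uint32_t)GL_MULTISAMPLE_BUFFER_BIT0_QCOM << index;
}

/* Hypothetical helper: build a <preserveMask> for one render target. */
static uint32_t target_preserve_mask(unsigned index, bool color, bool depth,
                                     bool stencil, bool multisample)
{
    uint32_t mask = 0u;
    if (color)       mask |= color_bit(index);
    if (depth)       mask |= depth_bit(index);
    if (stencil)     mask |= stencil_bit(index);
    if (multisample) mask |= multisample_bit(index);
    return mask;
}
```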
| |
| Additions to Chapter 2 of the OpenGL ES 2.0 Specification (OpenGL Operation) |
| |
| Add a new section "Rendering with Application Tiling" after section 2.13: |
| |
| "2.14 Rendering with Application Tiling |
| |
| The application may specify an arbitrary rectangular region (a 'tile') to |
| which rendering commands should be restricted. |
| |
| The command |
| |
| void StartTilingQCOM(uint x, uint y, uint width, uint height, |
| bitfield preserveMask); |
| |
| specifies the tile described by <x>, <y>, <width>, <height>. Until the next |
| call to EndTilingQCOM, all rendering commands (including clears) will only |
| update the contents of the render target defined by the extents of this |
| tile. The parameters <x> and <y> specify the screen-space origin of the |
| tile, and <width> and <height> specify the screen-space width and height of |
| the tile. The tile origin is located at the lower left corner of the tile. |
| If the size of the tile is too large for the fast memory on the device then |
| it will be internally subdivided into multiple tiles. The parameter |
| <preserveMask> is the bitwise OR of a number of values indicating which |
| buffers need to be initialized with the existing contents of the frame |
| buffer region corresponding to the specified tile prior to rendering, or the |
| single value NONE. The values allowed are COLOR_BUFFER_BIT*_QCOM, |
| DEPTH_BUFFER_BIT*_QCOM, STENCIL_BUFFER_BIT*_QCOM, and |
| MULTISAMPLE_BUFFER_BIT*_QCOM. These indicate the color buffer, the depth |
| buffer, the stencil buffer, and a multisample buffer modifier, respectively. |
| The multisample bits are different since they modify the meaning of the |
| color, depth, and stencil bits if the active surface is a multisample |
| surface. If a multisample bit is set then the corresponding color, depth, |
| and/or stencil bit will cause all the samples to be copied across the memory |
| bus in devices that are using fast tiled memory, but if the multisample bit |
| is not set then only a single resolved sample is copied across the bus. In |
| practice, not setting the multisample bit when rendering to a multisample |
| buffer can greatly improve performance, but could cause small rendering |
| artifacts in some multiple-pass rendering algorithms. The number 0-7 in |
| each token name identifies the render target to which the bit applies. If |
| multiple render targets are not being used then 0 should be specified. Any |
| buffers specified in <preserveMask> that do not exist in the current |
| rendering state will be silently ignored (similar to the behavior of Clear). |
| If NONE is specified, then no buffers will be initialized. For any buffers |
| not initialized in this manner, the initial contents will be undefined. |
| |
| The values of <x>, <y>, <width> and <height> are silently clamped to the |
| extents of the render target. |
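| |
| The text above does not spell out the clamping arithmetic. One plausible |
| reading, shown here as a sketch rather than as the driver's actual |
| behavior, clamps the origin to the render target and then shrinks the size |
| so that the tile does not extend past the target's edges. TileRect and |
| clamp_tile are hypothetical names used only for this illustration. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type mirroring the <x>, <y>, <width>, <height> parameters. */
typedef struct {
    uint32_t x, y, width, height;
} TileRect;

/* Assumed clamping rule (not stated by the extension): clamp the origin
 * into the render target, then limit the size to the remaining extent.
 * Written with subtraction so that x + width cannot overflow. */
static TileRect clamp_tile(TileRect t, uint32_t target_w, uint32_t target_h)
{
    if (t.x > target_w) t.x = target_w;
    if (t.y > target_h) t.y = target_h;
    if (t.width  > target_w - t.x) t.width  = target_w - t.x;
    if (t.height > target_h - t.y) t.height = target_h - t.y;

    /* The clamped tile lies entirely inside the render target. */
    assert(t.x + t.width <= target_w);
    assert(t.y + t.height <= target_h);
    return t;
}
```
| |
| Under this reading, a tile whose origin lies outside the render target |
| clamps to an empty rectangle. |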
| |
| The command |
| |
| void EndTilingQCOM(bitfield preserveMask); |
| |
| notifies the driver that the application has completed all desired rendering |
| to the tile specified by StartTilingQCOM. This allows the driver to flush |
| the contents of the specified tile to the corresponding region of the render |
| target, and disables application tiling (resuming normal rendering). The |
| parameter <preserveMask> is specified using the same values as the |
| equivalent argument of StartTilingQCOM, but indicates which buffers need to |
| be preserved upon completion of all rendering commands issued with |
| application tiling. For any buffers not preserved in this manner, the |
| resulting contents of the buffer regions corresponding to the active tile |
| will be undefined. |
| |
| GLX Protocol |
| |
| None. |
| |
| Errors |
| |
| INVALID_OPERATION error is generated if StartTilingQCOM is called while |
| WRITEONLY_RENDERING_QCOM is enabled or the current framebuffer is not |
| framebuffer complete. |
| |
| INVALID_OPERATION error is generated if EndTilingQCOM is called without a |
| corresponding call to StartTilingQCOM. |
| |
| INVALID_OPERATION error is generated if StartTilingQCOM is called after |
| calling StartTilingQCOM without a corresponding call to EndTilingQCOM. |
| |
| INVALID_OPERATION error is generated if Enable(WRITEONLY_RENDERING_QCOM) |
| is called between StartTilingQCOM and EndTilingQCOM. |
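| |
| The four rules above can be modeled as a small state machine. The sketch |
| below is not driver code: TilingModel and the model_* functions are |
| hypothetical names, and it relies on the usual GL convention that a command |
| which generates an error otherwise has no effect. |
| |
```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TILING_NO_ERROR,
    TILING_INVALID_OPERATION
} TilingError;

/* Hypothetical model of the state the error rules depend on. */
typedef struct {
    bool tiling_active;        /* between StartTilingQCOM and EndTilingQCOM */
    bool write_only_enabled;   /* WRITEONLY_RENDERING_QCOM is enabled       */
    bool framebuffer_complete; /* current framebuffer is complete           */
} TilingModel;

/* Models StartTilingQCOM: fails if tiling is already active, write-only
 * rendering is enabled, or the framebuffer is not complete. */
static TilingError model_start_tiling(TilingModel *m)
{
    assert(m);
    if (m->tiling_active || m->write_only_enabled ||
        !m->framebuffer_complete) {
        return TILING_INVALID_OPERATION;
    }
    m->tiling_active = true;
    return TILING_NO_ERROR;
}

/* Models EndTilingQCOM: fails without a matching StartTilingQCOM. */
static TilingError model_end_tiling(TilingModel *m)
{
    assert(m);
    if (!m->tiling_active) {
        return TILING_INVALID_OPERATION;
    }
    m->tiling_active = false;
    return TILING_NO_ERROR;
}

/* Models Enable(WRITEONLY_RENDERING_QCOM): fails while tiling is active. */
static TilingError model_enable_write_only(TilingModel *m)
{
    assert(m);
    if (m->tiling_active) {
        return TILING_INVALID_OPERATION;
    }
    m->write_only_enabled = true;
    return TILING_NO_ERROR;
}
```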
| |
| New State |
| |
| None. |
| |
| Sample Usage |
| |
| GLboolean renderTiledTriangle(GLuint x, GLuint y, GLuint width, GLuint height) |
| { |
|     // set the active tile and initialize the color and depth buffers with |
|     // the existing contents |
|     glStartTilingQCOM(x, y, width, height, |
|                       GL_COLOR_BUFFER_BIT0_QCOM | GL_DEPTH_BUFFER_BIT0_QCOM); |
| |
|     // draw the triangle |
|     glDrawArrays(GL_TRIANGLES, 0, 3); |
| |
|     // finished with this tile -- preserve the color buffer |
|     glEndTilingQCOM(GL_COLOR_BUFFER_BIT0_QCOM); |
| |
|     // return success |
|     return GL_TRUE; |
| } |
| |
| Revision History |
| |
| #09 08/20/2009 Chuck Smith Cosmetic changes |
| #08 08/19/2009 Maurice Ribble Add support for multiple render targets |
| #07 07/28/2009 Maurice Ribble Clean up spec |
| Remove multiple tile support |
| #06 07/23/2009 Maurice Ribble Updated overview to match latest spec |
| #05 07/15/2009 Maurice Ribble Changed from spec to subdivide tiles |
| instead of returning out of memory |
| #04 07/06/2009 Maurice Ribble Update due to the AMD->Qualcomm move; |
| general extension cleanup. |
| #03 11/17/2008 Chuck Smith Clarified the results of EndTilingQCOM |
| for unpreserved buffers. |
| #02 11/10/2008 Chuck Smith Updates to clarify behavior; additions |
| to the Issues section. |
| #01 11/04/2008 Chuck Smith First draft. |