| Name |
| |
| QCOM_shading_rate |
| |
| Name Strings |
| |
| GL_QCOM_shading_rate |
| |
| Contributors |
| |
| Jeff Leger |
| Robert VanReenen |
| |
| Contact |
| |
| Jeff Leger - jleger 'at' qti.qualcomm.com |
| |
| Status |
| |
| Complete |
| |
| Version |
| |
| Last Modified Date: April 22, 2020 |
| Revision: #2 |
| |
| Number |
| |
| OpenGL ES Extension #279 |
| |
| Dependencies |
| |
| OpenGL ES 2.0 is required. This extension is written against OpenGL ES 3.2. |
| |
| This extension interacts with OVR_Multiview. |
| This extension interacts with QCOM_framebuffer_foveated and QCOM_texture_foveated. |
| |
| When this extension is advertised, the implementation must also advertise GLSL |
| extension "GL_EXT_fragment_invocation_density" (documented separately), which |
| provides new built-in variables that allow fragment shaders to determine the |
| effective shading rate used for fragment invocations. |
| |
| Overview |
| |
| By default, OpenGL runs a fragment shader once for each pixel covered by a |
| primitive being rasterized. When using multisampling, the outputs of that |
| fragment shader are broadcast to each covered sample of the fragment's |
| pixel. When using multisampling, applications can optionally request that |
| the fragment shader be run once per color sample (e.g., by using the "sample" |
| qualifier on one or more active fragment shader inputs), or run a minimum |
| number of times per pixel using SAMPLE_SHADING enable and the |
| MinSampleShading frequency value. |
| |
| This extension allows applications to specify fragment shading rates of less |
| than 1 invocation per pixel. Instead of invoking the fragment shader |
| once for each covered pixel, the fragment shader can be run once for a |
| group of adjacent pixels in the framebuffer. The outputs of that fragment |
| shader invocation are broadcast to each covered sample for all of the pixels |
| in the group. The initial version of this extension allows for groups of |
| 1, 2, 4, 8, and 16 pixels. |
| |
| This can be useful for effects like motion volumetric rendering |
| where a portion of the scene is processed at full shading rate and a portion can |
| be processed at a reduced shading rate, saving power and processing resources. |
| The requested rate can vary from (finest and default) 1 fragment shader |
| invocation per pixel to (coarsest) one fragment shader invocation for each |
| 4x4 block of pixels. Implementations are given wide latitude to rasterize |
| at the requested rate or any other rate that is less coarse. |
| |
| New Tokens |
| |
| Accepted by the <pname> parameter of GetIntegerv, GetInteger64v |
| and GetFloatv: |
| |
| SHADING_RATE_QCOM 0x96A4 |
| |
| Accepted by the <cap> parameter of Enable, Disable, IsEnabled: |
| |
| SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM 0x96A5 |
| |
| Accepted by the <rate> parameter of ShadingRateQCOM: |
| |
| SHADING_RATE_1X1_PIXELS_QCOM 0x96A6 |
| SHADING_RATE_1X2_PIXELS_QCOM 0x96A7 |
| SHADING_RATE_2X1_PIXELS_QCOM 0x96A8 |
| SHADING_RATE_2X2_PIXELS_QCOM 0x96A9 |
| SHADING_RATE_4X2_PIXELS_QCOM 0x96AC |
| SHADING_RATE_4X4_PIXELS_QCOM 0x96AE |
| |
| New Procedures and Functions |
| |
| void ShadingRateQCOM(enum rate); |
| |
| Modifications to the OpenGL ES 3.2 Specification |
| |
| Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196 |
| |
| (Modify the function approximating Scale Factor (P), to allow implementations |
| to scale implicit derivatives based on the shading rate. The scale occurs before |
| the LOD bias and before LOD clamping). |
| |
| Modify the definitions of (mu, mv, mw): |
| |
| mu = max( |du/dx| , |du/dy| ) |
| mv = max( |dv/dx| , |dv/dy| ) |
| mw = max( |dw/dx| , |dw/dy| ) |
| |
| to: |
| |
| mu = max( |du/dx| * sx , |du/dy| * sy ) |
| mv = max( |dv/dx| * sx , |dv/dy| * sy ) |
| mw = max( |dw/dx| * sx , |dw/dy| * sy ) |
| |
| where (sx, sy) refer to _effective shading rate_ (w', h') specified in |
| section 13.X.2. |
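| |
| As a rough illustration of the modified definition, the following C sketch |
| computes the scaled mu for one texture coordinate. The names scaled_mu, |
| abs_f, and max_f are hypothetical and not part of this extension; sx and sy |
| stand for the effective fragment width and height (w', h'). |
| |
```c
#include <assert.h>

/* Absolute value and maximum, written out so the sketch does not need libm. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Scaled form of mu from the modified Section 8.14.1:
 *   mu = max(|du/dx| * sx, |du/dy| * sy)
 * sx and sy are the effective fragment width and height in pixels. */
float scaled_mu(float du_dx, float du_dy, int sx, int sy)
{
    assert(sx >= 1 && sy >= 1);
    return max_f(abs_f(du_dx) * (float)sx, abs_f(du_dy) * (float)sy);
}
```
| |
| With du/dx = 1.0 and du/dy = -0.5, a 1x1 rate gives mu = 1.0 and a 2x2 rate |
| gives mu = 2.0. Doubling mu, mv, and mw together raises the computed LOD by |
| one level before the LOD bias and LOD clamping are applied. |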
| |
| Modify Section 13.4, Multisampling, p. 353 |
| |
| (add to the end of the section) |
| |
| When SHADING_RATE_QCOM is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM, |
| the rasterization will occur at the _effective shading rate_ (Section 13.X) and |
| will result in fragments covering a <W>x<H> group of pixels. |
| |
| When multisample rasterization is enabled, the samples of the fragment will consist |
| of the samples for each of the pixels in the group. The fragment center will be |
| the center of this group of pixels. Each fragment will include a coverage value |
| with (W * H * SAMPLES) bits. For example, if GL_SHADING_RATE_QCOM is 2X2 and the |
| currently bound framebuffer object has SAMPLES equal to 4 (4xMSAA), then the fragment |
| will consist of 4 pixels and 16 samples. Similarly, each fragment will have |
| (W * H * SAMPLES) depth values and associated data. |
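| |
| The sample count per fragment can be sketched as a one-line helper. The name |
| fragment_sample_count is hypothetical, and passing samples = 1 for a |
| non-multisampled framebuffer is an assumption of this sketch. |
| |
```c
#include <assert.h>

/* Number of coverage bits (and depth values) carried by one fragment:
 * W * H * SAMPLES. Assumption: samples is 1 when multisampling is not
 * in use, so that each pixel counts as a single sample. */
unsigned fragment_sample_count(unsigned w, unsigned h, unsigned samples)
{
    assert(w >= 1 && h >= 1 && samples >= 1);
    return w * h * samples;
}
```
| |
| For the 2X2 rate with 4xMSAA described above, the helper returns 16. |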
| |
| The contents of Section 13.4.1, Sample Shading, p. 355 are moved to the new Section 13.X.3, "Sample Shading". |
| |
| Add new section 13.X before Section 13.5, Points, p. 355 |
| |
| Section 13.X, Shading Rate |
| |
| By default, each fragment processed by programmable fragment processing |
| corresponds to a single pixel with a single (x,y) coordinate. When using |
| multisampling, implementations are permitted to run separate fragment shader |
| invocations for each sample, but often only run a single invocation for all |
| samples of the fragment. We will refer to the density of fragment shader |
| invocations as the _shading rate_. |
| Applications can use the shading rate to increase the size of fragments to |
| cover multiple pixels and reduce the amount of fragment shader work. |
| Applications can also use the shading rate to explicitly control the minimum |
| number of fragment shader invocations when multisampling. |
| |
| Section 13.X.1, Shading Rate Control |
| |
| The shading rate can be controlled with the command |
| |
| void ShadingRateQCOM(enum rate); |
| |
| <rate> specifies the value of SHADING_RATE_QCOM, and defines the |
| _shading rate_. Valid values for <rate> are described in |
| table X.1. |
| |
| Shading Rate Size |
| ---------------------------- ----- |
| SHADING_RATE_1X1_PIXELS_QCOM 1x1 |
| SHADING_RATE_1X2_PIXELS_QCOM 1x2 |
| SHADING_RATE_2X1_PIXELS_QCOM 2x1 |
| SHADING_RATE_2X2_PIXELS_QCOM 2x2 |
| SHADING_RATE_4X2_PIXELS_QCOM 4x2 |
| SHADING_RATE_4X4_PIXELS_QCOM 4x4 |
| |
| Table X.1: Shading rates accepted by ShadingRateQCOM. An |
| entry of "<W>x<H>" in the "Size" column indicates that the shading |
| rate requests fragments with a width and height (in pixels) of <W> |
| and <H>, respectively. |
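| |
| Table X.1 can be expressed as a small lookup. The sketch below is only an |
| illustration: the function shading_rate_size is hypothetical, and the enum |
| values are copied from the New Tokens section, written without the GL_ prefix. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token values as listed under "New Tokens". */
enum {
    SHADING_RATE_1X1_PIXELS_QCOM = 0x96A6,
    SHADING_RATE_1X2_PIXELS_QCOM = 0x96A7,
    SHADING_RATE_2X1_PIXELS_QCOM = 0x96A8,
    SHADING_RATE_2X2_PIXELS_QCOM = 0x96A9,
    SHADING_RATE_4X2_PIXELS_QCOM = 0x96AC,
    SHADING_RATE_4X4_PIXELS_QCOM = 0x96AE
};

/* Looks up the requested fragment size (in pixels) for a rate token.
 * Returns false, leaving *w and *h untouched, for values not in table X.1. */
bool shading_rate_size(unsigned rate, unsigned *w, unsigned *h)
{
    assert(w != NULL && h != NULL);
    switch (rate) {
    case SHADING_RATE_1X1_PIXELS_QCOM: *w = 1; *h = 1; return true;
    case SHADING_RATE_1X2_PIXELS_QCOM: *w = 1; *h = 2; return true;
    case SHADING_RATE_2X1_PIXELS_QCOM: *w = 2; *h = 1; return true;
    case SHADING_RATE_2X2_PIXELS_QCOM: *w = 2; *h = 2; return true;
    case SHADING_RATE_4X2_PIXELS_QCOM: *w = 4; *h = 2; return true;
    case SHADING_RATE_4X4_PIXELS_QCOM: *w = 4; *h = 4; return true;
    default: return false;
    }
}
```
| |
| The token values are not contiguous, so a switch is used here instead of |
| arithmetic on the enum value. |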
| |
| If the shading rate is specified with ShadingRateQCOM, it will apply to all |
| draw buffers. If the shading rate has not been set, the shading rate |
| will be SHADING_RATE_1X1_PIXELS_QCOM. In either case, the shading rate will |
| be further adjusted as described in the following sections. |
| |
| Section 13.X.2, Effective Shading Rate |
| |
| The value of SHADING_RATE_QCOM, in combination with other GL state, |
| is used to derive an adjusted rate or _effective shading rate_, as |
| described in this section. |
| |
| Where possible, implementations should provide an _effective shading rate_ |
| equal to the SHADING_RATE_QCOM. When this is not possible, an adjusted |
| _effective shading rate_ may be used as described in this section. While |
| there is no API for querying the _effective shading rate_, the value of this |
| parameter exists, can be queried from the fragment shader built-in gl_FragSizeEXT, |
| and is referred to in a number of places in the specification. Implementations |
| may also adjust the shading rate for other reasons not listed here. |
| |
| Implementations derive the _effective shading rate_ in an implementation-dependent |
| manner. When rendering to the default framebuffer, the rate may be adjusted |
| to 1x1. When sample shading (section 13.X.3 Sample Shading) is enabled, the |
| rate may be adjusted to 1x1. When the fragment shader uses the GLSL built-in |
| variables gl_SampleMaskIn[] or gl_SampleMask[], or uses variables |
| declared with "centroid in", the rate may be adjusted to 1x1. When sample coverage |
| or sample mask operations are enabled (Section 13.8.3 Multisample Fragment |
| Operations), the rate may be adjusted to 1x1. |
| |
| The shading rate may be adjusted to limit the number of samples covered by a |
| fragment. For example, if the implementation supports a maximum of 16 samples |
| per fragment and if GL_SHADING_RATE_QCOM is 4X4 and the currently bound |
| framebuffer object has SAMPLES equal to 4 (4xMSAA), then the number of samples |
| per coarse fragment would be 64. In such an example, an implementation may |
| adjust the shading rate to a rate with 16 or fewer samples (e.g., 2x2). |
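| |
| One possible adjustment policy is sketched below. It is purely hypothetical: |
| the actual choice is implementation-dependent, and the names FragSize and |
| limit_fragment_samples are invented for this illustration. The sketch keeps |
| the requested size when it fits, and otherwise picks the coarsest size from |
| table X.1 that is no larger in area than the request and stays within the limit. |
| |
```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned w, h; } FragSize;

/* Hypothetical policy: choose a fragment size whose total sample count
 * (w * h * samples) does not exceed max_samples. */
FragSize limit_fragment_samples(FragSize requested, unsigned samples,
                                unsigned max_samples)
{
    /* Sizes from table X.1, ordered from coarsest to finest. */
    static const FragSize table[] = {
        {4, 4}, {4, 2}, {2, 2}, {2, 1}, {1, 2}, {1, 1}
    };
    const FragSize finest = {1, 1};
    size_t i;

    assert(requested.w >= 1 && requested.h >= 1 && samples >= 1);

    /* Keep the requested rate when it already fits. */
    if (requested.w * requested.h * samples <= max_samples)
        return requested;

    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        FragSize c = table[i];
        if (c.w * c.h > requested.w * requested.h)
            continue;               /* never coarser than the request */
        if (c.w * c.h * samples <= max_samples)
            return c;
    }
    return finest;                  /* fall back to 1x1 */
}
```
| |
| With a 4x4 request, SAMPLES equal to 4, and a limit of 16 samples per |
| fragment, this sketch returns 2x2, which matches the example above. |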
| |
| If the active fragment shader uses any inputs that are qualified with |
| "sample" (unique values per sample), including the built-ins "gl_SampleID" |
| and "gl_SamplePosition", or the built-in function "interpolateAtSample", |
| the shader code is written to expect a separate shader invocation for each |
| shaded sample. For such fragment shaders, the shading rate is adjusted to |
| 1x1. |
| |
| If the <W>x<H> value of SHADING_RATE_QCOM is expressed as <w, h> then the |
| adjusted rate may be any <w', h'> as long as (w' * h') <= (w * h). If |
| SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled, then the implementation further |
| guarantees that (w'/h') equals (w/h) or that w'=1 and h'=1. |
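| |
| The constraint on the adjusted rate can be written as a predicate. This is a |
| sketch only; adjusted_rate_allowed is a hypothetical helper, and the aspect |
| ratio comparison uses cross-multiplication to avoid division. |
| |
```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the adjusted size (aw, ah) is permitted for a requested
 * size (w, h):
 *   - always:                 aw * ah <= w * h
 *   - with preserve_aspect:   aw/ah == w/h, or the adjusted size is 1x1 */
bool adjusted_rate_allowed(unsigned w, unsigned h,
                           unsigned aw, unsigned ah,
                           bool preserve_aspect)
{
    assert(w >= 1 && h >= 1 && aw >= 1 && ah >= 1);
    if (aw * ah > w * h)
        return false;
    if (!preserve_aspect)
        return true;
    if (aw == 1 && ah == 1)
        return true;
    return aw * h == w * ah;    /* same as aw/ah == w/h */
}
```
| |
| For example, a 2x2 request may be adjusted to 4x1 when the aspect ratio is |
| not preserved, because the pixel count does not grow, but not when it is. |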
| |
| Section 13.X.3 Sample Shading |
| |
| [[The contents from Section 13.4.1, Sample Shading, p. 355 are copied here]] |
| |
| Modifications to Section 13.8.2, Scissor Test (p. 367) |
| (add to the end of the section) |
| |
| When the _effective shading rate_ results in fragments covering more than one pixel, |
| the scissor tests are performed separately for each pixel in the fragment. |
| If a pixel covered by a fragment fails the scissor test, that pixel is |
| treated as though it was not covered by the primitive. If all pixels covered |
| by a fragment are either not covered by the primitive being rasterized or fail |
| the scissor test, the fragment is discarded. |
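| |
| The per-pixel scissor behavior can be sketched as follows. The names |
| ScissorBox and scissor_fragment are hypothetical, and the layout of the |
| per-pixel coverage word (bit py * w + px for pixel (px, py) of the fragment) |
| is an assumption made only for this illustration. |
| |
```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y, width, height; } ScissorBox;

/* Clears the coverage bit of every pixel in a w x h fragment that fails the
 * scissor test. (frag_x, frag_y) is the window position of the fragment's
 * lower-left pixel. A result of 0 means the fragment is discarded. */
unsigned scissor_fragment(unsigned coverage, int frag_x, int frag_y,
                          int w, int h, ScissorBox box)
{
    assert(w >= 1 && h >= 1 && w * h <= 16);
    for (int py = 0; py < h; ++py) {
        for (int px = 0; px < w; ++px) {
            int x = frag_x + px;
            int y = frag_y + py;
            bool inside = x >= box.x && x < box.x + box.width &&
                          y >= box.y && y < box.y + box.height;
            if (!inside)
                coverage &= ~(1u << (py * w + px));
        }
    }
    return coverage;
}
```
| |
| A 2x2 fragment that straddles the scissor edge keeps only the bits of the |
| pixels inside the box; a fragment entirely outside the box yields 0. |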
| |
| Modifications to Section 13.8.3, Multisample Fragment Operations (p. 368) |
| |
| (modify the last sentence of the first paragraph to indicate that sample mask |
| operations are performed when shading rate is used, even if multisampling is not |
| enabled which can produce fragments covering more than one pixel where each pixel |
| is considered a "sample") |
| |
| Change the following sentence from: |
| "If the value of SAMPLE_BUFFERS is not one, this step is skipped." |
| to: |
| "This step is skipped if SAMPLE_BUFFERS is not one, unless SHADING_RATE_QCOM |
| is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM." |
| |
| (add to the end of the section) |
| |
| When the _effective shading rate_ results in fragments covering more than one pixel, |
| each fragment will generate a composite coverage mask that includes separate |
| coverage bits for each sample in each pixel covered by the fragment. This |
| composite coverage mask will be used by the GLSL built-in input variable |
| gl_SampleMaskIn[] and updated according to the built-in output variable |
| gl_SampleMask[]. The number of composite coverage mask bits in the built-in |
| variables and their mapping to a specific pixel and sample number |
| within that pixel is implementation-defined. |
| |
| Modify Section 14.1, Fragment Shader Variables (p. 370) |
| |
| (modify sixth paragraph, p. 371, specifying that the "centroid" location |
| for multi-pixel fragments is implementation-dependent, and is allowed to |
| be outside the primitive) |
| |
| After the following sentence: |
| "When interpolating variables declared using "centroid in", |
| the variable is sampled at a location within the pixel covered |
| by the primitive generating the fragment." |
| Add the following sentence: |
| "When the _effective shading rate_ results in fragments covering more than one |
| pixel, variables declared using "centroid in" are sampled from an |
| implementation-dependent location within any one of the covered pixels." |
| |
| Modify Section 15.1, Per-Fragment Operations (p. 378) |
| |
| (insert a new paragraph after the first paragraph of the section) |
| |
| When the _effective shading rate_ results in fragments covering multiple pixels, |
| the operations described in this section are performed independently for |
| each pixel covered by the fragment. The set of samples covered by each pixel |
| is determined by extracting the portion of the fragment's composite coverage |
| that applies to that pixel, as described in section 13.8.3. |
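| |
| Extracting one pixel's portion of the composite coverage can be sketched as |
| below. The mapping of bits to pixels and samples is implementation-defined, so |
| this sketch assumes one possible layout: SAMPLES consecutive bits per pixel, |
| with pixel 0 in the lowest bits. The name pixel_coverage is hypothetical. |
| |
```c
#include <assert.h>

/* Returns the coverage bits of one pixel from a composite coverage mask.
 * Assumed layout (not mandated by the extension): pixel i occupies bits
 * [i * samples, (i + 1) * samples). */
unsigned pixel_coverage(unsigned long long composite,
                        unsigned pixel_index, unsigned samples)
{
    assert(samples >= 1 && samples <= 16);
    assert((pixel_index + 1) * samples <= 64);
    unsigned long long mask = (1ull << samples) - 1ull;
    return (unsigned)((composite >> (pixel_index * samples)) & mask);
}
```
| |
| With 4 samples per pixel and a composite mask of 0x00F3, pixel 0 has coverage |
| 0x3, pixel 1 has coverage 0xF, and pixels 2 and 3 are not covered. |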
| |
| Errors |
| |
| INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not |
| a valid shading rate from table X.1. |
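| |
| The error behavior can be modeled without a GL context. In the sketch below, |
| RasterState and model_shading_rate are hypothetical stand-ins for driver state |
| and for ShadingRateQCOM; the value 0x0500 is the standard INVALID_ENUM code. |
| The sketch assumes the usual GL rule that a command generating an error |
| leaves state unchanged. |
| |
```c
#include <assert.h>
#include <stddef.h>

enum { ERR_NONE = 0, ERR_INVALID_ENUM = 0x0500 };

typedef struct { unsigned shading_rate; } RasterState;

/* Model of ShadingRateQCOM: accepts only the six tokens from table X.1 and
 * reports INVALID_ENUM for anything else without modifying the state. */
unsigned model_shading_rate(RasterState *state, unsigned rate)
{
    assert(state != NULL);
    switch (rate) {
    case 0x96A6: /* SHADING_RATE_1X1_PIXELS_QCOM */
    case 0x96A7: /* SHADING_RATE_1X2_PIXELS_QCOM */
    case 0x96A8: /* SHADING_RATE_2X1_PIXELS_QCOM */
    case 0x96A9: /* SHADING_RATE_2X2_PIXELS_QCOM */
    case 0x96AC: /* SHADING_RATE_4X2_PIXELS_QCOM */
    case 0x96AE: /* SHADING_RATE_4X4_PIXELS_QCOM */
        state->shading_rate = rate;
        return ERR_NONE;
    default:
        return ERR_INVALID_ENUM;
    }
}
```
| |
| Note that the reserved values 0x96AA, 0x96AB, and 0x96AD fall into the |
| default case in this model, since they are not listed in table X.1. |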
| |
| New State |
| |
| Add to table 21.7, Rasterization |
| |
| Get Value                                Type  Get Command  Initial Value                 Description      Sec    |
| ---------------------------------------  ----  -----------  ----------------------------  ---------------  ------ |
| SHADING_RATE_QCOM                        E     GetIntegerv  SHADING_RATE_1X1_PIXELS_QCOM  shading rate     13.X.1 |
| SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  B     IsEnabled    FALSE                         maintain aspect  13.X.2 |
| |
| Interactions with OVR_Multiview |
| |
| If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views. |
| |
| Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated |
| |
| QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel |
| density which is exposed as a fragment size via the fragment |
| shader built-in gl_FragSizeEXT. This extension defines an effective |
| shading rate which is also exposed as a fragment size via the |
| same built-in. If either foveation extension is enabled in conjunction with |
| this extension, then the value of gl_FragSizeEXT is the component-wise product |
| of both fragment sizes. |
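| |
| The combination rule amounts to a component-wise multiply, sketched here with |
| hypothetical names (IVec2, combined_frag_size) that are not part of either |
| extension. |
| |
```c
#include <assert.h>

typedef struct { int x, y; } IVec2;

/* Value reported through gl_FragSizeEXT when foveation and shading rate
 * are both active: the component-wise product of the two fragment sizes. */
IVec2 combined_frag_size(IVec2 foveation_size, IVec2 shading_rate_size)
{
    IVec2 result;
    assert(foveation_size.x >= 1 && foveation_size.y >= 1);
    assert(shading_rate_size.x >= 1 && shading_rate_size.y >= 1);
    result.x = foveation_size.x * shading_rate_size.x;
    result.y = foveation_size.y * shading_rate_size.y;
    return result;
}
```
| |
| For instance, a foveation fragment size of (2, 2) combined with an effective |
| shading rate of 2x1 gives a fragment size of (4, 2). |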
| |
| Issues |
| |
| (1) Should the application-specified rate in ShadingRateQCOM() be a "hint" |
| that can be ignored by the driver, or is the driver required to honor |
| the requested rate? |
| |
| RESOLVED: The driver should honor the application-specified rate where |
| possible, but is allowed to use an adjusted rate due to |
| implementation-dependent reasons. The specific rates supported in the hardware and the |
| specific conditions when the rates need to be adjusted can differ across |
| different Adreno GPU families. This extension gives drivers the flexibility to |
| expose this extension on early hardware that may have restrictions and oddities |
| while providing applications some (admittedly limited) control over the adjusted |
| rate that will be selected. The actual rate is always exposed via the fragment |
| shader built-in. |
| |
| (2) If the application-specified rate is only a hint, can developers expect that all the |
| shading rates exposed by this extension are supported natively by the HW? |
| |
| RESOLVED: The initial version of this extension exposes token values for |
| shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4. Most Adreno GPUs supporting |
| this extension are expected to support all those rates, although some early HW |
| may support fewer rates. Note that this extension does not include shading |
| rates of 1x4, 4x1, nor 2x4 because Adreno GPUs may never support those rates. |
| Because a future version of this extension could support those rates, |
| we have reserved the token values (0x96AA, 0x96AB, and 0x96AD) for those rates. |
| |
| (3) How does this feature work with per-sample shading? |
| |
| RESOLVED: When using per-sample shading, an application is expecting a |
| fragment shader to run with a separate invocation per sample. The |
| shading rate might allow for a "coarsening" that would break such |
| shaders. Furthermore, some Adreno families may not support this |
| combination. We've chosen not to explicitly disallow this combination, |
| while giving implementations the flexibility to use an adjusted 1x1 shading |
| rate. |
| |
| (4) How do centroid-sampled variables work with fragments larger than one |
| pixel? |
| |
| RESOLVED: For single-pixel fragments, attributes declared with |
| "centroid" are sampled at an implementation-dependent location in the |
| intersection of the area of the primitive being rasterized and the area |
| of the pixel that corresponds to the fragment. With multi-pixel |
| fragments, attributes declared with "centroid" are sampled from an |
| implementation-dependent location within any of the covered pixels. |
| This wide allowance for implementation-dependent behavior |
| enables the extension to be exposed on early Adreno hardware. |
| |
| (5) How do built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work with |
| fragments larger than one pixel? |
| |
| RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and gl_SampleMask[] |
| specify the input and output coverage bits for a single pixel, where bit 'B' |
| corresponds to SampleID 'B'. With this extension enabled, these built-ins would |
| specify the coverage bits for all the samples in all the pixels covered by the |
| fragment. In this extension, the exact behavior of gl_SampleMaskIn[] and |
| gl_SampleMask[] is implementation-dependent. For some Adreno GPUs, use of these |
| built-in variables will cause the driver to use a 1x1 adjusted shading rate. |
| In other cases, the exact mapping of bits to samples/pixels is |
| implementation-defined. This wide allowance for implementation-dependent behavior enables the |
| extension to be exposed on early Adreno hardware. |
| |
| (6) Are there any restrictions on framebuffer formats used with this feature? |
| For example, are EglImages that may contain multi-plane YUV formats supported? |
| |
| RESOLVED: It is implementation-dependent whether shading rate is supported for |
| all formats, or only certain formats. Implementations are allowed to adjust |
| the _effective shading rate_ based on the format. |
| |
| (7) Does the value of SHADING_RATE_QCOM affect the built-in variable gl_FragCoord? |
| |
| RESOLVED: Yes, when the shading rate results in fragments covering multiple pixels, |
| gl_FragCoord will be the window-relative coordinates (x,y,z,1/w) of the center of |
| the fragment. For non-multisample cases this may not be at a pixel center. This may |
| break shaders that assume pixel center (0.5, 0.5) values for gl_FragCoord. |
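| |
| The effect on the (x, y) components can be sketched as follows. The names |
| Vec2 and fragment_center are hypothetical; (frag_x, frag_y) is assumed to be |
| the integer window position of the fragment's lower-left pixel. |
| |
```c
#include <assert.h>

typedef struct { float x, y; } Vec2;

/* Window-space center of a w x h pixel fragment whose lower-left pixel is
 * at integer position (frag_x, frag_y). */
Vec2 fragment_center(int frag_x, int frag_y, int w, int h)
{
    Vec2 c;
    assert(w >= 1 && h >= 1);
    c.x = (float)frag_x + 0.5f * (float)w;
    c.y = (float)frag_y + 0.5f * (float)h;
    return c;
}
```
| |
| A 1x1 fragment at pixel (4, 6) has its center at (4.5, 6.5), while a 2x2 |
| fragment at the same position has its center at (5.0, 7.0), which lies on a |
| pixel corner instead of a pixel center. |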
| |
| (8) Does the shading rate affect the value of gl_SamplePosition or gl_NumSamples? |
| |
| RESOLVED: No, neither built-in is affected. If the shader uses gl_SamplePosition, the |
| shader runs at sample rate, causing the shading rate to be ignored. gl_NumSamples is |
| the number of samples in the framebuffer object, which is unaffected by the value of |
| the shading rate. |
| |
| (9) Should shading rate affect screen-space derivatives? |
| |
| RESOLVED: This extension scales the gradients between adjacent fragments by |
| the effective shading rate (w', h'). The resulting increase in computed LOD |
| aligns well with the reduced fragment shader invocations in most use cases; |
| in other cases the shader author may want to bias the LOD to compensate. |
| Shader built-in instructions that return gradient values (dFdx, dFdy, and fwidth) |
| are similarly scaled for the same reason. |
| |
| |
| Revision History |
| |
| Rev. Date Author Changes |
| ---- -------- -------- ---------------------------------------------- |
| 1 03/17/20 jleger Initial draft. |
| 2 04/22/20 jleger Relaxed the <w', h'> guarantee from "w'<=w and |
| h'<=h" to "w'*h' <= w*h". |