commit a132c3869fcffb350d7a5ca7256496ab977bdd0c

Faster and more accurate blit_row_s32a_opaque for ARM

Change the ARM implementation of alpha blending to work on 8 pixels at a
time (using NEON). Also improve the accuracy of alpha blending by using
a formula based on SkMulDiv255Round rather than SkPMSrcOver.
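
Here is a minimal scalar C sketch of the accuracy difference for one premultiplied color channel. The helper mul_div_255_round reproduces the rounded divide-by-255 arithmetic of Skia's SkMulDiv255Round. srcover_channel_approx is a hypothetical stand-in for a shift-based approximation (scale by 256 - alpha, then shift right by 8). It is assumed to resemble SkPMSrcOver's arithmetic and is not copied from it. The NEON code in this change is not reproduced here.

```c
#include <stdint.h>

/* Rounded a * b / 255 for a, b in [0, 255], using the add-128 and
 * add-high-byte trick (same arithmetic as SkMulDiv255Round). */
static inline uint8_t mul_div_255_round(unsigned a, unsigned b) {
    unsigned prod = a * b + 128;
    return (uint8_t)((prod + (prod >> 8)) >> 8);
}

/* Premultiplied src-over for one channel: s + d * (255 - sa) / 255,
 * rounded to nearest. Assumes s <= sa, so the sum never exceeds 255. */
static inline uint8_t srcover_channel_exact(uint8_t s, uint8_t d, uint8_t sa) {
    return (uint8_t)(s + mul_div_255_round(d, 255u - sa));
}

/* Hypothetical shift-based approximation: multiply by (256 - sa) and drop
 * the low 8 bits. This truncates instead of rounding and divides by 256
 * instead of 255. */
static inline uint8_t srcover_channel_approx(uint8_t s, uint8_t d, uint8_t sa) {
    return (uint8_t)(s + ((d * (256u - sa)) >> 8));
}
```

For example, srcover_channel_exact(0, 100, 200) gives 22 (100 * 55 / 255 is about 21.57), while srcover_channel_approx(0, 100, 200) gives 21.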
Note that a number of variations of this code were considered. Here are
some notes:
- A version processing 16 pixels at a time was considered. It performs
well when alpha is extreme (all-opaque or all-transparent pixels), but
performs worse than the 8-pixel version when alpha changes frequently
between neighboring pixels. Also, gcc 6.2.1 seems to have trouble with
register pressure when compiling this version.
- If the branch that detects the fully-opaque or fully-transparent cases
is removed, performance increases significantly for images whose
pixels are all partially transparent (especially on ARM Cortex A72),
but can decrease significantly for images that are almost entirely
opaque or entirely transparent.

This implementation is a compromise between the effects described above.
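
The fast-path branch discussed above can be sketched in portable scalar C. This is not the NEON code from the change. blit_row_srcover_sketch and blend_pixel are hypothetical names, and the sketch makes three assumptions: pixels are valid premultiplied 32-bit values, alpha is in the top byte, and the 8-pixel grouping only mirrors the structure of the vectorized loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: alpha occupies bits 24..31 of each packed
 * premultiplied pixel. Skia's actual channel layout is configurable. */
#define ALPHA(p) ((uint32_t)(p) >> 24)

/* Rounded a * b / 255 for a, b in [0, 255]. */
static inline uint32_t mul_div_255_round(uint32_t a, uint32_t b) {
    uint32_t prod = a * b + 128;
    return (prod + (prod >> 8)) >> 8;
}

/* Src-over of one premultiplied pixel, one 8-bit channel at a time. */
static uint32_t blend_pixel(uint32_t s, uint32_t d) {
    uint32_t inv = 255u - ALPHA(s);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sc = (s >> shift) & 0xFFu;
        uint32_t dc = (d >> shift) & 0xFFu;
        out |= (sc + mul_div_255_round(dc, inv)) << shift;
    }
    return out;
}

/* Hypothetical scalar analogue of the 8-pixels-at-a-time loop. */
void blit_row_srcover_sketch(uint32_t *dst, const uint32_t *src, size_t count) {
    size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        /* AND of alphas is 0xFF only if all 8 are opaque;
         * OR of alphas is 0 only if all 8 are transparent. */
        uint32_t and_a = 0xFFu, or_a = 0;
        for (int k = 0; k < 8; k++) {
            and_a &= ALPHA(src[i + k]);
            or_a |= ALPHA(src[i + k]);
        }
        if (and_a == 0xFFu) {
            /* All opaque: source replaces destination. */
            for (int k = 0; k < 8; k++) dst[i + k] = src[i + k];
        } else if (or_a != 0) {
            /* Mixed alpha: full blend for every pixel in the group. */
            for (int k = 0; k < 8; k++) dst[i + k] = blend_pixel(src[i + k], dst[i + k]);
        }
        /* else all transparent: destination is left unchanged. */
    }
    /* Tail: fewer than 8 pixels remain, blend them one by one. */
    for (; i < count; i++) dst[i] = blend_pixel(src[i], dst[i]);
}
```

Computing the AND and OR of the eight alphas adds a small cost to every group. The notes above describe when the skipped work outweighs that cost.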

This patch produces a ~10% improvement on the nanobench sub-scores
repeatTile_BGRA_8888_A, constXTile_MM_filter_trans, constXTile_CC_trans
and constXTile_RR_filter_trans when running on ARM Cortex A72. Larger
improvements (20% to 30%) are observed when running on ARM Cortex A53.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I1f0c9f549057613bbffd26e6651f3beeb0019af9
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/16520
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
1 file changed