overhaul blit_row_s32a_opaque()

  - Remove branching on source alpha, which
    makes Skia susceptible to timing attacks.

  - Remove SSE4.1 variant, which is nearly identical
    to the SSE2 code once branching's removed.

  - Reroll SIMD loops back to their native vector
    size, leaving unrolling to the compiler.

  - Allow wider SIMD sets to cascade down into the
    narrower ones for the last few pixels instead of
    always hitting the scalar fallback.

  - Move code around, rewrite, refactor, etc. so it
    all reads more consistently.

  blit_row_color32() has not changed at all here,
  just moved to the bottom of the file to prevent
  it from interrupting blit_row_s32a_opaque().

  This prevents me from seeing the timing reconstructions
  on the Timing sample everywhere I've tested.

Charts to watch (ARMv7, ARMv8 AVX2, AVX512 geometric mean of SKP rendering):


Bug: chromium:1088224
Change-Id: I4baf2ccd58ac3129ef01c2cf79c3826c9c0d389e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/313716
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Herb Derby <herb@google.com>
1 file changed