Do loads and math in parallel in SkColorXform_opts

Note that the baselines have shifted slightly since I
recently switched to clang.
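
For readers unfamiliar with the idea in the subject line, here is a
minimal scalar sketch of overlapping loads with math. It is not the
code in SkColorXform_opts (which this message does not show); the
names Pixel, load_linear, apply_matrix, and xform_pipelined, the
256-entry table, and the row-major 3x3 matrix layout are all
hypothetical and chosen only for illustration. The point is that the
table lookups for pixel i+1 are issued before the matrix math for
pixel i, so the two do not depend on each other and the CPU is free
to overlap them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type holding linearized color channels.
struct Pixel { float r, g, b; };

// Load step (hypothetical helper): one table lookup per 8-bit channel.
static inline Pixel load_linear(uint32_t px, const float* table) {
    return { table[(px >>  0) & 0xFF],
             table[(px >>  8) & 0xFF],
             table[(px >> 16) & 0xFF] };
}

// Math step (hypothetical helper): row-major 3x3 matrix multiply.
static inline Pixel apply_matrix(const Pixel& p, const float m[9]) {
    return { m[0] * p.r + m[1] * p.g + m[2] * p.b,
             m[3] * p.r + m[4] * p.g + m[5] * p.b,
             m[6] * p.r + m[7] * p.g + m[8] * p.b };
}

// Pipelined loop: start the loads for the next pixel, then do the math
// for the current one. Neither statement depends on the other's result.
std::vector<Pixel> xform_pipelined(const std::vector<uint32_t>& src,
                                   const float* table,
                                   const float m[9]) {
    assert(table != nullptr);
    std::vector<Pixel> dst(src.size());
    if (src.empty()) {
        return dst;
    }

    Pixel cur = load_linear(src[0], table);           // prologue: first load
    for (std::size_t i = 0; i + 1 < src.size(); ++i) {
        Pixel next = load_linear(src[i + 1], table);  // loads for pixel i+1
        dst[i] = apply_matrix(cur, m);                // math for pixel i
        cur = next;
    }
    dst[src.size() - 1] = apply_matrix(cur, m);       // epilogue: last math
    return dst;
}
```

Whether this ordering helps depends on the compiler and the target;
the timings below are the only evidence this message offers.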

201295.jpg on HP z620 (300x280)

Skia Xform sRGB Dst Before    0.378 ms
Skia Xform sRGB Dst After     0.322 ms
                              1.17x

Skia Xform 2.2  Dst Before    0.428 ms
Skia Xform 2.2  Dst After     0.395 ms
                              1.08x

QCMS Xform                    0.418 ms

sRGB Dst vs QCMS              1.30x
2.2  Dst vs QCMS              1.06x

--------------------------------------------

Nexus 6P:
Skia Xform sRGB Dst Before    1.58 ms
Skia Xform sRGB Dst After     1.43 ms
Skia Xform 2.2  Dst Before    2.69 ms
Skia Xform 2.2  Dst After     2.62 ms

Dell Venue 8:
Skia Xform sRGB Dst Before    2.78 ms
Skia Xform sRGB Dst After     2.74 ms
Skia Xform 2.2  Dst Before    3.73 ms
Skia Xform 2.2  Dst After     3.64 ms

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2081933005
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2081933005
1 file changed