ARM Skia NEON patches - 41 - arm64: SkXfermode::xfer32

The NEON code for Xfermodes currently performs well on arm64
targets, except for dstout and dstin, which are significantly
slower than the C code. This patch fixes those two modes and
also speeds up the other modes.
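
For readers unfamiliar with the two modes named above, here is a
minimal scalar sketch of what dstin and dstout compute per pixel.
It is not the code touched by this patch and not Skia's C
implementation: the names mul_div255, scale_pixel, dstin_ref and
dstout_ref are hypothetical, and the sketch assumes premultiplied
32-bit pixels with alpha in the top byte and exact rounding
division by 255.

```cpp
#include <cstdint>

// Hypothetical helper: (a * b) / 255 with rounding, for a, b in [0, 255].
static inline uint32_t mul_div255(uint32_t a, uint32_t b) {
    uint32_t prod = a * b + 128;
    return (prod + (prod >> 8)) >> 8;
}

// Hypothetical helper: scale all four 8-bit channels of a pixel by scale/255.
// Assumes alpha is stored in bits 24..31 (an assumption of this sketch).
static inline uint32_t scale_pixel(uint32_t pixel, uint32_t scale) {
    uint32_t c3 = mul_div255((pixel >> 24) & 0xFF, scale);
    uint32_t c2 = mul_div255((pixel >> 16) & 0xFF, scale);
    uint32_t c1 = mul_div255((pixel >> 8) & 0xFF, scale);
    uint32_t c0 = mul_div255(pixel & 0xFF, scale);
    return (c3 << 24) | (c2 << 16) | (c1 << 8) | c0;
}

// dstin: keep the destination where the source is opaque (dst * sa).
uint32_t dstin_ref(uint32_t src, uint32_t dst) {
    return scale_pixel(dst, src >> 24);
}

// dstout: keep the destination where the source is transparent (dst * (1 - sa)).
uint32_t dstout_ref(uint32_t src, uint32_t dst) {
    return scale_pixel(dst, 255 - (src >> 24));
}
```

Both modes reduce to scaling the destination pixel by a single
factor derived from the source alpha, which is why they are
natural candidates for a vectorised path.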

Perf results (improvement per mode):

+------------+------------+------------+
| mode       | Cortex-A53 | Cortex-A57 |
+------------+------------+------------+
| multiply   |    +24.58% |    +23.71% |
+------------+------------+------------+
| exclusion  |    +22.72% |    +22.05% |
+------------+------------+------------+
| difference |    +34.67% |    +36.82% |
+------------+------------+------------+
| hardlight  |    +17.07% |    +14.74% |
+------------+------------+------------+
| lighten    |    +38.21% |    +32.87% |
+------------+------------+------------+
| darken     |    +37.59% |    +32.99% |
+------------+------------+------------+
| overlay    |    +17.36% |    +16.88% |
+------------+------------+------------+
| screen     |    +52.56% |    +54.43% |
+------------+------------+------------+
| modulate   |    +62.85% |    +61.32% |
+------------+------------+------------+
| plus       |    +91.52% |   +117.41% |
+------------+------------+------------+
| xor        |    +42.86% |    +43.38% |
+------------+------------+------------+
| dstatop    |    +48.46% |    +48.99% |
+------------+------------+------------+
| srcatop    |    +50.50% |    +48.51% |
+------------+------------+------------+
| dstout     |    +67.83% |    +78.09% |
+------------+------------+------------+
| srcout     |    +69.02% |    +78.26% |
+------------+------------+------------+
| dstin      |    +70.92% |    +79.24% |
+------------+------------+------------+
| srcin      |    +68.90% |    +78.23% |
+------------+------------+------------+
| dstover    |    +73.80% |    +68.10% |
+------------+------------+------------+

Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

BUG=skia
R=mtklein@google.com, djsollen@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/350343002
1 file changed