pass SkVx::Vec arguments as const&

Yet another surprising finding when looking at ARM code generation is
that passing these values to functions by const& does make a difference,
even when fully inlined.  I can only guess that the compiler's somehow
more sure that way that the values won't change?  Anyway, convert all
skvx functions that take Vec arguments to take const Vec& instead.

This tweak is enough to let the natural implementation of mull()
actually produce good code generation, so I've promoted that to SkVx.h
and added a unit test.  Notice in the NEON case we've got a base case at
N=8 and two recursive cases, one down to 8 as usual when N > 8, but also
one up to 8 when N < 8.

This also is another big speedup for ARMv7 NEON, bringing it to nearly
the same speed as ARMv8 NEON on the same device.

Bug: chromium:952502
Change-Id: I0f19bab45cf02222ccc8090053ea2a4a380f1dfe
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208582
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
3 files changed