explicitly vectorize sk_memset{16,32,64}
This ought to help clients who don't enable autovectorization.
With autovectorization enabled, this new version ends up far more
aggressively vectorized than the old autovectorized code.
Instead of handling at most 128 bytes per loop iteration, it now
handles up to 512 bytes per iteration. Pretty exciting.
Locally the perf effects are mixed, but we'd expect this to help
Chrome unambiguously if they've turned off autovectorization.
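For readers unfamiliar with the idea, here is a minimal sketch of what
"explicitly vectorized" can look like for the 32-bit case. It is not the
code in this change: memset32_sketch is a hypothetical name, and the use of
raw SSE2 intrinsics with a scalar fallback is an assumption made only for
illustration. Each iteration of the main loop issues one 16-byte store, so
the loop is vector code even when the compiler's autovectorizer is off.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_set1_epi32, _mm_storeu_si128
#endif

// Hypothetical helper for illustration only; not Skia's implementation.
inline void memset32_sketch(uint32_t* dst, uint32_t value, int count) {
    assert(count >= 0);
#if defined(__SSE2__)
    // Broadcast the 32-bit value into all four lanes of a 128-bit register.
    const __m128i wide = _mm_set1_epi32(static_cast<int>(value));
    // Write four values (16 bytes) per iteration. The store is unaligned,
    // so dst needs no particular alignment.
    while (count >= 4) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), wide);
        dst   += 4;
        count -= 4;
    }
#endif
    // Scalar tail, or the whole buffer when SSE2 is not available.
    while (count-- > 0) {
        *dst++ = value;
    }
}
```

With autovectorization left on, a compiler is free to unroll a loop like
this further, which is one plausible way to end up with a larger number of
bytes handled per iteration than the single 16-byte store written here.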
$ out/ok bench:samples=100 sw filter:match=memset32_\\d\* serial
Before:
[memset32_100000] 16ms @0 20.1ms @99 20.2ms @100
[memset32_10000] 1.07ms @0 1.26ms @99 1.31ms @100
[memset32_1000] 73.9µs @0 89.4µs @99 90.1µs @100
[memset32_100] 8.59µs @0 9.74µs @99 9.96µs @100
[memset32_10] 7.45µs @0 8.96µs @99 8.99µs @100
[memset32_1] 2.29µs @0 2.81µs @99 2.92µs @100
After:
[memset32_100000] 16.2ms @0 17.3ms @99 17.3ms @100
[memset32_10000] 1.06ms @0 1.18ms @99 1.23ms @100
[memset32_1000] 72µs @0 75.6µs @99 84.7µs @100
[memset32_100] 9.14µs @0 10.6µs @99 10.7µs @100
[memset32_10] 5.43µs @0 5.88µs @99 5.99µs @100
[memset32_1] 3.43µs @0 3.65µs @99 3.83µs @100
BUG=chromium:755391
Change-Id: If9059a30ca7a345f1f7c37bd51473c29e8bb8922
Reviewed-on: https://skia-review.googlesource.com/34746
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-on: https://skia-review.googlesource.com/37000
Reviewed-by: Mike Klein <mtklein@chromium.org>
1 file changed