restore SkOpts::blit_row_color32

Compiling for SSE 4.1 gives some seemingly minor codegen improvements
(pmovzxbw instead of punpcklbw) that we think may matter more for speed
than they look.

Also rewrite using SkVx in a way that should scale well up to AVX2.
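
For reference, a scalar sketch of the per-pixel math involved. It assumes
the usual premultiplied source-over blend of a constant color over a row,
with alpha in the top byte. The name blit_row_color32_sketch and the exact
rounding are illustrative assumptions, not the SkVx code in this change.
The byte-to-16-bit widening in the inner loop is the step that pmovzxbw
(or punpcklbw against a zero register) performs in the vector version.

```cpp
#include <cstdint>

// Hypothetical scalar reference, not Skia's implementation.
// Computes dst = color + src * (1 - alpha(color)) per 8-bit channel,
// assuming premultiplied pixels with alpha in bits 24..31.
static void blit_row_color32_sketch(uint32_t* dst, const uint32_t* src,
                                    int count, uint32_t color) {
    unsigned invA = 255 - (color >> 24);
    invA += invA >> 7;  // stretch 0..255 to 0..256 so a >>8 divides exactly

    for (int i = 0; i < count; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            // Widen each byte so the multiply has room; for premultiplied
            // inputs the sum below fits in 16 bits.
            unsigned s = (src[i] >> shift) & 0xFF;
            unsigned c = (color  >> shift) & 0xFF;
            unsigned v = (s * invA + (c << 8) + 128) >> 8;
            out |= (uint32_t)(v & 0xFF) << shift;
        }
        dst[i] = out;
    }
}
```

An opaque color replaces the row and a fully transparent color leaves it
unchanged, which makes for an easy sanity check of any vectorized variant.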

Change-Id: Ie7c0194dc4fe9fe81c1c932187c0bb00da69190b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207260
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Lee Salzman <lsalzman@mozilla.com>
6 files changed