align skvx::Vec<N,T> to N*sizeof(T)

This raises the alignment of skvx::Vec<N,T> from alignof(T) to
N*sizeof(T).  I would have liked to keep the alignment minimal, but the
extra alignment is probably no big deal either way.

In terms of code generation, it doesn't make much difference for x86 or
ARMv8, but it seems hugely important for good ARMv7 NEON code.  It's a
~10x difference for the bench I've been playing around with, which spends
most of its time in the SkOpts::blit_row_color32 routine.
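
For reference, a minimal sketch of what the new alignment means for
layout.  This is a simplified stand-in for skvx::Vec, not the real
header: the Vec<1,T> specialization and its `val` member are assumptions
made so the example compiles on its own, and the checks assume
sizeof(float) == 4.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for skvx::Vec, for illustration only.
// The primary template mirrors the declaration in this change.
template <int N, typename T>
struct alignas(N * sizeof(T)) Vec {
    static_assert((N & (N-1)) == 0, "N must be a power of 2.");

    Vec<N/2,T> lo, hi;
};

// Hypothetical base case so the recursion on N terminates.
template <typename T>
struct Vec<1,T> {
    T val;
};

// Size is N*sizeof(T), as it was before the change.
static_assert(sizeof(Vec<4,float>)    == 16, "size is N*sizeof(T)");
static_assert(sizeof(Vec<16,uint8_t>) == 16, "size is N*sizeof(T)");

// Alignment now matches the size instead of alignof(T).
static_assert(alignof(Vec<4,float>)    == 16, "was alignof(float)");
static_assert(alignof(Vec<16,uint8_t>) == 16, "was alignof(uint8_t)");
static_assert(alignof(Vec<4,uint8_t>)  ==  4, "was alignof(uint8_t)");
```

The visible cost is that a small vector such as Vec<4,uint8_t> can no
longer sit at an arbitrary byte offset inside a larger struct, which may
add padding.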

Bug: chromium:952502
Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
diff --git a/include/private/SkVx.h b/include/private/SkVx.h
index 66d63d9..310ac4e 100644
--- a/include/private/SkVx.h
+++ b/include/private/SkVx.h
@@ -16,7 +16,9 @@
 //
 // We've also fixed a few of the caveats that used to make SkNx awkward to work
 // with across translation units.  skvx::Vec<N,T> always has N*sizeof(T) size
-// and alignof(T) alignment and is safe to use across translation units freely.
+// and alignment and is safe to use across translation units freely.
+//
+// (Ideally we'd only align to T, but that tanks ARMv7 NEON codegen.)
 
 #include "SkTypes.h"         // SK_CPU_SSE_LEVEL*, etc.
 #include <algorithm>         // std::min, std::max
@@ -38,7 +40,7 @@
 // This gives Vec a consistent ABI, letting them pass between files compiled with
 // different instruction sets (e.g. SSE2 and AVX2) without fear of ODR violation.
 template <int N, typename T>
-struct Vec {
+struct alignas(N * sizeof(T)) Vec {
     static_assert((N & (N-1)) == 0, "N must be a power of 2.");
 
     Vec<N/2,T> lo, hi;