impl gather8/gather16 with gather32

This is our quick path to JIT small gathers.

The idea is roughly:

   // Little-endian byte order assumed; ptr8 is 4-byte aligned.
   const uint32_t* ptr32 = (const uint32_t*)ptr8;
   uint32_t abcd = ptr32[ix/4];
   switch (ix & 3) {
     case 3: return (abcd >> 24)       ;
     case 2: return (abcd >> 16) & 0xff;
     case 1: return (abcd >>  8) & 0xff;
     case 0: return (abcd      ) & 0xff;
   }

The premise is that if we may load a given byte,
we should also be allowed to load the 4-byte-aligned
word that byte falls within.
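
As a scalar illustration of the same trick, here is a minimal sketch for
both 8-bit and 16-bit elements. It is not the JIT code from this change:
the names gather8_via_32 and gather16_via_32 are hypothetical, and it
assumes a little-endian target, a 4-byte-aligned base pointer, and a
buffer whose size is padded to a multiple of 4 bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Skia's API): fetch byte `ix` by loading the
// aligned 32-bit word that contains it, then shifting the byte down.
// Assumes little-endian and that `base` is 4-byte aligned.
static uint8_t gather8_via_32(const uint8_t* base, uint32_t ix) {
    uint32_t word;
    // memcpy keeps the 32-bit load free of strict-aliasing issues.
    std::memcpy(&word, base + (ix & ~3u), sizeof(word));
    // Byte lane (ix & 3) sits 8*(ix & 3) bits up in a little-endian word.
    return static_cast<uint8_t>(word >> (8 * (ix & 3)));
}

// Same idea for 16-bit elements: two lanes per 32-bit word.
static uint16_t gather16_via_32(const uint16_t* base, uint32_t ix) {
    uint32_t word;
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(base);
    std::memcpy(&word, bytes + 4 * (ix / 2), sizeof(word));
    return static_cast<uint16_t>(word >> (16 * (ix & 1)));
}
```

Replacing the switch with a single variable shift gives the same result
and is closer to what a vectorized version would do per lane, since the
truncating cast plays the role of the & 0xff mask.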

Change-Id: I7fb1085306050c918ccf505f1d2e1e87db3b8c9a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/268381
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed