handle x86 tail in JIT code too

Basically the same deal as aarch64:

    - a bunch of instructions to rewrite control
      flow to be two loops, body and tail

    - a bunch of instructions to support scalar
      loads and stores in the tail

We can now remove the JIT::mask field.

I've removed the SkUNREACHABLE I'd put in for the ARM code...  as
written the interpreter is still reachable by the loser if two threads
race to JIT the program.  Medium term I plan to move JIT compilation to
a more proactive time, eliminating the need for the lock and letting the
interpreter become truly unreachable.

I had a little bit of a false start with what instructions to use for
scalar load8 and store8, first starting with instructions that loaded
via GP registers, then remembering vpinsrb and vpextrb can take a memory
argument, loading into xmm directly.  I've left the first instructions I
used in the file, still implemented but only used from the unit tests.
They're pretty common and will probably be useful some day.

Change-Id: I471b13026af4b1c6e861a53159f9df5f0285447c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/227178
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed