fold together clut<8> and clut<16>

As you might guess by now, somehow both smaller and faster.

Since there's now only one clut() and only one call to it, it's much more
likely to be inlined, which makes writing into *r, *g, *b just as cheap
as writing into temporaries.
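A minimal sketch of the folding idea follows, assuming a simplified 3-D CLUT with three output channels. The entry width becomes a runtime field instead of a template parameter, so there is one clut() with one call site, and results are written straight back through *r, *g, *b. The Grid struct, the load() helper, the packed big-endian 16-bit layout, and the trilinear scheme are all assumptions for illustration, not skcms's actual code.

```cpp
#include <cstdint>

// Hypothetical CLUT description for this sketch (not skcms's real struct).
struct Grid {
    const uint8_t* data;    // packed RGB entries, input 0 varies slowest
    int            points;  // grid points per input dimension, must be >= 2
    int            bytes;   // 1 for 8-bit entries, 2 for 16-bit big-endian entries
};

// Load one channel of one grid entry as a float in [0,1].
// The entry width is a runtime branch here rather than a template parameter.
static float load(const Grid& g, int index, int channel) {
    int off = (index * 3 + channel) * g.bytes;
    if (g.bytes == 1) {
        return g.data[off] * (1 / 255.0f);
    }
    uint16_t v = (uint16_t)((g.data[off] << 8) | g.data[off + 1]);
    return v * (1 / 65535.0f);
}

// A single clut() for both widths: trilinear interpolation over a 3-D grid.
// Inputs are read from *r, *g, *b (assumed already clamped to [0,1]) and the
// results are written back through the same pointers.
static void clut(const Grid& grid, float* r, float* g, float* b) {
    const float in[3] = {*r, *g, *b};
    int   lo[3];
    float t[3];
    for (int i = 0; i < 3; i++) {
        float x = in[i] * (float)(grid.points - 1);
        lo[i] = (int)x;
        if (lo[i] > grid.points - 2) { lo[i] = grid.points - 2; }  // keep lo+1 in range at x == max
        t[i] = x - (float)lo[i];
    }

    float out[3] = {0, 0, 0};
    for (int corner = 0; corner < 8; corner++) {
        int   idx[3];
        float w = 1.0f;
        for (int i = 0; i < 3; i++) {
            int up = (corner >> i) & 1;           // which end of each axis this corner uses
            idx[i] = lo[i] + up;
            w *= up ? t[i] : 1.0f - t[i];
        }
        int index = (idx[0] * grid.points + idx[1]) * grid.points + idx[2];
        for (int c = 0; c < 3; c++) {
            out[c] += w * load(grid, index, c);
        }
    }
    *r = out[0];
    *g = out[1];
    *b = out[2];
}
```

In this sketch the only difference between what used to be two instantiations is the `bytes` branch inside load(); everything else is shared. With a single caller, a compiler is free to inline clut() and keep the values written through *r, *g, *b in registers.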

Change-Id: Ie0d4f828333785c58f87e970446b92da6b9b207c
Reviewed-on: https://skia-review.googlesource.com/c/162420
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2 files changed