refactor _N and _1 stages

Instead of using _1 stages to handle the jagged tail, have
skcms_Transform() handle it itself, leaving only the _N stages (now
stripped of their _N suffix).

This makes the code a bit easier to work with, cuts 1-3K of code size,
and may even make the 0<n<N case faster, because we now handle the
tail in one pass instead of the previous n passes.  In exchange, we
call out to memcpy() twice.
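
A minimal sketch of the idea, not the actual skcms code.  N,
stage_double() and transform() are hypothetical stand-ins for the
vector width, an N-wide stage, and skcms_Transform(), and the sketch
assumes the two memcpy() calls are the copy into and out of a padded
temporary for the tail:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4  /* hypothetical vector width, not taken from skcms */

/* Hypothetical N-wide stage: always processes exactly N floats. */
static void stage_double(const float* src, float* dst) {
    for (int i = 0; i < N; i++) {
        dst[i] = src[i] * 2.0f;
    }
}

/* Hypothetical driver standing in for skcms_Transform(). */
static void transform(const float* src, float* dst, size_t n) {
    assert(src != NULL && dst != NULL);

    /* Full groups of N go straight through the N-wide stage. */
    while (n >= N) {
        stage_double(src, dst);
        src += N;
        dst += N;
        n   -= N;
    }

    /* Jagged tail (0 < n < N): one padded pass instead of n passes. */
    if (n > 0) {
        float tmp[N] = {0};
        memcpy(tmp, src, n * sizeof(float));  /* 1st memcpy: load tail  */
        stage_double(tmp, tmp);
        memcpy(dst, tmp, n * sizeof(float));  /* 2nd memcpy: store tail */
    }
}
```

With this shape the stages never need to know about n.  Only the
driver touches the tail, and it never reads or writes past the n
valid elements.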

Change-Id: I4d3629e76d8e0ba9c307e391770777b8b2d24eb8
Reviewed-on: https://skia-review.googlesource.com/96867
Reviewed-by: Brian Osman <brianosman@google.com>
1 file changed