Use tail-call recursion to process skcms ops.

This is dramatically faster on M1 processors (and likely other ARMs)
than dispatching ops through a giant switch. It is also 25% faster on
AMD EPYC (hsw).
The MSVS + clang-cl configuration fails tests when SKCMS_MUSTTAIL is
used, so SKCMS_MUSTTAIL is disabled in that configuration. I don't
have access to a Windows machine to compare the generated code, but
hopefully the difference is negligible.
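
For readers unfamiliar with the technique, here is a minimal sketch of
tail-call dispatch. It is not the skcms code: Stage, StageFn, op_add,
op_mul, op_done, run_program and the MUSTTAIL macro are hypothetical
names, and the guard that skips the attribute under _MSC_VER is only an
assumption about how the clang-cl case might be excluded. Each op does
its work and then tail-calls the next op through a function pointer, so
no central switch is revisited between ops.

```cpp
#include <cassert>

// Hypothetical stand-in for SKCMS_MUSTTAIL. The attribute is applied only
// where Clang advertises it; the _MSC_VER exclusion is an assumption that
// mirrors the clang-cl case described above.
#if defined(__clang__) && !defined(_MSC_VER) && defined(__has_cpp_attribute)
  #if __has_cpp_attribute(clang::musttail)
    #define MUSTTAIL [[clang::musttail]]
  #endif
#endif
#ifndef MUSTTAIL
  #define MUSTTAIL  // fall back to an ordinary return
#endif

struct Stage;
// Every op shares one signature, which musttail requires of caller and callee.
using StageFn = float (*)(const Stage*, float);

struct Stage {
    StageFn fn;   // the op to run at this step
    float   arg;  // the op's operand
};

// Terminal op: ends the chain and hands back the accumulated value.
static float op_done(const Stage*, float v) { return v; }

// Each op updates the value, then jumps straight to the next stage.
static float op_add(const Stage* s, float v) {
    v += s->arg;
    MUSTTAIL return s[1].fn(s + 1, v);
}

static float op_mul(const Stage* s, float v) {
    v *= s->arg;
    MUSTTAIL return s[1].fn(s + 1, v);
}

// Entry point: the program must end with an op_done stage.
static float run_program(const Stage* program, float v) {
    assert(program != nullptr);
    return program->fn(program, v);
}
```

With a program of {op_add, 2}, {op_mul, 4}, {op_done, 0} and an input
of 1, run_program computes (1 + 2) * 4. When MUSTTAIL expands to
nothing the result is the same, but the compiler is then free to emit
real calls, so stack depth can grow with program length.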

Change-Id: Ie2f9569d19f3804fcf3ba77b6148fe8f5c8e13f9
Bug: b/305974160
Commit-Queue: John Stiles <>
Reviewed-by: Brian Osman <>
4 files changed