skcms_OptimizeForSpeed()

Add skcms_OptimizeForSpeed(), implement tf13_{r,g,b}, and rewrite the
bench tool so it can run with or without optimization.

Big speedup already, and that's only the forward direction.

Change-Id: Ic7c3f66ebea82b5f7116523681f47eabc389e2e4
Reviewed-on: https://skia-review.googlesource.com/123045
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@chromium.org>
6 files changed