Replace automatic CPU detection with explicit flags.

Context: In Clang, we cannot reliably use attributes to change the
target architecture without getting suboptimal codegen
(http://go/llvm-issue/64706). Previously, this issue was minor
because the skcms main loop was all inside one big function. Now we
are restructuring things to split up work into multiple small
functions, so we can no longer ignore it.

Because we cannot use attribute(target) to control the architecture,
we will need to split out the HSW and SKX implementations into
separate translation units. (This will happen in followup CLs.)

These translation units will not be able to infer the project's
"default" architecture via #if defined(...), since they will be
built with -march=haswell or -march=skylake in their build config,
and those options change which architecture macros are predefined.
It follows that we cannot rely on probing the "default" architecture
when dealing with Haswell+ or Skylake+.
Instead of probing the architecture, we now have explicit flags:
These flags will prevent the HSW and SKX code from being compiled.
The baseline implementation cannot be disabled.
And to control testing, we have three:
Setting one of these flags will force that implementation to be
used, regardless of the current machine's capabilities.
Finally, SKCMS_NO_RUNTIME_CPU_DETECTION was removed. I wasn't able to
find any projects which relied on this flag, fortunately.
The "baseline" setting on Intel is considered to be 4-lane SSE
as before, and will continue to be SSE2 or SSE4.1 based on the
architecture detected at compile time. (There has never been a
separate Transform function for SSE2 and SSE4.1; perhaps the
benefit from SSE4.1 is too small to justify the increase in
The previously existing SKCMS_PORTABLE flag continues to exist
and will override all of these flags to generate fully scalar
Reviewed-by: Brian Osman <email@example.com>
Commit-Queue: John Stiles <firstname.lastname@example.org>
7 files changed