e9cc5993398f5bcad9bf62201538c73ae86424ca - skcms

commit	e9cc5993398f5bcad9bf62201538c73ae86424ca
author	John Stiles <johnstiles@google.com>	Mon Nov 06 15:04:27 2023 -0500
committer	SkCQ <skcq-be@skia-corp.google.com.iam.gserviceaccount.com>	Mon Nov 06 20:44:59 2023 +0000
tree	00f90556864e185710036363caf36a4f71727faa
parent	cd60a4f973b5d17b17ffdbb2d86078b4d1aff45a

Replace automatic CPU detection with explicit flags.

Context: We cannot reliably use attributes to change the compile architecture without getting suboptimal codegen (http://go/llvm-issue/64706) in Clang. Previously, these issues were minor because the skcms main loop was all inside one big function. Now we are restructuring things to split up work into multiple small functions, so we can no longer ignore it.

Because we cannot use attribute(target) to control the architecture, we will need to split out the HSW and SKX implementations into separate translation units. (This will happen in followup CLs.) These translation units will not be able to infer the project's "default" architecture via #if defined(...) since they will be using -march=haswell or -march=skylake in their build config.

Following that thread of logic, this means that we cannot rely on probing the "default" architecture when dealing with Haswell+ or Skylake+. Instead of probing the architecture, we now have explicit flags:

- SKCMS_DISABLE_HSW
- SKCMS_DISABLE_SKX

These flags will prevent the HSW and SKX code from being compiled. The baseline implementation cannot be disabled.

And to control testing, we have three:

- SKCMS_FORCE_BASELINE
- SKCMS_FORCE_HSW
- SKCMS_FORCE_SKX

Setting one of these flags will force that implementation to be used, regardless of the current machine's capabilities.

Finally, SKCMS_NO_RUNTIME_CPU_DETECTION was removed. I wasn't able to find any projects which relied on this flag, fortunately.

The "baseline" setting on Intel is considered to be 4-lane SSE as before, and will continue to be SSE2 or SSE4.1 based on the architecture detected at compile time. (There has never been a separate Transform function for SSE2 and SSE4.1; perhaps the benefit from SSE4.1 is too small to justify the increase in code size?)

The previously existing SKCMS_PORTABLE flag continues to exist and will override all of these flags to generate fully scalar code.
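
For illustration, the flag behavior described above might be sketched roughly as follows. This is not the skcms source: the Impl enum, the CpuCaps struct, and choose_impl() are hypothetical names, and preferring SKX over HSW when both are available is an assumption. Only the SKCMS_* macro names come from this change.

```cpp
// Illustrative sketch only. Everything except the SKCMS_* macros is hypothetical.
enum class Impl { Baseline, HSW, SKX };

// Stand-in for the result of a runtime CPU probe (hypothetical type).
struct CpuCaps {
    bool hsw;  // CPU can run the HSW implementation
    bool skx;  // CPU can run the SKX implementation
};

inline Impl choose_impl(CpuCaps caps) {
    (void)caps;  // unused when a flag pins the implementation
#if defined(SKCMS_PORTABLE) || defined(SKCMS_FORCE_BASELINE)
    // SKCMS_PORTABLE overrides every other flag. The baseline path stands in
    // for the fully scalar code here.
    return Impl::Baseline;
#elif defined(SKCMS_FORCE_SKX)
    return Impl::SKX;  // forced regardless of the machine's capabilities
#elif defined(SKCMS_FORCE_HSW)
    return Impl::HSW;  // forced regardless of the machine's capabilities
#else
  #if !defined(SKCMS_DISABLE_SKX)
    if (caps.skx) { return Impl::SKX; }  // assumed: widest implementation first
  #endif
  #if !defined(SKCMS_DISABLE_HSW)
    if (caps.hsw) { return Impl::HSW; }
  #endif
    return Impl::Baseline;  // the baseline cannot be disabled
#endif
}
```

With no flags defined, the sketch returns the widest implementation the probed CPU supports. Building with, say, -DSKCMS_DISABLE_SKX removes the SKX branch at compile time. The sketch does not diagnose contradictory combinations such as SKCMS_FORCE_HSW together with SKCMS_DISABLE_HSW.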
Change-Id: I0a5ae70c65f8277918020d3b1dc0ab8c875627be
Reviewed-on: https://skia-review.googlesource.com/c/skcms/+/774641
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: John Stiles <johnstiles@google.com>