refine __chkstk_darwin() workaround

Apple's Clang has a bug rooted in these three conflicting constraints:

   A) before we save a large amount of state to the stack, call
      __chkstk_darwin() to um, do something to make that safe.  It's
      some sort of debug feature that involves marking various pages
      as writable and unwritable to catch stack overflows.

   B) before calling any function that doesn't use AVX, which includes
      __chkstk_darwin(), call vzeroupper as a performance enhancement.

   C) we must save a large amount of state to the stack before it's
      sound to call vzeroupper, so it can be restored after the
      function call.  Otherwise the upper 128 bits of all ymm registers
      will be lost, zeroed.

There's no way to order A, B, and C to make them all happy.
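
To see why, treat the three constraints as ordering rules over three
events (save ymm state, vzeroupper, call __chkstk_darwin()) and try every
order.  This is only a sketch of the argument, not anything Clang does;
the enum and function names are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical names for the three events the compiler has to order.
enum Event { kSaveYmm, kVzeroupper, kChkstk };

// Position of event e within a candidate ordering.
static int pos(const std::array<Event, 3>& order, Event e) {
    auto it = std::find(order.begin(), order.end(), e);
    assert(it != order.end());
    return static_cast<int>(it - order.begin());
}

// Count the orderings that satisfy A, B, and C all at once.
int count_valid_orderings() {
    // Starts sorted, so std::next_permutation visits all 6 orderings.
    std::array<Event, 3> order = {kSaveYmm, kVzeroupper, kChkstk};
    int valid = 0;
    do {
        bool a = pos(order, kChkstk)     < pos(order, kSaveYmm);     // A: check before the big save
        bool b = pos(order, kVzeroupper) < pos(order, kChkstk);      // B: vzeroupper before the call
        bool c = pos(order, kSaveYmm)    < pos(order, kVzeroupper);  // C: save before vzeroupper
        if (a && b && c) valid++;
    } while (std::next_permutation(order.begin(), order.end()));
    return valid;
}
```

count_valid_orderings() returns 0: A, B, and C form a cycle (check before
save, save before vzeroupper, vzeroupper before check), so one of them has
to give.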

Saving registers before zeroing them (C) is a correctness issue, so it's
got to take precedence.  Zeroing the upper bits of ymm registers before
calling into code that's not ymm-aware (B) is a performance issue only,
and not actually even needed for __chkstk_darwin().  The whole
__chkstk_darwin() thing (A) is a nice safety feature we'll have to live
without.

The best fix would be to make it so that Clang doesn't issue vzeroupper
before __chkstk_darwin(), but we can't do that here.  The next best
thing, since mucking with correctness isn't really viable, is to turn off
the stack checking.

Until now we've been using -O1 to reduce the stack usage down low enough
that the stack check calls are not generated.  Maybe there's a known
safe single page, and the call's only needed when going above that?

Anyway, let's try focusing our workaround on -fstack-check itself.
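
For reference, here's a rough sketch of the kind of function that
presumably draws a stack check call: one whose locals run well past a
single page.  The function name, the 16 KiB scratch size, and the 4 KiB
page size are assumptions for illustration, not taken from the affected
code, and an optimizer is free to shrink or remove the buffer.  The
assumption is that building such code with something like
-fno-stack-check keeps the large frame but drops the check.

```cpp
#include <cstddef>

// Hypothetical stand-in for a routine with a large stack frame.
float sum_with_large_frame(const float* src, std::size_t n) {
    constexpr std::size_t kScratch = 4096;
    float scratch[kScratch];  // 4096 floats * 4 bytes = 16 KiB of locals
    if (n > kScratch) n = kScratch;  // never write past the scratch buffer

    // Stage doubled values in the scratch buffer, then sum them.
    for (std::size_t i = 0; i < n; i++) scratch[i] = src[i] * 2.0f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i++) sum += scratch[i];
    return sum;
}
```

Comparing the disassembly of a function like this with and without the
stack check enabled would be one way to confirm whether the call to
__chkstk_darwin() is still emitted.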

Bug: skia:9709
Change-Id: Ie236101d2d464526b33e327db1f94574a7a86948
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/294326
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
1 file changed