add vzeroupper to GCC builds

Turning on the HSW code slice improved performance on Clang
bots but hurt it on GCC bots.  I think it's due to a missing
vzeroupper, which zeroes the upper halves of the YMM
registers and so minimizes the penalty when switching from
AVX code back to SSE code.  I saw Clang emitted vzeroupper in
exec_ops_hsw()/run_program() in the places you'd expect,
but GCC did not.  This adds one manually in GCC builds.
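
A minimal sketch of the idea, not the actual change: the
helper zero_upper_after_avx() and the stand-in function
sum_floats() are hypothetical names, and using the
_mm256_zeroupper() intrinsic (rather than some other way of
emitting the instruction) is an assumption.  The guard makes
the call a no-op on Clang and on builds without AVX.

```cpp
#include <cstddef>

#if defined(__AVX__)
    #include <immintrin.h>  // _mm256_zeroupper()
#endif

// Hypothetical helper: emit vzeroupper only where the compiler
// is assumed not to insert it on its own (GCC, AVX enabled).
static inline void zero_upper_after_avx() {
#if defined(__GNUC__) && !defined(__clang__) && defined(__AVX__)
    _mm256_zeroupper();  // clear upper 128 bits of all YMM registers
#endif
}

// Hypothetical stand-in for an AVX code path like exec_ops_hsw().
float sum_floats(const float* src, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += src[i];  // may be auto-vectorized with AVX
    }
    // Leave the AVX region cleanly before returning to callers
    // that may run SSE code.
    zero_upper_after_avx();
    return total;
}
```

The call sits at the end of the AVX path, just before control
returns to code that may have been compiled for SSE only.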

Change-Id: Ic0e93d991208674f6163929f04c9fd2acfcd0ae1
Reviewed-on: https://skia-review.googlesource.com/118260
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
1 file changed