7f061fb53b9e347d8f5eb951b8d73aa4ce68c004 - skia

commit	7f061fb53b9e347d8f5eb951b8d73aa4ce68c004
author	Mike Klein <mtklein@google.com>	Thu Jun 13 13:12:38 2019 -0500
committer	Skia Commit-Bot <skia-commit-bot@chromium.org>	Thu Jun 13 21:32:45 2019 +0000
tree	545cb33e902723f78324cceb7a003ee982be5d7c
parent	f7a57f986a60dca9e153af2fcff6868db4c00fd1

streamline srcover math in I32_SWAR

This is the final bunny I've got in my hat, I think...

Remembering that none of the s += d*invA adds can overflow,
we can use a single 32-bit add to add them all at once.

This means we don't have to unpack the src pixel into rb/ga
halves.  We need only extract the alpha for invA.
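
As a rough illustration of the idea above (not the actual SkVM builder code from this change), a scalar version of the trick might look like the sketch below. It assumes premultiplied 8888 pixels with alpha in the top byte, and approximates the divide by 255 with invA = 256 - a followed by a shift by 8. The name srcover_swar and that rounding choice are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of SWAR srcover; not Skia's code.
// Assumes premultiplied pixels with alpha in bits 24..31.
inline uint32_t srcover_swar(uint32_t s, uint32_t d) {
    const uint32_t mask = 0x00ff00ffu;
    const uint32_t a    = s >> 24;

    // Precondition: premultiplied src, so no color channel exceeds alpha.
    assert(((s >> 16) & 0xff) <= a && ((s >> 8) & 0xff) <= a && (s & 0xff) <= a);

    // Only alpha is extracted from src; s itself is never split into halves.
    const uint32_t invA = 256 - a;   // 1..256, stands in for (255 - a) / 255

    // Scale dst two channels at a time. Each 8-bit lane times invA is
    // at most 255 * 256, so it never carries into the neighboring lane.
    const uint32_t rb = (((d & mask) * invA) >> 8) & mask;
    const uint32_t ga = (((d >> 8) & mask) * invA) & ~mask;

    // Each lane holds s_c + d_c * invA / 256 <= a + (255 - a) = 255,
    // so one 32-bit add handles all four channels with no cross-lane carry.
    return s + (rb | ga);
}
```

With this rounding, a fully transparent src leaves dst unchanged and a fully opaque src replaces it exactly; intermediate alphas can come out slightly darker than an exact divide by 255 would give.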

This brings I32_SWAR even with the Opts code!

curr/maxrss	loops	min	median	mean	max	stddev	samples   	config	bench
  36/36  MB	133	0.206ns	0.211ns	0.208ns	0.211ns	1%	▁▇▁█▁▇▁▇▁▇	nonrendering	SkVM_4096_I32_SWAR
  37/37  MB	152	0.432ns	0.432ns	0.434ns	0.444ns	1%	▃▁▁▁▁▃▁▁█▁	nonrendering	SkVM_4096_I32
  37/37  MB	50	0.781ns	0.794ns	0.815ns	0.895ns	5%	▆▂█▃▅▂▂▁▂▁	nonrendering	SkVM_4096_F32
  37/37  MB	76	0.773ns	0.78ns	0.804ns	0.907ns	6%	▄█▅▁▁▁▁▂▁▁	nonrendering	SkVM_4096_RP
  37/37  MB	268	0.201ns	0.203ns	0.203ns	0.204ns	0%	█▇▆▆▆▆▁▆▆▆	nonrendering	SkVM_4096_Opts

Change-Id: Ibf0a9c5d90b35f1e9cf7265868bd18b7e0a76c43
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220805
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>

2 files changed