cbf4fba43933302a846872e4c5ce8f1adb8b325e - skia.git

commit	cbf4fba43933302a846872e4c5ce8f1adb8b325e
author	mtklein <mtklein@chromium.org>	Tue Nov 17 14:19:52 2015 -0800
committer	Commit bot <commit-bot@chromium.org>	Tue Nov 17 14:19:52 2015 -0800
tree	96dad6cc0a2241544a0cf52cccdc7a0fbe89f9b1
parent	56847a65648af4d06da9c26c55242949a1bf31ab

div255(x) as ((x+128)*257)>>16 with SSE

_mm_mulhi_epu16 makes the (...*257)>>16 part simple.
This seems to speed up every transfermode that uses div255()
by roughly 7-25%.
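
As a rough illustration of the identity in the subject line, here is a minimal sketch, not the code from this change. The function names (div255_reference, div255_mul257, div255_forms_agree, div255_sse2) are hypothetical, and the input is assumed to be a product of two bytes, so x lies in [0, 255*255] and x+128 still fits in 16 bits. Only _mm_set1_epi16, _mm_add_epi16 and _mm_mulhi_epu16 are real SSE2 intrinsics.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Reference: x/255 rounded to nearest, assuming x is in [0, 255*255].
inline uint32_t div255_reference(uint32_t x) { return (x + 127) / 255; }

// Scalar form of the expression in the subject line.
inline uint32_t div255_mul257(uint32_t x) { return ((x + 128) * 257) >> 16; }

// Exhaustively compares the two forms over the assumed input range.
inline bool div255_forms_agree() {
    for (uint32_t x = 0; x <= 255u * 255u; ++x) {
        if (div255_reference(x) != div255_mul257(x)) return false;
    }
    return true;
}

#if defined(__SSE2__)
// Eight unsigned 16-bit lanes at once. _mm_mulhi_epu16 returns the high
// 16 bits of each 16x16-bit unsigned product, i.e. (a*b)>>16 per lane.
inline __m128i div255_sse2(__m128i x) {
    const __m128i bias = _mm_set1_epi16(128);
    const __m128i mul  = _mm_set1_epi16(257);
    return _mm_mulhi_epu16(_mm_add_epi16(x, bias), mul);
}
#endif
```

Calling div255_forms_agree() checks every value a byte-times-byte product can take, which is one way to convince yourself the shortcut rounds the same way as the division.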

It even appears to obviate the need for approxMulDiv255() on SSE.
I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now.

Should be no pixel changes:
https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1452903004

src/core/Sk4px.h
src/opts/Sk4px_NEON.h
src/opts/Sk4px_SSE2.h
src/opts/Sk4px_none.h

4 files changed