add sli.4s, use it in pack sometimes

We have pack(x,y,imm) = x | (y<<imm) assuming (x & (y<<imm)) == 0.
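
For reference, a minimal scalar sketch of that definition for a single 32-bit lane. The function below is only an illustration, not the actual implementation, and the asserts just spell out the stated assumption.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar model of one 32-bit lane of pack(x,y,imm).
// Precondition from the definition: x and (y << imm) share no set bits.
static uint32_t pack(uint32_t x, uint32_t y, int imm) {
    assert(imm >= 0 && imm < 32);
    assert((x & (y << imm)) == 0);
    return x | (y << imm);
}
```

For example, pack(0xff, 0xaa, 8) places 0xaa in the second byte, above the 0xff already in the low byte.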

If we are allowed to clobber x, a single sli (shift-left-insert)
implements that in place as x |= y << imm.  That case comes up quite
often, so you'll see sequences of packs that used to look like this

	shl	v4.4s, v2.4s, #8
	orr	v1.16b, v4.16b, v1.16b
	shl	v2.4s, v0.4s, #8
	orr	v0.16b, v2.16b, v3.16b
	shl	v2.4s, v0.4s, #16
	orr	v0.16b, v2.16b, v1.16b

now look like this

	sli	v1.4s, v2.4s, #8
	sli	v3.4s, v0.4s, #8
	sli	v1.4s, v3.4s, #16
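
To check that the two listings agree, here is a rough scalar model of one 32-bit lane. The helper names (sli, old_seq, new_seq) are made up for this sketch. Note that sli keeps only the low imm bits of its destination, so it matches x | (y << imm) when x fits in those low bits, as it does when packing bytes.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar model of one 32-bit lane of `sli d, n, #imm`:
// the low imm bits of d survive, everything above is replaced by n << imm.
static uint32_t sli(uint32_t d, uint32_t n, int imm) {
    assert(imm >= 0 && imm < 32);
    uint32_t keep = (1u << imm) - 1u;
    return (d & keep) | (n << imm);
}

// Old sequence: three shl/orr pairs, result left in v0.
static uint32_t old_seq(uint32_t v0, uint32_t v1, uint32_t v2, uint32_t v3) {
    uint32_t v4 = v2 << 8;   // shl v4.4s, v2.4s, #8
    v1 = v4 | v1;            // orr v1.16b, v4.16b, v1.16b
    v2 = v0 << 8;            // shl v2.4s, v0.4s, #8
    v0 = v2 | v3;            // orr v0.16b, v2.16b, v3.16b
    v2 = v0 << 16;           // shl v2.4s, v0.4s, #16
    v0 = v2 | v1;            // orr v0.16b, v2.16b, v1.16b
    return v0;
}

// New sequence: three sli instructions, result left in v1.
static uint32_t new_seq(uint32_t v0, uint32_t v1, uint32_t v2, uint32_t v3) {
    v1 = sli(v1, v2, 8);     // sli v1.4s, v2.4s, #8
    v3 = sli(v3, v0, 8);     // sli v3.4s, v0.4s, #8
    v1 = sli(v1, v3, 16);    // sli v1.4s, v3.4s, #16
    return v1;
}
```

With byte-sized inputs both functions return the same packed word. The only visible difference is which register ends up holding the result.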

We can do this thanks to the new simultaneous register assignment
and instruction selection I added.  Before that we never hit this case.

Change-Id: I75fa3defc1afd38779b3993887ca302a0885c5b1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/228611
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed