revise extract instruction

Redefine extract(x, bits, z) as (x >> bits) & z,
making it a more explicit parallel to pack().
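
As a rough sketch of the new semantics, written as plain scalar C++
rather than the real builder API (the name extract_scalar and the
8-bit-channel pixel layout are assumptions made up for illustration):

```cpp
#include <cstdint>

// Hypothetical scalar model of the revised instruction: shift, then mask.
constexpr std::uint32_t extract_scalar(std::uint32_t x, int bits, std::uint32_t z) {
    return (x >> bits) & z;
}

// Pull four 8-bit channels out of one packed 32-bit value.
// Every call uses the same mask value, 0xff; only the shift differs.
constexpr std::uint32_t kPixel = 0x80402010u;
static_assert(extract_scalar(kPixel,  0, 0xff) == 0x10, "lowest byte");
static_assert(extract_scalar(kPixel,  8, 0xff) == 0x20, "second byte");
static_assert(extract_scalar(kPixel, 16, 0xff) == 0x40, "third byte");
static_assert(extract_scalar(kPixel, 24, 0xff) == 0x80, "highest byte");
```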

This eliminates the funky bit counting the old instruction required,
but more importantly it makes the masks we AND with more likely to be
the same value.

Down at the x86 or ARM ISA level, the AND instructions don't really
benefit from having an immediate argument (while the shifts do), so we
might as well treat the mask as a normal value, letting it get commoned
with identical values, hoisted out of loops, etc.
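
To illustrate what treating the mask as a normal value buys, here is a
hand-written scalar loop (not generated code; unpack_rg and its
parameters are hypothetical). The mask is materialized once outside the
loop and shared by both ANDs, while each shift keeps its own constant
count:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: split the two low bytes out of each packed value.
void unpack_rg(const std::uint32_t* px, std::uint32_t* r, std::uint32_t* g,
               std::size_t n) {
    const std::uint32_t mask = 0xff;     // one value, set up before the loop
    for (std::size_t i = 0; i < n; i++) {
        r[i] = (px[i] >> 0) & mask;      // shift count is a constant
        g[i] = (px[i] >> 8) & mask;      // same mask value reused
    }
}
```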

