Add SSE2- and SSE4.1-limited builds.

As the machines we test on get fancier and fancier, we're losing
test coverage of the code paths for older instruction sets.  This
adds a flag to skip runtime CPU detection, and two new builds to
test the SSE2 and SSE4.1 paths.

