more C++ cleanup

Mainly, convert the CAST(D,v) macro to a cast<D>(S v) function template.
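A rough sketch of the shape of the change follows. It assumes 4-lane std::array types (F, I32) as hypothetical stand-ins for the real vector types. The element-wise loop is a portable placeholder, not necessarily how the real conversion is done. The point is that D is spelled out at the call site while S is deduced from the argument:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane types standing in for the real vector types.
using F   = std::array<float, 4>;
using I32 = std::array<int32_t, 4>;

// Old style (shape only): CAST(I32, f)
// New style:              cast<I32>(f)
// D is given explicitly; S is deduced from the argument.
template <typename D, typename S>
static D cast(S v) {
    D d{};
    for (std::size_t i = 0; i < d.size(); i++) {
        // Convert each lane to the destination element type.
        d[i] = static_cast<typename D::value_type>(v[i]);
    }
    return d;
}
```

Unlike the macro, the template is type-checked and can't double-evaluate its argument.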

Now that we've got functions passing U64 around by value, GCC
bogusly warns about using registers that aren't enabled.  That
doesn't really happen... I think its front-end is confused.

Disabling -Wpsabi silences that for return values, and passing U64
by const& avoids it for arguments.  We can make our 16x4 byte swap
function look a little more normal now too.
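For illustration only, here is roughly how the two workarounds and a tidier byte swap might fit together. It assumes U64 can be modeled as a plain uint64_t holding four 16-bit lanes. In the real code it is presumably a SIMD vector type, which is where -Wpsabi actually comes into play; a scalar won't trigger the warning. The name swap_endian_16x4 is hypothetical:

```cpp
#include <cstdint>

// Silence GCC's bogus vector-ABI warnings (relevant for returned vectors).
// Guarded to GCC only, since that's where the warning shows up.
#if defined(__GNUC__) && !defined(__clang__)
    #pragma GCC diagnostic ignored "-Wpsabi"
#endif

// Hypothetical stand-in: one 64-bit value holding four 16-bit lanes.
using U64 = uint64_t;

// Swap the two bytes inside each 16-bit lane.
// The argument is taken by const& rather than by value.
static U64 swap_endian_16x4(const U64& rgba) {
    return (rgba & 0x00ff00ff00ff00ffULL) << 8
         | (rgba & 0xff00ff00ff00ff00ULL) >> 8;
}
```

For example, lanes 0x0102, 0x0304, 0x0506, 0x0708 become 0x0201, 0x0403, 0x0605, 0x0807.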

And I tidied up the warning suppressions a little.  There's no real
reason to ever re-enable warnings we disable... Transform_inl.h is only
ever included from

