Allow parallel feature checks

`has` currently masks the requested bits and checks the result for
non-zero, so a call like `has(tAVX512F | tAVX512VL)` returns true even
when _only one_ of the checked flags is set. To ensure that _all_
requested bits are active, it should mask the bits and then check that
the result equals the mask.
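
A minimal sketch of the difference, for illustration only. The bit
positions below are placeholders rather than Xbyak's actual values, and
`has_any` / `has_all` are hypothetical free functions standing in for the
old and new bodies of `has`:

```cpp
#include <cstdint>

// Illustrative stand-ins; the bit positions are assumptions,
// not the real values from xbyak_util.h.
using Type = std::uint64_t;
constexpr Type tAVX512F  = Type{1} << 0;
constexpr Type tAVX512VL = Type{1} << 1;

// Old behaviour: true if ANY requested bit is set.
constexpr bool has_any(Type type_, Type type) { return (type & type_) != 0; }

// New behaviour: true only if ALL requested bits are set.
constexpr bool has_all(Type type_, Type type) { return (type & type_) == type; }

// A hypothetical CPU that reports AVX512F but not AVX512VL.
constexpr Type cpu_flags = tAVX512F;

// The old check accepts a partial match; the new check rejects it.
static_assert(has_any(cpu_flags, tAVX512F | tAVX512VL), "old check: false positive");
static_assert(!has_all(cpu_flags, tAVX512F | tAVX512VL), "new check: partial match rejected");

// Once both bits are present, the new check succeeds.
static_assert(has_all(cpu_flags | tAVX512VL, tAVX512F | tAVX512VL), "new check: full match");

// Single-flag queries give the same answer under both checks.
static_assert(has_any(cpu_flags, tAVX512F) == has_all(cpu_flags, tAVX512F), "single flag unchanged");
static_assert(has_any(cpu_flags, tAVX512VL) == has_all(cpu_flags, tAVX512VL), "single flag unchanged");
```

For a single flag the two checks agree, so existing single-flag callers
should see no change. One difference in the sketch above: an empty mask
(`0`) yields true under the equality check and false under the non-zero
check.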

Currently, the user has to call `has` multiple times:
```cpp
if( cpu.has(tAVX512F) && cpu.has(tAVX512VL) && cpu.has(tAVX512BW) )
{
    ...
}
```

This PR allows them all to be checked in parallel with one call:
```cpp
if( cpu.has(tAVX512F | tAVX512VL | tAVX512BW) )
{
    ...
}
```
diff --git a/xbyak/xbyak_util.h b/xbyak/xbyak_util.h
index 8a40e70..300ee7c 100644
--- a/xbyak/xbyak_util.h
+++ b/xbyak/xbyak_util.h
@@ -513,7 +513,7 @@
 	}
 	bool has(Type type) const
 	{
-		return (type & type_) != 0;
+		return (type & type_) == type;
 	}
 };