Replace SIMD CRC-32 k6' magic constant with Px'

This is just a change of name (and a change of comment), not a change of
value.

From my colleague, Eric Biggers:

The number is not k6' = (x^32 mod P(x) << 32)' << 1 = 0x1DB710640, but
rather P(x)' = 0x1DB710641. This value represents the generator
polynomial of CRC-32 as a 33-bit value, using the CRC-32 bit ordering
convention (lowest order bit = coefficient of highest order polynomial
term). The lowest order bit is the coefficient of x^32, which by
definition is 1 for CRC-32. P(x) is used as a multiplicand in Barrett
reduction, which is the step which reduces a 64-bit value to the final
32-bit remainder. See "Algorithm 1" in the Gopal paper.

The reason that last bit doesn't actually matter is because the output
of the Barrett reduction is supposed to be fully reduced mod P(x), so
only 32 bits (the bits that represent the coefficients of x^0..x^31) are
supposed to be able to be nonzero. Therefore, in practice
implementations will just truncate the result to 32 bits. Incorrectly
using 0x1DB710640 rather than P(x) = 0x1DB710641 only causes the result
to be wrong by a factor of x^32, so it doesn't affect those 32 bits.
diff --git a/release/c/wuffs-unsupported-snapshot.c b/release/c/wuffs-unsupported-snapshot.c
index fcc5522..8e1b49a 100644
--- a/release/c/wuffs-unsupported-snapshot.c
+++ b/release/c/wuffs-unsupported-snapshot.c
@@ -22560,7 +22560,7 @@
 };
 
 static const uint8_t
-WUFFS_CRC32__IEEE_X86_SSE42_K6MU[16] WUFFS_BASE__POTENTIALLY_UNUSED = {
+WUFFS_CRC32__IEEE_X86_SSE42_PXMU[16] WUFFS_BASE__POTENTIALLY_UNUSED = {
   65, 6, 113, 219, 1, 0, 0, 0,
   65, 22, 1, 247, 1, 0, 0, 0,
 };
@@ -22981,7 +22981,7 @@
   v_x0 = _mm_and_si128(v_x0, v_x2);
   v_x0 = _mm_clmulepi64_si128(v_x0, v_k, (int32_t)(0));
   v_x0 = _mm_xor_si128(v_x0, v_x1);
-  v_k = _mm_lddqu_si128((const __m128i*)(const void*)(WUFFS_CRC32__IEEE_X86_SSE42_K6MU));
+  v_k = _mm_lddqu_si128((const __m128i*)(const void*)(WUFFS_CRC32__IEEE_X86_SSE42_PXMU));
   v_x1 = _mm_and_si128(v_x0, v_x2);
   v_x1 = _mm_clmulepi64_si128(v_x1, v_k, (int32_t)(16));
   v_x1 = _mm_and_si128(v_x1, v_x2);
diff --git a/std/crc32/common_up_x86_sse42.wuffs b/std/crc32/common_up_x86_sse42.wuffs
index 529d977..da9f298 100644
--- a/std/crc32/common_up_x86_sse42.wuffs
+++ b/std/crc32/common_up_x86_sse42.wuffs
@@ -111,8 +111,11 @@
 	x0 = x0._mm_clmulepi64_si128(b: k, imm8: 0x00)
 	x0 = x0._mm_xor_si128(b: x1)
 
-	// Reduce 64 bits to 32 bits and extract.
-	k = util.make_m128i_slice128(a: IEEE_X86_SSE42_K6MU[.. 16])
+	// Reduce 64 bits to 32 bits (Barrett Reduction) and extract.
+	//
+	// Barrett Reduction is Algorithm 1 (page 14) of Gopal et al., after
+	// adjusting for bit-reflection as per Figure 12 (page 21).
+	k = util.make_m128i_slice128(a: IEEE_X86_SSE42_PXMU[.. 16])
 	x1 = x0._mm_and_si128(b: x2)
 	x1 = x1._mm_clmulepi64_si128(b: k, imm8: 0x10)
 	x1 = x1._mm_and_si128(b: x2)
@@ -135,45 +138,7 @@
 // by script/print-crc32-x86-sse42-magic-numbers.go which is runnable online at
 // https://play.golang.org/p/wH1q6GfhKOE
 //
-// (†) There is a discrepancy in the k6' constant. The Gopal paper gives the
-// least significant byte as 0x40 and this is confirmed by the reproduction
-// script. Indeed, the formula on page 22 ends with a left-shift by 1, so the
-// final hex digit must be even.
-//
-// Nonetheless, we use 0x41 here, the same as the Linux kernel
-// (https://github.com/torvalds/linux/blob/v5.11/arch/x86/crypto/crc32-pclmul_asm.S#L72).
-// It's there in 2021 unchanged since that code was introduced in 2013
-// (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=78c37d191dd6899d8c219fee597a17d6e3c5d288).
-//
-// The Linux kernel's 0x41 (instead of 0x40) also turns up in e.g. Chromium
-// (https://source.chromium.org/chromium/chromium/src/+/master:third_party/zlib/crc32_simd.c;l=36;drc=9d4ec9349a1bf609eedb917c44c69eb0df9ff6bb)
-// and Go
-// (https://github.com/golang/go/blob/414fa8c35e7c2f65e2c767d6db2f25791e53b5c1/src/hash/crc32/crc32_amd64.s#L143).
-// The 0x41 has surely been copy/pasted to other software too.
-//
-// Code (0x41) and documentation (0x40) disagree. We went with code here, for
-// interoperability. Even though it's mathematical documentation, it's very
-// widely used code.
-//
-// For academic curiosity, there are at least two possible sources of the k6'
-// constant discrepancy.
-//
-// One is a transcription error. The Gopal paper has two consecutive lines:
-//  k6' = ... = 0x1DB710640
-//  μ'  = ... = 0x1F7011641
-// The final hex digits are 640 and 641.
-//
-// Two is that there's also this line a little earlier on the same page:
-//  P(x)'     = 0x1DB710641
-// This P(x)' number differs from k6' only in the final bit: 0x41 vs 0x40. This
-// isn't coincidental. Calculating k6' (in the Galois Field GF(2) space) means
-// dividing (i.e. bitwise XOR-ing) a 33-bit polynomial (x³², with only one 'on'
-// bit) by the 33-bit polynomial P(x)'.
-//
-// In practice, 0x40 vs 0x41 is a harmless difference. The least significant
-// bit doesn't affect the result. The tests pass either way (and fail with 0x3F
-// or 0x42). A more thorough analysis of why the low bit doesn't matter is at
-// https://danlark.org/2021/03/08/how-a-bug-in-the-linux-crc-32-checksum-turned-out-not-to-be-a-bug/
+// The k6' constant from the Gopal paper is unused.
 
 pri const IEEE_X86_SSE42_K1K2 : array[16] base.u8 = [
 	0xD4, 0x2B, 0x44, 0x54, 0x01, 0x00, 0x00, 0x00,  // k1' = 0x1_5444_2BD4
@@ -190,7 +155,7 @@
 	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // Unused
 ]
 
-pri const IEEE_X86_SSE42_K6MU : array[16] base.u8 = [
-	0x41, 0x06, 0x71, 0xDB, 0x01, 0x00, 0x00, 0x00,  // k6' = 0x1_DB71_0640 (†)
+pri const IEEE_X86_SSE42_PXMU : array[16] base.u8 = [
+	0x41, 0x06, 0x71, 0xDB, 0x01, 0x00, 0x00, 0x00,  // Px' = 0x1_DB71_0641
 	0x41, 0x16, 0x01, 0xF7, 0x01, 0x00, 0x00, 0x00,  // μ'  = 0x1_F701_1641
 ]