)]}'
{
  "commit": "60b607be5b2d8934386cae2d1455625a788f1be3",
  "tree": "bc2bae56e98f417b9551482d1b3fc8195e972f1b",
  "parents": [
    "7b40ebf9468a9003c51dc11852ef300d9b9075d3"
  ],
  "author": {
    "name": "Vitaly Goldshteyn",
    "email": "goldvitaly@google.com",
    "time": "Wed Dec 31 00:51:16 2025 -0800"
  },
  "committer": {
    "name": "Copybara-Service",
    "email": "copybara-worker@google.com",
    "time": "Wed Dec 31 00:51:54 2025 -0800"
  },
  "message": "`CRC32` version of `CombineContiguous` for length \u003c\u003d 32.\n\nFor lengths in [17, 32] we compute two chains of dependent CRC32 operations to get good entropy in the two resulting 32-bit numbers.\n1. x :\u003d CRC32(CRC32(state, A), D)\n2. y :\u003d CRC32(CRC32(bswap(state), C), B)\n\nOn ARM:\n  CRC32 has a latency of 2 cycles and a throughput of 1.\n  The computations will be pipelined without any wait.\nOn x86:\n  CRC32 has a latency of 3 cycles and a throughput of 1.\n  There will be 1 extra cycle of waiting, but we can do `cmp` in parallel.\n\nAt the end we multiply (mul - x) * (y - mul). `mul` is used to fill the upper 32 bits of the CRC results with good entropy bits: `mul \u003d rotr(kMul, len)`.\n\nWe also mix in the length differently:\n1. `state + 8 * len` (`lea` instruction); the one or two subsequent CRCs shuffle these bits well into the low 32 bits.\n2. `rotr(kMul, len)` is used to fill the high 32 bits before the multiplication in `Mix`. This avoids reading from `kStaticRandomData`.\n\nFor smaller strings we try to minimize binary size and register pressure as much as possible.\nThe CRC instruction fused with a memory read is used. llvm-mca reports 1 cycle lower latency compared to separate `mov` + `crc`.\n\nASM analysis https://godbolt.org/z/e1xrKzhdc:\n1. 100+ bytes of binary size saved (per inline instance).\n2. 25+ instructions saved.\n3. 2 registers are no longer used (r8 and r9).\n\nLatency results in isolation (not accounting for the comparison) are mixed:\n1. latency for 8 bytes in isolation is 1 cycle better: https://godbolt.org/z/zc39eM3K9\n2. latency for 1-3 bytes in isolation is 2 cycles better: https://godbolt.org/z/qMKfbv438\n3. latency for 16 bytes in isolation is 3 cycles worse: https://godbolt.org/z/vcqr8oGv3\n4. latency for 32 bytes in isolation is 5 cycles worse: https://godbolt.org/z/nEPP5jP58\n\nPiperOrigin-RevId: 850659551\nChange-Id: I02a2434f2d98473b099c171ef1c56adffa821c60\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "02df7faacdba2bb84a78ebbf17e572b2e0bd7ce2",
      "old_mode": 33188,
      "old_path": "absl/hash/internal/hash.h",
      "new_id": "37bd39d60a06d67c934c355d60c3abb5d4608a22",
      "new_mode": 33188,
      "new_path": "absl/hash/internal/hash.h"
    }
  ]
}