perf: unroll VarInt parsing loops #826

Open
anthony-swirldslabs wants to merge 9 commits into main from 798-unrollVarIntParsingLoops
@anthony-swirldslabs
Contributor

@anthony-swirldslabs anthony-swirldslabs commented May 13, 2026

Description:
This PR updates the varint parsing implementation by:

  • unrolling the parsing loops
  • accumulating into an int first and switching to a long accumulator only for longer varints, to reduce the cost of bitwise operations
  • adopting the Google XOR trick: bytes are added with XOR instead of OR, without clearing their sign bits, and all of those bits are cleared at once with one final XOR on the accumulator
  • adding fast and slow paths, with the fast path bypassing limit checks (possible when parsing an array or a buffer whose limit is known upfront)
  • fixing a bug still present in the Google implementation in the current benchmark (as well as in Google's actual code) that happily accepts ten -1 (0xFF) bytes and interprets them as a varint with value -1, even though this is a malformed representation that must throw an exception
  • eliminating Google's "tempPos" local variable, and ensuring all local variables are allocated on the stack just once, to avoid dynamically extending the local stack frame in the middle of method execution
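
To illustrate the XOR trick and the fast/slow split listed above, here is a minimal Java sketch. It is not the code from this PR: the names `VarIntXorSketch`, `readVarLong`, and `readSlow`, the use of `IllegalArgumentException`, and the decision to ignore the unused high bits of a tenth byte are all assumptions made for illustration.

```java
// Hypothetical sketch, not the PBJ implementation.
final class VarIntXorSketch {
    // Mk clears the sign-extension ones left behind after k continuation bytes.
    private static final long M1 = ~0L << 7;
    private static final long M2 = M1 ^ (~0L << 14);
    private static final long M3 = M2 ^ (~0L << 21);
    private static final long M4 = M3 ^ (~0L << 28);
    private static final long M5 = M4 ^ (~0L << 35);
    private static final long M6 = M5 ^ (~0L << 42);
    private static final long M7 = M6 ^ (~0L << 49);
    private static final long M8 = M7 ^ (~0L << 56);

    /** Decodes one varint starting at pos; returns only the value. */
    static long readVarLong(byte[] buf, int pos) {
        // Fast path needs 10 readable bytes so no per-byte limit check is required.
        if (buf.length - pos < 10) return readSlow(buf, pos);
        int y = buf[pos];                       // sign-extended on purpose
        if (y >= 0) return y;                   // 1 byte
        // Each continuation byte flips the accumulator's sign, so the test alternates.
        if ((y ^= buf[pos + 1] << 7) < 0) return y ^ (int) M1;
        if ((y ^= buf[pos + 2] << 14) >= 0) return y ^ (int) M2;
        if ((y ^= buf[pos + 3] << 21) < 0) return y ^ (int) M3;
        long x = y ^ ((long) buf[pos + 4] << 28);   // switch to a long accumulator
        if (x >= 0) return x ^ M4;
        if ((x ^= (long) buf[pos + 5] << 35) < 0) return x ^ M5;
        if ((x ^= (long) buf[pos + 6] << 42) >= 0) return x ^ M6;
        if ((x ^= (long) buf[pos + 7] << 49) < 0) return x ^ M7;
        if ((x ^= (long) buf[pos + 8] << 56) >= 0) return x ^ M8;
        x ^= M8;                                // bits 0..62 are now correct
        byte last = buf[pos + 9];
        if (last < 0) {                         // continuation bit on a 10th byte
            throw new IllegalArgumentException("Malformed varint");
        }
        // Assumption: only the lowest bit of the 10th byte is significant.
        return (x & Long.MAX_VALUE) | ((long) last << 63);
    }

    /** Plain loop with a bounds check on every byte. */
    private static long readSlow(byte[] buf, int pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) throw new IllegalArgumentException("Truncated varint");
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if (b >= 0) return result;
        }
        throw new IllegalArgumentException("Malformed varint");
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10];
        buf[0] = (byte) 0xAC;
        buf[1] = 0x02;
        System.out.println(readVarLong(buf, 0)); // prints 300
    }
}
```

In this sketch the explicit check on the tenth byte is what rejects ten 0xFF bytes instead of returning -1.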

A unit test has been added to verify the correctness of the new algorithm. All the PBJ implementations are updated: ReadableSequentialData, DirectBufferedData, RandomAccessData, ByteArrayBufferedData, and Bytes. The benchmark is updated as well.
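
The unit test itself is not shown on this page. As a rough idea of what such a check could look like, the sketch below round-trips values around every power of two and feeds a decoder the malformed ten-byte input. Everything in it is hypothetical: the names `VarIntRoundTripSketch`, `encode`, `decodeReference`, and `checkDecoder`, and the assumption that a decoder signals malformed input with `IllegalArgumentException`.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.function.ToLongFunction;

// Hypothetical sketch, not the test added in this PR.
final class VarIntRoundTripSketch {
    /** Encodes a 64-bit value as a varint (1 to 10 bytes). */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(10);
        while ((value & ~0x7FL) != 0) {
            out.write((int) (value & 0x7F) | 0x80); // 7 payload bits + continuation bit
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** Straightforward loop decoder used as the reference. */
    static long decodeReference(byte[] buf) {
        long result = 0;
        int pos = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) throw new IllegalArgumentException("Truncated varint");
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if (b >= 0) return result;
        }
        throw new IllegalArgumentException("Malformed varint");
    }

    /** Checks any decoder against boundary values and the malformed input. */
    static void checkDecoder(ToLongFunction<byte[]> decoder) {
        for (int bits = 0; bits < 64; bits++) {
            long edge = 1L << bits;
            for (long v : new long[] {edge - 1, edge, edge + 1, -edge}) {
                long got = decoder.applyAsLong(encode(v));
                if (got != v) throw new AssertionError("expected " + v + " but got " + got);
            }
        }
        byte[] malformed = new byte[10];
        Arrays.fill(malformed, (byte) 0xFF); // ten -1 bytes: continuation bit never ends
        try {
            decoder.applyAsLong(malformed);
        } catch (IllegalArgumentException expected) {
            return;
        }
        throw new AssertionError("ten 0xFF bytes must be rejected");
    }

    public static void main(String[] args) {
        checkDecoder(VarIntRoundTripSketch::decodeReference);
        System.out.println("all checks passed");
    }
}
```

Passing the decoder as a `ToLongFunction<byte[]>` lets the same checks run against each reader implementation through a small adapter.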

Performance testing:

The benchmark was run on real Linux AMD hardware: https://github.com/hashgraph/pbj/actions/runs/25944398410/job/76269270691

The current PBJ implementation appears to be slower than Google's. Previously this wasn't the case on Linux AMD, as can be seen at #797 (comment). I'm unsure what caused the discrepancy, but the Google implementation is now consistently faster than the current PBJ implementation. The new implementation introduced in this PR matches or outperforms Google's:

  • 1 byte - same performance
  • 2 bytes - ~11% faster
  • longer varints - ~1% to ~7% faster (the 5-byte difference is within the reported error margin)

The performance improvements are consistently reproducible across multiple re-runs (see https://github.com/hashgraph/pbj/actions/workflows/performance-PBJ-JMH.yml).

Benchmark                                       (range)   Mode  Cnt     Score    Error   Units
VarIntByteArrayReadBench.google_zigZagAndLimit        1  thrpt   15  1707.264 ±  0.714  ops/us
VarIntByteArrayReadBench.google_zigZagAndLimit        2  thrpt   15   924.932 ±  1.002  ops/us
VarIntByteArrayReadBench.google_zigZagAndLimit        3  thrpt   15   755.884 ±  9.184  ops/us
VarIntByteArrayReadBench.google_zigZagAndLimit        4  thrpt   15   639.781 ±  0.721  ops/us
VarIntByteArrayReadBench.google_zigZagAndLimit        5  thrpt   15   360.415 ±  0.140  ops/us
VarIntByteArrayReadBench.pbj                          1  thrpt   15  1242.653 ±  2.002  ops/us
VarIntByteArrayReadBench.pbj                          2  thrpt   15   258.054 ±  0.237  ops/us
VarIntByteArrayReadBench.pbj                          3  thrpt   15   242.877 ±  0.095  ops/us
VarIntByteArrayReadBench.pbj                          4  thrpt   15   187.489 ±  0.156  ops/us
VarIntByteArrayReadBench.pbj                          5  thrpt   15   187.103 ± 15.050  ops/us
VarIntByteArrayReadBench.vector_fastXOR               1  thrpt   15  1708.714 ±  1.432  ops/us
VarIntByteArrayReadBench.vector_fastXOR               2  thrpt   15  1025.616 ±  1.081  ops/us
VarIntByteArrayReadBench.vector_fastXOR               3  thrpt   15   782.612 ±  0.535  ops/us
VarIntByteArrayReadBench.vector_fastXOR               4  thrpt   15   687.230 ±  0.584  ops/us
VarIntByteArrayReadBench.vector_fastXOR               5  thrpt   15   365.028 ± 55.450  ops/us
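
For reference, the relative differences can be recomputed from the scores in the table above. The snippet below is only a convenience sketch: the class and method names are made up, and it assumes the `(range)` column is the varint length in bytes. It ignores the error column, so small differences should be read with the reported margins in mind.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH table; scores copied from above.
final class SpeedupSketch {
    /** Throughput gain of candidate over baseline, in percent. */
    static double speedupPercent(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double[] google = {1707.264, 924.932, 755.884, 639.781, 360.415};
        double[] pbj = {1242.653, 258.054, 242.877, 187.489, 187.103};
        double[] fastXor = {1708.714, 1025.616, 782.612, 687.230, 365.028};
        for (int i = 0; i < google.length; i++) {
            System.out.printf(Locale.ROOT,
                    "range %d: %+.1f%% vs google, %.2fx vs pbj%n",
                    i + 1,
                    speedupPercent(fastXor[i], google[i]),
                    fastXor[i] / pbj[i]);
        }
    }
}
```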

Related issue(s):

Fixes #798

Notes for reviewer:
All tests should pass.

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Anthony Petrov <anthony@swirldslabs.com>
@anthony-swirldslabs anthony-swirldslabs self-assigned this May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

JUnit Test Report

   521 files  ±0     521 suites  ±0   27s ⏱️ -8s
 1 519 tests ±0   1 515 ✅ ±0   4 💤 ±0  0 ❌ ±0 
10 407 runs  ±0  10 379 ✅ ±0  28 💤 ±0  0 ❌ ±0 

Results for commit d6156cf. ± Comparison against base commit 4201101.

This pull request removes 6 and adds 6 tests. Note that renamed tests count towards both.
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] FLOAT, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a349f730@4687fee7, [0.1, 0.5, 100.0], 12, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a349f958@4a0fc665
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] STRING, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34a7cf0@9e092b5, [string 1, testing here, testing there], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34ac000@52737c1
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] BYTES, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34ac228@62eb918, [010203, ff7f0f, 42da07370bff], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34ac450@37e28b20
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] DOUBLE, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a349fb80@25d87313, [0.1, 0.5, 100.0, 1.7653472635472653E240], 32, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a349fda8@3c130cb2
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [3] BOOL, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34a4000@471d6571, [true, false, false, true, true, true], 6, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34a4228@213bd66a
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [4] ENUM, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34a4450@77a1df4d, [0, 2, 1], 3, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000a34a4678@5bbb0a25
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] FLOAT, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab49f730@5de6c7d7, [0.1, 0.5, 100.0], 12, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab49f958@69f55ea
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] STRING, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4a7cf0@55c78556, [string 1, testing here, testing there], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4ac000@25134e01
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] BYTES, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4ac228@40e7aea9, [010203, ff7f0f, 42da07370bff], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4ac450@46e38c28
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] DOUBLE, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab49fb80@5692863, [0.1, 0.5, 100.0, 1.7653472635472653E240], 32, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab49fda8@77a1df4d
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [3] BOOL, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4a4000@244c0fbe, [true, false, false, true, true, true], 6, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4a4228@68aa1164
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [4] ENUM, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4a4450@12e13abd, [0, 2, 1], 3, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000000ab4a4678@5694f6a0

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented May 13, 2026

Integration Test Report

    426 files  + 3      426 suites  +3   24m 58s ⏱️ + 9m 11s
115 028 tests +14  115 028 ✅ +14  0 💤 ±0  0 ❌ ±0 
115 272 runs  +16  115 272 ✅ +16  0 💤 ±0  0 ❌ ±0 

Results for commit d6156cf. ± Comparison against base commit 4201101.

This pull request removes 2 and adds 16 tests. Note that renamed tests count towards both.
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000071c7ad58@767da802
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000071c7afa0@21c48634
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000094bb2230@29ef43d7
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000094bb2478@77602e0d
com.hedera.pbj.integration.test.grpc.TCPIPInfoProviderTest ‑ testTCPIPInfoProvider()
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [1] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [2] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [3] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [4] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [5] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [6] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
pbj.integration.tests.pbj.integration.tests.tests.TCPIPReplyTest ‑ [7] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TCPIPReply}
…

♻️ This comment has been updated with latest results.

@anthony-swirldslabs anthony-swirldslabs marked this pull request as ready for review May 15, 2026 23:56
@anthony-swirldslabs anthony-swirldslabs requested review from a team as code owners May 15, 2026 23:56
Development

Successfully merging this pull request may close these issues.

Unroll varint reading loops

2 participants