perf: Stage 6 sumcheck micro-optimizations by MatteoMer · Pull Request #85 · MatteoMer/zolt

MatteoMer · 2026-04-17T14:34:32Z

Summary

Lift RaPolynomial tag dispatch out of inner loops: Both RamRaVirtualProver and LookupsRaVirtualProver mapFn bodies now dispatch on the first ra_poly's active tag once via inline else, generating specialized loop bodies per variant. Inner loops use @field(..., @tagName(tag)).getBoundCoeff() instead of the tagged union's switch-based dispatch, eliminating per-access tag checks in release mode.
Add has_nulls branchless fast path: RaPolynomial compressed rounds (Round1/2/3) now track a has_nulls: bool flag, computed during initRound1 and propagated through bind transitions. When false (common for instruction lookups), getBoundCoeff skips optional index checks entirely.
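A minimal sketch of the two changes above, not the code from this PR. The types `Dense`, `Compressed`, and `RaPoly`, the function `sumCoeffs`, and the choice to treat a null index as coefficient 0 are all hypothetical stand-ins; only `getBoundCoeff`, `has_nulls`, `inline else`, and the `@field(..., @tagName(tag))` pattern come from the description. It assumes Zig 0.11 or later, a non-empty `polys` slice, and that every polynomial shares the first one's active tag.

```zig
const std = @import("std");

// Hypothetical stand-in for an uncompressed round variant.
const Dense = struct {
    coeffs: []const u64,

    fn getBoundCoeff(self: Dense, i: usize) u64 {
        return self.coeffs[i];
    }
};

// Hypothetical stand-in for a compressed round variant with optional indices.
const Compressed = struct {
    table: []const u64,
    indices: []const ?u8,
    has_nulls: bool,

    // Scan the indices once at construction so later reads can trust the flag.
    fn init(table: []const u64, indices: []const ?u8) Compressed {
        var any_null = false;
        for (indices) |idx| {
            if (idx == null) any_null = true;
        }
        return .{ .table = table, .indices = indices, .has_nulls = any_null };
    }

    fn getBoundCoeff(self: Compressed, i: usize) u64 {
        // Fast path: no index is null, so unwrap without a per-index check
        // (the unwrap is unchecked in ReleaseFast).
        if (!self.has_nulls) return self.table[self.indices[i].?];
        // Slow path: a null index is treated as coefficient 0 (an assumption).
        return if (self.indices[i]) |idx| self.table[idx] else 0;
    }
};

const RaPoly = union(enum) {
    dense: Dense,
    compressed: Compressed,
};

// Switch on the first polynomial's tag once; `inline else` makes `tag`
// comptime-known, so the compiler emits one specialized loop nest per variant.
fn sumCoeffs(polys: []const RaPoly, len: usize) u64 {
    var acc: u64 = 0;
    switch (std.meta.activeTag(polys[0])) {
        inline else => |tag| {
            for (0..len) |i| {
                for (polys) |p| {
                    // Direct field access instead of a per-access switch.
                    acc +%= @field(p, @tagName(tag)).getBoundCoeff(i);
                }
            }
        },
    }
    return acc;
}
```

In safe build modes, `@field` on an inactive union field and `.?` on a null value are both checked and will panic, so a violated assumption surfaces in `zig build test` rather than silently in ReleaseFast.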

Context: flamegraph analysis of sha256_2048 showed Stage 6 at 7.2% of total prover time. These are diminishing-returns micro-optimizations — the implementation already closely matches Jolt's approach.

Benchmark (sha256_2048, 10 runs, ReleaseFast, Apple Silicon)

Metric	Before (ms)	After (ms)	Delta	Change
Mean	2814	2672	-142	-5.1%
Median	2759	2651	-108	-3.9%
P25	2690	2607	-83	-3.1%
Min	2582	2540	-43	-1.6%
Stdev	188	105

The improvement is larger than the conservative 0.5–1% estimate. The inline else dispatch likely lets LLVM optimize the inner loops further (inlining, register allocation) once the variant is statically known. The reduced stdev (188 → 105 ms) is consistent with fewer branch mispredictions, though 10 runs per configuration is a small sample.

Test plan

zig build test — all 539 tests pass
zig build -Doptimize=ReleaseFast — clean build
Before/after benchmark on sha256_2048 (10 runs each)

🤖 Generated with Claude Code

Eliminate per-access tagged union dispatch in the innermost loops of RamRaVirtualProver and LookupsRaVirtualProver by switching once on the first ra_poly's active tag and generating specialized loop bodies via inline else. Also add a has_nulls flag to RaPolynomial compressed rounds so getBoundCoeff can skip optional index checks when all indices are non-null (common for instruction lookups).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatteoMer merged commit 11ed864 into main Apr 17, 2026
17 checks passed

MatteoMer merged 1 commit into main from perf/stage6-sumcheck-microopts
