perf: delegate G2 MSM to generic XYZZ Pippenger by MatteoMer · Pull Request #86 · MatteoMer/zolt

MatteoMer · 2026-04-18T09:02:19Z

Summary

Replace the hand-written G2 Pippenger (Jacobian buckets, per-window parallelism, ~339 lines) with a thin wrapper that delegates to the generic MSM(F, Fp2).computeWithPool() (~80 lines of new code)
G2 MSM now gets the same optimizations as G1: XYZZ bucket coordinates (7M+2S vs 7M+4S per mixed add), batch window normalization (Montgomery's trick), and chunk-based parallelism (better cache locality)
Zero-cost type bridging via @ptrCast, guarded by comptime layout assertions: G2Point and AffinePoint(Fp2) have identical memory layout
Public API (msmG2, msmG2Bench) unchanged, so no caller modifications are needed
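
For readers new to the bucket method, here is a minimal Python sketch of Pippenger's algorithm. It is written generically over the group operation, in the same spirit as MSM(F, Fp2) being generic over the coordinate field. This is not zolt's code. Assumptions:

- `pippenger_msm`, `c` (window width) and `num_bits` are hypothetical names.
- The group is passed in as a plain `add(a, b)` plus an identity `zero`, rather than XYZZ buckets.
- Digits are unsigned and there is no parallelism.

```python
def pippenger_msm(points, scalars, c, num_bits, add, zero):
    """Hypothetical sketch: compute sum_i scalars[i] * points[i] in any commutative group.

    The group is given by (add, zero). Scalars must satisfy 0 <= s < 2**num_bits;
    c is the window width in bits.
    """
    mask = (1 << c) - 1
    num_windows = (num_bits + c - 1) // c
    window_sums = []
    for w in range(num_windows):
        # bucket d-1 accumulates every point whose w-th c-bit digit equals d (d = 1..2^c-1)
        buckets = [zero] * mask
        for pt, s in zip(points, scalars):
            d = (s >> (w * c)) & mask
            if d:
                buckets[d - 1] = add(buckets[d - 1], pt)
        # running-sum trick: sum_d d * B_d using 2 * (2^c - 1) group additions
        running, total = zero, zero
        for b in reversed(buckets):
            running = add(running, b)
            total = add(total, running)
        window_sums.append(total)
    # Horner over windows: result = sum_w window_sums[w] * 2^(c*w), c doublings per step
    result = zero
    for t in reversed(window_sums):
        for _ in range(c):
            result = add(result, result)
        result = add(result, t)
    return result


if __name__ == "__main__":
    # Additive toy group (stand-in for curve points): integers mod a prime.
    q = 2**61 - 1
    print(pippenger_msm([3, 5, 7], [10, 20, 30], 4, 8, lambda a, b: (a + b) % q, 0))  # 340
    # The same routine in multiplicative notation is a multi-exponentiation.
    p = 1_000_003
    print(pippenger_msm([2, 3], [5, 7], 4, 8, lambda a, b: a * b % p, 1))  # 2^5 * 3^7 = 69984
```

Only `add` and `zero` change between the two calls. That is the property that lets one generic implementation serve both G1 and G2 once the coordinate field is a parameter.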

Test plan

zig build test — all unit tests pass, including the arkworks-validated G2 MSM fixture vectors
Benchmark G2 MSM at various sizes to measure speedup
End-to-end timing comparison: cargo run --release -p zolt -- prove examples/sha256_2048.elf

🤖 Generated with Claude Code

Replace the hand-written G2 Pippenger (Jacobian buckets, per-window parallelism) with a thin wrapper that delegates to the generic MSM(F, Fp2).computeWithPool(). This gives G2 the same optimizations as G1: XYZZ bucket coordinates (7M+2S vs 7M+4S per mixed add), batch window normalization via Montgomery's trick, and chunk-based parallelism for better cache locality. G2Point and AffinePoint(Fp2) share identical memory layout, so the bridging is a zero-cost @ptrCast with comptime layout assertions. 339 → 118 lines; public API (msmG2, msmG2Bench) unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
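
The first commit relies on two ingredients: XYZZ buckets and batch normalization via Montgomery's trick. Below is a minimal Python sketch of both. It is not zolt's code, and all function names are hypothetical. Assumptions:

- The curve is a toy short Weierstrass curve y^2 = x^3 + a*x + b over a small prime field, not BN254 G2 over Fp2.
- The addition and doubling follow the shape of the XYZZ formulas listed in the Explicit-Formulas Database.
- Normalization inverts the ZZ and ZZZ values together with a single field inversion. Production code may use a cheaper variant.

```python
# Hypothetical sketch: XYZZ buckets + batch normalization over y^2 = x^3 + a*x + b mod p.
# An XYZZ point (X, Y, ZZ, ZZZ) represents the affine point (X/ZZ, Y/ZZZ), with
# ZZ^3 == ZZZ^2; ZZ == 0 encodes the point at infinity.
IDENTITY = (1, 1, 0, 0)


def xyzz_double_affine(x, y, p, a):
    """Double an affine point and return the result in XYZZ form."""
    u = 2 * y % p
    v = u * u % p
    w = u * v % p
    s = x * v % p
    m = (3 * x * x + a) % p
    x3 = (m * m - 2 * s) % p
    y3 = (m * (s - x3) - w * y) % p
    return (x3, y3, v, w)  # y == 0 gives v == 0, i.e. the identity


def xyzz_madd(acc, q_aff, p, a):
    """Mixed addition: XYZZ accumulator plus an affine point (the bucket update)."""
    X1, Y1, ZZ1, ZZZ1 = acc
    x2, y2 = q_aff
    if ZZ1 == 0:  # accumulator is the identity
        return (x2, y2, 1, 1)
    P = (x2 * ZZ1 - X1) % p   # ZZ1 * (x2 - x1)
    R = (y2 * ZZZ1 - Y1) % p  # ZZZ1 * (y2 - y1)
    if P == 0:
        # same x: acc == q_aff (double) or acc == -q_aff (identity)
        return xyzz_double_affine(x2, y2, p, a) if R == 0 else IDENTITY
    PP = P * P % p
    PPP = P * PP % p
    Q = X1 * PP % p
    X3 = (R * R - PPP - 2 * Q) % p
    Y3 = (R * (Q - X3) - Y1 * PPP) % p
    return (X3, Y3, ZZ1 * PP % p, ZZZ1 * PPP % p)


def batch_inverse(vals, p):
    """Montgomery's trick: invert every nonzero element using one field inversion."""
    prefix, acc = [], 1
    for v in vals:
        prefix.append(acc)
        acc = acc * v % p
    inv = pow(acc, p - 2, p)  # the single inversion (Fermat, p prime)
    out = [0] * len(vals)
    for i in range(len(vals) - 1, -1, -1):
        out[i] = inv * prefix[i] % p
        inv = inv * vals[i] % p
    return out


def batch_normalize(points, p):
    """Convert XYZZ points to affine tuples (None for identity) with one shared inversion."""
    live = [i for i, pt in enumerate(points) if pt[2] != 0]
    invs = batch_inverse([points[i][2] for i in live] + [points[i][3] for i in live], p)
    out = [None] * len(points)
    for k, i in enumerate(live):
        X, Y, _, _ = points[i]
        out[i] = (X * invs[k] % p, Y * invs[k + len(live)] % p)
    return out
```

Bucket updates never divide, because an XYZZ accumulator absorbs affine inputs with multiplications only. The division is deferred to one `batch_normalize` call per window, which costs a single inversion plus about three multiplications per value.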

The generic MSM's parallel path previously fell back to sequential Pippenger whenever n < num_threads*256, because the per-thread chunks would be too small. This caused a regression for G2 MSM at sizes 256–2048, where the old hand-written code had been parallel across windows. Add pippengerMSMWindowParallel: each thread processes a subset of the windows over all points, using XYZZ buckets and batch normalization. It is used when n < num_threads*256; chunk-based parallelism is still used for larger inputs, where cache locality matters more. This benefits both G1 and G2. G2 parallel MSM at N=1024: 6.9ms → 4.4ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
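
The dispatch rule in the second commit chooses between two ways of splitting Pippenger across threads. Below is a minimal Python sketch of both, under these assumptions:

- The group is a toy additive group (integers mod a prime).
- `msm_window_parallel`, `msm_chunk_parallel`, `msm` and `min_chunk` are hypothetical names. They are not zolt's pippengerMSMWindowParallel or computeWithPool.
- It uses `ThreadPoolExecutor`, which under CPython's GIL shows how the work is divided, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch. Toy group: integers mod a prime under addition (stand-in for curve points).
Q = 2**61 - 1


def window_sum(points, scalars, w, c):
    """Bucket-accumulate window w (scalar bits [w*c, (w+1)*c)) and return sum_d d * B_d."""
    mask = (1 << c) - 1
    buckets = [0] * mask
    for pt, s in zip(points, scalars):
        d = (s >> (w * c)) & mask
        if d:
            buckets[d - 1] = (buckets[d - 1] + pt) % Q
    running = total = 0
    for b in reversed(buckets):
        running = (running + b) % Q
        total = (total + running) % Q
    return total


def combine(window_sums, c):
    """Horner over windows; multiplying by 2^c stands in for c point doublings."""
    acc = 0
    for t in reversed(window_sums):
        acc = (acc * (1 << c) + t) % Q
    return acc


def msm_window_parallel(points, scalars, c, num_bits, num_threads):
    """Each worker owns a strided subset of windows and scans all points."""
    num_windows = -(-num_bits // c)

    def worker(t):
        return {w: window_sum(points, scalars, w, c) for w in range(t, num_windows, num_threads)}

    sums = [0] * num_windows
    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        for part in ex.map(worker, range(num_threads)):
            for w, v in part.items():
                sums[w] = v
    return combine(sums, c)


def msm_chunk_parallel(points, scalars, c, num_bits, num_threads):
    """Each worker runs a full Pippenger over one contiguous chunk of points."""
    num_windows = -(-num_bits // c)
    chunk = -(-len(points) // num_threads)

    def worker(t):
        lo, hi = t * chunk, min(len(points), (t + 1) * chunk)
        sums = [window_sum(points[lo:hi], scalars[lo:hi], w, c) for w in range(num_windows)]
        return combine(sums, c)

    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        return sum(ex.map(worker, range(num_threads))) % Q


def msm(points, scalars, c=8, num_bits=64, num_threads=4, min_chunk=256):
    """Dispatch mirroring the rule above: window parallelism when n < num_threads * 256."""
    if len(points) < num_threads * min_chunk:
        return msm_window_parallel(points, scalars, c, num_bits, num_threads)
    return msm_chunk_parallel(points, scalars, c, num_bits, num_threads)
```

The two strategies trade off differently:

- **Chunk parallelism.** Every chunk pays the full per-window bucket reduction regardless of how few points it holds, so tiny chunks waste work.
- **Window parallelism.** Each thread re-reads all points but reduces only its own windows. Its parallelism is capped at the number of windows, which is why it suits small inputs.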

MatteoMer and others added 2 commits April 18, 2026 10:01

MatteoMer merged commit 8d5bf91 into main Apr 18, 2026
17 checks passed

MatteoMer merged 2 commits into main from worktree-perf+g2-msm-xyzz-buckets