feat: Add SP1 and Zisk kernel proving backends by samuelburnham · Pull Request #411 · argumentcomputer/ix

samuelburnham · 2026-05-15T22:49:59Z

This pull request adds RISC-V zkVM proving backends for the Rust kernel — Zisk as the primary target (single process, single GPU), SP1 as an experimental alternative — together with the host pipeline that makes whole-environment Ix proving practical: RAM-aware sharding, tree aggregation with in-circuit assumption discharge, and cross-run proof reuse. Guest runs are validated by the shard's committed failures publics word; cycle counts are deterministic ziskemu step counts.

Key Results

Whole-environment proving works end to end on a single GPU: profile → shard plan (.ixes manifest) → leaf proofs → tree-aligned aggregation to a single proof with no dangling assumptions
Shard count is auto-sized from machine RAM and profiled heartbeats — no manual budget tuning
Cross-run reuse: store-aware planning re-proves only novel shards; a shard whose certified targets are already covered by stored proofs is skipped entirely
Guest checks an explicit check-list against a closure-injected sub-env (never the whole env); 112-byte shard publics bind the env hash, range/count, failures word, and assumption roots
ByteArray.utf8DecodeChar?_utf8EncodeChar_append — the most expensive proof in Init — provable via the NatSuccMode::Stuck cache
Init is provable in principle in 288 shards on a 256GB RAM machine, ~0.35 constants per second out of ~52k total constants = ~37-45 hours on a single GPU.

Move Rust sources from src/ into per-crate trees under crates/ (aiur, common, compile, ffi, ixon, kernel) and convert the root Cargo.toml into a workspace manifest with shared dependencies, lints, and profiles.

Fix CI Fix licenses Updates Fixups

The `#[quickcheck]` attribute macro lives in `quickcheck_macros`, not `quickcheck`. Without this import the ixon test target fails to compile.

Re-home jcb's monolithic `crate::ix::{profile,shard}` into `crates/kernel` (ProfileSink must be reachable from KEnv) and rewrite imports to the workspace convention (ix_common::address, ixon::merkle, crate::{profile,shard}, ix_kernel::{profile,shard} from the FFI). No logic changes to the ported code. ix-kernel builds; all 12 shard partitioner tests + the profile-sink delta-edge test pass. (Pre-existing, unrelated: ix-ffi fails to build on kernel-riscv too — private IOBuffer fields in aiur/protocol.rs from the workspace split.)

…olding Replace the static range/byte partition heuristics with the offline profiler/partitioner plan as the primary sharding path: * --shard-plan <manifest.ixes>: prove a `.ixes` partition (from `ix profile` + `ix shard`, i.e. real per-block heartbeats balanced + cross-shard delta-ingress minimized). Each manifest shard → one closure-injected leaf proof; leaves tree-fold into one aggregate whose committed subject is bound to the env. Maps manifest block addresses to work items by ingress-block (a Muts work item's `primary` is a member projection, not the block addr), validates full coverage, and composes with --execute and --only-shard. * New `ix-kernel` example `shard_plan`: profile a .ixe + partition → .ixes, out of circuit, without the Lean/FFI layer (handy until the pre-existing ix-ffi build break is fixed). Validated end-to-end in --execute on nataddcomm: 34 work items → 4 balanced shards (heartbeats 78–120), full coverage, 0 failures. Removed as superseded by the planner: * --topo + topo_order_items (the graph cut subsumes within-env scheduling) * --closure-stats / closure_report (manifest reports exact cross_ingress) * --overlap-stats / overlap_report (Design-C measurement scaffolding) The host `shard` partitioner uses rayon + std::time, so the module is gated to non-riscv targets (it runs in the host planner, never the guest); `profile` stays available everywhere since KEnv holds a ProfileSink (always None on the guest). --shard-consts/--shard-bytes remain as no-profile fallbacks.

Add `partition_for_cycle_cap(profile, max_cycles, cycles_per_heartbeat, eps)`: pick the smallest N whose heaviest shard fits a Zisk guest-cycle budget, then partition. Converts the cap to a heartbeat cap (cycles ÷ cycles/hb), estimates N from total load, and grows N proportionally to any overshoot — re-partitioning each round (cheap) to converge on the minimal feasible N. Detects the atomic-block floor: when the largest mutual block alone exceeds the cap, flags `infeasible_atomic_floor` (no N can fit it; split upstream or raise the cap). The cap is the leaf prover's trace size, which sets peak prover RAM; doc gives the conversion from a host-RAM budget (peak_GB ≈ 45 + 32·cycles_B, measured on this prover) so users think in RAM, pass cycles, and get N. Wire it into the `shard_plan` example: `--max-cycles C` (alternative to `--shards N`), `--cycles-per-heartbeat K` (default 208k, mergesort-calibrated; recalibrate per env via one --execute). Verified on mergesort: 4.5e9 → N=2 (heaviest 3.9B cyc), 2e9 → N=5 (1.76B cyc); largest atomic block 2553 hb.

…→215k) Measured whole-env Zisk cycles ÷ profiler heartbeats across 12 envs: - large shardable envs (>20k hb) cluster 194-208k whole-env - per-shard ratio runs ~5-8% higher (a shard memoizes less than the whole env); mergesort's heaviest shard is ~216k - tiny arithmetic envs ~130-160k; heavy-def-eq envs (array/string-assoc) ~258k The budget needs the per-shard heaviest ratio (so it never under-predicts a leaf's cycles → never OOMs). Set the default to 215k (matches the measured mergesort heaviest shard; predicts 4.03B vs 4.042B actual). Genuinely workload-dependent, so the doc + --cycles-per-heartbeat recalibration guidance stand; the cap's RAM headroom absorbs the residual. mergesort still → N=2.

Add cycle_cap_for_ram(ram_gb): invert the measured single-leaf prover model (peak_GB ≈ 45 + 32·cycles_B) at ~78% of RAM to a per-shard cycle cap; 0 if the box can't hold the ~45 GB prover base. Compose with the existing partition_for_cycle_cap so RAM → cap → N from total heartbeats. shard_plan example: default mode (no flags) reads /proc/meminfo and sizes N automatically — the user just passes the .ixe. --shards / --max-cycles remain as explicit overrides; --ram-gb for what-if. Verified: 250 GB → N=2 (mergesort), 64 GB → N=47 + atomic-floor warning (its biggest block won't fit), 40 GB refused.

The recursive min-cut partitioner now returns its bisection tree (AggNode: leaves are shard ids, internal nodes are min-cut splits) via partition_with_tree; rec_bisect builds it as it recurses, so the tree mirrors the partition exactly. agg_plan lowers the binary tree to an arity-bounded post-order fold plan (FoldOp::Leaf/Agg), collapsing whole subtrees so each agg call folds up to `arity` children — agg-call count stays ~N/(arity-1), same as a flat fold, while children remain subtree-aligned and in shard order (keeping the subject merkle-fold left-associative). The tree is serialized as an optional trailing .ixes section: pre-tree manifests decode to tree=None and provers fall back to a flat fold. partition_for_cycle_cap and shard_esp populate it; the shard_plan example writes it (omitted for --naive baselines). Folding along this tree lets aggregation discharge each cross-shard assumption at the lowest common ancestor of the shards that share it, where the assumption sets are smallest.

The agg guest gains a mode word. Mode 0 is the legacy O(1) merkle_join fold of opaque subject/assumption roots (used by the flat fallback, range, and reuse paths — unchanged semantics). Mode 1 implements set discharge: the host supplies each child's subject/assumption preimage address lists as untrusted witness; the guest re-derives every list's canonical merkle root and asserts it equals the child's committed root, then commits canonical roots over union(subjects) and union(assumptions) \ union(subjects). Assumptions proven by any sibling are discharged in-circuit at the lowest level of the tree where producer and consumer meet; when the whole env folds, the final assumptions root collapses to the zero sentinel and the aggregate claim is unconditional. Mode-1 outputs are canonical set commitments, so agg-of-agg witness verification is uniform at every level, and the mode is committed in field_b. The shard-plan host path folds along the manifest's bisection tree (pruned to the shards actually proven), executes the arity-collapsed plan, supplies the discharge witness per agg call, mirrors the union/discharge host-side, binds the root subject to the canonical root over the covered union, and binds the root assumptions to the mirror — reporting closure ("unconditional") or the outstanding count. Each leaf additionally checks the host's assumption mirror against the guest's committed root at prove time, so a rule divergence fails fast rather than as a witness panic deep in the agg. Measured on mergesort (8 shards, identical leaves, warm GPU): discharge adds 2.5s to the single arity-16 agg call (19.0s -> 21.5s, +0.9% total run) and resolves all 2439 cross-shard assumptions to zero. Also: the Assembly executor is now the default (--emulator opts out). It is ~4-6x faster at trace generation and a prerequisite for the hints stream. The historical multi-program breakage (second setup() clobbers the first program's ASM ROM histogram -> rom.rs:178 panic) is fixed by ordering: the agg program is set up only after the leaf loop, never before; the Emulator keeps the upfront setup. Validated end to end on single-leaf, balanced (4-leaf), unbalanced mixed-vk (3-leaf), and 8-shard runs.

Simplifications, no behavior change (validated: the 3-shard discharge prove reproduces the same leaf steps, discharge counts, and final subject root as before): - Drop the flat mode-0 fallback fold from run_shard_plan. It was reachable only via manifests written before the tree section existed, which regenerate in seconds; aggregation now requires the bisection tree and says so. One fold path instead of two. - Move tree pruning into the kernel as AggNode::prune (where the type lives), with the unit test it was missing. - Drop the leaf_subjects vector (redundant with leaf_subject_sets) and the duplicate cover->Address conversion in the leaf loop. - Extract discharge() — the host mirror of the agg guest's union/resolve — out of the fold loop. - Name build_inputs' return triple (LeafInputs) and tighten build_plan's frontier expansion. - Zero compiler and clippy warnings across ix-kernel and zisk-host (was 12): unused mut, documented dead-code allow on the publics layout field, redundant to_string in format args, indexed loop, then -> then_some. Audited and kept: every CLI flag is read; subject_of_cover, fold_subjects, and tree_partition remain in use by the only-const, range, and reuse paths; the agg guest's two-mode structure is already minimal.

…euse path --store-dir now composes with --shard-plan (and requires it). A manifest shard whose certified targets are all covered by stored proofs is not re-proven: the covering stored proofs are folded into the aggregate as extra children, where the agg guest verifies them (vk-pinned, witness-root-checked) and discharges assumptions against their subjects — reuse by verification, never by trust. Partially-covered shards are re-proven whole (the shard is the proof unit). Freshly proven shards are banked per-shard (proofs/<n>.{proof,cover,asm}) as they complete, so an interrupted run resumes where it left off; .asm carries the assumption preimages mode-1 discharge needs, and the proofs directory is the store's single source of truth (proven.bin is gone). The fold is one op-list executor: novel leaves fold along the pruned bisection tree as before; stored proofs join as extra slots; a synthesized tail fold reduces the forest to one verified root, with an agg-of-1 when a lone stored proof would otherwise go unverified. Mode-1 commitments are canonical sets, so fold shape never changes the final claim — validated: fresh-store, full-reuse (0 leaves proven), and mixed (novel tree + stored tail) runs all commit the identical subject root and discharge to an unconditional claim. The legacy standalone reuse path (mode-0 join fold, naive count/byte chunking, proven.bin) is deleted, along with its private flags (--closure-inject, --limit-items) and now-orphaned helpers (fold_subjects, plan_chunks_by_closure, addr_bytes, load_store, save_store). --require-closed derives the covered set from the proof index. Net -392 lines; zero compiler/clippy warnings.

…reuse The planner (shard_plan --store-dir) now diffs the env against the proof store before profiling: work items whose targets are all store-covered are skipped entirely — not typechecked (planning gets faster), not partitioned, and their blocks excluded from the hypergraph (a novel→covered delta edge is an assumption discharged at aggregation, not a cut to minimize). The manifest covers only novel work. The host accepts such manifests: grouping requires every work item to be either manifest-assigned or store-covered, and the covering-proof selection generalizes from "reused manifest shards" to "every needed target the novel shards won't certify" — subsuming reused shards and unmanifested covered work. A fixpoint additionally folds stored proofs that cover the assumptions of already-included stored proofs, so a closed store yields an unconditional claim even cross-env. This fixes the cross-env conservatism where the min-cut objective pulled already-proven library constants into novel shards (clustering dependents with dependencies) and one novel constant forced a whole mixed shard to re-prove. Validated: with nataddcomm proven into a store, planning natmulcomm skips 34 of 47 work items, proves 2 novel shards (13 items), folds the 3 stored proofs (verified in-circuit, assumption-closure fixpoint), and discharges to an unconditional claim — 72% of the work reused across envs. Same-env full reuse regression unchanged (identical subject root).

The open-assumption filter scanned stored×subjects per assumption per round — O(assumptions × proofs × subjects), seconds-to-minutes of host CPU at init-store scale (~206 proofs × 51k subjects). One HashSet of included-proof subjects per round makes it linear and reads better. Re-validated: full-reuse fold of the cross-env store commits the identical subject root, outstanding 0, unconditional.

…_utf8EncodeChar_append`

samuelburnham force-pushed the kernel-riscv branch 2 times, most recently from d2ea3d9 to 554d76d Compare June 12, 2026 15:39

samuelburnham added 16 commits June 12, 2026 15:46

refactor: Convert to Cargo workspace with subcrates

d8af326

Move Rust sources from src/ into per-crate trees under crates/ (aiur, common, compile, ffi, ixon, kernel) and convert the root Cargo.toml into a workspace manifest with shared dependencies, lints, and profiles.

feat: Add SP1 and Zisk kernel proving backends

54b19bc

Fix CI Fix licenses Updates Fixups

fix: Import quickcheck_macros::quickcheck in ixon merkle tests

880d5de

The `#[quickcheck]` attribute macro lives in `quickcheck_macros`, not `quickcheck`. Without this import the ixon test target fails to compile.

Updates

2d45433

fix: Add NatSuccMode::Stuck cache to prove `ByteArray.utf8DecodeChar?…

735d07c

…_utf8EncodeChar_append`

samuelburnham force-pushed the kernel-riscv branch from 554d76d to 735d07c Compare June 12, 2026 15:50

samuelburnham mentioned this pull request Jun 12, 2026

perf(rust-kernel): Intern-assigned uids, symbolic Nat offsets, environment-machine WHNF reducer #442

Open

samuelburnham added 2 commits June 12, 2026 16:29

chore: apply cargo fmt

bf449db

chore: fix clippy warnings in shard examples

017b77c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add SP1 and Zisk kernel proving backends#411

feat: Add SP1 and Zisk kernel proving backends#411
samuelburnham wants to merge 18 commits into
mainfrom
kernel-riscv

samuelburnham commented May 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

samuelburnham commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

samuelburnham commented May 15, 2026 •

edited

Loading