Skip to content

Add_run_and_prove_recursive_tree#348

Open
YairVaknin-starkware wants to merge 1 commit into
mainfrom
yairv/add_run_and_prove_recursive_tree
Open

Add_run_and_prove_recursive_tree#348
YairVaknin-starkware wants to merge 1 commit into
mainfrom
yairv/add_run_and_prove_recursive_tree

Conversation

@YairVaknin-starkware
Copy link
Copy Markdown
Contributor

@YairVaknin-starkware YairVaknin-starkware commented May 26, 2026

Adds a new crate (crates/stwo_run_and_prove_recursive_tree) that folds an entire applicative recursive proof tree above N leaf STWO proofs into a single root proof in one binary invocation. Used by SHARP's GPS prover when the parent CairoJob's spec is StwoInBinaryRecursiveTree (gated by enable_in_binary_recursive_tree), to skip O(n) per-node bootloader round-trips through the GPS pipeline.

Crate (876 LOC across lib.rs/main.rs/tests.rs):

  • stwo_run_and_prove_recursive_tree() performs layered 2-to-1 reductions: at each layer pairs of children are verified by a no-builtin-simulation simple-bootloader run (Cairo1Executable verifier tasks); odd entries are carried unchanged to the next layer. Outputs the single root proof, its flat program output, its root fact_topologies, and a nested CompositePackedOutput JSON for the on-chain unpacker bootloader.
  • LeafInput carries each leaf's proof_path, fact_topology, outputs, Python-computed per-leaf stats (n_non_recursive_jobs etc.), and a pre-computed PackedOutput so the binary doesn't have to know about whether each leaf is recursive — Python's packed_output_from_data handles that and the binary propagates it verbatim.
  • LayerEntry.fact_topologies mirrors what stwo_run_and_prove would have written (per-task entries from the bootloader hint), kept on the layer entry for the root disk write only — children's per-task fact_topologies in CompositePackedOutput come from the per-layer fact_topologies file written by the bootloader, not from a fabricated trivial value.
  • single_page=true is set on every reduce_pair's bootloader input.

cairo-program-runner-lib (hints/types.rs):

  • Adds Serialize to PackedOutput/CompositePackedOutput with #[serde(tag = "type")] discriminator matching Python's marshmallow PackedOutputSchema (PlainPackedOutput/CompositePackedOutput rename pairs).
  • Adds felt_decimal_vec::serialize so CompositePackedOutput.outputs emits decimal strings (Python's marshmallow.fields.Integer chokes on hex and on numbers exceeding the JSON safe-integer range).

Wiring:

  • Cargo workspace registers the new crate.
  • CI workflow uploads target/release/stwo_run_and_prove_recursive_tree to the GCS bucket stwo_run_and_prove_recursive_tree.

Type

  • feature
  • bugfix
  • dev (no functional changes, no API changes)
  • fmt (formatting, renaming)
  • build
  • docs
  • testing

Description

Breaking changes?

  • yes
  • no

This change is Reviewable

Adds a new crate (crates/stwo_run_and_prove_recursive_tree) that folds
an entire applicative recursive proof tree above N leaf STWO proofs into
a single root proof in one binary invocation. Used by SHARP's GPS
prover when the parent CairoJob's spec is `StwoInBinaryRecursiveTree`
(gated by `enable_in_binary_recursive_tree`), to skip O(n) per-node
bootloader round-trips through the GPS pipeline.

Crate (876 LOC across lib.rs/main.rs/tests.rs):
- `stwo_run_and_prove_recursive_tree()` performs layered 2-to-1
  reductions: at each layer pairs of children are verified by a
  `no-builtin-simulation` simple-bootloader run (Cairo1Executable
  verifier tasks); odd entries are carried unchanged to the next layer.
  Outputs the single root proof, its flat program output, its root
  fact_topologies, and a nested `CompositePackedOutput` JSON for the
  on-chain unpacker bootloader.
- `LeafInput` carries each leaf's proof_path, fact_topology, outputs,
  Python-computed per-leaf stats (`n_non_recursive_jobs` etc.), and a
  pre-computed `PackedOutput` so the binary doesn't have to know about
  whether each leaf is recursive — Python's `packed_output_from_data`
  handles that and the binary propagates it verbatim.
- `LayerEntry.fact_topologies` mirrors what `stwo_run_and_prove` would
  have written (per-task entries from the bootloader hint), kept on the
  layer entry for the root disk write only — children's per-task
  fact_topologies in `CompositePackedOutput` come from the per-layer
  fact_topologies file written by the bootloader, not from a fabricated
  trivial value.
- `single_page=true` is set on every reduce_pair's bootloader input.

cairo-program-runner-lib (`hints/types.rs`):
- Adds `Serialize` to `PackedOutput`/`CompositePackedOutput` with
  `#[serde(tag = "type")]` discriminator matching Python's marshmallow
  `PackedOutputSchema` (`PlainPackedOutput`/`CompositePackedOutput`
  rename pairs).
- Adds `felt_decimal_vec::serialize` so `CompositePackedOutput.outputs`
  emits decimal strings (Python's `marshmallow.fields.Integer` chokes
  on hex and on numbers exceeding the JSON safe-integer range).

Wiring:
- Cargo workspace registers the new crate.
- CI workflow uploads `target/release/stwo_run_and_prove_recursive_tree`
  to the GCS bucket `stwo_run_and_prove_recursive_tree`.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 15.84158% with 255 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.65%. Comparing base (9824175) to head (1650478).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...rates/stwo_run_and_prove_recursive_tree/src/lib.rs 15.38% 220 Missing ⚠️
...ates/stwo_run_and_prove_recursive_tree/src/main.rs 0.00% 35 Missing ⚠️

❌ Your patch check has failed because the patch coverage (15.84%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #348      +/-   ##
==========================================
- Coverage   53.16%   51.65%   -1.51%     
==========================================
  Files          35       37       +2     
  Lines        5295     5606     +311     
==========================================
+ Hits         2815     2896      +81     
- Misses       2480     2710     +230     
Files with missing lines Coverage Δ
crates/cairo-program-runner-lib/src/hints/types.rs 49.76% <100.00%> (+1.97%) ⬆️
...ates/stwo_run_and_prove_recursive_tree/src/main.rs 0.00% <0.00%> (ø)
...rates/stwo_run_and_prove_recursive_tree/src/lib.rs 15.38% <15.38%> (ø)

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@noam-starkware noam-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noam-starkware made 25 comments.
Reviewable status: 0 of 9 files reviewed, 24 unresolved discussions (waiting on noaov1 and YairVaknin-starkware).


a discussion (no related file):
Please add doc strings to functions that don't have one.


crates/stwo_run_and_prove_recursive_tree/src/tests.rs line 1 at r1 (raw file):

use std::path::PathBuf;

You mentioned you're writing more tests, can you elaborate? Both what unit tests you're adding, and are you planning on adding a bigger flow or e2e test?


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 1 at r1 (raw file):

//! In-binary recursive proof tree builder.

I read online about comment styles and I saw that //! is sometimes used for crate level doc, but seeing as we don't use it anywhere else in the repo, don't you think we should use /// here for conformity?


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 11 at r1 (raw file):

//! per-leaf pieces.
//!
//! The layered reduction is purely sequential: each layer's two-to-one bootloader call consumes

Please specify that the topology of the proof tree is that of a complete tree.


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 88 at r1 (raw file):

    /// Hash function the verifier program should use when hashing the verified program.
    /// One of "pedersen", "poseidon", "blake". (Currently `verifier_task_for_child` hardcodes
    /// `"blake"` matching Python's `get_recursive_program_hash_function()`; this field is

Is there a way to add some unit test that ensures that sharp and this lib use the same hash function? Could also be a on the sharp side if easier.

Code quote:

    /// One of "pedersen", "poseidon", "blake". (Currently `verifier_task_for_child` hardcodes
    /// `"blake"` matching Python's `get_recursive_program_hash_function()`; this field is

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 97 at r1 (raw file):

    /// the full nested Composite tree describing the leaf's own aggregation. The Rust binary
    /// propagates this verbatim as the LayerEntry's `packed_output` so that the resulting
    /// root `PackedOutput` matches what the standard layered flow would have produced.

You refer to this elsewhere as legacy flow, please pick a consistent term.

Code quote:

standard layered flow

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 105 at r1 (raw file):

    /// For non-recursive leaves these equal the raw fact-topology values; for recursive leaves
    /// Python reads the cumulative totals from `CairoJobReceived` (which already aggregated the
    /// entire sub-tree), so passing them here is the only way to get the correct aggregation.

Delete.

Code quote:

so passing them here is the only way to get the correct aggregation

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 109 at r1 (raw file):

    pub total_non_recursive_output_size: u64,
    pub total_n_pages: u64,
    pub total_fact_tree_structures_len: u64,

These fields reappear in the RecursiveJobData (along with outputs). Is there a (simple, nice) way to avoid this duplication?

Code quote:

    pub n_non_recursive_jobs: u64,
    pub total_non_recursive_output_size: u64,
    pub total_n_pages: u64,
    pub total_fact_tree_structures_len: u64,

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 116 at r1 (raw file):

/// that the root entry's `RecursiveJobData` is the correct sum across all leaves underneath it.
#[derive(Debug, Clone, Default, Serialize)]
pub struct RecursiveJobData {

Why doesn't this struct contain the fact topologies dict?

IIUC - you have the data to construct it in the LayerEntry objects (and you get it in LeafEntry for the leaves), and you construct it recursively during the tree building phase. Can't you put it here and keep the same struct as in the main repo?

More IIUC - I know that the dict is mapped according to cairo job ids which you obviously don't have here, but maybe you can create some form of identification based on the tree node?

What is the benefit of having the fact topology in the LayerEntry instead of here?


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 158 at r1 (raw file):

/// In-memory representation of a single tree node during reduction. At layer 0 these wrap the
/// leaves; at higher layers each entry is the result of folding two children.
struct LayerEntry {

I don't understand this name. If it represents a single tree node, why not call it TreeNode or something similar? When I first skimmed this PR I assumed this struct represented something for a whole layer in the tree...

Code quote:

LayerEntry

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 169 at r1 (raw file):

    fact_topologies: Vec<FactTopology>,
    outputs: Vec<Felt252>,
    aggregated: RecursiveJobData,

Is this a good name for this field? I feel like it's confusing (especially given our current use of the word aggregator). Why not simplyrecursive_job_data?

Code quote:

aggregated

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 229 at r1 (raw file):

    info!(
        n_leaves = leaves.len(),
        leaf_train_ids = ?leaves.iter().map(|l| l.train_id).collect::<Vec<_>>(),

leaf.

Code quote:

l

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 229 at r1 (raw file):

    info!(
        n_leaves = leaves.len(),
        leaf_train_ids = ?leaves.iter().map(|l| l.train_id).collect::<Vec<_>>(),

leaves_train_ids.

Code quote:

leaf_train_ids

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 236 at r1 (raw file):

    // Dropped (and removed from disk) when this function returns; the only files that outlive
    // the invocation are the four root-layer outputs the caller asked for.
    let scratch = TempDir::new()?;

How about adding the option to save temp files in case of failure for debugging? Obviously a TODO and can be postponed to post-MVP.


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 269 at r1 (raw file):

        .collect::<Result<Vec<_>, RecursiveTreeError>>()?;

    let mut layer_idx: usize = 0;

Any reason not start indexing at 1 and avoid the addition in the call to reduce_pair?

Code quote:

let mut layer_idx: usize = 0

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 315 at r1 (raw file):

    let root = current_layer
        .pop()
        .expect("layer.len() == 1 after the loop");

Consider replacing with something more informative: vector should contain a single root entry after tree building process or something like that.

Code quote:

"layer.len() == 1 after the loop"

crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 345 at r1 (raw file):

    let _span = span!(Level::INFO, "reduce_pair", layer_idx, pair_idx).entered();

    let pair_dir = scratch_dir.join(format!("layer_{layer_idx:03}_pair_{pair_idx:03}"));

I know this seems excessive, but I think you should add a check that the number of leaves is below 1k to ensure no collisions.


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 470 at r1 (raw file):

}

fn read_outputs_file(path: &PathBuf) -> Result<Vec<Felt252>, RecursiveTreeError> {

Add unit tests for this function.


crates/stwo_run_and_prove_recursive_tree/src/lib.rs line 524 at r1 (raw file):

/// JSON's safe-integer range. Used by `RecursiveJobData.outputs` for the (Serialize-only) debug
/// dump from `main.rs`.
mod felt_decimal_vec {

Is there any differene between this function and the one defined in the program runner lib? If not, can you use one shared function (maybe in some util file)?


crates/stwo_run_and_prove_recursive_tree/Cargo.toml line 9 at r1 (raw file):

description = "Fold an entire applicative recursive proof tree above its leaves into a single STWO root proof, in one binary invocation."

# Explicit [[bin]] so the produced binary file name uses underscores; this is what Python's

Nice!


.github/workflows/upload_artifacts_workflow.yml line 6 at r1 (raw file):

  push:
    branches:
      - yairv/add_run_and_prove_recursive_tree

Blocking till reverted to main


crates/cairo-program-runner-lib/src/hints/types.rs line 599 at r1 (raw file):

/// Serializes `Vec<Felt252>` as decimal strings so Python's `marshmallow.fields.Integer` can
/// parse values larger than JSON's safe integer range without precision loss.

Why does the JSON have a valid range? Can you give a more thorough doc here?


crates/cairo-program-runner-lib/src/hints/types.rs line 600 at r1 (raw file):

/// Serializes `Vec<Felt252>` as decimal strings so Python's `marshmallow.fields.Integer` can
/// parse values larger than JSON's safe integer range without precision loss.
mod felt_decimal_vec {

Please add unit test for this (especially for the large values mentioned in the doc).


crates/cairo-program-runner-lib/src/hints/types.rs line 601 at r1 (raw file):

/// parse values larger than JSON's safe integer range without precision loss.
mod felt_decimal_vec {
    use cairo_vm::Felt252;

I think the convention throughout the crate is to add imports at the top level. Any reason to do this here inside the module?


crates/stwo_run_and_prove_recursive_tree/src/main.rs line 118 at r1 (raw file):

#[derive(serde::Deserialize)]
struct LeavesFile {

Struct seems redundant just for its below. Is this necessary for the serde trait or any other reason to keep it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants