feat(payment): fire trust events on provably-bad quote bindings (node-side audit) by grumbach · Pull Request #97 · WithAutonomi/ant-node

grumbach · 2026-05-15T06:42:08Z

Why

Prod fleet measurement (24 ant-node v0.11.3 hosts, 3h47m window 2026-05-14 23:15 → 2026-05-15 03:02 UTC): 4 distinct nodes (do-ams3-node-2, do-fra1-node-1, hz-hel1-2, hz-nbg1-3) reported BLAKE3 quote-binding mismatches.

The 2026-05-06 attack on world_trade.JPG: 5 distinct peer-IDs all signed quotes with one of 3 shared private keys (worst case: peer-IDs 5b45e9d7, 5b14976c, 5be5095b share BLAKE3 key 5bd35c6c...). Those 5 captured 5 of 15 close-K slots and forced a quorum failure. The same operator IP (75.48.86.24) was dialed 35,592 times across the prod fleet in the same 3h47m window.
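A minimal sketch of how the shared-key observation above could be derived from logged (peer-ID, key) pairs. The function name `shared_keys`, the string-prefix representation, and the extra honest pair in `main` are assumptions made for illustration; only the three truncated peer-IDs and the `5bd35c6c` key prefix come from the description.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group observed (peer_id, key) pairs by key and keep only keys that
/// more than one distinct peer-ID claimed. Hypothetical helper.
fn shared_keys<'a>(
    observations: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut by_key: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(peer, key) in observations {
        by_key.entry(key).or_default().insert(peer);
    }
    // A key used by exactly one peer-ID is the normal case; drop it.
    by_key.retain(|_, peers| peers.len() > 1);
    by_key
}

fn main() {
    let observations = [
        // Truncated prefixes as quoted in the PR description.
        ("5b45e9d7", "5bd35c6c"),
        ("5b14976c", "5bd35c6c"),
        ("5be5095b", "5bd35c6c"),
        // Made-up honest pair, for contrast only.
        ("aaaa0001", "bbbb0001"),
    ];
    for (key, peers) in shared_keys(&observations) {
        println!("key {key} shared by {} peer-IDs: {peers:?}", peers.len());
    }
}
```

Because a peer-ID is expected to equal BLAKE3 of its public key, several peer-IDs presenting the same key cannot all pass the binding check, which is why the attack surfaces as binding mismatches.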

Today validate_peer_bindings (verifier.rs:590) and the signature-verify loop (verifier.rs:466) detect the bad behaviour and return Err, but neither calls report_trust_event. Result: detected, rejected, then forgotten. The same offender reappears in the next chunk's close-K and the cycle continues.

Per Mick's #05_client-side direction (2026-05-13): node-side quote audit, not client-side trust reporting. Anselme's PRs #114/#90/#77 (client side) were closed accordingly. This PR adds the node-side wiring without introducing a new audit protocol — the audit happens implicitly during payment verification.

What

Three trust-event sites, each with its own weight constant calibrated to the soundness of the attribution:

| Site | Weight | Constant | Rationale |
| --- | --- | --- | --- |
| BLAKE3 binding mismatch in `validate_peer_bindings` | 5.0 | `QUOTE_BINDING_MISMATCH_WEIGHT` | Deterministic, non-spoofable. Payer cannot fabricate this against an innocent peer. Crosses `BLOCK_THRESHOLD` in one event so the offender becomes immediately ineligible for new admissions. |
| ML-DSA signature failure in `verify_evm_payment` | 1.0 | `QUOTE_SIGNATURE_FAILURE_WEIGHT` | After binding has passed, signature failure proves only that the quote bytes do not verify under `pub_key`; it does NOT prove the peer misbehaved. A malicious payer could flip a bit in `quote.price` after taking a valid quote, and the verifier would otherwise penalise the innocent peer at zero attacker cost. Lower weight degrades reputation under sustained patterns without single-event blocking. |
| Merkle candidate signature failure in `verify_merkle_payment` | 5.0 | `MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT` | Self-signed by the candidate node: no payer in between, attribution sound. Same rationale as binding mismatch. |

Per-proof dedup: a peer appearing in multiple slots of the same proof produces at most one trust event per proof, capping trust-engine write-lock cost at O(distinct_offenders) regardless of attacker-controlled proof shape.

The "Invalid ML-DSA public key" branch (undecodable bytes) deliberately fires NO trust event. Corrupt bytes cannot be attributed to any real peer; penalising a random peer-ID derived from BLAKE3 of garbage would attack a non-existent identity.
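The mapping from failure kind to trust weight described in this section can be sketched as below. The three constants and their values are taken from the description; the `QuoteFailure` enum and the `trust_weight` function are hypothetical names used only to put the four cases side by side (the PR itself reports from three separate call sites).

```rust
// Weights as stated in the PR description.
const QUOTE_BINDING_MISMATCH_WEIGHT: f64 = 5.0;
const QUOTE_SIGNATURE_FAILURE_WEIGHT: f64 = 1.0;
const MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT: f64 = 5.0;

/// Hypothetical enum covering the cases discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QuoteFailure {
    BindingMismatch,
    QuoteSignature,
    MerkleCandidateSignature,
    /// Public-key bytes that do not decode at all.
    UndecodablePublicKey,
}

/// Weight to report for a failure, or `None` when no trust event fires.
fn trust_weight(failure: QuoteFailure) -> Option<f64> {
    match failure {
        QuoteFailure::BindingMismatch => Some(QUOTE_BINDING_MISMATCH_WEIGHT),
        QuoteFailure::QuoteSignature => Some(QUOTE_SIGNATURE_FAILURE_WEIGHT),
        QuoteFailure::MerkleCandidateSignature => {
            Some(MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT)
        }
        // Corrupt bytes cannot be attributed to a real peer.
        QuoteFailure::UndecodablePublicKey => None,
    }
}

fn main() {
    for failure in [
        QuoteFailure::BindingMismatch,
        QuoteFailure::QuoteSignature,
        QuoteFailure::MerkleCandidateSignature,
        QuoteFailure::UndecodablePublicKey,
    ] {
        println!("{failure:?} -> {:?}", trust_weight(failure));
    }
}
```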

Hooks into existing infrastructure

No new code paths, no new audit protocol, no wire-format changes:

- PaymentVerifier already holds p2p_node: RwLock<Option<Arc<P2PNode>>> (verifier.rs:141)
- attach_p2p_node() is already called at startup (verifier.rs:270)
- P2PNode::report_trust_event(&PeerId, TrustEvent) is async and exists in saorsa-core (src/network.rs:1001)
- TrustEvent::ApplicationFailure(f64) exists, weight clamped to MAX_CONSUMER_WEIGHT = 5.0
- Trust scores feed Mick's lazy swap-out (saorsa-core PR #65, feat!: replace binary peer blocking with lazy trust-based swap-out): no immediate eviction, but the next time a better candidate competes for the same routing-table slot the bad peer is replaced.
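A simplified, synchronous sketch of the wiring listed above: the verifier holds an optional node handle, and reporting is a no-op until a node is attached. `MockNode` is a hypothetical stand-in for `P2PNode`, `std::sync::RwLock` replaces whatever lock the real verifier uses, and applying the `MAX_CONSUMER_WEIGHT` cap inside the mock is an assumption about where the clamping happens.

```rust
use std::sync::{Arc, Mutex, RwLock};

const MAX_CONSUMER_WEIGHT: f64 = 5.0;

/// Hypothetical stand-in for P2PNode: records the events it receives.
#[derive(Default)]
struct MockNode {
    events: Mutex<Vec<([u8; 32], f64)>>,
}

impl MockNode {
    fn report_trust_event(&self, peer: &[u8; 32], weight: f64) {
        // Assumed clamp: weights above the cap are reduced to it.
        let clamped = weight.min(MAX_CONSUMER_WEIGHT);
        if let Ok(mut events) = self.events.lock() {
            events.push((*peer, clamped));
        }
    }
}

struct Verifier {
    p2p_node: RwLock<Option<Arc<MockNode>>>,
}

impl Verifier {
    fn new() -> Self {
        Self { p2p_node: RwLock::new(None) }
    }

    fn attach_p2p_node(&self, node: Arc<MockNode>) {
        if let Ok(mut slot) = self.p2p_node.write() {
            *slot = Some(node);
        }
    }

    /// Forward a failure to the node if one is attached.
    /// Returns whether an event was actually sent.
    fn report_peer_failure(&self, peer: &[u8; 32], weight: f64) -> bool {
        // Clone the Arc so the lock is released before reporting.
        let node = match self.p2p_node.read() {
            Ok(guard) => guard.clone(),
            Err(_) => None,
        };
        match node {
            Some(node) => {
                node.report_trust_event(peer, weight);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let verifier = Verifier::new();
    println!("before attach: {}", verifier.report_peer_failure(&[7; 32], 5.0));
    verifier.attach_p2p_node(Arc::new(MockNode::default()));
    println!("after attach: {}", verifier.report_peer_failure(&[7; 32], 5.0));
}
```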

API change

validate_peer_bindings is now an async fn(&self, ...) -> Result<()> (was static fn). Single caller updated.

The spawn_blocking signature-verify path now collects per-quote (EncodedPeerId, bool) results instead of bailing on first failure. Order preserved; first error returned matches pre-patch behaviour. The async caller iterates and fires one report per failure before returning.
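The collect-then-report shape described above might look roughly like this. It is a std-only sketch: `std::thread::spawn` stands in for `spawn_blocking`, the `Quote` struct with a precomputed `signature_ok` flag replaces real ML-DSA-65 verification, and `verify_all` and `summarize` are hypothetical names.

```rust
use std::thread;

type PeerIdBytes = [u8; 32];

/// Hypothetical quote: `signature_ok` replaces real signature verification.
struct Quote {
    peer: PeerIdBytes,
    signature_ok: bool,
}

/// Check every quote on a worker thread and return one result per quote,
/// in input order, instead of bailing on the first failure.
fn verify_all(quotes: Vec<Quote>) -> Vec<(PeerIdBytes, bool)> {
    let handle = thread::spawn(move || {
        quotes
            .iter()
            .map(|q| (q.peer, q.signature_ok))
            .collect::<Vec<_>>()
    });
    // In this sketch a panicked worker simply yields no results.
    handle.join().unwrap_or_default()
}

/// Walk the results: count one report per failure, keep only the first error.
fn summarize(results: &[(PeerIdBytes, bool)]) -> (usize, Option<String>) {
    let mut reports = 0;
    let mut first_error = None;
    for (peer, valid) in results {
        if *valid {
            continue;
        }
        reports += 1; // the PR awaits report_peer_failure(...) at this point
        if first_error.is_none() {
            first_error = Some(format!(
                "signature verification failed for peer {:02x}",
                peer[0]
            ));
        }
    }
    (reports, first_error)
}

fn main() {
    let quotes = vec![
        Quote { peer: [1; 32], signature_ok: true },
        Quote { peer: [2; 32], signature_ok: false },
        Quote { peer: [3; 32], signature_ok: false },
    ];
    let results = verify_all(quotes);
    let (reports, first_error) = summarize(&results);
    println!("{reports} reports, first error: {first_error:?}");
}
```

Keeping only the first error is what lets the caller return the same Err as before the patch while still reporting every failing peer.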

Adversarial review

Spawned a hostile reviewer subagent before push. 3 high-severity findings, all addressed:

1. Test did not assert multi-offender iteration. Rewrote validate_peer_bindings_does_not_short_circuit_on_first_valid_quote to put a CORRECTLY-bound quote at position 0; if the loop short-circuits on the first OK quote the test fails. Asserts the err names the first mismatched peer-id (proving iteration past position 0).
2. DoS amplification on multi-offender proofs. Originally a 16-mismatch proof would fire 16 sequential report_trust_event calls (each a brief write-lock on the trust engine). Added per-proof dedup so 16 occurrences of the same offender produce 1 trust event, not 16.
3. Signature-failure attribution could punish innocents. Originally used the max weight (5.0) for signature failures; a malicious payer could mutate a valid quote in transit and burn an innocent peer's trust at zero cost. Split the weight constant: signature failures use 1.0, binding mismatches keep 5.0.
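The per-proof dedup from finding 2 can be illustrated with a `HashSet` keyed on the offender's peer-ID bytes, the same shape the diff uses. `offenders_to_report` is a hypothetical helper name; the input is the list of offending slots in one proof.

```rust
use std::collections::HashSet;

type PeerIdBytes = [u8; 32];

/// Return each distinct offender once, in first-seen order.
/// One trust event would be fired per returned entry.
fn offenders_to_report(offending_slots: &[PeerIdBytes]) -> Vec<PeerIdBytes> {
    let mut reported: HashSet<PeerIdBytes> = HashSet::new();
    let mut to_report = Vec::new();
    for id in offending_slots {
        // `insert` returns false when the id was already present.
        if reported.insert(*id) {
            to_report.push(*id);
        }
    }
    to_report
}

fn main() {
    // The same offender occupying 16 slots of one proof.
    let proof = [[0x5b; 32]; 16];
    let events = offenders_to_report(&proof);
    println!("{} slots -> {} trust event(s)", proof.len(), events.len());
}
```

The number of reports is bounded by the number of distinct offenders, not by the number of slots the payer chose to put in the proof.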

Other lower-severity findings noted but not in this PR (e.g. failed-proof negative cache to dedup across proofs, splitting log levels for the report). They're additive — can be follow-ups without re-architecting this PR.

Tests

- 3 new unit tests + existing test_wrong_peer_binding_rejected continues to pass
- cargo test --lib: 463/463 pass
- cargo fmt --all -- --check: clean
- cargo clippy --all-features --all-targets -- -D warnings: clean
- cargo clippy --all-features --lib -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used: clean (matches CLAUDE.md's strict lint set)

What this does NOT do

- No new wire protocol. Nothing changes for payers or for older nodes.
- No immediate eviction. We feed the existing lazy-swap-out machinery; the offender stays in the routing table until a better candidate competes for the same slot.
- No client-side reporting. Per Mick's direction, this is purely the node side.
- Does not directly address the dialing of 75.48.86.24 (35,592 dials in the 3h47m window, roughly 9.4k/h); that requires the ADD_ADDRESS reachability gate (separate PR in saorsa-transport, also in flight).

…-side audit)

Wires the existing detection paths in PaymentVerifier to saorsa-core's TrustEvent system so provably-bad quote behaviour feeds Mick's lazy swap-out (saorsa-core PR WithAutonomi#65) instead of being detected, rejected, then forgotten.

Why
===

Prod fleet (24 ant-node v0.11.3 hosts, 3h47m window 2026-05-14 23:15 → 2026-05-15 03:02 UTC): 4 distinct nodes (do-ams3-node-2, do-fra1-node-1, hz-hel1-2, hz-nbg1-3) reported BLAKE3 quote-binding mismatches.

The 2026-05-06 attack: 5 distinct peer-IDs all signed quotes with one of 3 shared private keys (worst case: 3 peer-IDs share BLAKE3 key 5bd35c6c...), captured 5 of 15 close-K slots, forced quorum failure.

Today validate_peer_bindings (verifier.rs:590) and the signature-verify loop (verifier.rs:466) both detect the bad behaviour and return Err but neither calls report_trust_event, so the same offender reappears in the next chunk's close-K and the cycle continues.

Per Mick's #05_client-side direction (2026-05-13): node-side quote audit, not client-side trust reporting. PRs #114/WithAutonomi#90/WithAutonomi#77 (client side) were closed accordingly. This PR adds the node-side wiring without introducing a new audit protocol; the audit happens implicitly during payment verification.

What
====

Three trust-event sites, each with its own weight constant:

- QUOTE_BINDING_MISMATCH_WEIGHT = 5.0 (max). BLAKE3 binding mismatch is deterministic and non-spoofable: payer cannot fabricate this against an innocent peer. Weight crosses BLOCK_THRESHOLD in one event so the offender becomes immediately ineligible for new admissions.
- QUOTE_SIGNATURE_FAILURE_WEIGHT = 1.0 (moderate). After binding has passed, signature failure proves only that the quote bytes do not verify under pub_key; it does NOT prove the peer misbehaved. A malicious payer could flip a bit in quote.price after taking a valid quote, and the verifier would otherwise penalise the innocent peer at zero attacker cost. Lower weight degrades reputation under sustained patterns without single-event blocking.
- MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT = 5.0 (max). Self-signed by the candidate node: no payer in between, attribution sound.

Per-proof dedup: a peer appearing in multiple slots produces at most one trust event per proof, capping trust-engine write-lock cost at O(distinct_offenders) regardless of attacker-controlled proof shape.

The 'Invalid ML-DSA public key' branch (undecodable bytes) deliberately fires NO trust event: corrupt bytes cannot be attributed to any real peer, and penalising a random peer-ID derived from BLAKE3 of garbage would attack a non-existent identity.

Adversarial review (3 high-severity findings, all addressed)
============================================================

1. Test did not assert multi-offender iteration. Rewrote validate_peer_bindings_does_not_short_circuit_on_first_valid_quote to put a CORRECTLY-bound quote at position 0; if the loop short-circuits the test fails. Asserts the err names the first mismatched peer-id (proving iteration past position 0).
2. DoS amplification on multi-offender proofs. Added per-proof dedup so 16 occurrences of the same offender produce 1 trust event, not 16.
3. Signature-failure attribution can punish innocents (payer mutates bit-flipped quote in transit). Split the weight constant: signature failures use 1.0, binding mismatches use 5.0.

Tests
=====

3 new tests, 463 lib tests pass, fmt clean, cargo clippy --all-features --all-targets -- -D warnings clean, cargo clippy --all-features --lib -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used -D warnings clean. Existing test_wrong_peer_binding_rejected continues to pass: same Err semantics, trust events fire on the way to the same error.

Tooling notes
=============

- validate_peer_bindings is now async on &self (was static fn). Single caller updated.
- Signature-verify spawn_blocking now collects (peer_id, valid) results instead of bailing on first failure. Order preserved; first error returned matches pre-patch behaviour.
- Same shape applied to verify_merkle_payment's candidate-signature loop.

Copilot

Pull request overview

Wires three node-side trust-event reports into PaymentVerifier so peers that ship provably-bad quotes (BLAKE3 binding mismatch, ML-DSA signature failure on a quote, or merkle-candidate signature failure) feed saorsa-core's lazy swap-out machinery instead of being silently rejected. Adds per-proof dedup, makes validate_peer_bindings async on &self, refactors the spawn_blocking signature path to collect per-quote results so the loop no longer short-circuits, and adds three unit tests around the binding path.

Changes:

New weight constants (QUOTE_BINDING_MISMATCH_WEIGHT = 5.0, QUOTE_SIGNATURE_FAILURE_WEIGHT = 1.0, MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT = 5.0) and a report_peer_failure helper that calls P2PNode::report_trust_event when attached.
validate_peer_bindings becomes async fn(&self, …), walks the full proof, dedups offenders, and reports binding mismatches; the signature-verify path collects (EncodedPeerId, bool) and reports failures with dedup; verify_merkle_payment does the same for candidate-signature failures.
Three new #[tokio::test]s covering the no-short-circuit shape, no-panic when P2PNode is unattached, and the all-valid pass-through case.

            if expected_peer_id.as_bytes() != encoded_peer_id.as_bytes() {
                let expected_hex = expected_peer_id.to_hex();
                let actual_hex = hex::encode(encoded_peer_id.as_bytes());
-                return Err(Error::Payment(format!(
-                    "Quote pub_key does not belong to claimed peer {encoded_peer_id:?}: \
-                     BLAKE3(pub_key) = {expected_hex}, peer_id = {actual_hex}"
-                )));
+                // Provably bad behaviour: penalise the peer who claimed
+                // this binding. Use the EncodedPeerId from the proof —
+                // that is the identity routing-table lookups will hit.
+                let offender_bytes = *encoded_peer_id.as_bytes();
+                if reported.insert(offender_bytes) {
+                    let offender = PeerId::from_bytes(offender_bytes);
+                    self.report_peer_failure(
+                        &offender,
+                        QUOTE_BINDING_MISMATCH_WEIGHT,
+                        "BLAKE3 quote-binding mismatch",
+                    )
+                    .await;
+                }
+                if first_error.is_none() {
+                    first_error = Some(Error::Payment(format!(
+                        "Quote pub_key does not belong to claimed peer {encoded_peer_id:?}: \
+                         BLAKE3(pub_key) = {expected_hex}, peer_id = {actual_hex}"
+                    )));
+                }
            }


+        // produces at most one trust event. See validate_peer_bindings
+        // for the same rationale (cap write-lock cost on the trust
+        // engine regardless of attacker-controlled proof shape).
+        //
+        // Signature failures use a lower weight than binding mismatches
+        // (see [`QUOTE_SIGNATURE_FAILURE_WEIGHT`] doc): a malicious
+        // payer can flip a bit in `quote.price` after taking a valid
+        // quote from an honest peer, and the verifier would otherwise
+        // penalise the innocent peer at zero attacker cost.
+        let mut sig_error: Option<Error> = None;
+        let mut sig_reported: std::collections::HashSet<[u8; 32]> =
+            std::collections::HashSet::new();
+        for (encoded_peer_id, valid) in sig_results {
+            if !valid {
+                let offender_bytes = *encoded_peer_id.as_bytes();
+                if sig_reported.insert(offender_bytes) {
+                    let offender = PeerId::from_bytes(offender_bytes);
+                    self.report_peer_failure(
+                        &offender,
+                        QUOTE_SIGNATURE_FAILURE_WEIGHT,
+                        "ML-DSA-65 signature verification failed",
+                    )
+                    .await;
+                }
+                if sig_error.is_none() {
+                    sig_error = Some(Error::Payment(format!(
+                        "Quote ML-DSA-65 signature verification failed for peer {encoded_peer_id:?}"
+                    )));
                }


            if !crate::payment::verify_merkle_candidate_signature(candidate) {
-                return Err(Error::Payment(format!(
-                    "Invalid ML-DSA-65 signature on merkle candidate node (reward: {})",
-                    candidate.reward_address
-                )));
+                if let Ok(offender) = peer_id_from_public_key_bytes(&candidate.pub_key) {
+                    if merkle_reported.insert(*offender.as_bytes()) {
+                        self.report_peer_failure(
+                            &offender,
+                            MERKLE_CANDIDATE_SIGNATURE_FAILURE_WEIGHT,
+                            "merkle candidate ML-DSA-65 signature verification failed",
+                        )
+                        .await;
+                    }
+                }


grumbach · 2026-05-15T09:37:56Z

Closing: adversarial review found a design flaw. validate_peer_bindings attributes blame to encoded_peer_id, which is attacker-controlled proof bytes — a junk proof naming any honest peer + a random pub_key fires ApplicationFailure(5.0) against that peer at zero cost (no payment, no victim signature), before payment verification. This is a remote trust-poisoning primitive (downscore any peer ID below the swap threshold). Restarting from a new design: attribute only to bindings the quoted peer itself signed, and fire trust events only after on-chain payment verification.
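The flaw can be reproduced in a toy model. In this sketch `toy_peer_id` is a deliberately trivial stand-in for BLAKE3(pub_key) (it just copies key bytes, so it is NOT a hash), and `ProofSlot` and `blamed_on_mismatch` are hypothetical names. The point is only that both fields of a slot come from whoever built the proof, so blaming `claimed_peer_id` on mismatch lets a third party choose the victim.

```rust
type PeerIdBytes = [u8; 32];

/// Toy stand-in for BLAKE3(pub_key): copies the first 32 key bytes.
/// Not a real hash; used only to keep the example deterministic and std-only.
fn toy_peer_id(pub_key: &[u8]) -> PeerIdBytes {
    let mut id = [0u8; 32];
    for (slot, byte) in id.iter_mut().zip(pub_key) {
        *slot = *byte;
    }
    id
}

/// One slot of a payment proof. Both fields are proof bytes, i.e. they are
/// supplied by whoever submitted the proof.
struct ProofSlot {
    claimed_peer_id: PeerIdBytes,
    pub_key: Vec<u8>,
}

/// Attribution rule of the closed PR: on mismatch, blame `claimed_peer_id`.
fn blamed_on_mismatch(proof: &[ProofSlot]) -> Vec<PeerIdBytes> {
    proof
        .iter()
        .filter(|slot| toy_peer_id(&slot.pub_key) != slot.claimed_peer_id)
        .map(|slot| slot.claimed_peer_id)
        .collect()
}

fn main() {
    // An honest peer that never signed or sent anything in this exchange.
    let honest_id = toy_peer_id(&[0xAA; 32]);

    // Junk proof: names the honest peer, pairs it with an unrelated key.
    let junk_proof = vec![ProofSlot {
        claimed_peer_id: honest_id,
        pub_key: vec![0x01; 32],
    }];

    let blamed = blamed_on_mismatch(&junk_proof);
    println!("honest peer blamed: {}", blamed.contains(&honest_id));
}
```

No signature from the victim and no payment is needed to trigger the penalty in this model, which matches the zero-cost property called out in the closing comment.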

Copilot AI review requested due to automatic review settings May 15, 2026 06:42

Copilot started reviewing on behalf of grumbach May 15, 2026 06:42

Copilot AI reviewed May 15, 2026

grumbach marked this pull request as draft May 15, 2026 09:35

grumbach closed this May 15, 2026

grumbach wants to merge 1 commit into WithAutonomi:main from grumbach:grumbach/node-side-quote-audit
