Skip to content

diag(governance): dev-node battery + isCoherent mismatch dump + hash-mismatch analysis#876

Open
Shitikyan wants to merge 8 commits into
stabilisationfrom
diag/governance-hash-mismatch-v2
Open

diag(governance): dev-node battery + isCoherent mismatch dump + hash-mismatch analysis#876
Shitikyan wants to merge 8 commits into
stabilisationfrom
diag/governance-hash-mismatch-v2

Conversation

@Shitikyan
Copy link
Copy Markdown
Contributor

@Shitikyan Shitikyan commented May 29, 2026

Summary

Three diagnostic pieces for the governance hash mismatch surfaced on dev.node2 while exercising DemosTransactions.proposeNetworkUpgrade end-to-end:

  • scripts/dev-node-battery.ts — single-command smoke test against any deployed RPC: native pay → stake → validators list → governance propose → vote → unstake (arm) → final state. Per-stage tx polling via getTransactionStatus nodeCall; self-contained markdown report writer.
  • test-reports/dev-node-battery-FINAL.md — clean run captured 2026-05-28T08:55Z against dev.node2.demos.sh:53552 (v0.9.8 / a0957941, dirty, osDenomination active). 7/10 stages green; native flow proven.
  • test-reports/governance-hash-mismatch-analysis.md — root-cause write-up. Two serializer asymmetries documented (node doesn't walk gcr_edits[], node rebuilds transaction_fee without spread); pure local round-trip from the SDK's post-sign canonical shape matches bytes for both PAY and PROPOSE, so the dev.node2-specific divergence is not yet pinned. The deployed binary is dirty=true and we have no log access — hence the debug log below.
  • src/libs/blockchain/transaction.ts — sibling of PR debug(validation): emit raw gcr_edits dump on mismatch + version canary (DEBUG-ONLY) #870's GCREdit-mismatch dump. When isCoherent diverges, emit the tx type, both hashes, and the serialized bytes so the next dev.node2 repro pinpoints the diverging field from logs alone. Pure logging, strip after root cause is found.

Why this is worth landing as diagnostics, not a fix

Without log access to dev.node2 + the binary being dirty=true, a speculative serializer fix risks chasing the wrong asymmetry. The combination of (a) battery script, (b) isCoherent log dump, and (c) analysis report gives the next on-call enough to find the diverging byte in one re-deploy cycle.

The SDK-side coverage gap that let this slip through is closed in a separate test PR: kynesyslabs/sdks#90 adds networkUpgrade + networkUpgradeVote fixtures to roundTripHash.test.ts. That PR is byte-equality-only and would have failed CI on any future SDK serializer drift for governance content.

Repro

# stress-test-mnemonic must be present + funded on the target RPC
RPC=http://dev.node2.demos.sh:53552 bunx tsx scripts/dev-node-battery.ts

Stages 4 + 5 fail with [Tx Validation] [SIGNATURE ERROR] Transaction hash mismatch. Native pay / stake / unstake all green.

Test plan

  • Battery runs cleanly against dev.node2 — 7/10 stages green, full report in test-reports/dev-node-battery-FINAL.md
  • src/libs/blockchain/transaction.ts change is debug-log only — no behaviour change, mirrors the pattern shipped in PR debug(validation): emit raw gcr_edits dump on mismatch + version canary (DEBUG-ONLY) #870
  • After this lands on the deployed test environment: re-run battery and capture the new [TX] isCoherent mismatch dump.serialized log line from the failing propose tx, attach to a follow-up

Companion PR

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added comprehensive battery test script for validating node transaction processing across multiple stages.
  • Bug Fixes

    • Enhanced transaction coherence validation with detailed diagnostic logging for hash mismatches.
  • Documentation

    • Added governance transaction serialization mismatch analysis with root causes and resolution guidance.
    • Added battery test report demonstrating transaction validation results.

Review Change Stack

Shitikyan and others added 4 commits May 23, 2026 14:36
Two minimal changes that together let testing/scripts/run-scenario.sh
and the testenv:* suites run end-to-end on a fresh local devnet:

- .dockerignore re-includes testing/loadgen/ so the demos-devnet-node
  image ships testing/loadgen/src/main.ts. The parent testing/ exclude
  was hiding it, breaking docker-compose.perf.yml's loadgen service
  with "Module not found".

- data/genesis.json funds the 4 docker-compose devnet identities (the
  addresses derived from testing/devnet/identities/node{1-4}.identity)
  so loadgen scenarios can sign + broadcast native txs without hitting
  "Insufficient balance: required N, available 0".

Verified locally: testenv:doctor 4/4 healthy, sanity scenarios ok=true,
consensus_tx_inclusion confirms txs past block 18.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mismatch analysis

Adds three pieces of diagnostic infrastructure for the governance hash
mismatch surfaced on dev.node2 while exercising
`DemosTransactions.proposeNetworkUpgrade` end-to-end:

1. `scripts/dev-node-battery.ts` — single-command smoke test against
   any deployed RPC: native pay → stake → validators list → governance
   propose → vote → unstake (arm) → final state, with per-stage tx
   polling via `getTransactionStatus` nodeCall and a self-contained
   markdown report writer. Stages 1/3/6 confirm cleanly on dev.node2;
   stages 4/5 fail with `[Tx Validation] [SIGNATURE ERROR] Transaction
   hash mismatch` — the failure that drove this PR.

2. `test-reports/dev-node-battery-FINAL.md` — clean run captured
   2026-05-28T08:55Z against dev.node2 (node v0.9.8 / a095794, dirty,
   osDenomination active). 7/10 stages pass; native flow proven.

3. `test-reports/governance-hash-mismatch-analysis.md` — root-cause
   write-up: the node and SDK serializers (`src/forks/serializerGate.ts`
   vs `@kynesyslabs/demosdk/denomination/serializerGate.js`) differ
   in two places — node doesn't walk `gcr_edits[]`, and node rebuilds
   `transaction_fee` in fixed key order instead of spreading. Local
   round-trip from the SDK's post-sign canonical shape produces
   matching bytes for both PAY and PROPOSE, so the dev.node2-specific
   divergence is not yet pinned — the deployed binary is `dirty=true`
   and we have no log access. The same report documents why every
   existing governance test (10 files) misses this: they all cut the
   wire at the SDK-builder→node-validator boundary. SDK PR
   kynesyslabs/sdks#90 closes the round-trip coverage gap from the
   SDK side.

Companion debug log in `Transaction.isCoherent` — sibling to PR #870's
GCREdit-mismatch dump. When the full content hash diverges, emit the
tx type, both hashes, and the bytes the node hashed so the next
dev.node2 repro can pinpoint the diverging byte from logs alone.
Pure logging — no consensus/validation behaviour change. Strip after
root cause is found.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown
Contributor

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@Shitikyan, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 7 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1f331aa0-27c9-4a88-beb4-bc775cd0d93f

📥 Commits

Reviewing files that changed from the base of the PR and between f961fcd and 49dd4e7.

📒 Files selected for processing (1)
  • scripts/dev-node-battery.ts

Walkthrough

This PR adds comprehensive transaction hash mismatch diagnosis and end-to-end testing infrastructure. It enhances Transaction.isCoherent with detailed mismatch logging, creates a dev-node battery test script that exercises 8 stages of node functionality, documents root-cause analysis of governance transaction serialization divergences, and captures test execution results.

Changes

Transaction Hash Mismatch Diagnosis and Testing

Layer / File(s) Summary
Enhanced transaction coherence logging
src/libs/blockchain/transaction.ts
Transaction.isCoherent computes and stores serialized content before hashing, and logs detailed mismatch warnings with transaction type, SDK hash, derived hash, and serialized bytes when hashes diverge, wrapped in try/catch to preserve validation behavior.
Dev-node battery test infrastructure and helpers
scripts/dev-node-battery.ts
Configuration constants for RPC, mnemonic paths, and polling intervals; StageResult data structure for recording outcomes; runStage, pollTx, and getBlock helpers that provide timing, error capture, transaction polling with timeout handling, and RPC-resilient block fetching.
Dev-node battery test execution across 8 stages
scripts/dev-node-battery.ts
Main entry point loads configuration, connects wallet via Demos, and orchestrates 8 sequential stages: node health, native self-pay, L2PS encrypted-tx (optional), validator stake, validators-list verification, governance upgrade proposal (conditional), governance vote (conditional), unstake/arm (conditional), and final state snapshot. Each stage captures timing, transaction hash/status, block references, and stage-specific notes.
Battery test report generation and error handling
scripts/dev-node-battery.ts
Generates markdown reports with summary header, per-stage results table, detailed per-stage sections including error messages and tx details, conditional "Known issues" guidance when governance propose fails with specific error patterns, and top-level fatal error handling that writes partial reports on failure.
Governance hash mismatch root-cause analysis
test-reports/governance-hash-mismatch-analysis.md
Comprehensive analysis documenting transaction hash mismatches between SDK-built governance upgrade transactions and node's Transaction.isCoherent validation. Identifies two post-fork serializer divergences in gcr_edits[] transformation and transaction_fee key ordering. Explains why the issue affects networkUpgrade/networkUpgradeVote but not pay/stake/unstake, why existing governance tests missed it, proposes test-gap and bug-side fixes, provides production workaround, and includes verification checklist.
Test execution output and local configuration ignore
test-reports/dev-node-battery-FINAL.md, .gitignore
Captures generated battery test execution report documenting a run against a dev node with 7 of 10 stages passed, including stage-by-stage results table, detailed per-stage sections with timing/transaction details/error messages, and final state snapshot. Adds stress-test-mnemonic to gitignore local devnet identities section.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • kynesyslabs/node#861: Both PRs address transaction hash/coherence failures caused by post-fork SDK↔node serialization differences—this PR adds detailed diagnostic logging and documents root-cause analysis around gcr_edits/fee serialization, while PR #861 implements validation fixes by normalizing regenerated gcr_edits via the chain's denomination.serializeTransactionContent before hashing.

Suggested labels

Review effort 5/5

Suggested reviewers

  • cwilvx

Poem

🐰 The battery runs to test the node,
Each stage a checkpoint, each flow a note,
When hashes mismatch, the logger cries,
With SDK and chain truth side by side.
Now governance glitches are caught and known, 📊
Fixing serialization, one byte at a time shown.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main changes: adding diagnostic tools (dev-node battery script) and analysis (hash-mismatch investigation) for governance issues, with the isCoherent mismatch dump as a supporting diagnostic.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch diag/governance-hash-mismatch-v2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR lands three diagnostic pieces for the governance hash mismatch observed on dev.node2: a battery test script (scripts/dev-node-battery.ts), a captured run report (test-reports/dev-node-battery-FINAL.md), and a serializer root-cause write-up (test-reports/governance-hash-mismatch-analysis.md), plus a log.warn dump of serialized bytes in transaction.ts for every isCoherent failure.

  • transaction.ts: captures serialized before hashing and emits it at warn level (not error) when coherence fails — pure diagnostic, no behaviour change.
  • dev-node-battery.ts: multi-stage smoke test; proposalId is now correctly promoted only after poll.status === \"included\", and the summary line correctly excludes skipped stages from the denominator.
  • dev-node-battery-FINAL.md: historical run captured before the proposalId guard fix — Stage 5 shows ❌ rather than the ⏭️ the current script would produce.

Confidence Score: 5/5

Safe to merge; all changes are diagnostic-only with no behavioural impact on production paths.

The transaction.ts change is a pure log addition on an already-rejected transaction path and cannot affect consensus or correctness. The battery script is a standalone diagnostic tool that never runs in CI or production. The proposalId guard and skipped-stage count fixes are correct.

test-reports/dev-node-battery-FINAL.md — committed report was captured from an earlier version of the script and no longer accurately represents what the current script produces.

Important Files Changed

Filename Overview
scripts/dev-node-battery.ts New diagnostic battery script; proposalId guard fix and skipped-count calculation are present and correct; Stage 4 uses a different hash source for polling than all other stages
src/libs/blockchain/transaction.ts Adds log.warn dump of serialized bytes on isCoherent failure; uses correct warn level; try/catch is overly defensive but harmless
test-reports/dev-node-battery-FINAL.md Historical run report generated before the proposalId guard fix; Stage 5 shows failed rather than skipped, inconsistent with current script logic
test-reports/governance-hash-mismatch-analysis.md Documentation-only root-cause analysis; thorough breakdown of both serializer divergences
.gitignore Adds stress-test-mnemonic to gitignore; correct and necessary

Sequence Diagram

sequenceDiagram
    participant Script as dev-node-battery.ts
    participant SDK as Demos SDK
    participant Node as dev.node2 RPC
    participant Log as node log

    Script->>SDK: pay / stake / unstake
    SDK->>Node: broadcast signed tx
    Node-->>Script: 200 OK included

    Script->>SDK: proposeNetworkUpgrade()
    SDK->>SDK: sign() compute tx.hash SDK serializer
    SDK->>Node: confirm(tx)
    Node->>Node: isCoherent() serializeTransactionContent()
    Node->>Log: log.warn type sdkHash derivedHash serialized
    Node-->>SDK: SIGNATURE ERROR hash mismatch
    SDK-->>Script: throws proposalId stays empty
    Note over Script: Stage 5 vote skipped
Loading

Reviews (5): Last reviewed commit: "review: guard proposalId behind successf..." | Re-trigger Greptile

Comment thread src/libs/blockchain/transaction.ts
Comment thread scripts/dev-node-battery.ts Outdated
Comment thread scripts/dev-node-battery.ts
Shitikyan and others added 3 commits May 29, 2026 14:02
- `Transaction.isCoherent` mismatch dump now uses `log.warn` instead of
  `log.error` (Greptile P1). Hash mismatch is an expected, recoverable
  condition during investigation — error-level would light up on-call
  alerts in any monitoring stack watching this node. Collapsed the four
  separate log lines into one to avoid log spam.

- Drop unused `existsSync` import from `dev-node-battery.ts`
  (Greptile P2).

- Track skipped stages separately so the summary headline isn't
  misleading (Greptile P2). A skipped L2PS stage no longer counts as
  a failure: "7/8 stages passed (+ 1 skipped)" instead of "7/10
  stages passed". Added `skipped?: boolean` to `StageResult`, marked
  every L2PS/governance/unstake skip path, and switched the table icon
  selector to `s.skipped` instead of a brittle notes string match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… gitignore mnemonic

SDK signatures take a single argument — the `Transaction` (for confirm)
or `RPCResponseWithValidityData` (for broadcast). The `demos` instance
is bound as `this` on the method; passing it as a second positional
arg was tolerated at runtime (JavaScript ignores extras) but TypeScript
correctly flagged it under `tsc --noEmit`. Eight call sites cleaned up.

Also adds `stress-test-mnemonic` (the funded test wallet used by
`dev-node-battery.ts`) to .gitignore so it can sit alongside
`.manual-test-mnemonic` without risk of getting staged on a future
`git add .`. The file itself is operator-local and was never tracked.

Verified: battery still runs end-to-end against dev.node2, same
7/9 + 1 skipped result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… default

`scripts/dev-node-battery.ts:17` sets the default RPC to a plain-HTTP URL
because the deployed dev node listens on plain HTTP — there is no TLS
terminator in front of it. SonarCloud's typescript:S5332 rule
flagged this as a Quality Gate failure on PR #876.

The default is just convenience — an operator pointing the battery
at a TLS-terminated endpoint overrides via `RPC=https://...`. Inline
NOSONAR with explanatory comment so the next reviewer sees why the
HTTP default is the right choice for this specific URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test-reports/governance-hash-mismatch-analysis.md (1)

13-15: 💤 Low value

Consider adding language identifiers to fenced code blocks.

The fenced code blocks at lines 13-15 and 176-183 are missing language specifiers. Adding identifiers (e.g., ```text or ```plaintext) improves syntax highlighting and tooling support.

📝 Proposed fix

Line 13:

-```
+```text
 [Tx Validation] [SIGNATURE ERROR] Transaction hash mismatch

Line 176:
```diff
-```
+```text
 SDK builder ─── sign ─── confirm ─── isCoherent ─── handleGovernanceTx ─── GCR apply
        │                    │                              │                      │

Also applies to: 176-183

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test-reports/governance-hash-mismatch-analysis.md` around lines 13 - 15, Add
a language identifier (e.g., "text" or "plaintext") to the fenced code blocks
that are currently unspecified; specifically update the block containing "[Tx
Validation] [SIGNATURE ERROR] Transaction hash mismatch" and the ASCII diagram
starting with "SDK builder ─── sign ─── confirm ─── isCoherent ───
handleGovernanceTx ─── GCR apply" so they become "```text" fenced blocks to
improve syntax highlighting and tooling support.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test-reports/governance-hash-mismatch-analysis.md`:
- Around line 13-15: Add a language identifier (e.g., "text" or "plaintext") to
the fenced code blocks that are currently unspecified; specifically update the
block containing "[Tx Validation] [SIGNATURE ERROR] Transaction hash mismatch"
and the ASCII diagram starting with "SDK builder ─── sign ─── confirm ───
isCoherent ─── handleGovernanceTx ─── GCR apply" so they become "```text" fenced
blocks to improve syntax highlighting and tooling support.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1cf50ecd-0cf4-49ed-adae-889d2e37e83c

📥 Commits

Reviewing files that changed from the base of the PR and between f6336cc and f961fcd.

📒 Files selected for processing (5)
  • .gitignore
  • scripts/dev-node-battery.ts
  • src/libs/blockchain/transaction.ts
  • test-reports/dev-node-battery-FINAL.md
  • test-reports/governance-hash-mismatch-analysis.md

Comment thread scripts/dev-node-battery.ts
`proposalId = randomUUID()` ran before `await demos.confirm(tx)`. When
`confirm()` throws (current dev.node2 behaviour — hash mismatch), the
exception is caught by `runStage`, but the outer `proposalId` is already
a non-empty UUID. The `if (proposalId)` guard at Stage 5 was therefore
truthy, firing a vote against a proposal that never existed on chain
and producing a misleading ❌ "Proposal not found" instead of ⏭️.

Mint the UUID locally as `id`, only promote to the outer `proposalId`
after `poll.status === "included"` — mirrors the `stakeOk` pattern in
Stage 3.

Verified: re-running battery against dev.node2 now shows Stage 5
correctly skipped as ⏭️. Summary line went from
`7/9 stages passed (+ 1 skipped)` to `6/7 stages passed (+ 2 skipped)`
— the false positive in the headline is gone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Shitikyan Shitikyan changed the base branch from feat-stabilisation-test to stabilisation May 29, 2026 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant