merger: add optional validation of one-block files before merging #150

Closed
YaroShkvorets wants to merge 1 commit into streamingfast:develop from pinax-network:feature/merger-validate-blocks

@YaroShkvorets

Contributor

Problem

The merger's streaming bundle reader byte-concatenates one-block files into the merged bundle without any decoding. A one-block file containing a corrupted block payload (e.g. produced by a faulty reader node) is silently propagated into the merged-blocks store, where it poisons every consumer of the bundle (index-builder, firehose, substreams). The corruption typically surfaces much later as a bare "proto: cannot parse invalid wire-format data" error (see #149).

We hit this in production on a MegaETH deployment: a single block with an invalid payload inside an otherwise-healthy 100-block bundle (valid zstd, valid DBIN framing, valid block envelope — only the inner payload was garbage).

Change

New --merger-validate-one-block-files flag, default false — the existing pure-streaming behavior is preserved unless explicitly opted in.

When enabled, each one-block file is buffered (one file at a time, so memory stays bounded to a single block) and validated before being written into the bundle:

  • DBIN framing and pbbstream.Block envelope must decode
  • payload must be non-empty and valid protobuf wire-format — checked generically by unmarshalling into emptypb.Empty with DiscardUnknown, so no knowledge of the chain-specific block type is needed
  • the file must contain exactly one block (no trailing data)
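
To illustrate why the payload check in the second bullet needs no chain-specific type: protobuf wire format is a sequence of tagged fields that can be walked without a schema. The PR does this by unmarshalling into emptypb.Empty with DiscardUnknown. The sketch below is not that code; it is a hand-rolled, standard-library-only stand-in for the same idea. The name checkWireFormat is hypothetical, and as a simplification the sketch rejects the deprecated group wire types, which a real protobuf decoder accepts when they are well nested.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errInvalidWire = errors.New("invalid protobuf wire-format data")

// checkWireFormat (hypothetical name) walks b as a sequence of protobuf
// fields without any schema. It returns an error if the payload is empty
// or if the field framing is malformed or truncated.
func checkWireFormat(b []byte) error {
	if len(b) == 0 {
		return errors.New("empty payload")
	}
	for len(b) > 0 {
		// Each field starts with a varint tag: (field number << 3) | wire type.
		tag, n := binary.Uvarint(b)
		if n <= 0 || tag>>3 == 0 {
			return errInvalidWire // bad varint, or field number 0
		}
		b = b[n:]
		switch tag & 7 {
		case 0: // varint value
			_, n := binary.Uvarint(b)
			if n <= 0 {
				return errInvalidWire
			}
			b = b[n:]
		case 1: // fixed 64-bit value
			if len(b) < 8 {
				return errInvalidWire
			}
			b = b[8:]
		case 2: // length-delimited value (bytes, string, nested message)
			l, n := binary.Uvarint(b)
			if n <= 0 || l > uint64(len(b)-n) {
				return errInvalidWire
			}
			b = b[n+int(l):]
		case 5: // fixed 32-bit value
			if len(b) < 4 {
				return errInvalidWire
			}
			b = b[4:]
		default: // groups (3, 4) are rejected in this sketch; 6 and 7 are undefined
			return errInvalidWire
		}
	}
	return nil
}

func main() {
	valid := []byte{0x08, 0x96, 0x01}    // field 1, varint 150
	truncated := []byte{0x12, 0x05, 'a'} // field 2 claims 5 bytes, only 1 present
	fmt.Println(checkWireFormat(valid))
	fmt.Println(checkWireFormat(truncated))
}
```

A check like this only proves that the bytes are framed as protobuf fields. It cannot tell whether the fields match the chain's block type, which is the same limitation the schema-less approach in the PR accepts by design.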

A corrupted file fails the merge with an error naming the file and block:

refusing to merge corrupted one_block_file 0006258830-...: block #6258830 (d82cd8…) payload (type.googleapis.com/sf.ethereum.type.v2.Block) is corrupted: proto: cannot parse invalid wire-format data

so the operator can re-extract the block instead of discovering the corruption much later in a downstream consumer.

Testing

  • New unit tests for validateOneBlockFile (corrupt payload, empty payload, truncated file, trailing data)
  • Streaming output is byte-identical with validation on and off (TestStreamingBundleReader_ReadSimpleFiles runs both modes)
  • Default pass-through behavior documented by TestStreamingBundleReader_CorruptPayloadPassesWithoutValidation
  • End-to-end MergeAndStore failure path covered by TestMergerIO_MergeUploadCorruptBlockWithValidation

🤖 Generated with Claude Code

The streaming bundle reader byte-concatenates one-block files into the
merged bundle without any decoding, so a one-block file with a
corrupted block payload is silently propagated into the merged-blocks
store, poisoning every downstream consumer of the bundle
(index-builder, firehose, substreams, ...).

This adds a --merger-validate-one-block-files flag (default false,
preserving the existing pure-streaming behavior). When enabled, each
one-block file is buffered (one at a time, memory stays bounded to a
single block) and validated before being written out: the DBIN framing
and pbbstream.Block envelope must decode, the payload must be non-empty
and valid protobuf wire-format (checked generically, without knowledge
of the chain-specific block type), and the file must contain exactly
one block. A corrupted file fails the merge with an error naming the
file, so the operator can re-extract it instead of discovering the
corruption much later in a consumer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@YaroShkvorets force-pushed the feature/merger-validate-blocks branch from 843aa9a to 3376b73 on June 6, 2026 01:48