
Reorg-orphaned rows leak in events/transactions → Streams cursor collisions #46

@ryanwaits

Summary

When a block height is reorged, the orphaned block's transactions/events rows are left behind and the new canonical block's rows are inserted alongside them. This produces duplicate rows at the same (block_height, tx_index) with different tx_ids. The Streams events query checks canonicality via blocks.canonical at that height. Orphaned and canonical rows share the height, so both satisfy that condition, and the orphaned rows leak into candidate_events. That inflates the stream_event_index COUNT and makes two distinct events resolve to the same cursor.
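A minimal in-memory sketch of the collision follows. It assumes, for illustration only, that an event's stream_event_index is the number of rows at the same height sorting strictly before it by (tx_index, event_index), and that the cursor is `height:index`. Neither the field names nor the cursor format are taken from streams-events.ts. Two rows sharing (tx_index, event_index) get the same count, and every later row at that height is shifted up by one.

```typescript
// Illustrative model only: field names and the cursor format are assumptions,
// not the indexer's actual schema or streams-events.ts code.

interface EventRow {
  txId: string;
  blockHeight: number;
  txIndex: number;
  eventIndex: number;
}

// Assumed derivation: an event's index is the number of rows at the same
// height that sort strictly before it by (txIndex, eventIndex).
function streamEventIndex(row: EventRow, rows: EventRow[]): number {
  return rows.filter(
    (r) =>
      r.blockHeight === row.blockHeight &&
      (r.txIndex < row.txIndex ||
        (r.txIndex === row.txIndex && r.eventIndex < row.eventIndex)),
  ).length;
}

function cursorFor(row: EventRow, rows: EventRow[]): string {
  return `${row.blockHeight}:${streamEventIndex(row, rows)}`;
}

// Returns every cursor that more than one row resolves to.
function collidingCursors(rows: EventRow[]): string[] {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const c = cursorFor(row, rows);
    counts.set(c, (counts.get(c) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([c]) => c);
}

// Height 100 after a reorg: the orphaned tx (0xaaa) and the canonical tx
// (0xbbb) both sit at tx_index 0, and a height-only canonicality join keeps both.
const rowsAtHeight: EventRow[] = [
  { txId: "0xaaa", blockHeight: 100, txIndex: 0, eventIndex: 0 },
  { txId: "0xbbb", blockHeight: 100, txIndex: 0, eventIndex: 0 },
  { txId: "0xbbb", blockHeight: 100, txIndex: 0, eventIndex: 1 },
];

console.log(collidingCursors(rowsAtHeight)); // the single collision, "100:0"
console.log(cursorFor(rowsAtHeight[2], rowsAtHeight)); // "100:2" (would be "100:1" without the orphan)
```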

Downstream, this wedged the L2 decoders: a decode batch containing two rows with the same cursor fails the decoded_events upsert (ON CONFLICT DO UPDATE command cannot affect row a second time), and the decoder loops forever. (stx_transfer/stx_mint were ~15h behind before the hotfix.)

Root cause

  • events has no canonical column (only block_height); transactions has neither canonical nor block_hash (only block_height).
  • reorg.ts only marks blocks.canonical = false — it never touches events/transactions.
  • Ingest (packages/indexer/src/index.ts:367–452) inserts txs/events with onConflict … doNothing and never deletes, so reorged heights accumulate orphaned rows.
  • The Streams query (packages/indexer/src/streams-events.ts, stream_event_index COUNT joining events → transactions) double-counts the orphaned rows → colliding cursors. This affects the Streams events surface itself, not just the decoder.
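The accumulation described above can be modeled in a few lines. This sketch assumes, for illustration, that the conflict target of the `onConflict … doNothing` insert is tx_id; the class and method names are hypothetical. Because the orphaned and canonical blocks carry different tx_ids, the conflict never fires, nothing is deleted, and both rows end up at the same height.

```typescript
// In-memory model of the current ingest behaviour. The tx_id conflict target
// is an assumption; TxTable and insertDoNothing are hypothetical names.

interface TxRow {
  txId: string;
  blockHeight: number;
  txIndex: number;
}

class TxTable {
  private rows = new Map<string, TxRow>(); // keyed by txId (assumed conflict target)

  // Mirrors INSERT ... ON CONFLICT DO NOTHING: existing keys are skipped,
  // new keys are added, and nothing is ever removed.
  insertDoNothing(batch: TxRow[]): void {
    for (const row of batch) {
      if (!this.rows.has(row.txId)) this.rows.set(row.txId, row);
    }
  }

  atHeight(height: number): TxRow[] {
    return Array.from(this.rows.values()).filter((r) => r.blockHeight === height);
  }
}

const txs = new TxTable();
// First block at height 100 (later orphaned).
txs.insertDoNothing([{ txId: "0xaaa", blockHeight: 100, txIndex: 0 }]);
// Reorg: only blocks.canonical is flipped, so the 0xaaa row stays.
// The new canonical block at height 100 carries a different tx at tx_index 0.
txs.insertDoNothing([{ txId: "0xbbb", blockHeight: 100, txIndex: 0 }]);

console.log(txs.atHeight(100).length); // 2: orphaned + canonical at the same (height, tx_index)
```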

Evidence

SELECT block_height, tx_index, count(*)
FROM transactions WHERE block_height BETWEEN 8088743 AND 8088760
GROUP BY block_height, tx_index HAVING count(*) > 1;
-- n=2 for blocks 8088744+

Decoder logs: repeated l2_decoder.error … ON CONFLICT DO UPDATE command cannot affect row a second time for l2.stx_transfer.v1 / l2.stx_mint.v1.

Hotfix (shipped)

writeDecodedEvents now de-dupes by cursor before the upsert (commit f195618a), which stops the decoder wedge. This is defense-in-depth only; it does not fix the underlying leak.
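A sketch of what de-duping a batch by cursor can look like is below. Postgres rejects a single INSERT ... ON CONFLICT DO UPDATE in which two proposed rows hit the same conflict key, so collapsing duplicates first avoids the error. The row shape, the helper name `dedupeByCursor`, and the "last row wins" policy are assumptions; this is not the code from f195618a.

```typescript
// Hypothetical sketch: collapse a decode batch to one row per cursor before
// the upsert. Row shape and "last row wins" policy are assumptions.

interface DecodedEvent {
  cursor: string;
  txId: string;
  payload: Record<string, unknown>;
}

function dedupeByCursor(batch: DecodedEvent[]): DecodedEvent[] {
  const byCursor = new Map<string, DecodedEvent>();
  for (const ev of batch) {
    // Later rows overwrite earlier ones; a Map keeps first-seen key order.
    byCursor.set(ev.cursor, ev);
  }
  return Array.from(byCursor.values());
}

const batch: DecodedEvent[] = [
  { cursor: "100:0", txId: "0xaaa", payload: { amount: 1 } },
  { cursor: "100:0", txId: "0xbbb", payload: { amount: 2 } },
  { cursor: "100:2", txId: "0xbbb", payload: { amount: 3 } },
];

const safe = dedupeByCursor(batch);
console.log(safe.map((e) => e.cursor)); // 100:0 and 100:2, each once
```

Which of two colliding rows survives is arbitrary from the data's point of view: the collision itself comes from the leak, so the surviving row may be the orphaned one. That is why the hotfix cannot replace the root fix.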

Fix — Option A: replace-per-height ingest

blocks already replaces by height (onConflict(height).doUpdateSet); make transactions/events do the same.
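Replace-per-height can be sketched in memory as "drop everything at the incoming height, then insert the incoming rows" as one step. The store shape and `ingestReplacePerHeight` are hypothetical; returning a fresh state instead of mutating the input stands in for the atomicity the real new_block transaction provides.

```typescript
// Hypothetical in-memory sketch of replace-per-height ingest.

interface TxRow { txId: string; blockHeight: number; txIndex: number; }
interface EventRow { txId: string; blockHeight: number; eventIndex: number; }
interface Store { txs: TxRow[]; events: EventRow[]; }
interface IncomingBlock { height: number; txs: TxRow[]; events: EventRow[]; }

function ingestReplacePerHeight(store: Store, block: IncomingBlock): Store {
  // Delete every tx/event row at the incoming height (the real version would
  // use the *_block_height_idx indexes), then append the incoming rows.
  // The input is not mutated, so callers see either the old or the new state.
  const txs = store.txs.filter((t) => t.blockHeight !== block.height);
  const events = store.events.filter((e) => e.blockHeight !== block.height);
  return { txs: [...txs, ...block.txs], events: [...events, ...block.events] };
}

let store: Store = { txs: [], events: [] };
store = ingestReplacePerHeight(store, {
  height: 100,
  txs: [{ txId: "0xaaa", blockHeight: 100, txIndex: 0 }],
  events: [{ txId: "0xaaa", blockHeight: 100, eventIndex: 0 }],
});
// Reorg: the canonical block at height 100 arrives with a different tx.
store = ingestReplacePerHeight(store, {
  height: 100,
  txs: [{ txId: "0xbbb", blockHeight: 100, txIndex: 0 }],
  events: [{ txId: "0xbbb", blockHeight: 100, eventIndex: 0 }],
});

console.log(store.txs.map((t) => t.txId)); // only 0xbbb remains at height 100
```

Keying the delete on block_height is what removes orphans. The orphaned block's tx_ids differ from the canonical ones, so a conflict clause involving tx_id would never match them.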

  • T1 (root fix): in the new_block txn, delete transactions+events at block_height before re-inserting. Both *_block_height_idx indexes exist, so the delete is cheap, and it is atomic within the existing txn. Since the node is expected to emit only canonical blocks (see Risks / open questions), the height ends up holding exactly the canonical set, which also makes future reorgs self-healing. During the reorg window there is no canonical block at the height, so the Streams b.canonical join already returns nothing and nothing leaks.
  • T2 (one-time cleanup, required): dedupe existing transactions/events. Keep the row with MAX(created_at) per (block_height, tx_index) (the later insert came from the new chain, so it is taken as canonical), and delete the rest along with their events by tx_id. This is a heuristic; the precise alternative is re-ingesting the affected blocks from the node.
  • T3 (hardening, optional): reorg.ts also deletes transactions/events at block_height >= fork_point.
  • T4: Streams-events test over a reorged-height fixture asserting unique cursors.
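The T2 heuristic and a T4-style uniqueness check can be sketched together in memory. Row shapes, `createdAt` as epoch milliseconds, and the helper names are assumptions; a real cleanup would run as SQL inside a transaction.

```typescript
// Hypothetical sketch of the T2 cleanup heuristic plus a T4-style check.

interface TxRow {
  txId: string;
  blockHeight: number;
  txIndex: number;
  createdAt: number; // epoch ms (assumed representation of created_at)
}
interface EventRow { txId: string; blockHeight: number; eventIndex: number; }

function dedupeByLatest(txs: TxRow[], events: EventRow[]): { txs: TxRow[]; events: EventRow[] } {
  // Winner per (block_height, tx_index): the row with the greatest createdAt.
  const winners = new Map<string, TxRow>();
  for (const tx of txs) {
    const key = `${tx.blockHeight}:${tx.txIndex}`;
    const current = winners.get(key);
    if (current === undefined || tx.createdAt > current.createdAt) {
      winners.set(key, tx);
    }
  }
  const kept = new Set(Array.from(winners.values()).map((t) => t.txId));
  // Losing transactions are deleted together with their events, matched by tx_id.
  const losers = new Set(txs.filter((t) => !kept.has(t.txId)).map((t) => t.txId));
  return {
    txs: txs.filter((t) => kept.has(t.txId)),
    events: events.filter((e) => !losers.has(e.txId)),
  };
}

// T4-style check: no two transactions share (block_height, tx_index).
function hasUniqueSlots(txs: TxRow[]): boolean {
  const keys = txs.map((t) => `${t.blockHeight}:${t.txIndex}`);
  return new Set(keys).size === keys.length;
}

const before: TxRow[] = [
  { txId: "0xaaa", blockHeight: 100, txIndex: 0, createdAt: 1000 }, // orphaned, inserted first
  { txId: "0xbbb", blockHeight: 100, txIndex: 0, createdAt: 2000 }, // canonical, inserted later
  { txId: "0xccc", blockHeight: 101, txIndex: 0, createdAt: 3000 },
];
const beforeEvents: EventRow[] = [
  { txId: "0xaaa", blockHeight: 100, eventIndex: 0 },
  { txId: "0xbbb", blockHeight: 100, eventIndex: 0 },
  { txId: "0xccc", blockHeight: 101, eventIndex: 0 },
];

const cleaned = dedupeByLatest(before, beforeEvents);
console.log(hasUniqueSlots(before), hasUniqueSlots(cleaned.txs)); // false true
console.log(cleaned.txs.map((t) => t.txId)); // 0xbbb and 0xccc remain
```

This rule has a gap. If the orphaned block had more transactions than its replacement, its extra tx_index slots have no duplicate, so the rule keeps them. Re-ingesting the affected blocks from the node, the precise alternative, would catch those.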

Risks / open questions

  • Perf of delete-per-block at high catch-up/backfill rates (indexed → cheap, but measure).
  • T2's "latest created_at = canonical" heuristic.
  • Confirm the Stacks event emitter never POSTs an orphaned block to /new_block (only canonical blocks). If it can, T1's "incoming = canonical" assumption needs a guard (the handler already does parent-hash checks around index.ts:340).

Related

  • The 90-day L2 backfill (packages/indexer/src/l2/BACKFILL.md) should run after T1 + T2.
  • Surfaced while diagnosing the stx_lock/decoder rollout.

Labels: bug (Something isn't working)
