Skip to content

CDC replication support (DML for Iceberg with INSERT/UPDATE/DELETE) #5

Description

@laskoviymishka

Currently the iceberg plugin only supports snapshot mode — full table copies. For replication (streaming CDC from Postgres WAL), we need to handle row-level mutations: INSERTs, UPDATEs, and DELETEs.

What works today

  • Snapshot mode: bulk copy all rows into Iceberg tables via SinkSnapshot
  • Streaming append-only mode: SinkStreaming for insert-only sources like Kafka

What's missing

Replication from Postgres produces a mix of INSERT, UPDATE, and DELETE operations. Efficiently applying these to Iceberg requires:

  1. Equality deletes — write small delete files keyed on PK columns instead of rewriting entire data files
  2. RowDelta API — commit new data files + delete files in a single atomic snapshot (so an UPDATE = delete old row + insert new row happens atomically)
  3. Equality delete reading — so downstream consumers see correct query results

These are upstream gaps in apache/iceberg-go:

  • #602 — RowDelta API
  • #784 — Multi-table commit (done)

Rough plan

Once iceberg-go has RowDelta + equality deletes, the replication sink would look something like:

for _, change := range cdcBatch {
    switch change.Kind {
    case INSERT:
        rd.AddRows(writeDataFile(change.After))
    case DELETE:
        rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
    case UPDATE:
        rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
        rd.AddRows(writeDataFile(change.After))
    }
}

Combined with multi-table commit, all tables in a WAL batch can be committed atomically — consistent cross-table point-in-time view.

Blocked on

  • apache/iceberg-go RowDelta API (#602)
  • apache/iceberg-go equality delete write + read support

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions