Prune old shard revisions (zeitghost prune-revisions --keep-last N) to bound objects/ growth #12

Description

@aaronmarkham

Prune old shard revisions (bound objects/ growth from reanalyze)

Follow-up from the PR #11 review. zeitghost reanalyze (#8) writes a new shard per re-scored article and keeps the prior revision in the store (that's the point — lineage). load_articles_from_shards collapses each chain to the latest for rendering, but the superseded shards persist forever. Under cron-driven re-analysis this grows objects/ unboundedly — same shape as the trace-retention concern in #10.

Idea

A zeitghost prune-revisions --keep-last N (default e.g. 3) that, per entity, deletes all but the newest N shards in the parent_shard_id chain — for both zeitghost:article and sw:article scopes.
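One way the selection step could look, as a rough sketch rather than the actual zeitghost implementation. The `Shard` dataclass, its `scope` and `entity_id` fields, and the `plan_prune` helper are hypothetical names introduced for illustration; only `parent_shard_id` and the two scope strings come from this issue. It assumes each entity has a single linear chain per scope and that the newest shard is the one no other shard names as its parent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shard:
    # Hypothetical minimal shape; the real shard schema may differ.
    shard_id: str
    scope: str  # e.g. "zeitghost:article" or "sw:article"
    entity_id: str
    parent_shard_id: Optional[str] = None


def plan_prune(shards: list[Shard], keep_last: int = 3) -> list[str]:
    """Return the shard_ids to delete, keeping the newest keep_last per chain."""
    if keep_last < 1:
        raise ValueError("keep_last must be >= 1")

    by_id = {s.shard_id: s for s in shards}
    # A head (newest revision) is a shard that no other shard points to as parent.
    parent_ids = {s.parent_shard_id for s in shards if s.parent_shard_id}
    heads = [s for s in shards if s.shard_id not in parent_ids]

    keep: set[str] = set()
    drop: set[str] = set()
    for head in heads:
        seen: set[str] = set()  # guards against a malformed, cyclic chain
        depth = 0
        cur: Optional[Shard] = head
        # Walk newest -> oldest along parent_shard_id.
        while cur is not None and cur.shard_id not in seen:
            seen.add(cur.shard_id)
            (keep if depth < keep_last else drop).add(cur.shard_id)
            depth += 1
            cur = by_id.get(cur.parent_shard_id) if cur.parent_shard_id else None

    # Anything kept by at least one chain survives; unreachable shards are never dropped.
    return sorted(drop - keep)


# Usage: a 4-deep chain in one scope plus a single-revision entity in the other.
shards = [
    Shard("a1", "zeitghost:article", "e1"),
    Shard("a2", "zeitghost:article", "e1", parent_shard_id="a1"),
    Shard("a3", "zeitghost:article", "e1", parent_shard_id="a2"),
    Shard("a4", "zeitghost:article", "e1", parent_shard_id="a3"),
    Shard("b1", "sw:article", "e9"),
]
print(plan_prune(shards, keep_last=2))
```

Returning a plan instead of deleting in place keeps selection separate from the destructive step, which makes a dry-run mode straightforward to build on top.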

Considerations

  • Don't break lineage you keep: if you keep the last N, the oldest kept shard's parent_shard_id will dangle (its parent was pruned). Either accept dangling parents as "history truncated here," or rewrite the oldest kept shard's parent to None. Document the choice.
  • Trace refs: pruned shards may be referenced by trace events (see #7, "Wire TraceEmitter into ingest: shard_created/superseded events, trace_ref, per-run run_id, flip-panel provenance"). Pruning shards but keeping traces is fine (trace is a log of what happened), but a future verify-trace shouldn't assume every referenced shard still exists.
  • Backups: shards are bind-mounted on us-ny1; pruning should be safe w.r.t. rsync (deletions propagate — intended).
  • Dry-run + bounded by default, mirroring reanalyze's safety shape.
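The dangling-parent and dry-run considerations above could be combined roughly as follows. This is a sketch under assumptions: the store is modeled as an in-memory dict of `shard_id -> dict`, which is not how the real objects/ store works, and `apply_prune` with its `rewrite_dangling` flag is a hypothetical helper, not an existing zeitghost function. It defaults to dry-run and reports which surviving shards would be left with a dangling `parent_shard_id`.

```python
from __future__ import annotations


def apply_prune(
    store: dict[str, dict],
    to_delete: list[str],
    *,
    dry_run: bool = True,
    rewrite_dangling: bool = False,
) -> dict:
    """Delete shards from an in-memory store and report the effect.

    With dry_run=True (the default) the store is left untouched.
    """
    doomed = {sid for sid in to_delete if sid in store}
    # Survivors whose parent is about to disappear: "history truncated here".
    dangling = sorted(
        sid
        for sid, shard in store.items()
        if sid not in doomed and shard.get("parent_shard_id") in doomed
    )
    report = {"deleted": sorted(doomed), "dangling": dangling, "dry_run": dry_run}
    if dry_run:
        return report

    for sid in doomed:
        del store[sid]
    if rewrite_dangling:
        # Alternative policy: mark the oldest kept shard as the new chain root.
        for sid in dangling:
            store[sid]["parent_shard_id"] = None
    return report


# Usage: preview first, then apply with the parent-rewrite policy.
store = {
    "a1": {"parent_shard_id": None},
    "a2": {"parent_shard_id": "a1"},
    "a3": {"parent_shard_id": "a2"},
    "a4": {"parent_shard_id": "a3"},
}
preview = apply_prune(store, ["a1", "a2"])
print(preview)
result = apply_prune(store, ["a1", "a2"], dry_run=False, rewrite_dangling=True)
print(sorted(store))
```

Either dangling-parent policy fits this shape; the flag only makes the choice explicit so it can be documented and tested.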

Acceptance criteria

  • prune-revisions keeps the newest N revisions per entity per scope, deletes the rest.
  • Dangling-parent handling is explicit and documented.
  • --dry-run reports what would be deleted without touching the store.
  • Test: a 4-deep chain pruned to --keep-last 2 leaves exactly the 2 newest.

Relates to #8 (reanalyze, which creates the revisions) and #10 (trace retention — same unbounded-growth shape).
