Add zeitghost reanalyze to exercise lineage (re-score a window, form parent_shard_id revisions) #8

Description

@aaronmarkham

Add zeitghost reanalyze to exercise lineage

Follow-up to PR #5 (audit recommendation #3). PR #5 built the lineage machinery — parent_shard_id chaining via build_lineage_index, and a latest-per-entity load collapse so revisions render as one card. But that machinery is currently dormant: nothing ever writes a second revision of an article.

Why it's dormant

zeitghost ingest deduplicates by URL entity (known_url_entities / is_known) and skips anything already in the store. So every article is written exactly once, parent_shard_id is always None, and the revision-chain and collapse logic never triggers in production. The one workflow that would exercise it, re-analyzing an existing article, doesn't exist yet.
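To make the dormancy concrete, here is a minimal sketch of the skip-if-known behaviour described above. The `Shard` dataclass, the in-memory `store`, and this `ingest` function are hypothetical stand-ins, not the real zeitghost code; only the idea of deduplicating on the URL entity comes from this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    shard_id: str
    entity: str                          # URL entity used for dedup
    parent_shard_id: Optional[str] = None


def ingest(store: list[Shard], urls: list[str]) -> int:
    """Toy ingest: skip any URL entity that is already in the store."""
    known = {s.entity for s in store}    # stand-in for known_url_entities
    written = 0
    for url in urls:
        if url in known:                 # stand-in for is_known
            continue
        store.append(Shard(shard_id=f"s{len(store)}", entity=url))
        known.add(url)
        written += 1
    return written


store: list[Shard] = []
first = ingest(store, ["a.example/1", "a.example/2"])
second = ingest(store, ["a.example/1", "a.example/3"])  # a.example/1 is skipped
```

Because a known entity never reaches the writer, no shard in this toy store ever gets a parent, which mirrors why the chain and collapse paths go unexercised.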

Scope

A zeitghost reanalyze command that bypasses dedup over a bounded selection and re-writes shards as revisions:

  • Select by window/filter (e.g. --since, --source, --model, --limit) — never the whole corpus by default.
  • For each selected article: re-run analyze_article, then article_to_*_shard(..., lineage_index=...) so the new shard links its predecessor via parent_shard_id.
  • Set article.model to the model that actually did the re-analysis (the existing writer preserves the prior model's name otherwise — see article_to_internal_shard docstring).
  • Reuse the same signing_seed / signing_required path as ingest.
  • Loud summary: N re-analyzed, N revisions chained, N skipped.
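The scope above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `Shard`, the toy `build_lineage_index`, the integer `written_at` timestamp, and the summary keys are all hypothetical, and the real analyze_article, shard-writer, and signing calls are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    shard_id: str
    entity: str
    model: str
    written_at: int                      # toy timestamp; larger means newer
    parent_shard_id: Optional[str] = None


def build_lineage_index(store: list[Shard]) -> dict[str, str]:
    """Toy index: map each entity to the shard_id of its newest revision."""
    newest: dict[str, Shard] = {}
    for s in store:
        if s.entity not in newest or s.written_at > newest[s.entity].written_at:
            newest[s.entity] = s
    return {entity: s.shard_id for entity, s in newest.items()}


def reanalyze(store: list[Shard], *, since: int, model: str,
              limit: int = 10, dry_run: bool = False) -> dict[str, int]:
    """Re-write a bounded selection of articles as chained revisions."""
    lineage_index = build_lineage_index(store)
    by_id = {s.shard_id: s for s in store}
    # Candidates: the newest revision of each entity inside the window.
    candidates = sorted(
        (by_id[sid] for sid in lineage_index.values()
         if by_id[sid].written_at >= since),
        key=lambda s: s.written_at,
    )
    selected = candidates[:limit]        # bounded: never the whole corpus
    summary = {"selected": len(selected), "reanalyzed": 0, "chained": 0,
               "skipped": len(candidates) - len(selected)}
    if dry_run:
        return summary                   # report only, write nothing
    now = max((s.written_at for s in store), default=0) + 1
    for prior in selected:
        # A real implementation would re-run the analysis here.
        revision = Shard(
            shard_id=f"s{len(store)}",
            entity=prior.entity,
            model=model,                 # the model that did the re-analysis
            written_at=now,
            parent_shard_id=lineage_index[prior.entity],
        )
        store.append(revision)
        summary["reanalyzed"] += 1
        if revision.parent_shard_id is not None:
            summary["chained"] += 1
    return summary


store = [
    Shard("s0", "a.example/1", "model-a", written_at=1),
    Shard("s1", "a.example/2", "model-a", written_at=5),
    Shard("s2", "a.example/3", "model-a", written_at=6),
]
preview = reanalyze(list(store), since=5, model="model-b", limit=1, dry_run=True)
summary = reanalyze(store, since=5, model="model-b", limit=1)
```

In this toy run the window matches two articles, the limit admits one, and the new shard points at `s1` as its parent and carries `model-b` rather than the prior model's name.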

Acceptance criteria

  • reanalyze over a small window produces shards with non-null parent_shard_id pointing at the prior revision.
  • load_articles_from_shards still returns one card per entity (the newest), confirming that the PR #5 collapse handles real chains.
  • Re-analysis with a different --model is reflected in the new shard's model atom.
  • Signed when a key is configured; honors --require-signing / ZEITGHOST_REQUIRE_SIGNING.
  • Bounded by default (no accidental full-corpus re-analysis or cost blowout), with --limit and a dry-run mode.
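The first three criteria could be checked along these lines. This is a hedged sketch: `collapse_latest` and `chain_length` are hypothetical helpers, and the real load_articles_from_shards may pick the newest revision by timestamp rather than by following parent links as this toy version does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    shard_id: str
    entity: str
    model: str
    parent_shard_id: Optional[str] = None


def collapse_latest(shards: list[Shard]) -> list[Shard]:
    """Keep one shard per entity: the one no other shard names as its parent."""
    superseded = {s.parent_shard_id for s in shards if s.parent_shard_id}
    return [s for s in shards if s.shard_id not in superseded]


def chain_length(head: Shard, by_id: dict[str, Shard]) -> int:
    """Count revisions by following parent_shard_id back to the root."""
    count, current = 1, head
    while current.parent_shard_id is not None:
        current = by_id[current.parent_shard_id]
        count += 1
    return count


shards = [
    Shard("s0", "a.example/1", "model-a"),
    Shard("s1", "a.example/2", "model-a"),
    Shard("s2", "a.example/1", "model-b", parent_shard_id="s0"),
    Shard("s3", "a.example/1", "model-c", parent_shard_id="s2"),
]
by_id = {s.shard_id: s for s in shards}
cards = collapse_latest(shards)
head = next(c for c in cards if c.entity == "a.example/1")
revisions = chain_length(head, by_id)
```

Here the three revisions of `a.example/1` collapse to a single card whose model is the one used for the latest re-analysis, while the untouched article keeps its original shard.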

Pairs with #7 (trace emitter)

Re-analysis is what produces shard_superseded events, so this is the natural test workload for #7. Recommend landing #7's emitter first (or together) so re-analysis is traced end-to-end from day one.
