Add zeitghost reanalyze to exercise lineage
Follow-up to PR #5 (audit recommendation #3). PR #5 built the lineage machinery — parent_shard_id chaining via build_lineage_index, and a latest-per-entity load collapse so revisions render as one card. But that machinery is currently dormant: nothing ever writes a second revision of an article.
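To make the two mechanisms concrete, here is a minimal sketch rather than the code from PR #5. Only the names build_lineage_index and parent_shard_id come from this issue; the Shard dataclass, its other fields, the collapse_latest helper, and both function bodies are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shard:
    # Hypothetical minimal shape; the real shard carries more fields.
    shard_id: str
    entity: str                       # e.g. the article's URL entity
    written_at: int                   # increasing write order
    parent_shard_id: Optional[str] = None


def build_lineage_index(shards: list[Shard]) -> dict[str, str]:
    """Map each entity to the shard_id of its most recent revision."""
    index: dict[str, str] = {}
    for shard in sorted(shards, key=lambda s: s.written_at):
        index[shard.entity] = shard.shard_id  # later writes overwrite earlier ones
    return index


def collapse_latest(shards: list[Shard]) -> list[Shard]:
    """Keep only the newest shard per entity, so revisions render as one card."""
    latest: dict[str, Shard] = {}
    for shard in shards:
        current = latest.get(shard.entity)
        if current is None or shard.written_at > current.written_at:
            latest[shard.entity] = shard
    return list(latest.values())


store = [
    Shard("s1", "url:a", written_at=1),
    Shard("s2", "url:b", written_at=2),
    Shard("s3", "url:a", written_at=3, parent_shard_id="s1"),  # revision of s1
]
index = build_lineage_index(store)
cards = collapse_latest(store)
```

In this toy store, the third shard is the only one with a parent, which is exactly the case that production data never contains today.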
Why it's dormant
zeitghost ingest dedups by URL entity (known_url_entities / is_known) and skips anything already in the store. So every article is written exactly once, parent_shard_id is always None, and the revision-chain + collapse logic never actually triggers in production. The one workflow that would exercise it — re-analysing an existing article — doesn't exist yet.
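The effect of the dedup can be shown with a toy ingest loop. This is a sketch under assumptions: the function body, the entity key format, and the dict-shaped shards are hypothetical, and the membership check merely stands in for is_known.

```python
def ingest(urls: list[str], known_url_entities: set[str]) -> list[dict]:
    """Toy ingest: write one shard per unseen URL entity, skip the rest."""
    written = []
    for url in urls:
        entity = f"url:{url}"                 # hypothetical entity key format
        if entity in known_url_entities:      # stand-in for is_known
            continue                          # already stored: no revision is written
        known_url_entities.add(entity)
        written.append({"entity": entity, "parent_shard_id": None})
    return written


known: set[str] = set()
first = ingest(["https://example.org/a", "https://example.org/b"], known)
second = ingest(["https://example.org/a"], known)  # same article seen again
```

The second call writes nothing, so no shard ever gets a chance to name a parent.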
Scope
A zeitghost reanalyze command that bypasses dedup over a bounded selection and re-writes shards as revisions:
- Select by window/filter (e.g. --since, --source, --model, --limit); never the whole corpus by default.
- For each selected article: re-run analyze_article, then article_to_*_shard(..., lineage_index=...) so the new shard links its predecessor via parent_shard_id.
- Set article.model to the model that actually did the re-analysis (the existing writer otherwise preserves the prior model's name; see the article_to_internal_shard docstring).
- Reuse the same signing_seed / signing_required path as ingest.
- Loud summary: N re-analyzed, N revisions chained, N skipped.
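The steps above could fit together roughly as follows. This is a sketch under stated assumptions, not a proposed implementation: the Article dataclass, the dict-shaped shards, the rev-N id scheme, and the analyze callable (a stand-in for analyze_article) are all hypothetical, signing is omitted, and --model is read here as the model used for the re-analysis rather than as a selection filter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Article:
    # Hypothetical minimal article record; the real type has more fields.
    entity: str
    source: str
    published: str   # ISO date, so plain string comparison orders correctly
    model: str


def reanalyze(
    articles: list[Article],
    lineage_index: dict[str, str],
    analyze: Callable[[Article, str], Optional[str]],  # stand-in for analyze_article
    new_model: str,
    *,
    since: Optional[str] = None,
    source: Optional[str] = None,
    limit: int = 10,   # always bounded; never the whole corpus by default
) -> tuple[list[dict], dict[str, int]]:
    selected = [
        a for a in articles
        if (since is None or a.published >= since)
        and (source is None or a.source == source)
    ][:limit]

    counts = {"reanalyzed": 0, "chained": 0, "skipped": 0}
    shards: list[dict] = []
    for article in selected:
        analysis = analyze(article, new_model)
        if analysis is None:                  # analysis failed: write nothing
            counts["skipped"] += 1
            continue
        article.model = new_model             # record the model that actually ran
        parent = lineage_index.get(article.entity)
        shard_id = f"rev-{len(shards) + 1}"   # hypothetical id scheme
        shards.append({
            "shard_id": shard_id,
            "entity": article.entity,
            "model": article.model,
            "analysis": analysis,
            "parent_shard_id": parent,
        })
        lineage_index[article.entity] = shard_id  # the new shard is now the latest
        counts["reanalyzed"] += 1
        if parent is not None:
            counts["chained"] += 1
    return shards, counts


corpus = [
    Article("url:a", "wire", "2024-05-01", "model-old"),
    Article("url:b", "blog", "2024-05-02", "model-old"),
    Article("url:c", "wire", "2024-04-01", "model-old"),
]
index = {"url:a": "s1", "url:b": "s2", "url:c": "s3"}
shards, counts = reanalyze(
    corpus, index, lambda a, m: f"analysis of {a.entity} by {m}",
    "model-new", since="2024-05-01", source="wire",
)
print(f'{counts["reanalyzed"]} re-analyzed, {counts["chained"]} revisions chained, '
      f'{counts["skipped"]} skipped')
```

Updating the lineage index inside the loop means a later revision of the same entity in the same run would chain to the shard just written rather than to the original.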
Acceptance criteria
- reanalyze over a small window produces shards with non-null parent_shard_id pointing at the prior revision.
- load_articles_from_shards still returns one card per entity (the newest), confirming the PR #5 collapse handles real chains.
- --model is reflected in the new shard's model atom.
- Honors --require-signing / ZEITGHOST_REQUIRE_SIGNING.
- Supports --limit and a dry-run mode.
Pairs with #7 (trace emitter)
Re-analysis is what produces shard_superseded events, so this is the natural test workload for #7. Recommend landing #7's emitter first (or together) so re-analysis is traced end-to-end from day one.
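As an illustration of why re-analysis is the natural workload for the emitter, the sketch below shows one way a shard_superseded event could be derived from a new shard's parent link. The event name comes from this issue; the function, the JSON layout, and the field names are assumptions, since the emitter from #7 is not specified here.

```python
import json
from typing import Optional


def superseded_event(new_shard_id: str, parent_shard_id: Optional[str]) -> Optional[str]:
    """Return a JSON trace line when a revision supersedes its parent, else None."""
    if parent_shard_id is None:
        return None  # first write of an article: nothing is superseded
    return json.dumps(
        {"event": "shard_superseded", "old": parent_shard_id, "new": new_shard_id},
        sort_keys=True,
    )


first_write = superseded_event("s1", None)   # plain ingest: no event
revision = superseded_event("s3", "s1")      # re-analysis: one event
```

Under these assumptions, plain ingest never emits the event, so only a reanalyze run would produce trace data to test against.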