reanalyze: add `zeitghost reanalyze` to exercise lineage (re-score → chained revisions) by aaronmarkham · Pull Request #11 · aaronmarkham/zeitghost

aaronmarkham · 2026-05-31T05:48:04Z

Summary

Closes #8. Adds zeitghost reanalyze — the command that actually exercises lineage. PR #5 built parent_shard_id chaining and PR #9 wired shard_superseded events, but neither fires in production because ingest dedups and skips known articles. reanalyze deliberately re-processes existing articles and writes each as a new revision chained onto its predecessor.

What it does

Loads current articles, selects a bounded subset, re-runs bias analysis, and writes new shards via build_lineage_index so each chains onto the prior one (parent_shard_id) and emits shard_superseded.
Reuses the ingest signing + fail-fast + trace-emitter path, so revisions are signed and traced like fresh writes.
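
For readers new to the lineage model, here is a minimal sketch of what "chains onto the prior one" means. Only the field name parent_shard_id and the event name shard_superseded come from this PR. The `Shard` dataclass, `write_revision`, the content-derived id, and the event payload are hypothetical stand-ins, not zeitghost's actual writer or `build_lineage_index`.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shard:
    # Hypothetical shape; only parent_shard_id is named in this PR.
    article_id: str
    score: float
    model: str
    parent_shard_id: Optional[str] = None

    @property
    def shard_id(self) -> str:
        # Content-derived id (an assumption): changed content yields a new id.
        payload = json.dumps(
            [self.article_id, self.score, self.model, self.parent_shard_id]
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def write_revision(index, events, article_id, score, model):
    """Write a shard chained onto the current head for article_id, if any."""
    head = index.get(article_id)  # index maps article_id -> current head Shard
    parent_id = head.shard_id if head is not None else None
    shard = Shard(article_id, score, model, parent_shard_id=parent_id)
    if head is not None:
        # Assumed payload; the PR only names the event type.
        events.append(
            {"type": "shard_superseded", "old": parent_id, "new": shard.shard_id}
        )
    index[article_id] = shard
    return shard


index, events = {}, []
first = write_revision(index, events, "a1", 0.2, "model-a")   # fresh write
second = write_revision(index, events, "a1", 0.5, "model-b")  # revision
```

The first write has no parent and emits nothing. The second write points at the first and emits one supersede event, which is the path ingest never reaches because it skips known articles.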

Key pieces

select_for_reanalysis(articles, source=, since=, limit=) — pure, unit-testable selection. Filters by source-name slug ("Fox News" == "fox-news") and published date; sorts newest-first so --limit re-scores the most recent articles.
analyze_article / analyze_batch gain a model param — re-analysis can use a newer model; the command sets the result's .model so the new shard records which model produced the revision (the writer would otherwise fall back to DEFAULT_MODEL).
Bounded by design — a bare reanalyze (whole corpus = one Claude call each) is refused; requires --limit and/or --source/--since. --dry-run previews the selection with zero Claude calls or writes.
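
A rough sketch of the selection behaviour described above, assuming articles are plain dicts with `source` and `published` (ISO-8601 string) keys. The dict shape and the slug rule in `source_slug` are assumptions for illustration; the real helper lives in the repo and may differ.

```python
import re
from typing import Optional


def source_slug(name: str) -> str:
    # Assumed slug rule: lowercase, runs of non-alphanumerics become "-".
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def select_for_reanalysis(articles, source: Optional[str] = None,
                          since: Optional[str] = None,
                          limit: Optional[int] = None):
    """Filter by source slug and published date, newest first, then truncate."""
    selected = list(articles)
    if source is not None:
        wanted = source_slug(source)
        selected = [a for a in selected if source_slug(a["source"]) == wanted]
    if since is not None:
        # ISO dates order lexicographically, so comparing the date prefix works.
        selected = [a for a in selected if a["published"][:10] >= since]
    selected.sort(key=lambda a: a["published"], reverse=True)
    return selected if limit is None else selected[:limit]


articles = [
    {"id": 1, "source": "Fox News", "published": "2026-05-02T08:00:00Z"},
    {"id": 2, "source": "fox-news", "published": "2026-05-20T08:00:00Z"},
    {"id": 3, "source": "Reuters", "published": "2026-05-25T08:00:00Z"},
    {"id": 4, "source": "Fox News", "published": "2026-04-01T08:00:00Z"},
]
picked = select_for_reanalysis(
    articles, source="Fox News", since="2026-05-01", limit=1
)
```

Here the filters are AND'd, so only articles 1 and 2 survive. Sorting newest-first before truncating is what makes the limit pick article 2.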

CLI

zeitghost reanalyze --limit 50                 # 50 most-recent articles
zeitghost reanalyze --source "Fox News" -n 20  # newest 20 from one source
zeitghost reanalyze --since 2026-05-01 --model claude-opus-4-8 --dry-run
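
The refusal and --dry-run semantics behind these invocations can be sketched without the CLI layer. This is an illustration only: `require_bounds`, `UnboundedReanalysisError`, and this `reanalyze` function are hypothetical names, and source/since filtering is left out to keep the guard in focus.

```python
class UnboundedReanalysisError(ValueError):
    """Hypothetical error for a reanalyze call with no bounding option."""


def require_bounds(limit, source, since) -> None:
    # A bare invocation would cost one model call per article in the corpus.
    if limit is None and source is None and since is None:
        raise UnboundedReanalysisError(
            "refusing whole-corpus reanalyze: "
            "pass --limit and/or --source/--since"
        )


def reanalyze(articles, analyze, *, limit=None, source=None, since=None,
              dry_run=False):
    require_bounds(limit, source, since)  # checked before any model call
    # Filtering by source/since is omitted here; only the limit is applied.
    selected = list(articles) if limit is None else list(articles)[:limit]
    if dry_run:
        return selected, []  # preview only: no calls, no writes
    return selected, [analyze(a) for a in selected]


calls = []


def fake_analyze(article):
    # Stand-in for the model call; records that it was invoked.
    calls.append(article)
    return {"article": article, "score": 0.0}


corpus = ["a", "b", "c"]
try:
    reanalyze(corpus, fake_analyze)
    refused = False
except UnboundedReanalysisError:
    refused = True

preview, dry_results = reanalyze(corpus, fake_analyze, limit=2, dry_run=True)
calls_before_real_run = len(calls)
_, results = reanalyze(corpus, fake_analyze, limit=2)
```

Running the guard first is what lets the refusal and dry-run paths guarantee zero model calls, which is what the test plan asserts.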

Test plan

python -m pytest tests/ — 63 passed
Selection: source (name/slug), since, limit + newest-first ordering
End-to-end (fake provider): a re-scored article writes a shard with parent_shard_id == the original, .model == the chosen model, and the new score, confirming that the PR #5 collapse and PR #9 supersede paths light up
Whole-corpus invocation refused, makes zero Claude calls
--dry-run makes no calls and writes nothing
reanalyze --help renders; clean-cwd import holds

Notes

Re-analysis runs on the stored summary (article bodies aren't persisted — used at ingest only). A --refetch-bodies option could be a future add if re-scoring on full text matters.
This is the workload that produces shard_superseded events in prod (PR #9 wired emission; this triggers it).

🤖 Generated with Claude Code

…chained revisions)

Closes #8. Adds a command that deliberately re-processes existing articles: the workflow that exercises the lineage chaining (PR #5) and shard_superseded events (PR #9), which `ingest` never triggers because it dedups and skips known articles.

- `select_for_reanalysis(articles, source=, since=, limit=)`: pure, testable selection. Filters by source-name slug and published date, sorts newest first so --limit re-scores the most recent articles.
- `analyze_article` / `analyze_batch` take a `model` param so re-analysis can use a different (e.g. newer) Claude model; the command sets the result's `.model` so the new shard records which model produced the revision.
- `reanalyze` CLI: loads current articles, selects, re-analyzes, and writes new shards with build_lineage_index so each chains onto its predecessor via parent_shard_id. Reuses the ingest signing + fail-fast + trace-emitter path.
- Bounded by design: a bare `reanalyze` (whole corpus, one Claude call each) is refused; it requires --limit and/or --source/--since. --dry-run previews the selection with zero Claude calls / writes.

Tests: selection filters/ordering/limit; end-to-end (fake provider) that a re-scored article writes a revision with parent_shard_id set + the new model recorded; whole-corpus refusal makes no calls; dry-run makes no calls/writes. 63 passed.

CLAUDE.md commands updated (reanalyze + gen-signing-key).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…_article, validate --since

Addresses PR #11 review:

- CRITICAL: restore @main.command() on `build`. Inserting `reanalyze` consumed the decorator that belonged to `build`, leaving it unregistered, which would have broken the prod `ingest && build` loop. Add a command-registration smoke test (build/ingest/reanalyze/analytics/gen-signing-key/import-legacy) so a future decorator shuffle of this shape fails a test instead of prod.
- analyze_article now stamps `.model` on its result (model=model), so the param does what the docstring claims and every shard records its producer; this fixes the latent ingest empty-model bug too. Dropped reanalyze's caller-side `r.model = model` workaround.
- Validate --since as YYYY-MM-DD (strptime) and fail loud; the comparison is a lexicographic ISO-prefix match, so a malformed date silently matched everything/nothing before.

Tests: command registration; --source + --since AND'd together; malformed --since rejected (no Claude calls); model stamped via analyze_article. 67 passed.

Kept select_for_reanalysis's lazy `source_slug` import: it's consistent with _article_tags/tags_from_shard in the same module, which import it lazily to avoid the analytics→bias→shards cycle.

Storage growth from kept revisions tracked in #12 (prune-revisions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
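
To illustrate the --since fix, here is a minimal sketch of strict date validation and of why a malformed value misbehaves under string comparison. `validate_since` is a hypothetical helper name. The commit only states that strptime is used, so the round-trip check for zero-padding is an extra assumption of this sketch, not a confirmed part of the change.

```python
from datetime import datetime


def validate_since(value: str) -> str:
    """Return value unchanged if it is a strict YYYY-MM-DD date, else raise."""
    parsed = datetime.strptime(value, "%Y-%m-%d")  # ValueError on bad input
    # strptime also accepts unpadded fields such as "2026-5-1". Round-trip to
    # require the zero-padded form that lexicographic comparison depends on.
    if parsed.strftime("%Y-%m-%d") != value:
        raise ValueError(f"--since must be YYYY-MM-DD, got {value!r}")
    return value


def is_rejected(value: str) -> bool:
    try:
        validate_since(value)
    except ValueError:
        return True
    return False


ok = validate_since("2026-05-01")
slash_rejected = is_rejected("05/01/2026")
unpadded_rejected = is_rejected("2026-5-1")

# Why this matters: with an unpadded date, the string comparison silently
# excludes an article that was published after the intended cutoff.
published = "2026-05-20"
wrongly_excluded = not (published >= "2026-5-1")
correctly_included = published >= "2026-05-01"
```

Validating before selection means a typo fails loudly up front, before any Claude call is made.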

aaronmarkham mentioned this pull request May 31, 2026

Prune old shard revisions (zeitghost prune-revisions --keep-last N) to bound objects/ growth #12

Open

4 tasks

aaronmarkham merged commit c10c316 into main May 31, 2026
1 check passed

aaronmarkham merged 2 commits into main from claude/reanalyze
