reanalyze: add zeitghost reanalyze to exercise lineage (re-score → chained revisions) #11

Merged: aaronmarkham merged 2 commits into main from claude/reanalyze on May 31, 2026

@aaronmarkham

Owner

Summary

Closes #8. Adds zeitghost reanalyze — the command that actually exercises lineage. PR #5 built parent_shard_id chaining and PR #9 wired shard_superseded events, but neither fires in production because ingest dedups and skips known articles. reanalyze deliberately re-processes existing articles and writes each as a new revision chained onto its predecessor.

What it does

  • Loads current articles, selects a bounded subset, re-runs bias analysis, and writes new shards via build_lineage_index so each chains onto the prior one (parent_shard_id) and emits shard_superseded.
  • Reuses the ingest signing + fail-fast + trace-emitter path, so revisions are signed and traced like fresh writes.
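
The chaining behavior described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the `Shard` fields other than `parent_shard_id`, the helper names `toy_lineage_index` and `write_revision`, and the event dictionary layout are all assumptions made for the example. Only the ideas of `parent_shard_id` chaining and a `shard_superseded` event come from the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Shard:
    # Hypothetical shard record; only parent_shard_id is named in the PR.
    shard_id: str
    article_id: str
    model: str
    parent_shard_id: Optional[str] = None


def toy_lineage_index(shards: List[Shard]) -> Dict[str, str]:
    """Map article_id -> latest shard_id (later entries in the list win)."""
    index: Dict[str, str] = {}
    for shard in shards:
        index[shard.article_id] = shard.shard_id
    return index


def write_revision(article_id: str, new_shard_id: str, model: str,
                   index: Dict[str, str], events: List[dict]) -> Shard:
    """Create a shard chained onto its predecessor, if one exists."""
    parent = index.get(article_id)
    shard = Shard(new_shard_id, article_id, model, parent_shard_id=parent)
    if parent is not None:
        # Assumed event shape; the real shard_superseded payload may differ.
        events.append({"event": "shard_superseded",
                       "old": parent, "new": new_shard_id})
    index[article_id] = new_shard_id
    return shard
```

In this sketch a first write for an unknown article produces no event and no parent, which mirrors why plain `ingest` (which skips known articles) never exercises the chain.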

Key pieces

  • select_for_reanalysis(articles, source=, since=, limit=) — pure, unit-testable selection. Filters by source-name slug ("Fox News" == "fox-news") and published date; sorts newest-first so --limit re-scores the most recent articles.
  • analyze_article / analyze_batch gain a model param — re-analysis can use a newer model; the command sets the result's .model so the new shard records which model produced the revision (the writer would otherwise fall back to DEFAULT_MODEL).
  • Bounded by design — a bare reanalyze (whole corpus = one Claude call each) is refused; requires --limit and/or --source/--since. --dry-run previews the selection with zero Claude calls or writes.
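
As a rough illustration of the selection step, the sketch below re-implements the described behavior under stated assumptions: articles are plain dicts with hypothetical `source` and `published` keys (ISO-style strings), and `source_slug` here is a stand-in written for this example, not the project's helper. The date filter uses the lexicographic ISO-prefix comparison mentioned in the follow-up commit.

```python
import re


def source_slug(name):
    # Stand-in slugifier (assumption): "Fox News" -> "fox-news".
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def select_for_reanalysis(articles, source=None, since=None, limit=None):
    """Pure selection: filter by source slug and date, newest first, then cap."""
    selected = list(articles)
    if source is not None:
        want = source_slug(source)
        selected = [a for a in selected if source_slug(a["source"]) == want]
    if since is not None:
        # Lexicographic compare on the YYYY-MM-DD prefix of the timestamp.
        selected = [a for a in selected if a["published"][:10] >= since]
    # Newest first, so a limit keeps the most recent articles.
    selected.sort(key=lambda a: a["published"], reverse=True)
    return selected if limit is None else selected[:limit]
```

Because the function touches no I/O and makes no model calls, it can be unit-tested with plain lists of dicts.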

CLI

zeitghost reanalyze --limit 50                 # 50 most-recent articles
zeitghost reanalyze --source "Fox News" -n 20  # newest 20 from one source
zeitghost reanalyze --since 2026-05-01 --model claude-opus-4-8 --dry-run
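
The bounded-by-design rule and the `--dry-run` behavior behind these invocations could look roughly like the sketch below. It is written as a plain function rather than a CLI command, and `run_reanalyze`, `UnboundedReanalyze`, and the returned summary dict are hypothetical names for illustration. Source and date filtering are left out here so the guard logic stays visible.

```python
class UnboundedReanalyze(ValueError):
    """Raised when no bound (--limit, --source, --since) is given."""


def run_reanalyze(articles, analyze, *, limit=None, source=None,
                  since=None, dry_run=False):
    # Refuse a whole-corpus run: each article would cost one model call.
    if limit is None and source is None and since is None:
        raise UnboundedReanalyze(
            "refusing whole-corpus reanalyze: pass --limit and/or "
            "--source/--since")
    # Simplification: only the limit is applied in this sketch.
    selected = list(articles) if limit is None else list(articles)[:limit]
    if dry_run:
        # Preview only: no model calls, no writes.
        return {"selected": len(selected), "analyzed": 0}
    results = [analyze(article) for article in selected]
    return {"selected": len(selected), "analyzed": len(results)}
```

Checking the bound before any selection or analysis means a refused run costs nothing, which matches the tests described in the commit message (refusal and dry-run make no calls).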

Test plan

Notes

🤖 Generated with Claude Code

…chained revisions)

Closes #8. Adds a command that deliberately re-processes existing articles —
the workflow that exercises the lineage chaining (PR #5) and shard_superseded
events (PR #9), which `ingest` never triggers because it dedups and skips known
articles.

- `select_for_reanalysis(articles, source=, since=, limit=)`: pure, testable
  selection — filters by source-name slug and published date, sorts newest
  first so --limit re-scores the most recent articles.
- `analyze_article` / `analyze_batch` take a `model` param so re-analysis can
  use a different (e.g. newer) Claude model; the command sets the result's
  `.model` so the new shard records which model produced the revision.
- `reanalyze` CLI: loads current articles, selects, re-analyzes, and writes new
  shards with build_lineage_index so each chains onto its predecessor via
  parent_shard_id. Reuses the ingest signing + fail-fast + trace-emitter path.
- Bounded by design: a bare `reanalyze` (whole corpus, one Claude call each) is
  refused — requires --limit and/or --source/--since. --dry-run previews the
  selection with zero Claude calls / writes.

Tests: selection filters/ordering/limit; end-to-end (fake provider) that a
re-scored article writes a revision with parent_shard_id set + the new model
recorded; whole-corpus refusal makes no calls; dry-run makes no calls/writes.
63 passed. CLAUDE.md commands updated (reanalyze + gen-signing-key).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_article, validate --since

Addresses PR #11 review:

- CRITICAL: restore @main.command() on `build`. Inserting `reanalyze` consumed
  the decorator that belonged to `build`, leaving it unregistered — which would
  have broken the prod `ingest && build` loop. Add a command-registration smoke
  test (build/ingest/reanalyze/analytics/gen-signing-key/import-legacy) so a
  future decorator shuffle of this shape fails a test instead of prod.
- analyze_article now stamps `.model` on its result (model=model), so the param
  does what the docstring claims and every shard records its producer — fixes
  the latent ingest empty-model bug too. Dropped reanalyze's caller-side
  `r.model = model` workaround.
- Validate --since as YYYY-MM-DD (strptime) and fail loud; the comparison is a
  lexicographic ISO-prefix match, so a malformed date silently matched
  everything/nothing before.
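
A minimal sketch of the `--since` validation described in the last bullet, assuming a helper named `validate_since` (hypothetical; the commit only says `strptime` is used). Returning the re-formatted date is a choice made in this sketch: `strptime` with `%Y-%m-%d` also accepts non-padded input such as `2026-5-1`, which would compare incorrectly as a string.

```python
from datetime import datetime


def validate_since(value):
    """Return value normalized to YYYY-MM-DD, or raise ValueError."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(
            f"--since must be YYYY-MM-DD, got {value!r}") from exc
    # Normalize so the later lexicographic comparison is well-defined.
    return parsed.strftime("%Y-%m-%d")
```

The silent failure mode is easy to reproduce: as strings, `"2026-05-10" >= "2026-5-1"` is false because `"0"` sorts before `"5"`, so an unpadded date would quietly exclude articles it should match.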

Tests: command registration; --source + --since AND'd together; malformed
--since rejected (no Claude calls); model stamped via analyze_article. 67 passed.
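
The command-registration smoke test can be illustrated without the real CLI framework. The sketch below uses a tiny stand-in `CommandGroup` (an assumption, not the project's `main` object) to show how a function that loses its decorator silently drops out of the registry, and how comparing against an expected list catches it.

```python
class CommandGroup:
    """Tiny stand-in for a CLI group that registers commands by decorator."""

    def __init__(self):
        self.commands = {}

    def command(self):
        def register(func):
            # Mimic the common convention of hyphenated command names.
            self.commands[func.__name__.replace("_", "-")] = func
            return func
        return register


main = CommandGroup()


@main.command()
def ingest():
    pass


@main.command()
def reanalyze():
    pass


def build():  # decorator missing: simulates the bug fixed in this commit
    pass


def missing_commands(group, expected):
    """Return expected command names that are not registered."""
    return sorted(set(expected) - set(group.commands))


EXPECTED = ["build", "ingest", "reanalyze"]
```

A test asserting `missing_commands(main, EXPECTED) == []` fails for the module as written above and passes once `build` is decorated again.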

Kept select_for_reanalysis's lazy `source_slug` import — it's consistent with
_article_tags/tags_from_shard in the same module, which import it lazily to
avoid the analytics→bias→shards cycle. Storage-growth from kept revisions
tracked in #12 (prune-revisions).
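
The `.model` stamping fix from this commit can be pictured as follows. This is a sketch under assumptions: `BiasResult`, its fields, the dict-shaped article, the callable `provider`, and the placeholder `DEFAULT_MODEL` value are all hypothetical; only the idea that `analyze_article` sets `model=model` on its result comes from the commit message.

```python
from dataclasses import dataclass

DEFAULT_MODEL = "default-model"  # placeholder value, not the project's


@dataclass
class BiasResult:
    # Hypothetical result type for illustration.
    article_id: str
    score: float
    model: str = ""


def analyze_article(article, provider, model=DEFAULT_MODEL):
    """Score one article and record which model produced the score."""
    score = provider(article["text"], model)
    # Stamping here means callers no longer need to patch result.model.
    return BiasResult(article_id=article["id"], score=score, model=model)
```

With the stamp applied inside `analyze_article`, both the ingest path and the reanalyze path get a populated `model` field without any caller-side workaround.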

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aaronmarkham merged commit c10c316 into main on May 31, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Add zeitghost reanalyze to exercise lineage (re-score a window, form parent_shard_id revisions)