reanalyze: add zeitghost reanalyze to exercise lineage (re-score → chained revisions) #11
Merged
…chained revisions)

Closes #8. Adds a command that deliberately re-processes existing articles: the workflow that exercises the lineage chaining (PR #5) and shard_superseded events (PR #9), which `ingest` never triggers because it dedups and skips known articles.

- `select_for_reanalysis(articles, source=, since=, limit=)`: pure, testable selection. Filters by source-name slug and published date, and sorts newest first so --limit re-scores the most recent articles.
- `analyze_article` / `analyze_batch` take a `model` param so re-analysis can use a different (e.g. newer) Claude model; the command sets the result's `.model` so the new shard records which model produced the revision.
- `reanalyze` CLI: loads current articles, selects, re-analyzes, and writes new shards with build_lineage_index so each chains onto its predecessor via parent_shard_id. Reuses the ingest signing + fail-fast + trace-emitter path.
- Bounded by design: a bare `reanalyze` (whole corpus, one Claude call each) is refused; it requires --limit and/or --source/--since. --dry-run previews the selection with zero Claude calls / writes.

Tests: selection filters/ordering/limit; end-to-end (fake provider) that a re-scored article writes a revision with parent_shard_id set + the new model recorded; whole-corpus refusal makes no calls; dry-run makes no calls/writes. 63 passed.

CLAUDE.md commands updated (reanalyze + gen-signing-key).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
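As a rough illustration of the selection step described in this commit, the sketch below filters by source slug and published date, sorts newest first, and applies the limit last. The `Article` record and the slug rule in `source_slug` are assumptions made for the example; the project's real types and slug function are not shown in this PR.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Article:
    # Hypothetical minimal record; the real article type is not shown here.
    url: str
    source: str     # display name, e.g. "Fox News"
    published: str  # ISO date or datetime string, e.g. "2024-03-01T10:00:00Z"


def source_slug(name: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def select_for_reanalysis(
    articles: Iterable[Article],
    source: Optional[str] = None,
    since: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Article]:
    """Filter by source slug and published date, newest first, then cap."""
    selected = list(articles)
    if source is not None:
        wanted = source_slug(source)
        selected = [a for a in selected if source_slug(a.source) == wanted]
    if since is not None:
        # ISO-8601 strings sort chronologically, so comparing the
        # 10-character date prefix as a string is sufficient.
        selected = [a for a in selected if a.published[:10] >= since]
    # Newest first, so a limit keeps the most recent articles.
    selected.sort(key=lambda a: a.published, reverse=True)
    if limit is not None:
        selected = selected[:limit]
    return selected
```

Because the function takes and returns plain data and performs no I/O, it can be unit-tested without a provider or a shard store, which is the point of keeping selection pure.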
…_article, validate --since

Addresses PR #11 review:

- CRITICAL: restore @main.command() on `build`. Inserting `reanalyze` consumed the decorator that belonged to `build`, leaving it unregistered, which would have broken the prod `ingest && build` loop. Add a command-registration smoke test (build/ingest/reanalyze/analytics/gen-signing-key/import-legacy) so a future decorator shuffle of this shape fails a test instead of prod.
- analyze_article now stamps `.model` on its result (model=model), so the param does what the docstring claims and every shard records its producer. This fixes the latent ingest empty-model bug too. Dropped reanalyze's caller-side `r.model = model` workaround.
- Validate --since as YYYY-MM-DD (strptime) and fail loud; the comparison is a lexicographic ISO-prefix match, so a malformed date silently matched everything/nothing before.

Tests: command registration; --source + --since AND'd together; malformed --since rejected (no Claude calls); model stamped via analyze_article. 67 passed.

Kept select_for_reanalysis's lazy `source_slug` import: it's consistent with _article_tags/tags_from_shard in the same module, which import it lazily to avoid the analytics→bias→shards cycle.

Storage-growth from kept revisions tracked in #12 (prune-revisions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
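To show why the `--since` check matters, here is a minimal sketch of strict YYYY-MM-DD validation with `datetime.strptime`. The helper name `validate_since` and the error wording are illustrative assumptions, not the code from this commit.

```python
from datetime import datetime


def validate_since(value: str) -> str:
    """Return value unchanged if it is a strict YYYY-MM-DD date, else raise."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(f"--since must be YYYY-MM-DD, got {value!r}") from exc
    # strptime also accepts unpadded fields such as "2024-5-1"; round-tripping
    # rejects them so the later string comparison stays sound.
    if parsed.strftime("%Y-%m-%d") != value:
        raise ValueError(f"--since must be zero-padded YYYY-MM-DD, got {value!r}")
    return value


# Why an unvalidated value is dangerous with a string comparison:
published = "2024-05-01"
matches_everything = published >= "1"        # True: "2" sorts after "1"
matches_nothing = published >= "May 2024"    # False: digits sort before "M"
```

With a plain string comparison, a value like `"1"` compares below every ISO date and a value like `"May 2024"` compares above every ISO date, which is how a malformed flag can silently select everything or nothing.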
Summary
Closes #8. Adds `zeitghost reanalyze`, the command that actually exercises lineage. PR #5 built `parent_shard_id` chaining and PR #9 wired `shard_superseded` events, but neither fires in production because `ingest` dedups and skips known articles. `reanalyze` deliberately re-processes existing articles and writes each as a new revision chained onto its predecessor.

What it does
- Loads current articles, selects, re-analyzes, and writes new shards with `build_lineage_index` so each chains onto the prior one (`parent_shard_id`) and emits `shard_superseded`.
- Reuses the `ingest` signing + fail-fast + trace-emitter path, so revisions are signed and traced like fresh writes.

Key pieces
- `select_for_reanalysis(articles, source=, since=, limit=)`: pure, unit-testable selection. Filters by source-name slug (`"Fox News"` == `"fox-news"`) and `published` date; sorts newest-first so `--limit` re-scores the most recent articles.
- `analyze_article` / `analyze_batch` gain a `model` param: re-analysis can use a newer model; the command sets the result's `.model` so the new shard records which model produced the revision (the writer would otherwise fall back to `DEFAULT_MODEL`).
- Bounded by design: a bare `reanalyze` (whole corpus = one Claude call each) is refused; it requires `--limit` and/or `--source`/`--since`. `--dry-run` previews the selection with zero Claude calls or writes.

CLI
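The bounding rule (a bare `reanalyze` is refused, and `--dry-run` makes no provider calls) might be sketched as plain Python like this. `run_reanalysis`, `check_bounded`, and `UnboundedReanalysisError` are hypothetical names for illustration; the actual command wiring is not reproduced here.

```python
from typing import Callable, List, Optional, Sequence


class UnboundedReanalysisError(ValueError):
    """Raised when a run would re-score the whole corpus."""


def check_bounded(
    source: Optional[str], since: Optional[str], limit: Optional[int]
) -> None:
    # Refuse when no flag narrows the corpus: each article costs one call.
    if source is None and since is None and limit is None:
        raise UnboundedReanalysisError(
            "refusing whole-corpus reanalyze: pass --limit and/or --source/--since"
        )


def run_reanalysis(
    selected: Sequence[str],
    analyze: Callable[[str], float],
    *,
    source: Optional[str] = None,
    since: Optional[str] = None,
    limit: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Report what was (or would be) re-analyzed for already-selected ids."""
    check_bounded(source, since, limit)  # runs before any provider call
    if dry_run:
        # Preview only: the analyze callable is never invoked.
        return [f"would re-analyze {article_id}" for article_id in selected]
    return [f"{article_id}: {analyze(article_id)}" for article_id in selected]
```

Running the bound check before anything else means a refused invocation costs nothing, and passing the provider in as a callable lets a test assert that refusal and dry-run made zero calls.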
Test plan
- `python -m pytest tests/`: 63 passed
- End-to-end (fake provider): a re-scored article writes a revision with `parent_shard_id` == the original, `.model` == the chosen model, and the new score, confirming that the PR #5 ("shards: integrity & lineage-correctness (sign, collapse revisions, no default-fill bias)") collapse + PR #9 ("trace: wire TraceEmitter into ingest (events, trace_ref, flip-panel provenance)") supersede path light up
- `--dry-run` makes no calls and writes nothing
- `reanalyze --help` renders; clean-cwd import holds

Notes
- … a `--refetch-bodies` option could be a future add if re-scoring on full text matters.
- … `shard_superseded` events in prod (PR #9 wired emission; this triggers it).

🤖 Generated with Claude Code