shards: integrity & lineage-correctness (sign, collapse revisions, no default-fill bias) #5
Merged
…ions, never default-fill bias
Three related fixes to how zeitghost uses spiritwriter's shard substrate:
- Opt-in Ed25519 signing. article_to_*_shard now accept a signing_seed and
sign before store.put when one is configured; resolve_signing_seed() reads
ZEITGHOST_SIGNING_KEY via spiritwriter.secrets (keychain → env fallback, so
the headless us-ny1 builder can use an env var). Unconfigured environments
write unsigned shards rather than failing. New `zeitghost gen-signing-key`
mints a seed, stores it in the keychain, and prints the signer thumbprint.
Signing covers {atoms, scope, origin, …} but not the content-address, so
shard_id is unchanged.
- load_articles_from_shards collapses each parent_shard_id revision chain to
its newest shard per entity. Previously every shard in scope was rendered,
so a re-analyzed article would surface as duplicate cards the moment a chain
formed, a latent bug that becomes reachable once re-analysis is enabled.
- analyze_article skips an article whose LLM response lacks a usable
bias_score instead of defaulting to 0.5, which would mislabel an unscored
article as "center" (robustness invariant #1). Logic extracted to a testable
_parse_bias_score helper; a literal 0.0 (full left) still passes through.
Factored the shared entity-key lookup into _entity_of (used by
known_url_entities, build_lineage_index, and the new load path) so dedup,
lineage chaining, and render-time collapse all key off the same identity.
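A rough illustration of how a shared `_entity_of` key drives the render-time collapse. The `Shard` dataclass, its fields, and the use of the article URL as identity are all hypothetical stand-ins for spiritwriter's real shard type; `collapse_latest_per_entity` is an illustrative name. The strict `>` comparison keeps the first-seen shard on an equal `created_at`, mirroring the tie rule described for `build_lineage_index`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Shard:
    """Hypothetical stand-in for a spiritwriter shard (field names assumed)."""
    shard_id: str
    created_at: datetime
    origin: dict = field(default_factory=dict)
    parent_shard_id: Optional[str] = None


def _entity_of(shard: Shard) -> str:
    # Assumption: the article URL in `origin` is the entity identity; a shard
    # without one is treated as its own entity.
    return shard.origin.get("url") or shard.shard_id


def collapse_latest_per_entity(shards: list[Shard]) -> list[Shard]:
    """Keep the newest shard per entity (latest-wins by created_at).

    The strict `>` means an equal-timestamp tie keeps the first-seen shard.
    """
    latest: dict[str, Shard] = {}
    for shard in shards:
        entity = _entity_of(shard)
        current = latest.get(entity)
        if current is None or shard.created_at > current.created_at:
            latest[entity] = shard
    return list(latest.values())


v1 = Shard("a1", datetime(2026, 5, 1), {"url": "https://example.com/x"})
v2 = Shard("a2", datetime(2026, 5, 2), {"url": "https://example.com/x"},
           parent_shard_id="a1")
other = Shard("b1", datetime(2026, 5, 1), {"url": "https://example.com/y"})
print([s.shard_id for s in collapse_latest_per_entity([v1, v2, other])])
# ['a2', 'b1']
```

Because dedup, lineage chaining, and collapse all call the same key function, a revision chain can never be split across two "entities" by one path and merged by another.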
Tests: new tests/test_shard_integrity.py covers signing round-trip + verify,
unsigned-by-default, latest-per-entity collapse, distinct-entity separation,
the bias-score guard, and resolve_signing_seed (env/absent/malformed).
Updated test_shard_metadata_round_trip for the latest-only load behavior.
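The env / absent / malformed cases exercised by the tests can be pictured with a sketch of the seed resolver. Several details are assumptions: the seed is taken to be hex-encoded (suggested by the "malformed-hex" case), bad values raise `ValueError`, and `keychain_get` is a hypothetical injectable stand-in for the keychain half of `spiritwriter.secrets`, whose real API is not shown here.

```python
import os
from typing import Callable, Optional

SEED_VAR = "ZEITGHOST_SIGNING_KEY"


def _no_keychain(name: str) -> Optional[str]:
    return None


def resolve_signing_seed(
    keychain_get: Callable[[str], Optional[str]] = _no_keychain,
) -> Optional[bytes]:
    """Return a 32-byte Ed25519 seed, or None when signing is unconfigured.

    `keychain_get` is a hypothetical stand-in for the keychain half of
    spiritwriter.secrets; hex encoding and ValueError on bad input are
    assumptions based on the malformed-hex / wrong-length test cases.
    """
    raw = keychain_get(SEED_VAR) or os.environ.get(SEED_VAR)  # keychain first
    if not raw or not raw.strip():
        return None  # absent or empty (treated as absent here): write unsigned
    try:
        seed = bytes.fromhex(raw.strip())
    except ValueError as exc:
        raise ValueError(f"{SEED_VAR} is not valid hex") from exc
    if len(seed) != 32:
        raise ValueError(f"{SEED_VAR} must decode to 32 bytes, got {len(seed)}")
    return seed
```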
Not in scope (rides with the trace-emitter PR): shard_superseded trace events,
trace_ref population, and surfacing the signer in the card flip-panel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rden tests
Addresses PR #5 review:
- gen-signing-key no longer echoes the secret seed after a successful keychain store. New --print-seed flag (or --no-store, which must print since it's the only copy) reveals it for mirroring to the prod env var; otherwise it stays off-screen so it doesn't linger in shell scrollback/history.
- analyze_batch now logs a WARNING with the count of articles dropped for a missing/invalid bias_score (threaded via a stats dict from analyze_article). The new skip-not-default path silently shrinks the feed by design, so the count surfaces an LLM regression; logging reaches the operator's console via the CLI RichHandler.
- load_articles_from_shards: comment that equal-timestamp ties go to first-seen, consistent with build_lineage_index's comparison.
- ingest only prints the signing-mode line when there's something to write.
- Tests: tampered signature → verify() raises InvalidSignature; end-to-end, analyze_article returns None when the LLM response omits bias_score (mocked provider).
48 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
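The stats-dict threading described in the review follow-up might look roughly like this. It is a sketch under assumptions: the logger name, the `missing_bias_score` counter key, and the injectable `call_llm` callable are illustrative, and the inline `float(...)` is a simplified stand-in for `_parse_bias_score`. The point is that `analyze_article` only counts, while `analyze_batch` decides when the count is worth a WARNING.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("zeitghost.analyze")  # logger name is an assumption

Article = dict[str, Any]


def analyze_article(
    article: Article,
    call_llm: Callable[[Article], dict[str, Any]],
    stats: dict[str, int],
) -> Optional[Article]:
    """Return the scored article, or None (and count it) when the score is unusable."""
    data = call_llm(article)
    try:
        # Simplified stand-in for _parse_bias_score: missing or non-numeric -> skip.
        score = float(data["bias_score"])
    except (KeyError, TypeError, ValueError):
        stats["missing_bias_score"] = stats.get("missing_bias_score", 0) + 1
        return None
    return {**article, "bias_score": score}


def analyze_batch(
    articles: list[Article], call_llm: Callable[[Article], dict[str, Any]]
) -> list[Article]:
    stats: dict[str, int] = {}
    results = []
    for article in articles:
        analyzed = analyze_article(article, call_llm, stats)
        if analyzed is not None:
            results.append(analyzed)
    dropped = stats.get("missing_bias_score", 0)
    if dropped:
        # The feed shrinks silently by design, so surface the count loudly.
        logger.warning(
            "Dropped %d of %d articles: missing/invalid bias_score", dropped, len(articles)
        )
    return results
```

Logging through the standard `logging` module (rather than printing) is what lets a CLI-installed handler such as the RichHandler mentioned above route the warning to the operator's console.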
…sible/compose
Follow-up #1 from the PR #5 review: let prod fail closed when signing is required but no key is configured, so an accidentally cleared key surfaces loudly instead of silently degrading to unsigned shards.
- signing_required(flag) (testable, no Click): True if --require-signing is passed or ZEITGHOST_REQUIRE_SIGNING is truthy. Local/CI leave both unset, so signing stays opt-in there.
- ingest resolves the seed up front and fails fast (click.ClickException, exit 1) before spending any NewsAPI quota or Claude calls when signing is required but the seed is missing. The single resolved seed is reused for the write loop (no double lookup).
- env.j2: template ZEITGHOST_SIGNING_KEY from vault_zeitghost_signing_key (defaulted to '' so the deploy renders before the key is vaulted) and ZEITGHOST_REQUIRE_SIGNING from zeitghost_require_signing (default 0).
- docker-compose builder: pass both through (references only; the secret stays in the 0600 .env that Ansible renders).
Safe rollout: require defaults OFF. Provision the vault key, deploy, confirm shards sign, then flip `zeitghost_require_signing: 1` in the inventory.
Tests: signing_required off-by-default / flag-wins / env truthy+falsy variants. 51 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
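A minimal sketch of the Click-free `signing_required` predicate plus a pre-flight check. The set of strings treated as truthy is an assumption (the commit only says "truthy"), and `ensure_signing_ready` is a hypothetical name; the PR raises `click.ClickException`, whereas this sketch raises `RuntimeError` to stay dependency-free.

```python
import os
from typing import Optional

# Assumption: which strings count as "truthy" is not spelled out in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def signing_required(flag: bool) -> bool:
    """True if --require-signing was passed or ZEITGHOST_REQUIRE_SIGNING is truthy."""
    if flag:
        return True  # the CLI flag wins regardless of the env var
    value = os.environ.get("ZEITGHOST_REQUIRE_SIGNING", "")
    return value.strip().lower() in _TRUTHY


def ensure_signing_ready(flag: bool, seed: Optional[bytes]) -> Optional[bytes]:
    """Hypothetical pre-flight check run before any NewsAPI/Claude spend.

    The PR raises click.ClickException here; RuntimeError keeps the sketch
    free of a Click dependency. The returned seed is reused for the write loop.
    """
    if seed is None and signing_required(flag):
        raise RuntimeError(
            "signing is required but ZEITGHOST_SIGNING_KEY is not configured"
        )
    return seed
```

Keeping the predicate a plain function of the flag and the environment is what makes the off-by-default, flag-wins, and truthy/falsy variants testable without invoking the CLI.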
Wires the shard-signing seed through the existing GitHub secret → Ansible extra-var → env.j2 chain (this repo has no ansible-vault file; vault_* vars are injected from repo secrets in deploy.yml). env.j2 already defaults the var to '' so the deploy renders before the ZEITGHOST_SIGNING_KEY repo secret is set.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced May 31, 2026
Summary
Tightens how zeitghost uses the spiritwriter shard substrate in three related ways that make the provenance/integrity story the landing page tells actually true for zeitghost's own output. Came out of an audit of the ingest → analyze → shard write/read path against what `spiritwriter.fabric` exposes.
1. Opt-in Ed25519 shard signing (#4)
- `article_to_internal_shard` / `article_to_sw_shard` take a new `signing_seed` and call `shard.sign()` before `store.put` when one is supplied.
- `resolve_signing_seed()` reads `ZEITGHOST_SIGNING_KEY` via `spiritwriter.secrets` (OS keychain first, then env-var fallback, so the headless us-ny1 builder gets its seed from an env var without a keychain).
- `zeitghost gen-signing-key` mints a 32-byte seed, stores it in the keychain (`--no-store` to just print), and reports the signer thumbprint (the identity `MemoryShard.verify()` checks) plus the seed for mirroring to prod.
- Signing covers `{atoms, scope, origin, …}` but not the content-address, so `shard_id` is unchanged; sign-before-put is safe.
2. Load collapses revision chains to latest-per-entity (#2)
`load_articles_from_shards` previously rendered every shard in scope. The lineage machinery (`parent_shard_id` via `build_lineage_index`) means a re-analyzed article forms a revision chain, so the old loader would surface it as duplicate cards the moment a chain formed. Now it keeps the newest shard per entity (latest-wins by `created_at`, matching how `build_lineage_index` picks parents). Latent bug fixed before re-analysis is ever enabled.
3. `bias_score` is skipped, never default-filled (#5)
`analyze_article` defaulted a missing score to `0.5` (`float(data.get("bias_score", 0.5))`), exactly the silent mislabel-as-"center" that robustness invariant #1 forbids (the legacy importer already skips NULL scores). Now it returns `None` and skips. Logic extracted to a testable `_parse_bias_score` helper; a legitimate `0.0` (full left) still passes through.
Cleanup
Factored the repeated "which article is this shard?" lookup into `_entity_of`, now shared by `known_url_entities`, `build_lineage_index`, and the new load path, so dedup, lineage chaining, and render-time collapse all key off the same identity.
Test plan
- `python -m pytest tests/`: 46 passed (existing) + new `tests/test_shard_integrity.py`
  - signing round-trip + `verify()`, unsigned-by-default, latest-per-entity collapse, distinct-entity separation, `_parse_bias_score` (incl. the `0.0` edge), `resolve_signing_seed` (env / absent / malformed-hex / wrong-length)
- Updated `test_shard_metadata_round_trip` to assert latest-only load
- `python -c "import zeitghost"` clean from an empty cwd (invariant #2: no import-time I/O; secrets/crypto imports are all lazy/in-function)
- `zeitghost gen-signing-key --no-store` smoke-tested
Not in scope (rides with the trace-emitter follow-up)
`TraceEmitter` wiring (`shard_created` events, `trace_ref` population, per-run `run_id`) and surfacing the signer thumbprint in the card flip-panel. This PR makes the bytes signed and the lineage load-correct; the trace chain + UI come next.
🤖 Generated with Claude Code