shards: integrity & lineage-correctness (sign, collapse revisions, no default-fill bias) #5

Merged
aaronmarkham merged 4 commits into main from claude/shard-integrity-lineage on May 31, 2026

@aaronmarkham

Summary

Tightens how zeitghost uses the spiritwriter shard substrate, in three related ways that make the provenance / integrity story the landing page tells actually true for zeitghost's own output. Came out of an audit of the ingest → analyze → shard write/read path against what spiritwriter.fabric exposes.

1. Opt-in Ed25519 shard signing (#4)

  • article_to_internal_shard / article_to_sw_shard take a new signing_seed and call shard.sign() before store.put when one is supplied.
  • resolve_signing_seed() reads ZEITGHOST_SIGNING_KEY via spiritwriter.secrets (OS keychain first, then env-var fallback — so the headless us-ny1 builder gets its seed from an env var without a keychain).
  • Graceful opt-in: no key configured → unsigned shards + a one-line notice, never a failure. CI/dev/local stay green with zero setup.
  • New zeitghost gen-signing-key mints a 32-byte seed, stores it in the keychain (--no-store to just print), and reports the signer thumbprint (the identity MemoryShard.verify() checks) plus the seed for mirroring to prod.
  • Signing covers {atoms, scope, origin, …} but not the content-address, so shard_id is unchanged — sign-before-put is safe.
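The env-var fallback described above can be sketched roughly as follows. This is a minimal illustration, not zeitghost's actual `resolve_signing_seed()`: the function name `resolve_seed_from_env`, its `env` parameter, and the choice to return `None` for malformed or wrong-length values are all assumptions, and the keychain lookup through `spiritwriter.secrets` is left out entirely. The only details taken from the PR are the `ZEITGHOST_SIGNING_KEY` variable, the 32-byte seed, and the hex encoding implied by the test names.

```python
import os
from typing import Mapping, Optional

SEED_ENV_VAR = "ZEITGHOST_SIGNING_KEY"
SEED_LEN = 32  # Ed25519 seeds are 32 bytes


def resolve_seed_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[bytes]:
    """Return the 32-byte seed, or None when it is absent or unusable.

    Hypothetical helper: the real resolver checks the OS keychain first
    and may handle bad values differently (for example by raising).
    """
    env = os.environ if env is None else env
    raw = env.get(SEED_ENV_VAR, "").strip()
    if not raw:
        return None  # not configured: caller writes unsigned shards
    try:
        seed = bytes.fromhex(raw)
    except ValueError:
        return None  # malformed hex
    if len(seed) != SEED_LEN:
        return None  # wrong length for an Ed25519 seed
    return seed
```

A caller would sign only when this returns a seed, and otherwise fall through to the unsigned path with a notice.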

2. Load collapses revision chains to latest-per-entity (#2)

load_articles_from_shards previously rendered every shard in scope. The lineage machinery (parent_shard_id via build_lineage_index) means a re-analyzed article forms a revision chain — so the old loader would surface it as duplicate cards the moment a chain formed. Now it keeps the newest shard per entity (latest-wins by created_at, matching how build_lineage_index picks parents). Latent bug fixed before re-analysis is ever enabled.
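The latest-per-entity collapse can be illustrated with a small sketch. `ShardStub` and `collapse_to_latest` are hypothetical names used only for this example; the real loader works on spiritwriter shards and resolves the entity through `_entity_of`. The sketch assumes `created_at` is directly comparable and that equal timestamps keep the first-seen shard.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ShardStub:
    """Stand-in for a shard: just the fields the collapse needs."""
    shard_id: str
    entity: str
    created_at: float


def collapse_to_latest(shards: Iterable[ShardStub]) -> List[ShardStub]:
    """Keep only the newest shard for each entity (latest-wins)."""
    latest: Dict[str, ShardStub] = {}
    for shard in shards:
        current = latest.get(shard.entity)
        # Strict ">" means equal timestamps keep the first-seen shard.
        if current is None or shard.created_at > current.created_at:
            latest[shard.entity] = shard
    return list(latest.values())
```

Because the dictionary is keyed by entity, a revision chain of any length collapses to one card, while shards for distinct entities are never merged.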

3. bias_score is skipped, never default-filled (#5)

analyze_article defaulted a missing score to 0.5 (float(data.get("bias_score", 0.5))) — exactly the silent-mislabel-as-"center" that robustness invariant #1 forbids (the legacy importer already skips NULL scores). Now it returns None and skips. Logic extracted to a testable _parse_bias_score helper; a legitimate 0.0 (full left) still passes through.
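A rough sketch of the skip-not-default guard follows. It is not the actual `_parse_bias_score`; the name `parse_bias_score` and the accepted range of 0.0 to 1.0 are assumptions (the PR only states that 0.0 is full left and 0.5 is center). The point it shows is that a missing key yields `None`, while a literal 0.0 is returned as a valid score rather than being treated as falsy.

```python
import math
from typing import Any, Mapping, Optional


def parse_bias_score(data: Mapping[str, Any]) -> Optional[float]:
    """Return a usable bias score, or None so the caller can skip the article."""
    raw = data.get("bias_score")  # no default: absence must stay visible
    if raw is None or isinstance(raw, bool):
        return None
    try:
        score = float(raw)
    except (TypeError, ValueError):
        return None  # non-numeric response
    # Assumed valid range; NaN and out-of-range values are rejected.
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        return None
    return score
```

Callers should test the result with `is None` rather than truthiness, otherwise a legitimate 0.0 would be dropped along with the missing scores.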

Cleanup

Factored the repeated "which article is this shard?" lookup into _entity_of, now shared by known_url_entities, build_lineage_index, and the new load path — so dedup, lineage chaining, and render-time collapse all key off the same identity.

Test plan

  • python -m pytest tests/ — 46 passed (existing) + new tests/test_shard_integrity.py
  • New tests: signing round-trip + verify(), unsigned-by-default, latest-per-entity collapse, distinct-entity separation, _parse_bias_score (incl. the 0.0 edge), resolve_signing_seed (env / absent / malformed-hex / wrong-length)
  • Updated test_shard_metadata_round_trip to assert latest-only load
  • python -c "import zeitghost" clean from an empty cwd (invariant #2: no import-time I/O; secrets/crypto imports are all lazy/in-function)
  • zeitghost gen-signing-key --no-store smoke-tested

Not in scope (rides with the trace-emitter follow-up)

TraceEmitter wiring (shard_created events, trace_ref population, per-run run_id) and surfacing the signer thumbprint in the card flip-panel. This PR makes the bytes signed and the lineage load-correct; the trace chain + UI come next.

🤖 Generated with Claude Code

aaronmarkham and others added 4 commits May 30, 2026 14:03
…ions, never default-fill bias

Three related fixes to how zeitghost uses spiritwriter's shard substrate:

- Opt-in Ed25519 signing. article_to_*_shard now accept a signing_seed and
  sign before store.put when one is configured; resolve_signing_seed() reads
  ZEITGHOST_SIGNING_KEY via spiritwriter.secrets (keychain → env fallback, so
  the headless us-ny1 builder can use an env var). Unconfigured environments
  write unsigned shards rather than failing. New `zeitghost gen-signing-key`
  mints a seed, stores it in the keychain, and prints the signer thumbprint.
  Signing covers {atoms, scope, origin, …} but not the content-address, so
  shard_id is unchanged.

- load_articles_from_shards collapses each parent_shard_id revision chain to
  its newest shard per entity. Previously every shard in scope was rendered,
  so a re-analyzed article would surface as duplicate cards the moment a chain
  formed: a latent bug that becomes reachable once re-analysis is enabled.

- analyze_article skips an article whose LLM response lacks a usable
  bias_score instead of defaulting to 0.5, which would mislabel an unscored
  article as "center" (robustness invariant #1). Logic extracted to a testable
  _parse_bias_score helper; a literal 0.0 (full left) still passes through.

Factored the shared entity-key lookup into _entity_of (used by
known_url_entities, build_lineage_index, and the new load path) so dedup,
lineage chaining, and render-time collapse all key off the same identity.

Tests: new tests/test_shard_integrity.py covers signing round-trip + verify,
unsigned-by-default, latest-per-entity collapse, distinct-entity separation,
the bias-score guard, and resolve_signing_seed (env/absent/malformed).
Updated test_shard_metadata_round_trip for the latest-only load behavior.

Not in scope (rides with the trace-emitter PR): shard_superseded trace events,
trace_ref population, and surfacing the signer in the card flip-panel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rden tests

Addresses PR #5 review:

- gen-signing-key no longer echoes the secret seed after a successful keychain
  store. New --print-seed flag (or --no-store, which must print since it's the
  only copy) reveals it for mirroring to the prod env var; otherwise it stays
  off-screen so it doesn't linger in shell scrollback/history.

- analyze_batch now logs a WARNING with the count of articles dropped for a
  missing/invalid bias_score (threaded via a stats dict from analyze_article).
  The new skip-not-default path silently shrinks the feed by design, so the
  count surfaces an LLM regression; logging reaches the operator's console via
  the CLI RichHandler.

- load_articles_from_shards: comment that equal-timestamp ties go to first-seen,
  consistent with build_lineage_index's comparison.

- ingest only prints the signing-mode line when there's something to write.

- Tests: tampered-signature → verify() raises InvalidSignature; end-to-end
  analyze_article returns None when the LLM response omits bias_score (mocked
  provider). 48 passed.
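
The dropped-count warning from the analyze_batch bullet above could look roughly like this. It is a sketch under assumptions: `analyze_one`, `score_fn`, the `skipped_bias_score` key, and the logger name are hypothetical, and the real functions call an LLM provider rather than a plain callable.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Optional

logger = logging.getLogger("zeitghost.sketch")  # hypothetical logger name

Article = Dict[str, Any]


def analyze_one(
    article: Article,
    score_fn: Callable[[Article], Optional[float]],
    stats: Dict[str, int],
) -> Optional[Article]:
    """Return the scored article, or None (and count it) when no score exists."""
    score = score_fn(article)
    if score is None:
        stats["skipped_bias_score"] = stats.get("skipped_bias_score", 0) + 1
        return None
    return {**article, "bias_score": score}


def analyze_batch(
    articles: Iterable[Article],
    score_fn: Callable[[Article], Optional[float]],
) -> List[Article]:
    """Analyze every article and warn once about how many were dropped."""
    stats: Dict[str, int] = {}
    results: List[Article] = []
    for article in articles:
        analyzed = analyze_one(article, score_fn, stats)
        if analyzed is not None:
            results.append(analyzed)
    dropped = stats.get("skipped_bias_score", 0)
    if dropped:
        logger.warning(
            "dropped %d article(s) with missing/invalid bias_score", dropped
        )
    return results
```

Threading one mutable dict through the per-article call keeps its return type unchanged while still letting the batch report a single aggregate count.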

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sible/compose

Follow-up #1 from the PR #5 review: let prod fail-closed when signing is
required but no key is configured, so an accidentally-cleared key surfaces
loudly instead of silently degrading to unsigned shards.

- signing_required(flag) (testable, no Click): True if --require-signing is
  passed or ZEITGHOST_REQUIRE_SIGNING is truthy. Local/CI leave both unset, so
  signing stays opt-in there.
- ingest resolves the seed up front and fails fast (click.ClickException, exit
  1) before spending any NewsAPI quota / Claude calls when required-but-missing.
  The single resolved seed is reused for the write loop (no double lookup).
- env.j2: template ZEITGHOST_SIGNING_KEY from vault_zeitghost_signing_key
  (defaulted '' so the deploy renders before the key is vaulted) and
  ZEITGHOST_REQUIRE_SIGNING from zeitghost_require_signing (default 0).
- docker-compose builder: pass both through (references only; secret stays in
  the 0600 .env Ansible renders).
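
The fail-closed check in the bullets above might be sketched as below. Assumptions are labeled: the set of truthy strings, the extra `env` parameter (added here for testability), and the `ensure_seed` helper are hypothetical, and `RuntimeError` stands in for the `click.ClickException` the commit describes so the example needs only the standard library.

```python
import os
from typing import Mapping, Optional

REQUIRE_ENV_VAR = "ZEITGHOST_REQUIRE_SIGNING"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of truthy spellings


def signing_required(flag: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if --require-signing was passed or the env var is truthy."""
    env = os.environ if env is None else env
    if flag:
        return True  # the CLI flag wins regardless of the env var
    return env.get(REQUIRE_ENV_VAR, "").strip().lower() in _TRUTHY


def ensure_seed(seed: Optional[bytes], required: bool) -> Optional[bytes]:
    """Fail fast when signing is required but no seed was resolved."""
    if required and seed is None:
        # Stand-in for click.ClickException in the real CLI.
        raise RuntimeError("signing required but no signing key is configured")
    return seed
```

Running this check before any network calls is what lets a cleared key surface as an immediate error instead of a batch of unsigned shards.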

Safe rollout: require defaults OFF. Provision the vault key, deploy, confirm
shards sign, then flip `zeitghost_require_signing: 1` in the inventory.

Tests: signing_required off-by-default / flag-wins / env truthy+falsy variants.
51 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the shard-signing seed through the existing GitHub-secret → ansible
extra-var → env.j2 chain (this repo has no ansible-vault file; vault_* vars are
injected from repo secrets in deploy.yml). env.j2 already defaults the var to
'' so the deploy renders before the ZEITGHOST_SIGNING_KEY repo secret is set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>