
Fix #294: checkpoint exports bundle the Forgejo repo of record #318

Merged: ealt merged 3 commits into main from impl/issue-294-checkpoint-bundle on Jun 16, 2026

ealt (Owner) commented on Jun 16, 2026

Summary

  • Why: under Compose, every checkpoint archive (manual and auto alike) carried an empty repo.bundle. The task-store-server ran without --repo-path, so the export route emitted the chapter 10 §6 zero-byte placeholder. Wire state round-tripped; git history silently did not (the archive was structurally valid but non-resumable). A live repro confirmed this before the fix: a 0-byte checkpoint/repo.bundle while Forgejo held main + work/* + variant/*.
  • The fix: the task-store-server keeps its own bare clone, synced from Forgejo per export (new --forgejo-url / --credential-helper flags → shared eden_git.ensure_local_clone; clone-on-first-export, fetch --prune thereafter). The sync is lazy (startup never touches the remote, so the import-receiver posture keeps working); a failed sync fails the export loudly with 503 eden://reference-error/checkpoint-repo-unavailable rather than emitting a stale/empty bundle.
  • Ordering (chapter 10 §6): export_checkpoint now takes a repo_bundle_provider invoked after the store snapshot (outside the transaction), so the bundle is a §12-permitted superset of what the snapshot references; non-empty bundles are self-validated against the frozen snapshot with the importer's own §12 check (export-time failure beats import-time rejection), and exported_at is stamped at the snapshot instant.
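The ordering argument can be sketched in Python. This is a minimal illustration, not the eden-storage code: Snapshot, Archive, the (bytes, refs) provider return shape, and the plain subset check standing in for the importer's §12 cross-reference validator are all assumptions. Because roles publish refs before committing rows, any ref named by a snapshot row already exists when the provider runs afterwards, so the bundle's ref set can only be a superset.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime               # instant the store state was frozen
    referenced_refs: FrozenSet[str]  # git refs that snapshot rows point at


@dataclass(frozen=True)
class Archive:
    exported_at: datetime
    snapshot: Snapshot
    repo_bundle: bytes


class CheckpointExportError(Exception):
    """Raised when the bundle cannot satisfy the frozen snapshot."""


# Hypothetical provider contract: returns (bundle bytes, refs the bundle carries).
BundleProvider = Callable[[], Tuple[bytes, FrozenSet[str]]]


def export_checkpoint(
    take_snapshot: Callable[[], Snapshot],
    repo_bundle_provider: BundleProvider,
) -> Archive:
    # 1. Freeze store state first (in the real service, inside a transaction).
    snapshot = take_snapshot()
    # 2. Only then build the bundle, exactly once, outside the transaction.
    bundle, bundle_refs = repo_bundle_provider()
    # 3. Self-validate non-empty bundles against the frozen snapshot
    #    (a simple subset check standing in for the importer's §12 validator).
    if bundle:
        missing = snapshot.referenced_refs - bundle_refs
        if missing:
            raise CheckpointExportError(f"bundle lacks refs: {sorted(missing)}")
    # 4. Stamp exported_at at the snapshot instant, not when the bundle finished.
    return Archive(exported_at=snapshot.taken_at, snapshot=snapshot, repo_bundle=bundle)
```

Reversing steps 1 and 2 (bundle first) reopens the window in which a row committed between the two steps names a ref the bundle lacks, which is exactly the archive an importer would reject.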

What this does NOT cover

Fresh-operator walkthrough

  • Walkthrough performed against the changed operator surface (the checkpoint-export flow under Compose) via bash reference/compose/healthcheck/smoke-checkpoint.sh, which runs setup-experiment → full stack → quiescence → POST /v0/experiments/<id>/checkpoint → teardown+wipe → import on a fresh receiver.
  • Notes: passed cleanly. The new assertion confirms that the operator-visible payload is now correct (repo.bundle OK: 3247 bytes, 3 variant refs; git bundle verify passes, and the ref set carries refs/heads/main plus one variant/* per integrated variant), and the import then round-trips wire state. The new flags are documented in reference/services/task-store-server/README.md, and the observability §2.9 operator gap ("empty git bundle under Compose") is replaced with the new posture.
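The ref-set assertion can be expressed as a small parser over git bundle list-heads output, which prints one "<object id> <refname>" pair per line. This is a hypothetical Python rendering, not the smoke script itself; it assumes variant branches live under refs/heads/variant/ and that the caller supplies the integrated-variant count.

```python
from typing import List

VARIANT_PREFIX = "refs/heads/variant/"  # assumed location of variant/* branches


class BundleRefError(Exception):
    """Raised when a bundle's ref set does not match what the smoke test expects."""


def check_bundle_refs(list_heads_output: str, integrated_variants: int) -> List[str]:
    """Validate `git bundle list-heads` output; return the variant refs found."""
    refs = []
    for line in list_heads_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<object id> <refname>".
        _oid, _, refname = line.partition(" ")
        refs.append(refname)
    if "refs/heads/main" not in refs:
        raise BundleRefError("bundle is missing refs/heads/main")
    variant_refs = [r for r in refs if r.startswith(VARIANT_PREFIX)]
    # ">=" rather than "==": the bundle may be a superset of the snapshot.
    if len(variant_refs) < integrated_variants:
        raise BundleRefError(
            f"expected >= {integrated_variants} variant refs, found {len(variant_refs)}"
        )
    return variant_refs
```

The lower-bound comparison mirrors the "main + >= variant.integrated-count variant/* refs" wording in the commit message: extra refs are tolerated, missing ones are not.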

Test plan

  • python3 scripts/check-complexity.py → clean (0 blocking)
  • uv run ruff check . → clean
  • uv run pyright → 0 errors
  • markdownlint (CI-pinned) → 0 errors
  • full pre-push gate → all checks clean
  • uv run pytest -q → 2355 passed, 254 skipped
  • bash reference/compose/healthcheck/smoke-checkpoint.sh → PASS, with the new non-empty-bundle, git bundle verify, and ref-set assertions (extended to close the smoke gap that let #294, "Checkpoint archives carry an empty git bundle under Compose (no --repo-path on task-store-server)", ship)
  • bash reference/compose/healthcheck/smoke-auto-checkpoint.sh → PASS (asserts a non-empty repo.bundle in every periodic and terminal archive)

Codex-review provenance

Impl-stage records under docs/plans/review/issue-294-checkpoint-bundle/impl/. Round 0: 3 blocking findings, all addressed (snapshot↔bundle race + late exported_at → §12 export-side self-validation + snapshot-instant stamping; silent bundle-failure → 503; format_version query unenforced → 400). Round 1: round-0 fixes verified; one stale-doc finding (README --shared-token) fixed.

Related issues

🤖 Generated with Claude Code

ealt enabled auto-merge (squash) on June 16, 2026 20:10
ealt and others added 3 commits on June 16, 2026 13:11
Under Compose every checkpoint archive carried an empty repo.bundle:
the task-store-server ran without --repo-path, so the export route
emitted the chapter 10 §6 zero-byte placeholder (structurally valid,
non-resumable). Forgejo is the git remote of record with no canonical
local bare repo, so the fix gives the task-store-server its own bare
clone, synced lazily per export:

- eden-git: extract ensure_local_clone (clone --bare if absent, fetch
  --prune if present) from the orchestrator's _ensure_repo; both the
  integrator startup and the new export path share it.
- eden-storage: export_checkpoint gains repo_bundle_provider, invoked
  exactly once AFTER the store snapshot, outside the transaction —
  roles publish refs before committing rows, so snapshot-then-bundle
  yields a §12-permitted superset; the old bundle-then-snapshot order
  could produce an import-rejecting archive.
- eden-wire: the export route refreshes + bundles via the provider; a
  failed remote sync maps to 503
  eden://reference-error/checkpoint-repo-unavailable instead of a
  silently stale/empty bundle.
- task-store-server: new --forgejo-url / --credential-helper flags
  (orchestrator contract) build the refresh callable; startup never
  touches the remote, so the checkpoint-import receiver posture
  (postgres + task-store-server, no forgejo) keeps working.
- compose.yaml / setup-experiment.sh: task-store-repo bind-mount +
  credential-helper mount + flags; substrate-surface audit per #178
  (smoke.sh existence assertion, compose README, durability doc).
- smoke-checkpoint.sh now extracts the bundle, git-bundle-verifies it,
  and asserts main + >= variant.integrated-count variant/* refs;
  smoke-auto-checkpoint.sh asserts non-empty bundles in every archive
  (both previously asserted structural validity only — the gap that
  let #294 ship).

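A minimal sketch of the clone-or-fetch behavior described for ensure_local_clone, assuming a plain git CLI on PATH. The signature, the injectable run callable, and the credential_helper parameter are hypothetical, not eden_git's actual API. One detail worth noting: git clone --bare creates no remote-tracking fetch refspec, so this sketch passes an explicit +refs/heads/*:refs/heads/* refspec so that --prune mirrors branch deletions.

```python
import os
import subprocess
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError on a non-zero exit.
    subprocess.run(list(cmd), check=True)


def ensure_local_clone(
    remote_url: str,
    path: str,
    credential_helper: Optional[str] = None,
    run: Runner = _run,
) -> str:
    """Clone --bare if `path` is absent, else fetch --prune. Returns the action taken."""
    git: List[str] = ["git"]
    if credential_helper:
        # git's -c option sets a config value for this invocation only.
        git += ["-c", f"credential.helper={credential_helper}"]
    if not os.path.exists(path):
        run(git + ["clone", "--bare", remote_url, path])
        return "clone"
    # A bare clone configures no fetch refspec, so pass one explicitly;
    # --prune then deletes local branches that were removed on the remote.
    run(git + ["-C", path, "fetch", "--prune", remote_url, "+refs/heads/*:refs/heads/*"])
    return "fetch"
```

Nothing here touches the remote until the function is called, which matches the lazy posture above: a receiver started without Forgejo never invokes it.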
Deferral: Helm-chart parity tracked as #306.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- Export-side §12 self-validation: non-empty provider bundles are
  checked against the frozen snapshot with the importer's own
  cross-reference validator, closing the residual snapshot↔fetch ref
  race; exported_at is stamped at the snapshot instant (§10 anchor).
- Bundle-creation failure with a remote of record configured now 503s
  (checkpoint-repo-unavailable) instead of silently emitting the
  zero-byte placeholder; the placeholder swallow survives only in the
  no-remote test-fixture posture.
- §14.1 format_version query param enforced: 400 bad-request on
  unrecognized values.
- Provider tests use real git bundles (self-validation rejects
  unparseable bytes by design); new coverage/failure-mode tests.
- Round-0 review record under
  docs/plans/review/issue-294-checkpoint-bundle/impl/20260611T221618/;
  non-blocking findings filed as #312 / #313.
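The route-level error mapping from these fixes can be sketched as a framework-free handler. Everything here except the 503 problem-type URI is an assumption for illustration: the handler name, the RepoSyncError exception, the recognized format_version set, and the (status, body) return shape.

```python
from typing import Callable, Dict, Optional, Tuple

CHECKPOINT_REPO_UNAVAILABLE = "eden://reference-error/checkpoint-repo-unavailable"
KNOWN_FORMAT_VERSIONS = frozenset({"1"})  # hypothetical; the real set is spec-defined


class RepoSyncError(Exception):
    """Hypothetical: raised when the remote sync or bundle creation fails."""


def handle_export(
    format_version: Optional[str],
    refresh_and_bundle: Optional[Callable[[], bytes]],
) -> Tuple[int, Dict[str, object]]:
    # Reject unrecognized format_version query values up front.
    if format_version is not None and format_version not in KNOWN_FORMAT_VERSIONS:
        return 400, {"detail": f"unrecognized format_version: {format_version!r}"}
    if refresh_and_bundle is None:
        # No remote of record configured (test-fixture posture): zero-byte placeholder.
        bundle = b""
    else:
        try:
            bundle = refresh_and_bundle()
        except RepoSyncError as exc:
            # Fail loudly instead of emitting a stale or empty bundle.
            return 503, {"type": CHECKPOINT_REPO_UNAVAILABLE, "detail": str(exc)}
    return 200, {"repo_bundle_size": len(bundle)}
```

The placeholder branch is reachable only when no refresh callable exists at all; once a remote of record is configured, any failure surfaces as a 503 rather than a silently empty archive.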

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e, fix README auth

- _checkpoint.py: condense the export_checkpoint docstring so the
  function clears the 100-line length gate (the round-0 §12
  self-validation prose pushed it to 105); content preserved.
- task-store-server/README.md: the Auth section documented the
  retired --shared-token / §12 scheme; replace with the normative
  §13 --admin-token bearer scheme (codex round-1 stale-doc finding).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ealt force-pushed the impl/issue-294-checkpoint-bundle branch from 3497d71 to c533dfc on June 16, 2026 20:12
ealt merged commit 03614ef into main on Jun 16, 2026
23 of 25 checks passed


Development: successfully merging this pull request may close #294, "Checkpoint archives carry an empty git bundle under Compose (no --repo-path on task-store-server)".
