Fix #294: checkpoint exports bundle the Forgejo repo of record #318
Merged
Under Compose every checkpoint archive carried an empty repo.bundle: the task-store-server ran without --repo-path, so the export route emitted the chapter 10 §6 zero-byte placeholder (structurally valid, non-resumable). Forgejo is the git remote of record with no canonical local bare repo, so the fix gives the task-store-server its own bare clone, synced lazily per export:

- eden-git: extract ensure_local_clone (clone --bare if absent, fetch --prune if present) from the orchestrator's _ensure_repo; both the integrator startup and the new export path share it.
- eden-storage: export_checkpoint gains repo_bundle_provider, invoked exactly once AFTER the store snapshot, outside the transaction. Roles publish refs before committing rows, so snapshot-then-bundle yields a §12-permitted superset; the old bundle-then-snapshot order could produce an import-rejecting archive.
- eden-wire: the export route refreshes + bundles via the provider; a failed remote sync maps to 503 eden://reference-error/checkpoint-repo-unavailable instead of a silently stale/empty bundle.
- task-store-server: new --forgejo-url / --credential-helper flags (orchestrator contract) build the refresh callable; startup never touches the remote, so the checkpoint-import receiver posture (postgres + task-store-server, no forgejo) keeps working.
- compose.yaml / setup-experiment.sh: task-store-repo bind-mount + credential-helper mount + flags; substrate-surface audit per #178 (smoke.sh existence assertion, compose README, durability doc).
- smoke-checkpoint.sh now extracts the bundle, git-bundle-verifies it, and asserts main + >= variant.integrated-count variant/* refs; smoke-auto-checkpoint.sh asserts non-empty bundles in every archive (both previously asserted structural validity only, which is the gap that let #294 ship).

Deferral: Helm-chart parity tracked as #306.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
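The clone-if-absent, fetch-if-present behaviour described for ensure_local_clone can be sketched roughly as follows. This is an illustrative sketch, not the eden-git implementation: the signature, the injectable `run` parameter, the `HEAD`-file presence check, and the explicit refspec are all assumptions. The refspec is included because `git clone --bare` does not configure a fetch refspec, so a bare `fetch --prune` would not update local branches on its own. Credential-helper wiring is left out.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Assumption of this sketch: mirror every remote branch into the bare clone.
MIRROR_REFSPEC = "+refs/heads/*:refs/heads/*"

Runner = Callable[[Sequence[str]], None]


def _run_git(argv: Sequence[str]) -> None:
    """Default runner: execute git and raise CalledProcessError on failure."""
    subprocess.run(list(argv), check=True, capture_output=True)


def ensure_local_clone(
    remote_url: str, clone_dir: Path, run: Runner = _run_git
) -> list[str]:
    """Clone --bare if the clone is absent, fetch --prune if it is present.

    Returns the git command that was run, which keeps the sketch easy to test.
    """
    if (clone_dir / "HEAD").is_file():
        # A bare repository keeps HEAD at its top level, so treat it as present.
        argv = [
            "git", "--git-dir", str(clone_dir),
            "fetch", "--prune", remote_url, MIRROR_REFSPEC,
        ]
    else:
        clone_dir.parent.mkdir(parents=True, exist_ok=True)
        argv = ["git", "clone", "--bare", remote_url, str(clone_dir)]
    run(argv)
    return argv
```

Because nothing runs until the function is called, a server built this way can start without reaching the remote, which matches the lazy-sync posture the commit describes.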
- Export-side §12 self-validation: non-empty provider bundles are checked against the frozen snapshot with the importer's own cross-reference validator, closing the residual snapshot↔fetch ref race; exported_at is stamped at the snapshot instant (§10 anchor).
- Bundle-creation failure with a remote of record configured now 503s (checkpoint-repo-unavailable) instead of silently emitting the zero-byte placeholder; the placeholder swallow survives only in the no-remote test-fixture posture.
- §14.1 format_version query param enforced: 400 bad-request on unrecognized values.
- Provider tests use real git bundles (self-validation rejects unparseable bytes by design); new coverage/failure-mode tests.
- Round-0 review record under docs/plans/review/issue-294-checkpoint-bundle/impl/20260611T221618/; non-blocking findings filed as #312 / #313.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
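The ordering and error mapping from these two commits (validate format_version, snapshot, stamp exported_at, then call the provider once, with failures surfacing as 503) might look like the sketch below. Only the 503 problem URI and the 400/503 status codes come from the PR text. The names `export_route`, `RepoSyncError`, `Response`, the set of supported versions, and the response body shape are hypothetical, and the §12 self-validation step is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

REPO_UNAVAILABLE = "eden://reference-error/checkpoint-repo-unavailable"
SUPPORTED_FORMAT_VERSIONS = frozenset({"1"})  # hypothetical value set


class RepoSyncError(Exception):
    """Hypothetical error raised when the remote sync or bundling fails."""


@dataclass
class Response:
    status: int
    body: dict


def export_route(
    take_snapshot: Callable[[], dict],
    repo_bundle_provider: Callable[[], bytes],
    format_version: Optional[str] = None,
) -> Response:
    # Reject unrecognized format_version values before doing any work.
    if format_version is not None and format_version not in SUPPORTED_FORMAT_VERSIONS:
        return Response(400, {"error": "bad-request", "param": "format_version"})

    # 1. Freeze the store state and stamp exported_at at that instant.
    snapshot = take_snapshot()
    exported_at = datetime.now(timezone.utc).isoformat()

    # 2. Only then build the bundle, exactly once. Refs are published before
    #    rows are committed, so a later bundle is a superset of the snapshot.
    try:
        bundle = repo_bundle_provider()
    except RepoSyncError as exc:
        # Fail loudly instead of shipping a stale or empty bundle.
        return Response(503, {"type": REPO_UNAVAILABLE, "detail": str(exc)})

    return Response(
        200,
        {"snapshot": snapshot, "exported_at": exported_at, "repo_bundle": bundle},
    )
```

Taking the snapshot first is the point of the design: a bundle that holds more refs than the snapshot references is acceptable, while a snapshot that references refs missing from the bundle is what the importer rejects.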
…e, fix README auth

- _checkpoint.py: condense the export_checkpoint docstring so the function clears the 100-line length gate (the round-0 §12 self-validation prose pushed it to 105); content preserved.
- task-store-server/README.md: the Auth section documented the retired --shared-token / §12 scheme; replace with the normative §13 --admin-token bearer scheme (codex round-1 stale-doc finding).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
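The ref-set assertion added to smoke-checkpoint.sh in the first commit (main plus at least one variant/* ref per integrated variant) comes down to reading the bundle's ref list. As a rough illustration, the text header of a v2/v3 git bundle can be read without invoking git. The function names are hypothetical, the assumption that variant refs live under `refs/heads/variant/` is mine, and this does not replace `git bundle verify`, which also checks prerequisites and the pack.

```python
def bundle_refs(data: bytes) -> dict[str, str]:
    """Return {refname: object id} from a git bundle's text header."""
    # The header ends at the first blank line; pack data follows it.
    header, sep, _pack = data.partition(b"\n\n")
    if not sep:
        raise ValueError("not a git bundle: header terminator missing")
    lines = header.decode("utf-8").split("\n")
    if lines[0] not in ("# v2 git bundle", "# v3 git bundle"):
        raise ValueError(f"unrecognized bundle signature: {lines[0]!r}")
    refs: dict[str, str] = {}
    for line in lines[1:]:
        # "-<oid> ..." lines are prerequisites, "@..." lines are v3 capabilities.
        if line.startswith(("-", "@")):
            continue
        oid, _, name = line.partition(" ")
        refs[name] = oid
    return refs


def has_expected_refs(data: bytes, integrated_count: int) -> bool:
    """True if the bundle carries main and enough variant branches."""
    refs = bundle_refs(data)
    # Assumption: variant/* branches are stored as refs/heads/variant/*.
    variants = [name for name in refs if name.startswith("refs/heads/variant/")]
    return "refs/heads/main" in refs and len(variants) >= integrated_count
```

A zero-byte placeholder has no header at all, so `bundle_refs(b"")` raises `ValueError`, which is the case the earlier structural-validity-only assertions did not catch.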
3497d71 to c533dfc
Summary
- Under Compose, every checkpoint archive carried an empty `repo.bundle`: the `task-store-server` ran without `--repo-path`, so the export route emitted the chapter 10 §6 zero-byte placeholder. Wire state round-tripped; git history silently did not (structurally valid, but non-resumable). Confirmed by live repro before the fix (0-byte `checkpoint/repo.bundle` while Forgejo held `main` + `work/*` + `variant/*`).
- The fix gives the `task-store-server` its own bare clone of the Forgejo remote of record (new `--forgejo-url` / `--credential-helper` flags → shared `eden_git.ensure_local_clone`; clone on first export, `fetch --prune` thereafter). The sync is lazy (startup never touches the remote, so the import-receiver posture keeps working); a failed sync fails the export loudly with 503 `eden://reference-error/checkpoint-repo-unavailable` rather than emitting a stale/empty bundle.
- `export_checkpoint` now takes a `repo_bundle_provider` invoked after the store snapshot (outside the transaction), so the bundle is a §12-permitted superset of what the snapshot references; non-empty bundles are self-validated against the frozen snapshot with the importer's own §12 check (export-time failure beats import-time rejection), and `exported_at` is stamped at the snapshot instant.

What this does NOT cover
- The Helm chart does not yet pass `--repo-path` / `--forgejo-url`, so Kubernetes-substrate exports still carry the empty placeholder. The server-side machinery from this PR is substrate-agnostic; only the chart wiring remains → Helm: wire the checkpoint-export repo (--repo-path/--forgejo-url) on the task-store-server #306.
- `smoke-checkpoint.sh` Phase 6 compares counts + sorted id-sets, not full per-object field equality → smoke-checkpoint: deepen post-import equality from counts/id-sets to full-object comparison #312.
- `test_loop_respawns_on_subprocess_crash` (ideator subprocess, unrelated path) flakes under parallel load → Flaky: test_loop_respawns_on_subprocess_crash fails with BrokenPipeError under parallel pytest load #307. Did not recur in this PR's full-suite run.

Fresh-operator walkthrough
- Run `bash reference/compose/healthcheck/smoke-checkpoint.sh`, which runs setup-experiment → full stack → quiescence → `POST /v0/experiments/<id>/checkpoint` → teardown+wipe → import on a fresh receiver.
- The output includes `repo.bundle OK: 3247 bytes, 3 variant refs` (`git bundle verify` passes; ref set carries `refs/heads/main` + one `variant/*` per integrated variant), then the import round-trips wire state.
- New flags are documented in `reference/services/task-store-server/README.md`; the observability §2.9 operator gap ("empty git bundle under Compose") is replaced with the new posture.

Test plan
- `python3 scripts/check-complexity.py`: clean (0 blocking)
- `uv run ruff check .`: clean
- `uv run pyright`: 0 errors
- `uv run pytest -q`: 2355 passed, 254 skipped
- `bash reference/compose/healthcheck/smoke-checkpoint.sh`: PASS, with the new non-empty-bundle + `git bundle verify` + ref-set assertions (extended to close the smoke gap that let #294, "Checkpoint archives carry an empty git bundle under Compose (no --repo-path on task-store-server)", ship)
- `bash reference/compose/healthcheck/smoke-auto-checkpoint.sh`: PASS (asserts non-empty `repo.bundle` in every periodic + terminal archive)

Codex-review provenance
Impl-stage records under `docs/plans/review/issue-294-checkpoint-bundle/impl/`. Round 0: 3 blocking findings, all addressed (snapshot↔bundle race + late `exported_at` → §12 export-side self-validation + snapshot-instant stamping; silent bundle-failure → 503; `format_version` query unenforced → 400). Round 1: round-0 fixes verified; one stale-doc finding (README `--shared-token`) fixed.

Related issues
🤖 Generated with Claude Code