
plan(tracker): e2e tests for CLI + composed-state lifecycle #10

Draft
dordor12 wants to merge 2 commits into main from claude/plan-tracker-e2e-tests-Ne9yi

Conversation

dordor12 (Owner) commented May 3, 2026

Summary

Implementation plan for tracker/test/e2e/. Two test surfaces, scoped to what the tracker actually exposes today:

  • Binary CLI: go build the tracker, exec version and config validate via os/exec, and assert the spec §3.3 exit-code matrix (0 / 1 / 2 / 3) through real os.Exit.
  • Composed-state lifecycle — load YAML → open SQLite at cfg.Ledger.StoragePath → open the ledger orchestrator → seed the registry. Drives a starter-grant → SignedBalance round-trip, registry candidate matching against a cfg.Broker-derived filter, and chain-integrity across orchestrator restart. Same wiring path internal/server.Run will follow when that subsystem lands.

Seven tasks, one commit each, no production-code changes. Wall-time budget < 3s for the full e2e package.

Out of scope (deferred to plans that land alongside those subsystems)

  • Real listener / RPC e2e (internal/server, internal/api, internal/session)
  • Broker selection scoring (internal/broker)
  • Admin HTTP API (internal/admin)
  • Federation peer protocol (internal/federation)
  • STUN/TURN, reputation freeze list

When the listener subsystem lands, helpers_test.go gains a dialPlugin helper and the lifecycle tests grow from "compose modules in-process" to "drive modules through the wire" — same scenarios, larger surface.

Test plan

This PR ships the plan only; no executable tests yet. Reviewing for plan correctness:

  • File map matches tracker/test/e2e/ directory layout (currently empty)
  • Helpers reference real exports: config.Load, ledger.Open/WithClock/Close, storage.Open, registry.New/DefaultShardCount/Filter, Ledger.IssueStarterGrant/SignedBalance/AssertChainIntegrity/Tip
  • Exit-code matrix matches docs/superpowers/specs/tracker/2026-04-25-tracker-internal-config-design.md §3.3
  • Fixtures self-contained under tracker/test/e2e/testdata/ (no cross-package testdata coupling)
  • make -C tracker test and make -C tracker lint green checks included in Task 7

Generated by Claude Code

claude added 2 commits May 3, 2026 14:56
Lands the implementation plan for tracker/test/e2e/. Two surfaces:
binary CLI (go build + os/exec, version + config validate exit-code
matrix) and composed-state lifecycle (config -> ledger -> registry
along the wiring path internal/server.Run will eventually follow:
starter grant -> SignedBalance round-trip, registry candidate matching
against config-derived filter, chain integrity across orchestrator
restart). Seven tasks, one commit each, no production-code changes.
Listener / broker / admin / federation e2e are explicitly deferred to
plans that land alongside those subsystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Network surface — deferred test shapes" section sketching the
three e2e shapes the eventual internal/stunturn plan should land
against: loopback STUN binding, loopback TURN relay round-trip + per-
seeder rate limit, and a build-tagged netns NAT-simulation matrix for
the spec §11 hole-punching acceptance. Updates the out-of-scope bullet
and table of contents to point at the new section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dordor12 added a commit that referenced this pull request May 6, 2026
)

* docs(plans): tracker admission persistence — initial draft (tasks 1-5 full TDD)

Plan 3 of the admission subsystem trilogy. Covers admission.tlog with
CRC32C framing + batched/sync fsync + 1 GiB rotation; snapshot file
format with magic 0xADMSNAP1 + atomic write; StartupReplay with
snapshot fallback; 11 admin handlers; ~20 Prometheus metrics; and
acceptance hardening (§10 #9-20).

Tasks 1-5 are full TDD with verbatim test/impl/commit blocks.
Tasks 6-15 are scoped outlines — full TDD code follows in subsequent
commits before execution begins, matching the plan 2 expand-first
pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): tracker admission persistence — expand tasks 6-8 to full TDD

Tasks 6-8 now have verbatim test/impl/run/commit blocks matching the
plan-2 expansion pattern:
  - Task 6: snapshot emitter goroutine + retention pruning
  - Task 7: StartupReplay (snapshot fallback + tlog replay + ledger
    cross-check + degraded mode)
  - Task 8: OnLedgerEvent → tlog persist-then-apply ordering, replay
    suppression flag

Tasks 9-15 still outline-only — expansion continues in subsequent
commits before execution begins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): tracker admission persistence — expand tasks 9-11 to full TDD

Tasks 9-11 now have verbatim test/impl/run/commit blocks:
  - Task 9: §7.1-7.4 failure-mode integration + degraded-mode behavior
  - Task 10: 11 admin handlers, RegisterMux, BasicAuthGuard
  - Task 11: writeOperatorOverride helper + operator-context key

Tasks 12-15 still outline-only — expansion continues in subsequent
commits before execution begins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): tracker admission persistence — expand tasks 12-15 to full TDD

Tasks 12-15 now have verbatim test/impl/run/commit blocks:
  - Task 12: Prometheus Collector + ~20 metrics catalogue
  - Task 13: §10 #9-12 persistence/recovery acceptance
  - Task 14: §10 #13-16 performance acceptance + benchmarks
  - Task 15: §10 #17-20 security acceptance + final integration

Plan 3 is now complete: 15 tasks, every step in red→green→commit
form, ready for execution starting with Task 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): TLogRecord framing + CRC32C

Lands the on-disk frame format from admission-design §4.3:
  length(4) | seq(8) | ts(8) | kind(1) | payload | crc32c(4)

CRC32C uses stdlib hash/crc32 with the Castagnoli polynomial
(0x82f63b78). Tests pin the table against crc32.MakeTable so a future
import-path change can't silently switch us off Castagnoli.

unmarshal returns sentinel ErrTLogTruncated vs ErrTLogCorrupt so replay
can distinguish "trailing partial frame from a crash" from "real CRC
mismatch on a complete frame" — the former heals by truncating to the
last good record; the latter surfaces to the operator.

Includes the full kind enum (settlement, dispute_filed, dispute_resolved,
heartbeat_bucket_roll, snapshot_mark, operator_override, transfer,
starter_grant). TLogKindDispute aliases TLogKindDisputeFiled so Task 8's
persistEvent wiring can route both filed and upheld disputes through one
write path.

No I/O yet — pure bytes-in/bytes-out. The writer + rotation lands in
the next task; OnLedgerEvent integration in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): tlog payload types per kind

Each TLogKind carries a typed payload — SettlementPayload, DisputePayload,
SnapshotMarkPayload, OperatorOverridePayload, TransferPayload,
StarterGrantPayload. Marshal/unmarshal pairs use big-endian fixed-width
encoding for primitives + length-prefixed bytes for variable-length
fields (operator_id, action, params, snapshot path).

Each type implements MarshalBinary / UnmarshalBinary so persistEvent
(Task 8) and applyTLogRecord (Task 7) can route generically.

OperatorOverridePayload also carries a Ts unix-second field so audit
records survive a clock skew in admin tooling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): tlog writer — batched fsync + dispute sync + rotation

tlogWriter wraps the active admission.tlog file with three concerns:
  - per-Append routing by kind (disputes synchronous, others batched)
  - flushInterval ticker driving Sync() on the batched soft-state
  - size-triggered rotation at rotationBytes (production default 1 GiB)

Disputes have no ledger backing if lost (admission-design §4.3 "stricter
durability"), so they pay the per-write fsync cost. Settlements / transfers
/ starter_grants / heartbeat-rolls are recoverable from ledger replay,
so they ride the periodic batch.

LastSeq() returns the highest seq observed; the snapshot emitter (Task 6)
uses it to stamp snapshot files.

Open-then-Append-to-existing test pins behavior on restart; concurrent-
append test exercises the mutex under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): tlog reader + file enumeration

readTLogFile parses one tlog file end-to-end with two distinct error
shapes:
  - ErrTLogTruncated at tail: silently healed (post-crash state).
    Caller takes lastGoodOffset as the true file end.
  - ErrTLogCorrupt mid-file: propagated to the operator (admission-
    design §7.2). Pre-corruption records are still returned so replay
    can apply them before halting.

enumerateTLogFiles returns rotated files in seq order followed by the
active file. Tests cover out-of-order naming on disk so the sort
matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): snapshot file format + atomic write/read

Per admission-design §4.3:
  magic(4) | format_version(4) | seq(8) | ts(8) |
  consumers_count(4) | repeated ConsumerState |
  seeders_count(4) | repeated SeederState |
  trailer_crc32(4)

Magic 0xADMSNAP1 encoded as ASCII 'A','D','M','S' (0x41444D53);
format_version handles the numeric suffix.

Atomic write via <path>.tmp + fsync + rename. Read validates magic,
format_version, and trailer CRC; any failure surfaces an error so
StartupReplay (Task 7) can fall back to the next-older snapshot.

ConsumerState encodes FirstSeenAt + LastBalanceSeen + 30 day-buckets
each for {settlement, dispute, flow}. SeederState encodes 10
MinuteBuckets + heartbeat metadata.

The 600s emit goroutine + retention pruning lands in Task 6;
load-into-Subsystem during replay lands in Task 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): periodic snapshot emitter + retention pruning

Per admission-design §4.3 + §6.5:
  - runSnapshotEmitOnce: writeSnapshot at current tlog.LastSeq(), prune
    to SnapshotsRetained, append TLogKindSnapshotMark to the tlog.
  - snapshotState: per-shard deep-copy under RLock so live mutation
    cannot tear a write.

WithSnapshotPrefix / WithTLogPath / WithSnapshotsRetained Options let
tests drive the cycle without real timers; runSnapshotEmitOnce is the
test seam.

Also fixes a latent race in events.go: applySettlement / applyTransfer /
applyStarterGrant / applyDispute now hold the per-shard mutex during
mutation, so concurrent snapshotState reads under RLock observe a
consistent state. The race was masked by plan-2 tests that didn't
exercise concurrent observers.

Open() opens the tlog writer when WithTLogPath is set; Close() shuts it
down after the aggregator goroutine exits.

The ticker-driven background goroutine + auto-replay land in Task 7's
StartupReplay wiring; this task's runSnapshotEmitOnce is the test seam
both rely on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): StartupReplay + OnLedgerEvent persistence

Task 7 — StartupReplay (admission-design §5.7):
  - Walk newest→oldest snapshots; first that loads is applied.
  - All-corrupt → degraded mode (decisions still flow).
  - Empty/no-prefix → clean first-boot (NOT degraded).
  - Replay tlog records with seq > snapshot.seq; mid-file CRC halts
    and surfaces ErrTLogCorrupt; trailing-frame truncation heals.
  - LedgerSource cross-check fills any gap between local tlog and
    authoritative ledger. v1 default is null.

Task 8 — OnLedgerEvent → tlog write-through:
  Each branch persists its kind-specific payload before mutating
  in-memory state. Disputes get synchronous fsync via the writer's
  kind-routing; settlements / transfers / starter_grants ride the
  batched fsync. s.replaying suppresses tlog writes during
  StartupReplay so applyTLogRecord doesn't double-write.

WithLedgerSource / WithSkipAutoReplay / DegradedMode /
SnapshotLoadFailures / TLogCorruptions Options + accessors used by
Task 9 metrics + Task 12 Collector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(tracker/admission): §7.1-7.4 failure-mode + degraded-mode integration

Pins admission-design §7.x failure scenarios:
  §7.1 Crash mid-OnLedgerEvent → ledger cross-check fills tlog gap.
  §7.2 Mid-tlog corruption → replay halts, decisions still flow,
       TLogCorruptions counter bumps.
  §7.3 Snapshot corruption → fall back to next-older, counter bumps.
  §7.4 All snapshots corrupt → DegradedMode active, decisions still flow.

The accessors + counters that these tests verify (DegradedMode,
SnapshotLoadFailures, TLogCorruptions) land in Task 7's replay.go;
this commit only adds the integration tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): admin handlers + operator override audit trail

Per admission-design §9.1 (admin) and §4.3 (audit):

11 admin routes under /admission/, each gated by a MuxGuard. BasicAuthGuard
ships as the canonical bearer-token middleware; tests inject a fake
validator. RegisterMux mounts everything on a caller-supplied http.ServeMux
so the tracker control-plane plan can wire admission into a real listener.

  GET    /status                              queue + supply snapshot
  GET    /queue                               ranked queue contents
  GET    /consumer/{id}                       signals + composite score
  GET    /seeder/{id}                         heartbeat + headroom
  POST   /queue/drain                         body {n}, OPERATOR_OVERRIDE
  POST   /queue/eject/{request_id}            OPERATOR_OVERRIDE
  POST   /snapshot                            force runSnapshotEmitOnce
  POST   /recompute/{consumer_id}             re-derive (queued)
  GET    /peers/blocklist                     hex-encoded peer IDs
  POST   /peers/blocklist/{peer_id}           OPERATOR_OVERRIDE
  DELETE /peers/blocklist/{peer_id}           OPERATOR_OVERRIDE

writeOperatorOverride centralizes the audit-trail write: every mutating
admin handler appends a TLogKindOperatorOverride record carrying
{operator_id, action, params, ts}. operator_id flows from the request
context (WithOperatorContext) or "anonymous" when missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tracker/admission): Prometheus Collector — ~20 metrics

Per admission-design §9.2:
  Hot-path:    decisions_total{result}, queue_depth, pressure,
               supply_total_headroom, demand_rate_ewma,
               decision_duration_seconds (histogram)
  Attestations: attestations_issued_total, validation_failures{stage},
                attestation_age_seconds, trial_tier_decisions_total
  Persistence:  tlog_replay_gap_entries, tlog_corruption_records_total,
                snapshot_load_failures_total, snapshot_emit_failures_total,
                degraded_mode_active (dynamic gauge)
  Operational:  clock_jump_detected_total{direction},
                fetchheadroom_timeouts_total,
                rejections_total{reason},
                pressure_threshold_crossing_total{direction},
                seeders_contributing

Decide bumps decisions_total, rejections_total{reason},
queue_depth, decision_duration_seconds. publishSupply mirrors
pressure / supply_total_headroom / seeders_contributing.

Collector() returns a composite Collector ready to register with the
tracker control-plane's metrics registry. Adds prometheus/client_golang
v1.23.2 to tracker module deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(tracker/admission): §10 #9-20 acceptance — persistence + perf + security

Tasks 13, 14, 15 land the spec's §10 acceptance harness in three files:

acceptance_persistence_test.go (§10 #9-12):
  #9  Crash mid-OnLedgerEvent → ledger cross-check fills tlog gap.
  #10 tlog mid-record corruption → replay halts, decisions still flow,
      TLogCorruptions counter bumps.
  #11 Latest snapshot deleted → next-older loads + tlog catches up,
      400-entry recovery < 30s.
  #12 All snapshots corrupted → DegradedMode + decisions still flow.

perf_bench_test.go (§10 #13-16):
  #13 BenchmarkDecide_NoAttestation + TestPerformance_S10_13 pin
      Decide latency: avg < 1ms.
  #14 Sustained 500 decides at low pressure keeps queue drained.
  #15 SupplySnapshot updates within aggregator-tick window.
  #16 tlog write rate is 1:1 with ledger event rate.

acceptance_security_test.go (§10 #17-20):
  #17 Forged attestation (body tampered post-sign) → score falls back
      to TrialTierScore.
  #18 Ejected peer's attestation discarded; consumer falls through.
  #19 Inflated peer score clamped at MaxAttestationScoreImported.
  #20 /admission/queue returns 401 without operator token, 200 with.

helpers_test.go grows allowAllPeerSet / rejectAllPeerSet fixtures and
signedTestAttestation{,WithScore} for §10 #17-19. openTempSubsystem
now takes testing.TB so benchmarks can call it.

Each test name + leading comment mirrors the §10 spec language so
the binding from spec-line to test is searchable.

Coverage: 82.8% of statements; race-clean across 10 repeat runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
