Skip to content

Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893

Open
arreyder wants to merge 3 commits into
mainfrom
ctx-aware-envelope
Open

Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893
arreyder wants to merge 3 commits into
mainfrom
ctx-aware-envelope

Conversation

@arreyder
Copy link
Copy Markdown
Contributor

Summary

Stacked on #891. Merge that one first.

WriteEnvelope previously held no context: a caller cancellation during the payload tar/zstd walk had to wait for filepath.Walk to drain the entire Pebble checkpoint before the function returned. Big-tenant checkpoints are GBs; that wait is the gap during which the io.Pipe upload pattern can end up with clean-EOF semantics instead of CloseWithError.

WriteEnvelope now takes ctx as its first parameter. The walk callbacks check ctx.Err() per entry, and each regular-file copy is routed through a new iox.CtxCopy: chunked io.CopyBuffer (4 MiB poll granularity, 32 KiB working buffer held across iterations — one allocation per call, not one per chunk).

registeredStore.save (the Pebble equivalent of C1File.finalize) now runs the Checkpoint + WriteEnvelope + rename sequence on a context detached from the caller via dotc1z.FinalizeTimeout(), inside a uotel.StartWithLink new-root span — same pattern as PR #891's C1File.finalize. This protects the Pebble write path even when the caller is a direct C1ZStore consumer that didn't already detach.

Changes

  • pkg/dotc1z/iox/copy.go (new, new package) — CtxCopy helper. Lives in a leaf subpackage of dotc1z so sibling subpackages (format/v3, engine/pebble, future manager/s3 / synccompactor callers) can import it without circular risk and without pulling in the rest of dotc1z.
  • pkg/dotc1z/format/v3/envelope.goWriteEnvelope(ctx, w, m, payloadDir) signature. Walk callbacks check ctx.Err() per entry and wrap with the in-flight tar entry name (c1z v3: tar_zstd walk canceled at "entry-X": ...) so operators triaging "context canceled" in logs can see exactly which Pebble checkpoint file was being visited. Godoc tightened to say magic+length+manifest bytes are written before the first ctx check, so io.Pipe callers must propagate ctx.Err() via CloseWithError on cancellation.
  • pkg/dotc1z/engine/pebble/register.goregisteredStore.save adopts the detach + new-root pattern. Honors the propagated parent ctx-err label from PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891 so the inner finalize span reports the real caller cancel state.
  • pkg/dotc1z/engine/pebble/adapter_clone_sync.go — passes ctx through to WriteEnvelope.
  • pkg/dotc1z/iox/copy_test.go (new) — 7 unit tests: happy, empty source, cancel-before-start, cancel-mid-stream (with read counter to prove the source is not over-consumed), source-error, dst-error, short-write.
  • pkg/dotc1z/format/v3/envelope_test.go — adds TestWriteEnvelopeCanceledMidWalk (tee-writer fires cancel after the first header bytes hit the writer; asserts wrapped error names the in-flight entry) and TestWriteEnvelopeEmptyPayloadDir.
  • pkg/dotc1z/engine/pebble/registered_store_save_cancel_test.go (new) — pebble symmetry with PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891's TestC1FileCloseSurvivesCanceledCtx. A regression that replaces finalizeCtx with the caller ctx in save() would fail this test.

Deferred to follow-up

writeZstdTar still allocates a fresh zstd.Encoder per save instead of using pkg/dotc1z/pool.go's existing encoder pool. Using the pool from format/v3 requires either moving the pool to a shared subpackage or threading an encoder via a new option — out of scope for this PR.

Test plan

🤖 Generated with Claude Code

@arreyder arreyder force-pushed the ctx-detach-finalize branch from 81d1b03 to 73a98cf Compare May 28, 2026 16:52
@arreyder arreyder force-pushed the ctx-aware-envelope branch from ad6a638 to 1567f72 Compare May 28, 2026 16:58
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 28, 2026

FDE-244

Base automatically changed from ctx-detach-finalize to main May 29, 2026 00:35
@arreyder arreyder requested review from a team and btipling May 29, 2026 00:35
arreyder and others added 3 commits May 29, 2026 09:02
… dangle it

WriteEnvelope previously held no context: a caller cancellation during
the payload tar/zstd walk had to wait for filepath.Walk to drain the
entire Pebble checkpoint before the function returned. Big-tenant
checkpoints are GBs; that wait is the gap during which the io.Pipe
upload pattern can end up with clean-EOF semantics instead of
CloseWithError.

WriteEnvelope now takes ctx as its first parameter. The walk callbacks
check ctx.Err() per entry, and each regular-file copy is routed through
new CtxCopy: chunked io.CopyN (4 MiB) with a ctx.Done() check between
chunks. Bounded worst-case cancel latency stays well under a second
even on slow disks while keeping the per-chunk select overhead
invisible against typical c1z throughput.

registeredStore.save (the Pebble equivalent of C1File.finalize) now
runs the Checkpoint + WriteEnvelope + rename sequence on a context
detached from the caller via dotc1z.FinalizeTimeout(), inside a
uotel.StartWithLink new-root span. This protects the Pebble write path
even when the caller is a direct C1ZStore consumer that didn't already
detach (the syncer.Close path from PR 1 still detaches once at the
outer layer; double-detach is harmless and gives defense in depth).

Tests cover CtxCopy's four behaviors (happy path, cancel-before-start,
cancel-mid-stream after one chunk, source-error propagation) plus
WriteEnvelope refusing to walk on a cancelled context. The existing
envelope round-trip tests are updated to pass t.Context() through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…k cancel test, single-alloc copy, error context

Multi-angle review surfaced 10 fix-now items. Nine applied here; one
deferred (encoder-pool reuse for writeZstdTar) because pkg/dotc1z's
pool.go can't be imported from pkg/dotc1z/format/v3 without a
restructure — filed as follow-up.

1. CtxCopy relocated. pkg/dotc1z/format/v3 was the wrong home for a
   generic ctx-aware copy helper; sibling subpackages
   (synccompactor, manager/s3, c1's c1age) would have ended up
   importing format/v3 for an unrelated dep. New home:
   pkg/dotc1z/iox — a leaf subpackage of dotc1z importable by
   anything in the tree without circular risk.

2. Single allocation per CtxCopy call. Old impl called io.CopyN
   per chunk, which means io.Copy allocated a fresh 32 KiB buffer
   every iteration. New impl holds one buffer and uses
   io.CopyBuffer + io.LimitReader. At Lilly's 72 GiB checkpoint
   that's the difference between ~18k 32 KiB allocations vs one,
   per file walked.

3. Error context on cancellation. CtxCopy now wraps ctx.Err()
   with "iox.CtxCopy: copied %d bytes: %w". writeZstdTar/writeTar
   wrap their per-entry ctx.Err with "c1z v3: tar(_zstd) walk
   canceled at %q: %w" so operators triaging "context canceled"
   in logs can see exactly which Pebble checkpoint file the walk
   was visiting.

4. ctxCopyChunkSize unexported. Nothing external needed it and
   exporting locked it into the public API surface.

5. CtxCopy doc fixed: "checks ctx.Done()" → "checks ctx.Err()"
   (matches the implementation).

6. WriteEnvelope godoc tightened. Now explicit that magic, length,
   and manifest bytes are written BEFORE the first ctx check
   inside the walk, so callers using WriteEnvelope with an io.Pipe
   must propagate ctx.Err() via CloseWithError to make downstream
   consumers (e.g. s3manager) abort instead of finalizing a
   truncated upload.

7. TestRegisteredStoreSaveSurvivesCanceledCtx — Pebble symmetry
   with PR 1's TestC1FileCloseSurvivesCanceledCtx. Verifies a
   pre-cancelled or pre-deadlined ctx through Close still produces
   a durable, readable v3 envelope on disk.

8. TestWriteEnvelopeCanceledMidWalk — covers the mid-walk cancel
   case the prior test couldn't (it only handled pre-call cancel).
   Uses a teeWriter + cancelOnWriteOnce to fire the cancel after
   the very first header bytes hit the writer, then asserts the
   wrapped error names the in-flight tar entry.

9. CtxCopy edge cases — TestCtxCopyEmptySource (degenerate
   zero-byte file in payloadDir), TestCtxCopyPropagatesDstWriteError,
   TestCtxCopyPropagatesDstShortWrite. Plus
   TestWriteEnvelopeEmptyPayloadDir for the "Pebble checkpoint
   with no SSTs" round-trip.

10. Two pre-existing unused //nolint:gosec directives in the tar
    walkers were now flagged by nolintlint because the lines moved;
    removed them.

Deferred follow-up:
- writeZstdTar still allocates a fresh zstd.Encoder per save.
  pkg/dotc1z/pool.go has getEncoder/putEncoder for exactly this
  case; using it from format/v3 needs either a shared subpackage
  for the pool or an injection-based API. Out of scope for this
  review pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…filenames

PR 893 CI failures, fixed:
- gosec G122 fires under CI's pinned golangci-lint v2.12.2 on the
  os.Open(path) inside both writeTar and writeZstdTar walk callbacks.
  Local v2.8.0 doesn't have G122 yet, so nolintlint there reports
  the directive as unused. CI is authoritative.
- TestWriteEnvelopeCanceledMidWalk built filenames with
  string(rune('a'+i)) which generates Windows-invalid characters
  ('|', '{', '}', '~') at i >= 27. Switch to fmt.Sprintf("entry-%02d",
  i) so go-test on windows-latest can create the inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arreyder arreyder force-pushed the ctx-aware-envelope branch from 1567f72 to 3bcd4cd Compare May 29, 2026 14:05
Comment thread pkg/dotc1z/iox/copy.go
Comment on lines +48 to +50
if errors.Is(err, io.EOF) {
return total, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: io.CopyBuffer never returns io.EOF from a source-side EOF — it swallows it and returns (n, nil). The only way this check triggers is if dst.Write() returns io.EOF as a write error, in which case this would incorrectly treat it as successful completion. The check is effectively dead code today (the n < ctxCopyChunkSize branch at line 59 handles normal source exhaustion), but if you want to keep a defensive guard, consider checking err == io.EOF (identity, not wrapped) to avoid swallowing a wrapped writer error.

@github-actions
Copy link
Copy Markdown
Contributor

General PR Review: Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize

Blocking Issues: 0 | Suggestions: 1 | Threads Resolved: 0
Review mode: full
View review run

Review Summary

This PR threads context.Context through WriteEnvelope and its tar-walk internals so that caller cancellation surfaces as an error mid-walk rather than silently producing a truncated envelope. The new iox.CtxCopy helper provides chunked context-aware copying, and registeredStore.save now detaches from the caller's context with FinalizeTimeout() — mirroring the C1File pattern from PR #891. The implementation is clean, all callers are updated, and test coverage is thorough (cancel-before, cancel-mid, empty payload, and the Pebble save-survives-cancel regression test).

Security Issues

None found.

Correctness Issues

None found.

Suggestions

  • pkg/dotc1z/iox/copy.go:48-50: The errors.Is(err, io.EOF) check is dead code under normal io.CopyBuffer behavior (it swallows source EOF). Could mask a hypothetical dst.Write() returning io.EOF.
Prompt for AI agents
Verify each finding against the current code and only fix it if needed.

## Suggestions

In `pkg/dotc1z/iox/copy.go`:
- Around line 48-50: The `errors.Is(err, io.EOF)` guard after `io.CopyBuffer` is dead code for
  normal source EOF (io.CopyBuffer swallows it and returns nil). The only trigger is if
  `dst.Write()` returns `io.EOF` as a write error, which this code would incorrectly treat as
  success. Either remove the check entirely (the `n < ctxCopyChunkSize` branch at line 59
  already handles normal source exhaustion) or narrow it to identity comparison `err == io.EOF`
  to avoid masking a wrapped writer error.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

arreyder added a commit that referenced this pull request May 30, 2026
…ally bounds the loop (#899)

* synccompactor: pass runCtx into the doOneCompaction loop

Compact() built runCtx = context.WithTimeout(ctx, c.runDuration) but the
loop body called doOneCompaction(ctx, ...). The deadline only gated the
pre-flight select; individual compactions ran under the unbounded parent
ctx and could blow past c.runDuration.

This is the sibling of the bug fixed in #893 (WriteEnvelope walking under
an unbounded context): a deadline-bearing context exists but the inner
work uses a different one, so the deadline dangles instead of bounding
the work.

Pass runCtx into doOneCompaction so c.runDuration actually limits the
loop, and when runCtx fires due to its deadline (rather than parent
cancellation), surface the same clean "compaction run duration has
expired" message the pre-flight check already uses — instead of letting
a bare context.DeadlineExceeded bubble out of inner SQLite operations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(synccompactor): cover runDuration-expired and parent-cancel disambiguation

TestCompactorRunDurationExpired locks in the user-visible contract that
when c.runDuration expires (with parent ctx still alive), Compact returns
the clean "compaction run duration has expired" error wrapping
context.DeadlineExceeded.

TestCompactorParentCtxCancelDoesNotMaskAsRunDuration locks in the
disambiguation between runDuration expiry and parent ctx cancellation —
a parent-cancelled runCtx must surface as the raw cause (context.Canceled)
and must NOT be misreported as runDuration expiry. Without the disambiguation
either branch could silently regress and mislabel cancellation as a deadline.

Both tests reuse the existing makeEmptySync helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant