fix(sandbox/k8s): accumulate canonical stdout from polled deltas to avoid mid-stream window after log rotation by larryro · Pull Request #1917 · tale-project/tale

larryro · 2026-06-22T02:27:35Z

Summary

When kubelet rotates a container log (at containerLogMaxSize, default 10Mi), the Kubernetes log API serves only the current/new file. The previous implementation did a final readPodLog call to get the canonical stdout — after rotation, this returned content from the middle of the stream (the new file's head), not the deterministic first-stdoutMaxBytes bytes.

Root cause: pollRunnerStdout accumulated deltas into the scanner for live progress callbacks, but the canonical stdout at the end was taken from a fresh full readPodLog call that bypassed the spawner-side accumulation.

Fix: Accumulate the canonical stdout spawner-side into logBuf from the same polled deltas, capped at stdoutMaxBytes bytes (using capText). After rotation (logs.length < lastLogLen), logBuf already holds the pre-rotation head; we reset lastLogLen = 0 to continue accumulating from the new file within the remaining cap. The final readPodLog call for canonical stdout is removed — logBuf is used directly.

This is the "accumulate canonical stdout spawner-side from the polled deltas" direction described in the issue, analogous to docker's drainAndCap approach.
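As a rough illustration of the accumulation described above, here is a minimal sketch. It is not the code from k8s-backend.ts: `PollState`, `applySnapshot`, and `capUtf8` (a stand-in for `capText`, whose real signature is not shown in this PR) are hypothetical names, and the if / else-if structure is inferred from the description.

```typescript
// Sketch only: names mirror the PR description, but the bodies are assumptions.
interface PollState {
  logBuf: string; // canonical stdout head, capped at maxBytes
  lastLogLen: number; // cursor into the latest snapshot (UTF-16 code units)
  logShrunk: boolean; // true once any rotation has been observed
}

// UTF-8 size of a single code point.
function utf8Size(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Hypothetical stand-in for capText: keep at most maxBytes of UTF-8,
// cutting only on code-point boundaries.
function capUtf8(text: string, maxBytes: number): string {
  let used = 0;
  let end = 0;
  for (const ch of text) {
    const size = utf8Size(ch.codePointAt(0) ?? 0);
    if (used + size > maxBytes) break;
    used += size;
    end += ch.length;
  }
  return text.slice(0, end);
}

// Apply one polled snapshot of the runner log. Returns the delta that would
// be forwarded to the live-progress scanner.
function applySnapshot(state: PollState, logs: string, maxBytes: number): string {
  if (logs.length > state.lastLogLen) {
    const delta = logs.slice(state.lastLogLen);
    state.lastLogLen = logs.length;
    state.logBuf = capUtf8(state.logBuf + delta, maxBytes);
    return delta;
  }
  if (logs.length < state.lastLogLen) {
    // The snapshot shrank, so assume kubelet rotated the file. logBuf keeps
    // the pre-rotation head; the cursor restarts for the new file.
    state.logShrunk = true;
    state.lastLogLen = 0;
  }
  return "";
}
```

Feeding the snapshots `"hello "`, `"hello world"`, `"X"`, `"XY"` with a 12-byte cap leaves `logBuf` at `"hello worldX"`: the pre-rotation head survives and the new file only fills the remaining byte.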

Changes

services/sandbox/src/backend/kubernetes/k8s-backend.ts: Add logBuf accumulation in pollRunnerStdout; reset lastLogLen = 0 on rotation; replace the final readPodLog call with const stdout = logBuf.
services/sandbox/src/backend/kubernetes/k8s-backend.test.ts: Add stdout log-rotation pinning test that simulates log shrinkage and verifies the canonical stdout is the pre-rotation head (not the rotated window).

Test plan

bun run --filter @tale/sandbox test — 301 pass, 3 pre-existing failures (all due to @kubernetes/client-node not installed in the test environment, unrelated to this change)
New unit test stdout log-rotation pinning added to k8s-backend.test.ts covering the rotation scenario
Logic verified: with the bug, final stdout would be the rotated file content; with the fix it is the pre-rotation head

Summary by CodeRabbit

New Features
- Added comprehensive test coverage for stdout handling during Kubernetes pod log rotation events.
Bug Fixes
- Fixed incomplete stdout logs during Kubernetes log rotation. Stdout is now reliably preserved across log rotation boundaries using a persistent buffer mechanism, ensuring complete execution logs.

coderabbitai · 2026-06-22T02:33:41Z

Walkthrough

The K8s backend's stdout collection is refactored from a final full pod-log re-read to a spawner-side delta accumulation strategy. A logBuf string is introduced inside execute() to retain the deterministic stream head. pollRunnerStdout() now slices only new content via lastLogLen, appends each delta into the capped logBuf, and resets lastLogLen to zero on log shrinkage (kubelet rotation) while keeping the pre-rotation content in logBuf. The final full readPodLog call is removed and replaced by direct use of logBuf. Test stubs are updated to forward a { container } parameter, and a new test suite pins the pre-rotation head behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding spawner-side canonical stdout accumulation from polled deltas to preserve content across log rotation events.
Description check	✅ Passed	The description follows the template structure with a clear summary linking issue `#1850`, explains root cause and fix, lists specific file changes, and provides a test plan showing 301 passing tests with new rotation test coverage.
Linked Issues check	✅ Passed	The PR fully implements the first proposed solution from `#1850`: accumulating canonical stdout spawner-side from polled deltas (capped at stdoutMaxBytes via capText), detecting rotation via logShrunk, and using logBuf directly instead of re-reading.
Out of Scope Changes check	✅ Passed	All changes align with fixing `#1850`: logBuf accumulation in pollRunnerStdout, lastLogLen reset on rotation, removal of final readPodLog call, and the new rotation-pinning test are scoped to the stdout reconstruction bug.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/sandbox/src/backend/kubernetes/k8s-backend.test.ts`:
- Around line 372-403: Remove the inlined client object fixture that contains
type assertions (the `as V1Pod` and `as unknown as CoreV1Api` assertions) and
refactor it to use the existing `stubClient()` helper function with properly
typed fixtures instead. Look for other examples of `stubClient()` usage
elsewhere in the test file to follow the same pattern. This approach will
eliminate the unsafe type assertions while maintaining the same test behavior
and keeping all test stubs assertion-free according to the project's guidelines.

In `@services/sandbox/src/backend/kubernetes/k8s-backend.ts`:
- Around line 555-562: In the final stdout handling block around
pollRunnerStdout and scanner.finalize, add a check after the final poll to
detect if the logs have shrunk (logs.length < lastLogLen indicating a potential
rotation). When a shrink is detected and logBuf is still below capacity, perform
one additional poll or read operation to capture any newly available
post-rotation bytes before calling scanner.finalize(). This ensures that
immediately available bytes after log rotation are not missed when feeding
residual data to the scanner.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bc5ae332-3494-456c-aafd-0f9968a7bb89

📥 Commits

Reviewing files that changed from the base of the PR and between 5946d86 and 5a7d9e0.

📒 Files selected for processing (2)

services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
services/sandbox/src/backend/kubernetes/k8s-backend.ts

coderabbitai · 2026-06-22T02:33:44Z

+    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub
+    const client = {
+      namespace: 'tale-sandbox',
+      core: {
+        createNamespacedSecret: () => Promise.resolve({}),
+        createNamespacedPod: () => Promise.resolve({}),
+        readNamespacedPod: () => {
+          podCallIdx += 1;
+          if (podCallIdx === 1) {
+            // waitForRunnerStart: runner container is Running
+            return Promise.resolve({
+              metadata: {},
+              status: {
+                containerStatuses: [
+                  {
+                    name: 'runner',
+                    state: { running: { startedAt: new Date() } },
+                  },
+                ],
+              },
+            } as V1Pod);
+          }
+          // Main loop: pod Succeeded with harvest terminated
+          return Promise.resolve({
+            metadata: {},
+            status: {
+              phase: 'Succeeded',
+              containerStatuses: [
+                { name: 'harvest', state: { terminated: { exitCode: 0 } } },
+              ],
+            },
+          } as V1Pod);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and examine the flagged lines
if [ -f "services/sandbox/src/backend/kubernetes/k8s-backend.test.ts" ]; then
  echo "=== File exists, checking line count ==="
  wc -l services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
  echo ""
  echo "=== Lines 360-410 (context around 372-403) ==="
  sed -n '360,410p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n
else
  echo "File not found at specified path"
fi

Repository: tale-project/tale

Length of output: 2253

🏁 Script executed:

# Search for stubClient helper in the codebase
echo "=== Searching for stubClient function ==="
rg "stubClient" --type ts --type tsx

Repository: tale-project/tale

Length of output: 149

🏁 Script executed:

# Search for other test fixtures in the same test file
echo "=== Searching for fixture patterns in k8s-backend.test.ts ==="
rg "const.*=.*\{" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -20

Repository: tale-project/tale

Length of output: 884

🏁 Script executed:

# Look for stubClient implementation and usage in the test file
echo "=== Searching for stubClient usage context ==="
rg -A 10 "const stub = stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -50

Repository: tale-project/tale

Length of output: 1529

🏁 Script executed:

# Check line 418 for secondary issue
echo "=== Lines 410-425 to check line 418 ==="
sed -n '410,425p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 890

🏁 Script executed:

# Find the stubClient definition/import
echo "=== Looking for stubClient import or definition ==="
head -50 services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 2159

🏁 Script executed:

# Search for stubClient definition
echo "=== Looking for stubClient definition ==="
rg "function stubClient|const stubClient|export.*stubClient" services/sandbox/src/backend/kubernetes/ -A 15

Repository: tale-project/tale

Length of output: 1837

🏁 Script executed:

# Also check if it's imported or defined in the test file
echo "=== Full search for stubClient in k8s-backend.test.ts ==="
rg -n "stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts

Repository: tale-project/tale

Length of output: 280

🏁 Script executed:

# Look for pod() and secret() helper definitions
echo "=== Looking for pod() and secret() helper definitions ==="
rg -B 5 "function pod|const pod.*=" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40

Repository: tale-project/tale

Length of output: 234

🏁 Script executed:

# Get the full stubClient definition
echo "=== Full stubClient definition ==="
sed -n '227,266p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 1962

🏁 Script executed:

# Get the pod() helper definition
echo "=== pod() helper definition ==="
rg -A 20 "^function pod\(" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40

Repository: tale-project/tale

Length of output: 722

Remove type assertions from the test client fixture.

The inlined client object at lines 372-403 uses `as V1Pod` (twice) and `as unknown as CoreV1Api` assertions, which violate the coding guideline for `**/*.{ts,tsx}`: "Never as, never any, never unknown".

Instead of inlining the client, use the existing `stubClient()` helper with typed fixtures. This pattern is already used throughout the file and keeps test stubs assertion-free:

Refactor to stubClient with typed fixtures

-    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub
-    const client = {
-      namespace: 'tale-sandbox',
-      core: {
-        createNamespacedSecret: () => Promise.resolve({}),
-        createNamespacedPod: () => Promise.resolve({}),
-        readNamespacedPod: () => {
-          podCallIdx += 1;
-          if (podCallIdx === 1) {
-            return Promise.resolve({
-              metadata: {},
-              status: {
-                containerStatuses: [
-                  {
-                    name: 'runner',
-                    state: { running: { startedAt: new Date() } },
-                  },
-                ],
-              },
-            } as V1Pod);
-          }
-          return Promise.resolve({
-            metadata: {},
-            status: {
-              phase: 'Succeeded',
-              containerStatuses: [
-                { name: 'harvest', state: { terminated: { exitCode: 0 } } },
-              ],
-            },
-          } as V1Pod);
-        },
-        readNamespacedPodLog: ({ container }: { container: string }) => {
-          if (container === 'harvest') return Promise.resolve(harvestLog);
-          const log =
-            runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
-          runnerLogIdx += 1;
-          return Promise.resolve(log ?? '');
-        },
-        replaceNamespacedSecret: () => Promise.resolve({}),
-        deleteNamespacedPod: () => Promise.resolve({}),
-        deleteNamespacedSecret: () => Promise.resolve({}),
-        listNamespacedPod: () => Promise.resolve({ items: [] }),
-        listNamespacedSecret: () => Promise.resolve({ items: [] }),
-      } as unknown as CoreV1Api,
-    };
+    const startedPod: V1Pod = {
+      metadata: {},
+      status: {
+        containerStatuses: [
+          { name: 'runner', state: { running: { startedAt: new Date() } } },
+        ],
+      },
+    };
+    const succeededPod: V1Pod = {
+      metadata: {},
+      status: {
+        phase: 'Succeeded',
+        containerStatuses: [
+          { name: 'harvest', state: { terminated: { exitCode: 0 } } },
+        ],
+      },
+    };
+
+    const { core, namespace } = stubClient({
+      readPod: () => {
+        podCallIdx += 1;
+        return Promise.resolve(podCallIdx === 1 ? startedPod : succeededPod);
+      },
+      readLog: ({ container }) => {
+        if (container === 'harvest') return Promise.resolve(harvestLog);
+        const log =
+          runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
+        runnerLogIdx += 1;
+        return Promise.resolve(log ?? '');
+      },
+    });
+    const client = { namespace, core };

Source: Coding guidelines

coderabbitai · 2026-06-22T02:33:44Z

+      // Final stdout poll (the runner may have emitted more between the last
+      // loop iteration and exit) → feed the residual to the scanner, then
+      // drain it. Use the spawner-side accumulation (logBuf) as the canonical
+      // stdout — it always holds the deterministic head of the stream even
+      // across kubelet log rotations.
      await pollRunnerStdout();
      scanner.finalize();
-      let stdout = '';
-      try {
-        stdout = await readPodLog(this.client, podName, 'runner', {
-          limitBytes: cfg.stdoutMaxBytes,
-        });
-      } catch (err) {
-        console.warn('[sandbox.k8s] final runner log read failed:', err);
-      }
+      const stdout = logBuf;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle final-iteration log shrink with one extra read before finalize.

If the final poll only observes logs.length < lastLogLen, the cursor is reset but no follow-up read happens before scanner.finalize(). That can miss immediately available post-rotation bytes when logBuf is still below cap.

💡 Suggested patch

       await pollRunnerStdout();
+      if (
+        logShrunk &&
+        Buffer.byteLength(logBuf, 'utf8') < cfg.stdoutMaxBytes
+      ) {
+        // If shrink is first detected in the final poll, do one extra read so
+        // we can capture the new file head before finalizing.
+        await pollRunnerStdout();
+      }
       scanner.finalize();
       const stdout = logBuf;
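To make the gap concrete, the following sketch replays a final poll in which the shrink is first seen at the very end. It assumes the shrink branch only resets the cursor and reads nothing, as the comment above describes; `TailState`, `pollStep`, and `finalDrain` are hypothetical names, the cap on appends is omitted, and `TextEncoder` stands in for `Buffer.byteLength` for portability.

```typescript
type TailState = { logBuf: string; lastLogLen: number; logShrunk: boolean };

const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length;

// Simplified poll step (no cap on append): the shrink branch only resets the
// cursor, so the new file's bytes are picked up by the following poll.
function pollStep(state: TailState, logs: string): void {
  if (logs.length > state.lastLogLen) {
    state.logBuf += logs.slice(state.lastLogLen);
    state.lastLogLen = logs.length;
  } else if (logs.length < state.lastLogLen) {
    state.logShrunk = true;
    state.lastLogLen = 0;
  }
}

// Final drain. With extraRead enabled it mirrors the suggested guard: one
// more read when a shrink has been seen and the buffer is still below the cap.
function finalDrain(
  state: TailState,
  readLog: () => string,
  maxBytes: number,
  extraRead: boolean,
): string {
  pollStep(state, readLog());
  if (extraRead && state.logShrunk && utf8Bytes(state.logBuf) < maxBytes) {
    pollStep(state, readLog());
  }
  return state.logBuf;
}
```

Starting from a state that has already consumed `"AAAA"`, a final read that returns the rotated file `"BB"` yields `"AAAA"` without the guard and `"AAAABB"` with it. Like the suggested patch, the guard checks the `logShrunk` flag rather than "shrink seen in this poll", so it may also trigger one harmless extra read after an earlier rotation.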

larryro · 2026-06-22T03:02:33Z

Desk Review — PR #1917

VERDICT: READY TO MERGE

CI: all required checks green (Analyze, Browser, Build ×8, Opengrep, Smoke test, Unit, Lint commits, Validate images, Migrations, UI, Scan sandbox — all pass; 5 jobs correctly skip on fork/non-applicable). No red or pending checks.

Tests: bun run --filter @tale/sandbox test — 301 pass, 3 fail. The 3 failures are pre-existing and unrelated to this change (the three @kubernetes/client-node tests that fail because the module is absent from the sandbox CI test environment; all 3 identical failures exist on main). The CI Unit check is green in the proper environment.

What the fix does

pollRunnerStdout now accumulates an incremental logBuf from each polled delta, capped at stdoutMaxBytes via capText. On a kubelet log rotation (logs.length < lastLogLen), logBuf already holds the deterministic pre-rotation head; the code resets lastLogLen = 0 so the new file's content continues accumulating into the remaining cap. The final readPodLog('runner') call is replaced with const stdout = logBuf.

The PR also fixes a secondary bug in the old code that went undocumented in the issue: the old else if branch set logShrunk = true but did not reset lastLogLen. On every subsequent poll after rotation the new file starts from zero and grows slowly — always shorter than the pre-rotation lastLogLen — so logs.length < lastLogLen stays true and logs.length > lastLogLen can never fire. scanner.onStdoutChunk would go permanently silent and no new post-rotation content would reach live-progress callbacks. The lastLogLen = 0 reset at line 466 closes this gap.
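The live-tail stall is easy to reproduce in isolation. The sketch below replays a series of log snapshots through the cursor logic with and without the reset; `replayDeltas` and its `resetOnShrink` flag are hypothetical and only model the branch structure described above, not the real `pollRunnerStdout`.

```typescript
// Replay polled snapshots and collect the deltas that would reach
// scanner.onStdoutChunk. resetOnShrink=false models the old behavior.
function replayDeltas(snapshots: string[], resetOnShrink: boolean): string[] {
  const emitted: string[] = [];
  let lastLogLen = 0;
  for (const logs of snapshots) {
    if (logs.length > lastLogLen) {
      emitted.push(logs.slice(lastLogLen));
      lastLogLen = logs.length;
    } else if (logs.length < lastLogLen) {
      // Old code only flagged the shrink here; the fix also rewinds the cursor.
      if (resetOnShrink) lastLogLen = 0;
    }
  }
  return emitted;
}

// A 10-character file is rotated, then the new file grows one character per poll.
const snapshots = ["0123456789", "a", "ab", "abc"];
console.log(replayDeltas(snapshots, false)); // [ '0123456789' ]
console.log(replayDeltas(snapshots, true)); // [ '0123456789', 'ab', 'c' ]
```

Without the reset, every post-rotation snapshot is shorter than the stale cursor, so nothing is ever emitted again. With the reset, the poll that detects the shrink emits nothing itself, and the next poll delivers the new file from its start.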

Findings by dimension

Correctness — no new bugs. Walked all branches:

Happy path: delta accumulates correctly into logBuf.
Rotation: logShrunk = true + lastLogLen = 0 → correct. Pre-rotation head preserved; new file accumulates within remaining cap.
Multi-rotation: each shrink resets cursor; invariant holds.
Error path: poll failure leaves logBuf/lastLogLen unchanged; next poll catches up.
stdoutStreamTruncated = Buffer.byteLength(stdout) >= stdoutMaxBytes || logShrunk propagates correctly through all four return paths (aborted, runnerDead, harvestDone, harvestMissing).
assemble() re-applies capText(stripPhaseMarkers(stripControlChars(logBuf)), stdoutMaxBytes); since logBuf is already capped and stripping can only shorten it, stdoutCapTrunc is always false — no double-truncation issue.

Two pre-existing issues (not introduced by this PR):

lastLogLen tracks .length (character count) while limitBytes is a byte budget — incommensurable for multi-byte UTF-8. After rotation, lastLogLen = 0 makes the mismatch slightly more load-bearing, but the practical impact is narrow (only matters when the post-rotation file immediately exceeds the byte cap between polls with multi-byte content) and a fix is out of scope here.
Rotation is silently undetected when the new file happens to be ≥ old length at the moment of polling. Pre-existing limitation of length-based detection; not worsened by this PR.
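Both limitations can be shown with a few lines. This is an illustrative sketch under assumed names (`utf8ByteLength`, `nextDelta`); it uses `TextEncoder` to measure bytes and models only the length comparison, not the actual backend code.

```typescript
// UTF-8 byte length of a string.
const utf8ByteLength = (s: string): number => new TextEncoder().encode(s).length;

// 1) Character count vs. byte budget: the two diverge on multi-byte text.
const sample = "héllo→";
console.log(sample.length); // 6 UTF-16 code units
console.log(utf8ByteLength(sample)); // 9 bytes (é is 2 bytes, → is 3)

// 2) Length-based rotation detection, modeled on the comparison in the review.
function nextDelta(lastLogLen: number, logs: string): { delta: string; shrunk: boolean } {
  if (logs.length < lastLogLen) return { delta: "", shrunk: true };
  return { delta: logs.slice(lastLogLen), shrunk: false };
}

// The old file had 3 characters; the rotated file already has 4 when polled.
// No shrink is seen, and only the tail "z" is treated as new output.
console.log(nextDelta(3, "wxyz")); // { delta: 'z', shrunk: false }
```

In the second case the first three characters of the new file are skipped silently, which is the undetected-rotation window noted above.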

Tests — one gap, not blocking. The new stdout log-rotation pinning test correctly pins the core scenario; the res.stdoutBase64 / res.truncated.stdout assertions are sound. The stubClient signature change (readLog now receives { container }) is backward-compatible with all existing callers.

Minor coverage gap: the test is structured so rotation is always detected in the final pollRunnerStdout() after the loop, never during an in-loop poll. The post-rotation accumulation branch (lastLogLen = 0 → delta from new file flows into logBuf) is never exercised by the test. The code is correct by inspection, and the existing 301-test suite covers no-rotation accumulation, so this is not blocking — worth a follow-up test if the rotation scenario ever sees regressions.

Elegance: cosmetic only. logBuf is declared two lines below its three sibling poll-state variables, separated by its comment block. Convention in the file is to group related let declarations first. The Buffer.byteLength(logBuf, 'utf8') < cfg.stdoutMaxBytes guard re-measures the entire buffer on every poll (an O(n) scan rather than a Buffer allocation); capturing capText's returned truncated boolean as a logBufFull sentinel would avoid the redundant work. Neither item is blocking at 500 ms poll intervals.
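A sentinel-based variant could look roughly like the sketch below. The return shape of `capText` is not shown in this PR, so `capTextSketch` and its `{ text, truncated }` result are assumptions, and `createCappedLog` is a hypothetical wrapper rather than anything in the codebase.

```typescript
// Assumed shape for a capText-like helper: cap to maxBytes of UTF-8 on a
// code-point boundary and report whether anything was cut.
function capTextSketch(text: string, maxBytes: number): { text: string; truncated: boolean } {
  let used = 0;
  let end = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    const size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (used + size > maxBytes) return { text: text.slice(0, end), truncated: true };
    used += size;
    end += ch.length;
  }
  return { text, truncated: false };
}

// Capped accumulator that stops measuring once the cap has been hit.
function createCappedLog(maxBytes: number) {
  let buf = "";
  let full = false; // plays the role of the logBufFull sentinel
  return {
    append(delta: string): void {
      if (full || delta.length === 0) return; // nothing to measure once full
      const capped = capTextSketch(buf + delta, maxBytes);
      buf = capped.text;
      full = capped.truncated;
    },
    text: (): string => buf,
    isFull: (): boolean => full,
  };
}
```

Once an append is truncated the flag stays set, so later polls return immediately. A buffer that lands exactly on the cap is only flagged on the next non-empty append, which costs one extra measurement at most.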

Issue resolution — complete. The fix matches the "accumulate canonical stdout spawner-side" direction in #1850. All readPodLog('runner') call sites verified: the only remaining one is the poll inside pollRunnerStdout itself (correct, it must stay). Consistent with the Docker backend's drainAndCap approach.

The implementation is correct, the secondary live-tail-dark regression is a genuine bonus fix, and CI is fully green. Ready to merge.

larryro · 2026-06-24T08:57:24Z

Superseded by #1915, which fixes the same issue (#1850) with the same approach but a more comprehensive test suite (covers both pre-rotation preservation and post-rotation delta append) and cleaner accumulation logic (canonicalChunks/canonicalByteCount with an explicit byte limit).

Closing this to avoid merging the same fix twice. If you prefer this implementation, reopen and we'll close #1915 instead.

coderabbitai Bot suggested changes Jun 22, 2026

fix(sandbox): stdout accumulation avoids mid-stream window after kube…

b68a6ed

…let log rotation (#1850)

larryro force-pushed the tale/wx748par7e2nkqexhkjk9ygtch895z5b branch from 5a7d9e0 to b68a6ed Compare June 22, 2026 02:38

larryro closed this Jun 24, 2026

larryro wants to merge 1 commit into main from tale/wx748par7e2nkqexhkjk9ygtch895z5b
