
fix(sandbox/k8s): accumulate canonical stdout from polled deltas to avoid mid-stream window after log rotation #1917

Closed

larryro wants to merge 1 commit into main from tale/wx748par7e2nkqexhkjk9ygtch895z5b

larryro commented Jun 22, 2026 (Collaborator)

Summary

Fixes #1850.

When kubelet rotates a container log (at containerLogMaxSize, default 10Mi), the Kubernetes log API serves only the current (new) file. The previous implementation made a final readPodLog call to get the canonical stdout; after rotation, this returned content from the middle of the stream (the new file's head), not the deterministic first stdoutMaxBytes bytes.

Root cause: pollRunnerStdout accumulated deltas into the scanner for live progress callbacks, but the canonical stdout at the end was taken from a fresh full readPodLog call that bypassed the spawner-side accumulation.

Fix: Accumulate the canonical stdout spawner-side into logBuf from the same polled deltas, capped at stdoutMaxBytes bytes (using capText). After rotation (logs.length < lastLogLen), logBuf already holds the pre-rotation head; we reset lastLogLen = 0 to continue accumulating from the new file within the remaining cap. The final readPodLog call for canonical stdout is removed — logBuf is used directly.

This is the "accumulate canonical stdout spawner-side from the polled deltas" direction described in the issue, analogous to docker's drainAndCap approach.
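The fix can be modelled in a few lines. This is a minimal TypeScript sketch under stated assumptions, not the PR's code: `capToBytes` is a hypothetical stand-in for capText (assumed to cap at a UTF-8 byte budget and report whether it truncated), `PollState` and `applyPoll` are illustrative names, and on a shrink the cursor is reset without consuming that poll's content. The `logBufFull` flag is an addition in this sketch: it stops appending once the cap truncates, so the buffer stays a prefix of the stream even when truncation backs off a split character.

```typescript
// Hypothetical stand-in for the repo's capText: keep at most `maxBytes`
// UTF-8 bytes without splitting a multi-byte character, and report truncation.
function capToBytes(s: string, maxBytes: number): { text: string; truncated: boolean } {
  const bytes = new TextEncoder().encode(s);
  if (bytes.length <= maxBytes) return { text: s, truncated: false };
  let end = maxBytes;
  // UTF-8 continuation bytes look like 10xxxxxx; back off to a character start.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return { text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true };
}

interface PollState {
  logBuf: string; // canonical stdout, grown only by appended deltas
  logBufFull: boolean; // set once the cap truncates, so logBuf stays a stream prefix
  lastLogLen: number; // cursor into the log file the API currently serves
  logShrunk: boolean; // sticky: a rotation was observed at least once
}

// One poll: `logs` is what the log API returned this time. Returns the
// delta that would be fed to the live-progress scanner.
function applyPoll(state: PollState, logs: string, stdoutMaxBytes: number): string {
  if (logs.length < state.lastLogLen) {
    // Rotation: logBuf already holds the pre-rotation head. Restart the
    // cursor; the new file is read from offset 0 on the next poll.
    state.logShrunk = true;
    state.lastLogLen = 0;
    return '';
  }
  const delta = logs.slice(state.lastLogLen);
  state.lastLogLen = logs.length;
  if (delta !== '' && !state.logBufFull) {
    const capped = capToBytes(state.logBuf + delta, stdoutMaxBytes);
    state.logBuf = capped.text;
    state.logBufFull = capped.truncated;
  }
  return delta;
}

// Usage: a log that rotates after "abcdef", with an 8-byte cap.
const state: PollState = { logBuf: '', logBufFull: false, lastLogLen: 0, logShrunk: false };
for (const logs of ['abc', 'abcdef', 'xy', 'xyz']) applyPoll(state, logs, 8);
console.log(state.logBuf); // abcdefxy (pre-rotation head, then the new file)
// A final full re-read after rotation would have returned only "xyz".
```

The final stdout is built only from deltas the spawner has already seen, so a rotation can no longer shift the window that ends up in the result.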

Changes

  • services/sandbox/src/backend/kubernetes/k8s-backend.ts: Add logBuf accumulation in pollRunnerStdout; reset lastLogLen = 0 on rotation; replace the final readPodLog call with const stdout = logBuf.
  • services/sandbox/src/backend/kubernetes/k8s-backend.test.ts: Add stdout log-rotation pinning test that simulates log shrinkage and verifies the canonical stdout is the pre-rotation head (not the rotated window).

Test plan

  • bun run --filter @tale/sandbox test — 301 pass, 3 pre-existing failures (all due to @kubernetes/client-node not installed in the test environment, unrelated to this change)
  • New unit test stdout log-rotation pinning added to k8s-backend.test.ts covering the rotation scenario
  • Logic verified: with the bug, final stdout would be the rotated file content; with the fix it is the pre-rotation head

Summary by CodeRabbit

  • New Features

    • Added test coverage for stdout handling during Kubernetes pod log rotation events.
  • Bug Fixes

    • Fixed stdout returning a mid-stream window after Kubernetes log rotation. Stdout is now accumulated in a persistent buffer across rotation boundaries, so the deterministic head of the stream (up to stdoutMaxBytes) is preserved.

coderabbitai Bot commented Jun 22, 2026 (Contributor)

📝 Walkthrough

The K8s backend's stdout collection is refactored from a final full pod-log re-read to a spawner-side delta accumulation strategy. A logBuf string is introduced inside execute() to retain the deterministic stream head. pollRunnerStdout() now slices only new content via lastLogLen, appends each delta into the capped logBuf, and resets lastLogLen to zero on log shrinkage (kubelet rotation) while keeping the pre-rotation content in logBuf. The final full readPodLog call is removed and replaced by direct use of logBuf. Test stubs are updated to forward a { container } parameter, and a new test suite pins the pre-rotation head behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 passed

  • Title check: ✅ Passed. The title accurately describes the main change: adding spawner-side canonical stdout accumulation from polled deltas to preserve content across log rotation events.
  • Description check: ✅ Passed. The description follows the template structure with a clear summary linking issue #1850, explains root cause and fix, lists specific file changes, and provides a test plan showing 301 passing tests with new rotation test coverage.
  • Linked Issues check: ✅ Passed. The PR fully implements the first proposed solution from #1850: accumulating canonical stdout spawner-side from polled deltas (capped at stdoutMaxBytes via capText), detecting rotation via logShrunk, and using logBuf directly instead of re-reading.
  • Out of Scope Changes check: ✅ Passed. All changes align with fixing #1850: logBuf accumulation in pollRunnerStdout, lastLogLen reset on rotation, removal of final readPodLog call, and the new rotation-pinning test are scoped to the stdout reconstruction bug.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/sandbox/src/backend/kubernetes/k8s-backend.test.ts`:
- Around line 372-403: Remove the inlined client object fixture that contains
type assertions (the `as V1Pod` and `as unknown as CoreV1Api` assertions) and
refactor it to use the existing `stubClient()` helper function with properly
typed fixtures instead. Look for other examples of `stubClient()` usage
elsewhere in the test file to follow the same pattern. This approach will
eliminate the unsafe type assertions while maintaining the same test behavior
and keeping all test stubs assertion-free according to the project's guidelines.

In `@services/sandbox/src/backend/kubernetes/k8s-backend.ts`:
- Around line 555-562: In the final stdout handling block around
pollRunnerStdout and scanner.finalize, add a check after the final poll to
detect if the logs have shrunk (logs.length < lastLogLen indicating a potential
rotation). When a shrink is detected and logBuf is still below capacity, perform
one additional poll or read operation to capture any newly available
post-rotation bytes before calling scanner.finalize(). This ensures that
immediately available bytes after log rotation are not missed when feeding
residual data to the scanner.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bc5ae332-3494-456c-aafd-0f9968a7bb89

📥 Commits

Reviewing files that changed from the base of the PR and between 5946d86 and 5a7d9e0.

📒 Files selected for processing (2)
  • services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
  • services/sandbox/src/backend/kubernetes/k8s-backend.ts

Comment on lines +372 to +403
    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub
    const client = {
      namespace: 'tale-sandbox',
      core: {
        createNamespacedSecret: () => Promise.resolve({}),
        createNamespacedPod: () => Promise.resolve({}),
        readNamespacedPod: () => {
          podCallIdx += 1;
          if (podCallIdx === 1) {
            // waitForRunnerStart: runner container is Running
            return Promise.resolve({
              metadata: {},
              status: {
                containerStatuses: [
                  {
                    name: 'runner',
                    state: { running: { startedAt: new Date() } },
                  },
                ],
              },
            } as V1Pod);
          }
          // Main loop: pod Succeeded with harvest terminated
          return Promise.resolve({
            metadata: {},
            status: {
              phase: 'Succeeded',
              containerStatuses: [
                { name: 'harvest', state: { terminated: { exitCode: 0 } } },
              ],
            },
          } as V1Pod);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and examine the flagged lines
if [ -f "services/sandbox/src/backend/kubernetes/k8s-backend.test.ts" ]; then
  echo "=== File exists, checking line count ==="
  wc -l services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
  echo ""
  echo "=== Lines 360-410 (context around 372-403) ==="
  sed -n '360,410p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n
else
  echo "File not found at specified path"
fi

Repository: tale-project/tale

Length of output: 2253


🏁 Script executed:

# Search for stubClient helper in the codebase
echo "=== Searching for stubClient function ==="
rg "stubClient" --type ts --type tsx

Repository: tale-project/tale

Length of output: 149


🏁 Script executed:

# Search for other test fixtures in the same test file
echo "=== Searching for fixture patterns in k8s-backend.test.ts ==="
rg "const.*=.*\{" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -20

Repository: tale-project/tale

Length of output: 884


🏁 Script executed:

# Look for stubClient implementation and usage in the test file
echo "=== Searching for stubClient usage context ==="
rg -A 10 "const stub = stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -50

Repository: tale-project/tale

Length of output: 1529


🏁 Script executed:

# Check line 418 for secondary issue
echo "=== Lines 410-425 to check line 418 ==="
sed -n '410,425p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 890


🏁 Script executed:

# Find the stubClient definition/import
echo "=== Looking for stubClient import or definition ==="
head -50 services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 2159


🏁 Script executed:

# Search for stubClient definition
echo "=== Looking for stubClient definition ==="
rg "function stubClient|const stubClient|export.*stubClient" services/sandbox/src/backend/kubernetes/ -A 15

Repository: tale-project/tale

Length of output: 1837


🏁 Script executed:

# Also check if it's imported or defined in the test file
echo "=== Full search for stubClient in k8s-backend.test.ts ==="
rg -n "stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts

Repository: tale-project/tale

Length of output: 280


🏁 Script executed:

# Look for pod() and secret() helper definitions
echo "=== Looking for pod() and secret() helper definitions ==="
rg -B 5 "function pod|const pod.*=" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40

Repository: tale-project/tale

Length of output: 234


🏁 Script executed:

# Get the full stubClient definition
echo "=== Full stubClient definition ==="
sed -n '227,266p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n

Repository: tale-project/tale

Length of output: 1962


🏁 Script executed:

# Get the pod() helper definition
echo "=== pod() helper definition ==="
rg -A 20 "^function pod\(" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40

Repository: tale-project/tale

Length of output: 722


Remove type assertions from the test client fixture.

The inlined client object at lines 372-403 uses as V1Pod (twice) and as unknown as CoreV1Api assertions, which violate the coding guideline: **/*.{ts,tsx} — "Never as, never any, never unknown".

Instead of inlining the client, use the existing stubClient() helper with typed fixtures. This pattern is already used throughout the file and keeps test stubs assertion-free:

Refactor to stubClient with typed fixtures
-    // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub
-    const client = {
-      namespace: 'tale-sandbox',
-      core: {
-        createNamespacedSecret: () => Promise.resolve({}),
-        createNamespacedPod: () => Promise.resolve({}),
-        readNamespacedPod: () => {
-          podCallIdx += 1;
-          if (podCallIdx === 1) {
-            return Promise.resolve({
-              metadata: {},
-              status: {
-                containerStatuses: [
-                  {
-                    name: 'runner',
-                    state: { running: { startedAt: new Date() } },
-                  },
-                ],
-              },
-            } as V1Pod);
-          }
-          return Promise.resolve({
-            metadata: {},
-            status: {
-              phase: 'Succeeded',
-              containerStatuses: [
-                { name: 'harvest', state: { terminated: { exitCode: 0 } } },
-              ],
-            },
-          } as V1Pod);
-        },
-        readNamespacedPodLog: ({ container }: { container: string }) => {
-          if (container === 'harvest') return Promise.resolve(harvestLog);
-          const log =
-            runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
-          runnerLogIdx += 1;
-          return Promise.resolve(log ?? '');
-        },
-        replaceNamespacedSecret: () => Promise.resolve({}),
-        deleteNamespacedPod: () => Promise.resolve({}),
-        deleteNamespacedSecret: () => Promise.resolve({}),
-        listNamespacedPod: () => Promise.resolve({ items: [] }),
-        listNamespacedSecret: () => Promise.resolve({ items: [] }),
-      } as unknown as CoreV1Api,
-    };
+    const startedPod: V1Pod = {
+      metadata: {},
+      status: {
+        containerStatuses: [
+          { name: 'runner', state: { running: { startedAt: new Date() } } },
+        ],
+      },
+    };
+    const succeededPod: V1Pod = {
+      metadata: {},
+      status: {
+        phase: 'Succeeded',
+        containerStatuses: [
+          { name: 'harvest', state: { terminated: { exitCode: 0 } } },
+        ],
+      },
+    };
+
+    const { core, namespace } = stubClient({
+      readPod: () => {
+        podCallIdx += 1;
+        return Promise.resolve(podCallIdx === 1 ? startedPod : succeededPod);
+      },
+      readLog: ({ container }) => {
+        if (container === 'harvest') return Promise.resolve(harvestLog);
+        const log =
+          runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
+        runnerLogIdx += 1;
+        return Promise.resolve(log ?? '');
+      },
+    });
+    const client = { namespace, core };

Source: Coding guidelines

Comment on lines +555 to +562
       // Final stdout poll (the runner may have emitted more between the last
       // loop iteration and exit) → feed the residual to the scanner, then
       // drain it. Use the spawner-side accumulation (logBuf) as the canonical
       // stdout — it always holds the deterministic head of the stream even
       // across kubelet log rotations.
       await pollRunnerStdout();
       scanner.finalize();
-      let stdout = '';
-      try {
-        stdout = await readPodLog(this.client, podName, 'runner', {
-          limitBytes: cfg.stdoutMaxBytes,
-        });
-      } catch (err) {
-        console.warn('[sandbox.k8s] final runner log read failed:', err);
-      }
+      const stdout = logBuf;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle final-iteration log shrink with one extra read before finalize.

If the final poll only observes logs.length < lastLogLen, the cursor is reset but no follow-up read happens before scanner.finalize(). That can miss immediately available post-rotation bytes when logBuf is still below cap.

💡 Suggested patch
       await pollRunnerStdout();
+      if (
+        logShrunk &&
+        Buffer.byteLength(logBuf, 'utf8') < cfg.stdoutMaxBytes
+      ) {
+        // If shrink is first detected in the final poll, do one extra read so
+        // we can capture the new file head before finalizing.
+        await pollRunnerStdout();
+      }
       scanner.finalize();
       const stdout = logBuf;
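A hypothetical model of the scenario this comment describes (illustrative names, byte cap omitted, not the PR's code): a shrink only resets the cursor, so if it is first seen in the final poll, nothing from the new file is read unless one more poll follows. The sketch uses a per-poll `shrankThisPoll` flag. If the PR's `logShrunk` is sticky (it also feeds stdoutStreamTruncated, which suggests it is), the suggested guard would additionally fire after an earlier rotation, which should only cost one extra poll.

```typescript
// Hypothetical cursor model (byte cap omitted): a shrink resets the cursor
// and consumes nothing during that poll.
interface Cursor {
  buf: string;
  lastLen: number;
}

function poll(c: Cursor, logs: string): { shrankThisPoll: boolean } {
  if (logs.length < c.lastLen) {
    c.lastLen = 0; // rotation: read the new file from offset 0 next time
    return { shrankThisPoll: true };
  }
  c.buf += logs.slice(c.lastLen);
  c.lastLen = logs.length;
  return { shrankThisPoll: false };
}

// The last in-loop poll saw "abcdef"; the file then rotated, and the new
// file already holds "xy" when the final poll runs.
function finalStdout(extraRead: boolean): string {
  const c: Cursor = { buf: '', lastLen: 0 };
  poll(c, 'abcdef'); // last in-loop poll
  const { shrankThisPoll } = poll(c, 'xy'); // final poll: only resets the cursor
  if (extraRead && shrankThisPoll) poll(c, 'xy'); // one follow-up read
  return c.buf;
}

console.log(finalStdout(false)); // abcdef (post-rotation bytes never read)
console.log(finalStdout(true)); // abcdefxy
```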

larryro force-pushed the tale/wx748par7e2nkqexhkjk9ygtch895z5b branch from 5a7d9e0 to b68a6ed on June 22, 2026 02:38
larryro commented Jun 22, 2026 (Collaborator, Author)

Desk Review — PR #1917

VERDICT: READY TO MERGE

CI: all required checks are green (Analyze, Browser, Build ×8, Opengrep, Smoke test, Unit, Lint commits, Validate images, Migrations, UI, Scan sandbox all pass; 5 jobs are correctly skipped as fork-only or not applicable). No red or pending checks.

Tests: bun run --filter @tale/sandbox test — 301 pass, 3 fail. The 3 failures are pre-existing and unrelated to this change (the three @kubernetes/client-node tests that fail because the module is absent from the sandbox CI test environment; all 3 identical failures exist on main). The CI Unit check is green in the proper environment.


What the fix does

pollRunnerStdout now accumulates an incremental logBuf from each polled delta, capped at stdoutMaxBytes via capText. On a kubelet log rotation (logs.length < lastLogLen), logBuf already holds the deterministic pre-rotation head; the code resets lastLogLen = 0 so the new file's content continues accumulating into the remaining cap. The final readPodLog('runner') call is replaced with const stdout = logBuf.

The PR also fixes a secondary bug in the old code that went undocumented in the issue: the old else if branch set logShrunk = true but did not reset lastLogLen. On every subsequent poll after rotation the new file starts from zero and grows slowly — always shorter than the pre-rotation lastLogLen — so logs.length < lastLogLen stays true and logs.length > lastLogLen can never fire. scanner.onStdoutChunk would go permanently silent and no new post-rotation content would reach live-progress callbacks. The lastLogLen = 0 reset at line 466 closes this gap.
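The live-tail regression can be reproduced with a hypothetical model of the cursor logic (not the PR's code): `deltasFor` replays a sequence of polled log contents and records the delta each poll would hand to scanner.onStdoutChunk, with `resetOnShrink` toggling the lastLogLen = 0 fix.

```typescript
// Hypothetical model of pollRunnerStdout's cursor; `resetOnShrink` toggles the fix.
function deltasFor(polls: string[], resetOnShrink: boolean): string[] {
  let lastLogLen = 0;
  const deltas: string[] = [];
  for (const logs of polls) {
    if (logs.length > lastLogLen) {
      deltas.push(logs.slice(lastLogLen));
      lastLogLen = logs.length;
    } else {
      // Shrink (rotation) or no growth: nothing new reaches the scanner.
      if (logs.length < lastLogLen && resetOnShrink) lastLogLen = 0;
      deltas.push('');
    }
  }
  return deltas;
}

// The file rotates after "abcdef"; the new file then grows slowly.
const polls = ['abcdef', 'xy', 'xyz', 'xyzw'];
console.log(deltasFor(polls, false)); // [ 'abcdef', '', '', '' ] (live tail goes dark)
console.log(deltasFor(polls, true)); // [ 'abcdef', '', 'xyz', 'w' ]
```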


Findings by dimension

Correctness — no new bugs. Walked all branches:

  • Happy path: delta accumulates correctly into logBuf.
  • Rotation: logShrunk = true + lastLogLen = 0 → correct. Pre-rotation head preserved; new file accumulates within remaining cap.
  • Multi-rotation: each shrink resets cursor; invariant holds.
  • Error path: poll failure leaves logBuf/lastLogLen unchanged; next poll catches up.
  • stdoutStreamTruncated = Buffer.byteLength(stdout) >= stdoutMaxBytes || logShrunk propagates correctly through all four return paths (aborted, runnerDead, harvestDone, harvestMissing).
  • assemble() re-applies capText(stripPhaseMarkers(stripControlChars(logBuf)), stdoutMaxBytes); since logBuf is already capped and stripping can only shorten it, stdoutCapTrunc is always false — no double-truncation issue.

Two pre-existing issues (not introduced by this PR):

  • lastLogLen tracks .length (a count of UTF-16 code units, not characters or bytes) while limitBytes is a byte budget, so the two are incommensurable for multi-byte UTF-8. After rotation, lastLogLen = 0 makes the mismatch slightly more load-bearing, but the practical impact is narrow (it only matters when the post-rotation file immediately exceeds the byte cap between polls with multi-byte content) and a fix is out of scope here.
  • Rotation is silently undetected when the new file happens to be ≥ old length at the moment of polling. Pre-existing limitation of length-based detection; not worsened by this PR.
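Both limitations are easy to reproduce in isolation. The standalone snippet below (`utf8Len` is a helper defined here, not part of the codebase) compares String#length with the UTF-8 byte count, then shows a new file that is already longer than the old one being read as growth instead of a rotation.

```typescript
// UTF-8 byte length of a string; String#length counts UTF-16 code units.
const utf8Len = (s: string): number => new TextEncoder().encode(s).length;

const samples: Array<[string, string]> = [
  ['abc', 'abc'],
  ['U+00E9', '\u00e9'],
  ['U+1F600', '\uD83D\uDE00'], // one emoji, stored as a surrogate pair
];
for (const [label, s] of samples) {
  console.log(label, 'length:', s.length, 'utf8 bytes:', utf8Len(s));
}
// abc length: 3 utf8 bytes: 3
// U+00E9 length: 1 utf8 bytes: 2
// U+1F600 length: 2 utf8 bytes: 4

// Length-based detection misses a rotation when the new file is already at
// least as long as the old one: it looks like growth and its head is skipped.
const lastLogLen = 'aaaaa'.length; // old file, fully consumed
const afterRotation = 'bbbbbbb'; // new file, already longer than the old one
const rotationDetected = afterRotation.length < lastLogLen;
const delta = afterRotation.slice(lastLogLen);
console.log(rotationDetected, delta); // false bb
```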

Tests — one gap, not blocking. The new stdout log-rotation pinning test correctly pins the core scenario; the res.stdoutBase64 / res.truncated.stdout assertions are sound. The stubClient signature change (readLog now receives { container }) is backward-compatible with all existing callers.

Minor coverage gap: the test is structured so rotation is always detected in the final pollRunnerStdout() after the loop, never during an in-loop poll. The post-rotation accumulation branch (lastLogLen = 0 → delta from new file flows into logBuf) is never exercised by the test. The code is correct by inspection, and the existing 301-test suite covers no-rotation accumulation, so this is not blocking — worth a follow-up test if the rotation scenario ever sees regressions.

Elegance — cosmetic only. logBuf is declared two lines below its three sibling poll-state variables, separated by its comment block; the file's convention is to group related let declarations first. The Buffer.byteLength(logBuf, 'utf8') < cfg.stdoutMaxBytes guard re-measures the whole of logBuf on every poll; capturing capText's returned truncated boolean as a logBufFull sentinel would avoid that redundant measurement. Neither item is blocking at 500 ms poll intervals.

Issue resolution — complete. The fix matches the "accumulate canonical stdout spawner-side" direction in #1850. All readPodLog('runner') call sites verified: the only remaining one is the poll inside pollRunnerStdout itself (correct, it must stay). Consistent with the Docker backend's drainAndCap approach.


The implementation is correct, the secondary live-tail-dark regression is a genuine bonus fix, and CI is fully green. Ready to merge.

larryro commented Jun 24, 2026 (Collaborator, Author)

Superseded by #1915, which fixes the same issue (#1850) with the same approach but a more comprehensive test suite (covers both pre-rotation preservation and post-rotation delta append) and cleaner accumulation logic (canonicalChunks/canonicalByteCount with an explicit byte limit).

Closing this to avoid merging the same fix twice. If you prefer this implementation, reopen and we'll close #1915 instead.
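For comparison, a chunk-list accumulator with a running byte count might look like the hypothetical sketch below. It is inferred only from the names canonicalChunks and canonicalByteCount; #1915's actual code is not shown here. Only each new delta is encoded, so no poll re-measures the whole buffer, and accumulation stops at the first truncation so the result stays a prefix of the stream.

```typescript
// Hypothetical chunk-list accumulator; not #1915's actual code.
class CanonicalStdout {
  private readonly chunks: Uint8Array[] = [];
  private byteCount = 0; // running total, replaces re-measuring the buffer
  private full = false;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  append(delta: string): void {
    if (this.full || delta === '') return;
    const bytes = new TextEncoder().encode(delta); // encode only the new delta
    const remaining = this.limit - this.byteCount;
    let end = bytes.length;
    if (end > remaining) {
      end = remaining;
      // Back off so a multi-byte character is never split.
      while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
      this.full = true; // stop here so the result stays a prefix of the stream
    }
    this.chunks.push(bytes.subarray(0, end));
    this.byteCount += end;
  }

  toString(): string {
    const all = new Uint8Array(this.byteCount);
    let offset = 0;
    for (const chunk of this.chunks) {
      all.set(chunk, offset);
      offset += chunk.length;
    }
    return new TextDecoder().decode(all);
  }
}

const out = new CanonicalStdout(8);
for (const delta of ['abc', 'def', 'xyz', 'more']) out.append(delta);
console.log(out.toString()); // abcdefxy
```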

larryro closed this Jun 24, 2026
Development

Successfully merging this pull request may close these issues.

sandbox/k8s: stdout beyond kubelet log rotation returns a mid-stream window
