fix(sandbox/k8s): accumulate canonical stdout from polled deltas to avoid mid-stream window after log rotation#1917
Conversation
📝 WalkthroughWalkthroughThe K8s backend's stdout collection is refactored from a final full pod-log re-read to a spawner-side delta accumulation strategy. A Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/sandbox/src/backend/kubernetes/k8s-backend.test.ts`:
- Around line 372-403: Remove the inlined client object fixture that contains
type assertions (the `as V1Pod` and `as unknown as CoreV1Api` assertions) and
refactor it to use the existing `stubClient()` helper function with properly
typed fixtures instead. Look for other examples of `stubClient()` usage
elsewhere in the test file to follow the same pattern. This approach will
eliminate the unsafe type assertions while maintaining the same test behavior
and keeping all test stubs assertion-free according to the project's guidelines.
In `@services/sandbox/src/backend/kubernetes/k8s-backend.ts`:
- Around line 555-562: In the final stdout handling block around
pollRunnerStdout and scanner.finalize, add a check after the final poll to
detect if the logs have shrunk (logs.length < lastLogLen indicating a potential
rotation). When a shrink is detected and logBuf is still below capacity, perform
one additional poll or read operation to capture any newly available
post-rotation bytes before calling scanner.finalize(). This ensures that
immediately available bytes after log rotation are not missed when feeding
residual data to the scanner.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bc5ae332-3494-456c-aafd-0f9968a7bb89
📒 Files selected for processing (2)
services/sandbox/src/backend/kubernetes/k8s-backend.test.tsservices/sandbox/src/backend/kubernetes/k8s-backend.ts
| // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub | ||
| const client = { | ||
| namespace: 'tale-sandbox', | ||
| core: { | ||
| createNamespacedSecret: () => Promise.resolve({}), | ||
| createNamespacedPod: () => Promise.resolve({}), | ||
| readNamespacedPod: () => { | ||
| podCallIdx += 1; | ||
| if (podCallIdx === 1) { | ||
| // waitForRunnerStart: runner container is Running | ||
| return Promise.resolve({ | ||
| metadata: {}, | ||
| status: { | ||
| containerStatuses: [ | ||
| { | ||
| name: 'runner', | ||
| state: { running: { startedAt: new Date() } }, | ||
| }, | ||
| ], | ||
| }, | ||
| } as V1Pod); | ||
| } | ||
| // Main loop: pod Succeeded with harvest terminated | ||
| return Promise.resolve({ | ||
| metadata: {}, | ||
| status: { | ||
| phase: 'Succeeded', | ||
| containerStatuses: [ | ||
| { name: 'harvest', state: { terminated: { exitCode: 0 } } }, | ||
| ], | ||
| }, | ||
| } as V1Pod); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and examine the flagged lines
if [ -f "services/sandbox/src/backend/kubernetes/k8s-backend.test.ts" ]; then
echo "=== File exists, checking line count ==="
wc -l services/sandbox/src/backend/kubernetes/k8s-backend.test.ts
echo ""
echo "=== Lines 360-410 (context around 372-403) ==="
sed -n '360,410p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -n
else
echo "File not found at specified path"
fiRepository: tale-project/tale
Length of output: 2253
🏁 Script executed:
# Search for stubClient helper in the codebase
echo "=== Searching for stubClient function ==="
rg "stubClient" --type ts --type tsxRepository: tale-project/tale
Length of output: 149
🏁 Script executed:
# Search for other test fixtures in the same test file
echo "=== Searching for fixture patterns in k8s-backend.test.ts ==="
rg "const.*=.*\{" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -20Repository: tale-project/tale
Length of output: 884
🏁 Script executed:
# Look for stubClient implementation and usage in the test file
echo "=== Searching for stubClient usage context ==="
rg -A 10 "const stub = stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -50Repository: tale-project/tale
Length of output: 1529
🏁 Script executed:
# Check line 418 for secondary issue
echo "=== Lines 410-425 to check line 418 ==="
sed -n '410,425p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -nRepository: tale-project/tale
Length of output: 890
🏁 Script executed:
# Find the stubClient definition/import
echo "=== Looking for stubClient import or definition ==="
head -50 services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -nRepository: tale-project/tale
Length of output: 2159
🏁 Script executed:
# Search for stubClient definition
echo "=== Looking for stubClient definition ==="
rg "function stubClient|const stubClient|export.*stubClient" services/sandbox/src/backend/kubernetes/ -A 15Repository: tale-project/tale
Length of output: 1837
🏁 Script executed:
# Also check if it's imported or defined in the test file
echo "=== Full search for stubClient in k8s-backend.test.ts ==="
rg -n "stubClient" services/sandbox/src/backend/kubernetes/k8s-backend.test.tsRepository: tale-project/tale
Length of output: 280
🏁 Script executed:
# Look for pod() and secret() helper definitions
echo "=== Looking for pod() and secret() helper definitions ==="
rg -B 5 "function pod|const pod.*=" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40Repository: tale-project/tale
Length of output: 234
🏁 Script executed:
# Get the full stubClient definition
echo "=== Full stubClient definition ==="
sed -n '227,266p' services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | cat -nRepository: tale-project/tale
Length of output: 1962
🏁 Script executed:
# Get the pod() helper definition
echo "=== pod() helper definition ==="
rg -A 20 "^function pod\(" services/sandbox/src/backend/kubernetes/k8s-backend.test.ts | head -40Repository: tale-project/tale
Length of output: 722
Remove type assertions from the test client fixture.
The inlined client object at lines 372-403 uses as V1Pod (twice) and as unknown as CoreV1Api assertions, which violate the coding guideline: **/*.{ts,tsx} — "Never as, never any, never unknown".
Instead of inlining the client, use the existing stubClient() helper with typed fixtures. This pattern is already used throughout the file and keeps test stubs assertion-free:
Refactor to stubClient with typed fixtures
- // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion -- test stub
- const client = {
- namespace: 'tale-sandbox',
- core: {
- createNamespacedSecret: () => Promise.resolve({}),
- createNamespacedPod: () => Promise.resolve({}),
- readNamespacedPod: () => {
- podCallIdx += 1;
- if (podCallIdx === 1) {
- return Promise.resolve({
- metadata: {},
- status: {
- containerStatuses: [
- {
- name: 'runner',
- state: { running: { startedAt: new Date() } },
- },
- ],
- },
- } as V1Pod);
- }
- return Promise.resolve({
- metadata: {},
- status: {
- phase: 'Succeeded',
- containerStatuses: [
- { name: 'harvest', state: { terminated: { exitCode: 0 } } },
- ],
- },
- } as V1Pod);
- },
- readNamespacedPodLog: ({ container }: { container: string }) => {
- if (container === 'harvest') return Promise.resolve(harvestLog);
- const log =
- runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
- runnerLogIdx += 1;
- return Promise.resolve(log ?? '');
- },
- replaceNamespacedSecret: () => Promise.resolve({}),
- deleteNamespacedPod: () => Promise.resolve({}),
- deleteNamespacedSecret: () => Promise.resolve({}),
- listNamespacedPod: () => Promise.resolve({ items: [] }),
- listNamespacedSecret: () => Promise.resolve({ items: [] }),
- } as unknown as CoreV1Api,
- };
+ const startedPod: V1Pod = {
+ metadata: {},
+ status: {
+ containerStatuses: [
+ { name: 'runner', state: { running: { startedAt: new Date() } } },
+ ],
+ },
+ };
+ const succeededPod: V1Pod = {
+ metadata: {},
+ status: {
+ phase: 'Succeeded',
+ containerStatuses: [
+ { name: 'harvest', state: { terminated: { exitCode: 0 } } },
+ ],
+ },
+ };
+
+ const { core, namespace } = stubClient({
+ readPod: () => {
+ podCallIdx += 1;
+ return Promise.resolve(podCallIdx === 1 ? startedPod : succeededPod);
+ },
+ readLog: ({ container }) => {
+ if (container === 'harvest') return Promise.resolve(harvestLog);
+ const log =
+ runnerLogs[runnerLogIdx] ?? runnerLogs[runnerLogs.length - 1];
+ runnerLogIdx += 1;
+ return Promise.resolve(log ?? '');
+ },
+ });
+ const client = { namespace, core };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/src/backend/kubernetes/k8s-backend.test.ts` around lines 372
- 403, Remove the inlined client object fixture that contains type assertions
(the `as V1Pod` and `as unknown as CoreV1Api` assertions) and refactor it to use
the existing `stubClient()` helper function with properly typed fixtures
instead. Look for other examples of `stubClient()` usage elsewhere in the test
file to follow the same pattern. This approach will eliminate the unsafe type
assertions while maintaining the same test behavior and keeping all test stubs
assertion-free according to the project's guidelines.
Source: Coding guidelines
| // Final stdout poll (the runner may have emitted more between the last | ||
| // loop iteration and exit) → feed the residual to the scanner, then | ||
| // drain it. Use the spawner-side accumulation (logBuf) as the canonical | ||
| // stdout — it always holds the deterministic head of the stream even | ||
| // across kubelet log rotations. | ||
| await pollRunnerStdout(); | ||
| scanner.finalize(); | ||
| let stdout = ''; | ||
| try { | ||
| stdout = await readPodLog(this.client, podName, 'runner', { | ||
| limitBytes: cfg.stdoutMaxBytes, | ||
| }); | ||
| } catch (err) { | ||
| console.warn('[sandbox.k8s] final runner log read failed:', err); | ||
| } | ||
| const stdout = logBuf; |
There was a problem hiding this comment.
Handle final-iteration log shrink with one extra read before finalize.
If the final poll only observes logs.length < lastLogLen, the cursor is reset but no follow-up read happens before scanner.finalize(). That can miss immediately available post-rotation bytes when logBuf is still below cap.
💡 Suggested patch
await pollRunnerStdout();
+ if (
+ logShrunk &&
+ Buffer.byteLength(logBuf, 'utf8') < cfg.stdoutMaxBytes
+ ) {
+ // If shrink is first detected in the final poll, do one extra read so
+ // we can capture the new file head before finalizing.
+ await pollRunnerStdout();
+ }
scanner.finalize();
const stdout = logBuf;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/sandbox/src/backend/kubernetes/k8s-backend.ts` around lines 555 -
562, In the final stdout handling block around pollRunnerStdout and
scanner.finalize, add a check after the final poll to detect if the logs have
shrunk (logs.length < lastLogLen indicating a potential rotation). When a shrink
is detected and logBuf is still below capacity, perform one additional poll or
read operation to capture any newly available post-rotation bytes before calling
scanner.finalize(). This ensures that immediately available bytes after log
rotation are not missed when feeding residual data to the scanner.
…let log rotation (#1850)
5a7d9e0 to
b68a6ed
Compare
Desk Review — PR #1917VERDICT: READY TO MERGE CI: all required checks green (Analyze, Browser, Build ×8, Opengrep, Smoke test, Unit, Lint commits, Validate images, Migrations, UI, Scan sandbox — all pass; 5 jobs correctly skip on fork/non-applicable). No red or pending checks. Tests: What the fix does
The PR also fixes a secondary bug in the old code that went undocumented in the issue: the old Findings by dimensionCorrectness — no new bugs. Walked all branches:
Two pre-existing issues (not introduced by this PR):
Tests — one gap, not blocking. The new Minor coverage gap: the test is structured so rotation is always detected in the final Elegance — cosmetic only. Issue resolution — complete. The fix matches the "accumulate canonical stdout spawner-side" direction in #1850. All The implementation is correct, the secondary live-tail-dark regression is a genuine bonus fix, and CI is fully green. Ready to merge. |
|
Superseded by #1915, which fixes the same issue (#1850) with the same approach but a more comprehensive test suite (covers both pre-rotation preservation and post-rotation delta append) and cleaner accumulation logic ( Closing this to avoid merging the same fix twice. If you prefer this implementation, reopen and we'll close #1915 instead. |
Summary
Fixes #1850.
When kubelet rotates a container log (at
containerLogMaxSize, default 10Mi), the Kubernetes log API serves only the current/new file. The previous implementation did a finalreadPodLogcall to get the canonical stdout — after rotation, this returned content from the middle of the stream (the new file's head), not the deterministic first-stdoutMaxBytesbytes.Root cause:
pollRunnerStdoutaccumulated deltas into thescannerfor live progress callbacks, but the canonicalstdoutat the end was taken from a fresh fullreadPodLogcall that bypassed the spawner-side accumulation.Fix: Accumulate the canonical stdout spawner-side into
logBuffrom the same polled deltas, capped atstdoutMaxBytesbytes (usingcapText). After rotation (logs.length < lastLogLen),logBufalready holds the pre-rotation head; we resetlastLogLen = 0to continue accumulating from the new file within the remaining cap. The finalreadPodLogcall for canonical stdout is removed —logBufis used directly.This is the "accumulate canonical stdout spawner-side from the polled deltas" direction described in the issue, analogous to docker's
drainAndCapapproach.Changes
services/sandbox/src/backend/kubernetes/k8s-backend.ts: AddlogBufaccumulation inpollRunnerStdout; resetlastLogLen = 0on rotation; replace the finalreadPodLogcall withconst stdout = logBuf.services/sandbox/src/backend/kubernetes/k8s-backend.test.ts: Addstdout log-rotation pinningtest that simulates log shrinkage and verifies the canonical stdout is the pre-rotation head (not the rotated window).Test plan
bun run --filter @tale/sandbox test— 301 pass, 3 pre-existing failures (all due to@kubernetes/client-nodenot installed in the test environment, unrelated to this change)stdout log-rotation pinningadded tok8s-backend.test.tscovering the rotation scenarioSummary by CodeRabbit
New Features
Bug Fixes