fix: adapt code interpreter warm pool to agent-sandbox v0.4.6 #387

Open
ranxi2001 wants to merge 1 commit into volcano-sh:main from ranxi2001:feat/agent-sandbox-latest
@ranxi2001

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adapts AgentCube's CodeInterpreter warm pool integration to sigs.k8s.io/agent-sandbox v0.4.6.

AgentCube currently depends on agent-sandbox v0.1.1. With newer agent-sandbox releases, a direct dependency bump no longer works:

  • compilation fails because SandboxPodNameAnnotation moved out of the internal controllers package and is now exposed from the public sandbox API package;
  • warm pool adoption changed from the old direct/bare Pod shape to SandboxWarmPool -> Sandbox -> Pod, and SandboxClaim reports the serving Sandbox through status;
  • after the compile fix, runtime creation can still time out because AgentCube was waiting/probing by the claim/template Sandbox name instead of the adopted Sandbox name;
  • agent-sandbox v0.4.6 defaults SandboxTemplate.networkPolicyManagement to managed NetworkPolicy, which blocks AgentCube's current Router / WorkloadManager data path unless AgentCube opts out or provides matching allow rules.

This PR:

  • bumps sigs.k8s.io/agent-sandbox to v0.4.6 and updates the required Go / Kubernetes dependencies;
  • uses the public sandboxv1alpha1.SandboxPodNameAnnotation constant instead of importing an internal agent-sandbox controller package;
  • waits for SandboxClaim.Status.SandboxStatus.Name, fetches the adopted Sandbox, waits until it is Ready, and uses that Sandbox for Pod IP / entrypoint probing;
  • keeps the SandboxClaim name in the AgentCube session store when the request kind is SandboxClaim, so delete / GC still operate on the claim resource;
  • sets CodeInterpreter-created SandboxTemplate objects to networkPolicyManagement: Unmanaged to preserve the existing AgentCube Router / WorkloadManager traffic path;
  • updates warm pool e2e discovery to support both the old direct Pod ownership shape and the newer SandboxWarmPool -> Sandbox -> Pod ownership shape;
  • regenerates the CRD and client-go code with the matching Kubernetes v0.35.4 generator stack;
  • updates hack/update-codegen.sh so code generation uses k8s.io/code-generator v0.35.4 without mutating project dependencies through go get -d;
  • updates Docker builder images to Go 1.26.2.
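
The claim-adoption flow in the list above (wait for the claim to name its adopted Sandbox, wait for that Sandbox to be Ready, probe through it, but keep the claim name in the session store) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: `SandboxClaim`, `Sandbox`, `Getter`, `fakeGetter`, `waitForAdoptedSandbox`, and `storedName` are hypothetical stand-ins, and `AdoptedSandboxName` stands in for `SandboxClaim.Status.SandboxStatus.Name`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Hypothetical, trimmed-down stand-ins for the agent-sandbox API types.
type SandboxClaim struct {
	Name               string
	AdoptedSandboxName string // stands in for Status.SandboxStatus.Name
}

type Sandbox struct {
	Name  string
	Ready bool
}

// Getter is a hypothetical lookup interface used only for this sketch.
type Getter interface {
	GetClaim(ctx context.Context, name string) (*SandboxClaim, error)
	GetSandbox(ctx context.Context, name string) (*Sandbox, error)
}

// waitForAdoptedSandbox polls until the claim names an adopted Sandbox
// and that Sandbox reports Ready, or until ctx is done.
func waitForAdoptedSandbox(ctx context.Context, g Getter, claimName string, interval time.Duration) (*Sandbox, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		claim, err := g.GetClaim(ctx, claimName)
		if err != nil {
			return nil, err
		}
		if claim.AdoptedSandboxName != "" {
			sb, sbErr := g.GetSandbox(ctx, claim.AdoptedSandboxName)
			if sbErr != nil {
				return nil, sbErr
			}
			if sb.Ready {
				return sb, nil
			}
		}
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("waiting for claim %q: %w", claimName, ctx.Err())
		case <-ticker.C:
		}
	}
}

// storedName picks the name kept in the session store: the claim name for
// SandboxClaim requests (so delete / GC target the claim), else the Sandbox name.
func storedName(kind, requestName, sandboxName string) string {
	if kind == "SandboxClaim" {
		return requestName
	}
	return sandboxName
}

// fakeGetter simulates a claim that is adopted on the third lookup and a
// Sandbox that becomes Ready on the fourth.
type fakeGetter struct{ calls int }

func (f *fakeGetter) GetClaim(_ context.Context, name string) (*SandboxClaim, error) {
	f.calls++
	if f.calls < 3 {
		return &SandboxClaim{Name: name}, nil
	}
	return &SandboxClaim{Name: name, AdoptedSandboxName: "pool-sandbox-abc"}, nil
}

func (f *fakeGetter) GetSandbox(_ context.Context, name string) (*Sandbox, error) {
	return &Sandbox{Name: name, Ready: f.calls >= 4}, nil
}

// runExample returns the Sandbox name to probe and the name to store.
func runExample() (probeName, stored string, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	sb, err := waitForAdoptedSandbox(ctx, &fakeGetter{}, "my-claim", time.Millisecond)
	if err != nil {
		return "", "", err
	}
	return sb.Name, storedName("SandboxClaim", "my-claim", sb.Name), nil
}

func main() {
	probe, stored, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("probe via:", probe)
	fmt.Println("store as:", stored)
}
```

The point of the split is that the name used for Pod IP / entrypoint probing and the name used for lifecycle operations are no longer the same once a warm pool Sandbox is adopted.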

Which issue(s) this PR fixes:

Refs #386

Special notes for your reviewer:

  • Target version: agent-sandbox v0.4.6, which is the current Go module @latest. I did not target v0.5.0rc1 because Go resolves that tag to a pseudo-version rather than the stable latest release.
  • Dependency impact: agent-sandbox v0.4.6 requires Go 1.26.2 and pulls Kubernetes / controller-runtime forward. Please confirm this dependency bump is acceptable for the v0.2.0 compatibility work.
  • NetworkPolicy compatibility: networkPolicyManagement: Unmanaged keeps the behavior AgentCube had with agent-sandbox v0.1.1. If reviewers prefer to keep agent-sandbox managed NetworkPolicies, the alternative is to add explicit allow rules for agentcube-router and workloadmanager.
  • Generated code: CRD manifest and client-go changes are generated from the dependency / Kubernetes OpenAPI update; this PR does not intentionally change the AgentRuntime API surface. hack/update-codegen.sh was also aligned to the new Kubernetes minor version because the old code-generator v0.34.1 path downgraded agent-sandbox during generation.
  • Local kind validation is blocked on this host during cluster creation before AgentCube resources are installed. Runtime validation below used an existing k3s cluster.
  • AI assistance: I used Codex to inspect the agent-sandbox API changes, implement focused tests, run local / k3s validation, and prepare this PR description. I reviewed and validated the changes.
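
As a rough illustration of the NetworkPolicy note above, the opt-out amounts to forcing one field on every CodeInterpreter-created template and updating existing templates only when the value differs. The sketch below uses hypothetical local types (`SandboxTemplateSpec`, `NetworkPolicyManagement`, `ensureUnmanaged`); the `Managed` constant and the treatment of an empty value as the managed default are assumptions made for the example, not confirmed agent-sandbox API details.

```go
package main

import "fmt"

// NetworkPolicyManagement is a hypothetical local mirror of the template field.
type NetworkPolicyManagement string

const (
	// Managed is an assumed spelling of the default mode.
	Managed NetworkPolicyManagement = "Managed"
	// Unmanaged matches the value named in this PR description.
	Unmanaged NetworkPolicyManagement = "Unmanaged"
)

// SandboxTemplateSpec is a trimmed-down, hypothetical stand-in.
type SandboxTemplateSpec struct {
	NetworkPolicyManagement NetworkPolicyManagement
}

// ensureUnmanaged forces the opt-out and reports whether the spec changed,
// so a caller can skip the update call when nothing needs to be written.
func ensureUnmanaged(spec *SandboxTemplateSpec) bool {
	if spec.NetworkPolicyManagement == Unmanaged {
		return false
	}
	spec.NetworkPolicyManagement = Unmanaged
	return true
}

func main() {
	existing := SandboxTemplateSpec{} // empty: assumed to fall back to the managed default
	fmt.Println("needs update:", ensureUnmanaged(&existing))
	fmt.Println("needs update again:", ensureUnmanaged(&existing))
}
```

Returning a "changed" flag keeps the reconcile path idempotent: templates that are already unmanaged are left alone.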

Tests run:

  • git diff --check
  • /tmp/go-toolchain/go/bin/go test ./pkg/workloadmanager -count=1
  • /tmp/go-toolchain/go/bin/go test -race ./pkg/workloadmanager -count=1
  • /tmp/go-toolchain/go/bin/go list ./... | rg -v '^github.com/volcano-sh/agentcube/test/e2e$' | xargs /tmp/go-toolchain/go/bin/go test
  • PATH=/tmp/go-toolchain/go/bin:$PATH make gen-all
  • PATH=/tmp/go-toolchain/go/bin:$PATH make gen-check
  • PATH=/tmp/go-toolchain/go/bin:$PATH make build-all
  • PATH=/tmp/go-toolchain/go/bin:$PATH make helm-template
  • PATH=/tmp/go-toolchain/go/bin:$PATH make helm-lint
  • make docker-build WORKLOAD_MANAGER_IMAGE=workloadmanager:day16-v046
  • make docker-build-router ROUTER_IMAGE=agentcube-router:day16-v046
  • make docker-build-picod PICOD_IMAGE=picod:day16-v046
  • after commit 5316358: make docker-build WORKLOAD_MANAGER_IMAGE=workloadmanager:day16-v046-final
  • after commit 5316358: make docker-build-router ROUTER_IMAGE=agentcube-router:day16-v046-final
  • after commit 5316358: make docker-build-picod PICOD_IMAGE=picod:day16-v046-final
  • k3s: go test ./test/e2e -run 'TestCodeInterpreter(BasicInvocation|FileOperations)$' -count=1 passed
  • k3s: go test ./test/e2e -run 'TestCodeInterpreterWarmPool$' -count=1 passed
  • k3s: go test ./test/e2e -run 'TestCodeInterpreterWarmPoolLoad$' -count=1 passed with 100 / 100 successful requests
  • Python e2e with a Python 3.11 venv:
    • test/e2e/test_codeinterpreter.py: 3 tests OK
    • test/e2e/test_langchain_agentcube_sandbox.py: 4 tests OK
    • test/e2e/test_mcp_code_interpreter.py: 5 tests OK
    • test/e2e/test_mcp_code_interpreter_stdio.py: 1 test OK
  • math-agent live LLM e2e with an OpenAI-compatible /v1 endpoint returned the expected final answer 42

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: ranxi2001 <ranxi169@163.com>
Copilot AI review requested due to automatic review settings June 17, 2026 12:32
@volcano-sh-bot volcano-sh-bot added the kind/bug Something isn't working label Jun 17, 2026
@volcano-sh-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates Kubernetes/agent-sandbox dependencies and adjusts workload manager + e2e logic to support newer warm-pool controller ownership patterns and SandboxClaim adoption, while also refreshing build/codegen tooling.

Changes:

  • Refactor warm-pool pod counting/ready checks to support both direct Pod ownership and SandboxWarmPool → Sandbox → Pod ownership.
  • Update sandbox creation flow to handle SandboxClaim adoption by polling claim/sandbox readiness and storing claim identity correctly.
  • Bump Kubernetes/tooling versions, regenerate CRDs, and update Docker build images.

Reviewed changes

Copilot reviewed 13 out of 18 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| test/e2e/e2e_test.go | Refactors warm-pool pod discovery to handle new/old ownership chains and reuses shared listing logic. |
| pkg/workloadmanager/sandbox_helper.go | Adds CreatedAt to placeholder sandbox store entries. |
| pkg/workloadmanager/k8s_client.go | Adds dynamic-client getters for Sandbox and SandboxClaim. |
| pkg/workloadmanager/handlers_test.go | Updates annotation constant usage and adds coverage for SandboxClaim-adoption behavior. |
| pkg/workloadmanager/handlers.go | Splits “wait for ready” paths for direct sandboxes vs claim-adopted sandboxes; fixes stored identity for claims. |
| pkg/workloadmanager/codeinterpreter_controller_test.go | Adds tests ensuring SandboxTemplate network policy management is set to unmanaged. |
| pkg/workloadmanager/codeinterpreter_controller.go | Forces SandboxTemplate NetworkPolicyManagement to unmanaged and updates existing templates accordingly. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml | Regenerated CRD schema changes (likely from dependency/codegen bump). |
| hack/update-codegen.sh | Updates code-generator version and changes module discovery approach. |
| go.mod | Updates Go version and bumps k8s/controller-runtime/agent-sandbox deps. |
| go.sum | Updates module sums for the dependency/tooling bumps. |
| docker/Dockerfile.router | Updates Go builder image version. |
| docker/Dockerfile.picod | Updates Go builder image and hardens apt install layer. |
| docker/Dockerfile | Updates Go builder image version. |
Files not reviewed (4)
  • client-go/clientset/versioned/fake/clientset_generated.go: Generated file
  • client-go/informers/externalversions/factory.go: Generated file
  • client-go/informers/externalversions/runtime/v1alpha1/agentruntime.go: Generated file
  • client-go/informers/externalversions/runtime/v1alpha1/codeinterpreter.go: Generated file
Comments suppressed due to low confidence (1)

pkg/workloadmanager/sandbox_helper.go:60

  • When CreationTimestamp is zero, createdAt is set from time.Now(), but expiresAt uses a separate time.Now() call. To keep timestamps consistent (and avoid tiny negative/odd deltas in tests/metrics), compute the default expiry from createdAt (e.g., createdAt.Add(DefaultSandboxTTL)).
	createdAt := sandboxCR.GetCreationTimestamp().Time
	if createdAt.IsZero() {
		createdAt = time.Now()
	}
	var expiresAt time.Time
	if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
		expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
	} else {
		expiresAt = time.Now().Add(DefaultSandboxTTL)
	}
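
One way to apply the suggestion in this comment is to derive both timestamps from a single clock reading, so the default expiry is always exactly one TTL after `createdAt`. The sketch below is standalone and hypothetical: `sandboxTimestamps` and `defaultLifetime` are illustrative names, the `DefaultSandboxTTL` value is a placeholder, and the plain `time.Time` parameters stand in for the fields read from `sandboxCR`.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultSandboxTTL is a placeholder value for this sketch only.
const DefaultSandboxTTL = 8 * time.Hour

// sandboxTimestamps derives createdAt and expiresAt from one "now" reading.
// creation stands in for the CR creation timestamp, shutdown for the
// optional Spec.Lifecycle.ShutdownTime.
func sandboxTimestamps(creation time.Time, shutdown *time.Time, now time.Time) (createdAt, expiresAt time.Time) {
	createdAt = creation
	if createdAt.IsZero() {
		createdAt = now
	}
	if shutdown != nil {
		return createdAt, *shutdown
	}
	// Default expiry is computed from createdAt, not from a second clock read.
	return createdAt, createdAt.Add(DefaultSandboxTTL)
}

// defaultLifetime returns expiresAt - createdAt when neither a creation
// timestamp nor a shutdown time is set.
func defaultLifetime() time.Duration {
	createdAt, expiresAt := sandboxTimestamps(time.Time{}, nil, time.Now())
	return expiresAt.Sub(createdAt)
}

func main() {
	fmt.Println("default lifetime equals TTL:", defaultLifetime() == DefaultSandboxTTL)
}
```

Note that when the creation timestamp is already set, this anchors the default expiry to creation time rather than to the current time, which differs from the snippet above; whether that is desired is a design decision for the PR author.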

	return nil
}

func (s *Server) waitForDirectSandboxReady(ctx context.Context, sandbox *sandboxv1alpha1.Sandbox, resultChan <-chan SandboxStatusUpdate) (*sandboxv1alpha1.Sandbox, error) {

Comment on lines +206 to +207
	select {
	case result := <-resultChan:

Comment on lines +296 to +299
	// if warmpool is used, the pod name is stored in sandbox's annotation `agents.x-k8s.io/pod-name`
	sandboxNameForPod := createdSandbox.Name
	sandboxPodName := createdSandbox.Name
	if podName, exists := createdSandbox.Annotations[sandboxv1alpha1.SandboxPodNameAnnotation]; exists {

Comment thread test/e2e/e2e_test.go
Comment on lines +1220 to +1221
	warmPoolSandboxes := make(map[string]struct{}, len(sandboxList.Items))
	for _, sandbox := range sandboxList.Items {

Comment thread test/e2e/e2e_test.go
Comment on lines +1225 to +1227
		for _, owner := range sandbox.OwnerReferences {
			if owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName {
				warmPoolSandboxes[sandbox.Name] = struct{}{}

Comment thread test/e2e/e2e_test.go
Comment on lines +1245 to +1247
			_, ownedByWarmPoolSandbox := warmPoolSandboxes[owner.Name]
			if (owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName) ||
				(owner.Kind == "Sandbox" && ownedByWarmPoolSandbox) {

Comment on lines +111 to +119
func (f *recordingStore) UpdateSandbox(ctx context.Context, sandbox *types.SandboxInfo) error {
	f.fakeStore.UpdateSandbox(ctx, sandbox)
	if f.updateErr != nil {
		return f.updateErr
	}
	copied := *sandbox
	f.lastUpdated = &copied
	return nil
}
Labels

kind/bug Something isn't working size/XL

3 participants