fix: adapt code interpreter warm pool to agent-sandbox v0.4.6#387
fix: adapt code interpreter warm pool to agent-sandbox v0.4.6#387ranxi2001 wants to merge 1 commit into
Conversation
Signed-off-by: ranxi2001 <ranxi169@163.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates Kubernetes/agent-sandbox dependencies and adjusts workload manager + e2e logic to support newer warm-pool controller ownership patterns and SandboxClaim adoption, while also refreshing build/codegen tooling.
Changes:
- Refactor warm-pool pod counting/ready checks to support both direct Pod ownership and SandboxWarmPool → Sandbox → Pod ownership.
- Update sandbox creation flow to handle SandboxClaim adoption by polling claim/sandbox readiness and storing claim identity correctly.
- Bump Kubernetes/tooling versions, regenerate CRDs, and update Docker build images.
Reviewed changes
Copilot reviewed 13 out of 18 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/e2e_test.go | Refactors warm-pool pod discovery to handle new/old ownership chains and reuses shared listing logic. |
| pkg/workloadmanager/sandbox_helper.go | Adds CreatedAt to placeholder sandbox store entries. |
| pkg/workloadmanager/k8s_client.go | Adds dynamic-client getters for Sandbox and SandboxClaim. |
| pkg/workloadmanager/handlers_test.go | Updates annotation constant usage and adds coverage for SandboxClaim-adoption behavior. |
| pkg/workloadmanager/handlers.go | Splits “wait for ready” paths for direct sandboxes vs claim-adopted sandboxes; fixes stored identity for claims. |
| pkg/workloadmanager/codeinterpreter_controller_test.go | Adds tests ensuring SandboxTemplate network policy management is set to unmanaged. |
| pkg/workloadmanager/codeinterpreter_controller.go | Forces SandboxTemplate NetworkPolicyManagement to unmanaged and updates existing templates accordingly. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml | Regenerated CRD schema changes (likely from dependency/codegen bump). |
| hack/update-codegen.sh | Updates code-generator version and changes module discovery approach. |
| go.mod | Updates Go version and bumps k8s/controller-runtime/agent-sandbox deps. |
| go.sum | Updates module sums for the dependency/tooling bumps. |
| docker/Dockerfile.router | Updates Go builder image version. |
| docker/Dockerfile.picod | Updates Go builder image and hardens apt install layer. |
| docker/Dockerfile | Updates Go builder image version. |
Files not reviewed (4)
- client-go/clientset/versioned/fake/clientset_generated.go: Generated file
- client-go/informers/externalversions/factory.go: Generated file
- client-go/informers/externalversions/runtime/v1alpha1/agentruntime.go: Generated file
- client-go/informers/externalversions/runtime/v1alpha1/codeinterpreter.go: Generated file
Comments suppressed due to low confidence (1)
pkg/workloadmanager/sandbox_helper.go:60
- When
CreationTimestampis zero,createdAtis set fromtime.Now(), butexpiresAtuses a separatetime.Now()call. To keep timestamps consistent (and avoid tiny negative/odd deltas in tests/metrics), compute the default expiry fromcreatedAt(e.g.,createdAt.Add(DefaultSandboxTTL)).
createdAt := sandboxCR.GetCreationTimestamp().Time
if createdAt.IsZero() {
createdAt = time.Now()
}
var expiresAt time.Time
if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
} else {
expiresAt = time.Now().Add(DefaultSandboxTTL)
}
| return nil | ||
| } | ||
|
|
||
| func (s *Server) waitForDirectSandboxReady(ctx context.Context, sandbox *sandboxv1alpha1.Sandbox, resultChan <-chan SandboxStatusUpdate) (*sandboxv1alpha1.Sandbox, error) { |
| select { | ||
| case result := <-resultChan: |
| // if warmpool is used, the pod name is stored in sandbox's annotation `agents.x-k8s.io/pod-name` | ||
| sandboxNameForPod := createdSandbox.Name | ||
| sandboxPodName := createdSandbox.Name | ||
| if podName, exists := createdSandbox.Annotations[sandboxv1alpha1.SandboxPodNameAnnotation]; exists { |
| warmPoolSandboxes := make(map[string]struct{}, len(sandboxList.Items)) | ||
| for _, sandbox := range sandboxList.Items { |
| for _, owner := range sandbox.OwnerReferences { | ||
| if owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName { | ||
| warmPoolSandboxes[sandbox.Name] = struct{}{} |
| _, ownedByWarmPoolSandbox := warmPoolSandboxes[owner.Name] | ||
| if (owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName) || | ||
| (owner.Kind == "Sandbox" && ownedByWarmPoolSandbox) { |
| func (f *recordingStore) UpdateSandbox(ctx context.Context, sandbox *types.SandboxInfo) error { | ||
| f.fakeStore.UpdateSandbox(ctx, sandbox) | ||
| if f.updateErr != nil { | ||
| return f.updateErr | ||
| } | ||
| copied := *sandbox | ||
| f.lastUpdated = &copied | ||
| return nil | ||
| } |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR adapts AgentCube's CodeInterpreter warm pool integration to
sigs.k8s.io/agent-sandbox v0.4.6.AgentCube currently depends on
agent-sandbox v0.1.1. With newer agent-sandbox releases, a direct dependency bump no longer works:SandboxPodNameAnnotationmoved out of the internalcontrollerspackage and is now exposed from the public sandbox API package;SandboxWarmPool -> Sandbox -> Pod, andSandboxClaimreports the serving Sandbox through status;agent-sandbox v0.4.6defaultsSandboxTemplate.networkPolicyManagementto managed NetworkPolicy, which blocks AgentCube's current Router / WorkloadManager data path unless AgentCube opts out or provides matching allow rules.This PR:
sigs.k8s.io/agent-sandboxtov0.4.6and updates the required Go / Kubernetes dependencies;sandboxv1alpha1.SandboxPodNameAnnotationconstant instead of importing an internal agent-sandbox controller package;SandboxClaim.Status.SandboxStatus.Name, fetches the adopted Sandbox, waits until it is Ready, and uses that Sandbox for Pod IP / entrypoint probing;SandboxClaimname in the AgentCube session store when the request kind isSandboxClaim, so delete / GC still operate on the claim resource;SandboxTemplateobjects tonetworkPolicyManagement: Unmanagedto preserve the existing AgentCube Router / WorkloadManager traffic path;SandboxWarmPool -> Sandbox -> Podownership shape;v0.35.4generator stack;hack/update-codegen.shso code generation usesk8s.io/code-generator v0.35.4without mutating project dependencies throughgo get -d;1.26.2.Which issue(s) this PR fixes:
Refs #386
Special notes for your reviewer:
agent-sandbox v0.4.6, which is the current Go module@latest. I did not targetv0.5.0rc1because Go resolves that tag to a pseudo-version rather than the stable latest release.agent-sandbox v0.4.6requires Go1.26.2and pulls Kubernetes / controller-runtime forward. Please confirm this dependency bump is acceptable for the v0.2.0 compatibility work.networkPolicyManagement: Unmanagedkeeps the behavior AgentCube had withagent-sandbox v0.1.1. If reviewers prefer to keep agent-sandbox managed NetworkPolicies, the alternative is to add explicit allow rules foragentcube-routerandworkloadmanager.hack/update-codegen.shwas also aligned to the new Kubernetes minor version because the oldcode-generator v0.34.1path downgradedagent-sandboxduring generation.Tests run:
git diff --check/tmp/go-toolchain/go/bin/go test ./pkg/workloadmanager -count=1/tmp/go-toolchain/go/bin/go test -race ./pkg/workloadmanager -count=1/tmp/go-toolchain/go/bin/go list ./... | rg -v '^github.com/volcano-sh/agentcube/test/e2e$' | xargs /tmp/go-toolchain/go/bin/go testPATH=/tmp/go-toolchain/go/bin:$PATH make gen-allPATH=/tmp/go-toolchain/go/bin:$PATH make gen-checkPATH=/tmp/go-toolchain/go/bin:$PATH make build-allPATH=/tmp/go-toolchain/go/bin:$PATH make helm-templatePATH=/tmp/go-toolchain/go/bin:$PATH make helm-lintmake docker-build WORKLOAD_MANAGER_IMAGE=workloadmanager:day16-v046make docker-build-router ROUTER_IMAGE=agentcube-router:day16-v046make docker-build-picod PICOD_IMAGE=picod:day16-v0465316358:make docker-build WORKLOAD_MANAGER_IMAGE=workloadmanager:day16-v046-final5316358:make docker-build-router ROUTER_IMAGE=agentcube-router:day16-v046-final5316358:make docker-build-picod PICOD_IMAGE=picod:day16-v046-finalgo test ./test/e2e -run 'TestCodeInterpreter(BasicInvocation|FileOperations)$' -count=1passedgo test ./test/e2e -run 'TestCodeInterpreterWarmPool$' -count=1passedgo test ./test/e2e -run 'TestCodeInterpreterWarmPoolLoad$' -count=1passed with 100 / 100 successful requeststest/e2e/test_codeinterpreter.py: 3 tests OKtest/e2e/test_langchain_agentcube_sandbox.py: 4 tests OKtest/e2e/test_mcp_code_interpreter.py: 5 tests OKtest/e2e/test_mcp_code_interpreter_stdio.py: 1 test OK/v1endpoint returned the expected final answer42Does this PR introduce a user-facing change?: