feat(sandbox): add Docker container resource usage metrics API #69

Open
DeryFerd wants to merge 1 commit into anvie:main from DeryFerd:feat/docker-sandbox-metrics
Conversation

@DeryFerd

Contributor

Problem

Evonic runs Docker containers to sandbox agent tool execution (bash, Python, file operations). Right now there's no visibility into what those containers are doing resource-wise. When you have multiple agents running simultaneously, you can't tell which ones are chewing through CPU or memory, and you can't tell when you're approaching capacity limits.

This becomes an actual problem in a few scenarios:

  • Multi-agent deployments: You spin up 5+ agents, things slow down, but you have no idea which agent session is the bottleneck
  • Runaway processes: An agent starts an infinite loop or leaks memory inside the sandbox, and you don't notice until everything grinds to a halt
  • Capacity planning: You want to increase SANDBOX_MAX_CONTAINERS from 10 to 20, but you don't know if your host can handle it because you're flying blind

The Docker backend already tracks container lifecycle (creation, idle timeout, LRU eviction), but it doesn't expose resource consumption data to the web UI or API consumers.

What Changed

This PR adds two new API endpoints that surface real-time resource metrics for Docker sandbox containers:

1. Per-Agent Container Stats

GET /api/agents/<agent_id>/sandbox/stats

Returns resource usage for all active containers belonging to a specific agent's sessions. Useful when you want to drill down into one agent's behavior.

Response includes:

  • Active container count for that agent
  • Per-session breakdown with container ID, CPU%, memory usage/limit, network I/O, block I/O, PIDs
  • Links each container back to its session ID and external user ID

Example output:

{
  "agent_id": "admin",
  "agent_name": "Admin Assistant",
  "sandbox_enabled": true,
  "active_containers": 2,
  "containers": [
    {
      "session_id": "6c1d9542",
      "external_user_id": "user@example.com",
      "container_id": "a3f8b2c1",
      "container_name": "evonic-6c1d9542-admin",
      "cpu_percent": 15.3,
      "memory": {
        "used": "142MiB",
        "limit": "512MiB",
        "percent": "27.7%"
      },
      "network": {
        "input": "2.1MB",
        "output": "4.8MB"
      },
      "block_io": {
        "read": "8.3MB",
        "write": "12.5MB"
      },
      "pids": "28"
    }
  ]
}
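The memory, network, and block I/O values in this response are human-readable strings, so a consumer that wants to compare or sum them has to convert them first. A minimal sketch of that conversion follows. It assumes the usual `docker stats` unit conventions (binary suffixes such as MiB for memory, decimal suffixes such as MB for network and block I/O); `size_to_bytes` is a hypothetical helper, not part of this PR.

```python
import re

# Multipliers for the unit suffixes that appear in docker stats output.
# Assumption: memory uses binary units (KiB, MiB, ...), while network and
# block I/O use decimal units (kB, MB, ...).
_UNITS = {
    "B": 1,
    "kB": 1000, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}

_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*$")


def size_to_bytes(text: str) -> int:
    """Convert a string such as '142MiB' or '2.1MB' to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    # round() avoids off-by-one results from floating point multiplication.
    return round(float(number) * _UNITS[unit])
```

For example, `size_to_bytes("512MiB")` gives the container memory limit from the example above in bytes, which can then be compared against host memory.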

2. Pool-Wide Aggregate Stats

GET /api/sandbox/stats

Returns metrics for every active container across all agents. Useful for monitoring overall system health and spotting resource hogs.

Response includes:

  • Pool size vs max capacity
  • Per-container breakdown with agent ID association
  • Aggregate totals (sum of all CPU%, active container count)

Example output:

{
  "pool_size": 7,
  "max_containers": 10,
  "containers": [
    {
      "session_id": "6c1d9542",
      "agent_id": "admin",
      "container_id": "a3f8b2c1",
      "cpu_percent": 8.2,
      "memory": {...},
      "network": {...},
      "block_io": {...},
      "pids": "18"
    },
    // ... 6 more containers
  ],
  "aggregate": {
    "total_cpu_percent": 42.7,
    "active_containers": 7
  }
}
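The `aggregate` object can be derived from the per-container list. The sketch below shows one way to compute it; `aggregate_stats` is a hypothetical name, and the rounding to one decimal place is an assumption based on the example values, not something confirmed by the PR's code.

```python
from __future__ import annotations

from typing import Any


def aggregate_stats(containers: list[dict[str, Any]]) -> dict[str, Any]:
    """Sum CPU% across containers and count them.

    Containers without a usable cpu_percent value contribute 0.0.
    """
    total_cpu = 0.0
    for container in containers:
        value = container.get("cpu_percent")
        if isinstance(value, (int, float)):
            total_cpu += float(value)
    return {
        "total_cpu_percent": round(total_cpu, 1),
        "active_containers": len(containers),
    }
```

Because `docker stats` reports CPU relative to a single core, the summed figure can exceed 100 on a multi-core host.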

Implementation Details

Backend changes (backend/tools/lib/backends/docker_backend.py):

  • Added get_container_stats(session_id) — fetches metrics for a single container using docker stats --no-stream --format '{{json .}}' for a snapshot
  • Added get_all_container_stats() — iterates the container pool and aggregates metrics across all active sessions
  • Parses Docker's output format (handles strings like "123.4MiB / 512MiB", "12.34%", etc.)
  • Returns structured dicts with separate memory/network/block_io breakdowns
  • Handles error cases (container doesn't exist, Docker command fails, parse errors)
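The parsing step described above can be sketched roughly as follows. This is an illustration rather than the PR's actual code: `fetch_stats_line` and `parse_stats_line` are hypothetical names, and the field names (`CPUPerc`, `MemUsage`, `MemPerc`, `NetIO`, `BlockIO`, `PIDs`) are the keys that `docker stats --format '{{json .}}'` emits.

```python
from __future__ import annotations

import json
import subprocess


def fetch_stats_line(container_name: str, timeout: float = 10.0) -> str | None:
    """Run docker stats once for one container; return the raw JSON line."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", container_name],
        capture_output=True, text=True, timeout=timeout,
    )
    line = result.stdout.strip()
    if result.returncode != 0 or not line:
        return None  # container missing or Docker command failed
    return line.splitlines()[0]


def _split_pair(value: str) -> tuple[str, str]:
    """Split 'A / B' into ('A', 'B'); missing halves become empty strings."""
    left, _, right = value.partition("/")
    return left.strip(), right.strip()


def parse_stats_line(line: str) -> dict:
    """Turn one docker stats JSON line into the nested structure."""
    raw = json.loads(line)
    mem_used, mem_limit = _split_pair(raw.get("MemUsage", ""))
    net_in, net_out = _split_pair(raw.get("NetIO", ""))
    blk_read, blk_write = _split_pair(raw.get("BlockIO", ""))
    try:
        cpu = float(raw.get("CPUPerc", "").rstrip("%"))
    except ValueError:
        cpu = None  # Docker printed something other than a percentage
    return {
        "container_id": raw.get("ID", ""),
        "container_name": raw.get("Name", ""),
        "cpu_percent": cpu,
        "memory": {"used": mem_used, "limit": mem_limit, "percent": raw.get("MemPerc", "")},
        "network": {"input": net_in, "output": net_out},
        "block_io": {"read": blk_read, "write": blk_write},
        "pids": raw.get("PIDs", ""),
    }
```

Keeping the subprocess call separate from the parser means the parser can be exercised with canned strings, without a running Docker daemon.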

API changes (routes/agents.py):

  • Added /api/agents/<agent_id>/sandbox/stats endpoint — checks if agent exists, if sandbox is enabled, then queries get_container_stats() for each session belonging to that agent
  • Added /api/sandbox/stats endpoint — calls get_all_container_stats() directly and returns the full pool view
  • Both endpoints require authentication (existing session check)
  • The per-agent endpoint returns 400 if sandbox is disabled for that agent
  • The per-agent endpoint returns 404 if the agent doesn't exist
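The order of checks in the per-agent endpoint can be sketched without any web framework. Everything below is illustrative: the function name, the in-memory `agents` and `sessions_by_agent` mappings, and the error messages are assumptions, and the authentication check is left out because the PR reuses the existing one.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def agent_sandbox_stats(
    agent_id: str,
    agents: dict[str, dict[str, Any]],
    sessions_by_agent: dict[str, list[str]],
    get_container_stats: Callable[[str], Optional[dict]],
) -> tuple[int, dict[str, Any]]:
    """Return (http_status, json_body) for the per-agent stats request."""
    agent = agents.get(agent_id)
    if agent is None:
        return 404, {"error": "Agent not found"}
    if not agent.get("sandbox_enabled"):
        return 400, {"error": "Sandbox is disabled for this agent"}

    containers = []
    for session_id in sessions_by_agent.get(agent_id, []):
        stats = get_container_stats(session_id)
        if stats is not None:  # sessions without a live container are skipped
            containers.append({"session_id": session_id, **stats})

    return 200, {
        "agent_id": agent_id,
        "agent_name": agent.get("name", ""),
        "sandbox_enabled": True,
        "active_containers": len(containers),
        "containers": containers,
    }
```

An agent with sandbox enabled but no live containers falls through to the 200 branch with `active_containers` set to 0 and an empty list.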

Data source: Uses Docker's native stats command (same as docker stats in CLI), so there's no polling overhead or background daemon needed. Each API call fetches a single snapshot.

Why This Matters

  1. Operational visibility: You can finally see what's happening inside the sandbox. No more guessing which agent session is stuck or overloaded.

  2. Debugging: When an agent starts acting weird, check the metrics first. If CPU is pinned at 100% or memory is maxed out, the problem is most likely in the sandbox rather than in the LLM prompt.

  3. Capacity planning: Before you bump SANDBOX_MAX_CONTAINERS to 50, run a test with 10 agents under load and check the aggregate CPU/memory totals. Now you have actual data to base that decision on.

  4. Future integration: These endpoints are REST-friendly and JSON-based, so they can easily feed into monitoring dashboards, Prometheus exporters, or whatever observability stack you're running.

What This Doesn't Do

  • No historical data: These are point-in-time snapshots. If you want graphs over time, you'll need to poll these endpoints externally and store the results.
  • No alerting: The API just returns data. You still need to wire up your own alerts if CPU goes over 80% or memory hits the limit.
  • No container logs: This PR only exposes resource metrics. If you need to see what the container is actually printing, you still use docker logs <container_id> manually.
  • No Windows container support: Evonic's Docker backend assumes Linux containers. If you're running Docker Desktop on Windows with Windows containers, this probably won't work (untested).
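For the historical-data gap, an external poller could look roughly like the sketch below. The base URL, the cookie-based authentication header, and the `sandbox_snapshots` table layout are all assumptions made for illustration; only the `/api/sandbox/stats` path and the response fields come from this PR.

```python
from __future__ import annotations

import json
import sqlite3
import time
import urllib.request


def fetch_pool_stats(base_url: str, cookie: str) -> dict:
    """GET /api/sandbox/stats. The Cookie header is an assumed auth mechanism."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/sandbox/stats", headers={"Cookie": cookie}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) snapshot table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sandbox_snapshots (
               ts REAL NOT NULL,
               session_id TEXT NOT NULL,
               agent_id TEXT,
               cpu_percent REAL,
               raw_json TEXT NOT NULL
           )"""
    )
    conn.commit()


def record_snapshot(conn: sqlite3.Connection, payload: dict, ts: float | None = None) -> int:
    """Store one row per container; return the number of rows written."""
    ts = time.time() if ts is None else ts
    rows = [
        (ts, c.get("session_id", ""), c.get("agent_id"), c.get("cpu_percent"), json.dumps(c))
        for c in payload.get("containers", [])
    ]
    conn.executemany("INSERT INTO sandbox_snapshots VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A loop that calls `record_snapshot(conn, fetch_pool_stats(url, cookie))` and then sleeps for the chosen interval is enough to build up a time series for graphing.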

Testing

Validated manually on a local Evonic instance with Docker Desktop:

  1. Created two agents with sandbox enabled
  2. Started sessions for both agents, triggered bash/Python tool calls to spin up containers
  3. Hit /api/agents/<agent_id>/sandbox/stats — confirmed it returned metrics for active sessions only
  4. Hit /api/sandbox/stats — confirmed it showed all containers across both agents with correct agent_id associations
  5. Verified that requesting stats for an agent with no active sessions returns active_containers: 0 with an empty array
  6. Verified that requesting stats for an agent with sandbox_enabled: 0 returns a 400 error with appropriate message
  7. Checked that CPU/memory/network/block_io values matched what docker stats showed in the terminal

No automated tests were added because the Docker backend test suite doesn't currently mock container stats output, and adding that felt out of scope for this PR. If you want test coverage, let me know and I can follow up.
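If test coverage is wanted, mocking the stats output does not need much machinery. The sketch below patches `subprocess.run` with `unittest.mock` so no Docker daemon is required. `fetch_cpu_percent` is a simplified stand-in written for this example, not the PR's `get_container_stats()`.

```python
from __future__ import annotations

import json
import subprocess
from unittest import mock


def fetch_cpu_percent(container_name: str) -> float | None:
    """Stand-in for the backend call: one docker stats snapshot, CPU% only."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", container_name],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    raw = json.loads(result.stdout.strip().splitlines()[0])
    return float(raw["CPUPerc"].rstrip("%"))


def test_parses_mocked_output() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout='{"CPUPerc":"12.34%"}\n', stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert fetch_cpu_percent("evonic-test") == 12.34
    run.assert_called_once()


def test_returns_none_when_docker_fails() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=1, stdout="", stderr="No such container"
    )
    with mock.patch("subprocess.run", return_value=fake):
        assert fetch_cpu_percent("gone") is None
```

Both tests follow the pytest naming convention, but they are plain functions and can also be called directly.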

Compatibility

  • Backward compatible: No breaking changes. Existing routes, database schema, and Docker backend behavior are unchanged.
  • Docker version: Tested with Docker Engine 24.x. Should work with any version that supports docker stats --no-stream --format '{{json .}}' (Docker 1.13+).
  • Dependencies: No new dependencies. Uses only stdlib (json, subprocess) and existing Evonic modules.

Potential Follow-Ups

If this lands and people find it useful, a few natural extensions:

  • Web UI widget: Add a "Sandbox Stats" panel in the agent detail page that auto-refreshes every 5s
  • Per-tool breakdown: Track which specific tool call (bash, runpy, read_file, etc.) triggered high resource usage
  • Historical logging: Store snapshots to SQLite every minute and expose a /api/sandbox/stats/history?since=<timestamp> endpoint
  • Alert thresholds: Let admins configure CPU/memory limits per agent and send notifications when exceeded

…al-time resource monitoring for Docker sandbox containers with CPU, memory, network, and block I/O metrics.

Changes:
- backend/tools/lib/backends/docker_backend.py:
  * Add get_container_stats() to fetch metrics for a specific container
  * Add get_all_container_stats() for aggregate pool-wide metrics
  * Uses 'docker stats --no-stream' for snapshot data
- routes/agents.py:
  * Add GET /api/agents/<id>/sandbox/stats per-agent container metrics
  * Add GET /api/sandbox/stats for all active containers
  * Returns CPU%, memory usage/limit, network I/O, block I/O, PIDs

Benefits:
- Monitor resource consumption of running sandboxes
- Identify resource-hungry agents or sessions
- Optimize sandbox limits based on actual usage
- Better capacity planning for multi-agent deployments

API response format:
{
  "container_id": "abc123",
  "cpu_percent": 12.5,
  "memory": {"used": "128MiB", "limit": "512MiB", "percent": "25%"},
  "network": {"input": "1.2MB", "output": "3.4MB"},
  "block_io": {"read": "5.6MB", "write": "7.8MB"},
  "pids": "42"
}
@DeryFerd changed the title from "feat(sandbox): add Docker container resource usage metrics API Add re…" to "feat(sandbox): add Docker container resource usage metrics API" on Jun 18, 2026
irfansaf pushed a commit to irfansaf/evonic that referenced this pull request Jun 20, 2026
saveFromDetail() was mutating the local tasks array with Object.assign
but never calling loadTasks(), so the board columns (Todo/In Progress/Done)
never re-rendered after editing a task from the detail modal's inline edit
panel. The main edit modal (handleSubmit) already did this correctly.
Changed saveFromDetail to call loadTasks() on success, matching the
pattern used by handleSubmit, moveTask, and deleteTask.