feat(sandbox): add Docker container resource usage metrics API by DeryFerd · Pull Request #69 · anvie/evonic

DeryFerd · 2026-06-18T06:07:50Z

Problem

Evonic runs Docker containers to sandbox agent tool execution (bash, Python, file operations). Right now there's no visibility into what those containers are doing resource-wise. When you have multiple agents running simultaneously, you can't tell which ones are chewing through CPU or memory, and you can't tell when you're approaching capacity limits.

This becomes an actual problem in a few scenarios:

Multi-agent deployments: You spin up 5+ agents, things slow down, but you have no idea which agent session is the bottleneck
Runaway processes: An agent starts an infinite loop or leaks memory inside the sandbox, and you don't notice until everything grinds to a halt
Capacity planning: You want to increase SANDBOX_MAX_CONTAINERS from 10 to 20, but you don't know if your host can handle it because you're flying blind

The Docker backend already tracks container lifecycle (creation, idle timeout, LRU eviction), but it doesn't expose resource consumption data to the web UI or API consumers.

What Changed

This PR adds two new API endpoints that surface real-time resource metrics for Docker sandbox containers:

1. Per-Agent Container Stats

GET /api/agents/<agent_id>/sandbox/stats

Returns resource usage for all active containers belonging to a specific agent's sessions. Useful when you want to drill down into one agent's behavior.

Response includes:

Active container count for that agent
Per-session breakdown with container ID, CPU%, memory usage/limit, network I/O, block I/O, PIDs
Links each container back to its session ID and external user ID

Example output:

{
  "agent_id": "admin",
  "agent_name": "Admin Assistant",
  "sandbox_enabled": true,
  "active_containers": 2,
  "containers": [
    {
      "session_id": "6c1d9542",
      "external_user_id": "user@example.com",
      "container_id": "a3f8b2c1",
      "container_name": "evonic-6c1d9542-admin",
      "cpu_percent": 15.3,
      "memory": {
        "used": "142MiB",
        "limit": "512MiB",
        "percent": "27.7%"
      },
      "network": {
        "input": "2.1MB",
        "output": "4.8MB"
      },
      "block_io": {
        "read": "8.3MB",
        "write": "12.5MB"
      },
      "pids": "28"
    }
  ]
}

2. Pool-Wide Aggregate Stats

GET /api/sandbox/stats

Returns metrics for every active container across all agents. Useful for monitoring overall system health and spotting resource hogs.

Response includes:

Pool size vs max capacity
Per-container breakdown with agent ID association
Aggregate totals (sum of all CPU%, active container count)

Example output:

{
  "pool_size": 7,
  "max_containers": 10,
  "containers": [
    {
      "session_id": "6c1d9542",
      "agent_id": "admin",
      "container_id": "a3f8b2c1",
      "cpu_percent": 8.2,
      "memory": {...},
      "network": {...},
      "block_io": {...},
      "pids": "18"
    },
    // ... 6 more containers
  ],
  "aggregate": {
    "total_cpu_percent": 42.7,
    "active_containers": 7
  }
}
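The aggregate block above can be derived from the per-container list alone. A minimal sketch of that step, assuming each container dict carries a numeric cpu_percent; build_pool_response is a hypothetical helper name, not necessarily what the PR uses:

```python
from typing import Any, Dict, List


def build_pool_response(containers: List[Dict[str, Any]], max_containers: int) -> Dict[str, Any]:
    """Wrap per-container stats in the pool-wide response shape (hypothetical helper)."""
    # Sum CPU% across containers; entries without a reading count as 0.
    total_cpu = sum(c.get("cpu_percent", 0.0) for c in containers)
    return {
        "pool_size": len(containers),
        "max_containers": max_containers,
        "containers": containers,
        "aggregate": {
            "total_cpu_percent": round(total_cpu, 1),
            # Assumption: every pooled container counts as active.
            "active_containers": len(containers),
        },
    }
```

Note that Docker reports CPU% relative to a single core, so a container saturating two cores shows about 200%. total_cpu_percent can therefore exceed 100 on a multi-core host.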

Implementation Details

Backend changes (backend/tools/lib/backends/docker_backend.py):

Added get_container_stats(session_id) — fetches metrics for a single container using docker stats --no-stream --format '{{json .}}' for a snapshot
Added get_all_container_stats() — iterates the container pool and aggregates metrics across all active sessions
Parses Docker's output format (handles strings like "123.4MiB / 512MiB", "12.34%", etc.)
Returns structured dicts with separate memory/network/block_io breakdowns
Handles error cases (container doesn't exist, Docker command fails, parse errors)
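For reviewers who want to see the parsing step concretely, here is a rough sketch of how one line of docker stats JSON output maps onto the response shape shown earlier. The function names (split_pair, parse_stats_line, snapshot) are illustrative assumptions and the actual code in docker_backend.py may be organized differently. The JSON keys (ID, Name, CPUPerc, MemUsage, MemPerc, NetIO, BlockIO, PIDs) are the ones Docker emits for '{{json .}}'.

```python
import json
import subprocess
from typing import Any, Dict, Optional, Tuple


def split_pair(value: str) -> Tuple[str, str]:
    """Split Docker's "a / b" pairs, e.g. "123.4MiB / 512MiB"."""
    left, _, right = value.partition("/")
    return left.strip(), right.strip()


def parse_stats_line(raw: str) -> Dict[str, Any]:
    """Turn one line of `docker stats --format '{{json .}}'` into a nested dict."""
    row = json.loads(raw)
    mem_used, mem_limit = split_pair(row["MemUsage"])
    net_in, net_out = split_pair(row["NetIO"])
    blk_read, blk_write = split_pair(row["BlockIO"])
    return {
        "container_id": row["ID"],
        "container_name": row["Name"],
        "cpu_percent": float(row["CPUPerc"].rstrip("%")),
        "memory": {"used": mem_used, "limit": mem_limit, "percent": row["MemPerc"]},
        "network": {"input": net_in, "output": net_out},
        "block_io": {"read": blk_read, "write": blk_write},
        "pids": row["PIDs"],
    }


def snapshot(container: str, timeout: float = 10.0) -> Optional[Dict[str, Any]]:
    """Fetch one snapshot; return None if Docker fails or the output cannot be parsed."""
    try:
        proc = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", "{{json .}}", container],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return parse_stats_line(proc.stdout.strip().splitlines()[0])
    except (subprocess.SubprocessError, OSError, ValueError, KeyError, IndexError):
        # Covers a missing container, a missing docker binary, a timeout, and bad output.
        return None
```

Returning None on every failure keeps the caller simple, at the cost of hiding which error occurred. Logging the exception before returning would be a reasonable refinement.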

API changes (routes/agents.py):

Added /api/agents/<agent_id>/sandbox/stats endpoint: verifies that the agent exists and that its sandbox is enabled, then calls get_container_stats() for each session belonging to that agent
Added /api/sandbox/stats endpoint — calls get_all_container_stats() directly and returns the full pool view
Both endpoints require authentication (existing session check)
Returns 400 if sandbox is disabled for an agent
Returns 404 if agent doesn't exist
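The status-code rules above boil down to a small decision function. This is a framework-agnostic sketch, not the route code itself: agent_sandbox_stats, the agent and session dict shapes, and the error messages are all assumptions made for illustration, and the authentication check is left out.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Stats = Dict[str, Any]


def agent_sandbox_stats(
    agent: Optional[Dict[str, Any]],
    sessions: List[Dict[str, Any]],
    get_container_stats: Callable[[str], Optional[Stats]],
) -> Tuple[Dict[str, Any], int]:
    """Return (json_body, http_status) for the per-agent stats endpoint."""
    if agent is None:
        return {"error": "Agent not found"}, 404
    if not agent.get("sandbox_enabled"):
        return {"error": "Sandbox is disabled for this agent"}, 400

    containers = []
    for session in sessions:
        stats = get_container_stats(session["session_id"])
        if stats is None:
            continue  # this session has no live container right now
        containers.append({
            "session_id": session["session_id"],
            "external_user_id": session.get("external_user_id"),
            **stats,
        })

    body = {
        "agent_id": agent["id"],
        "agent_name": agent["name"],
        "sandbox_enabled": True,
        "active_containers": len(containers),
        "containers": containers,
    }
    return body, 200
```

Passing get_container_stats in as a callable keeps the function testable without Docker. A route handler would supply the real backend method and convert the returned tuple into an HTTP response.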

Data source: Uses Docker's native stats command (same as docker stats in the CLI), so Evonic runs no background collector and does no polling of its own. Each API call fetches a single snapshot on demand.

Why This Matters

Operational visibility: You can finally see what's happening inside the sandbox. No more guessing which agent session is stuck or overloaded.
Debugging: When an agent starts acting weird, check the metrics first. If CPU is pinned at 100% or memory is maxed out, the problem is most likely in the sandbox rather than in the LLM prompt.
Capacity planning: Before you bump SANDBOX_MAX_CONTAINERS to 50, run a test with 10 agents under load and check the aggregate CPU/memory totals. Now you have actual data to base that decision on.
Future integration: These endpoints are REST-friendly and JSON-based, so they can easily feed into monitoring dashboards, Prometheus exporters, or whatever observability stack you're running.
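As one illustration of the integration point, a pool snapshot could be turned into the Prometheus text exposition format. This is only a sketch: the metric names (evonic_sandbox_cpu_percent, evonic_sandbox_memory_used_bytes) are made up here, and it assumes the elided memory objects in the pool response have the same shape as in the per-agent example.

```python
import re
from typing import Any, Dict

# Docker prints memory in binary units (KiB, MiB, GiB) and I/O in decimal units (kB, MB, GB).
_UNITS = {
    "B": 1,
    "kB": 1000, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}


def to_bytes(text: str) -> float:
    """Convert a Docker size string such as "142MiB" to a byte count."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def to_prometheus(snapshot: Dict[str, Any]) -> str:
    """Render a /api/sandbox/stats response as Prometheus text-format samples."""
    lines = []
    for c in snapshot["containers"]:
        labels = f'session_id="{c["session_id"]}",agent_id="{c["agent_id"]}"'
        lines.append(f"evonic_sandbox_cpu_percent{{{labels}}} {c['cpu_percent']}")
        lines.append(
            f"evonic_sandbox_memory_used_bytes{{{labels}}} {to_bytes(c['memory']['used'])}"
        )
    return "\n".join(lines) + "\n"
```

Converting the human-readable sizes back to bytes is the awkward part. If exporters become a real use case, having the API return raw numeric values as well might be worth considering.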

What This Doesn't Do

No historical data: These are point-in-time snapshots. If you want graphs over time, you'll need to poll these endpoints externally and store the results.
No alerting: The API just returns data. You still need to wire up your own alerts if CPU goes over 80% or memory hits the limit.
No container logs: This PR only exposes resource metrics. If you need to see what the container is actually printing, you still use docker logs <container_id> manually.
No Windows container support: Evonic's Docker backend assumes Linux containers. If you're running Docker Desktop on Windows with Windows containers, this probably won't work (untested).
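Since history and alerting are left to the caller, here is a rough idea of what an external poller could look like. It assumes the session is passed as a Cookie header (the real auth mechanism may differ) and that each container's memory object has the shape shown in the per-agent example. fetch_pool_stats and over_threshold are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_pool_stats(base_url: str, cookie: str) -> Dict[str, Any]:
    """GET /api/sandbox/stats once and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/sandbox/stats",
        headers={"Cookie": cookie},  # assumption: session-cookie authentication
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def over_threshold(
    snapshot: Dict[str, Any], cpu_limit: float = 80.0, mem_limit: float = 90.0
) -> List[str]:
    """Return session IDs whose CPU% or memory% exceeds the given limits."""
    hot = []
    for c in snapshot.get("containers", []):
        # Memory percent arrives as a string such as "27.7%".
        mem_pct = float(c["memory"]["percent"].rstrip("%"))
        if c["cpu_percent"] > cpu_limit or mem_pct > mem_limit:
            hot.append(c["session_id"])
    return hot
```

Calling fetch_pool_stats in a loop with a sleep between iterations, and appending each snapshot with a timestamp to a file or database, would give the over-time view that the API itself does not provide.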

Testing

Validated manually on a local Evonic instance with Docker Desktop:

Created two agents with sandbox enabled
Started sessions for both agents, triggered bash/Python tool calls to spin up containers
Hit /api/agents/<agent_id>/sandbox/stats — confirmed it returned metrics for active sessions only
Hit /api/sandbox/stats — confirmed it showed all containers across both agents with correct agent_id associations
Verified that requesting stats for an agent with no active sessions returns active_containers: 0 with an empty array
Verified that requesting stats for an agent with sandbox_enabled: 0 returns a 400 error with appropriate message
Checked that CPU/memory/network/block_io values matched what docker stats showed in the terminal

No automated tests were added because the Docker backend test suite doesn't currently mock container stats output, and adding that felt out of scope for this PR. If you want test coverage, let me know and I can follow up.
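For the follow-up, mocking the stats output would not need much. The sketch below patches subprocess.run with canned output. cpu_percent is a stand-in written for this example, not a function from the PR, and a real test would target get_container_stats() instead.

```python
import json
import subprocess
import unittest
from unittest import mock


def cpu_percent(container: str) -> float:
    """Stand-in for the backend call: run docker stats once and read CPUPerc."""
    proc = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", container],
        capture_output=True, text=True, check=True,
    )
    return float(json.loads(proc.stdout)["CPUPerc"].rstrip("%"))


class CpuPercentTest(unittest.TestCase):
    def test_parses_canned_output(self):
        # No Docker needed: subprocess.run returns a prepared result.
        canned = subprocess.CompletedProcess(
            args=[], returncode=0, stdout='{"CPUPerc": "12.34%"}\n', stderr=""
        )
        with mock.patch("subprocess.run", return_value=canned) as run:
            self.assertEqual(cpu_percent("evonic-6c1d9542-admin"), 12.34)
        run.assert_called_once()

    def test_docker_failure_propagates(self):
        err = subprocess.CalledProcessError(1, ["docker"])
        with mock.patch("subprocess.run", side_effect=err):
            with self.assertRaises(subprocess.CalledProcessError):
                cpu_percent("missing")
```

The same pattern extends to parse errors and missing containers by varying the canned stdout or the side effect.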

Compatibility

Backward compatible: No breaking changes. Existing routes, database schema, and Docker backend behavior are unchanged.
Docker version: Tested with Docker Engine 24.x. Should work with any version that supports docker stats --no-stream --format '{{json .}}' (Docker 1.13+).
Python version: No new dependencies. Uses only stdlib (json, subprocess) and existing Evonic modules.

Potential Follow-Ups

If this lands and people find it useful, a few natural extensions:

Web UI widget: Add a "Sandbox Stats" panel in the agent detail page that auto-refreshes every 5s
Per-tool breakdown: Track which specific tool call (bash, runpy, read_file, etc.) triggered high resource usage
Historical logging: Store snapshots to SQLite every minute and expose a /api/sandbox/stats/history?since=<timestamp> endpoint
Alert thresholds: Let admins configure CPU/memory limits per agent and send notifications when exceeded

…al-time resource monitoring for Docker sandbox containers with CPU, memory, network, and block I/O metrics.

Changes:
- backend/tools/lib/backends/docker_backend.py:
  * Add get_container_stats() to fetch metrics for a specific container
  * Add get_all_container_stats() for aggregate pool-wide metrics
  * Uses 'docker stats --no-stream' for snapshot data
- routes/agents.py:
  * Add GET /api/agents/<id>/sandbox/stats per-agent container metrics
  * Add GET /api/sandbox/stats for all active containers
  * Returns CPU%, memory usage/limit, network I/O, block I/O, PIDs

Benefits:
- Monitor resource consumption of running sandboxes
- Identify resource-hungry agents or sessions
- Optimize sandbox limits based on actual usage
- Better capacity planning for multi-agent deployments

API response format:
{
  "container_id": "abc123",
  "cpu_percent": 12.5,
  "memory": {"used": "128MiB", "limit": "512MiB", "percent": "25%"},
  "network": {"input": "1.2MB", "output": "3.4MB"},
  "block_io": {"read": "5.6MB", "write": "7.8MB"},
  "pids": "42"
}

saveFromDetail() was mutating the local tasks array with Object.assign but never calling loadTasks(), so the board columns (Todo/In Progress/Done) never re-rendered after editing a task from the detail modal's inline edit panel. The main edit modal (handleSubmit) already did this correctly. Changed saveFromDetail to call loadTasks() on success, matching the pattern used by handleSubmit, moveTask, and deleteTask.

DeryFerd changed the title ~~feat(sandbox): add Docker container resource usage metrics API Add re…~~ feat(sandbox): add Docker container resource usage metrics API Jun 18, 2026

DeryFerd wants to merge 1 commit into anvie:main from DeryFerd:feat/docker-sandbox-metrics
