
Work Intent: fix frozen NetIO/BlockIO in dockerContainerStats subscription #2008

@jandrop

Description

Overview

Fix #2007: the dockerContainerStats GraphQL subscription emits CPU updates correctly, but the cumulative NetIO and BlockIO fields stay frozen at the first sample for the lifetime of the subscription, so consumers that derive per-second rates from consecutive emissions always read 0 B/s. Verified against a live Unraid 7.3 box with an actively downloading qBittorrent container: docker stats --no-stream returned the same 40.1GB / 220GB across 12 s, while /containers/<id>/stats on the Docker socket showed +109 MB rx in 3 s (≈36 MB/s).
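To make the symptom concrete, here is a minimal sketch of the rate derivation a consumer performs over two consecutive emissions. The names (`Emission`, `rateBps`) are illustrative, not taken from any actual consumer code:

```typescript
// A hypothetical consumer deriving a per-second rate from two cumulative samples.
interface Emission {
  rxBytes: number; // cumulative received bytes, as reported by the subscription
  at: number;      // sample timestamp in milliseconds
}

function rateBps(prev: Emission, next: Emission): number {
  const dtSeconds = (next.at - prev.at) / 1000;
  return dtSeconds > 0 ? (next.rxBytes - prev.rxBytes) / dtSeconds : 0;
}

// Frozen counters: the same cumulative value arrives on every tick, so the rate is 0.
rateBps({ rxBytes: 40_100, at: 0 }, { rxBytes: 40_100, at: 3000 }); // 0
// Live socket counters: +109 MB over 3 s, roughly the 36 MB/s observed on the socket.
rateBps({ rxBytes: 0, at: 0 }, { rxBytes: 109_000_000, at: 3000 });
```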

Root cause: DockerStatsService.startStatsStream() spawns execa('docker', ['stats', '--format', ..., '--no-trunc']) and parses each output line. The docker CLI's "live" mode keeps the cumulative counters from its initial snapshot — they don't refresh across output ticks the way the per-container /stats socket endpoint does.

Technical Approach

Rewrite DockerStatsService to stream from the Docker daemon socket directly using dockerode (already a dependency, used by DockerEventService):

  • startStatsStream() calls docker.listContainers() and opens one container.stats({ stream: true }) socket per running container. It also subscribes to docker.getEvents({ filters: { type: ['container'] } }) so streams are added on start and torn down on die / stop / kill / destroy.
  • Each stats chunk is parsed: CPU% via the standard ((cpu_delta / system_delta) × online_cpus × 100) formula, memory used as usage − stats.cache, sums over networks and blkio_stats.io_service_bytes_recursive, formatted with the existing binary units (KiB / MiB / GiB) so the GraphQL response shape is unchanged.
  • stopStatsStream() destroys every active socket plus the events stream so OnModuleDestroy releases all resources.
  • Reuses the existing getDockerClient() helper so the socket path stays in one place.
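The per-chunk calculations in the list above can be sketched as below. Field names follow the Docker Engine API's container stats response; the function names are illustrative, not the service's actual helpers, and the `cache` subtraction mirrors the description above (cgroup v2 hosts report memory stats differently):

```typescript
// Shape of the fields this sketch reads from one stats chunk (assumed subset).
interface StatsChunk {
  cpu_stats: { cpu_usage: { total_usage: number }; system_cpu_usage: number; online_cpus: number };
  precpu_stats: { cpu_usage: { total_usage: number }; system_cpu_usage: number };
  memory_stats: { usage: number; stats: { cache?: number } };
  networks?: Record<string, { rx_bytes: number; tx_bytes: number }>;
  blkio_stats: { io_service_bytes_recursive?: Array<{ op: string; value: number }> };
}

// Standard CPU% formula: (cpu_delta / system_delta) * online_cpus * 100.
function cpuPercent(s: StatsChunk): number {
  const cpuDelta = s.cpu_stats.cpu_usage.total_usage - s.precpu_stats.cpu_usage.total_usage;
  const sysDelta = s.cpu_stats.system_cpu_usage - s.precpu_stats.system_cpu_usage;
  if (cpuDelta <= 0 || sysDelta <= 0) return 0; // first sample arrives with empty precpu_stats
  return (cpuDelta / sysDelta) * s.cpu_stats.online_cpus * 100;
}

// Memory used = usage minus page cache.
function memoryUsed(s: StatsChunk): number {
  return s.memory_stats.usage - (s.memory_stats.stats.cache ?? 0);
}

// NetIO: sum rx/tx over every interface in `networks`.
function netTotals(s: StatsChunk): { rx: number; tx: number } {
  let rx = 0;
  let tx = 0;
  for (const iface of Object.values(s.networks ?? {})) {
    rx += iface.rx_bytes;
    tx += iface.tx_bytes;
  }
  return { rx, tx };
}

// BlockIO: sum the Read/Write entries of io_service_bytes_recursive.
function blkioTotals(s: StatsChunk): { read: number; write: number } {
  let read = 0;
  let write = 0;
  for (const e of s.blkio_stats.io_service_bytes_recursive ?? []) {
    if (e.op.toLowerCase() === 'read') read += e.value;
    else if (e.op.toLowerCase() === 'write') write += e.value;
  }
  return { read, write };
}
```

The guards for missing `networks`, missing `cache`, and a zero/negative delta keep the first sample and host-networked containers from producing NaN.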

The change is internal to the service — the DockerContainerStats model, the GraphQL schema, the resolver and the pubsub channel are all untouched. Consumers see fresh values without any client-side change.
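The stream lifecycle described above, reduced to a sketch over a dockerode-shaped client. Every name here (`StatsStreamer`, `DockerLike`, `DataStream`) is illustrative; the real service wires the equivalent logic into its NestJS lifecycle hooks, and production code would also buffer partial JSON lines across chunks:

```typescript
// Minimal 'data'-emitting stream shape, abstracted from dockerode's streams.
interface DataStream {
  on(event: 'data', cb: (chunk: Buffer) => void): void;
  destroy?(): void;
}

// The subset of the dockerode client this sketch relies on (assumed shape).
interface DockerLike {
  listContainers(): Promise<Array<{ Id: string }>>;
  getContainer(id: string): { stats(opts: { stream: boolean }): Promise<DataStream> };
  getEvents(opts: { filters: { type: string[] } }): Promise<DataStream>;
}

class StatsStreamer {
  private readonly streams = new Map<string, DataStream>();
  private events?: DataStream;

  constructor(
    private readonly docker: DockerLike,
    private readonly onSample: (id: string, raw: string) => void,
  ) {}

  async start(): Promise<void> {
    // One stats socket per running container at startup.
    for (const { Id } of await this.docker.listContainers()) await this.attach(Id);
    // Container lifecycle events keep the stream set in sync afterwards.
    this.events = await this.docker.getEvents({ filters: { type: ['container'] } });
    this.events.on('data', (chunk) => {
      let evt: { status?: string; id?: string };
      try {
        evt = JSON.parse(chunk.toString());
      } catch {
        return; // tolerate malformed/partial event chunks
      }
      if (evt.status === 'start' && evt.id) void this.attach(evt.id);
      else if (evt.id && ['die', 'stop', 'kill', 'destroy'].includes(evt.status ?? '')) {
        this.detach(evt.id);
      }
    });
  }

  private async attach(id: string): Promise<void> {
    if (this.streams.has(id)) return; // idempotent: never double-subscribe
    const stream = await this.docker.getContainer(id).stats({ stream: true });
    this.streams.set(id, stream);
    stream.on('data', (chunk) => this.onSample(id, chunk.toString()));
  }

  private detach(id: string): void {
    this.streams.get(id)?.destroy?.();
    this.streams.delete(id);
  }

  stop(): void {
    // Destroy every container socket plus the events stream (the OnModuleDestroy path).
    for (const id of [...this.streams.keys()]) this.detach(id);
    this.events?.destroy?.();
    this.events = undefined;
  }
}
```

Taking the client as a parameter rather than constructing it keeps the sketch testable with a stub; the real service instead reuses the shared getDockerClient() helper, as noted above.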

Implementation already prepared on jandrop/api:fix/docker-stats-cli-cache for reference:

  • 196 lines added / 76 removed in docker-stats.service.ts
  • 22 vitest cases in a new docker-stats.service.spec.ts covering CPU formula + edge cases, memory minus cache, network sum, blkio reads/writes, docker events (start adds, die/stop/kill/destroy remove), malformed JSON resilience, idempotent startStatsStream, OnModuleDestroy cleanup
  • pnpm lint, pnpm type-check, pnpm test all clean (1991 tests pass)

Scope

  • API
  • Plugin
  • Web UI
  • Build/Deploy Process
  • Documentation

Timeline & Impact

  • Estimated time: implementation already done locally; ~1 day for review iteration + any reviewer-requested changes.
  • Impact: behaviour fix only — DockerContainerStats GraphQL response shape is identical, just with non-frozen values. No new dependencies (dockerode is already used by DockerEventService on the same socket).
  • Risk: low. Falls back to the existing fail-on-error path; events stream and per-container streams are independently recoverable.

Pre-submission Checklist

  • I have searched for similar work/issues
  • I understand this needs approval before starting
  • I am willing to make adjustments based on feedback
