Celery workers OOM on hosts with many CPUs — --concurrency is unbounded by default #10968

@b-abderrahmane

Description

Steps to Reproduce

Deploy the prowler-api image in worker mode (e.g. prowlercloud/prowler-api:5.22.0, image entrypoint docker-entrypoint.sh worker) on a Kubernetes node with many CPUs (typical AKS/EKS node sizes: 16, 20, or 32 cores) and no Linux CPU pinning on the pod. The pod spec sets a memory limit but no CPU limit:

resources:
  requests: { memory: "1Gi" }
  limits:   { memory: "4Gi" }

Expected behavior

A bounded number of Celery prefork workers, with predictable per-pod memory consumption regardless of host CPU count.

Actual behavior

api/docker-entrypoint.sh runs:

poetry run python -m celery -A config.celery worker \
  -Q celery,scans,scan-reports,deletion,backfill,overview,integrations,compliance,attack-paths-scans \
  -E --max-tasks-per-child 1

--concurrency is not set on the command line. config/celery.py calls celery_app.config_from_object("django.conf:settings", namespace="CELERY"), but the Django settings don't define CELERY_WORKER_CONCURRENCY either, so Celery falls back to its default: the number of CPUs on the host (os.cpu_count()).
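The fallback is easy to confirm from inside the pod. This minimal check (not Celery-specific) shows why memory limits don't help here — os.cpu_count() reports the node's CPUs, and Kubernetes requests/limits have no effect on it:

```python
import os

# Celery's prefork pool sizes itself from the CPU count when neither
# --concurrency nor worker_concurrency is set. os.cpu_count() sees the
# host's CPUs; pod memory requests/limits do not change what it returns.
cpus = os.cpu_count()
print(cpus)  # e.g. 20 on a Standard_D20s_v5 node, however the pod is sized
```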

On a 20-core AKS node this spawns 20 prefork worker processes. Each loads the Prowler virtualenv plus the Azure / AWS / GCP / M365 SDK clients — about 170 MiB resident at idle. The pod sits at ~3.4 GiB resident with zero scans running, and any scan load tips it over the 4 GiB limit:
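The idle footprint lines up with simple arithmetic on the numbers above:

```python
workers = 20          # prefork children, one per host CPU
idle_rss_mib = 170    # observed resident set per child at idle

idle_total_mib = workers * idle_rss_mib
print(idle_total_mib)                   # 3400 MiB, matching kubectl top below
print(round(idle_total_mib / 1024, 2))  # ~3.32 GiB against a 4 GiB limit
```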

$ kubectl top pods
NAME                              CPU(cores)   MEMORY(bytes)
prowler-worker-6b77c7b8f8-2xlp4   2m           3395Mi
prowler-worker-6b77c7b8f8-gkpc4   3m           3393Mi

$ kubectl get pods -l app=prowler-worker -o jsonpath='...'
prowler-worker-6b77c7b8f8-2xlp4  restartCount=4  lastTerminated=OOMKilled  exitCode=137
prowler-worker-6b77c7b8f8-gkpc4  restartCount=8  lastTerminated=OOMKilled  exitCode=137

The same image on a 4-core developer machine would idle below 1 GiB and never hit the limit, which is probably why this hasn't surfaced in local testing — the symptom is host-dependent and only shows up on larger Kubernetes nodes.

A note on --max-tasks-per-child 1

The existing --max-tasks-per-child 1 is a great safeguard against per-task memory leaks — each child is recycled after one task. It doesn't, however, bound the number of concurrent child slots. With concurrency = 20 we still keep 20 long-lived child processes alive (recycled in place after each task), each carrying the SDK working set.

Suggested fix

Either of two paths would make this configurable for operators without forking:

  1. Pass --concurrency from an env var in the entrypoint (smallest change):
    poetry run python -m celery -A config.celery worker \
      -Q ... -E --max-tasks-per-child 1 \
      --concurrency "${CELERY_WORKER_CONCURRENCY:-4}"
  2. Set the Django setting (cleaner, also picked up by config_from_object):
    # config/django/base.py
    CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=4)

A default of 4 feels like a reasonable starting point given the per-process SDK memory footprint; operators who need more can raise it explicitly. Falling back to the host CPU count is convenient on dev machines, but it's not the best default for Kubernetes deployments, where pods routinely see many more CPUs than they're sized for in memory.
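As a rough way to pick a value, memory can drive the bound instead of CPU count. This helper and its numbers (a ~300 MiB base overhead, ~900 MiB peak per child under scan load) are illustrative assumptions, not measurements from the Prowler codebase:

```python
import os

def bounded_concurrency(mem_limit_mib, peak_child_mib, overhead_mib=300):
    # Fit children into the pod's memory limit, never exceeding host CPUs.
    # peak_child_mib should be the per-child working set under scan load,
    # not the 170 MiB idle figure.
    by_memory = max(1, (mem_limit_mib - overhead_mib) // peak_child_mib)
    return min(os.cpu_count() or 1, by_memory)

print(bounded_concurrency(4096, 900))  # 4 on any node with >= 4 CPUs
```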

Happy to send a PR if the maintainers would find it useful.

How did you install Prowler?

Helm-style Kubernetes manifests on AKS, image prowlercloud/prowler-api:5.22.0.

Environment Resource

AKS, 20-core nodes (Standard_D20s_v5).

OS used

Container base — Prowler image; Linux node.

Prowler version

api 5.22.0 — also present on main at HEAD (api/docker-entrypoint.sh and api/src/backend/config/celery.py).
