# [BUG] Embedded Redis BGSAVE failures silently break all uploads with HTTP 500 #642

@wnstfy

Edition: DocuSeal Enterprise (licensed) — image ee.docuseal.eu/***eul5h/ds-ee:latest
Redis (embedded): redis_version:8.4.2, process_id:30, started by lib/puma/plugin/redis_server.rb
Symptom: POST /api/attachments returns HTTP 500. UI shows "We're sorry, but something went wrong".
Severity: App-breaking. Once it triggers, all writes to the app fail until the embedded Redis is restarted or stop-writes-on-bgsave-error is set to no.


TL;DR

The Enterprise image bundles Redis as a forked subprocess of Puma. The fork command starts redis-server without a config file and pipes its output to /dev/null. As a result:

  1. Redis runs with default RDB persistence (save 3600 1 300 100 60 10000) and default stop-writes-on-bgsave-error yes.
  2. The effective dir is the app's WORKDIR (/data/docuseal), because Redis falls back to its current working directory when no dir is configured. The dir /var/lib/redis setting in the bundled /etc/redis.conf is never loaded.
  3. If anything causes BGSAVE to fail (memory pressure, hardened container security, restricted filesystem, etc.), Redis blocks all writes with MISCONF.
  4. Because Redis stdout is /dev/null, the operator gets zero diagnostic information — the only sign is generic 500s in the app and a long trail of RedisClient::CommandError: MISCONF in the Rails logs.
  5. There is no documented way to disable RDB persistence, override the save policy, or point the app at an external Redis (the REDIS_URL env var simply makes the embedded Redis start, see [Doc] REDIS_URL information missing in documentation #587).

I hit this on a hardened Docker setup (cap_drop: ALL, no-new-privileges:true, resource limits). On my container, BGSAVE has failed 161,554 consecutive times over 11 days of uptime.


Root cause (in code)

1. Redis is started without a config file

lib/puma/plugin/redis_server.rb — relevant excerpt observed inside the EE image:

def fork_redis
  fork do
    Process.setsid
    Dir.chdir(ENV.fetch('WORKDIR', nil)) unless ENV['WORKDIR'].to_s.empty?

    exec('redis-server', '--requirepass',
         Digest::SHA1.hexdigest("redis#{ENV.fetch('SECRET_KEY_BASE', '')}"),
         out: '/dev/null')
  end
end

Issues with this:

  • No config file argument. redis-server --requirepass <hash> ignores the bundled /etc/redis.conf, so all dir, dbfilename, save, logfile, stop-writes-on-bgsave-error settings silently default to Redis built-ins.
  • out: '/dev/null' silences every Redis log line — startup banner, fork errors, BGSAVE errors, everything.
  • The Redis dir ends up being the WORKDIR (/data/docuseal), which collocates Redis RDB writes with user attachments — surprising, undocumented, and breaks if the volume's permissions/ownership aren't perfectly aligned.

2. LOCAL_REDIS_URL makes external Redis impossible

config/dotenv.rb:

ENV['LOCAL_REDIS_URL'] = ENV.fetch('REDIS_URL', nil)

And in the plugin:

return if ENV['LOCAL_REDIS_URL'].to_s.empty?

So setting REDIS_URL to point at an external Redis still triggers the embedded Redis to start (because LOCAL_REDIS_URL is now non-empty). There is no way to opt out of the embedded Redis without modifying the image. This compounds with #587 (REDIS_URL undocumented).
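
The interaction between the two snippets above can be reproduced in isolation. The following is a minimal sketch that models the environment as a plain Hash: apply_dotenv and embedded_redis_starts? mirror the observed lines, while external_redis? and LOCAL_HOSTS are hypothetical (they are not DocuSeal code) and only illustrate one way a guard could tell an external URL from a local one.

```ruby
require 'uri'

# Mirrors config/dotenv.rb as observed: LOCAL_REDIS_URL is copied from REDIS_URL.
def apply_dotenv(env)
  env.merge('LOCAL_REDIS_URL' => env['REDIS_URL'])
end

# Mirrors the plugin guard as observed: the embedded Redis is skipped only
# when LOCAL_REDIS_URL is empty.
def embedded_redis_starts?(env)
  !env['LOCAL_REDIS_URL'].to_s.empty?
end

# Hypothetical helper (an assumption, not DocuSeal code): treat a URL as
# external when its host is neither a loopback nor a wildcard address.
LOCAL_HOSTS = %w[localhost 127.0.0.1 0.0.0.0 ::1].freeze

def external_redis?(url)
  host = URI.parse(url.to_s).hostname
  !host.nil? && !LOCAL_HOSTS.include?(host)
end

env = apply_dotenv('REDIS_URL' => 'redis://redis.internal:6379/0')
puts "embedded Redis starts today: #{embedded_redis_starts?(env)}"
puts "URL points at an external host: #{external_redis?(env['REDIS_URL'])}"
```

With an external URL both lines report true, which is the conflict described above: the URL is external, yet the embedded Redis is still started.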


Reproduction

Hardened Docker Compose stack (relevant snippet):

services:
  docuseal-app:
    image: ee.docuseal.eu/***eul5h/ds-ee:latest
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETUID
      - SETGID
      - DAC_OVERRIDE
      - FOWNER
      - SYS_RESOURCE
      - IPC_LOCK
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '4'
          pids: 500
    volumes:
      - ${PWD}/app-data:/data/docuseal

Steps:

  1. Deploy the stack.
  2. Sign in, attempt to upload a document → HTTP 500.
  3. Check docker compose logs docuseal-app → flood of RedisClient::CommandError errors whose message begins "MISCONF Redis is configured to save RDB snapshots".

The container reports healthy throughout, masking the failure.


Proofs

Proof 1 — Redis is running embedded, started without a config file

$ docker exec docuseal-app ps aux
PID   USER     TIME  COMMAND
    1 docuseal 18:36 {ruby} puma 7.2.0 (tcp://0.0.0.0:3000) [docuseal]
   30 root      8:57 redis-server *:6379

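Only one container, two processes: Puma and Redis. Note that Redis sets its process title to redis-server <addr>:<port> by default whether or not a config file was passed, so this output shows that Redis is embedded but does not by itself prove that no config file is loaded. Proof 2 establishes that.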

Proof 2 — Bundled /etc/redis.conf is not loaded

The bundled config inside the image specifies:

$ docker exec docuseal-app grep -E "^(dir|save|dbfilename|stop-writes-on-bgsave-error)" /etc/redis.conf
stop-writes-on-bgsave-error yes
dbfilename dump.rdb
dir /var/lib/redis

But the actual running config says otherwise:

$ redis-cli CONFIG GET dir
1) "dir"
2) "/data/docuseal"

$ redis-cli CONFIG GET save
1) "save"
2) "3600 1 300 100 60 10000"

$ redis-cli CONFIG GET dbfilename
1) "dbfilename"
2) "dump.rdb"

dir = /data/docuseal (the WORKDIR), not /var/lib/redis from the conf file → confirms /etc/redis.conf is never loaded. Redis is using built-in defaults plus the single --requirepass override from the Puma plugin.
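
The same comparison can be done mechanically. This is a small sketch, assuming the grep output and the CONFIG GET replies shown above are pasted in by hand; parse_conf and drift are illustrative helper names, not part of DocuSeal or Redis.

```ruby
# Parse "key value" directives from redis.conf-style text. Comments and
# blank lines are skipped.
def parse_conf(text)
  text.each_line.with_object({}) do |line, conf|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split(/\s+/, 2)
    conf[key] = value.to_s
  end
end

# Return the directives whose running value differs from the file's value.
# Directives that were not queried on the running server are ignored.
def drift(file_conf, running_conf)
  file_conf.each_with_object({}) do |(key, expected), out|
    next unless running_conf.key?(key)

    actual = running_conf[key]
    out[key] = { file: expected, running: actual } unless actual == expected
  end
end

# Values copied from the grep output above.
file_conf = parse_conf(<<~CONF)
  stop-writes-on-bgsave-error yes
  dbfilename dump.rdb
  dir /var/lib/redis
CONF

# Values copied from the CONFIG GET replies above.
running_conf = {
  'dir' => '/data/docuseal',
  'save' => '3600 1 300 100 60 10000',
  'dbfilename' => 'dump.rdb'
}

drift(file_conf, running_conf).each do |key, v|
  puts "#{key}: file=#{v[:file].inspect} running=#{v[:running].inspect}"
end
```

Only dir is reported, since dbfilename happens to match the built-in default.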

Proof 3 — BGSAVE failing constantly (161,554 consecutive failures, 11 days)

$ redis-cli INFO server | grep -E "redis_version|process_id|uptime_in_seconds"
redis_version:8.4.2
process_id:30
uptime_in_seconds:969618        # ≈ 11.2 days

$ redis-cli INFO persistence
rdb_changes_since_last_save:253
rdb_bgsave_in_progress:0
rdb_last_save_time:1776420183
rdb_last_bgsave_status:err
rdb_last_bgsave_time_sec:0
rdb_saves:161554
rdb_saves_consecutive_failures:161554

  • rdb_last_bgsave_status:err means the most recent save failed.
  • rdb_last_bgsave_time_sec:0 means the failure is near-instant, which points to a failure at or immediately after fork(), before any meaningful data is written. The exact cause cannot be confirmed because the Redis log output is discarded.
  • rdb_saves_consecutive_failures:161554 equals rdb_saves:161554, so there has not been a single successful BGSAVE since the container started.
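
The counters above also give the failure rate. A short worked calculation, using only the values from the INFO output (parse_info is an illustrative helper name):

```ruby
# Parse "key:value" lines from redis-cli INFO output into a Hash.
# Section headers such as "# Server" have no colon and are skipped.
def parse_info(text)
  text.each_line.with_object({}) do |line, info|
    key, value = line.strip.split(':', 2)
    next if key.nil? || key.empty? || value.nil?

    info[key] = value
  end
end

info = parse_info(<<~INFO)
  uptime_in_seconds:969618
  rdb_saves:161554
  rdb_saves_consecutive_failures:161554
INFO

uptime   = info.fetch('uptime_in_seconds').to_i
failures = info.fetch('rdb_saves_consecutive_failures').to_i

uptime_days      = (uptime / 86_400.0).round(1)
seconds_per_fail = (uptime.to_f / failures).round(1)

puts "uptime: #{uptime_days} days"
puts "one failed BGSAVE every #{seconds_per_fail} s on average"
```

That works out to roughly one failed attempt every 6 seconds, far more often than the tightest default save rule (60 10000) would schedule on its own. This suggests Redis keeps retrying shortly after each failure rather than waiting for the next save point.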

Proof 4 — Logs are empty (zero diagnostics)

$ docker exec docuseal-app ls -la /var/log/redis/
total 8
drwxr-xr-x    2 redis    redis         4096 Mar 31 16:58 .
drwxr-xr-x    1 root     root          4096 Mar 31 16:58 ..

No logfile. Because:

  1. /etc/redis.conf (which sets logfile /var/log/redis/redis.log) is not loaded.
  2. The plugin redirects Redis stdout to /dev/null.

So when BGSAVE fails for whatever reason (capability restriction, OOM, fs error...), the operator has no log entry to point at the cause.

Proof 5 — User-visible failure cascade

docker compose logs docuseal-app (snippet):

ERROR sidekiq heartbeat: MISCONF Redis is configured to save RDB snapshots,
but it's currently unable to persist to disk. Commands that may modify
the data set are disabled, because this instance is configured to report
errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option).

ERROR ... (redis://0.0.0.0:6379) (RedisClient::CommandError)

[aac157d7-...] rack-session/lib/rack/session/abstract/id.rb:274:in
'Rack::Session::Abstract::Persisted#context'
[aac157d7-...] actionpack/lib/action_dispatch/middleware/cookies.rb:708:in
'ActionDispatch::Cookies#call'
... (full Rails stack trace, every request) ...

Every request that touches the Rails session (i.e. every authenticated request) explodes because the session store can't write to Redis.

Proof 6 — Workaround proves the cause

$ redis-cli CONFIG SET stop-writes-on-bgsave-error no
OK
$ redis-cli SET test_write hello
OK
$ redis-cli GET test_write
"hello"

After flipping that one setting, writes succeed and uploads start working again. The fix doesn't survive a Redis restart (it's not in any config file), so it has to be re-applied every time.
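
Because the embedded Redis is started with --requirepass, re-applying the workaround needs the derived password. The sketch below rebuilds it the same way the plugin excerpt does and assembles a redis-cli command. The helper names are hypothetical, and the default host and port (127.0.0.1:6379) are assumptions based on the redis-server *:6379 process line.

```ruby
require 'digest/sha1'

# Same derivation as the exec call in lib/puma/plugin/redis_server.rb.
def embedded_redis_password(secret_key_base)
  Digest::SHA1.hexdigest("redis#{secret_key_base}")
end

# Build (but do not run) the redis-cli invocation for the workaround.
# host and port defaults are assumptions, not values read from DocuSeal.
def workaround_command(secret_key_base, host: '127.0.0.1', port: 6379)
  ['redis-cli', '-h', host, '-p', port.to_s,
   '-a', embedded_redis_password(secret_key_base), '--no-auth-warning',
   'CONFIG', 'SET', 'stop-writes-on-bgsave-error', 'no']
end

cmd = workaround_command(ENV.fetch('SECRET_KEY_BASE', ''))

# Print only the Redis command, not the password.
puts cmd.last(4).join(' ')

# system(*cmd) would apply the setting. It has to be repeated after every
# Redis restart because nothing persists it.
```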


Why this is genuinely a bug (not just my hardening)

The hardened Docker setup is the trigger I observed, but the design is fragile in at least four independent ways:

  1. Default stop-writes-on-bgsave-error yes + default RDB save policy = any transient BGSAVE failure (memory pressure, fork failure, full disk, slow disk, container kill in mid-fork, etc.) takes the entire app down. PostgreSQL is the system of record; Redis here is a job queue/session store, so RDB persistence isn't critical and shouldn't be the thing that breaks the app.
  2. out: '/dev/null' in the fork — silent failures by design. The first thing a Redis admin does is read the Redis log. Here, there is no log.
  3. Default dir = WORKDIR (/data/docuseal) — RDB files are written next to user attachments. Beyond being surprising, it means Redis BGSAVE shares write contention/permissions/quota with the attachment volume.
  4. No escape hatch. REDIS_URL doesn't disable the embedded Redis — it just starts it (per dotenv.rb). There's no DISABLE_LOCAL_REDIS=true, no way to mount a Redis config, no documented override.

Other users will hit this the moment they:

  • Set memory limits tight enough to make fork() fail with vm.overcommit_memory=0,
  • Run with cap_drop/no-new-privileges (very common in compliant deployments),
  • Mount /data/docuseal on a filesystem that briefly fails a write,
  • Or run on a host where the kernel refuses fork briefly (cgroup pid limits, OOM reaper, etc.).

In every case the symptom is HTTP 500 with no clue why (per the closed bug #514).


Suggested fixes (1 or 3 would resolve it; 2 and 4 would at least make it diagnosable)

  1. Disable RDB persistence by default for the embedded Redis. Add --save '' --appendonly no (or --stop-writes-on-bgsave-error no) to the exec call in lib/puma/plugin/redis_server.rb. Redis is already ephemeral here in practice — Sidekiq jobs/sessions are recreated. Persistence buys nothing and breaks the app when it hiccups.
  2. Stop redirecting Redis output to /dev/null. Let it land in stdout so it shows up in docker logs. Operators need to be able to see Redis errors.
  3. Honor REDIS_URL as a true external-Redis switch. When set, skip fork_redis instead of also starting the embedded one. (Closes [Doc] REDIS_URL information missing in documentation #587 properly.)
  4. At minimum, document this — the REDIS_URL env var, the embedded-Redis design, and the WORKDIR-as-dir behavior.

(1) is the smallest, safest change and would unblock every operator hitting this in the wild.
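
For illustration, fixes (1) and (2) could look roughly like this. It is a sketch based on the excerpt above, not the actual DocuSeal source: redis_server_argv is a hypothetical helper, and the three extra flags are standard Redis config directives passed on the command line.

```ruby
require 'digest/sha1'

# Build the argv for an embedded redis-server with RDB and AOF persistence
# turned off. Passing '' to --save removes all save points.
def redis_server_argv(secret_key_base)
  ['redis-server',
   '--requirepass', Digest::SHA1.hexdigest("redis#{secret_key_base}"),
   '--save', '',
   '--appendonly', 'no',
   '--stop-writes-on-bgsave-error', 'no']
end

# Sketch of fork_redis using the argv above. Dropping `out: '/dev/null'`
# lets the child inherit Puma's stdout, so Redis log lines reach `docker logs`.
def fork_redis
  fork do
    Process.setsid
    Dir.chdir(ENV.fetch('WORKDIR', nil)) unless ENV['WORKDIR'].to_s.empty?

    exec(*redis_server_argv(ENV.fetch('SECRET_KEY_BASE', '')))
  end
end

# Show the extra arguments without starting anything.
p redis_server_argv('example').drop(3)
```

Setting stop-writes-on-bgsave-error to no in addition to clearing the save points is redundant by design: if persistence is re-enabled later via CONFIG SET, a failed BGSAVE still would not block writes.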


Environment

  • DocuSeal Enterprise (licensed) — ee.docuseal.eu/***eul5h/ds-ee:latest
  • Embedded Redis 8.4.2
  • Docker 27.x on Debian 13 (kernel 6.12)
  • 11+ days of container uptime
  • vm.overcommit_memory = 0 (Linux default)
  • 125 GiB host RAM, 2 GiB container memory limit, 351 MiB actual usage
  • Container reported as healthy throughout (Puma /up works fine; the failure is in Sidekiq/Redis)

Happy to provide additional diagnostics, a packet capture, or a sanitized minimal repro if helpful.
