Display subprocess dies at boot on race with Redis (control:status is None) — DSI shows TTY instead of dashboard #242

@pythoninthegrass

Summary

On a fresh boot, control.py's display subprocess reads control:status from Redis before control.py's own main loop has populated it, crashes with TypeError: the JSON object must be str, bytes or bytearray, not NoneType, and is never restarted. The parent control.py stays alive (so supervisor reports RUNNING and does not retry), and on setups driving a DSI panel via pygame/KMSDRM the DSI ends up showing the kernel TTY instead of the dashboard.

A manual sudo supervisorctl restart control after boot (once Redis is populated) always fixes it — confirming the issue is a startup-order race in PiFire itself, not a display/pygame problem.

Environment

  • Raspberry Pi 5
  • Raspberry Pi OS Lite (Bookworm), Python 3.13.5
  • PiFire 1.10.9 (also reproduced on 1.10.1)
  • Display: Waveshare 7" DSI LCD 800x480 (dsi_800x480t) via dtoverlay=vc4-kms-dsi-waveshare-800x480
  • Redis: redis-server 8.0.2 running on 127.0.0.1:6379, healthy
  • Supervisor managing control and webapp as user alex

Reproduction

  1. Boot the Pi from power-off with the DSI display module selected in the wizard.
  2. Observe dashboard web UI loads fine at http://<pi>/.
  3. Observe DSI shows a blinking/login TTY rather than the dashboard.
  4. sudo supervisorctl status → control RUNNING (misleading; the inner display subprocess is dead).
  5. sudo tail /usr/local/bin/pifire/logs/control.err.log shows the traceback below.
  6. sudo supervisorctl restart control → DSI now shows the dashboard; pygame becomes DRM master.

Traceback

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.13/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/lib/python3.13/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/bin/pifire/display/dsi_800x480t.py", line 124, in _display_loop
    self._fetch_data()
  File "/usr/local/bin/pifire/display/base_flex.py", line 598, in _fetch_data
    self.status_data = read_status()
  File "/usr/local/bin/pifire/common/common.py", line 2307, in read_status
    status = json.loads(cmdsts.get('control:status'))
  File "/usr/lib/python3.13/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                    f'not {s.__class__.__name__}')
TypeError: the JSON object must be str, bytes or bytearray, not NoneType

Root cause

common.common.read_status() (non-init path) does:

status = json.loads(cmdsts.get('control:status'))

This assumes control:status always exists in Redis. At boot, the display subprocess (spawned from control.py via multiprocessing.Process) may call this before the main loop has called read_status(init=True) / write_status(...) to populate it. cmdsts.get(...) returns None, json.loads(None) raises TypeError, and the display process exits. The parent process does not notice the child death, so supervisor does not retry.
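
The failure can be reproduced without a Pi or a live Redis. The sketch below is illustrative only: EmptyRedis and try_read_status are hypothetical names, and the stub merely mimics a client that returns None for a missing key, as redis-py's get() does.

```python
import json


class EmptyRedis:
    """Hypothetical stand-in for a Redis client whose keyspace is still empty."""

    def get(self, key):
        # A missing key yields None, matching redis-py's get() behaviour.
        return None


cmdsts = EmptyRedis()


def read_status():
    # Same shape as the non-init path quoted above: no guard for a missing key.
    return json.loads(cmdsts.get('control:status'))


def try_read_status():
    """Return the TypeError message raised by read_status(), or None on success."""
    try:
        read_status()
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_read_status())
```

In the real display subprocess nothing catches this exception, so the multiprocessing child exits while the parent keeps running.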

The same pattern likely applies to other keys read by the display loop in base_flex.py _fetch_data() (e.g. read_current, read_notify_data). Those currently happen to succeed because they are either exists-guarded or return an empty list from llen-based helpers, but they share the same ordering assumption.

Suggested fix

Make read_status() (and any peer in common.py called from the display loop) tolerant of a not-yet-populated key. Two low-risk options:

  1. Treat a missing/None value the same as init=True and return a freshly-constructed default dict (but do not write it back — let the main loop own initialization).
  2. Or, in the display subprocess _fetch_data, sleep/retry while required Redis keys are absent before entering the normal loop.

Option 1 is the smaller diff:

def read_status(init=False):
    global cmdsts
    # One GET instead of exists() + get(): no window for the key to vanish in between.
    raw = None if init else cmdsts.get('control:status')
    if raw is None:
        # ... build default status dict as today ...
        if init:
            # Only the init path writes back; the main loop owns initialization.
            write_status(status)
        return status
    return json.loads(raw)
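
For comparison, option 2 from the list above might look roughly like the following. This is a sketch, not PiFire code: wait_for_keys and its parameters are hypothetical, and the only assumption about the client is that exists(key) returns a truthy value once the key is present (as redis-py's exists() does). The sleep and clock arguments are injectable only so the helper can be exercised without real delays.

```python
import time


def wait_for_keys(client, keys, timeout=30.0, interval=0.25,
                  sleep=time.sleep, clock=time.monotonic):
    """Block until every key in `keys` exists, or until `timeout` seconds pass.

    Returns True if all keys appeared, False if the deadline was reached first.
    """
    deadline = clock() + timeout
    while True:
        # Re-check every key on each pass; a key counts as present when
        # client.exists() returns a truthy value.
        missing = [k for k in keys if not client.exists(k)]
        if not missing:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The display subprocess could call something like wait_for_keys(cmdsts, ['control:status']) once before entering its loop. The trade-off against option 1 is that the display stays blank while waiting, and the caller still has to decide what to do on timeout.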

Workaround (documented for others hitting this)

Until the ordering is fixed, we drive control from a systemd oneshot that starts it, waits for control:status to appear in Redis, then restarts it so the display subprocess starts fresh with data available. We also unbind the kernel framebuffer console from the VT so the TTY doesn't render on the DSI while pygame is coming up, and we set SDL_VIDEO_KMSDRM_DEVICE to /dev/dri/by-path/platform-<addr>.dsi-card so DRM card renumbering across boots doesn't matter. Happy to share the full writeup if useful.
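
As an illustration of the console-unbinding step, one possible approach uses the kernel's vtconsole sysfs interface. This is a sketch and not necessarily what the workaround above does: unbind_fbcon is a hypothetical helper, and it assumes the usual /sys/class/vtconsole/vtcon*/name and bind files, where the framebuffer console's name contains "frame buffer". The root directory is a parameter so the function can be tried against a scratch directory.

```python
from pathlib import Path


def unbind_fbcon(vtconsole_root="/sys/class/vtconsole"):
    """Write "0" to the bind file of each framebuffer vtconsole.

    Returns the list of vtcon directory names that were unbound.
    """
    unbound = []
    for vtcon in sorted(Path(vtconsole_root).glob("vtcon*")):
        name_file = vtcon / "name"
        bind_file = vtcon / "bind"
        if not name_file.is_file() or not bind_file.exists():
            continue
        # Only touch the framebuffer console; leave the dummy console bound.
        if "frame buffer" in name_file.read_text():
            bind_file.write_text("0")  # requires root on a real system
            unbound.append(vtcon.name)
    return unbound
```

On a real system this has to run as root and before the display process starts.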

Related observations

  • rp1-dsi and vc4-drm DRM card numbers swap between boots on Pi 5; any config that hardcodes /dev/dri/card1 vs card2 will be flaky.
  • Updating 1.10.1 → 1.10.9 did not change this behavior.
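
To avoid depending on the card number, a startup script can resolve the by-path symlink at boot. The sketch below is an assumption-laden example: find_dsi_card is a hypothetical helper, and it assumes the DSI entry under /dev/dri/by-path ends in "dsi-card" as in the path quoted in the workaround.

```python
import os
from pathlib import Path


def find_dsi_card(by_path_dir="/dev/dri/by-path"):
    """Return (symlink_path, resolved_device_path) for the first '*dsi-card'
    entry in by_path_dir, or None if no such entry exists."""
    for link in sorted(Path(by_path_dir).glob("*dsi-card")):
        # realpath follows the symlink to whichever cardN it points at this boot.
        return str(link), os.path.realpath(link)
    return None


if __name__ == "__main__":
    print(find_dsi_card())
```

The stable symlink path is what gets passed on; the resolved cardN is only useful for logging which number the DSI device received on a given boot.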
