Add preset switch failed callback for shader error logging #23

Closed
djj0s3 wants to merge 84 commits into projectM-visualizer:master from djj0s3:feat/shader-error-callback

djj0s3 commented Apr 2, 2026

Summary

  • Registers projectm_set_preset_switch_failed_event_callback after projectm_create()
  • Logs preset filename and exact compilation error message via g_printerr

Why

ProjectM silently swallows shader compilation errors by default. When generated presets fail to compile (producing black frames), nothing indicates why. This callback surfaces the exact HLSL/GLSL error message so we can fix the preset generation pipeline.
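
A minimal sketch of the handler described above, for illustration only. It assumes the projectM-4 callback takes the preset filename, the error message, and a user-data pointer; `format_preset_failure` and `on_preset_switch_failed` are hypothetical names, and `fprintf(stderr, ...)` stands in for `g_printerr` so the sketch builds without GLib.

```c
#include <stdio.h>

/* Builds the log line. Split out from the callback so it can be checked
 * without a projectM instance. Hypothetical helper, not part of projectM. */
static int format_preset_failure(char *buf, size_t len,
                                 const char *preset_filename,
                                 const char *message)
{
    return snprintf(buf, len, "preset switch failed: %s: %s",
                    preset_filename ? preset_filename : "(unknown preset)",
                    message ? message : "(no message)");
}

/* Parameter list assumed to match projectM-4's preset-switch-failed
 * callback type: (preset filename, error message, user data). */
static void on_preset_switch_failed(const char *preset_filename,
                                    const char *message,
                                    void *user_data)
{
    char line[1024];
    (void)user_data; /* unused in this sketch */
    format_preset_failure(line, sizeof line, preset_filename, message);
    fprintf(stderr, "%s\n", line); /* the plugin itself uses g_printerr */
}
```

In the plugin, registration would presumably follow `projectm_create()` with a call along the lines of `projectm_set_preset_switch_failed_event_callback(handle, on_preset_switch_failed, NULL)`; the exact argument order should be checked against the installed projectM-4 headers.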

Test plan

  • Build plugin and load a preset with intentional shader errors
  • Verify error message appears in stderr output
  • Verify valid presets still render normally (no regression)

Generated with Claude Code

djj0s3 and others added 30 commits September 22, 2025 20:30
  control (pass=cbr) was ineffective for ProjectM's highly complex visual content
  2. Switching to quality-based encoding: using quantizer=35 with CRF instead of fixed bitrate
  3. Adding quality constraints: qp-max=50 to prevent quality from degrading too much
  4. Optimizing for speed: speed-preset=ultrafast for faster encoding
- Log stdout/stderr from convert.sh on both success and failure
- Add environment diagnostics to convert.sh (GPU detection, GStreamer plugin check)
- Add pre-flight checks for file permissions and accessibility
- Improve error visibility in Runpod logs

This should help identify why jobs are failing with exit code 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Update OpenGL version to 4.5 for better compatibility
- Add explicit GStreamer plugin paths and scanner location
- Respect Runpod's NVIDIA_VISIBLE_DEVICES setting (don't override)
- Add LD_LIBRARY_PATH to ensure libraries are found
- Improve NVIDIA driver capabilities configuration

These changes should resolve GPU access and library loading issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added gstreamer1.0-gl package to Dockerfile dependencies
- Provides glcolorconvert and gldownload elements needed for OpenGL texture conversion
- Resolves "no element glcolorconvert/gldownload" pipeline errors
- Built and pushed as v3 and latest tags

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Clean up stale X lock files before starting Xvfb
- Kill existing Xvfb processes on display 99
- Enable GLX extension in Xvfb for better GL compatibility
- Use GLX platform instead of X11 for software rendering
- Improve gpu_accessible() to test nvidia-smi functionality
- Add sleep to ensure Xvfb is ready before use

These changes resolve the "Server is already active for display 99" error
and improve GPU detection when nvidia-smi works but devices aren't exposed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add explicit video/x-raw(memory:GLMemory),format=RGBA caps
- Ensures proper capability negotiation in headless EGL mode
- Resolves "could not link projectm0 to glcolorconvertelement0" error

The pipeline now explicitly specifies RGBA format at each GL stage:
projectm -> RGBA(GLMemory) -> glcolorconvert -> RGBA(GLMemory) -> gldownload -> RGBA

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ProjectM plugin only supports ABGR format output
- Removed explicit format=RGBA caps that were causing negotiation failure
- Let glcolorconvert and videoconvert handle format conversion automatically
- Resolves "projectm0 can't handle caps format=(string)RGBA" error

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ProjectM GL context fails in headless EGL mode on Runpod
- Always use Xvfb for ProjectM rendering (works reliably with X11 GL)
- Detect GPU separately for hardware encoding (nvh264enc)
- Maintains best of both: stable rendering + GPU-accelerated encoding

This resolves the persistent "could not link projectm0 to glcolorconvertelement0"
errors caused by GL context initialization failures in headless EGL mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Calculate display number based on PID: DISPLAY_NUM = 99 + (PID % 100)
- Prevents conflicts when multiple jobs run simultaneously
- Each job gets its own X display (range :99 to :198)
- Removes only the specific lock file for this display

Resolves issues with concurrent jobs interfering with each other's Xvfb instances.
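
The display-number rule above can be sketched in C as follows. This is an illustration, not the script's actual code (convert.sh is a shell script); `display_num_for_pid` and `x_lock_path` are hypothetical names, and `/tmp/.X<n>-lock` is the conventional X server lock-file location.

```c
#include <stdio.h>
#include <sys/types.h>

#define BASE_DISPLAY 99   /* first display number used */
#define DISPLAY_SLOTS 100 /* yields displays :99 through :198 */

/* DISPLAY_NUM = 99 + (PID % 100), as stated in the commit message. */
static int display_num_for_pid(pid_t pid)
{
    return BASE_DISPLAY + (int)(pid % DISPLAY_SLOTS);
}

/* Writes the lock-file path for one display, so that only this job's
 * lock is removed. Returns the snprintf result. */
static int x_lock_path(char *buf, size_t len, int display_num)
{
    return snprintf(buf, len, "/tmp/.X%d-lock", display_num);
}
```

For example, PID 12345 maps to display :144. Two jobs whose PIDs differ by a multiple of 100 still map to the same display, so the scheme reduces collisions rather than ruling them out.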

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change GST_GL_API from opengl to opengl3
- Resolves GL context creation error with Xvfb/Mesa
- Mesa provides opengl3 API, not legacy opengl
- Fixes: "Cannot create context with user requested api (opengl)"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
djj0s3 and others added 28 commits January 22, 2026 07:44
- Start HTTP server by default when not in serverless environment
- Only use serverless handler when RUNPOD_ENDPOINT_ID or RUNPOD_JOB_ID present

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install openssh-server in container
- Generate SSH host keys during build
- Configure SSH for root login
- Start SSH daemon in start.sh before main process
- Expose ports 22 and 8000

The container works locally, but RunPod pod readiness is still being debugged.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Build for amd64 architecture (was incorrectly building arm64)
- Add DRI device accessibility check before using EGL-GBM
- Disable EGL surfaceless mode due to framebuffer incompatibility with ProjectM
- Fall back to Mesa software rendering when DRI is not accessible
- Improve start.sh to handle shell commands passed via dockerArgs

The RunPod container now works reliably with Mesa software rendering
when GPU DRI devices are not accessible due to permission issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The EGL-device mode (EGL_PLATFORM_DEVICE_EXT) successfully creates an
OpenGL context on RunPod, but ProjectM has framebuffer issues because
it renders to the default framebuffer (0) which doesn't exist in
headless EGL modes.

This is a fundamental limitation of ProjectM's rendering approach -
it expects a display surface with a real framebuffer. Fixing this
would require modifying the gst-projectm plugin to use FBOs.

For now, fall back to Mesa software rendering on RunPod which is
reliable, though slower than GPU rendering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit adds support for rendering to an FBO (Framebuffer Object) in
headless EGL environments where the default framebuffer (0) doesn't exist.

Plugin changes (src/plugin.c):
- Add GST_PROJECTM_FORCE_FBO environment variable to force FBO mode
- Detect headless mode by checking if framebuffer 0 is complete
- Create and bind FBO before ProjectM initialization in headless mode
- Never unbind to framebuffer 0 in headless mode
- Properly manage FBO lifecycle to avoid binding framebuffer 0 during resize

Convert script changes (convert.sh):
- Add EGL-device surfaceless mode for NVIDIA GPUs without DRI access
- Set GST_PROJECTM_FORCE_FBO=1 in all headless EGL modes
- Add diagnostic output for GST_PROJECTM_FORCE_FBO and GST_GL settings

Known limitation:
ProjectM-4 internally uses framebuffer 0 during its initialization phase.
This causes GL_INVALID_FRAMEBUFFER_OPERATION errors in headless EGL modes
even with our FBO workaround. Full headless GPU rendering requires either:
- Proper DRI device access (currently blocked on RunPod: permission 660)
- Or patches to ProjectM-4 to not use framebuffer 0 during init

The Mesa software rendering fallback continues to work reliably.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reorder GPU rendering methods to prioritize Xvfb + NVIDIA GLX which
works reliably on Vast.ai and similar cloud GPU platforms:

1. Xvfb + NVIDIA GLX (BEST) - virtual X server with HW-accelerated GL
2. Xorg dummy + NVIDIA GLX - fallback if Xvfb fails
3. EGL-GBM (experimental) - may not work with all NVIDIA drivers
4. EGL-device surfaceless - when DRI isn't accessible

Key improvements:
- Xvfb starts first without NVIDIA vendor set
- NVIDIA GLX vendor is set for client apps only (GStreamer, glxinfo)
- Verifies NVIDIA GLX actually works before proceeding
- Falls back to Mesa if driver doesn't support this mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When NVIDIA GLX fails the glxinfo test, now properly falls through
to EGL-GBM and EGL-device methods before falling back to software
rendering. Previously, GLX failure immediately triggered Mesa fallback,
causing slow software rendering on Vast.ai instances.

Changes:
- Use GPU_METHOD_FOUND flag to track successful GPU initialization
- Remove broken "Xorg dummy + NVIDIA GLX" method (requires nvidia_drv.so)
- Prioritize EGL-GBM as reliable GPU method after GLX fails
- EGL-device surfaceless as last GPU resort before Mesa

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
EGL-GBM and EGL-device surfaceless often fail with NVIDIA drivers
on Vast.ai. Now these methods are opt-in (FORCE_EGL=1) and include
validation tests before use.

When NVIDIA GLX fails, go directly to Mesa software rendering which
is more reliable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Method 1: Use Xorg with modesetting driver + glamor acceleration
- Works through DRM/KMS with nvidia-container-runtime
- Uses xorg-nvidia.conf which enables GPU acceleration
- More reliable than Xvfb + NVIDIA GLX which requires server-side support

Method 2: Xvfb + NVIDIA GLX (kept as fallback)
- Only works when NVIDIA GLX server modules are available

Both methods test with glxinfo before proceeding.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… frames

Root cause: projectm_opengl_render_frame() renders to ProjectM's internal
buffer, not our external FBO. This caused all frames to be black.

Fix: Use projectm_opengl_render_frame_fbo(handle, fbo_id) when an FBO is
available. This renders directly to our framebuffer object.

Also improved convert.sh GPU initialization:
- Add GPU environment diagnostics for debugging
- Reject llvmpipe/software rendering (causes black frames with gst-projectm)
- Make Xvfb + NVIDIA GLX the preferred method for Vast.ai
- Remove DRI requirement for Method 2 (GLX works without DRI access)
- Add detailed EGL device enumeration for container environments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Switch base image from ubuntu:24.04 to nvidia/cuda:12.2.0-devel-ubuntu22.04
- Install nv-codec-headers for NVENC/NVDEC support
- Build gst-plugins-bad from source with nvcodec=enabled
- Add libnvidia-encode/decode libraries
- Include 'video' capability in NVIDIA_DRIVER_CAPABILITIES
- Update GST_PLUGIN_PATH to include nvcodec plugin location

This enables hardware H.264 encoding via nvh264enc, which is ~2x faster
than software x264 encoding and offloads work from the CPU to the GPU's
dedicated video encoding hardware (NVENC).

Combined with the mesh optimization (640x480 → 220x140), this should
enable faster-than-realtime rendering for long audio files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use nvidia/cuda:12.2.0-devel-ubuntu22.04 base image
- Install nv-codec-headers for NVENC/NVDEC
- Build nvcodec GStreamer plugin from gstreamer 1.20.7 monorepo
- Add libnvidia-encode/decode libraries
- Include 'video' capability for NVENC access

The nvh264enc plugin enables hardware H.264 encoding, offloading
encoding from CPU to GPU's dedicated NVENC hardware for ~2x faster
video encoding.

Image size: 8.79GB (larger due to CUDA devel libraries)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When running in Docker with -e DISPLAY=:0 -v /tmp/.X11-unix:/tmp/.X11-unix,
the container should use the host's X server instead of starting its own.

This enables:
- NVIDIA GPU rendering via host Xorg with NVIDIA driver
- NVENC hardware encoding (host GPU access)
- Proper FBO rendering (no black frames)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Lambda Labs and other compute-focused cloud instances have CUDA but
not OpenGL by default. This change:

- Attempts to install libnvidia-gl for EGL/GLX support
- Creates /usr/share/glvnd/egl_vendor.d/10_nvidia.json so libglvnd
  can find NVIDIA's EGL implementation

With this, the container can use GPU-accelerated OpenGL rendering
when nvidia-container-toolkit injects the host's NVIDIA libraries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed libnvidia-encode-525 and libnvidia-decode-525 packages.
These caused NVENC to fail with "unsupported device" when the host
runs a different driver version (e.g., 570 vs 525).

Kept libnvidia-gl for ProjectM OpenGL rendering (EGL/GLX).

nvidia-container-toolkit will inject the correct encode/decode
libraries at runtime when NVIDIA_DRIVER_CAPABILITIES=video is set.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The easter-egg property controls a startup logo/feature that shows
the ProjectM W logo. Setting it to 0 disables this.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaced projectM's default logo textures with user's custom VJ logo.
Added multiple filename variations to cover all possible projectM
texture references:
- M.tga, m_logo.tga, mlogo.tga
- projectm.tga, project.tga
- headphones.tga
- spiral.tga
- logo.tga
- pM.tga

These will be included in the Docker image and override any default
projectM logos that appear during idle/startup.
- Add vj_studio_logo.png for "Made With VJ Studio" overlay
- Enable faststart=true on mp4mux for better YouTube streaming

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Load first preset immediately on init to avoid showing idle screen
- Add gst_projectm_load_first_timeline_preset() for timeline mode
- Prevent timeline_activate from resetting to index -1 if first preset already loaded
- Add COPY for vj_studio_logo.png in Dockerfile

This fixes the issue where the ProjectM "M" logo would briefly appear
at the start of videos before transitioning to the first real preset.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cropped top portion of logo to remove "Made With" text,
leaving just the VJ character and "STUDIO" for a cleaner
bottom-right watermark appearance.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add elapsed_seconds to timeline switch log message so we can see
  the actual PTS value when each switch occurs
- Add periodic PTS diagnostic (every 600 frames / ~10s) logging both
  audio and video buffer PTS to detect drift between them
- Add render_frame_count to GstProjectMPrivate for frame tracking

This helps diagnose an issue where timeline entries get skipped,
possibly due to video PTS drifting ahead of audio PTS.
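
A rough sketch of the periodic diagnostic described above, assuming PTS values are in nanoseconds (as GStreamer timestamps are). The function names and constants are hypothetical, not taken from the plugin.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTS_LOG_INTERVAL_FRAMES 600 /* "every 600 frames / ~10s" */
#define NSEC_PER_SEC 1000000000LL

/* True on every 600th rendered frame (including frame 0). */
static bool should_log_pts(uint64_t render_frame_count)
{
    return render_frame_count % PTS_LOG_INTERVAL_FRAMES == 0;
}

/* Positive when video PTS is ahead of audio PTS, negative when behind. */
static double pts_drift_seconds(uint64_t video_pts_ns, uint64_t audio_pts_ns)
{
    return ((double)video_pts_ns - (double)audio_pts_ns) / (double)NSEC_PER_SEC;
}
```

Logging the signed difference alongside both raw values makes it easy to see which clock is running ahead.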

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When CPU encoding is used (x264enc fallback), video PTS runs at
0.5-0.7x of audio time, causing the timeline engine to skip entries.
This resulted in only 90/190 timeline entries being visited for a
53-min DJ set.

Audio PTS advances at the true playback rate regardless of video
encoding speed, ensuring all timeline entries are visited correctly.
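
To illustrate why the choice of clock matters, here is a small sketch of a timeline lookup keyed on elapsed seconds. `TimelineSlot` and `timeline_index_at` are hypothetical stand-ins, not the plugin's real types or functions, and the entries are assumed to be sorted by start time.

```c
#include <stddef.h>

typedef struct {
    double start_time; /* seconds from the start of the audio */
} TimelineSlot;

/* Returns the index of the last slot whose start_time <= elapsed_seconds,
 * or -1 if no slot has started yet. */
static ptrdiff_t timeline_index_at(const TimelineSlot *slots, size_t count,
                                   double elapsed_seconds)
{
    ptrdiff_t idx = -1;
    for (size_t i = 0; i < count && slots[i].start_time <= elapsed_seconds; i++)
        idx = (ptrdiff_t)i;
    return idx;
}
```

With slots starting at 0, 10, and 20 seconds, an audio-derived elapsed time of 20 s selects index 2, while a clock running at 0.6x (12 s) still selects index 1.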

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tr_array_sort

g_ptr_array_sort() passes each comparison argument as a pointer to the
array slot (GstProjectMTimelineEntry**), not a direct pointer to the
entry. Without the extra dereference, the comparator was interpreting
raw memory addresses as gdouble start_time values, resulting in a
semi-random sort order.

This caused large sections of the timeline to be unreachable — the
fast-path optimization in timeline_find_target_index() would stay stuck
on an early index because the "next" entry in the corrupted sort order
had a much later start_time, making the before_next check always true.

Symptoms: only ~89 of 190 timeline entries visited during a 53-min
DJ set render, with 9-17 minute gaps where the same preset played.
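
The extra dereference can be shown with a stdlib-only sketch. It uses `qsort` on an array of pointers, which hands the comparator pointers to the array slots in the same way the commit describes for `g_ptr_array_sort()`. `TimelineEntry` here is a hypothetical stand-in for `GstProjectMTimelineEntry`.

```c
#include <stdlib.h>

typedef struct {
    double start_time;
    double duration;
} TimelineEntry; /* stand-in for GstProjectMTimelineEntry */

/* a and b point at array slots (TimelineEntry **), so each must be
 * dereferenced once to reach the entry itself. */
static int compare_entries_by_start(const void *a, const void *b)
{
    const TimelineEntry *ea = *(const TimelineEntry *const *)a;
    const TimelineEntry *eb = *(const TimelineEntry *const *)b;
    if (ea->start_time < eb->start_time) return -1;
    if (ea->start_time > eb->start_time) return 1;
    return 0;
}

/* Sorts an array of entry pointers by ascending start_time. */
static void sort_entries(TimelineEntry **entries, size_t count)
{
    qsort(entries, count, sizeof *entries, compare_entries_by_start);
}
```

Casting `a` directly to `const TimelineEntry *` would compile without complaint, which is why this kind of bug tends to show up only as a scrambled order at runtime.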

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Helps verify the sort comparator fix is working by logging
start_time/duration/end_time of the first 20 entries after
g_ptr_array_sort in gst_projectm_load_timeline().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
G_DEFINE_TYPE_WITH_CODE was initializing the debug category as
"gstprojectm" while plugin_init used "projectm". Since the type
init runs AFTER plugin_init (via gst_element_register), it overwrote
the category variable with "gstprojectm" which didn't match the
GST_DEBUG=projectm:4 setting, causing INFO-level diagnostic messages
(PTS tracking, sort order verification) to be suppressed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use GST_WARNING_OBJECT instead of GST_INFO_OBJECT for timeline
diagnostics so they appear regardless of debug category threshold.
Includes "build v62" marker to verify correct binary is running.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The build v62 WARNING-level markers were temporary debugging aids to verify
the timeline sort fix on RunPod. Now confirmed working (190/190 entries
visited), downgrade back to INFO level for production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register projectm_set_preset_switch_failed_event_callback to log the
exact error message when a preset fails to compile. ProjectM silently
swallows these errors by default, making it impossible to debug why
generated presets produce black frames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

djj0s3 commented Apr 2, 2026

Opened against upstream by mistake. This is for our fork only.

djj0s3 closed this Apr 2, 2026