Add preset switch failed callback for shader error logging #23
Closed
djj0s3 wants to merge 84 commits into
control (pass=cbr) was ineffective for ProjectM's highly complex visual content
2. Switching to quality-based encoding: Using quantizer=35 with CRF instead of fixed bitrate
3. Adding quality constraints: qp-max=50 to prevent quality from degrading too much
4. Optimizing for speed: speed-preset=ultrafast for faster encoding
- Log stdout/stderr from convert.sh on both success and failure
- Add environment diagnostics to convert.sh (GPU detection, GStreamer plugin check)
- Add pre-flight checks for file permissions and accessibility
- Improve error visibility in Runpod logs

This should help identify why jobs are failing with exit code 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Update OpenGL version to 4.5 for better compatibility - Add explicit GStreamer plugin paths and scanner location - Respect Runpod's NVIDIA_VISIBLE_DEVICES setting (don't override) - Add LD_LIBRARY_PATH to ensure libraries are found - Improve NVIDIA driver capabilities configuration These changes should resolve GPU access and library loading issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added gstreamer1.0-gl package to Dockerfile dependencies - Provides glcolorconvert and gldownload elements needed for OpenGL texture conversion - Resolves "no element glcolorconvert/gldownload" pipeline errors - Built and pushed as v3 and latest tags 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Clean up stale X lock files before starting Xvfb
- Kill existing Xvfb processes on display 99
- Enable GLX extension in Xvfb for better GL compatibility
- Use GLX platform instead of X11 for software rendering
- Improve gpu_accessible() to test nvidia-smi functionality
- Add sleep to ensure Xvfb is ready before use

These changes resolve the "Server is already active for display 99" error and improve GPU detection when nvidia-smi works but devices aren't exposed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add explicit video/x-raw(memory:GLMemory),format=RGBA caps - Ensures proper capability negotiation in headless EGL mode - Resolves "could not link projectm0 to glcolorconvertelement0" error The pipeline now explicitly specifies RGBA format at each GL stage: projectm -> RGBA(GLMemory) -> glcolorconvert -> RGBA(GLMemory) -> gldownload -> RGBA 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ProjectM plugin only supports ABGR format output - Removed explicit format=RGBA caps that were causing negotiation failure - Let glcolorconvert and videoconvert handle format conversion automatically - Resolves "projectm0 can't handle caps format=(string)RGBA" error 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ProjectM GL context fails in headless EGL mode on Runpod - Always use Xvfb for ProjectM rendering (works reliably with X11 GL) - Detect GPU separately for hardware encoding (nvh264enc) - Maintains best of both: stable rendering + GPU-accelerated encoding This resolves the persistent "could not link projectm0 to glcolorconvertelement0" errors caused by GL context initialization failures in headless EGL mode. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Calculate display number based on PID: DISPLAY_NUM = 99 + (PID % 100)
- Prevents conflicts when multiple jobs run simultaneously
- Each job gets its own X display (range :99 to :198)
- Removes only the specific lock file for this display

Resolves issues with concurrent jobs interfering with each other's Xvfb instances.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
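The display-number formula above can be sketched in C as follows. This is only an illustration of the arithmetic: the helper names `display_num_for_pid` and `lock_path_for_display` are hypothetical, and the lock path assumes the usual `/tmp/.X<n>-lock` convention rather than anything confirmed by convert.sh.

```c
#include <stdio.h>

/* Base display and slot count taken from the commit message:
 * DISPLAY_NUM = 99 + (PID % 100), giving displays :99 through :198. */
#define BASE_DISPLAY 99
#define DISPLAY_SLOTS 100

/* Map a process ID to an X display number (hypothetical helper name). */
static int display_num_for_pid(long pid)
{
    return BASE_DISPLAY + (int)(pid % DISPLAY_SLOTS);
}

/* Write the lock file path for one display into buf, assuming the
 * conventional /tmp/.X<n>-lock naming. Returns snprintf's result. */
static int lock_path_for_display(int display_num, char *buf, size_t len)
{
    return snprintf(buf, len, "/tmp/.X%d-lock", display_num);
}
```

Note that two jobs whose PIDs differ by a multiple of 100 map to the same display, so this scheme reduces collisions rather than ruling them out.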
- Change GST_GL_API from opengl to opengl3 - Resolves GL context creation error with Xvfb/Mesa - Mesa provides opengl3 API, not legacy opengl - Fixes: "Cannot create context with user requested api (opengl)" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Start HTTP server by default when not in serverless environment - Only use serverless handler when RUNPOD_ENDPOINT_ID or RUNPOD_JOB_ID present Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install openssh-server in container - Generate SSH host keys during build - Configure SSH for root login - Start SSH daemon in start.sh before main process - Expose ports 22 and 8000 Container works locally but RunPod pod readiness still being debugged. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Build for amd64 architecture (was incorrectly building arm64) - Add DRI device accessibility check before using EGL-GBM - Disable EGL surfaceless mode due to framebuffer incompatibility with ProjectM - Fall back to Mesa software rendering when DRI is not accessible - Improve start.sh to handle shell commands passed via dockerArgs The RunPod container now works reliably with Mesa software rendering when GPU DRI devices are not accessible due to permission issues. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The EGL-device mode (EGL_PLATFORM_DEVICE_EXT) successfully creates an OpenGL context on RunPod, but ProjectM has framebuffer issues because it renders to the default framebuffer (0) which doesn't exist in headless EGL modes. This is a fundamental limitation of ProjectM's rendering approach - it expects a display surface with a real framebuffer. Fixing this would require modifying the gst-projectm plugin to use FBOs. For now, fall back to Mesa software rendering on RunPod which is reliable, though slower than GPU rendering. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit adds support for rendering to an FBO (Framebuffer Object) in headless EGL environments where the default framebuffer (0) doesn't exist.

Plugin changes (src/plugin.c):
- Add GST_PROJECTM_FORCE_FBO environment variable to force FBO mode
- Detect headless mode by checking if framebuffer 0 is complete
- Create and bind FBO before ProjectM initialization in headless mode
- Never unbind to framebuffer 0 in headless mode
- Properly manage FBO lifecycle to avoid binding framebuffer 0 during resize

Convert script changes (convert.sh):
- Add EGL-device surfaceless mode for NVIDIA GPUs without DRI access
- Set GST_PROJECTM_FORCE_FBO=1 in all headless EGL modes
- Add diagnostic output for GST_PROJECTM_FORCE_FBO and GST_GL settings

Known limitation: ProjectM-4 internally uses framebuffer 0 during its initialization phase. This causes GL_INVALID_FRAMEBUFFER_OPERATION errors in headless EGL modes even with our FBO workaround. Full headless GPU rendering requires either:
- Proper DRI device access (currently blocked on RunPod: permission 660)
- Or patches to ProjectM-4 to not use framebuffer 0 during init

The Mesa software rendering fallback continues to work reliably.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
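The decision described under "Plugin changes" might look roughly like the sketch below. It is not the plugin's code: `should_use_fbo` is a hypothetical name, the check for the value "1" is an assumption based on convert.sh setting GST_PROJECTM_FORCE_FBO=1, and the framebuffer status is passed in as a plain integer (as returned by glCheckFramebufferStatus with framebuffer 0 bound) so the example compiles without OpenGL headers.

```c
#include <stdbool.h>
#include <string.h>

/* Numeric value of GL_FRAMEBUFFER_COMPLETE, repeated here so the sketch
 * builds without the OpenGL headers. */
#define FB_STATUS_COMPLETE 0x8CD5u

/* Decide whether to render into our own FBO (hypothetical helper).
 * force_env:  value of GST_PROJECTM_FORCE_FBO, or NULL when unset.
 * fb0_status: framebuffer status reported for the default framebuffer. */
static bool should_use_fbo(const char *force_env, unsigned int fb0_status)
{
    /* Assumption: only the literal value "1" forces FBO mode. */
    if (force_env != NULL && strcmp(force_env, "1") == 0)
        return true;

    /* Headless detection: an incomplete default framebuffer means there
     * is no display surface to render to. */
    return fb0_status != FB_STATUS_COMPLETE;
}
```

A caller would pass `getenv("GST_PROJECTM_FORCE_FBO")` as the first argument. In a surfaceless context the default framebuffer typically reports GL_FRAMEBUFFER_UNDEFINED (0x8219), which this check treats as headless.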
Reorder GPU rendering methods to prioritize Xvfb + NVIDIA GLX which works reliably on Vast.ai and similar cloud GPU platforms:
1. Xvfb + NVIDIA GLX (BEST) - virtual X server with HW-accelerated GL
2. Xorg dummy + NVIDIA GLX - fallback if Xvfb fails
3. EGL-GBM (experimental) - may not work with all NVIDIA drivers
4. EGL-device surfaceless - when DRI isn't accessible

Key improvements:
- Xvfb starts first without NVIDIA vendor set
- NVIDIA GLX vendor is set for client apps only (GStreamer, glxinfo)
- Verifies NVIDIA GLX actually works before proceeding
- Falls back to Mesa if driver doesn't support this mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When NVIDIA GLX fails the glxinfo test, now properly falls through to EGL-GBM and EGL-device methods before falling back to software rendering. Previously, GLX failure immediately triggered Mesa fallback, causing slow software rendering on Vast.ai instances.

Changes:
- Use GPU_METHOD_FOUND flag to track successful GPU initialization
- Remove broken "Xorg dummy + NVIDIA GLX" method (requires nvidia_drv.so)
- Prioritize EGL-GBM as reliable GPU method after GLX fails
- EGL-device surfaceless as last GPU resort before Mesa

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
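The fall-through behaviour described here is implemented in convert.sh, but the control flow can be illustrated in C. The sketch below is an assumption-laden model, not the script: the struct, the probe stubs, and the method labels are hypothetical, and the local `gpu_method_found` variable only mirrors the role of the GPU_METHOD_FOUND flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* A probe returns true when its method initialised and passed validation. */
typedef bool (*probe_fn)(void);

struct gpu_method {
    const char *name;   /* label only, e.g. "nvidia-glx" */
    probe_fn probe;
};

/* Stub probes for demonstration; real probes would run glxinfo or EGL checks. */
static bool probe_fails(void)    { return false; }
static bool probe_succeeds(void) { return true; }

/* Try each method in priority order and return the first that works.
 * Software rendering is chosen only when every GPU method has failed. */
static const char *select_render_method(const struct gpu_method *methods,
                                        size_t count)
{
    bool gpu_method_found = false;        /* mirrors GPU_METHOD_FOUND */
    const char *chosen = "mesa-software"; /* fallback label (assumed) */

    for (size_t i = 0; i < count && !gpu_method_found; i++) {
        if (methods[i].probe()) {
            chosen = methods[i].name;
            gpu_method_found = true;
        }
    }
    return chosen;
}
```

The point of the flag is that a failed probe no longer short-circuits to Mesa; the loop keeps going until a method succeeds or the list is exhausted.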
EGL-GBM and EGL-device surfaceless often fail with NVIDIA drivers on Vast.ai. Now these methods are opt-in (FORCE_EGL=1) and include validation tests before use. When NVIDIA GLX fails, go directly to Mesa software rendering which is more reliable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Method 1: Use Xorg with modesetting driver + glamor acceleration
- Works through DRM/KMS with nvidia-container-runtime
- Uses xorg-nvidia.conf which enables GPU acceleration
- More reliable than Xvfb + NVIDIA GLX which requires server-side support

Method 2: Xvfb + NVIDIA GLX (kept as fallback)
- Only works when NVIDIA GLX server modules are available

Both methods test with glxinfo before proceeding.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… frames

Root cause: projectm_opengl_render_frame() renders to ProjectM's internal buffer, not our external FBO. This caused all frames to be black.

Fix: Use projectm_opengl_render_frame_fbo(handle, fbo_id) when an FBO is available. This renders directly to our framebuffer object.

Also improved convert.sh GPU initialization:
- Add GPU environment diagnostics for debugging
- Reject llvmpipe/software rendering (causes black frames with gst-projectm)
- Make Xvfb + NVIDIA GLX the preferred method for Vast.ai
- Remove DRI requirement for Method 2 (GLX works without DRI access)
- Add detailed EGL device enumeration for container environments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Switch base image from ubuntu:24.04 to nvidia/cuda:12.2.0-devel-ubuntu22.04
- Install nv-codec-headers for NVENC/NVDEC support
- Build gst-plugins-bad from source with nvcodec=enabled
- Add libnvidia-encode/decode libraries
- Include 'video' capability in NVIDIA_DRIVER_CAPABILITIES
- Update GST_PLUGIN_PATH to include nvcodec plugin location

This enables hardware H.264 encoding via nvh264enc, which is ~2x faster than software x264 encoding and offloads work from the CPU to the GPU's dedicated video encoding hardware (NVENC).

Combined with the mesh optimization (640x480 → 220x140), this should enable faster-than-realtime rendering for long audio files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use nvidia/cuda:12.2.0-devel-ubuntu22.04 base image - Install nv-codec-headers for NVENC/NVDEC - Build nvcodec GStreamer plugin from gstreamer 1.20.7 monorepo - Add libnvidia-encode/decode libraries - Include 'video' capability for NVENC access The nvh264enc plugin enables hardware H.264 encoding, offloading encoding from CPU to GPU's dedicated NVENC hardware for ~2x faster video encoding. Image size: 8.79GB (larger due to CUDA devel libraries) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When running in Docker with -e DISPLAY=:0 -v /tmp/.X11-unix:/tmp/.X11-unix, the container should use the host's X server instead of starting its own. This enables: - NVIDIA GPU rendering via host Xorg with NVIDIA driver - NVENC hardware encoding (host GPU access) - Proper FBO rendering (no black frames) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
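One way to decide whether a host X server is reachable is to map DISPLAY to its Unix socket under the mounted /tmp/.X11-unix directory and check that the socket exists. The sketch below shows only the path mapping; `x11_socket_path` is a hypothetical helper, and the commit does not say how the container actually performs this check.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Build the Unix socket path for a local display string such as ":0" or
 * ":1.0". Returns 0 on success, or -1 when the string is not a local
 * display (NULL, a remote "host:0" form, or malformed). */
static int x11_socket_path(const char *display, char *buf, size_t len)
{
    char *end;
    long num;

    if (display == NULL || display[0] != ':')
        return -1;
    if (!isdigit((unsigned char)display[1]))
        return -1;

    num = strtol(display + 1, &end, 10);
    /* Accept an optional ".screen" suffix, reject anything else. */
    if (*end != '\0' && *end != '.')
        return -1;

    snprintf(buf, len, "/tmp/.X11-unix/X%ld", num);
    return 0;
}
```

A caller could pass `getenv("DISPLAY")`, then test the resulting path with `access(path, F_OK)` and start its own Xvfb only when that check fails.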
Lambda Labs and other compute-focused cloud instances have CUDA but not OpenGL by default. This change: - Attempts to install libnvidia-gl for EGL/GLX support - Creates /usr/share/glvnd/egl_vendor.d/10_nvidia.json so libglvnd can find NVIDIA's EGL implementation With this, the container can use GPU-accelerated OpenGL rendering when nvidia-container-toolkit injects the host's NVIDIA libraries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed libnvidia-encode-525 and libnvidia-decode-525 packages. These caused NVENC to fail with "unsupported device" when the host runs a different driver version (e.g., 570 vs 525). Kept libnvidia-gl for ProjectM OpenGL rendering (EGL/GLX). nvidia-container-toolkit will inject the correct encode/decode libraries at runtime when NVIDIA_DRIVER_CAPABILITIES=video is set. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The easter-egg property controls a startup logo/feature that shows the ProjectM W logo. Setting it to 0 disables this. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaced projectM's default logo textures with user's custom VJ logo. Added multiple filename variations to cover all possible projectM texture references: - M.tga, m_logo.tga, mlogo.tga - projectm.tga, project.tga - headphones.tga - spiral.tga - logo.tga - pM.tga These will be included in the Docker image and override any default projectM logos that appear during idle/startup.
- Add vj_studio_logo.png for "Made With VJ Studio" overlay - Enable faststart=true on mp4mux for better YouTube streaming Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Load first preset immediately on init to avoid showing idle screen - Add gst_projectm_load_first_timeline_preset() for timeline mode - Prevent timeline_activate from resetting to index -1 if first preset already loaded - Add COPY for vj_studio_logo.png in Dockerfile This fixes the issue where the ProjectM "M" logo would briefly appear at the start of videos before transitioning to the first real preset. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cropped top portion of logo to remove "Made With" text, leaving just the VJ character and "STUDIO" for a cleaner bottom-right watermark appearance. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add elapsed_seconds to timeline switch log message so we can see the actual PTS value when each switch occurs - Add periodic PTS diagnostic (every 600 frames / ~10s) logging both audio and video buffer PTS to detect drift between them - Add render_frame_count to GstProjectMPrivate for frame tracking This helps diagnose an issue where timeline entries get skipped, possibly due to video PTS drifting ahead of audio PTS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
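The periodic diagnostic can be pictured with the small sketch below. It assumes PTS values are in nanoseconds, as GStreamer buffer timestamps are; the helper names are hypothetical and the real plugin logs through GStreamer's debug macros rather than returning values.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTS_LOG_INTERVAL_FRAMES 600u   /* interval from the commit message */
#define NSEC_PER_SEC 1000000000.0

/* True on every 600th rendered frame (including frame 0). */
static bool should_log_pts(uint64_t render_frame_count)
{
    return render_frame_count % PTS_LOG_INTERVAL_FRAMES == 0;
}

/* Drift in seconds between the two clocks: positive when the video PTS
 * is ahead of the audio PTS, negative when it lags behind. */
static double pts_drift_seconds(uint64_t video_pts_ns, uint64_t audio_pts_ns)
{
    return ((double)video_pts_ns - (double)audio_pts_ns) / NSEC_PER_SEC;
}
```

The "~10s" figure in the commit message corresponds to 600 frames at 60 frames per second.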
When CPU encoding is used (x264enc fallback), video PTS runs at 0.5-0.7x of audio time, causing the timeline engine to skip entries. This resulted in only 90/190 timeline entries being visited for a 53-min DJ set. Audio PTS advances at the true playback rate regardless of video encoding speed, ensuring all timeline entries are visited correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tr_array_sort g_ptr_array_sort() passes each comparison argument as a pointer to the array slot (GstProjectMTimelineEntry**), not a direct pointer to the entry. Without the extra dereference, the comparator was interpreting raw memory addresses as gdouble start_time values, resulting in a semi-random sort order. This caused large sections of the timeline to be unreachable — the fast-path optimization in timeline_find_target_index() would stay stuck on an early index because the "next" entry in the corrupted sort order had a much later start_time, making the before_next check always true. Symptoms: only ~89 of 190 timeline entries visited during a 53-min DJ set render, with 9-17 minute gaps where the same preset played. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
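The extra dereference can be demonstrated without GLib, because the standard qsort() has the same calling convention when sorting an array of pointers: the comparator receives pointers to the array slots. The sketch below uses qsort() as a stand-in for g_ptr_array_sort(); `TimelineEntry` is a simplified placeholder for GstProjectMTimelineEntry, and only the start_time field is known from the commit message.

```c
#include <stdlib.h>

/* Simplified stand-in for GstProjectMTimelineEntry. */
typedef struct {
    double start_time;
    double duration;
} TimelineEntry;

/* Comparator for an array of TimelineEntry pointers. Each argument points
 * at an array slot, so it is really a TimelineEntry ** and needs one
 * dereference to get the entry pointer and another to read start_time. */
static int compare_entries_by_start(const void *a, const void *b)
{
    const TimelineEntry *ea = *(const TimelineEntry *const *)a;
    const TimelineEntry *eb = *(const TimelineEntry *const *)b;

    if (ea->start_time < eb->start_time) return -1;
    if (ea->start_time > eb->start_time) return 1;
    return 0;
}

/* Sort the pointer array in place by ascending start_time. */
static void sort_timeline(TimelineEntry **entries, size_t count)
{
    qsort(entries, count, sizeof *entries, compare_entries_by_start);
}
```

Casting the arguments straight to `const TimelineEntry *` would compile without complaint, which is why this class of bug shows up only as a scrambled order at runtime.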
Helps verify the sort comparator fix is working by logging start_time/duration/end_time of the first 20 entries after g_ptr_array_sort in gst_projectm_load_timeline(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
G_DEFINE_TYPE_WITH_CODE was initializing the debug category as "gstprojectm" while plugin_init used "projectm". Since the type init runs AFTER plugin_init (via gst_element_register), it overwrote the category variable with "gstprojectm" which didn't match the GST_DEBUG=projectm:4 setting, causing INFO-level diagnostic messages (PTS tracking, sort order verification) to be suppressed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use GST_WARNING_OBJECT instead of GST_INFO_OBJECT for timeline diagnostics so they appear regardless of debug category threshold. Includes "build v62" marker to verify correct binary is running. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The build v62 WARNING-level markers were temporary debugging aids to verify the timeline sort fix on RunPod. Now confirmed working (190/190 entries visited), downgrade back to INFO level for production. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register projectm_set_preset_switch_failed_event_callback to log the exact error message when a preset fails to compile. ProjectM silently swallows these errors by default, making it impossible to debug why generated presets produce black frames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Opened against upstream by mistake. This is for our fork only.
Summary
- Register projectm_set_preset_switch_failed_event_callback after projectm_create()
- Log the reported error via g_printerr

Why
ProjectM silently swallows shader compilation errors by default. When generated presets fail to compile (producing black frames), there's no way to know WHY. This callback surfaces the exact HLSL/GLSL error message so we can fix the preset generation pipeline.
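A callback of this kind might be shaped like the sketch below. The parameter list (preset filename, message, user data) is assumed to match projectM 4's preset-switch-failed event type; the function names and message format are hypothetical; and fputs to stderr stands in for g_printerr so the example builds without GLib or libprojectM.

```c
#include <stdio.h>

/* Format one diagnostic line into buf (hypothetical helper).
 * NULL arguments are replaced with placeholders so the log never
 * prints a null string. Returns snprintf's result. */
static int format_preset_failure(char *buf, size_t len,
                                 const char *preset_filename,
                                 const char *message)
{
    return snprintf(buf, len, "projectM preset switch failed: %s: %s\n",
                    preset_filename ? preset_filename : "(unknown preset)",
                    message ? message : "(no message)");
}

/* Callback with the signature assumed for projectM 4's
 * preset-switch-failed event. */
static void on_preset_switch_failed(const char *preset_filename,
                                    const char *message, void *user_data)
{
    char line[1024];

    (void)user_data;  /* unused in this sketch */
    format_preset_failure(line, sizeof line, preset_filename, message);
    fputs(line, stderr);  /* the PR itself logs via g_printerr */
}

/* Registration, per the summary, would follow projectm_create(), roughly:
 *   projectm_set_preset_switch_failed_event_callback(
 *       handle, on_preset_switch_failed, NULL);
 */
```

Keeping the formatting in a separate function makes the message easy to check without a GL context or a failing preset.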
Test plan
Generated with Claude Code