Add NV12 output: GPU shader-based RGBA→NV12 conversion (2.8x speedup on Apple Silicon) by djj0s3 · Pull Request #25 · projectM-visualizer/gst-projectm

djj0s3 · 2026-04-24T14:58:36Z

Summary

When the negotiated downstream format is NV12, projectm now performs the RGBA→NV12 conversion on the GPU with two GLSL passes against its existing RGBA FBO, then reads each plane back with ReadPixels directly into the GstVideoFrame.

This eliminates the downstream `videoconvert ABGR→NV12` CPU step that dominated render time on Apple Silicon. End-to-end render measured against the same audio (6 min track, 1080p30, vtenc_h264):

```
Before: 363s audio → 842s render (2.3x realtime)
After: 363s audio → 300s render (0.8x realtime)
```

2.8x speedup. Render now completes in less time than the audio plays.
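As a quick check of the figures above, the realtime factors and the speedup can be recomputed from the three quoted timings. This is only an arithmetic sketch; the function and variable names are illustrative and do not come from the PR.

```python
# Timings quoted in the benchmark above, in seconds.
AUDIO_S = 363
RENDER_BEFORE_S = 842
RENDER_AFTER_S = 300


def realtime_factor(render_s: float, audio_s: float) -> float:
    """Render time as a multiple of audio duration (below 1.0 is faster than realtime)."""
    return render_s / audio_s


before = realtime_factor(RENDER_BEFORE_S, AUDIO_S)
after = realtime_factor(RENDER_AFTER_S, AUDIO_S)
speedup = RENDER_BEFORE_S / RENDER_AFTER_S

print(f"before:  {before:.1f}x realtime")
print(f"after:   {after:.1f}x realtime")
print(f"speedup: {speedup:.1f}x")
```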

How it works

Two shader passes after `projectm_opengl_render_frame_fbo`:

1. Y pass: full resolution into an R8 texture, using BT.601 luma coefficients
2. UV pass: half resolution into an RG8 texture; linear-filtered sampling of the source RGBA texture automatically averages 2x2 blocks for 4:2:0 chroma subsampling
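For readers who want to sanity-check pixel values, the two passes can be mirrored on the CPU. The sketch below is a Python reference, not the PR's GLSL. It uses the limited-range BT.601 luma coefficients quoted in the commit message (0.257, 0.504, 0.098, offset 16 in 8-bit terms). The chroma coefficients are the commonly used BT.601 limited-range values and are an assumption here, since the PR text does not list them. All function names are hypothetical.

```python
def rgb_to_y(r, g, b):
    """BT.601 limited-range luma for 8-bit RGB; results fall in 16..235."""
    return round(0.257 * r + 0.504 * g + 0.098 * b + 16)


def rgb_to_uv(r, g, b):
    """BT.601 limited-range chroma (assumed coefficients); results fall in 16..240."""
    u = round(-0.148 * r - 0.291 * g + 0.439 * b + 128)
    v = round(0.439 * r - 0.368 * g - 0.071 * b + 128)
    return u, v


def rgba_to_nv12(pixels, width, height):
    """Convert a row-major list of (r, g, b, a) tuples to NV12 planes.

    Width and height must be even. Returns (y_plane, uv_plane) as bytes:
    the Y plane has one byte per pixel, the UV plane has interleaved
    U,V pairs at half resolution in both directions.
    """
    # "Y pass": one luma sample per source pixel.
    y_plane = bytes(rgb_to_y(r, g, b) for (r, g, b, _a) in pixels)

    # "UV pass": average each 2x2 block, then convert the averaged colour.
    uv_plane = bytearray()
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            block = [pixels[(by + dy) * width + (bx + dx)]
                     for dy in (0, 1) for dx in (0, 1)]
            r = sum(p[0] for p in block) / 4
            g = sum(p[1] for p in block) / 4
            b = sum(p[2] for p in block) / 4
            uv_plane += bytes(rgb_to_uv(r, g, b))
    return y_plane, bytes(uv_plane)


# A 2x2 all-white frame: luma sits at the top of the limited range, chroma is neutral.
white = [(255, 255, 255, 255)] * 4
y_plane, uv_plane = rgba_to_nv12(white, 2, 2)
print(list(y_plane), list(uv_plane))
```

Comparing a few decoded pixels against a reference like this is one way to make the "reasonable color values" check in the Testing section more concrete.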

`ReadPixels` then pulls each plane straight into the GstVideoFrame's NV12 plane data with `GL_PACK_ROW_LENGTH` accounting for stride alignment.
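`GL_PACK_ROW_LENGTH` is specified in pixels, not bytes, so the value differs between the two planes even when their byte strides match. The sketch below shows that arithmetic under assumed numbers: the frame size and the 1024-byte strides are hypothetical, and in the plugin the strides would presumably come from the mapped GstVideoFrame.

```python
def pack_row_length(stride_bytes: int, bytes_per_pixel: int) -> int:
    """Convert a plane stride in bytes to a GL_PACK_ROW_LENGTH value in pixels."""
    if stride_bytes % bytes_per_pixel:
        raise ValueError("stride is not a whole number of pixels")
    return stride_bytes // bytes_per_pixel


# Hypothetical frame: 1000x500 with both plane strides padded to 1024 bytes.
WIDTH, HEIGHT = 1000, 500
Y_STRIDE = 1024
UV_STRIDE = 1024

# Y plane is read back as R8: 1 byte per pixel, so pixels == bytes.
y_row_length = pack_row_length(Y_STRIDE, 1)
# UV plane is read back as RG8: 2 bytes per pixel at half width.
uv_row_length = pack_row_length(UV_STRIDE, 2)

# Destination sizes: full height for Y, half height for UV.
y_plane_bytes = Y_STRIDE * HEIGHT
uv_plane_bytes = UV_STRIDE * (HEIGHT // 2)

print(y_row_length, uv_row_length, y_plane_bytes, uv_plane_bytes)
```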

Caps

```
video/x-raw, format = { ABGR, NV12 }
```

ABGR path is unchanged for any consumer that doesn't accept NV12. `vtenc_h264` advertises NV12 in its sink caps, so the new path is auto-selected when vtenc is downstream.

Implementation gotchas

- A VAO is mandatory under GL 3.2 core (macOS Cocoa). Without one, every draw fails with `GL_INVALID_OPERATION`. We bake the vertex attribute setup into the VAO once and rebind it per pass.
- The shaders use `#version 150 core`, which works on the GL 3.2 core profile macOS exposes via Tauri/Cocoa. GLES2 is not implemented since this targets the local Mac renderer; production GPU pods use a separate `convert_cuda.sh` pipeline.
- PBO async readback is bypassed in NV12 mode. It was a workaround for the slow CPU conversion, which no longer exists.

Testing

- Standalone gst-launch with `audiotestsrc → projectm format=NV12 → vtenc_h264 → mp4mux` produces valid 4:2:0 yuv420p H.264 output (verified with ffprobe).
- Pixel sample of a decoded frame shows reasonable color values (not clipped, not all black/white).
- End-to-end bundled-app render with 6 min audio: visually correct output, 2.8x speedup vs ABGR baseline.

🤖 Generated with Claude Code

… control (pass=cbr) was ineffective for ProjectM's highly complex visual content
2. Switching to quality-based encoding: using quantizer=35 with CRF instead of fixed bitrate
3. Adding quality constraints: qp-max=50 to prevent quality from degrading too much
4. Optimizing for speed: speed-preset=ultrafast for faster encoding

- Log stdout/stderr from convert.sh on both success and failure
- Add environment diagnostics to convert.sh (GPU detection, GStreamer plugin check)
- Add pre-flight checks for file permissions and accessibility
- Improve error visibility in Runpod logs

This should help identify why jobs are failing with exit code 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Update OpenGL version to 4.5 for better compatibility
- Add explicit GStreamer plugin paths and scanner location
- Respect Runpod's NVIDIA_VISIBLE_DEVICES setting (don't override)
- Add LD_LIBRARY_PATH to ensure libraries are found
- Improve NVIDIA driver capabilities configuration

These changes should resolve GPU access and library loading issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Added gstreamer1.0-gl package to Dockerfile dependencies
- Provides glcolorconvert and gldownload elements needed for OpenGL texture conversion
- Resolves "no element glcolorconvert/gldownload" pipeline errors
- Built and pushed as v3 and latest tags

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Clean up stale X lock files before starting Xvfb
- Kill existing Xvfb processes on display 99
- Enable GLX extension in Xvfb for better GL compatibility
- Use GLX platform instead of X11 for software rendering
- Improve gpu_accessible() to test nvidia-smi functionality
- Add sleep to ensure Xvfb is ready before use

These changes resolve the "Server is already active for display 99" error and improve GPU detection when nvidia-smi works but devices aren't exposed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add explicit video/x-raw(memory:GLMemory),format=RGBA caps
- Ensures proper capability negotiation in headless EGL mode
- Resolves "could not link projectm0 to glcolorconvertelement0" error

The pipeline now explicitly specifies RGBA format at each GL stage: projectm -> RGBA(GLMemory) -> glcolorconvert -> RGBA(GLMemory) -> gldownload -> RGBA

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- ProjectM plugin only supports ABGR format output
- Removed explicit format=RGBA caps that were causing negotiation failure
- Let glcolorconvert and videoconvert handle format conversion automatically
- Resolves "projectm0 can't handle caps format=(string)RGBA" error

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- ProjectM GL context fails in headless EGL mode on Runpod
- Always use Xvfb for ProjectM rendering (works reliably with X11 GL)
- Detect GPU separately for hardware encoding (nvh264enc)
- Maintains best of both: stable rendering + GPU-accelerated encoding

This resolves the persistent "could not link projectm0 to glcolorconvertelement0" errors caused by GL context initialization failures in headless EGL mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Calculate display number based on PID: DISPLAY_NUM = 99 + (PID % 100)
- Prevents conflicts when multiple jobs run simultaneously
- Each job gets its own X display (range :99 to :198)
- Removes only the specific lock file for this display

Resolves issues with concurrent jobs interfering with each other's Xvfb instances.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
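The display-number rule in this commit is simple enough to sketch directly. The helper names below are hypothetical (the actual logic lives in a shell script), and the lock-file path follows the usual X server convention of `/tmp/.X<N>-lock`, which is an assumption about what the script removes.

```python
import os


def display_number(pid: int, base: int = 99, slots: int = 100) -> int:
    """Map a process ID onto one of `slots` display numbers starting at `base`."""
    return base + pid % slots


def lock_file(display: int) -> str:
    """Conventional X server lock-file path for a display (assumed, not from the PR)."""
    return f"/tmp/.X{display}-lock"


n = display_number(os.getpid())
print(f"DISPLAY=:{n}  lock={lock_file(n)}")
```

Note that two PIDs differing by a multiple of 100 map to the same display, so this scheme reduces collisions between concurrent jobs rather than ruling them out.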

- Change GST_GL_API from opengl to opengl3
- Resolves GL context creation error with Xvfb/Mesa
- Mesa provides opengl3 API, not legacy opengl
- Fixes: "Cannot create context with user requested api (opengl)"

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Method 1: Use Xorg with modesetting driver + glamor acceleration
- Works through DRM/KMS with nvidia-container-runtime
- Uses xorg-nvidia.conf which enables GPU acceleration
- More reliable than Xvfb + NVIDIA GLX which requires server-side support

Method 2: Xvfb + NVIDIA GLX (kept as fallback)
- Only works when NVIDIA GLX server modules are available

Both methods test with glxinfo before proceeding.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… frames

Root cause: projectm_opengl_render_frame() renders to ProjectM's internal buffer, not our external FBO. This caused all frames to be black.

Fix: Use projectm_opengl_render_frame_fbo(handle, fbo_id) when an FBO is available. This renders directly to our framebuffer object.

Also improved convert.sh GPU initialization:
- Add GPU environment diagnostics for debugging
- Reject llvmpipe/software rendering (causes black frames with gst-projectm)
- Make Xvfb + NVIDIA GLX the preferred method for Vast.ai
- Remove DRI requirement for Method 2 (GLX works without DRI access)
- Add detailed EGL device enumeration for container environments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Changes:
- Switch base image from ubuntu:24.04 to nvidia/cuda:12.2.0-devel-ubuntu22.04
- Install nv-codec-headers for NVENC/NVDEC support
- Build gst-plugins-bad from source with nvcodec=enabled
- Add libnvidia-encode/decode libraries
- Include 'video' capability in NVIDIA_DRIVER_CAPABILITIES
- Update GST_PLUGIN_PATH to include nvcodec plugin location

This enables hardware H.264 encoding via nvh264enc, which is ~2x faster than software x264 encoding and offloads work from the CPU to the GPU's dedicated video encoding hardware (NVENC). Combined with the mesh optimization (640x480 → 220x140), this should enable faster-than-realtime rendering for long audio files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use nvidia/cuda:12.2.0-devel-ubuntu22.04 base image
- Install nv-codec-headers for NVENC/NVDEC
- Build nvcodec GStreamer plugin from gstreamer 1.20.7 monorepo
- Add libnvidia-encode/decode libraries
- Include 'video' capability for NVENC access

The nvh264enc plugin enables hardware H.264 encoding, offloading encoding from CPU to GPU's dedicated NVENC hardware for ~2x faster video encoding.

Image size: 8.79GB (larger due to CUDA devel libraries)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When running in Docker with -e DISPLAY=:0 -v /tmp/.X11-unix:/tmp/.X11-unix, the container should use the host's X server instead of starting its own. This enables:
- NVIDIA GPU rendering via host Xorg with NVIDIA driver
- NVENC hardware encoding (host GPU access)
- Proper FBO rendering (no black frames)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Lambda Labs and other compute-focused cloud instances have CUDA but not OpenGL by default. This change:
- Attempts to install libnvidia-gl for EGL/GLX support
- Creates /usr/share/glvnd/egl_vendor.d/10_nvidia.json so libglvnd can find NVIDIA's EGL implementation

With this, the container can use GPU-accelerated OpenGL rendering when nvidia-container-toolkit injects the host's NVIDIA libraries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Removed libnvidia-encode-525 and libnvidia-decode-525 packages. These caused NVENC to fail with "unsupported device" when the host runs a different driver version (e.g., 570 vs 525). Kept libnvidia-gl for ProjectM OpenGL rendering (EGL/GLX). nvidia-container-toolkit will inject the correct encode/decode libraries at runtime when NVIDIA_DRIVER_CAPABILITIES=video is set. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The easter-egg property controls a startup logo/feature that shows the ProjectM W logo. Setting it to 0 disables this. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replaced projectM's default logo textures with user's custom VJ logo. Added multiple filename variations to cover all possible projectM texture references:
- M.tga, m_logo.tga, mlogo.tga
- projectm.tga, project.tga
- headphones.tga
- spiral.tga
- logo.tga
- pM.tga

These will be included in the Docker image and override any default projectM logos that appear during idle/startup.

- Add vj_studio_logo.png for "Made With VJ Studio" overlay
- Enable faststart=true on mp4mux for better YouTube streaming

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Load first preset immediately on init to avoid showing idle screen
- Add gst_projectm_load_first_timeline_preset() for timeline mode
- Prevent timeline_activate from resetting to index -1 if first preset already loaded
- Add COPY for vj_studio_logo.png in Dockerfile

This fixes the issue where the ProjectM "M" logo would briefly appear at the start of videos before transitioning to the first real preset.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Cropped top portion of logo to remove "Made With" text, leaving just the VJ character and "STUDIO" for a cleaner bottom-right watermark appearance. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add elapsed_seconds to timeline switch log message so we can see the actual PTS value when each switch occurs
- Add periodic PTS diagnostic (every 600 frames / ~10s) logging both audio and video buffer PTS to detect drift between them
- Add render_frame_count to GstProjectMPrivate for frame tracking

This helps diagnose an issue where timeline entries get skipped, possibly due to video PTS drifting ahead of audio PTS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When CPU encoding is used (x264enc fallback), video PTS runs at 0.5-0.7x of audio time, causing the timeline engine to skip entries. This resulted in only 90/190 timeline entries being visited for a 53-min DJ set. Audio PTS advances at the true playback rate regardless of video encoding speed, ensuring all timeline entries are visited correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tr_array_sort

g_ptr_array_sort() passes each comparison argument as a pointer to the array slot (GstProjectMTimelineEntry**), not a direct pointer to the entry. Without the extra dereference, the comparator was interpreting raw memory addresses as gdouble start_time values, resulting in a semi-random sort order.

This caused large sections of the timeline to be unreachable: the fast-path optimization in timeline_find_target_index() would stay stuck on an early index because the "next" entry in the corrupted sort order had a much later start_time, making the before_next check always true.

Symptoms: only ~89 of 190 timeline entries visited during a 53-min DJ set render, with 9-17 minute gaps where the same preset played.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Helps verify the sort comparator fix is working by logging start_time/duration/end_time of the first 20 entries after g_ptr_array_sort in gst_projectm_load_timeline(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

G_DEFINE_TYPE_WITH_CODE was initializing the debug category as "gstprojectm" while plugin_init used "projectm". Since the type init runs AFTER plugin_init (via gst_element_register), it overwrote the category variable with "gstprojectm" which didn't match the GST_DEBUG=projectm:4 setting, causing INFO-level diagnostic messages (PTS tracking, sort order verification) to be suppressed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use GST_WARNING_OBJECT instead of GST_INFO_OBJECT for timeline diagnostics so they appear regardless of debug category threshold. Includes "build v62" marker to verify correct binary is running. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The build v62 WARNING-level markers were temporary debugging aids to verify the timeline sort fix on RunPod. Now confirmed working (190/190 entries visited), downgrade back to INFO level for production. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… path detection

- build.sh: Check ~/.local and /opt/homebrew for ProjectM 4 headers before /usr/local; add PKG_CONFIG_PATH for Apple Silicon
- setup.sh: Add gst-plugins-base, gst-plugins-good, gst-plugins-bad, ffmpeg to brew list
- convert.sh: Add is_macos() helper; detect macOS and use native CGL/Cocoa OpenGL (skip X11/VirtualGL entirely); add vtenc_h264 encoder selection and pipeline; auto-detect preset paths for macOS (~/.local, /opt/homebrew); fix stat command portability for output monitoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… g_error)

On macOS, some community presets trigger transient FBO errors during shader compilation. These errors are recoverable: ProjectM continues rendering on the next frame. Previously, g_error() called abort(), crashing the entire pipeline. Now it logs a warning and continues.

This fixes the crash when using preset= property on macOS with the full 10k+ community preset library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When negotiated downstream format is NV12, projectm now does the RGBA→NV12 conversion on the GPU via two GLSL passes against its existing RGBA FBO:

1. Y pass: full-resolution into an R8 texture using BT.601 luma coefficients (0.257*R + 0.504*G + 0.098*B + 16/255)
2. UV pass: half-resolution into an RG8 texture, linear-filtered sampling automatically averages 2x2 blocks for 4:2:0 chroma subsampling

ReadPixels then pulls each plane straight into the GstVideoFrame's NV12 plane data, with no intermediate buffer copies.

This eliminates the downstream `videoconvert ABGR→NV12` CPU step that was the dominant cost on Apple Silicon. End-to-end render went from 2.3x realtime to 0.8x realtime (363s audio: 842s → 300s) on the same hardware/preset/audio.

Implementation notes:
- Caps: `video/x-raw, format = { ABGR, NV12 }`; caps negotiation picks NV12 when downstream advertises it (e.g. vtenc_h264).
- ABGR path remains unchanged for non-NV12 consumers.
- VAO is required by GL 3.2 core (macOS Cocoa context); without one every draw fails with GL_INVALID_OPERATION.
- Shader uses `#version 150 core`, compatible with the GL 3.2 core profile macOS exposes. GLES2 path not implemented since this optimization targets the local Mac renderer; production GPU pods use a different convert_cuda.sh pipeline.
- PBO async readback is bypassed in NV12 mode. It was a workaround for the slow CPU conversion, which no longer exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

From PR #3 review (P1):
- nv12_render now restores projectm's FBO id (not framebuffer 0) so headless EGL contexts don't raise GL_INVALID_OPERATION. Matches the rule the existing ABGR path already follows.
- nv12_init cleans up allocated textures + FBOs on the completeness-check early-return paths (previously leaked if R8/RG8 unsupported).
- GL_PACK_ALIGNMENT now reset to default (4) after ReadPixels so later code in the same context gets clean state back.
- GL_PACK_ROW_LENGTH comments clarified: values are in pixels, not bytes. Y plane (R8) happens to have identical byte/pixel values but is now computed explicitly so the unit is obvious.

P2:
- BT.601 studio-swing (limited-range) coefficients documented as intentional; pc-range would desync vtenc_h264's color_range=tv output.
- Removed per-frame TexParameteri mutations on the source FBO texture. The 4:2:0 subsampling comes from rendering a source-sized quad into a half-sized viewport: hardware bilinear on the source texture (GL_LINEAR set once at FBO creation) averages 2x2 blocks naturally.
- Dropped the inverted early-exit in nv12_release; per-resource null guards already handle partial-init cleanup.

Smoke test post-fixes: same synthetic audiotestsrc → NV12 → vtenc_h264 pipeline still produces valid 4:2:0 H.264 output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
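The claim that bilinear filtering averages 2x2 blocks can be checked on the CPU. The sketch below is a plain-Python model of linear texture sampling with clamp-to-edge and texel centers at half-integer coordinates. It is not the plugin's code, and the function names are hypothetical. When a half-sized target samples at the corner shared by four source texels, each texel gets weight 0.25.

```python
import math


def bilinear_sample(tex, u, v):
    """Sample a 2D grid at texel-space coordinates (u, v).

    Texel (ix, iy) has its center at (ix + 0.5, iy + 0.5); reads outside
    the grid are clamped to the nearest edge texel.
    """
    h, w = len(tex), len(tex[0])
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        ix = min(max(ix, 0), w - 1)
        iy = min(max(iy, 0), h - 1)
        return tex[iy][ix]

    top = (1 - fx) * texel(x0, y0) + fx * texel(x0 + 1, y0)
    bottom = (1 - fx) * texel(x0, y0 + 1) + fx * texel(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def downsample_half(tex):
    """Render into a half-sized target: output pixel (i, j) samples the
    source at (2i + 1, 2j + 1), the corner shared by its 2x2 block."""
    h, w = len(tex), len(tex[0])
    return [[bilinear_sample(tex, 2 * i + 1.0, 2 * j + 1.0)
             for i in range(w // 2)]
            for j in range(h // 2)]


source = [[float(4 * row + col) for col in range(4)] for row in range(4)]
half = downsample_half(source)
print(half)  # [[2.5, 4.5], [10.5, 12.5]]
```

Each output value equals the mean of its 2x2 source block, which is the behaviour the 4:2:0 chroma pass relies on.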

Per re-review: both nv12_init and nv12_render had `else { bind 0 }` fallback branches after the `priv->fbo_id != 0` check. nv12_render is only called from the main frame handler when nv12_mode is on AND gst_projectm_ensure_render_target returned using_fbo=TRUE — which guarantees fbo_id is non-zero. The else-branches were dead code that would, in their one reachable case, do exactly what the comments warn against: bind framebuffer 0 in headless EGL where it doesn't exist. Removed the fallback, made the bind unconditional. Comments updated to explain why no fallback is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

revmischa · 2026-04-28T04:23:55Z

please don't commit every preset here

djj0s3 · 2026-04-28T14:51:49Z

Oh sorry! I thought I had forked this work out. Oof, I'll fix. Sorry for the unnecessary noise.

djj0s3 · 2026-04-28T16:08:41Z

Closing — this PR was opened against the wrong repo by mistake. The change lives in our private fork (djj0s3/gst-projectm) and isn't intended for upstream. Apologies for the noise.

djj0s3 and others added 30 commits September 22, 2025 20:30

using gstreamer and projectM Docker

faca332

updated convert script

4a59080

conversion settings

aa6c2f6

fixes

91178ae

codex fixes

bd382d1

latest changes

40ebf2b

update docker

43ee792

Merge remote changes with local updates

a3dd765

Keep timeline-driven presets honest

56d84e1

let gst-convert respect timelines and use bitrate mode

20e221e

fixing a ton of conflicts. whoops

f3f51f6

drop unsupported vbv settings (again)

80ee497

updating logic to not rebuild the container on the fly

f6c2db6

making things work on runpod

75ace44

optimizations

dd577ef

runpod

6210cba

fixes

07c13b6

getting runpod working

b2be214

runpod

a713e93

runpod

33ee818

djj0s3 and others added 23 commits January 27, 2026 21:50

Disable ProjectM easter-egg (W logo) at startup

f786229

Add VJ Studio logo and enable MP4 faststart for YouTube

f7d01a3

Remove "Made With" text from logo for cleaner watermark

32acca9

Put error message on same line as filename in callback

afb3c59

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

djj0s3 mentioned this pull request Apr 24, 2026

Add NV12 output: GPU shader-based RGBA→NV12 conversion (2.8x speedup) djj0s3/gst-projectm#3

Merged

djj0s3 and others added 2 commits April 24, 2026 09:14

djj0s3 closed this Apr 28, 2026

djj0s3 wants to merge 89 commits into projectM-visualizer:master from djj0s3:feat/glmemory-output
