WebGPU fragment shader optimization by cabanier · Pull Request #8733 · playcanvas/engine

cabanier · 2026-05-15T20:09:04Z

Emit WebGPU fragment builtins only when the processed source references the corresponding pc* globals. This avoids carrying unused front-facing, primitive-index, position, and sample-index inputs through material fragment shaders; in particular sample_index is no longer requested unless a shader actually needs pcSampleIndex.

Refactor the WGSL clustered-light hot path to avoid mutating a large ClusterLightData through ptr<function> helper calls. Core light decode now returns value data, and optional spot, area, shadow, cookie, and omni-atlas data is decoded into smaller values at the point of use to reduce register pressure and potential function-memory spills.


cabanier · 2026-05-15T20:09:12Z

The fragment shader overhead was mostly from generated WGSL asking the compiler to carry data the shader did not actually use.

Before the fix, the WGSL processor always emitted these fragment inputs/globals for WebGPU:

@builtin(position) position : vec4f,
@builtin(front_facing) frontFacing : bool,
@builtin(sample_index) sampleIndex : u32,
@builtin(primitive_index) primitiveIndex : u32,

and copied them into private globals:

pcPosition = input.position;
pcFrontFacing = input.frontFacing;
pcSampleIndex = input.sampleIndex;
pcPrimitiveIndex = input.primitiveIndex;

For the material shaders we compared, only pcPosition was used (for fog). pcFrontFacing, pcPrimitiveIndex, and usually pcSampleIndex were dead plumbing. In particular, sample_index can be expensive because requesting it may force sample-rate fragment shading on MSAA targets, which is much more work than pixel-rate shading.

The clustered-light path also had WGSL-specific overhead: it decoded light data into a large ClusterLightData local and passed it through helpers as ptr<function, ClusterLightData>. That makes the hot per-light loop look like mutable function-memory traffic to the compiler. It can increase register pressure or cause spills, especially because the struct included fields only needed by optional spot/shadow/cookie/area paths.

The fix was:

emit position, front_facing, sample_index, and primitive_index only when the final fragment source references the corresponding global (pcPosition, pcFrontFacing, pcSampleIndex, or pcPrimitiveIndex);
change clustered-light helpers to return smaller value structs/vectors instead of mutating a large pointer-passed ClusterLightData;
remove the half-precision conversion churn, which was adding lots of half(...), half3(...), and f32(...) casts around ordinary lighting math.
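
The first item in the list above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual WGSL processor code: the FRAGMENT_BUILTINS table, the emitFragmentBuiltins function, and the returned object shape are all hypothetical names chosen for the example. It also assumes the source passed in has already been processed (comments stripped), since a plain text match would otherwise count a mention inside a comment as a reference.

```javascript
// Illustrative sketch only. The table and function names are assumptions made
// for this example; they are not taken from the PlayCanvas source.
const FRAGMENT_BUILTINS = [
    { global: 'pcPosition', builtin: 'position', field: 'position', type: 'vec4f' },
    { global: 'pcFrontFacing', builtin: 'front_facing', field: 'frontFacing', type: 'bool' },
    { global: 'pcSampleIndex', builtin: 'sample_index', field: 'sampleIndex', type: 'u32' },
    { global: 'pcPrimitiveIndex', builtin: 'primitive_index', field: 'primitiveIndex', type: 'u32' }
];

function emitFragmentBuiltins(processedSource) {
    // Match whole identifiers only, so that a longer name such as
    // pcPositionOffset does not count as a reference to pcPosition.
    const used = FRAGMENT_BUILTINS.filter(
        b => new RegExp(`\\b${b.global}\\b`).test(processedSource)
    );

    return {
        // Members to place in the fragment input struct.
        inputs: used.map(b => `@builtin(${b.builtin}) ${b.field} : ${b.type},`),
        // Private globals the shader body reads from.
        globals: used.map(b => `var<private> ${b.global} : ${b.type};`),
        // Copies to place at the top of the fragment entry point.
        copies: used.map(b => `${b.global} = input.${b.field};`)
    };
}
```

For a source that only reads pcPosition (the fog case described above), this returns a single input line for position and nothing for front_facing, sample_index, or primitive_index, so sample_index is never requested on behalf of a shader that does not use pcSampleIndex.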

cabanier changed the title ~~Webgpu fragment shader optimization~~ WebGPU fragment shader optimization May 15, 2026

cabanier mentioned this pull request May 15, 2026

Experiment with WebXR support using WebGPU #7404

Open

cabanier wants to merge 1 commit into playcanvas:main from cabanier:webgpu_fragment_shader_optimization
