WebGPU port prototype for Intel Open Image Denoise.
- Parses upstream OIDN .tza weight archives.
- Builds and runs the RT U-Net graph.
- Has CPU reference tests and browser WebGPU smoke tests.
- Current RTFilter path supports LDR color, albedo, normal, and tiled runs.

Fetch upstream weights for local development:

    npm run fetch:oidn-weights

This clones or pulls the official oidn-weights repository into
.tmp/oidn-weights, checks out the pinned ref, pulls Git LFS blobs, validates
the required .tza files, and keeps weights/ as a local symlink.
Run the tests:

    npm test

To compare the JS reference path against native OIDN for LDR, guided, HDR, and HDR clean-aux model variants:

    npm run check:oidn

The checker uses deterministic synthetic inputs, the .tza files in weights/,
and tools/oidn-native/.../oidnDenoise.exe when present. Use -- --case hdr_alb_nrm or -- --cases ldr,ldr_alb,hdr to narrow the matrix.
Start the browser test server:

    npm run test:browser:serve

Open:

    http://127.0.0.1:5173/test/browser/webgpu_oidn_demo.html

Recent local 1024x1024 resident-network timings on an RTX5000 Ada with --tune preset:
| model | WebGPU network | native CPU | native CUDA |
|---|---|---|---|
| rt_ldr_small.tza | 18.3 ms | 46.1 ms | 11.7 ms |
| rt_hdr_small.tza | 19.2 ms | 47.3 ms | 11.8 ms |
| rt_ldr.tza | 50.9 ms | 83.5 ms | 13.8 ms |
| rt_hdr.tza | 51.5 ms | 85.0 ms | 14.1 ms |

At 1024x1024, expect the small models to be roughly 1.6x slower than native CUDA but about 2.5x faster than native CPU. The base models are currently about 3.7x slower than native CUDA, likely because native CUDA uses a more efficient half-precision blocked layout. Results below 1024x1024 can be noisy due to fixed browser/WebGPU overhead.
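The ratios quoted above can be checked directly against the table. The following is a minimal sketch that recomputes them; the timings are copied from the table, while the `slowdownRatios` helper and its field names are hypothetical and not part of this project.

```javascript
// Timings in milliseconds, copied from the table above (1024x1024, --tune preset).
const timingsMs = [
  { model: "rt_ldr_small.tza", webgpu: 18.3, cpu: 46.1, cuda: 11.7 },
  { model: "rt_hdr_small.tza", webgpu: 19.2, cpu: 47.3, cuda: 11.8 },
  { model: "rt_ldr.tza", webgpu: 50.9, cpu: 83.5, cuda: 13.8 },
  { model: "rt_hdr.tza", webgpu: 51.5, cpu: 85.0, cuda: 14.1 },
];

// Hypothetical helper: WebGPU time divided by each native time.
// A value above 1 means WebGPU is slower; below 1 means it is faster.
function slowdownRatios(rows) {
  return rows.map(({ model, webgpu, cpu, cuda }) => ({
    model,
    vsCuda: webgpu / cuda,
    vsCpu: webgpu / cpu,
  }));
}

const ratios = slowdownRatios(timingsMs);
for (const r of ratios) {
  console.log(
    `${r.model}: ${r.vsCuda.toFixed(2)}x CUDA time, ${r.vsCpu.toFixed(2)}x CPU time`
  );
}
```

With these numbers the small models land between about 1.56x and 1.63x of the CUDA time, and the base models between about 3.65x and 3.69x, while every model stays below the native CPU time.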
To compare the WebGPU resident network path against native OIDN CPU and CUDA
for the same .tza model weights:

    npm run benchmark:oidn -- --cases ldr_small,ldr,hdr_small,hdr --sizes 256,512,1024 --runs 10

The benchmark writes JSON and Markdown reports under benchmark-results/.
Defaults cover representative LDR/HDR, guided, and clean-aux RT model variants.
Use --webgpu-mode rtfilter to measure the end-to-end CPU image RTFilter
path instead, or --webgpu-modes network,rtfilter to report both. Use
--tune off|preset|quick|full, --native-only, --webgpu-only,
--devices cpu,cuda, --chrome <path>, or --oidn-denoise <path> to adapt
the run to the machine.
Other browser test pages:

- /test/browser/webgpu_oidn_rt_filter.html
- /test/browser/webgpu_oidn_benchmark_matrix.html
- /test/browser/webgpu_oidn_network_benchmark.html
- /test/browser/webgpu_oidn_image_path.html
- /test/browser/webgpu_oidn_resident_image_path.html
- /test/browser/webgpu_oidn_full_unet.html

Use OIDNWebGPUDevice.fromDevice() to share your renderer's GPUDevice.
RTFilter accepts CPU images, GPU buffers, or RGBA storage textures.

    import { OIDNWebGPUDevice, RgbaTextureFormat, RTFilter, tuneWebGPU } from "./src/index.js";

    const weights = await fetch("/weights/rt_hdr_alb_nrm.tza").then((r) => r.arrayBuffer());

    // Tune against the renderer's own device and adapter.
    const tuning = await tuneWebGPU({
      device: rendererDevice,
      adapter: rendererAdapter,
      weights,
      width,
      height,
      features: ["color", "albedo", "normal"]
    });

    // Share the renderer's GPUDevice with OIDN.
    const oidnDevice = OIDNWebGPUDevice.fromDevice(rendererDevice, tuning.deviceOptions);
    const filter = new RTFilter(oidnDevice, { hdr: true });
    filter.setWeights(weights);

    // Bind input and output textures.
    filter.setImageTexture("color", colorTexture, width, height, { format: RgbaTextureFormat.RGBA32Float });
    filter.setImageTexture("albedo", albedoTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
    filter.setImageTexture("normal", normalTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
    filter.setImageTexture("output", denoisedTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });

    filter.prepare(width, height, ["color", "albedo", "normal"]);
    await filter.execute();

To record into an existing frame command buffer instead of submitting inside OIDN:

    // Use the same GPUDevice that was shared with OIDN above.
    const encoder = rendererDevice.createCommandEncoder();
    await filter.execute({ commandEncoder: encoder });
    rendererDevice.queue.submit([encoder.finish()]);

prepare() caches per-size graph resources. Rebinding images keeps parsed
weights and GPU weight buffers alive. Call dispose() to release filter-owned
resources.
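To illustrate what caching per-size resources can look like, here is a minimal sketch of a cache keyed by width and height. It is not this project's implementation: the `PerSizeCache` class, the `createResources` callback, and the optional `destroy()` method on cached entries are all hypothetical names chosen for the example.

```javascript
// Hypothetical per-size resource cache; an illustration only,
// not the actual RTFilter internals.
class PerSizeCache {
  // createResources: (width, height) => object, optionally with destroy().
  constructor(createResources) {
    this.createResources = createResources;
    this.entries = new Map();
  }

  // Return the cached resources for this size, creating them on first use.
  get(width, height) {
    const key = `${width}x${height}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.createResources(width, height);
      this.entries.set(key, entry);
    }
    return entry;
  }

  // Release every cached entry and empty the cache.
  dispose() {
    for (const entry of this.entries.values()) {
      if (typeof entry.destroy === "function") entry.destroy();
    }
    this.entries.clear();
  }
}
```

In a sketch like this, repeated calls with the same size reuse one entry, a new size adds a second entry, and a single dispose() call releases everything the cache owns.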
GPU buffers support rgb32float, rgb16float, and rgba16float; textures
support rgba16float and rgba32float. GPU execution pads non-16-multiple
sizes internally and crops the output back to the requested size. HDR stays
linear; LDR uses sRGB transfer. Type declarations are in index.d.ts.
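The padding arithmetic can be sketched as follows. This is only an illustration of rounding up to a multiple of 16, which is what the text above implies; the helper names `padToMultiple` and `planPadding` are hypothetical, and placing the crop rectangle at the top-left origin is an assumption rather than documented behavior.

```javascript
// Assumption: 16 is the alignment implied by "non-16-multiple sizes" above.
const ALIGNMENT = 16;

// Round a dimension up to the next multiple of `alignment`.
function padToMultiple(size, alignment = ALIGNMENT) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`size must be a positive integer, got ${size}`);
  }
  return Math.ceil(size / alignment) * alignment;
}

// Hypothetical helper: padded working size plus the rectangle to crop back to.
// Assumption: padding is added on the right and bottom, so the crop starts at (0, 0).
function planPadding(width, height) {
  return {
    paddedWidth: padToMultiple(width),
    paddedHeight: padToMultiple(height),
    crop: { x: 0, y: 0, width, height },
  };
}

const plan = planPadding(1920, 1080);
console.log(plan.paddedWidth, plan.paddedHeight);
```

For a 1920x1080 frame, the width is already a multiple of 16 and stays at 1920, while the height rounds up from 1080 to 1088 before being cropped back.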
tuneWebGPU() runs a quick cached sweep and returns deviceOptions. Use
force: true to retune, mode: "full" for a broader sweep, or
OIDNWebGPUDevice.createTuned({ weights, width, height }) when OIDN should own
the GPUDevice.
This project builds on the amazing work of the Intel Open Image Denoise project. OIDN as well as the separately fetched OIDN model weights are under Apache License 2.0. See THIRD_PARTY_NOTICES.md for attribution and redistribution notes.