oidn-wgpu

Prototype WebGPU port of Intel Open Image Denoise (OIDN).

Status

Parses upstream OIDN .tza weight archives.
Builds and runs the RT U-Net graph.
Has CPU reference tests and browser WebGPU smoke tests.
The current RTFilter path supports LDR color, albedo, and normal inputs, plus tiled runs.

Fetch upstream weights for local development:

npm run fetch:oidn-weights

This clones (or pulls) the official oidn-weights repository into .tmp/oidn-weights, checks out the pinned ref, pulls the Git LFS blobs, validates the required .tza files, and maintains weights/ as a local symlink to that checkout.

Test

npm test

To compare the JS reference path against native OIDN for LDR, guided, HDR, and HDR clean-aux model variants:

npm run check:oidn

The checker uses deterministic synthetic inputs, the .tza files in weights/, and tools/oidn-native/.../oidnDenoise.exe when it is present. Append -- --case hdr_alb_nrm (one case) or -- --cases ldr,ldr_alb,hdr (several cases) to narrow the test matrix.
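The checker's own code is not shown here. As a rough illustration of the idea, the sketch below generates a reproducible synthetic image from a seed and measures the largest per-element difference between two outputs. The seeded generator (mulberry32), syntheticImage(), and maxAbsDiff() are hypothetical helpers, not functions from this repository, and the real checker may use different inputs and metrics.

```javascript
// Hypothetical helpers illustrating deterministic inputs and output comparison;
// not taken from the oidn-wgpu checker.

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fill a width x height image with `channels` floats per pixel from a fixed seed,
// so every run (and every backend) sees identical input.
function syntheticImage(width, height, channels, seed) {
  const next = mulberry32(seed);
  const data = new Float32Array(width * height * channels);
  for (let i = 0; i < data.length; i++) data[i] = next();
  return data;
}

// Largest absolute per-element difference between two outputs of equal size.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) throw new Error("size mismatch");
  let max = 0;
  for (let i = 0; i < a.length; i++) max = Math.max(max, Math.abs(a[i] - b[i]));
  return max;
}
```

A caller would compare maxAbsDiff() against a tolerance rather than requiring exact equality, since different backends rarely produce bit-identical floating-point results.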

Demo

npm run test:browser:serve

Open:

http://127.0.0.1:5173/test/browser/webgpu_oidn_demo.html

Performance

Recent local 1024x1024 resident-network timings on an NVIDIA RTX 5000 Ada with --tune preset:

model	WebGPU network	native CPU	native CUDA
`rt_ldr_small.tza`	18.3 ms	46.1 ms	11.7 ms
`rt_hdr_small.tza`	19.2 ms	47.3 ms	11.8 ms
`rt_ldr.tza`	50.9 ms	83.5 ms	13.8 ms
`rt_hdr.tza`	51.5 ms	85.0 ms	14.1 ms

At 1024x1024, expect the small models to take roughly 1.6x the native CUDA time while still beating native CPU. The base models are currently about 3.7x slower than native CUDA (though still faster than native CPU), likely because native CUDA uses a more efficient half-precision blocked layout. Results below 1024x1024 can be noisy because fixed browser/WebGPU overhead makes up a larger share of each run.
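The ratios quoted above follow directly from the table. This small script recomputes them from the listed 1024x1024 timings; the numbers are copied from the table and nothing new is measured.

```javascript
// 1024x1024 timings in milliseconds, copied from the table above.
const timingsMs = {
  "rt_ldr_small.tza": { webgpu: 18.3, cpu: 46.1, cuda: 11.7 },
  "rt_hdr_small.tza": { webgpu: 19.2, cpu: 47.3, cuda: 11.8 },
  "rt_ldr.tza": { webgpu: 50.9, cpu: 83.5, cuda: 13.8 },
  "rt_hdr.tza": { webgpu: 51.5, cpu: 85.0, cuda: 14.1 },
};

// WebGPU time divided by native time: >1 means WebGPU is slower, <1 means faster.
function relativeTime(t) {
  return { vsCuda: t.webgpu / t.cuda, vsCpu: t.webgpu / t.cpu };
}

for (const [model, t] of Object.entries(timingsMs)) {
  const r = relativeTime(t);
  console.log(`${model}: ${r.vsCuda.toFixed(2)}x native CUDA time, ${r.vsCpu.toFixed(2)}x native CPU time`);
}
```

For the small models this gives 1.56x to 1.63x the CUDA time and about 0.4x the CPU time; the base models come out near 3.65x to 3.69x the CUDA time and about 0.61x the CPU time.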

Benchmark

To compare the WebGPU resident network path against native OIDN CPU and CUDA for the same .tza model weights:

npm run benchmark:oidn -- --cases ldr_small,ldr,hdr_small,hdr --sizes 256,512,1024 --runs 10

The benchmark writes JSON and Markdown reports under benchmark-results/. The defaults cover representative LDR/HDR, guided, and clean-aux RT model variants.

Use --webgpu-mode rtfilter to measure the end-to-end CPU-image RTFilter path instead of the resident network, or --webgpu-modes network,rtfilter to report both. To adapt the run to the machine, use --tune off|preset|quick|full, --native-only, --webgpu-only, --devices cpu,cuda, --chrome <path>, or --oidn-denoise <path>.

Browser Smoke Tests

With npm run test:browser:serve running, open these pages under http://127.0.0.1:5173:

/test/browser/webgpu_oidn_rt_filter.html
/test/browser/webgpu_oidn_benchmark_matrix.html
/test/browser/webgpu_oidn_network_benchmark.html
/test/browser/webgpu_oidn_image_path.html
/test/browser/webgpu_oidn_resident_image_path.html
/test/browser/webgpu_oidn_full_unet.html

Integrating With A WebGPU Renderer

Use OIDNWebGPUDevice.fromDevice() to share your renderer's GPUDevice. RTFilter accepts CPU images, GPU buffers, or RGBA storage textures.

import { OIDNWebGPUDevice, RgbaTextureFormat, RTFilter, tuneWebGPU } from "./src/index.js";

// rendererDevice, rendererAdapter, width, height, and the textures below come from your renderer.
const weights = await fetch("/weights/rt_hdr_alb_nrm.tza").then((r) => r.arrayBuffer());
const tuning = await tuneWebGPU({
  device: rendererDevice,
  adapter: rendererAdapter,
  weights,
  width,
  height,
  features: ["color", "albedo", "normal"]
});

const oidnDevice = OIDNWebGPUDevice.fromDevice(rendererDevice, tuning.deviceOptions);
const filter = new RTFilter(oidnDevice, { hdr: true });
filter.setWeights(weights);

filter.setImageTexture("color", colorTexture, width, height, { format: RgbaTextureFormat.RGBA32Float });
filter.setImageTexture("albedo", albedoTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
filter.setImageTexture("normal", normalTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
filter.setImageTexture("output", denoisedTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });

filter.prepare(width, height, ["color", "albedo", "normal"]);
await filter.execute();

To record OIDN's work into your renderer's frame command encoder instead of having OIDN submit its own command buffer:

const encoder = rendererDevice.createCommandEncoder();
await filter.execute({ commandEncoder: encoder });
rendererDevice.queue.submit([encoder.finish()]);

prepare() caches per-size graph resources. Rebinding images keeps parsed weights and GPU weight buffers alive. Call dispose() to release filter-owned resources.
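For a renderer that denoises every frame, the pieces above might be combined as in the sketch below. denoiseFrame() and the shape of its images argument are hypothetical; the sketch uses only the RTFilter methods shown in this section and assumes, based on the note that prepare() caches per-size graph resources, that calling prepare() again at an already-prepared size is cheap.

```javascript
// Hypothetical per-frame helper. `filter` is an RTFilter created once (as above),
// `device` is the renderer's GPUDevice, and `images` maps image names to
// { texture, format }, e.g. { color: {...}, albedo: {...}, normal: {...}, output: {...} }.
async function denoiseFrame(filter, device, images, width, height) {
  // Rebind this frame's textures; parsed weights and GPU weight buffers stay alive.
  for (const [name, { texture, format }] of Object.entries(images)) {
    filter.setImageTexture(name, texture, width, height, { format });
  }

  // Every bound image except the output is a network input feature.
  const features = Object.keys(images).filter((name) => name !== "output");
  // Assumption: prepare() reuses its cached graph resources when the size is unchanged.
  filter.prepare(width, height, features);

  // Record OIDN into the same encoder as the rest of the frame and submit once.
  const encoder = device.createCommandEncoder();
  // ...renderer passes that write color/albedo/normal would be recorded here...
  await filter.execute({ commandEncoder: encoder });
  // ...passes that read the denoised output would be recorded here...
  device.queue.submit([encoder.finish()]);
}
```

The filter itself is created once and released with filter.dispose() when the renderer shuts down.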

GPU buffers support rgb32float, rgb16float, and rgba16float; textures support rgba16float and rgba32float. GPU execution internally pads dimensions that are not multiples of 16 and crops the output back to the requested size. HDR images stay linear; LDR images use the sRGB transfer function. Type declarations are in index.d.ts.
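The padding rule can be made concrete with a little arithmetic. The helpers below are hypothetical (the library does this internally) and assume padding rounds each dimension up to the next multiple of 16; bufferByteLength() additionally assumes tightly packed pixels with no row padding, using 4 bytes per float32 channel and 2 bytes per float16 channel.

```javascript
// Hypothetical helper: round a dimension up to the next multiple of 16.
function padTo16(n) {
  if (!Number.isInteger(n) || n <= 0) throw new RangeError(`invalid size: ${n}`);
  return Math.ceil(n / 16) * 16;
}

// Size the network would run at for a requested width x height (assumed rule).
function paddedExtent(width, height) {
  return { width: padTo16(width), height: padTo16(height) };
}

// Bytes per pixel for the GPU buffer formats listed above.
const BYTES_PER_PIXEL = {
  rgb32float: 3 * 4,
  rgb16float: 3 * 2,
  rgba16float: 4 * 2,
};

// Byte length of a tightly packed buffer at the requested (unpadded) size.
function bufferByteLength(width, height, format) {
  const bpp = BYTES_PER_PIXEL[format];
  if (bpp === undefined) throw new Error(`unsupported buffer format: ${format}`);
  return width * height * bpp;
}
```

For example, a 1920x1080 frame would run at 1920x1088 internally, and a tightly packed rgb16float input buffer for it would need 12,441,600 bytes.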

tuneWebGPU() runs a quick cached sweep and returns deviceOptions. Use force: true to retune, mode: "full" for a broader sweep, or OIDNWebGPUDevice.createTuned({ weights, width, height }) when OIDN should own the GPUDevice.

Third-Party Notices

This project builds on the amazing work of the Intel Open Image Denoise project. Both OIDN and the separately fetched OIDN model weights are licensed under the Apache License 2.0. See THIRD_PARTY_NOTICES.md for attribution and redistribution notes.
