bsdorra/oidn-wgpu

oidn-wgpu

WebGPU port prototype for Intel Open Image Denoise (https://github.com/RenderKit/oidn).

Status

  • Parses upstream OIDN .tza weight archives.
  • Builds and runs the RT U-Net graph.
  • Has CPU reference tests and browser WebGPU smoke tests.
  • Current RTFilter path supports LDR color, albedo, normal, and tiled runs.

Fetch upstream weights for local development:

npm run fetch:oidn-weights

This clones or updates the official oidn-weights repository in .tmp/oidn-weights, checks out the pinned ref, pulls the Git LFS blobs, validates the required .tza files, and keeps weights/ as a local symlink.

Test

npm test

To compare the JS reference path against native OIDN for LDR, guided, HDR, and HDR clean-aux model variants:

npm run check:oidn

The checker uses deterministic synthetic inputs, the .tza files in weights/, and tools/oidn-native/.../oidnDenoise.exe when present. Use -- --case hdr_alb_nrm or -- --cases ldr,ldr_alb,hdr to narrow the matrix.

Demo

npm run test:browser:serve

Open:

http://127.0.0.1:5173/test/browser/webgpu_oidn_demo.html

Performance

Recent local 1024x1024 resident-network timings on an RTX 5000 Ada, run with --tune preset:

| Model | WebGPU network | Native CPU | Native CUDA |
| --- | --- | --- | --- |
| rt_ldr_small.tza | 18.3 ms | 46.1 ms | 11.7 ms |
| rt_hdr_small.tza | 19.2 ms | 47.3 ms | 11.8 ms |
| rt_ldr.tza | 50.9 ms | 83.5 ms | 13.8 ms |
| rt_hdr.tza | 51.5 ms | 85.0 ms | 14.1 ms |

At 1024x1024, expect the small models to be roughly 1.6x slower than native CUDA but about 2.5x faster than native CPU. The base models are currently much farther from CUDA (roughly 3.7x slower, though still about 1.6x faster than native CPU), likely because native CUDA uses a more efficient half-precision blocked layout. Results below 1024x1024 can be noisy because fixed browser and WebGPU overhead makes up a larger share of each run.
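The ratios quoted above follow directly from the table. As a minimal sketch, the snippet below recomputes them from the published timings; the `ratios` helper and its field names are illustrative and not part of this repository.

```javascript
// Timings in milliseconds, copied from the table above (1024x1024, --tune preset).
const timingsMs = [
  { model: "rt_ldr_small.tza", webgpu: 18.3, cpu: 46.1, cuda: 11.7 },
  { model: "rt_hdr_small.tza", webgpu: 19.2, cpu: 47.3, cuda: 11.8 },
  { model: "rt_ldr.tza", webgpu: 50.9, cpu: 83.5, cuda: 13.8 },
  { model: "rt_hdr.tza", webgpu: 51.5, cpu: 85.0, cuda: 14.1 },
];

// Hypothetical helper: relative cost of the WebGPU path versus each native backend.
function ratios(row) {
  return {
    model: row.model,
    slowdownVsCuda: row.webgpu / row.cuda, // > 1 means WebGPU is slower than CUDA
    speedupVsCpu: row.cpu / row.webgpu,    // > 1 means WebGPU is faster than CPU
  };
}

for (const r of timingsMs.map(ratios)) {
  console.log(
    `${r.model}: ${r.slowdownVsCuda.toFixed(2)}x the CUDA time, ` +
    `${r.speedupVsCpu.toFixed(2)}x faster than CPU`
  );
}
```

The same helper can be applied to rows from the JSON reports under benchmark-results/, assuming they are first mapped to this shape.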

Benchmark

To compare the WebGPU resident network path against native OIDN CPU and CUDA for the same .tza model weights:

npm run benchmark:oidn -- --cases ldr_small,ldr,hdr_small,hdr --sizes 256,512,1024 --runs 10

The benchmark writes JSON and Markdown reports under benchmark-results/. Defaults cover representative LDR/HDR, guided, and clean-aux RT model variants. Use --webgpu-mode rtfilter to measure the end-to-end CPU image RTFilter path instead, or --webgpu-modes network,rtfilter to report both. Use --tune off|preset|quick|full, --native-only, --webgpu-only, --devices cpu,cuda, --chrome <path>, or --oidn-denoise <path> to adapt the run to the machine.

Browser Smoke Tests

  • /test/browser/webgpu_oidn_rt_filter.html
  • /test/browser/webgpu_oidn_benchmark_matrix.html
  • /test/browser/webgpu_oidn_network_benchmark.html
  • /test/browser/webgpu_oidn_image_path.html
  • /test/browser/webgpu_oidn_resident_image_path.html
  • /test/browser/webgpu_oidn_full_unet.html

Integrating With A WebGPU Renderer

Use OIDNWebGPUDevice.fromDevice() to share your renderer's GPUDevice. RTFilter accepts CPU images, GPU buffers, or RGBA storage textures.

import { OIDNWebGPUDevice, RgbaTextureFormat, RTFilter, tuneWebGPU } from "./src/index.js";

// rendererDevice, rendererAdapter, width, height, and the four textures come from your renderer.
const weights = await fetch("/weights/rt_hdr_alb_nrm.tza").then((r) => r.arrayBuffer());

// Tune for this device, model, and image size before creating the OIDN device.
const tuning = await tuneWebGPU({
  device: rendererDevice,
  adapter: rendererAdapter,
  weights,
  width,
  height,
  features: ["color", "albedo", "normal"]
});

// Share the renderer's GPUDevice instead of letting OIDN create its own.
const oidnDevice = OIDNWebGPUDevice.fromDevice(rendererDevice, tuning.deviceOptions);
const filter = new RTFilter(oidnDevice, { hdr: true });
filter.setWeights(weights);

// Bind the inputs and the output as RGBA storage textures.
filter.setImageTexture("color", colorTexture, width, height, { format: RgbaTextureFormat.RGBA32Float });
filter.setImageTexture("albedo", albedoTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
filter.setImageTexture("normal", normalTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });
filter.setImageTexture("output", denoisedTexture, width, height, { format: RgbaTextureFormat.RGBA16Float });

// Build the per-size graph resources, then run the denoiser.
filter.prepare(width, height, ["color", "albedo", "normal"]);
await filter.execute();
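The textures bound with setImageTexture() are created by the renderer. The sketch below builds a descriptor for such a texture, assuming the filter needs storage-binding usage plus the usual sampling and copy usages. The `storageTextureDescriptor` helper and the exact usage set are assumptions, not documented requirements of this project. The fallback constants mirror the numeric `GPUTextureUsage` values from the WebGPU specification so the helper also works where that global is missing.

```javascript
// Numeric values of the WebGPU GPUTextureUsage flags, used only as a fallback
// when the global is unavailable (for example outside a browser).
const FALLBACK_TEXTURE_USAGE = {
  COPY_SRC: 0x01,
  COPY_DST: 0x02,
  TEXTURE_BINDING: 0x04,
  STORAGE_BINDING: 0x08,
};

// Hypothetical helper: descriptor for an RGBA texture that can be bound as a
// storage texture and still be sampled or copied by the renderer.
// The usage set is an assumption; trim it to what your pipeline needs.
function storageTextureDescriptor(width, height, format) {
  const usage = globalThis.GPUTextureUsage ?? FALLBACK_TEXTURE_USAGE;
  return {
    size: { width, height },
    format, // a WebGPU format string such as "rgba16float" or "rgba32float"
    usage:
      usage.STORAGE_BINDING |
      usage.TEXTURE_BINDING |
      usage.COPY_SRC |
      usage.COPY_DST,
  };
}

// In a renderer:
// const denoisedTexture = rendererDevice.createTexture(
//   storageTextureDescriptor(width, height, "rgba16float"));
```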

To record into an existing frame command buffer instead of submitting inside OIDN:

// Record the denoise pass into the renderer's own encoder, then submit once.
const encoder = rendererDevice.createCommandEncoder();
await filter.execute({ commandEncoder: encoder });
rendererDevice.queue.submit([encoder.finish()]);

prepare() caches per-size graph resources. Rebinding images keeps parsed weights and GPU weight buffers alive. Call dispose() to release filter-owned resources.
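To illustrate what per-size caching means in practice, here is a minimal sketch of a size-keyed resource cache. It is not the project's implementation: `SizeKeyedCache`, `build`, and `release` are hypothetical names. It only shows the pattern of building resources once per resolution and releasing them together on disposal.

```javascript
// Hypothetical size-keyed cache; a sketch of the pattern, not this project's code.
class SizeKeyedCache {
  // build(width, height) creates the resources for one resolution.
  // release(resources) frees them again.
  constructor(build, release) {
    this.build = build;
    this.release = release;
    this.entries = new Map();
  }

  // Return the cached resources for this size, building them on first use.
  get(width, height) {
    const key = `${width}x${height}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.build(width, height);
      this.entries.set(key, entry);
    }
    return entry;
  }

  // Release every cached entry and empty the cache.
  dispose() {
    for (const entry of this.entries.values()) {
      this.release(entry);
    }
    this.entries.clear();
  }
}
```

With this pattern, preparing the same resolution again is cheap, while a new resolution triggers a fresh build. State that does not depend on resolution, such as parsed weights, would live outside the cache.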

GPU buffers support rgb32float, rgb16float, and rgba16float; textures support rgba16float and rgba32float. GPU execution pads non-16-multiple sizes internally and crops the output back to the requested size. HDR stays linear; LDR uses sRGB transfer. Type declarations are in index.d.ts.
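The padding and cropping behaviour can be pictured with a CPU-side sketch. It rounds each dimension up to the next multiple of 16, copies the image into the larger buffer, and crops it back afterwards. The helper names are hypothetical, and the zero fill is an assumption; the actual GPU path may fill the border differently.

```javascript
const ALIGNMENT = 16;

// Round each dimension up to the next multiple of `alignment`.
function paddedSize(width, height, alignment = ALIGNMENT) {
  const roundUp = (n) => Math.ceil(n / alignment) * alignment;
  return { width: roundUp(width), height: roundUp(height) };
}

// Copy an interleaved float image into a padded buffer.
// Assumption: the border is zero-filled.
function padImage(src, width, height, channels) {
  const p = paddedSize(width, height);
  const data = new Float32Array(p.width * p.height * channels);
  for (let y = 0; y < height; y++) {
    const row = src.subarray(y * width * channels, (y + 1) * width * channels);
    data.set(row, y * p.width * channels);
  }
  return { data, width: p.width, height: p.height };
}

// Crop a padded image back to the requested size.
function cropImage(padded, width, height, channels) {
  const out = new Float32Array(width * height * channels);
  for (let y = 0; y < height; y++) {
    const start = y * padded.width * channels;
    out.set(padded.data.subarray(start, start + width * channels), y * width * channels);
  }
  return out;
}
```

For example, a 1920x1080 frame is processed at 1920x1088, since 1080 is not a multiple of 16, and the result is cropped back to 1080 rows.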

tuneWebGPU() runs a quick cached sweep and returns deviceOptions. Use force: true to retune, mode: "full" for a broader sweep, or OIDNWebGPUDevice.createTuned({ weights, width, height }) when OIDN should own the GPUDevice.
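The two tuning entry points might be wrapped as follows. This is a sketch under stated assumptions: `oidn` stands for the module namespace imported from "./src/index.js", `force` and `mode` are passed as tuneWebGPU() options as described above, and createTuned() is awaited in case it is asynchronous. The wrapper function names are hypothetical.

```javascript
// Hypothetical wrapper: force a broader retune on a renderer-owned GPUDevice.
// `oidn` is the module namespace, e.g. `import * as oidn from "./src/index.js"`.
async function createSharedDeviceWithFullTuning(oidn, { device, adapter, weights, width, height, features }) {
  const tuning = await oidn.tuneWebGPU({
    device,
    adapter,
    weights,
    width,
    height,
    features,
    force: true,  // ignore the cached result and retune
    mode: "full", // broader sweep than the default quick one
  });
  return oidn.OIDNWebGPUDevice.fromDevice(device, tuning.deviceOptions);
}

// Hypothetical wrapper: let OIDN create and own the GPUDevice.
// Awaiting is harmless if createTuned() turns out to be synchronous.
async function createOwnedTunedDevice(oidn, { weights, width, height }) {
  return await oidn.OIDNWebGPUDevice.createTuned({ weights, width, height });
}
```

Forcing a full sweep on every start would give up the benefit of the cache, so it is probably best reserved for cases such as a driver update or a changed target resolution.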

Third-Party Notices

This project builds on the amazing work of the Intel Open Image Denoise project. Both OIDN and the separately fetched OIDN model weights are licensed under the Apache License 2.0. See THIRD_PARTY_NOTICES.md for attribution and redistribution notes.
