A GPU-accelerated wavefront path tracer built from scratch in CUDA C++17. Renders physically-based scenes interactively at 1920x1080 with real-time camera control via CUDA-OpenGL interop. Supports OBJ mesh loading, HDRI environment maps, texture-mapped materials, and a physically-based shading system, with some experimental spectral and ReSTIR code also present in the repository.
Gold dragon: SAH-BVH traversal on a 249k-triangle OBJ mesh with HDRI lighting.
Material gallery: the full BSDF set across a GGX roughness sweep.
Cornell box: global illumination, color bleeding, soft area-light shadows.
See the full demo & results page for more renders, the live orbit video, and RTX 4090 benchmarks.
- Wavefront path tracing - separates ray generation, intersection, shading, and accumulation into distinct GPU kernels to raise occupancy and reduce warp divergence
- Structure-of-Arrays (SoA) memory layout - all per-path and per-hit data is stored in SoA form (`PathStateSoA`, `HitInfoSoA`, `ReservoirSoA`) with `__restrict__` pointer views for coalesced global memory access
- Physically-based material system - Lambertian, Oren-Nayar rough diffuse, GGX metal, rough/smooth dielectric, plastic, thin-film, and emissive materials
- Texture-mapped materials - per-material albedo, roughness, and normal textures are sampled in the main shading path, and the OBJ loader wires common material texture fields when present
- OBJ mesh loading - import arbitrary triangle meshes with per-vertex normals, UV coordinates, optional normal recalculation, and configurable scale/transform
- HDRI environment mapping - HDR radiance environment maps with configurable intensity for image-based lighting
- SAH-accelerated BVH - CPU-built surface area heuristic BVH with stackful GPU traversal, watertight triangle intersection, and dedicated shadow ray occlusion test
- CUDA-OpenGL interop - zero-copy display via pixel buffer object (PBO); CUDA writes tonemapped pixels directly into the mapped OpenGL buffer each frame
- ACES tonemapping + sRGB gamma - HDR accumulation buffer with progressive running average, ACES filmic curve, interactive exposure control, and proper linear-to-sRGB conversion
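
The accumulation and display steps named in the last bullet can be sketched per colour channel as below. This is a minimal host-side C++ illustration, not the renderer's kernels: the function names are hypothetical, and the ACES curve shown is the widely used Narkowicz fit, which is an assumption because the README does not say which ACES approximation is used.

```cpp
#include <algorithm>
#include <cmath>

// Clamp helper so the sketch does not depend on std::clamp.
inline float clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Progressive running average: fold sample number n (1-based) into the mean.
inline float accumulate(float mean, float sample, int n) {
    return mean + (sample - mean) / static_cast<float>(n);
}

// ACES filmic approximation (Narkowicz fit). Assumption: the renderer may use a different fit.
inline float aces_filmic(float x) {
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    return clamp01((x * (a * x + b)) / (x * (c * x + d) + e));
}

// Piecewise linear-to-sRGB transfer function.
inline float linear_to_srgb(float v) {
    v = clamp01(v);
    return v <= 0.0031308f ? 12.92f * v
                           : 1.055f * std::pow(v, 1.0f / 2.4f) - 0.055f;
}

// One channel of the display path: exposure, tonemap, then sRGB encode.
inline float display_value(float hdr, float exposure) {
    return linear_to_srgb(aces_filmic(hdr * exposure));
}
```

Keeping the accumulation buffer in linear HDR and applying exposure, tonemapping, and sRGB encoding only at display time means an exposure change does not require resetting the accumulated samples.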
Frame N
=======
generate_rays_kernel       1 ray per pixel, jittered + optional DOF
         |
         v
+------------ bounce loop (up to 8 bounces) -------------+
|                                                        |
|  intersect_kernel        BVH traversal per active ray  |
|        |                                               |
|  shade_miss_kernel       HDRI environment / gradient   |
|        |                 sky fallback                  |
|        |                                               |
|  shade_surface_kernel    BSDF sampling, Russian        |
|        |                 roulette (depth > 3),         |
|        |                 emissive hit detection        |
|        |                                               |
|  swap work queues        compact via atomicAdd         |
+--------------------------------------------------------+
         |
accumulate_kernel          progressive running average
         |
tonemap_kernel             ACES filmic + sRGB output
         |
OpenGL PBO blit            display via fullscreen quad
Active path compaction between bounces uses a pair of work queues swapped each iteration. Surviving paths are appended to the next queue via atomicAdd, naturally eliminating terminated paths without a separate stream compaction pass.
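
The queue-append pattern described above can be sketched on the host as follows. `WorkQueue`, `push_path`, and `compact_survivors` are hypothetical names, and `std::atomic::fetch_add` stands in for CUDA's `atomicAdd`; on the GPU each thread would run the body of the loop for its own queue entry.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical work queue: a flat array of path indices plus an atomic counter.
struct WorkQueue {
    std::vector<std::uint32_t> items;
    std::atomic<std::uint32_t> count{0};
    explicit WorkQueue(std::size_t capacity) : items(capacity) {}
};

// Append a surviving path. fetch_add returns the previous counter value,
// which becomes this path's unique slot (the same contract as atomicAdd).
inline void push_path(WorkQueue& q, std::uint32_t path_index) {
    const std::uint32_t slot = q.count.fetch_add(1, std::memory_order_relaxed);
    q.items[slot] = path_index;
}

// One shading pass: paths flagged alive are appended to `next`;
// terminated paths are simply never written, so no separate compaction runs.
inline void compact_survivors(const WorkQueue& active,
                              const std::vector<bool>& alive,
                              WorkQueue& next) {
    next.count.store(0);
    const std::uint32_t n = active.count.load();
    for (std::uint32_t i = 0; i < n; ++i) {
        const std::uint32_t path = active.items[i];
        if (alive[path]) push_path(next, path);
    }
}
```

After the pass, the host would swap the roles of the two queues and launch the next bounce over `next.count` entries. On a GPU the order of entries in the next queue is not deterministic, since it depends on which thread wins each atomic increment.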
Requirements: CUDA Toolkit (12.x recommended), OpenGL, CMake 3.24+, Visual Studio 2022
GLFW 3.3.9 and GLM 1.0.1 are fetched automatically via CMake FetchContent. GLAD is vendored.
# One-step build and run
powershell -File run.ps1
# Manual build
cmake -S . -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release --target wavefront-path-tracer
./build/bin/Release/wavefront-path-tracer.exe [options]

Target CUDA architectures: SM 75 (Turing), SM 86 (Ampere), SM 89 (Ada Lovelace).
Scene & rendering
| Option | Description | Default |
|---|---|---|
| `--width <n>` | Window / render width | 1920 |
| `--height <n>` | Window / render height | 1080 |
| `--scene <path>` | Load an OBJ mesh file | (built-in demo) |
| `--scale <float>` | Scale factor for loaded OBJ | 1.0 |
| `--hdri <path>` | HDR environment map (.hdr) | (gradient sky) |
| `--hdri-intensity <float>` | Environment map intensity | 2.0 |
| `--floor` | Add a neutral ground plane under a loaded OBJ (contact shadows) | off |
| `--material <name>` | Override a loaded OBJ's material: gold, chrome, copper, silver, obsidian, jade, pearl, marble, glass, plastic | (OBJ's own) |
| `--cornell` | Render the built-in classic Cornell box (ignored if `--scene` is given) | off |
| `--help` | Show usage | |
Headless rendering (no display / window required)
| Option | Description | Default |
|---|---|---|
| `--headless` | Render to PNG without a window | off |
| `--frames <n>` (alias `--spp`) | Samples per pixel in headless / animation mode | 256 (64 for animation) |
| `--output <path>` (alias `-o`) | Output PNG path in headless mode | render.png |
Camera framing (applied to the auto-framed camera)
| Option | Description | Default |
|---|---|---|
| `--cam-yaw <deg>` | Orbit the camera horizontally | 0 (straight-on) |
| `--cam-pitch <deg>` | Orbit the camera vertically | 0 |
| `--cam-zoom <mult>` | Scale the auto-framed distance (<1 zooms in, >1 out) | 1.0 |
Model orientation (for a loaded OBJ)
| Option | Description | Default |
|---|---|---|
| `--model-yaw <deg>` | Rotate the mesh about the vertical (Y) axis | 0 |
| `--model-pitch <deg>` | Rotate the mesh about the X axis | 0 |
| `--model-roll <deg>` | Rotate the mesh about the Z axis | 0 |
Headless camera animation (implies --headless)
| Option | Description | Default |
|---|---|---|
| `--animate <preset>` | Render a camera-path video. Presets: orbit, dolly | |
| `--anim-frames <n>` | Number of frames in the animation | 120 |
| `--fps <n>` | Frames per second for video encoding | 30 |
| `--orbit-degrees <d>` | Total yaw sweep for the orbit preset | 360 |
| `--dolly-start <m>` | Dolly distance multiplier at start | 1.0 |
| `--dolly-end <m>` | Dolly distance multiplier at end | 0.45 |
| `--output-dir <dir>` | Directory for the PNG frame sequence | animation |
| `--video` | Encode the frame sequence to a video via ffmpeg | off |
| `--video-format <f>` | Video container: mp4 or gif | mp4 |
# Load a mesh with an HDRI environment (interactive)
./build/bin/Release/wavefront-path-tracer.exe --scene models/dragon.obj --scale 0.5 --hdri envmaps/studio.hdr
# Headless hero still: 512 spp straight to PNG, no display needed
./build/bin/Release/wavefront-path-tracer.exe --headless --scene models/bunny.obj --model-yaw 90 --floor --frames 512 -o bunny.png
# Headless 360-degree orbit encoded to mp4
./build/bin/Release/wavefront-path-tracer.exe --animate orbit --scene models/dragon.obj --anim-frames 180 --output-dir orbit_frames --video

| Input | Action |
|---|---|
| Left mouse drag | Orbit camera |
| Shift + drag | Pan camera |
| Ctrl + drag | Zoom |
| Scroll wheel | Zoom in/out |
| `+` / `-` | Adjust exposure |
| `R` | Reset accumulation |
| `P` | Save screenshot (PNG) |
| `Esc` | Quit |
The camera automatically frames loaded scenes based on world bounds.
Pressing P writes the current frame to screenshots/screenshot_YYYYMMDD_HHMMSS.png (the directory is created automatically next to the working directory).
src/
  app/                Entry point, window, camera, scene, demo scenes
  core/
    math/             Vector/matrix ops, sampling, spectral rendering
    memory/           DeviceBuffer RAII wrappers, CUDA_CHECK macros
    random/           PCG32 RNG for device code
    texture/          Texture manager, CUDA texture objects, HDR/LDR loading
  geometry/
    bvh/              BVH node layout, SAH builder, GPU traversal
    primitives/       Triangle, sphere helpers, AABB
  integrators/
    wavefront/        Wavefront kernels, path state (SoA), active/next work queues
  lighting/
    restir/           Experimental ReSTIR DI prototype and reservoir helpers
  materials/
    bsdf/             Lambert, Oren-Nayar, GGX conductor, dielectric,
                      plastic, thin film, emission
    spectral/         Sellmeier equation, Cauchy dispersion, complex IOR,
                      blackbody radiation, metal spectral data utilities
external/
  glad/               Vendored OpenGL loader
Traditional GPU path tracers run the entire path loop, every bounce included, inside a single megakernel. This renderer uses the wavefront architecture (Laine et al. 2013), which splits each bounce into separate kernels. Each smaller kernel needs fewer registers than a monolithic shader and can therefore run at higher occupancy, and because all threads in a kernel execute the same stage, warp divergence is reduced.
All path state is stored in Structure-of-Arrays (SoA) layout rather than Array-of-Structures (AoS). When a warp of 32 threads accesses `ray_origin_x[thread_id]`, the 32 adjacent 4-byte loads can coalesce into a single 128-byte memory transaction. The `View` structs hold raw `__restrict__` device pointers passed to kernels, while the `SoA` structs own the underlying `DeviceBuffer` allocations.
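
The owner/view split can be illustrated with a small host-side sketch. Everything here is hypothetical: `RaySoASketch` and `RayViewSketch` are illustrative names rather than the repository's types, `std::vector` stands in for the `DeviceBuffer` wrapper, and the `WPT_RESTRICT` macro only exists so the sketch compiles with different host compilers.

```cpp
#include <cstddef>
#include <vector>

// Portable spelling of the restrict qualifier (assumption for this sketch).
#if defined(__CUDACC__) || defined(__GNUC__) || defined(__clang__)
#define WPT_RESTRICT __restrict__
#elif defined(_MSC_VER)
#define WPT_RESTRICT __restrict
#else
#define WPT_RESTRICT
#endif

// Non-owning view handed to kernels: one raw pointer per field.
struct RayViewSketch {
    float* WPT_RESTRICT origin_x;
    float* WPT_RESTRICT origin_y;
    float* WPT_RESTRICT origin_z;
    std::size_t count;
};

// Owning side: one contiguous array per field.
// std::vector stands in for a device allocation here.
struct RaySoASketch {
    std::vector<float> origin_x, origin_y, origin_z;
    explicit RaySoASketch(std::size_t n) : origin_x(n), origin_y(n), origin_z(n) {}
    RayViewSketch view() {
        return {origin_x.data(), origin_y.data(), origin_z.data(), origin_x.size()};
    }
};

// Per-thread kernel body: neighbouring thread_id values touch neighbouring
// floats of the same array, which is the access pattern that coalesces.
inline void offset_origin_x(RayViewSketch v, std::size_t thread_id, float dx) {
    if (thread_id < v.count) v.origin_x[thread_id] += dx;
}
```

Passing the view by value keeps kernel arguments to a handful of pointers, while the owning struct controls allocation lifetime on the host.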
| BSDF | Model | Sampling |
|---|---|---|
| Lambert | Cosine-weighted diffuse | Cosine hemisphere sampling |
| Oren-Nayar | Rough diffuse with angle-dependent reflectance | Cosine hemisphere sampling |
| Metal (GGX Conductor) | Microfacet metal; perfect mirror at zero roughness | VNDF (visible normal distribution function) |
| Dielectric | Smooth or rough glass with Fresnel reflection/transmission | VNDF + refraction via Snell's law |
| Plastic | Diffuse substrate with specular GGX coat | Cosine hemisphere + VNDF |
| Thin Film | Thin-film-inspired RGB interference model | Stochastic specular or diffuse branch |
| Emission | Emissive surface (area light) | N/A (light source) |
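GGX sampling uses the Heitz 2018 VNDF method, which importance-samples only the microfacet normals visible from the view direction. With this sampling the per-sample weight reduces to the Fresnel term times G2/G1, so it stays bounded and low-variance as roughness approaches the specular limit.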
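
For reference, the sampling routine published in Heitz 2018 can be written in plain C++ as below. This follows the paper's listing rather than the repository's code; the `Vec3` helpers and the function name are hypothetical, and the view direction is assumed to be given in a local shading frame where the surface normal is +Z.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// ve: unit view direction in the local frame (z = normal); u1, u2 uniform in [0, 1).
// Returns a unit microfacet normal drawn from the distribution of visible normals.
inline Vec3 sample_ggx_vndf(Vec3 ve, float alpha_x, float alpha_y, float u1, float u2) {
    const float kPi = 3.14159265358979f;
    // Stretch the view vector so the GGX ellipsoid becomes a unit hemisphere.
    const Vec3 vh = normalize({alpha_x * ve.x, alpha_y * ve.y, ve.z});
    // Build an orthonormal basis around vh (special case at normal incidence).
    const float lensq = vh.x * vh.x + vh.y * vh.y;
    const Vec3 t1 = lensq > 0.0f ? (1.0f / std::sqrt(lensq)) * Vec3{-vh.y, vh.x, 0.0f}
                                 : Vec3{1.0f, 0.0f, 0.0f};
    const Vec3 t2 = cross(vh, t1);
    // Sample a disk, then warp it onto the projected area of the hemisphere.
    const float r = std::sqrt(u1);
    const float phi = 2.0f * kPi * u2;
    const float p1 = r * std::cos(phi);
    float p2 = r * std::sin(phi);
    const float s = 0.5f * (1.0f + vh.z);
    p2 = (1.0f - s) * std::sqrt(1.0f - p1 * p1) + s * p2;
    // Reproject onto the hemisphere.
    const float pz = std::sqrt(std::max(0.0f, 1.0f - p1 * p1 - p2 * p2));
    const Vec3 nh = p1 * t1 + p2 * t2 + pz * vh;
    // Undo the stretch to return to the ellipsoid configuration.
    return normalize({alpha_x * nh.x, alpha_y * nh.y, std::max(0.0f, nh.z)});
}
```

The outgoing direction is then obtained by reflecting the view direction about the returned normal; that step and the weight evaluation are omitted here.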
Per-material texture support includes albedo maps, roughness maps, and normal maps. Textures are loaded via the HDR/LDR image loaders and bound as CUDA texture objects for hardware-accelerated bilinear filtering. Color textures are decoded as sRGB while roughness and normal textures stay in linear space. A texture cache automatically deduplicates repeated loads.
The codebase includes hero-wavelength sampling helpers, CIE XYZ/sRGB conversion utilities, Sellmeier dispersion models for several dielectrics, and blackbody helper functions. The main interactive render loop on this branch still accumulates RGB radiance, so these spectral pieces are better thought of as supporting utilities and experimental groundwork than the default rendering path.
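
As an illustration of what a Sellmeier dispersion helper computes, the sketch below evaluates the three-term Sellmeier equation, n²(λ) = 1 + Σ Bᵢλ² / (λ² − Cᵢ). The struct and function names are hypothetical, and the coefficients are the published Schott N-BK7 values, used here only as sample data; which glasses the repository actually covers is not stated.

```cpp
#include <cmath>

// Three-term Sellmeier coefficients; the C terms are in square micrometres.
struct SellmeierCoefficients {
    double b1, b2, b3;
    double c1, c2, c3;
};

// Refractive index at a wavelength given in micrometres.
inline double sellmeier_ior(const SellmeierCoefficients& k, double wavelength_um) {
    const double l2 = wavelength_um * wavelength_um;
    const double n2 = 1.0
        + k.b1 * l2 / (l2 - k.c1)
        + k.b2 * l2 / (l2 - k.c2)
        + k.b3 * l2 / (l2 - k.c3);
    return std::sqrt(n2);
}

// Published Schott N-BK7 coefficients (sample data, not taken from the repository).
constexpr SellmeierCoefficients kBK7 = {
    1.03961212, 0.231792344, 1.01046945,
    0.00600069867, 0.0200179144, 103.560653};
```

A spectral path would evaluate this per sampled wavelength, so blue light (shorter wavelength) refracts more strongly than red, which is what produces dispersion in glass.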
HDRI environment maps (.hdr radiance format) provide image-based lighting. When no HDRI is loaded, a procedural gradient sky is used as fallback. For OBJ scenes without explicit lights, a default three-point lighting setup (key, fill, rim) is generated automatically.
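
A miss shader typically turns the ray direction into texture coordinates before sampling the environment map. The sketch below assumes a latitude-longitude (equirectangular) image with +Y up, which is the common layout for `.hdr` panoramas but is an assumption about this renderer; the names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct Dir3 { float x, y, z; };
struct EnvUV { float u, v; };

// Map a direction to equirectangular coordinates in [0, 1] x [0, 1].
// Assumptions: +Y is up, u wraps around the horizon, v = 0 is the zenith.
inline EnvUV direction_to_equirect_uv(Dir3 d) {
    const float kPi = 3.14159265358979f;
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    const float y = std::min(1.0f, std::max(-1.0f, d.y / len));
    const float u = 0.5f + std::atan2(d.z, d.x) / (2.0f * kPi);
    const float v = std::acos(y) / kPi;
    return {u, v};
}
```

The looked-up radiance would then be multiplied by the configured intensity (`--hdri-intensity`) before being added to the path's contribution.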
The repository contains a reservoir-based direct-lighting prototype inspired by Bitterli et al. 2020, including alias-table sampling plus temporal and spatial resampling passes. It is not currently wired into the main interactive renderer loop described above.
The active renderer traces triangle geometry through the BVH. Triangles use watertight intersection with precomputed edge data and per-vertex normal/UV interpolation. Procedural shape generators (box, pyramid, torus, octahedron, UV sphere) are available for scene construction, and arbitrary meshes can be imported via OBJ loading with configurable scale and optional smooth normal recalculation.


