Add multi-gencode support for fat binary builds #22

Open
w4nderlust wants to merge 2 commits into Narsil:main from w4nderlust:main

@w4nderlust

  • Add compute_caps() builder method for multiple architecture targets
  • Add CUDA_COMPUTE_CAPS env var (plural) with priority over CUDA_COMPUTE_CAP
  • Support CUDA_COMPUTE_CAPS=all to expand to default set (75,80,86,89,90)
  • build_lib() uses -gencode flags when multiple CCs configured
  • build_ptx() compiles for lowest CC (forward-compatible via JIT)
  • Validate all requested CCs against nvcc --list-gpu-code
  • Expose resolved CC list via cargo:rustc-env=CUDA_COMPUTE_CAPS
  • Add unit tests for gencode flag generation
  • Full backward compatibility when new env vars are unset
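
The resolution and flag-generation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `resolve_caps`, `caps_from_env`, `gencode_flags` and `ptx_target` are hypothetical, the `-gencode=arch=compute_XX,code=sm_XX` spelling is the standard nvcc form and is assumed to match what `build_lib()` emits, and validation against `nvcc --list-gpu-code` is left out.

```rust
/// Default set the PR description gives for `CUDA_COMPUTE_CAPS=all`.
const DEFAULT_CAPS: [u32; 5] = [75, 80, 86, 89, 90];

/// Parse a comma-separated list such as "80, 86,90".
/// Simplification: entries that are not numbers are silently skipped.
fn parse_list(raw: &str) -> Vec<u32> {
    raw.split(',').filter_map(|s| s.trim().parse().ok()).collect()
}

/// Resolve the target list. The plural value takes priority over the
/// singular one; "all" (plural only, in this sketch) expands to the defaults.
/// Returns a sorted, deduplicated list, empty when nothing usable is set.
fn resolve_caps(plural: Option<&str>, singular: Option<&str>) -> Vec<u32> {
    let mut caps = match (plural.map(str::trim), singular.map(str::trim)) {
        (Some(p), _) if p.eq_ignore_ascii_case("all") => DEFAULT_CAPS.to_vec(),
        (Some(p), _) => parse_list(p),
        (None, Some(s)) => parse_list(s),
        (None, None) => Vec::new(),
    };
    caps.sort_unstable();
    caps.dedup();
    caps
}

/// Hypothetical wrapper that reads the two environment variables.
fn caps_from_env() -> Vec<u32> {
    let plural = std::env::var("CUDA_COMPUTE_CAPS").ok();
    let singular = std::env::var("CUDA_COMPUTE_CAP").ok();
    resolve_caps(plural.as_deref(), singular.as_deref())
}

/// One `-gencode` flag per compute capability (assumed flag format).
fn gencode_flags(caps: &[u32]) -> Vec<String> {
    caps.iter()
        .map(|cc| format!("-gencode=arch=compute_{0},code=sm_{0}", cc))
        .collect()
}

/// PTX is built for the lowest capability so newer GPUs can JIT it.
fn ptx_target(caps: &[u32]) -> Option<u32> {
    caps.iter().copied().min()
}
```

With `CUDA_COMPUTE_CAPS=80,90` set, `gencode_flags(&caps_from_env())` would yield two flags, one for `sm_80` and one for `sm_90`, and `ptx_target` would pick 80.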

Detect GPU compute capability at build time by dynamically loading
libcuda.so.1 (Linux) or nvcuda.dll (Windows) and calling cuInit and
cuDeviceGetAttribute directly, the same approach cuda-diagnostic uses.

This eliminates the nvidia-smi PATH dependency that caused silent
fallback to sm_75 on Windows, which compiled out all BF16 kernels
(gated behind __CUDA_ARCH__ >= 800) and caused severe perf regressions.

Fallback raised from sm_75 to sm_80 when no driver is available.
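
For readers unfamiliar with the driver-loading approach, here is a rough Unix-only sketch of the idea, not the code in this commit. The function names `detect_compute_cap`, `build_compute_cap` and `bf16_kernels_enabled` are hypothetical, only device 0 is queried, and the Windows path (loading nvcuda.dll) is stubbed out. The attribute IDs 75 and 76 are the `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR`/`MINOR` values from `cuda.h`.

```rust
use std::ffi::{c_char, c_int, c_uint, c_void};

// CUDA driver API constants (values from cuda.h).
const CUDA_SUCCESS: c_int = 0;
const ATTR_CC_MAJOR: c_int = 75; // CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR
const ATTR_CC_MINOR: c_int = 76; // CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR

/// Fallback when no driver is available (sm_80, per the commit message).
const FALLBACK_CC: u32 = 80;

#[cfg(unix)]
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

/// Query device 0 through the driver library; `None` if anything is missing.
#[cfg(unix)]
fn detect_compute_cap() -> Option<u32> {
    type CuInit = unsafe extern "C" fn(c_uint) -> c_int;
    type CuDeviceGet = unsafe extern "C" fn(*mut c_int, c_int) -> c_int;
    type CuDeviceGetAttribute = unsafe extern "C" fn(*mut c_int, c_int, c_int) -> c_int;
    const RTLD_NOW: c_int = 2;

    unsafe {
        // The handle is never closed; acceptable for a short-lived build script.
        let lib = dlopen(b"libcuda.so.1\0".as_ptr() as *const c_char, RTLD_NOW);
        if lib.is_null() {
            return None;
        }
        let init = dlsym(lib, b"cuInit\0".as_ptr() as *const c_char);
        let get = dlsym(lib, b"cuDeviceGet\0".as_ptr() as *const c_char);
        let attr = dlsym(lib, b"cuDeviceGetAttribute\0".as_ptr() as *const c_char);
        if init.is_null() || get.is_null() || attr.is_null() {
            return None;
        }
        let cu_init: CuInit = std::mem::transmute(init);
        let cu_device_get: CuDeviceGet = std::mem::transmute(get);
        let cu_device_get_attribute: CuDeviceGetAttribute = std::mem::transmute(attr);

        let mut dev: c_int = 0;
        let mut major: c_int = 0;
        let mut minor: c_int = 0;
        if cu_init(0) != CUDA_SUCCESS || cu_device_get(&mut dev, 0) != CUDA_SUCCESS {
            return None;
        }
        if cu_device_get_attribute(&mut major, ATTR_CC_MAJOR, dev) != CUDA_SUCCESS
            || cu_device_get_attribute(&mut minor, ATTR_CC_MINOR, dev) != CUDA_SUCCESS
        {
            return None;
        }
        Some((major * 10 + minor) as u32)
    }
}

/// Stub for other platforms; a Windows version would load nvcuda.dll instead.
#[cfg(not(unix))]
fn detect_compute_cap() -> Option<u32> {
    None
}

/// Detected capability, or the sm_80 fallback when no driver is found.
fn build_compute_cap() -> u32 {
    detect_compute_cap().unwrap_or(FALLBACK_CC)
}

/// Mirrors the `__CUDA_ARCH__ >= 800` gate on the BF16 kernels.
fn bf16_kernels_enabled(cc: u32) -> bool {
    cc >= 80
}
```

Under this sketch a machine without a driver resolves to 80, so `bf16_kernels_enabled(build_compute_cap())` stays true, whereas the previous sm_75 fallback would have returned false and dropped the BF16 kernels.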