Add multi-gencode support for fat binary builds #22
Open
w4nderlust wants to merge 2 commits into
w4nderlust
commented
Feb 22, 2026
- Add compute_caps() builder method for multiple architecture targets
- Add CUDA_COMPUTE_CAPS env var (plural) with priority over CUDA_COMPUTE_CAP
- Support CUDA_COMPUTE_CAPS=all to expand to default set (75,80,86,89,90)
- build_lib() uses -gencode flags when multiple CCs configured
- build_ptx() compiles for lowest CC (forward-compatible via JIT)
- Validate all requested CCs against nvcc --list-gpu-code
- Expose resolved CC list via cargo:rustc-env=CUDA_COMPUTE_CAPS
- Add unit tests for gencode flag generation
- Full backward compatibility when new env vars are unset
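The resolution and flag-generation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `resolve_caps`, `gencode_flags`, and `ptx_arch_flag` are hypothetical, the exact flag spelling the crate emits is an assumption, and validation against `nvcc --list-gpu-code` is left out.

```rust
/// Default set that `CUDA_COMPUTE_CAPS=all` expands to, per the PR description.
const DEFAULT_CAPS: [u32; 5] = [75, 80, 86, 89, 90];

/// Hypothetical helper: choose the CC list from the plural variable first,
/// then fall back to the singular one. Returns `Ok(None)` when neither is set,
/// so the caller can keep its pre-existing behavior.
fn resolve_caps(
    plural: Option<&str>,
    singular: Option<&str>,
) -> Result<Option<Vec<u32>>, String> {
    let raw = match plural.or(singular) {
        Some(s) => s.trim(),
        None => return Ok(None),
    };
    // Only the plural variable understands the `all` keyword.
    if plural.is_some() && raw.eq_ignore_ascii_case("all") {
        return Ok(Some(DEFAULT_CAPS.to_vec()));
    }
    let mut caps = Vec::new();
    for part in raw.split(',') {
        let part = part.trim();
        if part.is_empty() {
            continue;
        }
        let cc: u32 = part
            .parse()
            .map_err(|_| format!("invalid compute capability: {part:?}"))?;
        caps.push(cc);
    }
    // Sort and de-duplicate so the lowest CC is always first.
    caps.sort_unstable();
    caps.dedup();
    if caps.is_empty() {
        return Err("no compute capabilities given".to_string());
    }
    Ok(Some(caps))
}

/// One `-gencode` flag per target, producing SASS for each architecture.
fn gencode_flags(caps: &[u32]) -> Vec<String> {
    caps.iter()
        .map(|cc| format!("-gencode=arch=compute_{cc},code=sm_{cc}"))
        .collect()
}

/// PTX is built for the lowest CC only; newer GPUs JIT-compile it at load time.
fn ptx_arch_flag(caps: &[u32]) -> Option<String> {
    caps.iter()
        .min()
        .map(|cc| format!("--gpu-architecture=compute_{cc}"))
}
```

In a build script, the two arguments to `resolve_caps` would typically come from `std::env::var("CUDA_COMPUTE_CAPS").ok()` and `std::env::var("CUDA_COMPUTE_CAP").ok()`. Returning `Ok(None)` when both are unset is one way to keep the existing single-architecture path untouched.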
Detect GPU compute capability at build time by dynamically loading libcuda.so.1 (Linux) or nvcuda.dll (Windows) and calling cuInit and cuDeviceGetAttribute directly, the same approach as cuda-diagnostic. This removes the dependency on nvidia-smi being on PATH, which caused a silent fallback to sm_75 on Windows. That fallback compiled out all BF16 kernels (gated behind __CUDA_ARCH__ >= 800) and caused severe performance regressions. The fallback used when no driver is available is raised from sm_75 to sm_80.
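To make the fallback arithmetic concrete, here is a small sketch of how a detected major/minor pair might be turned into a compute capability, and why sm_75 drops the BF16 kernels. The function names are hypothetical and the driver loading itself is not shown. The `detected` argument stands in for the two values that cuDeviceGetAttribute returns for the major and minor compute capability attributes.

```rust
/// Fallback CC when no driver is available (sm_80 per the commit description).
const FALLBACK_CC: u32 = 80;

/// BF16 kernels are gated behind `__CUDA_ARCH__ >= 800`, i.e. CC 8.0 and up.
const BF16_MIN_CC: u32 = 80;

/// Hypothetical helper: combine the driver's major/minor attributes into the
/// two-or-three digit form used in `sm_XY` (for example 8 and 6 give 86).
fn cc_from_attributes(major: i32, minor: i32) -> Option<u32> {
    if major <= 0 || minor < 0 || minor > 9 {
        return None; // treat implausible values as "detection failed"
    }
    Some(major as u32 * 10 + minor as u32)
}

/// Use the detected capability when present, otherwise the fallback.
fn effective_cc(detected: Option<(i32, i32)>) -> u32 {
    detected
        .and_then(|(major, minor)| cc_from_attributes(major, minor))
        .unwrap_or(FALLBACK_CC)
}

/// Whether kernels guarded by `__CUDA_ARCH__ >= 800` survive compilation.
fn bf16_enabled(cc: u32) -> bool {
    cc >= BF16_MIN_CC
}
```

With the old sm_75 fallback, `bf16_enabled(75)` is false, which matches the regression described above. With the new fallback, a machine without a driver still builds the BF16 kernels.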