Add Windows support #354
|
Regarding Incidentally, doing this review would unblock #323 |
|
/ok to test dd1ffc9 |
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
|
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Adds Windows build/test support: enables Windows symbol export and an MSVC link option, adds a CUDA profiler API installer and CI invocation, re-enables the Windows PR job, makes CUPTI import platform-aware, adjusts CUDA/MSVC compile options and C++ detection, and configures per-test runtime library paths.
|
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ffe8731a-1bea-4fe3-bd00-ae6f639bb863
📒 Files selected for processing (8)
- CMakeLists.txt
- ci/build_common.sh
- cmake/NVBenchCUPTI.cmake
- cmake/NVBenchConfigTarget.cmake
- nvbench/config.cuh.in
- testing/axes_metadata.cu
- testing/cmake/CMakeLists.txt
- testing/cmake/test_export/CMakeLists.txt
|
I will revert the changes to |
|
@mfranzrebsal PR #362, which enables the MSVC build of NVBench, has been merged, but the job is presently skipped unconditionally because of the known build failure that this PR fixes. Please merge main into this branch and revert c632eb2 to re-enable it. The expectation is that the CI build using MSVC will now complete successfully. |
edec813 to 787e435
|
The commit you mention is nowhere to be found, either in main or my rebased branch. I think we are good? |
|
/ok to test 787e435 |
|
@mfranzrebsal Right now the CI has Windows build job disabled in pr.yml#L82-83. Please push a change to remove these two lines to enable the job. |
Remove gate that disables Windows NVBench build job in pr.yaml
|
/ok to test 78b674b |
|
Windows build job fails with: On Linux, the compilation command for This folder should contain "cuda_profiler_api.h" though. |
|
Per my agent, the devcontainer does not install the CUDA Profiler API component. I will try installing the CUDA Profiler API in the container next. |
|
/ok to test 460e14f |
|
/ok to test c6cd097 |
|
/ok to test f8c0554 |
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
Actionable comments posted: 4
📒 Files selected for processing (11)
- .github/workflows/build-windows.yml
- .github/workflows/pr.yml
- CMakeLists.txt
- ci/windows/install_cuda_profiler_api.ps1
- cmake/NVBenchCUPTI.cmake
- cmake/NVBenchConfigTarget.cmake
- nvbench/CMakeLists.txt
- nvbench/config.cuh.in
- testing/axes_metadata.cu
- testing/cmake/CMakeLists.txt
- testing/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/build-windows.yml (1)
122-166: ⚡ Quick win
suggestion: Pass NVBENCH_WINDOWS_CUDA, NVBENCH_WINDOWS_STD, and NVBENCH_WINDOWS_ARCH through Docker --env and resolve them in-container.
The generated PowerShell script currently relies on host-side interpolation: because the here-string uses double quotes, $env:NVBENCH_WINDOWS_CUDA and related variables expand when the script is written on the host, not in the container. The Docker args never forward these variables, so the container has no access to them at runtime. Forward them explicitly with --env NVBENCH_WINDOWS_CUDA=... and backtick-escape the $env: references in the here-string so they evaluate in-container instead.
|
/ok to test ccfa1b5 |
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
Actionable comments posted: 1
1. Install CUDA Profiler API into toolkit matching what is installed in dev-container
2. Pass linker argument to use main from static nvbench_main library when linking examples and tests
3. Instruct MSVC to use standard-compliant preprocessor
4. Use environment modification for targets to help them find shared libraries needed at runtime, such as CUPTI on Windows/Linux.
The remainder is an aggregation of 53 individual commit messages:
Install CUDA Profiler API into toolkit
Add install_cuda_profiler_api.ps1
Inform MSVC that static library exports main
Attempt to fix "LINK : fatal error LNK1561: entry point must be defined" when building benchmarks which need main function provided by static library libnvbench_main after NVIDIA#350
Review feedback to PowerShell script
Fix how CMAKE_CUDA_HOST_COMPILER is set in call to cmake
Filter out empty directories LD_LIBRARY_PATH/PATH
Act on review feedback regarding corner cases when testing may depend on the directory it is performed from
Check that cudaVersion and :CUDA_PATH are consistent
Do not overwrite ENVIRONMENT property with empty values
Implement retry logic in downloading of CUDA Profiler API
Strengthen publisher verification of downloaded artifact
Prepend new folders to LD_LIBRARY_PATH, do not overwrite
Implement timeout, fail on 40x HTTP response code
4xx responses now fail immediately, and the installer is bounded to 15 minutes before being killed and reported as a timeout.
USE ENVIRONMENT_MODIFICATION property, not ENVIRONMENT
escape environment modification values
Fix cmake script error breaking the build
Added recommended timeout to Invoke-WebRequest
Set cmake_minimum_required version to 3.30.4, consistent with main project
Pass NVBENCH environment variables through docker for Windows build
Export IMPORTLIB_LOCATION for CUPTI on Windows and use in testing projects
Add Zc:preprocessor to host compiler on Windows.
Configure runtime env for tests to find CUPTI library
Better fix to add /Zc:preprocessor that also propagates to header testing target
Address code rabbit concern
Validate before casting in PowerShell script
decouple nvbench runtime path setup from cupti target detection
Normalize multiple ARCH args
Better validation of gpu_args parameter
use get_imported_location to get CUPTI library to improve multi-config support
Validation of combinations of gpu, run_tests and device_testing
Resolve code-rabbit concern in handling multiple imported configurations to match build type, if set
Reject GPU requests for forks
Prevents installing cuda_profiler_api.h into one toolkit while CMake builds with another.
Fail fast for deterministic client errors returned by download request
more robust imported_location computation
Make Linux also use ENVIRONMENT_MODIFICATION to simplify code
run_tests=false is not allowed when device_testing=true
Specify Windows CUDA toolkit version major.minor.patch, derive devcontainer tag from full spec
Handle edge case when multiple CUPTI dlls exist, pick up, warn, do not fail
Always specify -DNVBench_ENABLE_DEVICE_TESTING=VAL per value of
Back to cuda major.minor being input
What CUDA Profiler API to install is determined from redist information stored in version.json stored at root of CUDA Toolkit. If version.json is not found, an error occurs
Remove parameters intended to enable testing builds on Windows. Deferred for future work
Handle import nvbench::nvbench the same as nvbench target in NVBenchConfigTarget
Forward cmake variables only if set
Use UTF-8 encoding when appending to GITHUB_OUTPUT
Avoid power-shell footgun where local variable shadows builtin variable due to case insensitivity
enable device testing parameter in build_nvbench, passed as True by workflow
Lower CMake version required as much as possible
LINKER:/INCLUDE:main for proper CUDA link driver routing
Add conda-specific hints for find_library call to find CUPTI
test_export must require 3.22 version
ENVIRONMENT_MODIFICATION feature was added in 3.22.0 https://cmake.org/cmake/help/latest/prop_test/ENVIRONMENT_MODIFICATION.html
Delete unused function Test-Preset
Guard the CUPTI runtime path extraction
Check before executing cmake_path() in testing/cmake/CMakeLists.txt
Also, use nvbench_get_imported_location to extract imported location
use the config-aware generator expression for all runtime targets
Remove the configure-time imported-location helper entirely.
Deduplicate WINDOWS_CI_IMAGE construction
ccfa1b5 to 0ca8414
|
/ok to test 0ca8414 |
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
♻️ Duplicate comments (1)
testing/cmake/test_export/CMakeLists.txt (1)
1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
important: bump cmake_minimum_required to 3.30.4 (or newer).
3.22.0 is below the repository baseline for **/CMakeLists.txt and can drift from rapids-cmake compatibility expectations.
-cmake_minimum_required(VERSION 3.22.0)
+cmake_minimum_required(VERSION 3.30.4)
As per coding guidelines, **/CMakeLists.txt: CMake minimum version must be 3.30.4 or higher for rapids-cmake compatibility.
#!/bin/bash
# Verify minimum CMake versions declared across CMakeLists.txt files.
fd -a 'CMakeLists.txt' | while read -r f; do
  awk -v file="$f" '
    match($0,/cmake_minimum_required\(VERSION[[:space:]]*([0-9]+\.[0-9]+\.[0-9]+)/,m){
      printf "%s:%d:%s\n", file, NR, m[1]
    }' "$f"
done
📒 Files selected for processing (16)
- .github/workflows/build-windows.yml
- .github/workflows/pr.yml
- CMakeLists.txt
- ci/windows/build_nvbench.ps1
- ci/windows/install_cuda_profiler_api.ps1
- cmake/NVBenchCUPTI.cmake
- cmake/NVBenchConfigTarget.cmake
- examples/CMakeLists.txt
- exec/CMakeLists.txt
- nvbench/CMakeLists.txt
- nvbench/config.cuh.in
- testing/CMakeLists.txt
- testing/axes_metadata.cu
- testing/cmake/CMakeLists.txt
- testing/cmake/test_export/CMakeLists.txt
- testing/device/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
|
@coderabbitai disagree with nitpick comment made above in #354 (review) See my counter-argument in #367 (comment) and associated learning. |
|
🐇 (`_/) noted and corrected ✏️ Learnings added
|
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
Actionable comments posted: 1
🧹 Nitpick comments (1)
cmake/NVBenchCUPTI.cmake (1)
58-69: 💤 Low value
suggestion: Natural descending sort picks the lexicographically "largest" name, which may not always correspond to the newest version (e.g., cupti64_9.dll > cupti64_10.dll lexicographically, but numerically 10 > 9). Consider using VERSION compare if filenames follow semver patterns, or document the assumption that version suffixes sort correctly.
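The difference between the two orderings named in this comment can be sketched in a few lines of C++. This is only an illustration of lexicographic versus numeric comparison of a version suffix; it makes no claim about how CMake's sort modes behave, and the helper names (version_suffix, max_lexicographic, max_by_numeric_suffix) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: parse the run of digits that follows the last
// underscore, e.g. "cupti64_10.dll" -> 10. Returns -1 if there is none.
long version_suffix(const std::string &name)
{
  const std::string::size_type pos = name.rfind('_');
  if (pos == std::string::npos) { return -1; }
  long value = 0;
  bool found = false;
  for (std::string::size_type i = pos + 1;
       i < name.size() && std::isdigit(static_cast<unsigned char>(name[i]));
       ++i)
  {
    value = value * 10 + (name[i] - '0');
    found = true;
  }
  return found ? value : -1;
}

// Plain string ordering: compares character by character, so '9' > '1'.
std::string max_lexicographic(const std::vector<std::string> &names)
{
  assert(!names.empty());
  return *std::max_element(names.begin(), names.end());
}

// Orders by the parsed numeric suffix instead of by characters.
std::string max_by_numeric_suffix(const std::vector<std::string> &names)
{
  assert(!names.empty());
  return *std::max_element(names.begin(), names.end(),
                           [](const std::string &a, const std::string &b) {
                             return version_suffix(a) < version_suffix(b);
                           });
}
```

With the two names from the comment, max_lexicographic returns cupti64_9.dll while max_by_numeric_suffix returns cupti64_10.dll.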
|
I was also able to build NVBench in conda (on a machine where Visual Studio Community 2026 with build tools for 19.44 is installed):
Build steps in conda
Test run, conda env
C:\Users\opavlyk\work\nvbench>ctest --test-dir build_conda
Test project C:/Users/opavlyk/work/nvbench/build_conda
Start 1: nvbench.ctl.no_args
1/52 Test #1: nvbench.ctl.no_args ........................... Passed 3.35 sec
Start 2: nvbench.ctl.version
2/52 Test #2: nvbench.ctl.version ........................... Passed 0.12 sec
Start 3: nvbench.ctl.list
3/52 Test #3: nvbench.ctl.list .............................. Passed 0.24 sec
Start 4: nvbench.ctl.l
4/52 Test #4: nvbench.ctl.l ................................. Passed 0.24 sec
Start 5: nvbench.ctl.help
5/52 Test #5: nvbench.ctl.help .............................. Passed 0.13 sec
Start 6: nvbench.ctl.h
6/52 Test #6: nvbench.ctl.h ................................. Passed 0.12 sec
Start 7: nvbench.ctl.help_axes
7/52 Test #7: nvbench.ctl.help_axes ......................... Passed 0.14 sec
Start 8: nvbench.ctl.help_axis
8/52 Test #8: nvbench.ctl.help_axis ......................... Passed 0.12 sec
Start 9: nvbench.example.cpp17.auto_throughput
9/52 Test #9: nvbench.example.cpp17.auto_throughput ......... Passed 1.84 sec
Start 10: nvbench.example.cpp17.axes
10/52 Test #10: nvbench.example.cpp17.axes .................... Passed 7.76 sec
Start 11: nvbench.example.cpp17.custom_criterion
11/52 Test #11: nvbench.example.cpp17.custom_criterion ........ Passed 1.99 sec
Start 12: nvbench.example.cpp17.cpu_only
12/52 Test #12: nvbench.example.cpp17.cpu_only ................ Passed 11.36 sec
Start 13: nvbench.example.cpp17.enums
13/52 Test #13: nvbench.example.cpp17.enums ................... Passed 2.33 sec
Start 14: nvbench.example.cpp17.exec_tag_sync
14/52 Test #14: nvbench.example.cpp17.exec_tag_sync ........... Passed 2.55 sec
Start 15: nvbench.example.cpp17.exec_tag_timer
15/52 Test #15: nvbench.example.cpp17.exec_tag_timer .......... Passed 1.96 sec
Start 16: nvbench.example.cpp17.skip
16/52 Test #16: nvbench.example.cpp17.skip .................... Passed 3.15 sec
Start 17: nvbench.example.cpp17.stream
17/52 Test #17: nvbench.example.cpp17.stream .................. Passed 1.94 sec
Start 18: nvbench.example.cpp17.summaries
18/52 Test #18: nvbench.example.cpp17.summaries ............... Passed 3.49 sec
Start 19: nvbench.example.cpp17.throughput
19/52 Test #19: nvbench.example.cpp17.throughput .............. Passed 2.02 sec
Start 20: nvbench.test.axes_metadata
20/52 Test #20: nvbench.test.axes_metadata .................... Passed 0.74 sec
Start 21: nvbench.test.benchmark
21/52 Test #21: nvbench.test.benchmark ........................ Passed 1.75 sec
Start 22: nvbench.test.create
22/52 Test #22: nvbench.test.create ........................... Passed 0.87 sec
Start 23: nvbench.test.cuda_timer
23/52 Test #23: nvbench.test.cuda_timer ....................... Passed 2.41 sec
Start 24: nvbench.test.cuda_stream
24/52 Test #24: nvbench.test.cuda_stream ...................... Passed 1.87 sec
Start 25: nvbench.test.cpu_timer
25/52 Test #25: nvbench.test.cpu_timer ........................ Passed 1.12 sec
Start 26: nvbench.test.criterion_manager
26/52 Test #26: nvbench.test.criterion_manager ................ Passed 0.93 sec
Start 27: nvbench.test.criterion_params
27/52 Test #27: nvbench.test.criterion_params ................. Passed 0.90 sec
Start 28: nvbench.test.custom_main_custom_args
28/52 Test #28: nvbench.test.custom_main_custom_args .......... Passed 1.82 sec
Start 29: nvbench.test.custom_main_custom_exceptions
29/52 Test #29: nvbench.test.custom_main_custom_exceptions .... Passed 1.81 sec
Start 30: nvbench.test.custom_main_global_state_raii
30/52 Test #30: nvbench.test.custom_main_global_state_raii .... Passed 1.82 sec
Start 31: nvbench.test.enum_type_list
31/52 Test #31: nvbench.test.enum_type_list ................... Passed 0.73 sec
Start 32: nvbench.test.entropy_criterion
32/52 Test #32: nvbench.test.entropy_criterion ................ Passed 0.81 sec
Start 33: nvbench.test.float64_axis
33/52 Test #33: nvbench.test.float64_axis ..................... Passed 0.96 sec
Start 34: nvbench.test.int64_axis
34/52 Test #34: nvbench.test.int64_axis ....................... Passed 0.95 sec
Start 35: nvbench.test.named_values
35/52 Test #35: nvbench.test.named_values ..................... Passed 0.91 sec
Start 36: nvbench.test.option_parser
36/52 Test #36: nvbench.test.option_parser .................... Passed 1.66 sec
Start 37: nvbench.test.range
37/52 Test #37: nvbench.test.range ............................ Passed 0.85 sec
Start 38: nvbench.test.reset_error
38/52 Test #38: nvbench.test.reset_error ...................... Passed 1.78 sec
Start 39: nvbench.test.ring_buffer
39/52 Test #39: nvbench.test.ring_buffer ...................... Passed 0.86 sec
Start 40: nvbench.test.runner
40/52 Test #40: nvbench.test.runner ........................... Passed 0.77 sec
Start 41: nvbench.test.state
41/52 Test #41: nvbench.test.state ............................ Passed 1.81 sec
Start 42: nvbench.test.statistics
42/52 Test #42: nvbench.test.statistics ....................... Passed 0.87 sec
Start 43: nvbench.test.state_generator
43/52 Test #43: nvbench.test.state_generator .................. Passed 1.05 sec
Start 44: nvbench.test.stdrel_criterion
44/52 Test #44: nvbench.test.stdrel_criterion ................. Passed 0.73 sec
Start 45: nvbench.test.string_axis
45/52 Test #45: nvbench.test.string_axis ...................... Passed 0.86 sec
Start 46: nvbench.test.type_axis
46/52 Test #46: nvbench.test.type_axis ........................ Passed 0.95 sec
Start 47: nvbench.test.type_list
47/52 Test #47: nvbench.test.type_list ........................ Passed 0.98 sec
Start 48: nvbench.test.cmake.test_export.build_tree
48/52 Test #48: nvbench.test.cmake.test_export.build_tree ..... Passed 89.16 sec
Start 50: nvbench.test.cmake.install_tree.install
49/52 Test #50: nvbench.test.cmake.install_tree.install ....... Passed 0.53 sec
Start 49: nvbench.test.cmake.test_export.install_tree
50/52 Test #49: nvbench.test.cmake.test_export.install_tree ... Passed 88.72 sec
Start 51: nvbench.test.cmake.install_tree.cleanup
51/52 Test #51: nvbench.test.cmake.install_tree.cleanup ....... Passed 0.08 sec
Start 52: nvbench.test.device.noisy_bench
52/52 Test #52: nvbench.test.device.noisy_bench ...............***Failed Error regular expression found in output. Regex=[Warn] 36.51 sec
98% tests passed, 1 tests failed out of 52
Total Test time (real) = 293.76 sec
The following tests FAILED:
52 - nvbench.test.device.noisy_bench (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/opavlyk/work/nvbench/build_conda/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
C:\Users\opavlyk\work\nvbench>conda env export
name: ctk-13
channels:
  - conda-forge
dependencies:
  - c-compiler=1.11.0=h528c1b4_0
  - cuda-cccl_win-64=13.2.75=h57928b3_0
  - cuda-compiler=13.2.1=h559df3f_0
  - cuda-crt-dev_win-64=13.2.78=h57928b3_0
  - cuda-crt-tools=13.2.78=h57928b3_0
  - cuda-ctadvisor=13.2.78=hac47afa_0
  - cuda-cudart=13.2.75=hac47afa_0
  - cuda-cudart-dev=13.2.75=hac47afa_0
  - cuda-cudart-dev_win-64=13.2.75=hac47afa_0
  - cuda-cudart-static=13.2.75=hac47afa_0
  - cuda-cudart-static_win-64=13.2.75=hac47afa_0
  - cuda-cudart_win-64=13.2.75=hac47afa_0
  - cuda-cuobjdump=13.2.78=hac47afa_0
  - cuda-cupti=13.2.75=hac47afa_0
  - cuda-cupti-dev=13.2.75=hac47afa_0
  - cuda-cuxxfilt=13.2.78=hac47afa_0
  - cuda-nvcc=13.2.78=h8f04d04_0
  - cuda-nvcc-dev_win-64=13.2.78=h36c15f3_0
  - cuda-nvcc-impl=13.2.78=h53cbb54_0
  - cuda-nvcc-tools=13.2.78=he0c23c2_0
  - cuda-nvcc_win-64=13.2.78=hd70436c_0
  - cuda-nvdisasm=13.2.78=hac47afa_0
  - cuda-nvml-dev=13.2.82=hac47afa_0
  - cuda-nvprune=13.2.78=hac47afa_0
  - cuda-nvvm-dev_win-64=13.2.78=h57928b3_0
  - cuda-nvvm-impl=13.2.78=h2466b09_0
  - cuda-nvvm-tools=13.2.78=h2466b09_0
  - cuda-profiler-api=13.2.75=h57928b3_0
  - cuda-tileiras=13.2.78=hac47afa_0
  - cuda-version=13.2=he2cc418_3
  - cxx-compiler=1.11.0=h1c1089f_0
  - git=2.54.0=h57928b3_0
  - libnvptxcompiler-dev=13.2.78=h57928b3_0
  - libnvptxcompiler-dev_win-64=13.2.78=h57928b3_0
  - ucrt=10.0.26100.0=h57928b3_0
  - vc=14.5=h1b7c187_36
  - vc14_runtime=14.51.36231=h1b9f54f_36
  - vcomp14=14.51.36231=h1b9f54f_36
  - vs2019_win-64=19.29.30139=h7dcff83_36
  - vs2022_win-64=19.44.35207=ha74f236_36
  - vswhere=3.1.7=h40126e0_1
prefix: C:\Users\opavlyk\miniforge\envs\ctk-13 |
|
/ok to test 2aaf76e |
|
@coderabbitai review |
✅ Actions performed: Review triggered.
|
alliepiper left a comment
LGTM -- This can go in as is, my notes can be addressed as a followup.
It might be a good idea to point an agent at this PR and the following tag/commits to see if there's anything else that might be worth restoring. I can't remember if I mentioned these before:
tag pre_msvc_drop: https://github.com/NVIDIA/nvbench/blob/pre_msvc_drop
Commit that removed MSVC: 93ea533 (Part of #200)
I made the necessary changes for NVBench to build and the tests to pass. I also tested that it works when using NVBench inside of my own project, and made sure that the scripts ci/build_nvbench.sh and ci/test_nvbench.sh work. How should support testing be enabled for the CI? I have a local commit with some changes, but did not want to add them here and possibly trip the CI run.
Here is a summary of the changes, with the disclaimer that they were made by Claude and verified by me, but I am in no way a CMake expert:
#: 1
File: CMakeLists.txt
Change: CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON
Failure without it: Link error LNK1181: cannot open input file 'lib\nvbench.lib'. MSVC only generates a .lib import library when the DLL exports symbols. NVBench has no __declspec(dllexport) annotations, so without this CMake flag, no import library is produced and all downstream targets fail to link.
#: 2
File: cmake/NVBenchCUPTI.cmake
Change: IMPORTED_IMPLIB instead of IMPORTED_LOCATION on Win32
Failure without it: CMake generate error IMPORTED_IMPLIB not set for imported target "nvbench::cupti". On Windows, find_library locates .lib import libraries. A SHARED IMPORTED target on Windows requires the .lib path via IMPORTED_IMPLIB (the import library), not IMPORTED_LOCATION (which expects the .dll).
#: 3
File: cmake/NVBenchConfigTarget.cmake
Change: FMT_UNICODE=0, -Xcompiler=/utf-8, --diag_suppress=27
Failure without it: Build errors in every .cu file. (a) fmtlib 11 static-asserts that /utf-8 mode is active. MSVC's host compiler satisfies this with -Xcompiler=/utf-8, but cudafe evaluates the check independently and always fails, requiring FMT_UNICODE=0 for CUDA. (b) fmtlib's lookup tables use out-of-range char32_t sentinel values that cudafe rejects, requiring --diag_suppress=27.
#: 4
File: cmake/NVBenchConfigTarget.cmake
Change: AND NOT WIN32 on INSTALL_RPATH
Failure without it: No failure. INSTALL_RPATH is a Unix/ELF concept silently ignored on Windows. The guard is purely a hygiene fix.
#: 5
File: nvbench/config.cuh.in
Change: _MSVC_LANG instead of __cplusplus
Failure without it: Build error #error: "NVBench requires a C++17 compiler." in every .cxx file. MSVC reports __cplusplus as 199711L (C++98) regardless of actual standard, unless /Zc:__cplusplus is passed. _MSVC_LANG always reflects the real standard level.
#: 6
File: testing/axes_metadata.cu
Change: #include <iterator>
Failure without it: Build error namespace "std" has no member "back_inserter". MSVC's STL doesn't transitively include <iterator> from other standard headers the way GCC's libstdc++ does.
#: 7
File: testing/cmake/CMakeLists.txt
Change: Forward CMAKE_CUDA_HOST_COMPILER, CMAKE_LINKER, CMAKE_RC_COMPILER, CMAKE_MT
Failure without it: Test failure CUDA_ARCHITECTURES is set to "native", but no NVIDIA GPU was detected. The sub-project cmake configure can't compile/link the GPU query program.
#: 8
File: testing/cmake/CMakeLists.txt
Change: ENVIRONMENT "PATH=..." with nvbench bin + CUPTI lib dirs
Failure without it: No failure when run via the build script (which pre-sets PATH). Needed for robustness when ctest is invoked directly: it is the Windows equivalent of the LD_LIBRARY_PATH setup the sub-project already has for Unix.
#: 9
File: testing/cmake/test_export/CMakeLists.txt
Change: Add Windows PATH setup for sub-project tests (parallel to existing Unix LD_LIBRARY_PATH)
Reason: The original code only set LD_LIBRARY_PATH on Unix and did nothing on Windows. The sub-project's test_bench.exe and nvbench-ctl.exe need nvbench.dll and CUPTI DLLs at runtime. On Unix the build tree embeds RUNPATH into the binary so the executable finds libnvbench.so without environment help; only CUPTI and the install tree need LD_LIBRARY_PATH. Windows has no RUNPATH equivalent (DLL lookup always goes through PATH), so the sub-project must set PATH for both tree types. Previously this worked only because the outer test in testing/cmake/CMakeLists.txt set an ENVIRONMENT property on the ctest --build-and-test process, which the inner CTest happened to inherit. This fix makes the sub-project self-sufficient: it reads the nvbench DLL and CUPTI library locations from imported target properties and sets PATH itself. The shared code resolves the imported configuration once, then branches only for the CUPTI property name (IMPORTED_IMPLIB on Windows vs IMPORTED_LOCATION on Unix, since find_library locates .lib import libraries on Windows) and the environment variable format.
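Items 5 and 6 in the summary above can be illustrated together with a small C++ sketch. It is not the contents of nvbench/config.cuh.in or testing/axes_metadata.cu; the macro DEMO_CPLUSPLUS and the functions has_cpp17 and copy_even are hypothetical names chosen for this example. The sketch picks _MSVC_LANG when it is defined and falls back to __cplusplus otherwise, and it includes the header that declares std::back_inserter explicitly rather than relying on a transitive include.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator> // declares std::back_inserter; included explicitly
#include <vector>

// _MSVC_LANG is defined by MSVC and follows the selected language standard,
// whereas __cplusplus stays at 199711L there unless /Zc:__cplusplus is passed.
// DEMO_CPLUSPLUS is a hypothetical name used only in this sketch.
#if defined(_MSVC_LANG)
#define DEMO_CPLUSPLUS _MSVC_LANG
#else
#define DEMO_CPLUSPLUS __cplusplus
#endif

// True when the effective language level is at least C++17.
constexpr bool has_cpp17() { return DEMO_CPLUSPLUS >= 201703L; }

// Copies the even values of `in` into a new vector through std::back_inserter.
std::vector<int> copy_even(const std::vector<int> &in)
{
  std::vector<int> out;
  std::copy_if(in.begin(), in.end(), std::back_inserter(out),
               [](int v) { return v % 2 == 0; });
  // Sanity check: filtering can only shrink the input.
  assert(out.size() <= in.size());
  return out;
}
```

A header that must reject pre-C++17 compilers could test DEMO_CPLUSPLUS in an #if and emit #error, which avoids the false failure that a bare __cplusplus check produces under MSVC's default settings.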