Further testing windows #367

Closed
oleksandr-pavlyk wants to merge 56 commits into NVIDIA:main from oleksandr-pavlyk:further-testing-windows

@oleksandr-pavlyk
Collaborator

This is a temporary draft PR to work out kinks in Windows building.

mfranzrebsal and others added 22 commits May 13, 2026 10:40
  • Remove gate that disables Windows NVBench build job in pr.yml
  • Attempt to fix "LINK : fatal error LNK1561: entry point must be defined" when building benchmarks which need main function provided by static library libnvbench_main after NVIDIA#350
  • Act on review feedback regarding corner cases when testing may depend on the directory it is performed from
  • 4xx responses now fail immediately, and the installer is bounded to 15 minutes before being killed and reported as a timeout.
@copy-pr-bot

copy-pr-bot Bot commented May 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oleksandr-pavlyk
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 17b4323f-cac2-4617-b9b6-f340698ca4bb

📥 Commits

Reviewing files that changed from the base of the PR and between ce75dab and 9486121.

📒 Files selected for processing (16)
  • .github/workflows/build-windows.yml
  • .github/workflows/pr.yml
  • CMakeLists.txt
  • ci/windows/build_nvbench.ps1
  • ci/windows/install_cuda_profiler_api.ps1
  • cmake/NVBenchCUPTI.cmake
  • cmake/NVBenchConfigTarget.cmake
  • examples/CMakeLists.txt
  • exec/CMakeLists.txt
  • nvbench/CMakeLists.txt
  • nvbench/config.cuh.in
  • testing/CMakeLists.txt
  • testing/axes_metadata.cu
  • testing/cmake/CMakeLists.txt
  • testing/cmake/test_export/CMakeLists.txt
  • testing/device/CMakeLists.txt
💤 Files with no reviewable changes (1)
  • .github/workflows/pr.yml

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced Windows build validation with CUDA toolkit version verification.
    • Improved CUPTI runtime library detection for Windows environments.
  • Build & Infrastructure

    • Added Windows shared library symbol export support.
    • Improved C++ standard compliance and MSVC preprocessor handling.
    • Refined test runtime environment configuration across platforms.

Walkthrough

PR enables Windows CI for nvbench by adding validated CUDA/std/arch workflow inputs, introducing a CUDA profiler API installer script, handling Windows CUPTI runtime DLL resolution, adjusting MSVC compiler/linker settings, and wiring test runtime environments across multiple test suites.

Changes

Windows Build, CUDA Profiler, and Test Runtime

Layer: CI workflow: PR job enablement and input validation
File(s): .github/workflows/pr.yml, .github/workflows/build-windows.yml
Summary: Enable nvbench-windows job; add validation step for CUDA major.minor format, std ∈ {17,20}, and arch normalization; update description text; rewire Build NVBench step to consume validated outputs; inject install_cuda_profiler_api.ps1 invocation into generated container script; propagate NVBENCH_WINDOWS_CUDA, NVBENCH_WINDOWS_STD, NVBENCH_WINDOWS_ARCH into container environment.

Layer: CUDA profiler API installer
File(s): ci/windows/install_cuda_profiler_api.ps1
Summary: New 519-line PowerShell script to install cuda_profiler_api.h for requested CUDA major.minor: validate input, detect CUDA root via nvcc.exe and CUDA_PATH, download/parse NVIDIA redist index HTML, enumerate candidate redistrib manifests, select best manifest via exact cuda_profiler_api version or component consistency matching, download archive with retry, verify SHA256, extract, and install with final verification.

Layer: Windows CMake build and CUPTI runtime discovery
File(s): CMakeLists.txt, cmake/NVBenchCUPTI.cmake
Summary: Enable CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS for Windows shared-library builds; add nvbench_cupti_library_hints from cupti_root with Windows-specific lib/x64 and lib paths; introduce nvbench_find_windows_cupti_runtime_library helper to locate runtime DLLs from import-library paths with cupti64_*.dll to cupti.dll fallback; update nvbench_add_cupti_dep to set both IMPORTED_IMPLIB and IMPORTED_LOCATION on Windows.

Layer: MSVC compile/link adjustments
File(s): cmake/NVBenchConfigTarget.cmake, nvbench/CMakeLists.txt, nvbench/config.cuh.in
Summary: Add MSVC-hosted-CUDA compile flags: FMT_UNICODE=0, -Xcompiler=/utf-8, and three -Xcudafe=--diag_suppress entries; use _MSVC_LANG for NVBENCH_CPLUSPLUS when available; add /Zc:preprocessor to nvbench with CUDA forward; add /INCLUDE:main link option to nvbench.main on MSVC; restrict CUPTI INSTALL_RPATH to non-Windows platforms.

Layer: CMake test runtime helpers
File(s): cmake/NVBenchConfigTarget.cmake
Summary: Add nvbench_append_test_runtime_path helper; implement nvbench_config_test_runtime_environment to return immediately on non-Windows; on Windows, accumulate PATH-prepend modifications for nvbench, CUPTI, and nvperf imported targets, de-duplicate, and append as ENVIRONMENT_MODIFICATION.

Layer: Test export runtime and CMake version
File(s): testing/cmake/test_export/CMakeLists.txt
Summary: Bump CMake minimum from 3.20.1 to 3.22.0; add get_imported_location helper to resolve config-specific imported-target locations with fallback; refactor runtime directory derivation to compute nvbench path only on WIN32 or INSTALL_TREE, and CUPTI path only when target exists; replace direct LD_LIBRARY_PATH wiring with cross-platform ENVIRONMENT_MODIFICATION using path_list_prepend for Windows and non-Windows runtime library paths.

Layer: Apply runtime environment to tests
File(s): examples/CMakeLists.txt, exec/CMakeLists.txt, testing/CMakeLists.txt, testing/device/CMakeLists.txt, testing/axes_metadata.cu, testing/cmake/CMakeLists.txt
Summary: Call nvbench_config_test_runtime_environment for example, CLI (nvbench.ctl variants), testing, and device tests after registration; add <iterator> header to testing/axes_metadata.cu; extend testing/cmake with Windows-specific CMake CMAKE_CUDA_HOST_COMPILER, CMAKE_LINKER, CMAKE_RC_COMPILER, CMAKE_MT forwarding; add Windows-only per-test ENVIRONMENT_MODIFICATION prepending NVBench_EXECUTABLE_OUTPUT_DIR and CUPTI target directory to PATH.

Layer: Windows build script device-testing option
File(s): ci/windows/build_nvbench.ps1
Summary: Add optional DEVICE_TESTING boolean parameter (default false, aliased as device-testing); compute "ON"/"OFF" string and forward into CMake as -DNVBench_ENABLE_DEVICE_TESTING.
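
The installer summary above mentions downloading with retry and verifying SHA256, and one of the commit messages says 4xx responses fail immediately. A minimal Python sketch of that policy follows. It is not the PowerShell script from the PR; `HttpError`, `fetch_with_retry`, and `sha256_matches` are hypothetical names, and the retry count and delay are assumptions.

```python
import hashlib
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_with_retry(fetch, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted.

    4xx errors are treated as permanent and re-raised at once;
    any other HttpError is retried after a fixed delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except HttpError as err:
            if 400 <= err.status < 500:
                raise  # client error: retrying cannot help
            last_error = err
            if attempt < attempts:
                sleep(delay_s)
    raise last_error


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA256 digest of data with an expected hex string."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()
```

In this sketch the archive bytes would be passed to `sha256_matches` before extraction, so a corrupted or truncated download is rejected early.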

Possibly related PRs

  • NVIDIA/nvbench#362: Introduced the Windows CI workflow and build-container setup that this PR extends with profiler-API installation, improved input validation, and runtime wiring.

Suggested labels

type: enhancement

Suggested reviewers

  • alliepiper
  • gevtushenko

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/build-windows.yml (1)

122-122: ⚡ Quick win

suggestion: add a fail-fast validation step for inputs.cuda, inputs.std, and inputs.arch before pulling/running the container so malformed values fail immediately instead of after expensive setup.

As per coding guidelines, for GitHub workflows you should focus on clear failures and avoiding unnecessary CI fanout/expense.

Also applies to: 135-136
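
The fail-fast check suggested here can be sketched in Python as below. This is an illustration only, not the workflow step from the PR: the function name `validate_inputs` is hypothetical, the allowed std values {17, 20} come from the walkthrough, and the arch normalization rule (split on commas, semicolons, or whitespace, then join with ";") is an assumption.

```python
import re

ALLOWED_STD = {"17", "20"}


def validate_inputs(cuda: str, std: str, arch: str) -> dict:
    """Validate workflow-style inputs, raising ValueError on the first bad value."""
    cuda = cuda.strip()
    if not re.fullmatch(r"\d+\.\d+", cuda):
        raise ValueError(f"cuda must look like MAJOR.MINOR, got {cuda!r}")

    std = std.strip()
    if std not in ALLOWED_STD:
        raise ValueError(f"std must be one of {sorted(ALLOWED_STD)}, got {std!r}")

    # Assumed normalization: accept ',', ';' or whitespace as separators.
    entries = [e for e in re.split(r"[;,\s]+", arch.strip()) if e]
    if not entries:
        raise ValueError("arch must name at least one architecture")
    for entry in entries:
        if not re.fullmatch(r"\d{2,3}[a-z]?", entry):
            raise ValueError(f"unrecognized arch entry {entry!r}")

    return {"cuda": cuda, "std": std, "arch": ";".join(entries)}
```

Running a check like this before `docker pull` means a malformed value such as `cuda="13"` stops the job before any expensive setup.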


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54c447f2-d3e8-46bf-ae59-c3f64d071670

📥 Commits

Reviewing files that changed from the base of the PR and between ce75dab and cb00417.

📒 Files selected for processing (11)
  • .github/workflows/build-windows.yml
  • .github/workflows/pr.yml
  • CMakeLists.txt
  • ci/windows/install_cuda_profiler_api.ps1
  • cmake/NVBenchCUPTI.cmake
  • cmake/NVBenchConfigTarget.cmake
  • nvbench/CMakeLists.txt
  • nvbench/config.cuh.in
  • testing/axes_metadata.cu
  • testing/cmake/CMakeLists.txt
  • testing/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
  • .github/workflows/pr.yml

Comment thread ci/windows/install_cuda_profiler_api.ps1 Outdated
Comment thread testing/cmake/test_export/CMakeLists.txt Outdated
@oleksandr-pavlyk
Collaborator Author

oleksandr-pavlyk commented May 18, 2026

Building

Using Visual Studio Community 2026 and CUDA Toolkit 13.2 on Windows 11, I used the following commands to build and test NVBench:

git clone https://github.com/oleksandr-pavlyk/nvbench -b further-testing-windows
cd nvbench
cmake -B build -G Ninja --preset nvbench-dev -DCMAKE_CXX_FLAGS="/Zc:preprocessor" -DCMAKE_CUDA_FLAGS="-Xcompiler=/Zc:preprocessor"

Setting "/Zc:preprocessor" is necessary to work around fatal error C1189 raised in libcudac++:

› C:\PROGRA~1\NVIDIA~2\CUDA\v13.2\bin/../include/cccl\cuda/std/__cccl/preprocessor.h(20): fatal error C1189: #error:  MSVC/cl.exe with traditional preprocessor is
  used. This may lead to unexpected compilation errors. Please switch to the standard conforming preprocessor by passing `/Zc:preprocessor` to cl.exe. You can
  define CCCL_IGNORE_MSVC_TRADITIONAL_PREPROCESSOR_WARNING to suppress this warning.
  exec_tag_timer.cu

Running tests

Executing ctest --test-dir build immediately fails because cupti64_2026.1.1.dll is not found. Adding the CUPTI library folder to PATH resolves this:

set "PATH=%PATH%;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\extras\CUPTI\lib64"
ctest --test-dir build
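
The manual PATH fix above is what the PR automates for tests by prepending runtime directories and de-duplicating them. The list logic can be sketched in Python as follows. This is not the CMake helper from the PR: `prepend_unique` is a hypothetical name, and case-insensitive comparison of Windows paths is an assumption.

```python
import os


def prepend_unique(path_value, new_dirs, sep=os.pathsep, normalize=os.path.normcase):
    """Return path_value with new_dirs placed first and duplicates dropped.

    The first occurrence of each directory wins, so prepended entries
    take priority over identical entries already on the path.
    """
    seen = set()
    result = []
    for entry in list(new_dirs) + path_value.split(sep):
        # Ignore trailing slashes when comparing entries.
        key = normalize(entry.rstrip("\\/"))
        if entry and key not in seen:
            seen.add(key)
            result.append(entry)
    return sep.join(result)
```

For example, `prepend_unique(os.environ["PATH"], [cupti_dir])` would build the value to export before running ctest, where `cupti_dir` stands for the CUPTI directory shown in the command above.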

All tests pass except noisy_bench, whose failure is expected because it requires the GPU frequency to be fixed:

<<< -- output elided -- >>>
52/52 Test #52: nvbench.test.device.noisy_bench ...............***Failed  Error regular expression found in output. Regex=[Warn] 43.16 sec

98% tests passed, 1 tests failed out of 52

Total Test time (real) = 218.04 sec

The following tests FAILED:
         52 - nvbench.test.device.noisy_bench (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/opavlyk/work/nvbench/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

@oleksandr-pavlyk
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5ef9adff-1e21-4204-a389-ea494b62f9a7

📥 Commits

Reviewing files that changed from the base of the PR and between ce75dab and dd0a9b0.

📒 Files selected for processing (17)
  • .github/workflows/build-windows.yml
  • .github/workflows/pr.yml
  • CMakeLists.txt
  • ci/windows/build_common.psm1
  • ci/windows/build_nvbench.ps1
  • ci/windows/install_cuda_profiler_api.ps1
  • cmake/NVBenchCUPTI.cmake
  • cmake/NVBenchConfigTarget.cmake
  • examples/CMakeLists.txt
  • exec/CMakeLists.txt
  • nvbench/CMakeLists.txt
  • nvbench/config.cuh.in
  • testing/CMakeLists.txt
  • testing/axes_metadata.cu
  • testing/cmake/CMakeLists.txt
  • testing/cmake/test_export/CMakeLists.txt
  • testing/device/CMakeLists.txt
💤 Files with no reviewable changes (1)
  • .github/workflows/pr.yml

Comment thread .github/workflows/build-windows.yml Outdated
Comment thread ci/windows/install_cuda_profiler_api.ps1 Outdated
@oleksandr-pavlyk
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
ci/windows/build_nvbench.ps1 (1)

33-37: ⚡ Quick win

suggestion: Line 36 forces -DNVBench_ENABLE_DEVICE_TESTING=ON for every invocation. Make this configurable (for example, a script parameter with default OFF) so CPU-oriented CI jobs only enable device-test targets when explicitly requested. As per coding guidelines, for ci/**/*: “focus on ... targeted build/test behavior, ... GPU availability assumptions, ... and avoiding unnecessary expensive jobs.”
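
The shape of this suggestion, a boolean option that defaults to off and is turned into an "ON"/"OFF" CMake value, can be sketched in Python as below. The function name `cmake_device_testing_arg` is hypothetical; only the `-DNVBench_ENABLE_DEVICE_TESTING` option name comes from the review text.

```python
def cmake_device_testing_arg(device_testing: bool = False) -> str:
    """Map a boolean script parameter to the CMake cache argument."""
    value = "ON" if device_testing else "OFF"
    return f"-DNVBench_ENABLE_DEVICE_TESTING={value}"
```

With the default left at False, a job that does not ask for device tests passes `OFF` and skips those targets.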

nvbench/CMakeLists.txt (1)

143-146: ⚡ Quick win

suggestion: use LINKER:/INCLUDE:main instead of raw /INCLUDE:main for proper CUDA link driver routing

When nvcc drives the link step for CUDA-linked consumers, the raw option may not be correctly passed to the host linker, potentially causing the flag to be ignored or build failures. CMake's LINKER: prefix provides proper abstraction-layer handling across toolchains, ensuring this flag reaches the linker correctly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ba6dc46-23fc-4217-8363-8149a60b269c

📥 Commits

Reviewing files that changed from the base of the PR and between ce75dab and 697f387.

📒 Files selected for processing (17)
  • .github/workflows/build-windows.yml
  • .github/workflows/pr.yml
  • CMakeLists.txt
  • ci/windows/build_common.psm1
  • ci/windows/build_nvbench.ps1
  • ci/windows/install_cuda_profiler_api.ps1
  • cmake/NVBenchCUPTI.cmake
  • cmake/NVBenchConfigTarget.cmake
  • examples/CMakeLists.txt
  • exec/CMakeLists.txt
  • nvbench/CMakeLists.txt
  • nvbench/config.cuh.in
  • testing/CMakeLists.txt
  • testing/axes_metadata.cu
  • testing/cmake/CMakeLists.txt
  • testing/cmake/test_export/CMakeLists.txt
  • testing/device/CMakeLists.txt
💤 Files with no reviewable changes (1)
  • .github/workflows/pr.yml

Comment thread testing/cmake/test_export/CMakeLists.txt Outdated
@oleksandr-pavlyk
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/build-windows.yml (1)

152-153: ⚡ Quick win

suggestion: Deduplicate WINDOWS_CI_IMAGE construction. The same expression is defined twice (Line 152 and Line 161), which can drift and make docker pull and docker run target different tags later. Compute it once (job env or a setup step output) and reuse it in both steps.

As per coding guidelines, for .github/**/*: “focus on ... status/check behavior ... and avoiding unnecessary CI fanout.”

Also applies to: 161-162


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7533a7c2-7b0b-4e42-aeac-cfdeb8e003e6

📥 Commits

Reviewing files that changed from the base of the PR and between ce75dab and 9bfe117.

📒 Files selected for processing (17)
  • .github/workflows/build-windows.yml
  • .github/workflows/pr.yml
  • CMakeLists.txt
  • ci/windows/build_common.psm1
  • ci/windows/build_nvbench.ps1
  • ci/windows/install_cuda_profiler_api.ps1
  • cmake/NVBenchCUPTI.cmake
  • cmake/NVBenchConfigTarget.cmake
  • examples/CMakeLists.txt
  • exec/CMakeLists.txt
  • nvbench/CMakeLists.txt
  • nvbench/config.cuh.in
  • testing/CMakeLists.txt
  • testing/axes_metadata.cu
  • testing/cmake/CMakeLists.txt
  • testing/cmake/test_export/CMakeLists.txt
  • testing/device/CMakeLists.txt
💤 Files with no reviewable changes (1)
  • .github/workflows/pr.yml

Comment thread cmake/NVBenchConfigTarget.cmake Outdated
Comment thread testing/cmake/CMakeLists.txt Outdated
Comment thread testing/cmake/test_export/CMakeLists.txt Outdated
Check before executing cmake_path() in testing/cmake/CMakeLists.txt
Also, use nvbench_get_imported_location to extract imported location
Remove the configure-time imported-location helper entirely.
@oleksandr-pavlyk
Collaborator Author

oleksandr-pavlyk commented May 18, 2026

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Full review triggered.

@oleksandr-pavlyk
Collaborator Author

/ok to test 9486121

@oleksandr-pavlyk
Collaborator Author

CodeRabbit is happy, local builds succeeded, and tests passed on both Windows and Linux.

I am squashing all these commits and pushing them to #354 for final review.

Closing this PR now.

@github-project-automation github-project-automation Bot moved this from In Progress to Done in CCCL May 18, 2026