Skip to content

Make SCCG test path CPU, CUDA, and HIP capable#422

Merged
shakedregev merged 11 commits into
andrew/sccg-and-matrix-classfrom
tamar/sccg-gpu-cpu
May 29, 2026
Merged

Make SCCG test path CPU, CUDA, and HIP capable#422
shakedregev merged 11 commits into
andrew/sccg-and-matrix-classfrom
tamar/sccg-gpu-cpu

Conversation

@tamar-dewilde
Copy link
Copy Markdown
Collaborator

@tamar-dewilde tamar-dewilde commented May 25, 2026

Description

This PR addresses the SCCG work for HyKKT from #335 by making the test path work on CPU and CUDA builds. The main issue was that SCCG owned a compile-time-selected workspace internally that caused the CPU test path to construct CUDA workspace resources and segfault during cleanup. The SCCG test also needed to avoid loading file backed matrix and vector data directly into device memory. This change keeps backend ownership with the caller by using caller-provided MatrixHandler and VectorHandler instances and updates the SCCG test setup so file backed data is loaded on host before the CUDA backend is used. No additional dependencies are required for this change.

Related to #335.

Proposed changes

  • Updated SCCG to use caller-provided backend-specific MatrixHandler and VectorHandler instances instead of owning an internal compile-time-selected workspace.
  • Updated the SCCG test to use the backend-specific handlers provided by the test runner instead of constructing a CPU-specific handler internally.
  • Updated SCCG test data setup so file-backed matrix/vector data is loaded on host before being used with the CUDA backend.
  • Verified the targeted hykkt_sccg_test locally in both CPU and CUDA builds.
  • Ran additional CUDA tests after the SCCG fix to check that the change did not introduce broader CUDA test regressions.

These changes should be accepted because they keep backend ownership outside of SCCG and allow the solver/test path to use the existing Re::Solve matrix and vector handler abstractions. This avoids duplicating backend logic inside SCCG and keeps the implementation consistent with the existing CPU/CUDA handler pattern. The data-loading change also keeps file I/O host-side, while still allowing the test to exercise CUDA execution after setup.

Checklist

  • All tests pass (make test and make test_install per testing instructions). Code tested on
    • CPU backend
    • CUDA backend
    • HIP backend
  • I have manually run the non-experimental examples and verified that residuals are close to machine precision. (In your build directory run: ./examples/<your_example>.exe -h to get instructions how to run examples). Code tested on:
    • CPU backend
    • CUDA backend
    • HIP backend
  • Code compiles cleanly with flags -Wall -Wpedantic -Wconversion -Wextra.
  • The new code follows Re::Solve style guidelines.
  • There are unit tests for the new code.
  • The new code is documented.
  • The feature branch is rebased with respect to the target branch.
  • I have updated CHANGELOG.md to reflect the changes in this PR. If this is a minor PR that is part of a larger fix already included in the file, state so.

Further comments

I tested the HyKKT SCCG test locally in both CPU and CUDA builds. I did not test the HIP backend locally.

  • CPU: ctest -R hykkt_sccg_test --output-on-failure passed in build-cpu
  • CUDA: ctest -R hykkt_sccg_test --output-on-failure passed in build-cuda

@tamar-dewilde tamar-dewilde requested a review from shakedregev May 25, 2026 20:09
@pelesh pelesh self-requested a review May 25, 2026 22:39
@pelesh pelesh added enhancement New feature or request hip cuda labels May 25, 2026
@pelesh pelesh added this to the HyKKT milestone May 25, 2026
@pelesh pelesh marked this pull request as draft May 25, 2026 22:40
@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented May 25, 2026

Set to draft as it is not ready to merge.

Copy link
Copy Markdown
Collaborator

@shakedregev shakedregev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on this. Implement HIP as well and I will test and merge if it all works.
PS: Document the code as well.

@tamar-dewilde
Copy link
Copy Markdown
Collaborator Author

Pushed the HIP update and Doxygen-style documentation. CPU/CUDA SCCG tests and pre-commit pass locally. I can’t test HIP locally, but the HIP SpMV cache reset logic now mirrors the CUDA path.

@tamar-dewilde tamar-dewilde changed the title Make SCCG test path CPU and CUDA capable Make SCCG test path CPU, CUDA, and HIP capable May 26, 2026
@shakedregev
Copy link
Copy Markdown
Collaborator

Good work on this! Tests pass on Frontier, but there are some build warnings:
/ccs/home/regevs/ReSolve_dir/ReSolve/tests/unit/hykkt/HykktSCCGTests.hpp:156:43: warning: unused parameter 'x0' [-Wunused-parameter] 156 | bool validateResult(vector::Vector* x0, real_type tol) | ^ /ccs/home/regevs/ReSolve_dir/ReSolve/tests/unit/hykkt/HykktSCCGTests.hpp:156:57: warning: unused parameter 'tol' [-Wunused-parameter] 156 | bool validateResult(vector::Vector* x0, real_type tol) | ^ 2 warnings generated when compiling for gfx90a. In file included from /ccs/home/regevs/ReSolve_dir/ReSolve/tests/unit/hykkt/runHykktSCCGTests.cpp:20: /ccs/home/regevs/ReSolve_dir/ReSolve/tests/unit/hykkt/HykktSCCGTests.hpp:156:43: warning: unused parameter 'x0' [-Wunused-parameter] 156 | bool validateResult(vector::Vector* x0, real_type tol) | ^ /ccs/home/regevs/ReSolve_dir/ReSolve/tests/unit/hykkt/HykktSCCGTests.hpp:156:57: warning: unused parameter 'tol' [-Wunused-parameter] 156 | bool validateResult(vector::Vector* x0, real_type tol)

and

/ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:29:9: warning: initializer order does not match the declaration order [-Wreorder-ctor] 28 | choleskySolver_(choleskySolver), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | matrixhandler_(matrixHandler) 29 | memspace_(memspace), | ^~~~~~~~~~~~~~~~~~~ | vectorhandler_(vectorHandler) 30 | mem_(), | ~~~~~~ | choleskySolver_(choleskySolver) 31 | matrixhandler_(matrixHandler), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | y_(m_) 32 | vectorhandler_(vectorHandler), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | z_(m_) 33 | y_(m_), | ~~~~~~ | r_(n_) 34 | z_(m_), | ~~~~~~ | p_(n_) 35 | r_(n_), | ~~~~~~ | s_(n_) 36 | p_(n_), | ~~~~~~ | w_(n_) 37 | s_(n_), | ~~~~~~ | mem_() 38 | w_(n_) | ~~~~~~ | memspace_(memspace) /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:29:9: note: field 'memspace_' will be initialized after field 'mem_' 29 | memspace_(memspace), | ^~~~~~~~~~~~~~~~~~~ /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:30:9: note: field 'mem_' will be initialized after field 'matrixhandler_' 30 | mem_(), | ^~~~~~ In file included from /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:1: /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.hpp:82:27: warning: private field 'mem_' is not used [-Wunused-private-field] 82 | MemoryHandler mem_; | ^ 2 warnings generated when compiling for gfx90a. /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:29:9: warning: initializer order does not match the declaration order [-Wreorder-ctor] 28 | choleskySolver_(choleskySolver), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | matrixhandler_(matrixHandler) 29 | memspace_(memspace), | ^~~~~~~~~~~~~~~~~~~ | vectorhandler_(vectorHandler) 30 | mem_(), | ~~~~~~ | choleskySolver_(choleskySolver) 31 | matrixhandler_(matrixHandler), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | y_(m_) 32 | vectorhandler_(vectorHandler), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | z_(m_) 33 | y_(m_), | ~~~~~~ | r_(n_) 34 | z_(m_), | ~~~~~~ | p_(n_) 35 | r_(n_), | ~~~~~~ | s_(n_) 36 | p_(n_), | ~~~~~~ | w_(n_) 37 | s_(n_), | ~~~~~~ | mem_() 38 | w_(n_) | ~~~~~~ | memspace_(memspace) /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:29:9: note: field 'memspace_' will be initialized after field 'mem_' 29 | memspace_(memspace), | ^~~~~~~~~~~~~~~~~~~ /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:30:9: note: field 'mem_' will be initialized after field 'matrixhandler_' 30 | mem_(), | ^~~~~~ In file included from /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.cpp:1: /ccs/home/regevs/ReSolve_dir/ReSolve/resolve/hykkt/sccg/SchurComplementConjugateGradient.hpp:82:27: warning: private field 'mem_' is not used [-Wunused-private-field] 82 | MemoryHandler mem_;

Should be straightforward. Just remove what you don't need.

Comment thread tests/unit/hykkt/HykktSCCGTests.hpp Outdated
vector::Vector* b = new vector::Vector(n);
b->allocate(memspace_);
b->allocate(memory::HOST);
io::updateVectorFromFile(bFile, b);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ for variable names convention

Comment on lines +55 to +56
MatrixHandler& matrixhandler_; ///< Backend-specific matrix handler.
VectorHandler& vectorhandler_; ///< Backend-specific vector handler.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow naming conventions with _

Comment thread tests/unit/hykkt/runHykktSCCGTests.cpp Outdated
Comment on lines +37 to +38
ReSolve::VectorHandler vectorHandler(&workspace);
ReSolve::tests::HykktSchurComplementConjugateGradientTests test(memspace, matrixHandler, vectorHandler);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow naming conventions

tamar-dewilde and others added 3 commits May 27, 2026 12:45
Fixed stochastic failures due to a faulty random number generator and asynchronous operations.

Co-authored-by: shakedregev <shakedregev@users.noreply.github.com>
@shakedregev shakedregev self-requested a review May 28, 2026 13:33
@shakedregev shakedregev marked this pull request as ready for review May 28, 2026 13:33
Copy link
Copy Markdown
Collaborator

@shakedregev shakedregev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work.

@shakedregev shakedregev merged commit 1230d1d into andrew/sccg-and-matrix-class May 29, 2026
6 checks passed
@shakedregev shakedregev deleted the tamar/sccg-gpu-cpu branch May 29, 2026 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda enhancement New feature or request hip

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants