diff --git a/CHANGELOG.md b/CHANGELOG.md index ad503b2e..62cf8b82 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -22,6 +22,8 @@ - Improved coding guidelines for developers on floating point conventions. +- Added Developer Guide documentation for vector and matrix classes and handlers. + ## Changes to Re::Solve in release 0.99.2 ### Major Features diff --git a/docs/sphinx/developer_guide/index.rst b/docs/sphinx/developer_guide/index.rst index fa2022ff..59a6975d 100644 --- a/docs/sphinx/developer_guide/index.rst +++ b/docs/sphinx/developer_guide/index.rst @@ -17,3 +17,4 @@ that they are consistently applied. build_system coding_guidelines documentation + vector_matrix_handlers diff --git a/docs/sphinx/developer_guide/vector_matrix_handlers.rst b/docs/sphinx/developer_guide/vector_matrix_handlers.rst new file mode 100644 index 00000000..c59f67cc --- /dev/null +++ b/docs/sphinx/developer_guide/vector_matrix_handlers.rst @@ -0,0 +1,454 @@ +Vector and Matrix Classes and Handlers +====================================== + +Background +---------- + +The purpose of this page is to help new developers understand how Re::Solve +separates data containers, operation handlers, backend workspaces, and memory +spaces. In particular, it explains the difference between vector and matrix +classes and the ``VectorHandler`` and ``MatrixHandler`` classes that operate on +them. + +The main distinction is that vector and matrix classes store data, while +handler classes perform operations on that data. This distinction is important +when writing code that needs to run on different backends (e.g., CPU, CUDA, and HIP). + +This separation allows solver logic to remain independent of backend-specific +vector and matrix operations. + +The main questions this page is meant to answer are: + +* What object stores the data? +* Where does the data live? +* What object performs the operation? +* What backend resources does the operation need? 
+* What is the difference between a vector or matrix class and a vector or + matrix handler? +* What needs to happen when data is loaded on the host but used on the device? + +This page is not meant to document every method in detail. It is meant to give +a practical mental model for reading and writing backend-capable Re::Solve code. + +Core Design +----------- + +The main design idea is that Re::Solve separates storage, operations, and +backend resources. + +The major pieces are: + +* ``vector::Vector`` objects store vector data. +* Matrix objects, such as ``matrix::Csr``, store sparse matrix data. +* ``VectorHandler`` objects perform vector operations. +* ``MatrixHandler`` objects perform matrix operations. +* ``LinAlgWorkspace`` objects provide backend-specific resources for the + handlers. +* ``memory::HOST`` describes data stored in host-accessible memory. +* ``memory::DEVICE`` describes data stored in device-accessible memory. + +This means that a vector or matrix object is not tied to a CPU or GPU +operation. The data object stores the values. The handler performs the +operation. The workspace gives the handler the backend resources it needs. + +This separation helps the same solver path run with different backend +implementations. + +Vector Objects +-------------- + +A ``vector::Vector`` object represents vector data. The vector object is a data +container. It stores the size of the vector and the data associated with that +vector. Before a vector is used, it must be allocated in a memory space. + +Simplified example: + +.. code:: cpp + + vector::Vector* x = new vector::Vector(n); + x->allocate(memory::HOST); + +If the vector is loaded or initialized on the host but later used by a GPU +backend, the data may need to be synchronized to the device. + +Simplified example: + +.. code:: cpp + + if (memspace == memory::DEVICE) + { + x->syncData(memory::DEVICE); + } + +The important distinction is that allocation and operation are separate steps. 
+Allocating the vector controls where the data is stored. Calling a handler +method controls what operation is performed on the data. + +This is useful because the same vector object may be part of a CPU test path or +a GPU test path, depending on how it is allocated, synchronized, and passed to +backend-specific operations. + +Matrix Objects +-------------- + +Matrix objects represent matrix data. Like vector objects, matrix objects are +data containers. They store or describe the matrix data, but they do not perform +matrix operations by themselves. + +Sparse matrices are commonly stored in compressed sparse formats, such as CSR +(compressed sparse row) and CSC (compressed sparse column). These formats store +only the nonzero values of a sparse matrix along with index information that +describes where those values belong. + +In Re::Solve, matrix objects such as ``matrix::Csr`` store sparse matrix data. +A CSR matrix stores the matrix dimensions, nonzero count, and sparse matrix +data. Like vectors, a matrix object must be allocated in a memory space before +it is used. + +Simplified example: + +.. code:: cpp + + matrix::Csr* A = new matrix::Csr(num_rows, num_cols, nnz); + A->allocateMatrixData(memory::HOST); + +In file-loading paths, matrix data may need to be loaded into host memory +first. For example, Matrix Market file readers write into host-accessible +memory. If the test is running on a GPU backend, the matrix can then be +synchronized to device memory. + +SCCG test path example: + +.. code:: cpp + + matrix::Csr* h = new matrix::Csr(2278, 2278, 11304, true, false); + h->allocateMatrixData(memory::HOST); + io::updateMatrixFromFile(h_file, h); + + if (memspace_ == memory::DEVICE) + { + h->syncData(memory::DEVICE); + } + +This pattern matters because the memory space used for loading data is not +always the same as the memory space used for computation. 
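The host-first loading pattern described above can be sketched with a small self-contained mock. Everything in this block is hypothetical and is not the Re::Solve API: ``MemorySpace``, ``MockVector``, ``loadFromFile``, and ``loadForBackend`` are illustrative names, and the "device" buffer is simulated with a second host array so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not the Re::Solve API. The "device" buffer is a
// second host array so that the sketch runs without a GPU.
enum class MemorySpace { HOST, DEVICE };

class MockVector
{
public:
  explicit MockVector(std::size_t n) : n_(n) {}

  // Reserve zero-initialized storage in the requested memory space.
  void allocate(MemorySpace space)
  {
    buffer(space).assign(n_, 0.0);
  }

  // Copy host values to the device buffer; a HOST target is a no-op here.
  void syncData(MemorySpace target)
  {
    if (target == MemorySpace::DEVICE)
    {
      device_ = host_;
    }
  }

  std::vector<double>& buffer(MemorySpace space)
  {
    return (space == MemorySpace::HOST) ? host_ : device_;
  }

private:
  std::size_t n_;
  std::vector<double> host_;
  std::vector<double> device_;
};

// Mimics a file reader that can only write into host-accessible memory.
void loadFromFile(MockVector& v, const std::vector<double>& file_values)
{
  std::vector<double>& host = v.buffer(MemorySpace::HOST);
  assert(host.size() == file_values.size()); // host storage must exist first
  host = file_values;
}

// Allocate on the host, load, and synchronize only when computing on the device.
MockVector loadForBackend(const std::vector<double>& file_values,
                          MemorySpace memspace)
{
  MockVector v(file_values.size());
  v.allocate(MemorySpace::HOST);
  loadFromFile(v, file_values);
  if (memspace == MemorySpace::DEVICE)
  {
    v.syncData(MemorySpace::DEVICE);
  }
  return v;
}
```

In this sketch the loading code never needs to know how the device copy is made. It only decides whether a synchronization is required, which mirrors the ``if (memspace_ == memory::DEVICE)`` check in the SCCG test path.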
+ +Vector Handlers +--------------- + +A ``VectorHandler`` performs operations on ``vector::Vector`` objects. It does not replace +the vector class. Instead, it provides backend-specific operations +that act on existing vector data. + +A ``VectorHandler`` may perform operations such as: + +* ``dot`` +* ``scal`` +* ``axpy`` + +A useful way to think about the difference is: + +* ``vector::Vector`` stores the vector data. +* ``VectorHandler`` performs vector operations on that data. + +For example, a vector object may hold the entries of a residual vector, while a +vector handler may compute a dot product (``dot``), scale the vector +(``scal``), or add a scaled copy of one vector to another +(``axpy``). + +Matrix Handlers +--------------- + +A ``MatrixHandler`` performs operations on matrix objects such as ``matrix::Csr``. It does not replace +the matrix class. Instead, it provides backend-specific matrix operations that +act on existing matrix data. + +A ``MatrixHandler`` may perform operations such as: + +* ``matvec`` +* ``transpose`` + +A useful way to separate the roles is that ``matrix::Csr`` stores the data, while ``MatrixHandler`` performs matrix operations on that data. +For example, a matrix object may hold the CSR representation of a sparse +matrix, while a matrix handler may perform a sparse matrix-vector product or +construct a transpose. + +Handler Setup +------------- + +Handlers are created using a workspace for the selected backend. A simplified +setup pattern is: + +.. code:: cpp + + WorkspaceType workspace; + workspace.initializeHandles(); + + MatrixHandler matrix_handler(&workspace); + VectorHandler vector_handler(&workspace); + +The handler uses the workspace that was created for the selected backend.
This +is why backend-capable solver code should generally receive the correct +handlers from the caller instead of creating a hard-coded CPU, CUDA, or HIP +handler internally. + +Workspaces +---------- + +Workspace classes provide the backend-specific resources needed by handlers. A +CPU workspace, CUDA workspace, and HIP workspace may initialize different +backend handles or library resources. + +The general setup is: + +1. Create the workspace for the selected backend. +2. Initialize the workspace handles. +3. Create matrix and vector handlers using that workspace. +4. Pass those handlers into the solver or test fixture. + +Simplified SCCG setup example: + +.. code:: cpp + + WorkspaceType workspace; + workspace.initializeHandles(); + + MatrixHandler matrix_handler(&workspace); + VectorHandler vector_handler(&workspace); + + HykktSchurComplementConjugateGradientTests test(memspace, + matrix_handler, + vector_handler); + +This keeps the solver or test fixture from being tied to only one backend. + +Principle of Operation +---------------------- + +The basic flow for backend-capable code is: + +1. Create or load vector and matrix data. +2. Allocate that data in the correct memory space. +3. If data is loaded on the host and used on the device, synchronize it to the + device. +4. Create the backend workspace. +5. Create handlers from that workspace. +6. Pass the handlers into the solver or test path. +7. Use the handlers to perform vector and matrix operations. + +This flow keeps the data, operation, and backend setup separate. It also makes +it easier to identify whether a problem is caused by data storage, memory +movement, backend setup, or the solver algorithm itself. + +Re::Solve Context +----------------- + +Re::Solve examples are designed around repeated linear solver use cases. The +public Re::Solve documentation describes examples that emulate a nonlinear +solver calling the linear solver repeatedly. 
 This matters because repeated +solver calls can make setup cost, memory movement, and backend resource +management important. + +The public HyKKT documentation describes HyKKT as a solver for +Karush-Kuhn-Tucker systems that can use hardware accelerators efficiently. The +HyKKT description also explains that the solver uses block reduction and +conjugate gradient on the Schur complement. + +This background is useful for understanding why the SCCG path needs careful +handling of matrix dimensions, memory spaces, and backend-specific handlers. + +SCCG Example +------------ + +SCCG stands for Schur Complement Conjugate Gradient. The SCCG test path is a +useful example because it uses vector objects, matrix objects, vector handlers, +matrix handlers, workspaces, and memory spaces together. + +In the SCCG test path, the matrices are represented with ``matrix::Csr`` +objects. This makes SCCG a useful example of how data containers and operation +handlers work together in a backend-capable solver path. + +SCCG uses a Schur complement structure. In the test path, the matrices do not +all have the same dimensions, and this is expected. + +The main matrices are: + +* ``H``: a square matrix used in the inner solve. +* ``Jc``: a rectangular matrix. +* ``Jc_tr``: the transpose of ``Jc``. + +A simplified operation chain is: + +1. Multiply by ``Jc_tr``. +2. Solve with ``H``. +3. Multiply by ``Jc``. + +Because of this structure, not every temporary vector has the same size. Some +vectors match the outer system dimension. Other vectors match the inner solve +dimension. The important requirement is that each matrix and vector matches the +operation being performed. + +The same requirement extends beyond dimensions. The memory space of each +object and the backend of each handler also need to match the part of the +solver path where they are being used. 
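The size bookkeeping in the three-step chain above can be illustrated with a small dense sketch. This is a hypothetical illustration and not Re::Solve or HyKKT code: it assumes ``Jc`` has ``m`` rows and ``n`` columns, that ``H`` is ``n`` by ``n``, and, purely to keep the inner solve trivial, that ``H`` is diagonal. The names ``Dense``, ``multiply``, ``multiplyTranspose``, and ``applySchurChain`` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense helper type for illustration only. The SCCG test path
// uses sparse matrix::Csr objects and handler calls instead.
struct Dense
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> a; // row-major storage, size rows * cols
};

// y = A * x, where x has A.cols entries and y has A.rows entries.
std::vector<double> multiply(const Dense& A, const std::vector<double>& x)
{
  assert(x.size() == A.cols);
  std::vector<double> y(A.rows, 0.0);
  for (std::size_t i = 0; i < A.rows; ++i)
  {
    for (std::size_t j = 0; j < A.cols; ++j)
    {
      y[i] += A.a[i * A.cols + j] * x[j];
    }
  }
  return y;
}

// y = A^T * x, where x has A.rows entries and y has A.cols entries.
std::vector<double> multiplyTranspose(const Dense& A, const std::vector<double>& x)
{
  assert(x.size() == A.rows);
  std::vector<double> y(A.cols, 0.0);
  for (std::size_t i = 0; i < A.rows; ++i)
  {
    for (std::size_t j = 0; j < A.cols; ++j)
    {
      y[j] += A.a[i * A.cols + j] * x[i];
    }
  }
  return y;
}

// Applies y = Jc * H^{-1} * Jc^T * x, assuming H is diagonal (h_diag) so that
// the inner solve reduces to an element-wise division.
std::vector<double> applySchurChain(const Dense& Jc,
                                    const std::vector<double>& h_diag,
                                    const std::vector<double>& x)
{
  assert(h_diag.size() == Jc.cols);                 // H is n x n
  std::vector<double> t = multiplyTranspose(Jc, x); // inner size n
  for (std::size_t k = 0; k < t.size(); ++k)
  {
    t[k] /= h_diag[k];                              // solve with H
  }
  return multiply(Jc, t);                           // outer size m
}
```

The input and output vectors have the outer size ``m``, while the temporary ``t`` has the inner size ``n``. The assertions fail early if a vector of the wrong size is passed to any step.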
+ +Important Implementation Detail +------------------------------- + +One important detail in the SCCG test path is that the Matrix Market file +readers write into host-accessible memory. This means the test data should be +loaded into ``memory::HOST`` first. + +For GPU backends, the data should then be synchronized to ``memory::DEVICE``. +This avoids trying to load file data directly into device memory when the file +reader expects host-accessible memory. + +The pattern is: + +1. Allocate in ``memory::HOST``. +2. Load the file data. +3. If running on ``memory::DEVICE``, synchronize to device memory. + +This applies to both matrix and vector test data. + +Why Solver Paths Receive Handlers +--------------------------------- + +Solver paths that support multiple backends should receive backend-specific +handlers from the caller because the caller knows which backend is being used. +If a solver creates its own handler internally, it can accidentally create a +handler for the wrong backend. + +The safer design is: + +* The caller or test runner selects the backend. +* The caller or test runner creates the correct workspace. +* The caller or test runner creates the correct matrix and vector handlers. +* The solver receives and uses those handlers. + +In the SCCG path, this allows the same solver code to work with CPU, CUDA, and +HIP backends. + +Inputs and Outputs +------------------ + +The main inputs to this code pattern are: + +* Matrix and vector data. +* A selected memory space, such as ``memory::HOST`` or ``memory::DEVICE``. +* A backend workspace. +* Matrix and vector handlers. +* Solver-specific data, such as matrix dimensions and solver tolerance. + +The main outputs are: + +* Correctly allocated and synchronized data. +* Backend-specific matrix and vector operations. +* A solver path that can run on more than one backend. +* A clearer separation between storage, computation, and backend resources. 
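The idea that a solver receives its handlers from the caller can be sketched with a small self-contained example. The types here are hypothetical and do not reflect the Re::Solve class design, where handlers are constructed from a workspace. ``VectorOps``, ``CpuVectorOps``, ``FakeDeviceVectorOps``, and ``squaredNorm`` are illustrative names, and the fake device backend runs on the host.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface for illustration; not the Re::Solve class hierarchy.
class VectorOps
{
public:
  virtual ~VectorOps() = default;
  virtual double dot(const std::vector<double>& x,
                     const std::vector<double>& y) = 0;
  virtual std::string backendName() const = 0;
};

class CpuVectorOps : public VectorOps
{
public:
  double dot(const std::vector<double>& x,
             const std::vector<double>& y) override
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i)
    {
      sum += x[i] * y[i];
    }
    return sum;
  }

  std::string backendName() const override { return "cpu"; }
};

// Stand-in for a GPU backend. It does the same arithmetic on the host and
// counts calls; a real device backend would launch device kernels instead.
class FakeDeviceVectorOps : public VectorOps
{
public:
  int calls = 0;

  double dot(const std::vector<double>& x,
             const std::vector<double>& y) override
  {
    ++calls;
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i)
    {
      sum += x[i] * y[i];
    }
    return sum;
  }

  std::string backendName() const override { return "device"; }
};

// Solver-side code: it uses whichever handler the caller passed in and never
// constructs a backend-specific handler itself.
double squaredNorm(VectorOps& ops, const std::vector<double>& r)
{
  return ops.dot(r, r);
}
```

Because ``squaredNorm`` only sees the interface, the caller alone decides which backend runs. Swapping ``CpuVectorOps`` for another implementation requires no change to the solver-side function.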
+ +Common Details to Watch For +--------------------------- + +The following points may not be clear when first reading this part of the code: + +* File readers may require host-accessible memory. +* Loading data and using data may happen in different memory spaces. +* A ``vector::Vector`` or ``matrix::Csr`` object stores data, while a handler performs an + operation. +* A workspace provides backend-specific resources for handlers. +* A solver that supports multiple backends should receive backend-specific + handlers from the caller instead of creating a hard-coded backend handler + internally. +* Rectangular matrices can be expected in SCCG because the Schur complement + path uses different inner and outer dimensions. +* For GPU tests, loading into ``memory::HOST`` first and then synchronizing to + ``memory::DEVICE`` may be necessary. +* A test that passes on CPU may still expose memory-space or backend-handler + issues on CUDA or HIP. + +Checklist for Backend-Capable Code +---------------------------------- + +When writing or reviewing code that should work on CPU and GPU backends, check +the following: + +* Is the object allocated before it is used? +* Is the object allocated in the memory space expected by the next operation? +* If data was loaded on the host, is it synchronized to the device before GPU + operations? +* Are the matrix and vector dimensions consistent with the operation chain? +* Are the handlers created from the correct backend workspace? +* Is the solver receiving backend-specific handlers from the caller? + +Suggested Validation +-------------------- + +When changing code that uses these classes and handlers, it is useful to test +the relevant CPU and GPU paths when the local environment supports them. For an +SCCG-related change, this may include building the CPU and CUDA configurations +and running the SCCG test executable. + +Example commands may vary by environment, but the basic checks are: + +.. 
code:: shell + + cmake --build build-cpu + ./build-cpu/tests/unit/hykkt/hykkt_sccg_test + + cmake --build build-cuda + ./build-cuda/tests/unit/hykkt/hykkt_sccg_test + +System Analysis +--------------- + +The main purpose of this structure is to make backend-capable solver code +easier to reason about. The vector and matrix classes provide the data storage. +The handlers provide the operations. The workspace provides backend resources. +The memory space describes where the data lives and where operations should +occur. + +This separation is especially useful for solver code that needs to work across +CPU, CUDA, and HIP. It reduces the chance that solver code will accidentally +use a CPU-specific handler inside a GPU path. It also makes the memory movement +more explicit when data is loaded on the host and then used on the device. + +In the SCCG test path, this structure helps explain why the test loads data +into host memory first, why it synchronizes to device memory for GPU backends, +and why SCCG receives matrix and vector handlers from the caller. + +This design also fits the larger Re::Solve and HyKKT motivation. Public ORNL +and Re::Solve materials describe GPU-resident linear solvers as useful in +scientific computing and optimization workflows where linear solves can +dominate runtime. In those workflows, keeping data movement and backend +operations organized is part of making the solver path practical on modern CPU +and GPU systems. + +Related Background +------------------ + +The references below provide additional context for why Re::Solve separates +solver logic, backend operations, and memory movement. + +HyKKT is one example of this type of workflow. Shaked Regev's dissertation +describes HyKKT as a method for sparse KKT linear systems that uses an +iterative solver on the Schur complement with an inner Cholesky factorization. 
+This is relevant to the SCCG path because it explains why matrix-vector +operations, Cholesky solves, matrix dimensions, and backend-specific execution +all appear in the same solver workflow. + +Krylov methods provide related background because they are commonly used when +direct methods are too expensive for large systems. Katarzyna Swirydowicz's +dissertation explains repeated large linear solves, Krylov subspace methods, +and GPU implementation tradeoffs for Krylov solvers and preconditioners. + +Further Reading +--------------- + +* `Re::Solve documentation and developer guide `_ +* `Re::Solve GitHub repository `_ +* `HyKKT GitHub repository `_ +* `Shaked Regev, Preconditioning Techniques for Sparse Linear Systems `_ +* `Katarzyna Swirydowicz, Strategies for Recycling Krylov Subspace Methods and Bilinear Form Estimation `_ +* `ORNL publication page on GPU-resident sparse direct linear solvers for ACOPF `_ +* `OSTI paper, Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization `_