diff --git a/CHANGELOG.md b/CHANGELOG.md
index ad503b2e..62cf8b82 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,6 +22,8 @@
- Improved coding guidelines for developers on floating point conventions.
+- Added Developer Guide documentation for vector and matrix classes and handlers.
+
## Changes to Re::Solve in release 0.99.2
### Major Features
diff --git a/docs/sphinx/developer_guide/index.rst b/docs/sphinx/developer_guide/index.rst
index fa2022ff..59a6975d 100644
--- a/docs/sphinx/developer_guide/index.rst
+++ b/docs/sphinx/developer_guide/index.rst
@@ -17,3 +17,4 @@ that they are consistently applied.
build_system
coding_guidelines
documentation
+ vector_matrix_handlers
diff --git a/docs/sphinx/developer_guide/vector_matrix_handlers.rst b/docs/sphinx/developer_guide/vector_matrix_handlers.rst
new file mode 100644
index 00000000..c59f67cc
--- /dev/null
+++ b/docs/sphinx/developer_guide/vector_matrix_handlers.rst
@@ -0,0 +1,454 @@
+Vector and Matrix Classes and Handlers
+======================================
+
+Background
+----------
+
+The purpose of this page is to help new developers understand how Re::Solve
+separates data containers, operation handlers, backend workspaces, and memory
+spaces. In particular, it explains the difference between vector and matrix
+classes and the ``VectorHandler`` and ``MatrixHandler`` classes that operate on
+them.
+
+The main distinction is that vector and matrix classes store data, while
+handler classes perform operations on that data. This distinction is important
+when writing code that needs to run on different backends (e.g., CPU, CUDA,
+and HIP).
+
+This separation allows solver logic to remain independent of backend-specific
+vector and matrix operations.
+
+The main questions this page is meant to answer are:
+
+* What object stores the data?
+* Where does the data live?
+* What object performs the operation?
+* What backend resources does the operation need?
+* What is the difference between a vector or matrix class and a vector or
+ matrix handler?
+* What needs to happen when data is loaded on the host but used on the device?
+
+This page is not meant to document every method in detail. It is meant to give
+a practical mental model for reading and writing backend-capable Re::Solve code.
+
+Core Design
+-----------
+
+The main design idea is that Re::Solve separates storage, operations, and
+backend resources.
+
+The major pieces are:
+
+* ``vector::Vector`` objects store vector data.
+* Matrix objects, such as ``matrix::Csr``, store sparse matrix data.
+* ``VectorHandler`` objects perform vector operations.
+* ``MatrixHandler`` objects perform matrix operations.
+* ``LinAlgWorkspace`` objects provide backend-specific resources for the
+ handlers.
+* ``memory::HOST`` describes data stored in host-accessible memory.
+* ``memory::DEVICE`` describes data stored in device-accessible memory.
+
+This means that a vector or matrix object does not, by itself, determine
+whether an operation runs on the CPU or the GPU. The data object stores the
+values. The handler performs the operation. The workspace gives the handler the
+backend resources it needs.
+
+This separation helps the same solver path run with different backend
+implementations.
+
+Vector Objects
+--------------
+
+A ``vector::Vector`` object represents vector data. The vector object is a data
+container. It stores the size of the vector and the data associated with that
+vector. Before a vector is used, it must be allocated in a memory space.
+
+Simplified example:
+
+.. code:: cpp
+
+ vector::Vector* x = new vector::Vector(n);
+ x->allocate(memory::HOST);
+
+If the vector is loaded or initialized on the host but later used by a GPU
+backend, the data may need to be synchronized to the device.
+
+Simplified example:
+
+.. code:: cpp
+
+ if (memspace == memory::DEVICE)
+ {
+ x->syncData(memory::DEVICE);
+ }
+
+The important distinction is that allocation and operation are separate steps.
+Allocating the vector controls where the data is stored. Calling a handler
+method controls what operation is performed on the data.
+
+This is useful because the same vector object may be part of a CPU test path or
+a GPU test path, depending on how it is allocated, synchronized, and passed to
+backend-specific operations.
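+
+The allocate-then-sync idea can be illustrated with a small, self-contained
+C++ sketch. ``ToyVector`` and ``Space`` below are hypothetical stand-ins, not
+the Re::Solve ``vector::Vector`` API: the "device" copy is only a second
+``std::vector``, and the class merely tracks which copy is current. The sketch
+shows that allocation decides where storage exists and synchronization copies
+values between memory spaces, while neither step performs any arithmetic.
+
+.. code:: cpp
+
+   #include <cstddef>
+   #include <stdexcept>
+   #include <vector>
+
+   // Hypothetical stand-in for memory::HOST / memory::DEVICE.
+   enum class Space { Host, Device };
+
+   // Hypothetical toy data container: stores values and tracks which copy is
+   // current. The "device" buffer is an ordinary std::vector that only
+   // simulates a second memory space; no GPU is involved.
+   class ToyVector {
+   public:
+     explicit ToyVector(std::size_t n) : n_(n) {}
+
+     // Allocation decides where storage exists; it performs no arithmetic.
+     void allocate(Space s) {
+       buf_[idx(s)].assign(n_, 0.0);
+       alloc_[idx(s)] = true;
+     }
+
+     // Writable access; the copy in s becomes the only current copy.
+     std::vector<double>& write(Space s) {
+       if (!alloc_[idx(s)]) throw std::logic_error("write before allocate");
+       current_[idx(s)] = true;
+       current_[1 - idx(s)] = false;
+       return buf_[idx(s)];
+     }
+
+     // Read-only access; fails if the copy in s is stale.
+     const std::vector<double>& read(Space s) const {
+       if (!current_[idx(s)]) throw std::logic_error("stale copy: call syncData first");
+       return buf_[idx(s)];
+     }
+
+     // Bring the copy in s up to date from the other memory space.
+     void syncData(Space s) {
+       const int to = idx(s);
+       const int from = 1 - to;
+       if (current_[to]) return;  // already up to date
+       if (!current_[from]) throw std::logic_error("no current copy to sync from");
+       if (!alloc_[to]) allocate(s);
+       buf_[to] = buf_[from];
+       current_[to] = true;
+     }
+
+   private:
+     static int idx(Space s) { return s == Space::Host ? 0 : 1; }
+
+     std::size_t n_;
+     std::vector<double> buf_[2];
+     bool alloc_[2] = {false, false};
+     bool current_[2] = {false, false};
+   };
+
+For example, after writing values through ``x.write(Space::Host)``, calling
+``x.read(Space::Device)`` throws until ``x.syncData(Space::Device)`` has copied
+the host values over.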
+
+Matrix Objects
+--------------
+
+Matrix objects represent matrix data. Like vector objects, matrix objects are
+data containers. They store or describe the matrix data, but they do not perform
+matrix operations by themselves.
+
+Sparse matrices are commonly stored in compressed sparse formats, such as CSR
+(compressed sparse row) and CSC (compressed sparse column). These formats store
+only the nonzero values of a sparse matrix along with index information that
+describes where those values belong.
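+
+To make the CSR layout concrete, the hypothetical helper below compresses a
+small dense, row-major matrix into the three CSR arrays. ``CsrData`` and
+``denseToCsr`` are illustrative names only and are not part of
+``matrix::Csr``; real code would normally build CSR data directly from sparse
+input rather than from a dense array.
+
+.. code:: cpp
+
+   #include <cstddef>
+   #include <vector>
+
+   // Hypothetical minimal CSR container (illustration only, not matrix::Csr).
+   struct CsrData {
+     std::size_t rows = 0, cols = 0;
+     std::vector<std::size_t> row_ptr;  // rows + 1 entries; row i is [row_ptr[i], row_ptr[i+1])
+     std::vector<std::size_t> col_idx;  // column index of each stored nonzero
+     std::vector<double> values;        // nonzero values, stored row by row
+     std::size_t nnz() const { return values.size(); }
+   };
+
+   // Compress a dense row-major matrix, keeping only the nonzero entries.
+   CsrData denseToCsr(const std::vector<double>& dense, std::size_t rows, std::size_t cols) {
+     CsrData A;
+     A.rows = rows;
+     A.cols = cols;
+     A.row_ptr.push_back(0);
+     for (std::size_t i = 0; i < rows; ++i) {
+       for (std::size_t j = 0; j < cols; ++j) {
+         const double v = dense[i * cols + j];
+         if (v != 0.0) {
+           A.col_idx.push_back(j);
+           A.values.push_back(v);
+         }
+       }
+       A.row_ptr.push_back(A.values.size());  // end of row i
+     }
+     return A;
+   }
+
+For the 3 x 3 matrix with rows ``[4 0 1]``, ``[0 3 0]``, and ``[2 0 5]``, the
+sketch produces ``row_ptr = {0, 2, 3, 5}``, ``col_idx = {0, 2, 1, 0, 2}``, and
+``values = {4, 1, 3, 2, 5}``.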
+
+In Re::Solve, matrix objects such as ``matrix::Csr`` store sparse matrix data.
+A CSR matrix stores the matrix dimensions, nonzero count, and sparse matrix
+data. Like vectors, a matrix object must be allocated in a memory space before
+it is used.
+
+Simplified example:
+
+.. code:: cpp
+
+ matrix::Csr* A = new matrix::Csr(num_rows, num_cols, nnz);
+ A->allocateMatrixData(memory::HOST);
+
+In file-loading paths, matrix data may need to be loaded into host memory
+first. For example, Matrix Market file readers write into host-accessible
+memory. If the test is running on a GPU backend, the matrix can then be
+synchronized to device memory.
+
+Example from the SCCG (Schur Complement Conjugate Gradient) test path:
+
+.. code:: cpp
+
+ matrix::Csr* h = new matrix::Csr(2278, 2278, 11304, true, false);
+ h->allocateMatrixData(memory::HOST);
+ io::updateMatrixFromFile(h_file, h);
+
+ if (memspace_ == memory::DEVICE)
+ {
+ h->syncData(memory::DEVICE);
+ }
+
+This pattern matters because the memory space used for loading data is not
+always the same as the memory space used for computation.
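+
+One reason file loading happens in host memory is that parsing a text file is
+ordinary CPU work. The sketch below is a hypothetical, minimal Matrix Market
+reader, not ``io::updateMatrixFromFile``: it accepts only the
+``coordinate real general`` variant, reads from a ``std::istream`` into
+host-side coordinate (COO) arrays, and converts Matrix Market's one-based
+indices to zero-based ones. Copying the result to device memory would be a
+separate synchronization step.
+
+.. code:: cpp
+
+   #include <cstddef>
+   #include <istream>
+   #include <sstream>
+   #include <stdexcept>
+   #include <string>
+   #include <vector>
+
+   // Hypothetical host-side COO triplets filled by the reader below.
+   struct HostCoo {
+     std::size_t rows = 0, cols = 0;
+     std::vector<std::size_t> row, col;  // zero-based indices
+     std::vector<double> val;
+   };
+
+   // Read a "coordinate real general" Matrix Market stream into host arrays.
+   // Symmetric storage, pattern matrices, and other variants are not handled.
+   HostCoo readMatrixMarket(std::istream& in) {
+     std::string line;
+     auto has = [&line](const char* word) { return line.find(word) != std::string::npos; };
+     if (!std::getline(in, line) || line.rfind("%%MatrixMarket", 0) != 0 ||
+         !has("coordinate") || !has("real") || !has("general"))
+       throw std::runtime_error("expected a coordinate real general header");
+     // Skip comment lines (starting with '%') and blank lines.
+     while (std::getline(in, line) && (line.empty() || line[0] == '%')) {
+     }
+     HostCoo A;
+     std::size_t nnz = 0;
+     std::istringstream sizes(line);
+     if (!(sizes >> A.rows >> A.cols >> nnz)) throw std::runtime_error("bad size line");
+     for (std::size_t k = 0; k < nnz; ++k) {
+       std::size_t i = 0, j = 0;
+       double v = 0.0;
+       if (!(in >> i >> j >> v)) throw std::runtime_error("truncated entry list");
+       A.row.push_back(i - 1);  // Matrix Market indices are one-based
+       A.col.push_back(j - 1);
+       A.val.push_back(v);
+     }
+     return A;
+   }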
+
+Vector Handlers
+---------------
+
+A ``VectorHandler`` performs operations on ``vector::Vector`` objects. It does not replace
+the vector class. Instead, it provides backend-specific operations
+that act on existing vector data.
+
+A ``VectorHandler`` may perform operations such as:
+
+* ``dot``
+* ``scal``
+* ``axpy``
+
+A useful way to think about the difference is:
+
+* ``vector::Vector`` stores the vector data.
+* ``VectorHandler`` performs vector operations on that data.
+
+For example, a vector object may hold the entries of a residual vector, while a
+vector handler may compute a dot product, scale the vector, or add one vector
+to another.
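+
+As a reference for what these three operations compute, the hypothetical
+CPU-only handler below implements them with plain loops, following the usual
+BLAS meanings: ``dot`` returns x . y, ``scal`` computes x <- alpha * x, and
+``axpy`` computes y <- alpha * x + y. The class name and signatures are
+assumptions for illustration, and the actual ``VectorHandler`` signatures may
+differ. Note that the handler holds no vector data of its own.
+
+.. code:: cpp
+
+   #include <cassert>
+   #include <cstddef>
+   #include <vector>
+
+   // Hypothetical CPU-only handler: it owns no vector data and only operates
+   // on vectors passed in by the caller.
+   class ToyCpuVectorHandler {
+   public:
+     // Returns the dot product x . y.
+     double dot(const std::vector<double>& x, const std::vector<double>& y) const {
+       assert(x.size() == y.size());
+       double sum = 0.0;
+       for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
+       return sum;
+     }
+
+     // x <- alpha * x
+     void scal(double alpha, std::vector<double>& x) const {
+       for (double& xi : x) xi *= alpha;
+     }
+
+     // y <- alpha * x + y
+     void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) const {
+       assert(x.size() == y.size());
+       for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
+     }
+   };
+
+For ``x = {1, 2, 3}`` and ``y = {4, 5, 6}``, ``dot`` returns 32, and
+``axpy(2.0, x, y)`` updates ``y`` to ``{6, 9, 12}``.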
+
+Matrix Handlers
+---------------
+
+A ``MatrixHandler`` performs operations on matrix objects such as ``matrix::Csr``. It does not replace
+the matrix class. Instead, it provides backend-specific matrix operations that
+act on existing matrix data.
+
+A ``MatrixHandler`` may perform operations such as:
+
+* ``matvec``
+* ``transpose``
+
+A useful way to separate the roles is that ``matrix::Csr`` stores the data, while ``MatrixHandler`` performs matrix operations on that data.
+For example, a matrix object may hold the CSR representation of a sparse
+matrix, while a matrix handler may perform a sparse matrix-vector product or
+construct a transpose.
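+
+The sketch below shows, on the CPU, what ``matvec`` and ``transpose`` compute
+for CSR data. ``Csr``, ``matvec``, and ``transpose`` here are hypothetical free
+functions on a minimal struct, not the ``MatrixHandler`` API; a GPU backend
+would implement the same operations differently. The transpose uses the
+standard counting approach: count the entries in each column, take a prefix
+sum to get the new row pointers, then scatter each entry into place.
+
+.. code:: cpp
+
+   #include <cassert>
+   #include <cstddef>
+   #include <vector>
+
+   // Hypothetical minimal CSR data (illustration only, not matrix::Csr).
+   struct Csr {
+     std::size_t rows = 0, cols = 0;
+     std::vector<std::size_t> row_ptr, col_idx;
+     std::vector<double> values;
+   };
+
+   // y <- A * x (sparse matrix-vector product)
+   void matvec(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
+     assert(x.size() == A.cols && y.size() == A.rows);
+     for (std::size_t i = 0; i < A.rows; ++i) {
+       double sum = 0.0;
+       for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
+         sum += A.values[k] * x[A.col_idx[k]];
+       y[i] = sum;
+     }
+   }
+
+   // Build A^T in CSR form (equivalently, A in CSC form).
+   Csr transpose(const Csr& A) {
+     Csr T;
+     T.rows = A.cols;
+     T.cols = A.rows;
+     const std::size_t nnz = A.values.size();
+     T.row_ptr.assign(T.rows + 1, 0);
+     T.col_idx.resize(nnz);
+     T.values.resize(nnz);
+     for (std::size_t k = 0; k < nnz; ++k) ++T.row_ptr[A.col_idx[k] + 1];  // count per column
+     for (std::size_t r = 0; r < T.rows; ++r) T.row_ptr[r + 1] += T.row_ptr[r];  // prefix sum
+     std::vector<std::size_t> next(T.row_ptr.begin(), T.row_ptr.end() - 1);  // write cursor per row of T
+     for (std::size_t i = 0; i < A.rows; ++i)
+       for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
+         const std::size_t dst = next[A.col_idx[k]]++;
+         T.col_idx[dst] = i;
+         T.values[dst] = A.values[k];
+       }
+     return T;
+   }
+
+For the 2 x 3 matrix with rows ``[1 0 2]`` and ``[0 3 0]``, ``matvec`` with
+``x = {1, 1, 1}`` gives ``{3, 3}``, and ``transpose`` returns a 3 x 2 CSR matrix
+with ``row_ptr = {0, 1, 2, 3}``.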
+
+Handler Setup
+-------------
+
+Handlers are created using a workspace for the selected backend. A simplified
+setup pattern is:
+
+.. code:: cpp
+
+ WorkspaceType workspace;
+ workspace.initializeHandles();
+
+ MatrixHandler matrix_handler(&workspace);
+ VectorHandler vector_handler(&workspace);
+
+The handler uses the workspace that was created for the selected backend. This
+is why backend-capable solver code should generally receive the correct
+handlers from the caller instead of creating a hard-coded CPU, CUDA, or HIP
+handler internally.
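+
+The point of passing a workspace to the handler, and the handler to the
+solver, can be sketched with hypothetical types. ``ToyWorkspace`` records only
+a backend name and whether its handles were initialized; a real workspace
+would create backend library handles. ``ToyHandler`` and ``runSolverStep`` are
+likewise illustrative names. Because ``runSolverStep`` never constructs a
+handler, it cannot hard-code a backend.
+
+.. code:: cpp
+
+   #include <string>
+
+   // Hypothetical workspace: a real one would own backend library handles.
+   struct ToyWorkspace {
+     std::string backend;  // label for the selected backend, e.g. "cpu"
+     bool initialized = false;
+     void initializeHandles() { initialized = true; }
+   };
+
+   // Hypothetical handler bound to one workspace at construction.
+   class ToyHandler {
+   public:
+     explicit ToyHandler(ToyWorkspace* ws) : ws_(ws) {}
+     std::string backend() const { return ws_->backend; }
+     bool ready() const { return ws_ != nullptr && ws_->initialized; }
+
+   private:
+     ToyWorkspace* ws_;  // not owned; the caller keeps the workspace alive
+   };
+
+   // Solver code receives the handler instead of constructing one itself.
+   std::string runSolverStep(const ToyHandler& handler) {
+     return handler.ready() ? "step ran on " + handler.backend() : "handler not ready";
+   }
+
+Constructing ``ToyHandler h(&ws)`` binds the handler to ``ws``, so the same
+``runSolverStep`` code reports whichever backend the caller selected. Since the
+handler stores a non-owning pointer, the caller must keep the workspace alive
+for as long as the handler is used.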
+
+Workspaces
+----------
+
+Workspace classes provide the backend-specific resources needed by handlers. A
+CPU workspace, CUDA workspace, and HIP workspace may initialize different
+backend handles or library resources.
+
+The general setup is:
+
+1. Create the workspace for the selected backend.
+2. Initialize the workspace handles.
+3. Create matrix and vector handlers using that workspace.
+4. Pass those handlers into the solver or test fixture.
+
+Simplified SCCG setup example:
+
+.. code:: cpp
+
+ WorkspaceType workspace;
+ workspace.initializeHandles();
+
+ MatrixHandler matrix_handler(&workspace);
+ VectorHandler vector_handler(&workspace);
+
+ HykktSchurComplementConjugateGradientTests test(memspace,
+ matrix_handler,
+ vector_handler);
+
+This keeps the solver or test fixture from being tied to only one backend.
+
+Principle of Operation
+----------------------
+
+The basic flow for backend-capable code is:
+
+1. Create or load vector and matrix data.
+2. Allocate that data in the correct memory space.
+3. If data is loaded on the host and used on the device, synchronize it to the
+ device.
+4. Create the backend workspace.
+5. Create handlers from that workspace.
+6. Pass the handlers into the solver or test path.
+7. Use the handlers to perform vector and matrix operations.
+
+This flow keeps the data, operation, and backend setup separate. It also makes
+it easier to identify whether a problem is caused by data storage, memory
+movement, backend setup, or the solver algorithm itself.
+
+Re::Solve Context
+-----------------
+
+Re::Solve examples are designed around repeated linear solver use cases. The
+public Re::Solve documentation describes examples that emulate a nonlinear
+solver calling the linear solver repeatedly. This matters because, when the
+linear solver is called many times, setup cost, memory movement, and backend
+resource management can have a significant effect on total run time.
+
+The public HyKKT documentation describes HyKKT as a solver for
+Karush-Kuhn-Tucker systems that can use hardware accelerators efficiently. The
+HyKKT description also explains that the solver uses block reduction and
+conjugate gradient on the Schur complement.
+
+This background is useful for understanding why the SCCG path needs careful
+handling of matrix dimensions, memory spaces, and backend-specific handlers.
+
+SCCG Example
+------------
+
+SCCG stands for Schur Complement Conjugate Gradient. The SCCG test path is a
+useful example because it uses vector objects, matrix objects, vector handlers,
+matrix handlers, workspaces, and memory spaces together.
+
+In the SCCG test path, the matrices are represented with ``matrix::Csr``
+objects, and the operations on them are performed by matrix and vector
+handlers.
+
+SCCG uses a Schur complement structure. In the test path, the matrices do not
+all have the same dimensions, and this is expected.
+
+The main matrices are:
+
+* ``H``: a square matrix used in the inner solve.
+* ``Jc``: a rectangular matrix.
+* ``Jc_tr``: the transpose of ``Jc``.
+
+A simplified operation chain is:
+
+1. Multiply by ``Jc_tr``.
+2. Solve with ``H``.
+3. Multiply by ``Jc``.
+
+Because of this structure, not every temporary vector has the same size. Some
+vectors match the outer system dimension. Other vectors match the inner solve
+dimension. The important requirement is that each matrix and vector matches the
+operation being performed.
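+
+The operation chain above applies the Schur complement
+``S = Jc * H^{-1} * Jc^T`` to an outer vector. The sketch below makes the
+dimension changes explicit under stated assumptions: ``Jc`` has m rows and n
+columns, so ``Jc^T * x`` and the solve with ``H`` work on length-n (inner)
+vectors, and the result has length m (outer). To stay short, it uses dense
+row-major storage, multiplies by ``Jc^T`` without forming it, and assumes
+``H`` is diagonal so the solve is an elementwise division. The SCCG path
+instead uses ``matrix::Csr`` data, an explicit ``Jc_tr`` matrix, and a genuine
+solve with ``H`` (HyKKT uses a Cholesky factorization for this inner solve).
+
+.. code:: cpp
+
+   #include <cassert>
+   #include <cstddef>
+   #include <vector>
+
+   using Vec = std::vector<double>;
+
+   // Hypothetical dense row-major matrix, used only to keep the sketch short.
+   struct Dense {
+     std::size_t rows, cols;
+     Vec a;  // rows * cols entries
+   };
+
+   // y = M * x
+   Vec multiply(const Dense& M, const Vec& x) {
+     assert(x.size() == M.cols);
+     Vec y(M.rows, 0.0);
+     for (std::size_t i = 0; i < M.rows; ++i)
+       for (std::size_t j = 0; j < M.cols; ++j) y[i] += M.a[i * M.cols + j] * x[j];
+     return y;
+   }
+
+   // y = M^T * x, without forming M^T explicitly
+   Vec multiplyTransposed(const Dense& M, const Vec& x) {
+     assert(x.size() == M.rows);
+     Vec y(M.cols, 0.0);
+     for (std::size_t i = 0; i < M.rows; ++i)
+       for (std::size_t j = 0; j < M.cols; ++j) y[j] += M.a[i * M.cols + j] * x[i];
+     return y;
+   }
+
+   // Apply S = Jc * H^{-1} * Jc^T to an outer vector x of length m.
+   // Simplifying assumption: H (n x n) is diagonal and given by H_diag.
+   Vec applySchur(const Dense& Jc, const Vec& H_diag, const Vec& x) {
+     assert(H_diag.size() == Jc.cols);
+     Vec t = multiplyTransposed(Jc, x);    // 1. t = Jc^T x (length n, inner)
+     for (std::size_t j = 0; j < t.size(); ++j)
+       t[j] /= H_diag[j];                  // 2. solve H u = t (u overwrites t)
+     return multiply(Jc, t);               // 3. y = Jc u (length m, outer)
+   }
+
+With ``Jc = [1 2]``, ``H = diag(2, 4)``, and ``x = [3]``, the result is
+``[4.5]``, which matches ``S = 1/2 + 4/4 = 1.5`` multiplied by 3.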
+
+Each component has a specific role: the matrix dimensions, memory spaces, and
+handlers all need to match the part of the solver path where they are used.
+
+Important Implementation Detail
+-------------------------------
+
+One important detail in the SCCG test path is that the Matrix Market file
+readers write into host-accessible memory. This means the test data should be
+loaded into ``memory::HOST`` first.
+
+For GPU backends, the data should then be synchronized to ``memory::DEVICE``.
+This avoids trying to load file data directly into device memory when the file
+reader expects host-accessible memory.
+
+The pattern is:
+
+1. Allocate in ``memory::HOST``.
+2. Load the file data.
+3. If running on ``memory::DEVICE``, synchronize to device memory.
+
+This applies to both matrix and vector test data.
+
+Why Solver Paths Receive Handlers
+---------------------------------
+
+Solver paths that support multiple backends should receive backend-specific
+handlers from the caller because the caller knows which backend is being used.
+If a solver creates its own handler internally, it can accidentally create a
+handler for the wrong backend.
+
+The safer design is:
+
+* The caller or test runner selects the backend.
+* The caller or test runner creates the correct workspace.
+* The caller or test runner creates the correct matrix and vector handlers.
+* The solver receives and uses those handlers.
+
+In the SCCG path, this allows the same solver code to work with CPU, CUDA, and
+HIP backends.
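+
+Because SCCG applies conjugate gradient to the Schur complement, a generic CG
+loop shows why it is convenient for the solver to receive its operations from
+the caller. In the sketch below, the operator ``applyA`` and the ``VectorOps``
+interface are hypothetical stand-ins supplied by the caller, and
+``CpuVectorOps`` is one possible implementation; a backend-specific
+implementation could be passed in without changing ``conjugateGradient``. This
+is the textbook CG method for symmetric positive definite operators, not the
+Re::Solve SCCG implementation.
+
+.. code:: cpp
+
+   #include <cmath>
+   #include <cstddef>
+   #include <functional>
+   #include <vector>
+
+   using Vec = std::vector<double>;
+
+   // Hypothetical interface: the solver only sees these operations.
+   struct VectorOps {
+     virtual ~VectorOps() = default;
+     virtual double dot(const Vec& x, const Vec& y) const = 0;
+     virtual void axpy(double alpha, const Vec& x, Vec& y) const = 0;  // y += alpha * x
+     virtual void scal(double alpha, Vec& x) const = 0;                // x *= alpha
+   };
+
+   // One possible caller-supplied implementation (plain CPU loops).
+   struct CpuVectorOps : VectorOps {
+     double dot(const Vec& x, const Vec& y) const override {
+       double s = 0.0;
+       for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
+       return s;
+     }
+     void axpy(double alpha, const Vec& x, Vec& y) const override {
+       for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
+     }
+     void scal(double alpha, Vec& x) const override {
+       for (double& v : x) v *= alpha;
+     }
+   };
+
+   // Conjugate gradient for an SPD operator; returns the iteration count at
+   // which ||r|| <= tol, or -1 if that did not happen within max_iter.
+   int conjugateGradient(const std::function<Vec(const Vec&)>& applyA, const Vec& b, Vec& x,
+                         const VectorOps& ops, double tol, int max_iter) {
+     Vec r = b;
+     ops.axpy(-1.0, applyA(x), r);  // r = b - A x
+     Vec p = r;
+     double rr = ops.dot(r, r);
+     for (int k = 0; k < max_iter; ++k) {
+       if (std::sqrt(rr) <= tol) return k;
+       const Vec Ap = applyA(p);
+       const double alpha = rr / ops.dot(p, Ap);
+       ops.axpy(alpha, p, x);     // x += alpha * p
+       ops.axpy(-alpha, Ap, r);   // r -= alpha * A p
+       const double rr_new = ops.dot(r, r);
+       ops.scal(rr_new / rr, p);  // p = beta * p, with beta = rr_new / rr
+       ops.axpy(1.0, r, p);       // p = r + beta * p
+       rr = rr_new;
+     }
+     return std::sqrt(rr) <= tol ? max_iter : -1;
+   }
+
+In an SCCG-like setting, ``applyA`` would apply the Schur complement (multiply
+by ``Jc_tr``, solve with ``H``, multiply by ``Jc``), and the caller would pass
+vector operations built for the selected backend.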
+
+Inputs and Outputs
+------------------
+
+The main inputs to this code pattern are:
+
+* Matrix and vector data.
+* A selected memory space, such as ``memory::HOST`` or ``memory::DEVICE``.
+* A backend workspace.
+* Matrix and vector handlers.
+* Solver-specific data, such as matrix dimensions and solver tolerance.
+
+The main outputs are:
+
+* Correctly allocated and synchronized data.
+* Backend-specific matrix and vector operations.
+* A solver path that can run on more than one backend.
+* A clearer separation between storage, computation, and backend resources.
+
+Common Details to Watch For
+---------------------------
+
+The following points may not be clear when first reading this part of the code:
+
+* File readers may require host-accessible memory.
+* Loading data and using data may happen in different memory spaces.
+* A ``vector::Vector`` or ``matrix::Csr`` object stores data, while a handler performs an
+ operation.
+* A workspace provides backend-specific resources for handlers.
+* A solver that supports multiple backends should receive backend-specific
+ handlers from the caller instead of creating a hard-coded backend handler
+ internally.
+* Rectangular matrices are expected in SCCG because the Schur complement
+  path uses different inner and outer dimensions.
+* For GPU tests, loading into ``memory::HOST`` first and then synchronizing to
+ ``memory::DEVICE`` may be necessary.
+* A test that passes on CPU may still expose memory-space or backend-handler
+ issues on CUDA or HIP.
+
+Checklist for Backend-Capable Code
+----------------------------------
+
+When writing or reviewing code that should work on CPU and GPU backends, check
+the following:
+
+* Is the object allocated before it is used?
+* Is the object allocated in the memory space expected by the next operation?
+* If data was loaded on the host, is it synchronized to the device before GPU
+ operations?
+* Are the matrix and vector dimensions consistent with the operation chain?
+* Are the handlers created from the correct backend workspace?
+* Is the solver receiving backend-specific handlers from the caller?
+
+Suggested Validation
+--------------------
+
+When changing code that uses these classes and handlers, it is useful to test
+the relevant CPU and GPU paths when the local environment supports them. For an
+SCCG-related change, this may include building the CPU and CUDA configurations
+and running the SCCG test executable.
+
+Example commands may vary by environment, but the basic checks are:
+
+.. code:: shell
+
+ cmake --build build-cpu
+ ./build-cpu/tests/unit/hykkt/hykkt_sccg_test
+
+ cmake --build build-cuda
+ ./build-cuda/tests/unit/hykkt/hykkt_sccg_test
+
+System Analysis
+---------------
+
+The main purpose of this structure is to make backend-capable solver code
+easier to reason about. The vector and matrix classes provide the data storage.
+The handlers provide the operations. The workspace provides backend resources.
+The memory space describes where the data lives and where operations should
+occur.
+
+This separation is especially useful for solver code that needs to work across
+CPU, CUDA, and HIP. It reduces the chance that solver code will accidentally
+use a CPU-specific handler inside a GPU path. It also makes the memory movement
+more explicit when data is loaded on the host and then used on the device.
+
+In the SCCG test path, this structure helps explain why the test loads data
+into host memory first, why it synchronizes to device memory for GPU backends,
+and why SCCG receives matrix and vector handlers from the caller.
+
+This design also fits the larger Re::Solve and HyKKT motivation. Public ORNL
+and Re::Solve materials describe GPU-resident linear solvers as useful in
+scientific computing and optimization workflows where linear solves can
+dominate runtime. In those workflows, keeping data movement and backend
+operations organized is part of making the solver path practical on modern CPU
+and GPU systems.
+
+Related Background
+------------------
+
+The references below provide additional context for why Re::Solve separates
+solver logic, backend operations, and memory movement.
+
+HyKKT is one example of this type of workflow. Shaked Regev's dissertation
+describes HyKKT as a method for sparse KKT linear systems that uses an
+iterative solver on the Schur complement with an inner Cholesky factorization.
+This is relevant to the SCCG path because it explains why matrix-vector
+operations, Cholesky solves, matrix dimensions, and backend-specific execution
+all appear in the same solver workflow.
+
+Krylov methods provide related background because they are commonly used when
+direct methods are too expensive for large systems. Katarzyna Swirydowicz's
+dissertation explains repeated large linear solves, Krylov subspace methods,
+and GPU implementation tradeoffs for Krylov solvers and preconditioners.
+
+Further Reading
+---------------
+
+* `Re::Solve documentation and developer guide `_
+* `Re::Solve GitHub repository `_
+* `HyKKT GitHub repository `_
+* `Shaked Regev, Preconditioning Techniques for Sparse Linear Systems `_
+* `Katarzyna Swirydowicz, Strategies for Recycling Krylov Subspace Methods and Bilinear Form Estimation `_
+* `ORNL publication page on GPU-resident sparse direct linear solvers for ACOPF `_
+* `OSTI paper, Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization `_