Skip to content

Separate API into Volume, Image and Snapshot#821

Draft
lukasfrank wants to merge 11 commits into
mainfrom
idea/refactoring
Draft

Separate API into Volume, Image and Snapshot#821
lukasfrank wants to merge 11 commits into
mainfrom
idea/refactoring

Conversation

@lukasfrank

@lukasfrank lukasfrank commented Jan 14, 2026

Copy link
Copy Markdown
Member

Scaffold to illustrate a simpler and maintainable flow on how to realize snapshots. The orchestration is done in a Volume resource to have a 1:1 mapping with Image and Snapshot to the ceph core resources.

@coderabbitai

coderabbitai Bot commented Jan 14, 2026

Copy link
Copy Markdown

Walkthrough

Restructures image/snapshot APIs and controller flows and introduces a new Volume API and VolumeReconciler. ImageSpec now uses SnapshotSource; Snapshot uses Spec with ImageRef and Protection; legacy IronCore image handling and many snapshot/image cleanup paths were removed; a new Ceph-backed volume reconciliation component was added.

Changes

Cohort / File(s) Summary
API: Image model
api/image.go
ImageSpec fields changed: removed Image and SnapshotRef; added SnapshotSource *string and reordered Encryption. Review serialization and consumers of SnapshotSource.
API: Snapshot model
api/snapshot.go
Replaced Source with Spec (SnapshotSpec with ImageRef and Protection). Added SnapshotProtection type and constants; removed VolumeImageID from SnapshotSource. Update callers that used old Source/VolumeImageID.
API: Volume model
api/volume.go
New Volume resource, VolumeSpec, VolumeStatus, VolumeSource (OSVolume / SnapshotSource), encryption types/states, access struct, and enums. New public API surface — verify JSON tags and defaults.
Controllers: common
internal/controllers/common.go
Added a TODO above ImageIDToRBDID. No functional change.
Controllers: Image controller
internal/controllers/image_controller.go
Replaced uses of SnapshotRef with SnapshotSource; removed deleteImage and reconcileSnapshot logic and related image deletion/population flows; cleaned image-digest label usage. Check image lifecycle and finalizer assumptions.
Controllers: Snapshot controller
internal/controllers/snapshot_controller.go
Removed IronCore-specific reconciliation and helpers (removeSnapshot, reconcileIroncoreImageSnapshot, open/prepare/populate helpers). reconcileVolumeImageSnapshot and reconcileSnapshot updated to use snapshot.Spec.ImageRef and snapshot.Spec.Protection. Verify snapshot lifecycle and protection semantics.
Controllers: Volume controller
internal/controllers/volume_controller.go
New VolumeReconciler implementation: constructor, Start/worker loop, queue processing, reconcileVolume, delete/finalizer handling, populateImage, and stubs for reconcile paths (empty/OS/snapshot-restored). High-impact new logic interacting with Ceph/RADOS/RBD and stores.

Sequence Diagram(s)

sequenceDiagram
    participant VR as VolumeReconciler
    participant ES as EventSource
    participant Store as VolumeStore
    participant RADOS as RADOS
    participant RBD as RBD

    ES->>VR: Snapshot event
    VR->>VR: Enqueue volume ID
    VR->>VR: processNextWorkItem()
    VR->>Store: Get Volume by ID
    Store-->>VR: Volume

    alt Volume marked for deletion
        VR->>VR: deleteVolume() (finalizer handling)
        VR->>Store: Update Volume (remove finalizer)
    else Reconcile volume
        VR->>RADOS: Open IOContext
        RADOS-->>VR: IOContext

        alt Source == OSVolume
            VR->>RBD: Create/prepare image from OSVolume
            RBD-->>VR: Image ready
        else Source == SnapshotSource
            VR->>RBD: Restore image from snapshot (ImageRef)
            RBD-->>VR: Restore complete
        else Empty volume
            VR->>VR: reconcileEmptyVolume()
        end

        VR->>Store: Update Volume status/state
    end
Loading
sequenceDiagram
    participant IC as ImageController
    participant SnapshotStore as SnapshotStore
    participant SC as SnapshotController
    participant RBD as RBD

    IC->>IC: Detect `SnapshotSource` on ImageSpec
    IC->>SnapshotStore: Fetch Snapshot by SnapshotSource
    SnapshotStore-->>IC: Snapshot (Spec.ImageRef)

    SC->>SC: Reconcile Snapshot (check Spec.Protection)
    alt Protection == "protected"
        SC->>RBD: Protect RBD snapshot
        RBD-->>SC: Protected
    else Protection == "none"
        SC->>RBD: Unprotect or allow deletion
        RBD-->>SC: Unprotected
    end

    SC->>SnapshotStore: Update Snapshot status
    IC->>IC: Pull/reference image from snapshot
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is minimal and lacks the proposed changes structure from the template. Key details about API changes and refactoring scope are missing. Expand the description to include proposed changes section with specific details about the new Volume/Image/Snapshot separation and the refactoring objectives.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly reflects the main change: separating the API into three distinct resource types (Volume, Image, and Snapshot).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd6b404 and ef1d728.

📒 Files selected for processing (7)
  • api/image.go
  • api/snapshot.go
  • api/volume.go
  • internal/controllers/common.go
  • internal/controllers/image_controller.go
  • internal/controllers/snapshot_controller.go
  • internal/controllers/volume_controller.go

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lukasfrank

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@lukasfrank

Copy link
Copy Markdown
Member Author

@coderabbitai help

@coderabbitai

coderabbitai Bot commented Jan 14, 2026

Copy link
Copy Markdown

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@lukasfrank

Copy link
Copy Markdown
Member Author

@coderabbitai summary

@coderabbitai

coderabbitai Bot commented Jan 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Summary regeneration triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/controllers/image_controller.go (1)

220-223: Empty deleteImage will cause resource leaks and orphaned finalizers.

The function body was removed but the finalizer (ImageFinalizer) is still being set on images (line 332). When an image is deleted:

  1. The finalizer prevents garbage collection
  2. deleteImage is called but does nothing
  3. The finalizer is never removed
  4. RBD images are never cleaned up in Ceph

This will cause images to be stuck in deletion and leak Ceph storage.

If this is intentional scaffolding for the draft PR, consider adding a // TODO: implement deletion logic comment and either:

  • Remove the finalizer addition temporarily, or
  • Implement at minimum the finalizer removal to unblock deletion
🔧 Minimal fix to unblock deletion (if scaffold)
 func (r *ImageReconciler) deleteImage(ctx context.Context, log logr.Logger, ioCtx *rados.IOContext, image *providerapi.Image) error {
-
+	// TODO: implement RBD image cleanup
+	image.Finalizers = slices.DeleteFunc(image.Finalizers, func(f string) bool {
+		return f == ImageFinalizer
+	})
+	if _, err := r.images.Update(ctx, image); err != nil {
+		return fmt.Errorf("failed to remove finalizer: %w", err)
+	}
 	return nil
 }
api/snapshot.go (1)

43-45: Remove Source field assignment and fix field mismatch in SnapshotSource struct.

The Snapshot struct does not have a Source field, but code in internal/volumeserver/volumesnapshot_create.go:36 and internal/volumeserver/snapshot.go:40 attempts to access snapshot.Source.VolumeImageID. Additionally, the SnapshotSource struct only defines IronCoreImage, not VolumeImageID. Either the struct definitions are incomplete or the field references are incorrect—this will cause compilation/runtime failures.

internal/controllers/common.go (1)

30-42: getSnapshotSourceDetails references fields that no longer exist on Snapshot.

The function at lines 30-42 attempts to access snapshot.Source.IronCoreImage and snapshot.Source.VolumeImageID, but:

  • The Snapshot struct contains Spec SnapshotSpec and Status SnapshotStatus—there is no Source field
  • The SnapshotSource struct only contains IronCoreImage and does not have VolumeImageID

The function will fail to compile as written.

internal/controllers/snapshot_controller.go (1)

236-255: Critical: Error from GetSize() is checked after using its result.

The error returned by rbdImg.GetSize() at line 236 is not checked until line 244, but the size value is already used at line 240 and 242. If GetSize() fails, this uses an undefined/zero value for the size calculations.

Additionally, the if err != nil block at line 244 appears to be orphaned refactoring remnants—it wraps error handling that references err but no failing operation directly precedes it.

Proposed fix
 	// Get image size
 	size, err := rbdImg.GetSize()
 	if err != nil {
+		snapshot.Status.State = providerapi.SnapshotStateFailed
+		if _, updateErr := r.store.Update(ctx, snapshot); updateErr != nil {
+			return errors.Join(err, fmt.Errorf("failed to update snapshot state: %w", updateErr))
+		}
 		return fmt.Errorf("failed to get snapshot image size %s: %w", snapshotName, err)
 	}
 	roundedSize := round.OffBytes(size)
-
 	snapshot.Status.Size = int64(roundedSize)
-
-	if err != nil {
-		snapshot.Status.State = providerapi.SnapshotStateFailed
-		if _, updateErr := r.store.Update(ctx, snapshot); updateErr != nil {
-			return errors.Join(err, fmt.Errorf("failed to update snapshot state: %w", updateErr))
-		}
-		return fmt.Errorf("failed to reconcile snapshot: %w", err)
-	}
-
 	snapshot.Status.State = providerapi.SnapshotStateReady
 	if _, err = r.store.Update(ctx, snapshot); err != nil {
 		return fmt.Errorf("failed to update snapshot: %w", err)
 	}
🤖 Fix all issues with AI agents
In `@internal/controllers/snapshot_controller.go`:
- Around line 218-228: The protection handling is wrong: remove the stale TODO,
stop blindly calling imgSnap.Unprotect() on a new snapshot (check
snapshot.Spec.Protection and the current protection state before calling
Unprotect/Protect), fix the error messages to reflect the actual operation
(e.g., "unable to unprotect snapshot" vs "unable to protect snapshot"), and add
a default case for snapshot.Spec.Protection to validate unexpected values and
return a clear error; locate this logic around snapshot.Spec.Protection and the
calls to imgSnap.Protect()/imgSnap.Unprotect() and update accordingly.

In `@internal/controllers/volume_controller.go`:
- Around line 207-210: The log currently says "Snapshot already populated"
inside the volume reconciler when checking if volume.Status.State ==
providerapi.VolumeStateAvailable; update the log message at the
log.V(1).Info(...) call to accurately reference the volume (e.g., "Volume
already available" or similar), so the message reflects volume.Status.State and
the providerapi.VolumeStateAvailable check.
- Around line 26-30: Rename the misspelled type SnapshoVolumelerOptions to
VolumeReconcilerOptions and update all references (including the constructor
parameter type used by NewVolumeReconciler) to the new name; change the type
declaration "type SnapshoVolumelerOptions struct { Pool string;
PopulatorBufferSize int64; WorkerSize int }" to "type VolumeReconcilerOptions
struct { ... }", update the NewVolumeReconciler signature and any variables,
function parameters, or tests that reference SnapshoVolumelerOptions to
VolumeReconcilerOptions, and run `go build`/tests to ensure no remaining
references.
- Around line 229-234: The code currently swallows the reconciliation error by
leaving the todo and returning nil; replace the TODO with returning the captured
err (preferably wrapped with context) so failures propagate and trigger
controller retries—for example, change the block that checks "if err != nil" to
"return fmt.Errorf(\"reconcile failed: %w\", err)" or simply "return err" inside
the same function (the one containing the reconciliation switch and the err
variable) so the controller’s rate-limited requeue behavior can act on real
errors.
🧹 Nitpick comments (4)
api/image.go (1)

30-37: Potential naming confusion: SnapshotSource field vs SnapshotSource struct type.

The field SnapshotSource *string in ImageSpec shares the same name as the SnapshotSource struct type defined in api/snapshot.go. This may cause confusion for developers navigating the codebase.

Consider renaming the field to SnapshotRef or SnapshotID to clarify that it's a string reference rather than the struct type.

💡 Suggested rename for clarity
 type ImageSpec struct {
 	Size       uint64         `json:"size"`
 	WWN        string         `json:"wwn"`
 	Limits     Limits         `json:"limits"`
 	Encryption EncryptionSpec `json:"encryption"`
 
-	SnapshotSource *string `json:"snapshotSource"`
+	SnapshotRef *string `json:"snapshotRef"`
 }
api/volume.go (1)

56-59: Consider consolidating duplicate types with api/image.go.

VolumeEncryptionSpec and VolumeAccess are nearly identical to EncryptionSpec and ImageAccess in api/image.go. If these semantically represent the same concepts, consider:

  1. Reusing the existing types, or
  2. Extracting common types to reduce duplication

This is a minor concern given the PR's scaffolding nature.

Also applies to: 68-74

internal/controllers/volume_controller.go (2)

264-278: Parameter ctx2 is misleadingly named.

The parameter is a *rados.IOContext, not a context.Context. Rename to ioCtx for consistency with reconcileVolume and deleteVolume.

Proposed fix
-func (r *VolumeReconciler) reconcileEmptyVolume(ctx context.Context, log logr.Logger, ctx2 *rados.IOContext, volume *providerapi.Volume) error {
+func (r *VolumeReconciler) reconcileEmptyVolume(ctx context.Context, log logr.Logger, ioCtx *rados.IOContext, volume *providerapi.Volume) error {
 	//Case 1: Empty Volume -> create img
 	return nil
 }

-func (r *VolumeReconciler) reconcileOSVolume(ctx context.Context, log logr.Logger, ctx2 *rados.IOContext, volume *providerapi.Volume) error {
+func (r *VolumeReconciler) reconcileOSVolume(ctx context.Context, log logr.Logger, ioCtx *rados.IOContext, volume *providerapi.Volume) error {
 	//Case 2: OS Volume -> create img, dump os on img, snapshot, create img with snap ref
 	// r.populateImage
 	return nil
 }

-func (r *VolumeReconciler) reconcileRestoredVolume(ctx context.Context, log logr.Logger, ctx2 *rados.IOContext, volume *providerapi.Volume) error {
+func (r *VolumeReconciler) reconcileRestoredVolume(ctx context.Context, log logr.Logger, ioCtx *rados.IOContext, volume *providerapi.Volume) error {
 	//Case 3: Restore volume -> create img with snap ref
 	return nil
 }

168-182: Deletion logic is a stub—finalizer removed without actual resource cleanup.

The TODO at line 174 indicates this is intentional scaffold code. Ensure actual RBD image deletion is implemented before merging.

Would you like me to help generate the RBD image deletion logic based on patterns from the snapshot or image controllers?

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd6b404 and ef1d728.

📒 Files selected for processing (7)
  • api/image.go
  • api/snapshot.go
  • api/volume.go
  • internal/controllers/common.go
  • internal/controllers/image_controller.go
  • internal/controllers/snapshot_controller.go
  • internal/controllers/volume_controller.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/controllers/image_controller.go (1)
api/snapshot.go (1)
  • SnapshotSource (43-45)
api/image.go (1)
api/snapshot.go (1)
  • SnapshotSource (43-45)
api/volume.go (1)
api/image.go (1)
  • Limits (66-66)
internal/controllers/volume_controller.go (5)
api/snapshot.go (1)
  • Snapshot (10-15)
api/image.go (1)
  • Image (10-15)
api/volume.go (1)
  • Volume (10-15)
internal/utils/utils.go (1)
  • DeleteSliceElement (11-18)
internal/rater/rater.go (1)
  • NewRater (12-16)
internal/controllers/snapshot_controller.go (1)
api/snapshot.go (4)
  • SnapshotProtectionNone (20-20)
  • SnapshotProtectionProtected (21-21)
  • SnapshotStateFailed (34-34)
  • SnapshotStateReady (33-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: run
🔇 Additional comments (7)
internal/controllers/common.go (1)

21-24: LGTM — TODO noted for future cleanup.

The TODO comment appropriately flags the function for potential removal during the refactoring effort.

api/snapshot.go (1)

13-27: LGTM — Clean API design with SnapshotSpec and SnapshotProtection.

The new SnapshotSpec structure with ImageRef and Protection fields provides clear separation of concerns. The protection semantics (none/protected) align well with RBD snapshot protection concepts.

api/volume.go (1)

10-42: LGTM — Volume API structure is well-organized.

The new Volume resource model establishes a clean 1:1 mapping as described in the PR objectives. The VolumeSource with OSVolume and SnapshotSource options provides flexibility for different volume provisioning paths.

internal/controllers/image_controller.go (3)

164-166: LGTM — Event handler updated to use SnapshotSource.

The change from SnapshotRef to SnapshotSource is consistent with the API changes in api/image.go.


361-370: LGTM — Reconcile logic updated to use SnapshotSource.

The switch case correctly uses img.Spec.SnapshotSource consistent with the updated ImageSpec.


229-230: TODO noted — unique credentials per image.

This is a valid enhancement tracked for future implementation.

internal/controllers/snapshot_controller.go (1)

198-204: LGTM!

The switch to snapshot.Spec.ImageRef aligns correctly with the new API structure.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 218 to 228
//TODO implement protection
switch snapshot.Spec.Protection {
case providerapi.SnapshotProtectionNone:
if err := imgSnap.Unprotect(); err != nil {
return fmt.Errorf("unable to protect snapshot: %w", err)
}
case providerapi.SnapshotProtectionProtected:
if err := imgSnap.Protect(); err != nil {
return fmt.Errorf("unable to protect snapshot: %w", err)
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Error message mismatch and missing validation in protection handling.

  1. Line 222: The error message says "unable to protect" but the operation is Unprotect().
  2. Line 218: The TODO comment is stale since protection is implemented below.
  3. Calling Unprotect() on a newly created snapshot (which is unprotected by default) will likely return an error from RBD.
  4. No default case handles unexpected Protection values.
Proposed fix
-	//TODO implement protection
-	switch snapshot.Spec.Protection {
-	case providerapi.SnapshotProtectionNone:
-		if err := imgSnap.Unprotect(); err != nil {
-			return fmt.Errorf("unable to protect snapshot: %w", err)
-		}
-	case providerapi.SnapshotProtectionProtected:
-		if err := imgSnap.Protect(); err != nil {
-			return fmt.Errorf("unable to protect snapshot: %w", err)
-		}
-	}
+	switch snapshot.Spec.Protection {
+	case providerapi.SnapshotProtectionNone:
+		// Newly created snapshots are unprotected by default; only unprotect if already protected
+		isProtected, err := imgSnap.IsProtected()
+		if err != nil {
+			return fmt.Errorf("failed to check snapshot protection status: %w", err)
+		}
+		if isProtected {
+			if err := imgSnap.Unprotect(); err != nil {
+				return fmt.Errorf("unable to unprotect snapshot: %w", err)
+			}
+		}
+	case providerapi.SnapshotProtectionProtected:
+		if err := imgSnap.Protect(); err != nil {
+			return fmt.Errorf("unable to protect snapshot: %w", err)
+		}
+	default:
+		return fmt.Errorf("unknown snapshot protection value: %s", snapshot.Spec.Protection)
+	}
🤖 Prompt for AI Agents
In `@internal/controllers/snapshot_controller.go` around lines 218 - 228, The
protection handling is wrong: remove the stale TODO, stop blindly calling
imgSnap.Unprotect() on a new snapshot (check snapshot.Spec.Protection and the
current protection state before calling Unprotect/Protect), fix the error
messages to reflect the actual operation (e.g., "unable to unprotect snapshot"
vs "unable to protect snapshot"), and add a default case for
snapshot.Spec.Protection to validate unexpected values and return a clear error;
locate this logic around snapshot.Spec.Protection and the calls to
imgSnap.Protect()/imgSnap.Unprotect() and update accordingly.

Comment on lines +26 to +30
type SnapshoVolumelerOptions struct {
Pool string
PopulatorBufferSize int64
WorkerSize int
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in struct name: SnapshoVolumelerOptions.

This appears to be a copy-paste artifact. Rename to VolumeReconcilerOptions for consistency with VolumeReconciler.

Proposed fix
-type SnapshoVolumelerOptions struct {
+type VolumeReconcilerOptions struct {
 	Pool                string
 	PopulatorBufferSize int64
 	WorkerSize          int
 }

Also update the constructor parameter type at line 40.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type SnapshoVolumelerOptions struct {
Pool string
PopulatorBufferSize int64
WorkerSize int
}
type VolumeReconcilerOptions struct {
Pool string
PopulatorBufferSize int64
WorkerSize int
}
🤖 Prompt for AI Agents
In `@internal/controllers/volume_controller.go` around lines 26 - 30, Rename the
misspelled type SnapshoVolumelerOptions to VolumeReconcilerOptions and update
all references (including the constructor parameter type used by
NewVolumeReconciler) to the new name; change the type declaration "type
SnapshoVolumelerOptions struct { Pool string; PopulatorBufferSize int64;
WorkerSize int }" to "type VolumeReconcilerOptions struct { ... }", update the
NewVolumeReconciler signature and any variables, function parameters, or tests
that reference SnapshoVolumelerOptions to VolumeReconcilerOptions, and run `go
build`/tests to ensure no remaining references.

Comment on lines +104 to +117
events event.Source[*providerapi.Snapshot]

pool string
populatorBufferSize int64

workerSize int
}

func (r *VolumeReconciler) Start(ctx context.Context) error {
log := r.log

reg, err := r.events.AddHandler(event.HandlerFunc[*providerapi.Snapshot](func(event event.Event[*providerapi.Snapshot]) {
r.queue.Add(event.Object.ID)
}))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Event source mismatch: subscribing to Snapshot events but reconciling Volumes.

The VolumeReconciler subscribes to event.Source[*providerapi.Snapshot] and adds snapshot IDs to the queue, but reconcileVolume fetches from volumeStore using that ID. This will cause lookups to fail or return incorrect resources.

Proposed fix

Either change the event source to Volume events:

-	events event.Source[*providerapi.Snapshot]
+	events event.Source[*providerapi.Volume]

And update the handler accordingly:

-	reg, err := r.events.AddHandler(event.HandlerFunc[*providerapi.Snapshot](func(event event.Event[*providerapi.Snapshot]) {
+	reg, err := r.events.AddHandler(event.HandlerFunc[*providerapi.Volume](func(event event.Event[*providerapi.Volume]) {
 		r.queue.Add(event.Object.ID)
 	}))

Or if the design intent is to react to snapshot events to reconcile related volumes, the handler should map snapshot to volume ID before enqueueing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
events event.Source[*providerapi.Snapshot]
pool string
populatorBufferSize int64
workerSize int
}
func (r *VolumeReconciler) Start(ctx context.Context) error {
log := r.log
reg, err := r.events.AddHandler(event.HandlerFunc[*providerapi.Snapshot](func(event event.Event[*providerapi.Snapshot]) {
r.queue.Add(event.Object.ID)
}))
events event.Source[*providerapi.Volume]
pool string
populatorBufferSize int64
workerSize int
}
func (r *VolumeReconciler) Start(ctx context.Context) error {
log := r.log
reg, err := r.events.AddHandler(event.HandlerFunc[*providerapi.Volume](func(event event.Event[*providerapi.Volume]) {
r.queue.Add(event.Object.ID)
}))

Comment on lines +207 to +210
if volume.Status.State == providerapi.VolumeStateAvailable {
log.V(1).Info("Snapshot already populated")
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log message refers to "Snapshot" but this is the Volume reconciler.

Update the log message to accurately describe the volume state.

Proposed fix
 	if volume.Status.State == providerapi.VolumeStateAvailable {
-		log.V(1).Info("Snapshot already populated")
+		log.V(1).Info("Volume already available")
 		return nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if volume.Status.State == providerapi.VolumeStateAvailable {
log.V(1).Info("Snapshot already populated")
return nil
}
if volume.Status.State == providerapi.VolumeStateAvailable {
log.V(1).Info("Volume already available")
return nil
}
🤖 Prompt for AI Agents
In `@internal/controllers/volume_controller.go` around lines 207 - 210, The log
currently says "Snapshot already populated" inside the volume reconciler when
checking if volume.Status.State == providerapi.VolumeStateAvailable; update the
log message at the log.V(1).Info(...) call to accurately reference the volume
(e.g., "Volume already available" or similar), so the message reflects
volume.Status.State and the providerapi.VolumeStateAvailable check.

Comment on lines +229 to +234
if err != nil {
//TODO
}

return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Errors from reconciliation methods are silently discarded.

The err from the reconciliation switch is never propagated, causing failures to be lost and preventing rate-limited retries.

Proposed fix
 	if err != nil {
-		//TODO
+		volume.Status.State = providerapi.VolumeStateFailed
+		if _, updateErr := r.volumeStore.Update(ctx, volume); updateErr != nil {
+			return errors.Join(err, fmt.Errorf("failed to update volume state: %w", updateErr))
+		}
+		return fmt.Errorf("failed to reconcile volume: %w", err)
 	}
 
-	return nil
+	volume.Status.State = providerapi.VolumeStateAvailable
+	if _, err = r.volumeStore.Update(ctx, volume); err != nil {
+		return fmt.Errorf("failed to update volume: %w", err)
+	}
+	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
//TODO
}
return nil
}
if err != nil {
volume.Status.State = providerapi.VolumeStateFailed
if _, updateErr := r.volumeStore.Update(ctx, volume); updateErr != nil {
return errors.Join(err, fmt.Errorf("failed to update volume state: %w", updateErr))
}
return fmt.Errorf("failed to reconcile volume: %w", err)
}
volume.Status.State = providerapi.VolumeStateAvailable
if _, err = r.volumeStore.Update(ctx, volume); err != nil {
return fmt.Errorf("failed to update volume: %w", err)
}
return nil
}
🤖 Prompt for AI Agents
In `@internal/controllers/volume_controller.go` around lines 229 - 234, The code
currently swallows the reconciliation error by leaving the todo and returning
nil; replace the TODO with returning the captured err (preferably wrapped with
context) so failures propagate and trigger controller retries—for example,
change the block that checks "if err != nil" to "return fmt.Errorf(\"reconcile
failed: %w\", err)" or simply "return err" inside the same function (the one
containing the reconciliation switch and the err variable) so the controller’s
rate-limited requeue behavior can act on real errors.

@hardikdr hardikdr added the area/storage Storage solutions and related concerns. label Jan 15, 2026
@hardikdr hardikdr added this to Roadmap Jan 15, 2026
@gonzolino gonzolino moved this to In Progress in Storage Mar 5, 2026
@gonzolino gonzolino linked an issue Mar 5, 2026 that may be closed by this pull request
@gonzolino gonzolino self-assigned this Mar 5, 2026
@gonzolino gonzolino removed this from Storage Mar 5, 2026
@gonzolino gonzolino added the enhancement New feature or request label Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/storage Storage solutions and related concerns. enhancement New feature or request size/XL

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Refactor ceph-provider API and controllers

4 participants