Run 39636 controller core logic #77
Walkthrough
Adds a Karta operator: API condition/label contracts, step-chain primitives and status helpers, a Reconciler with CRD-watch mapping, ordered reconcile steps (validate, CRD existence, Ready derivation, conditional status patch, label stamping/removal), manager bootstrap, tests, and module manifest.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
operator/internal/status.go (1)
128-137: ⚡ Quick win
Make statusChanged order-insensitive by comparing KartaStatus.Conditions via Condition.Type.
KartaStatus.Conditions is CRD-modeled as a map keyed by type (+listType=map, +listMapKey=type, patchMergeKey:"type"), so slice reordering shouldn't be treated as a meaningful change; comparing by index can trigger unnecessary status patches. Update statusChanged to compare conditions by Type (and the existing Type/Status/Reason/Message fields).
Suggested refactor
 func statusChanged(original, current *kartav1alpha1.KartaStatus) bool {
 	if len(original.Conditions) != len(current.Conditions) {
 		return true
 	}
-	for i := range original.Conditions {
-		o, c := original.Conditions[i], current.Conditions[i]
+	byType := func(in []metav1.Condition) map[string]metav1.Condition {
+		out := make(map[string]metav1.Condition, len(in))
+		for _, c := range in {
+			out[c.Type] = c
+		}
+		return out
+	}
+	origByType := byType(original.Conditions)
+	currByType := byType(current.Conditions)
+	for t, o := range origByType {
+		c, ok := currByType[t]
+		if !ok {
+			return true
+		}
 		if o.Type != c.Type || o.Status != c.Status || o.Reason != c.Reason || o.Message != c.Message {
 			return true
 		}
 	}
 	return false
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/internal/status.go` around lines 128 - 137, The statusChanged function currently compares KartaStatus.Conditions by index which treats reordering as a change; update statusChanged to be order-insensitive by building a lookup map keyed by Condition.Type (e.g., map[string]v1.Condition or the concrete condition type) from one of the slices (original or current), then iterate the other slice and for each condition.Type look up the counterpart and compare Type, Status, Reason and Message, returning true on any mismatch; also ensure you handle missing entries (treat as change) and compare lengths via map sizes to detect added/removed conditions.
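The order-insensitive comparison described above can be sketched in isolation. This is a minimal sketch, not the PR's code: `Condition` is a local stand-in for `metav1.Condition` reduced to the four compared fields, `conditionsChanged` is a hypothetical name, and condition types are assumed to be unique within a list (which the `+listType=map` marker implies).

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition, reduced to the
// fields the review comment compares.
type Condition struct {
	Type, Status, Reason, Message string
}

// conditionsChanged reports whether two condition lists differ, ignoring
// order. It assumes Type is unique within each list.
func conditionsChanged(original, current []Condition) bool {
	if len(original) != len(current) {
		return true
	}
	byType := make(map[string]Condition, len(current))
	for _, c := range current {
		byType[c.Type] = c
	}
	for _, o := range original {
		c, ok := byType[o.Type]
		if !ok || o != c {
			return true // missing counterpart or any field differs
		}
	}
	return false
}

func main() {
	a := []Condition{
		{Type: "Validated", Status: "True", Reason: "Valid"},
		{Type: "CRDExists", Status: "True", Reason: "Found"},
	}
	b := []Condition{a[1], a[0]} // same content, different order
	fmt.Println(conditionsChanged(a, b))

	b[0].Status = "False" // b holds copies, so a is unaffected
	fmt.Println(conditionsChanged(a, b))
}
```

With unique types and equal lengths, finding every original entry in the lookup map is enough to conclude that nothing was added or removed.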
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@operator/go.mod`:
- Line 78: Remove the local replace directive "replace github.com/run-ai/karta
=> ../" from the module manifest and restore a versioned dependency declaration;
instead configure local monorepo wiring via a go.work file or CI/local-only
overrides so the published module remains reproducible. Specifically, delete the
replace line in operator/go.mod and create/use a go.work that references the
local module during development (or configure your build/CI to inject
replacements), ensuring the module requires a proper semantic version of
github.com/run-ai/karta for consumers.
In `@operator/internal/reconciler.go`:
- Around line 75-80: The current error handling after calling r.crdExistsForGVK
treats any lookup error as a successful reconcile by calling
setCRDExists(&karta.Status, metav1.ConditionFalse) and returning Continue(),
which suppresses retries; instead, when r.crdExistsForGVK returns an error you
should not flip the status to False—log the error via logger.Error but propagate
the error (or return a requeue result) so the controller will retry; update the
branch in reconciler.go to avoid calling setCRDExists on transient List errors
and return the error (or ctrl.Result{Requeue: true}) rather than Continue() so
failed CRD lookups trigger a retry.
- Around line 104-107: When rootGVK(karta) returns nil (gvk == nil), remove any
stale operator-owned labels "karta/group", "karta/version", and "karta/kind"
from the object before returning Continue(); locate the gvk check in
reconciler.go and add logic to mutate the object's labels to delete those keys
and persist the change (e.g., using the reconciler's client Update or Patch),
handling update errors appropriately, then return Continue().
- Around line 172-174: crdMatchesGVK currently returns true if a version name
matches gvk.Version even when that CRD version is not served; update
crdMatchesGVK to also require the version’s served flag be true by checking
v.Served (or v.served) in the loop alongside v.Name == gvk.Version so only
served API versions count toward a match; make sure to reference the function
crdMatchesGVK and the loop over crd.Spec.Versions (v.Name and v.Served) when
applying the change.
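As a rough illustration of the served-version check requested for crdMatchesGVK, the sketch below uses local stand-in types rather than the real apiextensions structs; `crd`, `crdVersion`, `gvk`, and `crdServesGVK` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Local stand-ins for the CRD fields the check reads.
type crdVersion struct {
	Name   string
	Served bool
}

type crd struct {
	Group    string
	Kind     string
	Versions []crdVersion
}

type gvk struct{ Group, Version, Kind string }

// crdServesGVK matches only when group and kind agree and the requested
// version is both declared and served.
func crdServesGVK(c crd, want gvk) bool {
	if c.Group != want.Group || c.Kind != want.Kind {
		return false
	}
	for _, v := range c.Versions {
		if v.Name == want.Version && v.Served {
			return true
		}
	}
	return false
}

func main() {
	ray := crd{
		Group: "ray.io",
		Kind:  "RayCluster",
		Versions: []crdVersion{
			{Name: "v1alpha1", Served: false}, // declared but no longer served
			{Name: "v1", Served: true},
		},
	}
	fmt.Println(crdServesGVK(ray, gvk{"ray.io", "v1", "RayCluster"}))
	fmt.Println(crdServesGVK(ray, gvk{"ray.io", "v1alpha1", "RayCluster"}))
}
```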
⛔ Files ignored due to path filters (1)
operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (15)
operator/cmd/main.go
operator/go.mod
operator/internal/eventfilter.go
operator/internal/eventfilter_test.go
operator/internal/flow.go
operator/internal/flow_test.go
operator/internal/operator.go
operator/internal/operator_test.go
operator/internal/reconciler.go
operator/internal/reconciler_test.go
operator/internal/status.go
operator/internal/status_test.go
operator/internal/suite_test.go
pkg/api/runai/v1alpha1/conditions.go
pkg/api/runai/v1alpha1/labels.go
| sigs.k8s.io/yaml v1.6.0 // indirect | ||
| ) | ||
|
|
||
| replace github.com/run-ai/karta => ../ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether workspace-based wiring already exists and where local replace directives are committed.
set -euo pipefail
echo "== go.work presence =="
fd -HI '^go\.work$' .
echo
echo "== replace directives across repo =="
rg -n '^\s*replace\s+github\.com/run-ai/karta\s*=>\s*\.\./\s*$|^\s*replace\s+' --glob '**/go.mod'
echo
echo "== module roots and go directives =="
fd -HI '^go\.mod$' . -x sh -c 'echo "--- {}"; rg -n "^(module|go)\s+" "{}"'
Repository: run-ai/karta
Length of output: 347
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== operator/go.mod: relevant sections =="
nl -ba operator/go.mod | sed -n '1,120p'
echo
echo "== other go.mod files with replace directives =="
rg -n '^\s*replace\s+' --glob '**/go.mod' . || true
Repository: run-ai/karta
Length of output: 158
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== operator/go.mod (lines 1-120) =="
cat -n operator/go.mod | sed -n '1,120p'
echo
echo "== operator/go.mod (lines 60-90) =="
cat -n operator/go.mod | sed -n '60,90p'
echo
echo "== other go.mod files with replace directives =="
rg -n '^\s*replace\s+' --glob '**/go.mod' . || true
echo
echo "== other occurrences of github.com/run-ai/karta in operator/go.mod =="
rg -n 'github\.com/run-ai/karta' operator/go.mod
Repository: run-ai/karta
Length of output: 5142
Avoid committing a local replace path in operator/go.mod
operator/go.mod hard-codes replace github.com/run-ai/karta => ../ and the repo has no go.work, so this relative path override becomes part of the published module and can break external consumers/builds where ../ doesn’t exist. Use go.work (or CI/local-only overrides) for monorepo wiring, and keep module manifests versioned/reproducible.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@operator/internal/status.go`:
- Around line 112-121: The current code uses r.Status().Patch with
client.RawPatch(types.MergePatchType, ...) and a full conditions array
(karta.Status.Conditions) which causes JSON MergePatch to replace the array
wholesale; replace this with a conflict-aware status update: fetch the latest
Karta (get), merge conditions by their "type" key into karta.Status.Conditions
(preserving other writers' per-type entries), then call r.Status().Update(ctx,
karta); wrap in a retry on conflict (detect apierrors.IsConflict and
requeue/loop to refetch+merge+update) so concurrent writers surface conflicts
instead of clobbering; stop using client.RawPatch(types.MergePatchType, ...) for
conditions.
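The "merge conditions by their type key" step can be sketched without any Kubernetes dependencies. This is a minimal sketch under stated assumptions: `Condition` is a local stand-in for `metav1.Condition`, and `mergeByType` is a hypothetical helper that would run on the freshly fetched object before the status update.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// mergeByType overlays ours onto latest: entries with the same Type are
// replaced, new types are appended, and entries that exist only in latest
// (written by other controllers) are preserved.
func mergeByType(latest, ours []Condition) []Condition {
	merged := make([]Condition, len(latest))
	copy(merged, latest)

	index := make(map[string]int, len(merged))
	for i, c := range merged {
		index[c.Type] = i
	}
	for _, c := range ours {
		if i, ok := index[c.Type]; ok {
			merged[i] = c
			continue
		}
		index[c.Type] = len(merged)
		merged = append(merged, c)
	}
	return merged
}

func main() {
	latest := []Condition{
		{Type: "Validated", Status: "False", Reason: "Invalid"},
		{Type: "Foreign", Status: "True", Reason: "SetByAnotherWriter"},
	}
	ours := []Condition{
		{Type: "Validated", Status: "True", Reason: "Valid"},
		{Type: "CRDExists", Status: "True", Reason: "Found"},
	}
	for _, c := range mergeByType(latest, ours) {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

In a real reconciler this merge would sit inside the refetch-and-retry loop the comment describes, so a conflict on update triggers another fetch and merge rather than a blind overwrite.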
Return StopWithError on CRD list failure, check that the CRD version is served, and remove Karta GVK labels when the GVK is nil.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
operator/internal/status.go (1)
111-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add optimistic locking to the status patch to prevent concurrent condition loss.
Line 114 uses client.MergeFrom(base) without an optimistic lock, so concurrent status writers can still overwrite status.conditions updates instead of surfacing a conflict.
Suggested patch
 import (
 	"context"
 	"fmt"
 	kartav1alpha1 "github.com/run-ai/karta/pkg/api/runai/v1alpha1"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/apimachinery/pkg/api/equality"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"sigs.k8s.io/controller-runtime/pkg/client"
 )
@@
-	if err := r.Status().Patch(ctx, karta, client.MergeFrom(base)); err != nil {
+	patch := client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{})
+	if err := r.Status().Patch(ctx, karta, patch); err != nil {
+		if apierrors.IsConflict(err) {
+			return fmt.Errorf("patch status conflict: %w", err)
+		}
 		return fmt.Errorf("patch status: %w", err)
 	}
#!/bin/bash
set -euo pipefail
# Verify status patch callsites and whether optimistic lock is used.
rg -n -C3 'Status\(\)\.Patch\(' operator/internal
rg -n -C3 'MergeFromWithOptimisticLock|MergeFrom\(' operator/internal/status.go
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/internal/status.go` around lines 111 - 115, the status patch currently uses client.MergeFrom(base) in the r.Status().Patch call (after the equality.Semantic.DeepEqual check between base.Status and karta.Status), which can allow concurrent writes to overwrite conditions; change the patch to use optimistic locking by replacing client.MergeFrom(base) with client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{}) so conflicts are surfaced, and ensure the controller-runtime client import is available and any error handling remains the same for the r.Status().Patch call.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
operator/internal/status.go (1)
114-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use optimistic locking for status patches to avoid lost concurrent updates.
At line 114, client.MergeFrom(base) does not enforce resourceVersion preconditions, so concurrent status writers can still overwrite each other. Use optimistic locking so conflicts surface and requeue.
🔧 Proposed fix
-	if err := r.Status().Patch(ctx, karta, client.MergeFrom(base)); err != nil {
+	patch := client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{})
+	if err := r.Status().Patch(ctx, karta, patch); err != nil {
 		return fmt.Errorf("patch status: %w", err)
 	}
#!/bin/bash
set -euo pipefail
# Verify current status patch strategy and whether optimistic lock is used.
rg -n --type=go 'Status\(\)\.Patch\(' operator/internal
rg -n --type=go 'MergeFromWithOptimisticLock|MergeFromWithOptions|MergeFrom\(' operator/internal/status.go -C2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/internal/status.go` around lines 114 - 115, the current status patch uses client.MergeFrom(base), which does not enforce resourceVersion preconditions and can silently overwrite concurrent updates; change the patch to the controller-runtime optimistic-locking variant, client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{}), when calling r.Status().Patch so conflicts surface as errors and the reconcile requeues; update the call site that currently uses client.MergeFrom(base) and handle the resulting conflict (requeue) error path.
🧹 Nitpick comments (2)
operator/internal/operator_test.go (1)
72-72: 💤 Low value
Consider removing the "Q4" comment prefix.
The comment includes "Q4", which appears to be a reference to a planning phase or issue tracker. Consider simplifying it to just describe the test's purpose.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/internal/operator_test.go` at line 72, remove the "Q4:" planning/issue-tracker prefix from the test comment so it reads simply "verify GenerationChangedPredicate semantics on CRDs."; update the comment near the test that references GenerationChangedPredicate in operator/internal/operator_test.go to be concise and descriptive without the "Q4" tag.
operator/internal/status_test.go (1)
71-71: ⚖️ Poor tradeoff
Consider avoiding time.Sleep in timestamp comparison tests.
The tests use time.Sleep(2 * time.Millisecond) before comparing LastTransitionTime values. This can be flaky in CI environments with low time resolution or under heavy load. Consider using a time-mocking approach (e.g., injecting a clock interface) or increasing the sleep duration significantly (e.g., 100ms+) to reduce flakiness risk.
Also applies to: 81-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/internal/status_test.go` at line 71, tests in operator/internal/status_test.go use time.Sleep(2 * time.Millisecond) before comparing LastTransitionTime, which is flaky; replace the sleeps by making the test deterministic: either inject a clock interface (replace real time.Now usage in the code under test with a Clock and in tests use a fake/static Clock) or explicitly set LastTransitionTime values in the test before comparisons; update the tests that reference LastTransitionTime (the assertions around LastTransitionTime at the locations where time.Sleep is currently used) to compare against the injected/static times, or use assert/require.WithinDuration with a larger tolerance (e.g., 100ms) if changing the code to accept a Clock is not feasible.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@operator/internal/reconciler.go`:
- Around line 119-129: The current MergePatch in stepEnsureLabels builds
`merged` from all existing labels (`current`) then marshals it, which will
clobber concurrent non-operator labels; instead, construct a new map (e.g.,
`operatorLabels`) containing only operator-owned label keys (those matching the
`karta/` prefix) copied from `current`, apply/override those keys from
`desired`, and marshal only `metadata.labels: operatorLabels` for the Patch;
finally call `r.Patch(..., client.RawPatch(types.MergePatchType, patchBytes))`
with that trimmed patch so non-operator labels are not touched.
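One way to build the trimmed patch body is sketched below, using only the standard library. It is a sketch under assumptions: `buildLabelPatch` is a hypothetical helper, `desired` is assumed to contain only `karta/`-prefixed keys, and stale operator-owned keys are removed by setting them to `null`, which is how JSON merge patch (RFC 7386) expresses deletion.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ownedPrefix marks the label keys the operator considers its own.
const ownedPrefix = "karta/"

// buildLabelPatch returns a JSON merge patch that touches only
// operator-owned labels: desired keys are set, and owned keys present in
// current but absent from desired are set to null (deleted). Labels owned
// by anyone else never appear in the patch.
func buildLabelPatch(current, desired map[string]string) ([]byte, error) {
	labels := map[string]*string{}
	for k := range current {
		if !strings.HasPrefix(k, ownedPrefix) {
			continue
		}
		if _, keep := desired[k]; !keep {
			labels[k] = nil // marshals as null, i.e. remove the key
		}
	}
	for k, v := range desired {
		v := v
		labels[k] = &v
	}
	return json.Marshal(map[string]any{
		"metadata": map[string]any{"labels": labels},
	})
}

func main() {
	current := map[string]string{
		"team":        "ml", // not ours, must stay out of the patch
		"karta/group": "ray.io",
		"karta/kind":  "OldKind",
	}
	desired := map[string]string{"karta/kind": "RayCluster"}

	patch, err := buildLabelPatch(current, desired)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
	// prints {"metadata":{"labels":{"karta/group":null,"karta/kind":"RayCluster"}}}
}
```

Because the patch names only operator-owned keys, a label added concurrently by another writer is neither read nor rewritten by this request.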
| flag.StringVar(&leaderElectionID, "leader-election-id", "karta-operator.run.ai", | ||
| "Name of the resource used for leader election.") | ||
|
|
||
| zapOpts := zap.Options{Development: false} |
| import ( | ||
| "flag" | ||
| "fmt" | ||
| "github.com/run-ai/karta/operator/internal" |
There was a problem hiding this comment.
Semi-random location for this comment: why is the package named internal?
Since 100% of the code lives in internal, nothing outside this module tree can import it.
That rules out reuse of, for example, error types to assert on, utility functions, constants, and shared status updates.
Unless there is a specific reason to keep it internal, rename the top-level directory to pkg or similar.
| } | ||
|
|
||
| func main() { | ||
| if err := run(); err != nil { |
There was a problem hiding this comment.
Why is run() needed? We can write this directly in main; the wrapper just bloats the code.
| // StopWithError signals that the step failed. The manager will requeue with | ||
| // exponential back-off. | ||
| func StopWithError(err error) StepResult { | ||
| return StepResult{result: ctrl.Result{Requeue: true}, err: err, continueReconcile: false} |
There was a problem hiding this comment.
StopWithError sets both Requeue: true and a non-nil error. When err != nil, controller-runtime ignores Result and requeues via the rate limiter anyway, so Requeue: true is redundant.
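A simplified version of the helper might carry only the error. The sketch below is an assumption-laden illustration: `Result` and `StepResult` are local stand-ins mirroring the shapes quoted in the diff, not the real `ctrl.Result` or the PR's types.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a local stand-in for ctrl.Result.
type Result struct{ Requeue bool }

// StepResult mirrors the shape quoted in the diff above.
type StepResult struct {
	result            Result
	err               error
	continueReconcile bool
}

// errBoom is a sample error used by the demonstration below.
var errBoom = errors.New("list CRDs: connection refused")

// StopWithError stops the chain and reports the error. The Result is left
// at its zero value, since the retry is driven by the returned error.
func StopWithError(err error) StepResult {
	return StepResult{err: err}
}

func main() {
	r := StopWithError(errBoom)
	fmt.Println(r.err, r.result.Requeue, r.continueReconcile)
}
```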
| } | ||
|
|
||
| // shortCircuit returns true when the step chain should stop. | ||
| func shortCircuit(r StepResult) bool { |
There was a problem hiding this comment.
This should be an instance method:
| func shortCircuit(r StepResult) bool { | |
| func (r StepResult) ShortCircuitReconcileFlow() bool { | |
| return !r.continueReconcile | |
| } |
| logger := log.FromContext(ctx).WithValues("karta", karta.Name) | ||
| base := karta.DeepCopy() | ||
|
|
||
| steps := []StepFn{ |
There was a problem hiding this comment.
IMO the step chain is over-engineered.
It wraps a fixed 5-step linear pipeline that never branches dynamically.
- The function signature becomes bloated: a superset of everything the individual steps need.
- It is not clear what each step does.
- The stepPatchStatusWith generator callback is confusing.
Some of these steps don't even need their own function (such as stepDeriveReady).
They are very small functions with very explicit code that is not reused anywhere else.
TL;DR:
- Remove the StepFn concept.
- Inline the methods; it's easier to read.
| func Stop() StepResult { | ||
| return StepResult{continueReconcile: false} | ||
| } |
There was a problem hiding this comment.
Nothing calls Stop(). Do we need it?
| }) | ||
| } | ||
|
|
||
| func setCRDExists(status *kartav1alpha1.KartaStatus, s metav1.ConditionStatus) { |
There was a problem hiding this comment.
Every call site passes setCRDExists(&...) instead of the value itself. Why not pass it like this?
e.g. setCRDExists(&karta.Status, metav1.ConditionFalse)
| func setCRDExists(status *kartav1alpha1.KartaStatus, s metav1.ConditionStatus) { | |
| func setCRDExists(status kartav1alpha1.KartaStatus, s metav1.ConditionStatus) { |
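Before switching to the by-value signature, it may be worth checking how Go copies struct arguments, since a setter that appends to a slice field of a copied struct does not change the caller's value. The sketch below uses hypothetical stand-in types (`Status`, `Condition`) and is not the PR's code.

```go
package main

import "fmt"

// Condition and Status are local stand-ins for the API types.
type Condition struct{ Type, Status string }

type Status struct{ Conditions []Condition }

// setByValue receives a copy of the struct; the append updates only the
// copy's slice header, so the caller never sees the new condition.
func setByValue(s Status, c Condition) {
	s.Conditions = append(s.Conditions, c)
}

// setByPointer mutates the caller's struct through the pointer.
func setByPointer(s *Status, c Condition) {
	s.Conditions = append(s.Conditions, c)
}

func main() {
	var st Status
	cond := Condition{Type: "CRDExists", Status: "False"}

	setByValue(st, cond)
	fmt.Println(len(st.Conditions)) // 0

	setByPointer(&st, cond)
	fmt.Println(len(st.Conditions)) // 1
}
```

If the by-value form is preferred for readability, the function would need to return the updated status so the caller can assign it back.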
| const ( | ||
| // LabelRootGroup is the API group of the root component | ||
| // (e.g. "ray.io", "kubeflow.org"). | ||
| LabelRootGroup = "karta/group" | ||
|
|
||
| // LabelRootVersion is the API version of the root component | ||
| // (e.g. "v1", "v1alpha1"). | ||
| LabelRootVersion = "karta/version" | ||
|
|
||
| // LabelRootKind is the kind of the root component | ||
| // (e.g. "RayCluster", "PyTorchJob"). | ||
| LabelRootKind = "karta/kind" | ||
| ) |
There was a problem hiding this comment.
What is the architectural justification for maintaining three separate labels if they are exclusively used within the same list/get API calls? To reduce metadata overhead and simplify our query logic, I highly recommend consolidating them into a single karta/gvk label.
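If the three labels are consolidated, the combined value still has to satisfy the Kubernetes label-value syntax: at most 63 characters, empty or starting and ending with an alphanumeric character, with only alphanumerics, `-`, `_`, and `.` in between. The sketch below is a hypothetical check (`validLabelValue` is not part of the PR) showing that a slash-separated group/version/kind string would be rejected, so a different separator would be needed.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE encodes the label-value character rules: empty, or
// alphanumeric at both ends with '-', '_', '.' allowed in between.
var labelValueRE = regexp.MustCompile(`^([A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?)?$`)

// validLabelValue reports whether v is acceptable as a label value.
func validLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRE.MatchString(v)
}

func main() {
	fmt.Println(validLabelValue("ray.io/v1/RayCluster")) // false: '/' is not allowed
	fmt.Println(validLabelValue("ray.io_v1_RayCluster")) // true: '_' is one possible separator
	fmt.Println(validLabelValue("RayCluster"))           // true
}
```

The underscore separator here is only an illustration; the length limit also deserves attention, because long API groups and kinds can add up quickly in a single value.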
Co-authored-by: Yuval Grauer <108901541+yuval-gr@users.noreply.github.com> Signed-off-by: shaked-bouktus <157353909+shaked-bouktus@users.noreply.github.com>
| const ( | ||
| // LabelRootGroup is the API group of the root component | ||
| // (e.g. "ray.io", "kubeflow.org"). | ||
| LabelRootGroup = "karta/group" |
There was a problem hiding this comment.
Labels should be named:
run.ai/karta-group
run.ai/karta-version
run.ai/karta-kind
| // stepValidateKarta runs the Karta spec validator and writes Validated. | ||
| func (r *Reconciler) stepValidateKarta(_ context.Context, logger logr.Logger, karta *kartav1alpha1.Karta) StepResult { | ||
| if err := kartav1alpha1.NewKartaValidator(karta).Validate(); err != nil { | ||
| logger.V(1).Info("Karta spec validation failed", "error", err.Error()) |
There was a problem hiding this comment.
Pass err directly: logr extracts the message and you keep the error type for structured logging. It's also what the controller-runtime examples do.
| return nil | ||
| } | ||
|
|
||
| func buildCondition(t kartav1alpha1.ConditionType, status metav1.ConditionStatus, reason, message string) metav1.Condition { |
There was a problem hiding this comment.
I would add
cond.ObservedGeneration = karta.Generation
Without it, a Validated=False from generation 3 looks the same as one from generation 12, and you can't tell whether it's stale.
This convention is used in cert-manager, cluster-api, etc.
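To make the staleness argument concrete, here is a minimal sketch with a local stand-in for `metav1.Condition`; `isStale` is a hypothetical helper showing how a reader of the status could use the field once it is populated.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition, reduced to the
// fields relevant here.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// isStale reports whether the condition was computed against an older
// spec generation than the object currently has.
func isStale(c Condition, objectGeneration int64) bool {
	return c.ObservedGeneration < objectGeneration
}

func main() {
	cond := Condition{Type: "Validated", Status: "False", ObservedGeneration: 3}

	fmt.Println(isStale(cond, 12)) // true: the spec changed after this verdict
	fmt.Println(isStale(cond, 3))  // false: the verdict reflects the current spec
}
```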
| ) | ||
|
|
||
| func setValidated(status *kartav1alpha1.KartaStatus, s metav1.ConditionStatus) { | ||
| upsertConditions(&status.Conditions, map[kartav1alpha1.ConditionType]metav1.Condition{ |
There was a problem hiding this comment.
Why not use apimeta.SetStatusCondition?
It handles LastTransitionTime and ObservedGeneration for free and preserves foreign conditions.
| // stepPatchStatusWith returns a step that flushes status to the cluster, | ||
| // closing over the whole-object snapshot taken at the start of reconcile so | ||
| // that only the fields that actually changed are sent in the patch body. | ||
| func stepPatchStatusWith(r *Reconciler, base *kartav1alpha1.Karta) StepFn { |
| // CRs by validating their spec and verifying that the referenced | ||
| // CustomResourceDefinition is present in the cluster. | ||
| package main | ||
|
|
There was a problem hiding this comment.
There is no EventRecorder. Today, if validation fails, the user has to dig into status.conditions; with events, kubectl describe karta foo shows the failure at the bottom, which is where everyone looks first. The kubebuilder template ships it by default.
| // controller-runtime metrics and logs. | ||
| ControllerName = "karta-operator" | ||
|
|
||
| rateLimiterBaseDelay = 500 * time.Millisecond |
There was a problem hiding this comment.
Let's make this configurable through an environment variable so we can control it.
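One possible shape for this, sketched with the standard library only. The variable name `KARTA_RATE_LIMITER_BASE_DELAY` and both helper names are assumptions made for the example; the 500ms default is taken from the constant in the diff.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// defaultBaseDelay matches the constant currently hard-coded in the PR.
const defaultBaseDelay = 500 * time.Millisecond

// parseBaseDelay converts a raw setting into a duration, falling back to
// the default when the value is unset, malformed, or not positive.
func parseBaseDelay(raw string, set bool) time.Duration {
	if !set {
		return defaultBaseDelay
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultBaseDelay
	}
	return d
}

// baseDelayFromEnv reads the setting from the named environment variable.
func baseDelayFromEnv(name string) time.Duration {
	raw, set := os.LookupEnv(name)
	return parseBaseDelay(raw, set)
}

func main() {
	// Hypothetical variable name, used only for this example.
	fmt.Println(baseDelayFromEnv("KARTA_RATE_LIMITER_BASE_DELAY"))
	fmt.Println(parseBaseDelay("2s", true))
	fmt.Println(parseBaseDelay("not-a-duration", true))
}
```

Falling back silently keeps startup robust; logging a warning on a malformed value would be a reasonable addition in the real operator.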
| setConditions(status, allTrue) | ||
| before := transitionTimes(status.Conditions) | ||
|
|
||
| time.Sleep(2 * time.Millisecond) |
There was a problem hiding this comment.
I don't like time.Sleep here; it might be flaky. Let's use Eventually instead.
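Besides polling with Eventually, another way to drop the sleep is to inject the clock so the test controls time directly. The sketch below is a simplified stand-in, not the PR's upsert logic: `Clock`, `steppingClock`, and `setCondition` are hypothetical names, and the transition rule assumed here is that LastTransitionTime moves only when Status changes.

```go
package main

import (
	"fmt"
	"time"
)

// Clock abstracts time.Now so tests can supply deterministic timestamps.
type Clock func() time.Time

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time
}

// steppingClock returns a fake clock that advances one second per call.
func steppingClock() Clock {
	current := time.Unix(0, 0)
	return func() time.Time {
		current = current.Add(time.Second)
		return current
	}
}

// setCondition upserts c; the timestamp changes only on a status flip.
func setCondition(conds []Condition, c Condition, now Clock) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now()
		}
		return conds
	}
	c.LastTransitionTime = now()
	return append(conds, c)
}

func main() {
	clock := steppingClock()
	conds := setCondition(nil, Condition{Type: "Validated", Status: "True"}, clock)
	first := conds[0].LastTransitionTime

	conds = setCondition(conds, Condition{Type: "Validated", Status: "True"}, clock)
	fmt.Println(conds[0].LastTransitionTime.Equal(first)) // true: no flip, no new timestamp

	conds = setCondition(conds, Condition{Type: "Validated", Status: "False"}, clock)
	fmt.Println(conds[0].LastTransitionTime.After(first)) // true: flip moved the timestamp
}
```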
What does this PR do?
Karta controller core logic
Related issue(s)
#67
#68
#69
#70