Add KEDA-based autoscaling support with scale-to-zero by danielnyari · Pull Request #74 · danielnyari/flokoa

danielnyari · 2026-02-17T20:51:47Z

Summary

This PR adds KEDA-based autoscaling capabilities to agents, enabling dynamic scaling based on custom metrics and scale-to-zero functionality. When an agent specifies scaling configuration, the operator automatically creates and manages a KEDA ScaledObject targeting the agent's Deployment.

Key Changes

New Scaling API Types (agent_types.go):
- Added ScalingSpec to configure KEDA autoscaling parameters (min/max replicas, cooldown, polling interval)
- Added ScalingTrigger to define KEDA triggers (Prometheus, CPU, cron, etc.) with metadata and optional authentication
- Added ScalingTriggerAuth to reference KEDA TriggerAuthentication resources
ScaledObject Builder (scaledobject.go):
- Pure function BuildScaledObject() that constructs unstructured KEDA ScaledObjects from agent configuration
- Handles optional fields (only sets values when non-nil) and converts Go types to unstructured format
- Helper function ScaledObjectName() for consistent naming convention
Repository Layer (scaledobject.go, scaledobject_fake.go):
- ScaledObjectRepoImpl for Kubernetes API interactions using unstructured objects (avoids hard KEDA dependency)
- FakeScaledObjectRepo for testing with in-memory storage
- Added ScaledObjectRepo interface to interfaces.go
Reconciliation Logic (reconcile.go):
- New reconcileScaledObject() method that:
  - Creates ScaledObject when agent.Spec.Scaling is set
  - Updates existing ScaledObject when scaling config changes
  - Deletes ScaledObject when scaling is removed
  - Sets appropriate status conditions (ScalingReady) and updates agent.Status.ScaledObjectName
- Gracefully skips if ScaledObject repo is not configured (optional feature)
Status Management (status.go):
- Added ConditionTypeScalingReady condition type
- Added reason constants: ReasonScaledObjectReady, ReasonScaledObjectRemoved, ReasonScaledObjectFailed
RBAC & Configuration:
- Updated operator RBAC role to allow create/delete/update/patch/watch on keda.sh/scaledobjects
- Updated controller-gen markers for ScaledObject management
- Added example Agent manifests (agent_v1alpha1_agent_keda.yaml) demonstrating Prometheus and cron-based scaling
Comprehensive Tests:
- scaledobject_test.go: Tests for ScaledObject builder covering basic fields, triggers with auth, named triggers, optional field omission, and multiple triggers
- reconcile_scaling_test.go: Integration tests for reconciliation lifecycle (create, update, delete, skip when repo nil)

Implementation Details

- Uses unstructured objects for ScaledObject to avoid a hard dependency on the KEDA Go module
- ScaledObject failures are non-fatal: agents continue running without autoscaling if KEDA is unavailable
- Follows existing patterns: builder pattern for object construction, repository pattern for persistence, condition-based status tracking
- Supports all major KEDA trigger types through flexible metadata maps
- Properly handles owner references for garbage collection
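
For readers unfamiliar with the unstructured approach, the builder described above can be sketched with plain maps, which is what an unstructured object carries internally. This is a minimal illustration and not the PR's BuildScaledObject(): the struct definitions are simplified stand-ins for the API types, the lower-case function names are hypothetical, and it assumes the target Deployment shares the agent's name. The `-scaler` suffix is taken from the test in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScalingTrigger and ScalingSpec are simplified stand-ins for the PR's API
// types; the real definitions in agent_types.go may differ.
type ScalingTrigger struct {
	Type     string
	Name     string
	Metadata map[string]string
}

type ScalingSpec struct {
	MinReplicaCount *int32
	MaxReplicaCount *int32
	CooldownPeriod  *int32
	PollingInterval *int32
	Triggers        []ScalingTrigger
}

// scaledObjectName follows the "<agent>-scaler" convention seen in the tests.
func scaledObjectName(agentName string) string { return agentName + "-scaler" }

// buildScaledObject returns the map content an unstructured ScaledObject would hold.
func buildScaledObject(agentName, namespace string, s ScalingSpec) map[string]interface{} {
	spec := map[string]interface{}{
		// Assumption: the Deployment is named after the agent.
		"scaleTargetRef": map[string]interface{}{"name": agentName},
	}
	// Optional fields are written only when set, so KEDA defaults apply otherwise.
	// Values are widened to int64 because unstructured content must use
	// JSON-compatible Go types.
	setOptional := func(key string, v *int32) {
		if v != nil {
			spec[key] = int64(*v)
		}
	}
	setOptional("minReplicaCount", s.MinReplicaCount)
	setOptional("maxReplicaCount", s.MaxReplicaCount)
	setOptional("cooldownPeriod", s.CooldownPeriod)
	setOptional("pollingInterval", s.PollingInterval)

	triggers := make([]interface{}, 0, len(s.Triggers))
	for _, t := range s.Triggers {
		metadata := make(map[string]interface{}, len(t.Metadata))
		for k, v := range t.Metadata {
			metadata[k] = v
		}
		trigger := map[string]interface{}{"type": t.Type, "metadata": metadata}
		if t.Name != "" {
			trigger["name"] = t.Name
		}
		triggers = append(triggers, trigger)
	}
	spec["triggers"] = triggers

	return map[string]interface{}{
		"apiVersion": "keda.sh/v1alpha1",
		"kind":       "ScaledObject",
		"metadata": map[string]interface{}{
			"name":      scaledObjectName(agentName),
			"namespace": namespace,
		},
		"spec": spec,
	}
}

func main() {
	minReplicas, maxReplicas := int32(0), int32(5)
	obj := buildScaledObject("test-agent", "default", ScalingSpec{
		MinReplicaCount: &minReplicas,
		MaxReplicaCount: &maxReplicas,
		Triggers: []ScalingTrigger{
			{Type: "prometheus", Metadata: map[string]string{"threshold": "100"}},
		},
	})
	out, err := json.MarshalIndent(obj, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In the operator itself, a map like this would presumably be wrapped in an unstructured object and given an owner reference to the Agent; both steps are omitted here to keep the sketch free of external modules.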

https://claude.ai/code/session_018ctEFbksh15TJzfJwhBNvs

Introduces KEDA ScaledObject management for Agent CRDs, enabling event-driven autoscaling including scale-to-zero for idle agents. This addresses the real cost problem of running unused agent replicas without building custom autoscaling infrastructure.

Changes:
- Add ScalingSpec to AgentSpec with triggers, min/max replicas, cooldown period, and polling interval configuration
- Create ScaledObject builder using unstructured objects to avoid hard KEDA module dependency
- Add ScaledObjectRepo with create/update/delete via controller-runtime
- Integrate ScaledObject reconciliation into agent app service with graceful handling when KEDA is not installed
- Add ScalingReady condition for observability
- Add RBAC for keda.sh/scaledobjects
- Update CRD schema and Helm chart
- Include sample configs for Prometheus and cron-based scaling

https://claude.ai/code/session_018ctEFbksh15TJzfJwhBNvs

Copilot

Pull request overview

This PR adds KEDA-based autoscaling support to the Flokoa operator, enabling agents to dynamically scale based on custom metrics and scale to zero when idle. The implementation uses unstructured Kubernetes objects to avoid a hard dependency on the KEDA Go module, making KEDA an optional cluster component.

Changes:

- Added ScalingSpec, ScalingTrigger, and ScalingTriggerAuth types to the Agent CRD API for configuring KEDA autoscaling
- Implemented ScaledObject builder and repository layers using unstructured objects to create/update/delete KEDA resources
- Integrated ScaledObject reconciliation into the agent reconciliation loop with non-fatal error handling
- Added comprehensive unit tests for builder and reconciliation logic with fake repository implementations

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `operator/api/v1alpha1/agent_types.go` | Defines ScalingSpec, ScalingTrigger, and ScalingTriggerAuth API types with kubebuilder validation markers |
| `operator/api/v1alpha1/zz_generated.deepcopy.go` | Auto-generated DeepCopy methods for new scaling-related types |
| `operator/internal/infra/builder/scaledobject.go` | Pure function to build unstructured KEDA ScaledObjects from agent configuration |
| `operator/internal/infra/builder/scaledobject_test.go` | Unit tests covering ScaledObject construction with various trigger configurations |
| `operator/internal/infra/repo/scaledobject.go` | Repository implementation for CRUD operations on KEDA ScaledObjects using unstructured client |
| `operator/internal/infra/repo/fakes/scaledobject_fake.go` | In-memory fake implementation for testing ScaledObject operations |
| `operator/internal/infra/repo/interfaces.go` | Added ScaledObjectRepo interface to repository layer |
| `operator/internal/app/agent/reconcile.go` | Added reconcileScaledObject method with non-fatal error handling and KEDA CRD detection |
| `operator/internal/app/agent/reconcile_scaling_test.go` | Integration tests for ScaledObject lifecycle (create, update, delete) |
| `operator/internal/domain/agent/status.go` | Added ScalingReady condition type and related reason constants |
| `operator/internal/controller/agent_controller.go` | Added RBAC markers for ScaledObject permissions |
| `operator/cmd/main.go` | Wired ScaledObjectRepoImpl into agent service dependencies |
| `operator/config/samples/agent_v1alpha1_agent_keda.yaml` | Example Agent manifests demonstrating Prometheus and cron-based scaling |
| `operator/config/rbac/role.yaml` | Generated RBAC role with ScaledObject permissions |
| `operator/config/crd/bases/agent.flokoa.ai_agents.yaml` | Generated CRD with scaling spec and status fields |
| `operator/charts/flokoa/files/crds/agent.flokoa.ai_agents.yaml` | Helm chart CRD with scaling support |


Copilot · 2026-02-17T20:56:20Z

+			"spec": spec,
+		},
+	}
+


The constructed unstructured object should have its GVK explicitly set using SetGroupVersionKind. While the apiVersion and kind are present in the Object map, explicitly setting the GVK ensures proper handling by the Kubernetes client, especially for owner reference resolution. Consider adding obj.SetGroupVersionKind(ScaledObjectGVK) after constructing the object.

Suggested change

obj.SetGroupVersionKind(ScaledObjectGVK)

Copilot · 2026-02-17T20:56:21Z

+
+	// Preserve the existing metadata (resourceVersion, etc.) and update spec + labels
+	existing.Object["spec"] = desired.Object["spec"]
+	existing.SetLabels(desired.GetLabels())


When updating an existing ScaledObject, consider preserving annotations in addition to labels. KEDA and other controllers may add annotations to ScaledObjects that should be preserved across updates. The current implementation only preserves labels but directly overwrites the spec, which could lose important annotations added by KEDA itself.

Suggested change

-	existing.SetLabels(desired.GetLabels())
+	existing.SetLabels(desired.GetLabels())
+	// Preserve existing annotations while merging in desired annotations.
+	// This avoids dropping annotations that may have been added by KEDA or other controllers.
+	desiredAnnotations := desired.GetAnnotations()
+	if len(desiredAnnotations) > 0 {
+		existingAnnotations := existing.GetAnnotations()
+		if existingAnnotations == nil {
+			existingAnnotations = map[string]string{}
+		}
+		for k, v := range desiredAnnotations {
+			existingAnnotations[k] = v
+		}
+		existing.SetAnnotations(existingAnnotations)
+	}
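
The merge in this suggestion can also be written as a small pure function, which is easier to unit test than inline map handling. The following is a sketch only; `mergeAnnotations` and the annotation keys are hypothetical names, not part of the PR.

```go
package main

import "fmt"

// mergeAnnotations overlays desired onto existing without dropping keys that
// exist only on the live object. It returns a new map and mutates neither input.
func mergeAnnotations(existing, desired map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(desired))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range desired {
		merged[k] = v // desired wins on conflicting keys
	}
	return merged
}

func main() {
	// Hypothetical keys: one set by another controller, one set by the operator.
	live := map[string]string{"example.com/added-by-other-controller": "keep-me"}
	want := map[string]string{"example.com/owner": "agent-operator"}
	fmt.Println(mergeAnnotations(live, want))
}
```

Because the update in the diff starts from the fetched `existing` object and only assigns `spec` and labels, annotations already on the live object appear to be retained as written; a merge like this would matter mainly once the builder starts emitting annotations of its own.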

Copilot · 2026-02-17T20:56:21Z

+			}
+			agent.Status.ScaledObjectName = ""
+			agentdomain.SetCondition(agent, agentdomain.ConditionTypeScalingReady,
+				metav1.ConditionFalse, agentdomain.ReasonScaledObjectRemoved, "ScaledObject removed")


The condition should be set to True (not False) when a ScaledObject is successfully removed, or the condition should be removed entirely. Setting status to ConditionFalse with reason ReasonScaledObjectRemoved is semantically confusing - "ScalingReady=False" suggests scaling is not working, but removal is actually the correct desired state when scaling is disabled. Consider either setting it to True with a message like "Scaling disabled" or removing the condition entirely when scaling is not configured.

Suggested change

-				metav1.ConditionFalse, agentdomain.ReasonScaledObjectRemoved, "ScaledObject removed")
+				metav1.ConditionTrue, agentdomain.ReasonScaledObjectRemoved, "Scaling disabled: ScaledObject removed")
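
The other option raised in this comment, dropping the condition entirely when scaling is not configured, could look roughly like the sketch below. `Condition` is a simplified stand-in for `metav1.Condition` and `removeCondition` is a hypothetical helper; in the operator, `meta.RemoveStatusCondition` from apimachinery would likely be the natural choice.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// removeCondition returns the conditions with every entry of the given type
// filtered out, leaving the input slice untouched.
func removeCondition(conds []Condition, condType string) []Condition {
	kept := make([]Condition, 0, len(conds))
	for _, c := range conds {
		if c.Type != condType {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	conds := []Condition{
		{Type: "Ready", Status: "True"},
		{Type: "ScalingReady", Status: "False", Reason: "ScaledObjectRemoved"},
	}
	// With scaling disabled, ScalingReady is simply absent from status.
	fmt.Println(removeCondition(conds, "ScalingReady"))
}
```

Removing the condition avoids having to decide whether "scaling disabled" counts as True or False, at the cost of losing the explicit record that a ScaledObject was removed.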

Copilot · 2026-02-17T20:56:22Z

+	if agent.Spec.Scaling == nil {
+		// Scaling removed — delete ScaledObject if it exists and clear status
+		if agent.Status.ScaledObjectName != "" {
+			if err := s.deps.ScaledObjects.DeleteScaledObject(ctx, types.NamespacedName{
+				Name:      agent.Status.ScaledObjectName,
+				Namespace: agent.Namespace,
+			}); err != nil {
+				agentdomain.SetCondition(agent, agentdomain.ConditionTypeScalingReady,
+					metav1.ConditionFalse, agentdomain.ReasonScaledObjectFailed,
+					fmt.Sprintf("Failed to delete ScaledObject: %v", err))
+				return err
+			}
+			agent.Status.ScaledObjectName = ""
+			agentdomain.SetCondition(agent, agentdomain.ConditionTypeScalingReady,
+				metav1.ConditionFalse, agentdomain.ReasonScaledObjectRemoved, "ScaledObject removed")
+		}
+		return nil


There's a potential issue when agent.Spec.Scaling is nil but agent.Status.ScaledObjectName is also empty - the deletion will be skipped even if a ScaledObject with the standard naming convention exists. This could happen if the status was never updated or was cleared. Consider attempting deletion based on the naming convention (builder.ScaledObjectName(agent.Name)) rather than only relying on agent.Status.ScaledObjectName, treating NotFound errors as success.
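
A sketch of the fallback described here, using only the standard library: `errNotFound`, `scaledObjectDeleter`, `cleanupScaledObject`, and `fakeRepo` are hypothetical stand-ins, and in the operator the NotFound check would more likely use `apierrors.IsNotFound` from apimachinery.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// scaledObjectDeleter is a hypothetical, reduced view of the ScaledObject repo.
type scaledObjectDeleter interface {
	DeleteScaledObject(namespace, name string) error
}

// scaledObjectName follows the "<agent>-scaler" convention seen in the tests.
func scaledObjectName(agentName string) string { return agentName + "-scaler" }

// cleanupScaledObject deletes the ScaledObject named in status, falling back to
// the conventional name when status is empty. NotFound counts as success, so
// the call is safe to repeat on every reconcile.
func cleanupScaledObject(repo scaledObjectDeleter, namespace, agentName, statusName string) error {
	name := statusName
	if name == "" {
		name = scaledObjectName(agentName)
	}
	if err := repo.DeleteScaledObject(namespace, name); err != nil && !errors.Is(err, errNotFound) {
		return fmt.Errorf("delete ScaledObject %s/%s: %w", namespace, name, err)
	}
	return nil
}

// fakeRepo is an in-memory stand-in used only to exercise the sketch.
type fakeRepo struct{ objects map[string]bool }

func (f *fakeRepo) DeleteScaledObject(namespace, name string) error {
	key := namespace + "/" + name
	if !f.objects[key] {
		return errNotFound
	}
	delete(f.objects, key)
	return nil
}

func main() {
	repo := &fakeRepo{objects: map[string]bool{"default/test-agent-scaler": true}}
	// Status name is empty, so the conventional name is used.
	fmt.Println(cleanupScaledObject(repo, "default", "test-agent", ""))
	// A second call finds nothing and still succeeds.
	fmt.Println(cleanupScaledObject(repo, "default", "test-agent", ""))
}
```

The trade-off is one extra delete call per reconcile for agents that never had scaling configured; whether that is acceptable depends on how often the operator reconciles.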

Copilot · 2026-02-17T20:56:22Z

+func TestReconcileScaledObject_CreatesWhenScalingConfigured(t *testing.T) {
+	scaledObjectRepo := fakes.NewFakeScaledObjectRepo()
+	svc := newTestServiceWithScaling(scaledObjectRepo)
+
+	agent := &agentv1alpha1.Agent{
+		ObjectMeta: metav1.ObjectMeta{
+			Name:      "test-agent",
+			Namespace: "default",
+		},
+		Spec: agentv1alpha1.AgentSpec{
+			Scaling: &agentv1alpha1.ScalingSpec{
+				MinReplicaCount: int32Ptr(0),
+				MaxReplicaCount: int32Ptr(5),
+				CooldownPeriod:  int32Ptr(300),
+				PollingInterval: int32Ptr(15),
+				Triggers: []agentv1alpha1.ScalingTrigger{
+					{
+						Type:     "prometheus",
+						Metadata: map[string]string{"threshold": "100"},
+					},
+				},
+			},
+		},
+	}
+
+	err := svc.reconcileScaledObject(context.Background(), agent)
+	if err != nil {
+		t.Fatalf("reconcileScaledObject() error = %v", err)
+	}
+
+	// Verify ScaledObject was created
+	key := types.NamespacedName{
+		Name:      builder.ScaledObjectName("test-agent"),
+		Namespace: "default",
+	}
+	if _, ok := scaledObjectRepo.ScaledObjects[key]; !ok {
+		t.Error("ScaledObject was not created")
+	}
+
+	// Verify status updated
+	if agent.Status.ScaledObjectName != "test-agent-scaler" {
+		t.Errorf("ScaledObjectName = %q, want test-agent-scaler", agent.Status.ScaledObjectName)
+	}
+
+	// Verify condition set
+	cond := meta.FindStatusCondition(agent.Status.Conditions, agentdomain.ConditionTypeScalingReady)
+	if cond == nil {
+		t.Fatal("ScalingReady condition not set")
+	}
+	if cond.Status != metav1.ConditionTrue {
+		t.Errorf("ScalingReady status = %q, want True", cond.Status)
+	}
+	if cond.Reason != agentdomain.ReasonScaledObjectReady {
+		t.Errorf("ScalingReady reason = %q, want %q", cond.Reason, agentdomain.ReasonScaledObjectReady)
+	}
+}


The test should also verify that the ScaledObject's spec contains the expected trigger configuration, not just that it exists. Consider adding assertions to check that the trigger type, metadata, and other scaling parameters were correctly transferred to the ScaledObject to ensure the builder and repository layer work end-to-end.
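
One way to add such assertions is a small helper that walks the stored object's nested maps. The sketch below is an illustration with a hypothetical `firstTrigger` helper over plain maps; it assumes the fake repo stores objects whose content has the usual `spec.triggers` shape. With apimachinery available, `unstructured.NestedSlice` would do the traversal.

```go
package main

import "fmt"

// firstTrigger walks spec.triggers[0] in an unstructured-style map and returns
// its type and metadata, or ok=false when the shape is not as expected.
func firstTrigger(obj map[string]interface{}) (typ string, metadata map[string]interface{}, ok bool) {
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return "", nil, false
	}
	triggers, ok := spec["triggers"].([]interface{})
	if !ok || len(triggers) == 0 {
		return "", nil, false
	}
	trigger, ok := triggers[0].(map[string]interface{})
	if !ok {
		return "", nil, false
	}
	typ, _ = trigger["type"].(string)
	metadata, _ = trigger["metadata"].(map[string]interface{})
	return typ, metadata, typ != ""
}

func main() {
	// Mirrors the trigger configured in the test above.
	obj := map[string]interface{}{
		"spec": map[string]interface{}{
			"triggers": []interface{}{
				map[string]interface{}{
					"type":     "prometheus",
					"metadata": map[string]interface{}{"threshold": "100"},
				},
			},
		},
	}
	typ, metadata, ok := firstTrigger(obj)
	fmt.Println(typ, metadata["threshold"], ok)
}
```

In the test, the helper's results would be compared against the trigger type and metadata set on the Agent, alongside checks for the replica counts, cooldown, and polling interval.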

Copilot · 2026-02-17T20:56:22Z

+  runtime:
+    type: standard
+    standard:
+      replicas: 1


The agent specifies both runtime.standard.replicas: 1 and a KEDA scaling configuration with minReplicaCount: 0. When KEDA takes over autoscaling, it manages the Deployment replicas through an HPA, which may conflict with the static replica count. Consider removing the replicas field from the standard runtime spec when scaling is configured, or add documentation clarifying that KEDA will override this value. The second example (agent-cron-scaling) correctly omits the replicas field.

Suggested change

-      replicas: 1
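
Beyond the sample manifest, the operator could avoid fighting KEDA over the replica count at reconcile time. The PR does not show how the Deployment's replicas are reconciled, so the following is only a sketch of one common pattern with a hypothetical `resolveReplicas` helper: when scaling is configured, keep whatever count is currently live instead of writing the static value back.

```go
package main

import "fmt"

// resolveReplicas picks the replica count to write to the Deployment.
// When autoscaling is configured and the Deployment already exists, the live
// count is preserved so the operator does not undo the autoscaler's decision.
// Otherwise the static value from the runtime spec is used.
func resolveReplicas(static, live *int32, scalingConfigured bool) *int32 {
	if scalingConfigured && live != nil {
		return live
	}
	return static
}

func main() {
	static, live := int32(1), int32(0) // live is 0 after a scale-to-zero
	fmt.Println(*resolveReplicas(&static, &live, true))
	fmt.Println(*resolveReplicas(&static, &live, false))
}
```

With this approach the static `replicas` value acts only as the initial count on first creation, which would also make the sample manifest's `replicas: 1` harmless rather than conflicting.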

Copilot AI review requested due to automatic review settings February 17, 2026 20:51

Copilot started reviewing on behalf of danielnyari February 17, 2026 20:52 View session

danielnyari wants to merge 1 commit into main from claude/add-keda-integration-gKJSv
