[platform, system, apps] Add Gateway API support with Envoy Gateway#2213
[platform, system, apps] Add Gateway API support with Envoy Gateway#2213Kirill Ilin (sircthulhu) wants to merge 37 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Gateway API and Envoy Gateway support: new CRDs, Envoy Gateway charts/package sources, tenant-scoped gateway HelmRelease, platform wiring and values, per-service Gateway/HTTPRoute/TLSRoute Helm templates, and end-to-end Bats tests validating Gateway behavior and connectivity. Changes
Sequence DiagramsequenceDiagram
rect rgba(200,200,255,0.5)
participant Client
participant LB as LoadBalancer
participant Gateway as Envoy_Gateway
participant Route as HTTPRoute/TLSRoute
participant Service as Backend
end
Client->>LB: HTTP request (80)
LB->>Gateway: forward request
Gateway->>Route: evaluate route rules
Route-->>Gateway: instruct redirect or forward
Gateway-->>Client: 301 redirect to HTTPS
Client->>LB: HTTPS request (443)
LB->>Gateway: TLS passthrough/terminate
Gateway->>Route: select backend
Route->>Service: forward request
Service-->>Gateway: response
Gateway-->>Client: HTTPS response (200/3xx/401/403)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the platform's networking capabilities by integrating the Kubernetes Gateway API. It provides a modern and flexible alternative to traditional Ingress resources, facilitating a smoother migration path to Cilium Gateway API. The changes introduce a structured approach for both system-wide and tenant-specific traffic management, allowing for fine-grained control over routing and load balancing while ensuring backward compatibility with existing Ingress configurations. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on Gemini (@gemini-code-assist) comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive support for the Gateway API across system and tenant components, providing a clear migration path from ingress-nginx. The changes are well-structured, with a central gateway for system services and per-tenant gateways for tenant services, which is a solid design. The use of HTTPRoute and TLSRoute is appropriate for the different services.
I have a few suggestions for improvement:
- My main concern is the use of the deprecated
gateway.networking.k8s.io/v1alpha2API version forTLSRouteresources. I strongly recommend updating this to a stable version likev1orv1beta1for future compatibility. - I've also identified a couple of areas with repetitive code in the Helm templates where using helper templates or loops could improve maintainability.
Overall, this is a great enhancement. Addressing these points will make the implementation more robust and easier to maintain.
| namespaces: | ||
| from: Same | ||
| --- | ||
| apiVersion: gateway.networking.k8s.io/v1alpha2 |
There was a problem hiding this comment.
The apiVersion for TLSRoute is set to gateway.networking.k8s.io/v1alpha2, which is from an old experimental channel of the Gateway API and has been deprecated. It's recommended to use a more recent, stable version. Given that HTTPRoute resources in this PR are using v1, for consistency and future compatibility, you should update TLSRoute to gateway.networking.k8s.io/v1 if your cluster's Gateway API support is at v1.0.0 or newer. If not, v1beta1 would be a better choice than v1alpha2. This applies to all new TLSRoute resources in this pull request (cozystack-api/templates/api-tlsroute.yaml, kubevirt-cdi/templates/cdi-uploadproxy-tlsroute.yaml, and kubevirt/templates/vm-exportproxy-tlsroute.yaml).
apiVersion: gateway.networking.k8s.io/v1There was a problem hiding this comment.
Fixed that
| {{- $gateway := $parentNamespace.gateway | default "" }} | ||
| {{- if .Values.gateway }} | ||
| {{- $gateway = $tenantName }} | ||
| {{- end }} |
There was a problem hiding this comment.
This logic for determining the service reference is repeated for etcd, ingress, monitoring, and seaweedfs. To improve maintainability and reduce code duplication, consider creating a shared helper template. For example, you could define a helper in a common _helpers.tpl file:
{{- define "tenant.serviceRef" -}}
{{- $serviceName := .serviceName -}}
{{- $ref := index .parentNamespace $serviceName | default "" -}}
{{- if index .values $serviceName -}}
{{- $ref = .tenantName -}}
{{- end -}}
{{- $ref -}}
{{- end -}}
And then use it like this for each service:
{{- $gateway := include "tenant.serviceRef" (dict "serviceName" "gateway" "tenantName" $tenantName "parentNamespace" $parentNamespace "values" .Values) -}}
This would make the template cleaner and easier to manage.
| {{- if has "dashboard" $exposeServices }} | ||
| - name: https-dashboard | ||
| protocol: HTTPS | ||
| port: 443 | ||
| hostname: {{ index $hostnames "dashboard" | default (printf "dashboard.%s" $rootHost) | quote }} | ||
| tls: | ||
| mode: Terminate | ||
| certificateRefs: | ||
| - name: gateway-dashboard-tls | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All | ||
| {{- end }} | ||
| {{- if eq $oidcEnabled "true" }} | ||
| - name: https-keycloak | ||
| protocol: HTTPS | ||
| port: 443 | ||
| hostname: {{ index $hostnames "keycloak" | default (printf "keycloak.%s" $rootHost) | quote }} | ||
| tls: | ||
| mode: Terminate | ||
| certificateRefs: | ||
| - name: gateway-keycloak-tls | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All | ||
| {{- end }} | ||
| {{- if has "api" $exposeServices }} | ||
| - name: tls-passthrough-api | ||
| protocol: TLS | ||
| port: 443 | ||
| hostname: {{ index $hostnames "api" | default (printf "api.%s" $rootHost) | quote }} | ||
| tls: | ||
| mode: Passthrough | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All | ||
| {{- end }} | ||
| {{- if has "vm-exportproxy" $exposeServices }} | ||
| - name: tls-passthrough-vm-exportproxy | ||
| protocol: TLS | ||
| port: 443 | ||
| hostname: {{ index $hostnames "vm-exportproxy" | default (printf "vm-exportproxy.%s" $rootHost) | quote }} | ||
| tls: | ||
| mode: Passthrough | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All | ||
| {{- end }} | ||
| {{- if has "cdi-uploadproxy" $exposeServices }} | ||
| - name: tls-passthrough-cdi-uploadproxy | ||
| protocol: TLS | ||
| port: 443 | ||
| hostname: {{ index $hostnames "cdi-uploadproxy" | default (printf "cdi-uploadproxy.%s" $rootHost) | quote }} | ||
| tls: | ||
| mode: Passthrough | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All | ||
| {{- end }} |
There was a problem hiding this comment.
The listener definitions for different services are very repetitive. To improve maintainability, you could refactor this by defining a list or map of services and their properties, and then looping through them to generate the listener configurations. This would reduce code duplication and make it easier to add or modify services in the future.
3b366e2 to
8c3ad83
Compare
17f38a2 to
d0cc82b
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
packages/system/cozystack-gateway/Chart.yaml (1)
1-3: Consider adding a description field.While not required, adding a
descriptionfield would improve discoverability and help users understand the chart's purpose.📝 Suggested enhancement
apiVersion: v2 name: cozy-cozystack-gateway +description: Central Gateway API Gateway for system-level HTTP and TLS routing version: 0.0.0 # Placeholder, the actual version will be automatically set during the build process🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/cozystack-gateway/Chart.yaml` around lines 1 - 3, Add a human-readable description field to Chart.yaml to improve discoverability and explain the chart's purpose; in the Chart.yaml near the existing apiVersion/name/version entries, add a top-level description: "<brief one-line summary of cozy-cozystack-gateway's purpose>" (use clear, concise wording) so consumers see what cozy-cozystack-gateway does when viewing the chart metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hack/e2e-install-cozystack.bats`:
- Around line 259-267: The curl probes use the IP in the URL so TLS SNI is sent
as the IP (causing hostname-based routing/cert selection to fail); update the
curl invocations that use "https://${gateway_ip}" (the blocks that set http_code
and check Dashboard and Kubernetes API responses) to instead request the
hostname in the URL (e.g., https://dashboard.example.org and
https://api.example.org) and add curl's --resolve option to map each hostname to
${gateway_ip} so the hostname is preserved for SNI while resolving to the
gateway IP; modify both the Dashboard probe (where http_code is set for
dashboard.example.org) and the Kubernetes API probe (and the similar block
around the second occurrence noted) to use --resolve with the appropriate
hostname and ${gateway_ip}.
In `@packages/core/platform/templates/bundles/system.yaml`:
- Around line 173-186: Update the v1.0.0 migration/upgrade notes to document
that external-dns is now opt-in by requiring cozystack.external-dns to be added
to bundles.enabledPackages (which defaults to an empty list), and add a short
operator guidance block instructing anyone upgrading from v0.x to check if they
depend on external-dns and, if so, explicitly add "cozystack.external-dns" to
bundles.enabledPackages in their Platform values before upgrading.
In `@packages/extra/bootbox/templates/matchbox/httproute.yaml`:
- Around line 74-76: The HTTPRoute backendRefs currently references name:
bootbox with port: 8080 which does not match the Service bootbox exposed port
80; update the HTTPRoute backendRefs entry (backendRefs / name: bootbox / port)
in matchbox/httproute.yaml to use port: 80 so it matches the Service definition
in packages/extra/bootbox/templates/matchbox/service.yaml and ensures proper
route resolution.
In `@packages/extra/seaweedfs/templates/tlsroute.yaml`:
- Line 7: The top render condition currently requires (.Values.filer.grpcHost)
which prevents the template from rendering when grpcHost is empty but a fallback
host exists; update the initial if-condition (the line with and (eq $gatewayAPI
"true") (ne $gateway "") (not (eq .Values.topology "Client"))
(.Values.filer.grpcHost)) to instead allow either .Values.filer.grpcHost or the
fallback host (e.g., check for (.Values.filer.grpcHost) OR (. _namespace.host) /
the same host variable used in the fallback at the usages around the
filer.<host> logic on lines referencing the fallback) so the block renders when
a fallback host is available.
- Line 1: Normalize the gateway-api check by converting the value to a string
before comparing and ensure the grpcHost fallback is considered in the
conditional: change the template to set $gatewayAPI := printf "%v" (index
.Values._cluster "gateway-api" | default false) and then compute a $grpcHost
variable using default .Values.filer.grpcHost with the same fallback expression
used later in the template (the explicit fallback hostnames present where
grpcHost is used), and use $grpcHost in the if-condition that currently compares
$gatewayAPI to "true" so boolean/unquoted values and the fallback host are both
respected.
In `@packages/system/cozystack-gateway/templates/gateway.yaml`:
- Around line 111-113: The template loop that emits "cidr: {{ . }}/32" from "{{-
range splitList \",\" $exposeExternalIPs }}" can produce "cidr: /32" for empty
items; update the range to skip empty/whitespace entries by checking the trimmed
value before emitting. Specifically, inside the range over splitList ","
$exposeExternalIPs, wrap the cidr block in a conditional like "if (ne (trim .)
\"\")" (or equivalent trim+not-empty check) so only non-empty values produce
"cidr: {{ . }}/32".
In `@packages/system/gateway-api-crds/Makefile`:
- Line 9: The Makefile now kustomizes Gateway API CRDs to v1.5.0 (the kubectl
kustomize
"github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.5.0"
line) but v1.5.0 has breaking changes (TLSRoute and ListenerSet moved from
Experimental->Standard, new validating admission policy
safe-upgrades.gateway.networking.k8s.io, and TLSRoute CEL requiring Kubernetes
>=1.31); before changing the ref, either keep the existing v1.2.0 CRD ref or
update other CRD targets so Standard and Experimental CRDs are applied together,
scan the cluster for resources using TLSRoute v1alpha2 and migrate them to the
new v1 types (or plan removal), ensure target clusters run Kubernetes >=1.31 if
using TLSRoute CEL, and apply CRDs with kubectl apply --server-side=true to
avoid VAP conflicts; also remove or adjust generation of
templates/crds-experimental.yaml if you intend to fully migrate to the Standard
channel so you don't install mixed channels separately which will be blocked by
the new VAP.
In `@packages/system/keycloak/templates/httproute.yaml`:
- Around line 27-29: The HTTPRoute's backendRefs currently points to port 80
which does not exist on the keycloak-http Service (service.yaml exposes port
8080); update the HTTPRoute backendRefs entry for name: keycloak-http to use
port: 8080 (replace the existing port: 80) so the route matches the Service's
exposed port and routing will succeed.
---
Nitpick comments:
In `@packages/system/cozystack-gateway/Chart.yaml`:
- Around line 1-3: Add a human-readable description field to Chart.yaml to
improve discoverability and explain the chart's purpose; in the Chart.yaml near
the existing apiVersion/name/version entries, add a top-level description:
"<brief one-line summary of cozy-cozystack-gateway's purpose>" (use clear,
concise wording) so consumers see what cozy-cozystack-gateway does when viewing
the chart metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d3b12f3a-79b0-42fa-9d84-ffce62178120
📒 Files selected for processing (30)
hack/e2e-install-cozystack.batspackages/apps/harbor/templates/httproute.yamlpackages/apps/tenant/README.mdpackages/apps/tenant/templates/namespace.yamlpackages/apps/tenant/values.schema.jsonpackages/apps/tenant/values.yamlpackages/core/platform/sources/cozystack-gateway.yamlpackages/core/platform/sources/networking.yamlpackages/core/platform/templates/apps.yamlpackages/core/platform/templates/bundles/system.yamlpackages/core/platform/values.yamlpackages/extra/bootbox/templates/matchbox/httproute.yamlpackages/extra/seaweedfs/templates/tlsroute.yamlpackages/system/bucket/templates/httproute.yamlpackages/system/cert-manager/values.yamlpackages/system/cozystack-api/templates/api-tlsroute.yamlpackages/system/cozystack-basics/templates/cozystack-values-secret.yamlpackages/system/cozystack-gateway/Chart.yamlpackages/system/cozystack-gateway/templates/gateway.yamlpackages/system/cozystack-gateway/values.yamlpackages/system/dashboard/templates/httproute.yamlpackages/system/gateway-api-crds/Makefilepackages/system/gateway-api-crds/templates/crds-experimental.yamlpackages/system/keycloak/templates/httproute.yamlpackages/system/keycloak/templates/sts.yamlpackages/system/kubevirt-cdi/templates/cdi-uploadproxy-tlsroute.yamlpackages/system/kubevirt/templates/vm-exportproxy-tlsroute.yamlpackages/system/monitoring/templates/alerta/httproute.yamlpackages/system/monitoring/templates/grafana/httproute.yamlpackages/system/tenant-rd/cozyrds/tenant.yaml
| @@ -0,0 +1,43 @@ | |||
| {{- $gatewayAPI := (index .Values._cluster "gateway-api") | default "false" }} | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find gateway-api definitions and example values =="
rg -n -C2 'gateway-api' -g '*.yaml' -g '*.yml' -g '*.json'
echo
echo "== Find how _cluster.gateway-api is consumed in templates =="
rg -n -C2 '_cluster|gateway-api|eq \$gatewayAPI "true"' -g '*.yaml' -g '*.tpl'Repository: cozystack/cozystack
Length of output: 50376
🏁 Script executed:
cat -n packages/extra/seaweedfs/templates/tlsroute.yamlRepository: cozystack/cozystack
Length of output: 1582
🏁 Script executed:
# Find values files where gateway-api is configured
find . -name "values*.yaml" -o -name "values*.yml" | head -30 | xargs grep -l "gateway-api" 2>/dev/null | head -10
# Then examine how the value is set (boolean vs string)
echo "=== Checking how gateway-api is provided in values ==="
find . -name "values*.yaml" -o -name "values*.yml" | xargs grep -A2 -B2 "gateway-api:" 2>/dev/null | head -50Repository: cozystack/cozystack
Length of output: 117
🏁 Script executed:
# Search for where gateway.gatewayAPI is defined
echo "=== Looking for gateway.gatewayAPI definition ==="
rg -n "gateway\.gatewayAPI|gatewayAPI:" -A2 -B2 | head -60
# Also search for how _cluster values are passed/injected
echo ""
echo "=== Looking for _cluster value injection ==="
rg -n "\"_cluster\"|_cluster:" -l | head -10
# Check the platform values to understand gateway configuration
echo ""
echo "=== Platform gateway configuration ==="
find . -path "*/platform/*" -name "values*.yaml" | xargs cat 2>/dev/null | grep -A 10 "gateway:" | head -30Repository: cozystack/cozystack
Length of output: 5718
🏁 Script executed:
# Check if other templates have similar grpcHost gates
echo "=== Checking for grpcHost usage in other templates ==="
rg -n "grpcHost" -B2 -A2
# Also check other TLSRoute files to compare their gate conditions
echo ""
echo "=== Comparing gate conditions in similar TLSRoute templates ==="
fd "tlsroute.yaml|httproute.yaml" -x head -10 {} +Repository: cozystack/cozystack
Length of output: 51185
🏁 Script executed:
# Check if seaweedfs ingress.yaml has the same grpcHost gate
echo "=== Checking seaweedfs ingress.yaml for grpcHost gate ==="
sed -n '1,20p' packages/extra/seaweedfs/templates/ingress.yaml
echo ""
echo "=== Checking if other packages with external endpoints use similar gates ==="
rg "grpcHost" packages/extra/seaweedfs/templates/Repository: cozystack/cozystack
Length of output: 1998
Normalize gateway-api type before comparison and make the grpcHost fallback actually reachable.
Line 7 compares $gatewayAPI to the string "true". If _cluster.gateway-api is ever passed as a boolean (not quoted), this comparison silently fails and resources don't render. Additionally, the same line requires (.Values.filer.grpcHost) to be truthy, but lines 22 and 38 provide a fallback hostname—making the gate inconsistent with the template's intent.
🔧 Proposed fix
-{{- $gatewayAPI := (index .Values._cluster "gateway-api") | default "false" }}
+{{- $gatewayAPI := (index .Values._cluster "gateway-api") | default false | toString | lower }}
{{- $gateway := .Values._namespace.gateway | default "" }}
{{- $host := .Values._namespace.host }}
{{- $gatewayClassName := (index .Values._cluster "gateway-class-name") | default "cilium" }}
-{{- if and (eq $gatewayAPI "true") (ne $gateway "") (not (eq .Values.topology "Client")) (.Values.filer.grpcHost) }}
+{{- if and (eq $gatewayAPI "true") (ne $gateway "") (not (eq .Values.topology "Client")) (or .Values.filer.grpcHost $host) }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{- $gatewayAPI := (index .Values._cluster "gateway-api") | default "false" }} | |
| {{- $gatewayAPI := (index .Values._cluster "gateway-api") | default false | toString | lower }} |
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/extra/seaweedfs/templates/tlsroute.yaml` at line 1, Normalize the
gateway-api check by converting the value to a string before comparing and
ensure the grpcHost fallback is considered in the conditional: change the
template to set $gatewayAPI := printf "%v" (index .Values._cluster "gateway-api"
| default false) and then compute a $grpcHost variable using default
.Values.filer.grpcHost with the same fallback expression used later in the
template (the explicit fallback hostnames present where grpcHost is used), and
use $grpcHost in the if-condition that currently compares $gatewayAPI to "true"
so boolean/unquoted values and the fallback host are both respected.
|
|
||
| {{- $gatewayClassName := (index .Values._cluster "gateway-class-name") | default "cilium" }} | ||
|
|
||
| {{- if and (eq $gatewayAPI "true") (ne $gateway "") (not (eq .Values.topology "Client")) (.Values.filer.grpcHost) }} |
There was a problem hiding this comment.
Render gate blocks the documented/default hostname path.
Line 7 requires .Values.filer.grpcHost, but Lines 22 and 38 already support fallback to filer.<host>. With grpcHost empty and _namespace.host set, this template won’t render at all.
🔧 Proposed fix
{{- $gatewayClassName := (index .Values._cluster "gateway-class-name") | default "cilium" }}
+{{- $filerHost := .Values.filer.grpcHost | default (printf "filer.%s" $host) }}
-{{- if and (eq $gatewayAPI "true") (ne $gateway "") (not (eq .Values.topology "Client")) (.Values.filer.grpcHost) }}
+{{- if and (eq $gatewayAPI "true") (ne $gateway "") (not (eq .Values.topology "Client")) (or .Values.filer.grpcHost $host) }}
@@
- hostname: {{ .Values.filer.grpcHost | default (printf "filer.%s" $host) | quote }}
+ hostname: {{ $filerHost | quote }}
@@
- - {{ .Values.filer.grpcHost | default (printf "filer.%s" $host) | quote }}
+ - {{ $filerHost | quote }}Also applies to: 22-22, 38-38
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/extra/seaweedfs/templates/tlsroute.yaml` at line 7, The top render
condition currently requires (.Values.filer.grpcHost) which prevents the
template from rendering when grpcHost is empty but a fallback host exists;
update the initial if-condition (the line with and (eq $gatewayAPI "true") (ne
$gateway "") (not (eq .Values.topology "Client")) (.Values.filer.grpcHost)) to
instead allow either .Values.filer.grpcHost or the fallback host (e.g., check
for (.Values.filer.grpcHost) OR (. _namespace.host) / the same host variable
used in the fallback at the usages around the filer.<host> logic on lines
referencing the fallback) so the block renders when a fallback host is
available.
| {{- range splitList "," $exposeExternalIPs }} | ||
| - cidr: {{ . }}/32 | ||
| {{- end }} |
There was a problem hiding this comment.
Guard against empty IP entries when generating CIDRs.
If expose-external-ips has a trailing comma, this renders an invalid block (cidr: /32). Please skip empty items before emitting blocks.
🔧 Proposed fix
- {{- range splitList "," $exposeExternalIPs }}
- - cidr: {{ . }}/32
- {{- end }}
+ {{- range $ip := splitList "," $exposeExternalIPs }}
+ {{- if $ip }}
+ - cidr: {{ $ip }}/32
+ {{- end }}
+ {{- end }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{- range splitList "," $exposeExternalIPs }} | |
| - cidr: {{ . }}/32 | |
| {{- end }} | |
| {{- range $ip := splitList "," $exposeExternalIPs }} | |
| {{- if $ip }} | |
| - cidr: {{ $ip }}/32 | |
| {{- end }} | |
| {{- end }} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/cozystack-gateway/templates/gateway.yaml` around lines 111 -
113, The template loop that emits "cidr: {{ . }}/32" from "{{- range splitList
\",\" $exposeExternalIPs }}" can produce "cidr: /32" for empty items; update the
range to skip empty/whitespace entries by checking the trimmed value before
emitting. Specifically, inside the range over splitList "," $exposeExternalIPs,
wrap the cidr block in a conditional like "if (ne (trim .) \"\")" (or equivalent
trim+not-empty check) so only non-empty values produce "cidr: {{ . }}/32".
| rm -rf templates | ||
| mkdir templates | ||
| kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.2.0" > templates/crds-experimental.yaml | ||
| kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.5.0" > templates/crds-experimental.yaml |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the latest version of Kubernetes Gateway API and when was v1.5.0 released?
💡 Result:
The latest version of Kubernetes Gateway API is v1.5.1, released on March 14, 2026. The v1.5.0 version was released on February 26, 2026.
Citations:
- 1: https://github.com/kubernetes-sigs/gateway-api/releases
- 2: https://newreleases.io/project/github/kubernetes-sigs/gateway-api/release/v1.5.0
- 3: https://pkg.go.dev/sigs.k8s.io/gateway-api@v1.5.0
🌐 Web query:
Kubernetes Gateway API v1.5.0 release notes breaking changes changelog
💡 Result:
Kubernetes Gateway API v1.5.0 was released around February 26-27, 2026. Official release notes are available at https://github.com/kubernetes-sigs/gateway-api/releases/tag/v1.5.0. ## Breaking Changes 1. TLSRoute v1alpha2 and XListenerSet removal from Experimental channel: TLSRoute and ListenerSet have graduated to the Standard channel as v1. TLSRoute v1alpha2 is only in Experimental channel in v1.5.0 and will be removed in v1.6. Additionally, TLSRoute's CEL validation requires Kubernetes 1.31+. 2. New Validating Admission Policy (VAP): Introduces safe-upgrades.gateway.networking.k8s.io VAP to prevent: - Downgrading to versions prior to 1.5 after installing v1.5. - Installing Experimental CRDs after Standard CRDs. Workaround: Delete the VAP if needed for specific operations. ## Major Changes Since v1.4.1 (Standard Channel Graduations) - TLSRoute v1 (GEP-2643) - HTTPRoute CORS filter (GEP-1767) - ListenerSet support (GEP-1713) - Certificate selection for Gateway TLS origination (GEP-3155) - ReferenceGrant to v1 ## Experimental - Gateway/HTTPRoute level authentication (GEP-1494) Full changelog: kubernetes-sigs/gateway-api@v1.4.1...v1.5.0 Note: Use server-side apply for installation due to large CRD sizes: kubectl apply --server-side=true -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.0/[standard|experimental]-install.yaml. A patch release v1.5.1 followed with minor fixes.
Citations:
- 1: https://github.com/kubernetes-sigs/gateway-api/releases
- 2: https://newreleases.io/project/github/kubernetes-sigs/gateway-api/release/v1.5.0
Address breaking changes in Gateway API v1.5.0 before upgrading.
This updates Gateway API CRDs from v1.2.0 to v1.5.0 (released Feb 26, 2026). While the version is valid, this release includes breaking changes that require attention:
- TLSRoute and ListenerSet: Both graduated from Experimental to Standard channel as v1. TLSRoute v1alpha2 is deprecated and will be removed in v1.6.
- New Validating Admission Policy: A VAP (
safe-upgrades.gateway.networking.k8s.io) now prevents downgrading to pre-1.5 versions and mixing Experimental/Standard CRDs installed separately. - Kubernetes 1.31+ requirement: TLSRoute CEL validation requires Kubernetes 1.31 or later.
Consider whether existing resources depend on the experimental v1alpha2 types and verify cluster compatibility before applying. Use kubectl apply --server-side=true due to large CRD sizes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/gateway-api-crds/Makefile` at line 9, The Makefile now
kustomizes Gateway API CRDs to v1.5.0 (the kubectl kustomize
"github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v1.5.0"
line) but v1.5.0 has breaking changes (TLSRoute and ListenerSet moved from
Experimental->Standard, new validating admission policy
safe-upgrades.gateway.networking.k8s.io, and TLSRoute CEL requiring Kubernetes
>=1.31); before changing the ref, either keep the existing v1.2.0 CRD ref or
update other CRD targets so Standard and Experimental CRDs are applied together,
scan the cluster for resources using TLSRoute v1alpha2 and migrate them to the
new v1 types (or plan removal), ensure target clusters run Kubernetes >=1.31 if
using TLSRoute CEL, and apply CRDs with kubectl apply --server-side=true to
avoid VAP conflicts; also remove or adjust generation of
templates/crds-experimental.yaml if you intend to fully migrate to the Standard
channel so you don't install mixed channels separately which will be blocked by
the new VAP.
2ccb450 to
e4c7658
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (10)
packages/system/gateway-rd/Makefile (1)
4-4: Optional: add local phony passthrough targets to silence checkmake warnings.Line 4 likely pulls required targets from
package.mk, but checkmake can still warn when scanning this file alone. Consider lightweight passthrough targets.Proposed non-functional Makefile tweak
export NAME=gateway-rd export NAMESPACE=cozy-system include ../../../hack/package.mk + +.PHONY: all clean test +all: + @$(MAKE) -f ../../../hack/package.mk all +clean: + @$(MAKE) -f ../../../hack/package.mk clean +test: + @$(MAKE) -f ../../../hack/package.mk test🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/gateway-rd/Makefile` at line 4, The Makefile currently only contains the include line (include ../../../hack/package.mk) which causes checkmake to warn when scanning this file alone; add lightweight local phony passthrough targets (e.g., .PHONY plus targets like build, test, lint, install) that simply forward to the same targets provided by the included package.mk so checkmake sees concrete targets. Ensure the passthrough targets are non-functional stubs that invoke the included makefile's targets (using the standard make invocation) and keep them minimal so behavior doesn't change.packages/system/gateway-rd/templates/cozyrd.yaml (1)
1-1: Narrow the glob to manifest files only.
cozyrds/*may include non-YAML artifacts and produce invalid output. Restricting to YAML files makes rendering safer and more predictable.Suggested diff
-{{- range $path, $_ := .Files.Glob "cozyrds/*" }} +{{- range $path, $_ := .Files.Glob "cozyrds/*.yaml" }} --- {{ $.Files.Get $path }} {{- end }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/gateway-rd/templates/cozyrd.yaml` at line 1, The template currently uses .Files.Glob "cozyrds/*" which can pull non-YAML artifacts; narrow the glob to manifest files only by replacing that pattern with YAML-specific globs (e.g., use .Files.Glob "cozyrds/*.yaml" and also handle "cozyrds/*.yml" or iterate over both patterns) in the same template range expression so only .yaml/.yml files are rendered; update the .Files.Glob call(s) in the template to those YAML-only patterns (and add a second loop if needed) to ensure safe, predictable output.packages/extra/gateway/templates/gateway.yaml (1)
13-14: Consider making replica count configurable.The
replicas: 2is hardcoded here, but based on the relevant context,values.yamlprovides empty_clusterand_namespacedictionaries with no documented interface for replica configuration. Consider exposing this as a configurable value for environments with different availability requirements.♻️ Proposed refactor to make replicas configurable
+{{- $replicas := (index .Values._cluster "envoy-replicas") | default 2 }} {{- $exposeGateway := (index .Values._cluster "gateway") | default "" }} {{- $exposeExternalIPs := (index .Values._cluster "expose-external-ips") | default "" | nospace }}Then use
$replicasinstead of the hardcoded value:envoyDeployment: - replicas: 2 + replicas: {{ $replicas }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/extra/gateway/templates/gateway.yaml` around lines 13 - 14, The hardcoded replicas: 2 under envoyDeployment should be made configurable via values.yaml; add a new value (e.g., envoyDeployment.replicas or replicaCount) in values.yaml and replace the literal 2 in gateway.yaml with the Helm template variable (reference the envoyDeployment object and its replicas field) so deployments can be adjusted per-environment; ensure you provide a sensible default in values.yaml and update any README or comments to document the new envoyDeployment.replicas option.packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-poddisruptionbudget.yaml (1)
8-8: Redundantandwith single argument.Line 8 uses
{{- if and .Values.podDisruptionBudget.minAvailable }}butandwith a single argument is redundant.♻️ Proposed fix
- {{- if and .Values.podDisruptionBudget.minAvailable }} + {{- if .Values.podDisruptionBudget.minAvailable }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-poddisruptionbudget.yaml` at line 8, The conditional uses a redundant "and" with a single argument; change the template conditional that currently uses "{{- if and .Values.podDisruptionBudget.minAvailable }}" to a simple "{{- if .Values.podDisruptionBudget.minAvailable }}" so the PodDisruptionBudget conditional correctly evaluates the presence/truthiness of .Values.podDisruptionBudget.minAvailable (update the corresponding if/endif block around that symbol).packages/system/monitoring/templates/alerta/httproute.yaml (1)
3-7: Same pattern issues as Grafana template.This template has the same redundant variable pattern and potential nil access as the Grafana HTTPRoute template:
- Lines 3 and 7 both derive gateway values redundantly
- Line 4 accesses
._namespace.hostwithout default fallbackConsider applying consistent fixes across all monitoring HTTPRoute templates.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/monitoring/templates/alerta/httproute.yaml` around lines 3 - 7, The template repeats and risks nil access: remove the redundant declaration of $gatewayClassName and consolidate gateway derivation by using a single variable (e.g., $gateway) for both purposes, and add a safe default when reading .Values._namespace.host (use default "" or fall back to .Values.host) before computing $alertaHost; update references to use the single $gateway variable and ensure $host is defined with a default so .Values._namespace.host cannot be nil.packages/system/monitoring/templates/grafana/httproute.yaml (1)
3-7: Redundant variable and potential nil access.
Lines 3 and 7 assign the same value (
._namespace.gateway) to$gatewayand$gatewayClassNameseparately.Line 4 accesses
.Values._namespace.hostwithout a default fallback. If_namespaceis nil, this could cause a template error.Suggested fix
{{- $gateway := .Values._namespace.gateway | default "" }} -{{- $host := .Values._namespace.host }} +{{- $host := (.Values._namespace).host | default "" }} {{- $grafanaHost := printf "grafana.%s" (.Values.host | default $host) }} - -{{- $gatewayClassName := .Values._namespace.gateway | default "" }}Then use
$gatewaydirectly forgatewayClassNameat line 18.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/monitoring/templates/grafana/httproute.yaml` around lines 3 - 7, The template defines duplicate variables $gateway and $gatewayClassName from .Values._namespace.gateway and reads .Values._namespace.host without a safe default; remove the redundancy by using the single $gateway variable wherever gatewayClassName is needed (e.g., replace uses of $gatewayClassName with $gateway) and protect host access by giving .Values._namespace.host a default (e.g., use a default fallback when assigning $host) to avoid nil access in templates such as in the $host and $grafanaHost assignments.packages/system/dashboard/templates/httproute.yaml (1)
6-7: Redundant variable assignment.
$gatewayand$gatewayClassNameare assigned the same value. Consider using a single variable for clarity.Suggested simplification
{{- $gateway := ((.Values._namespace).gateway) | default (index .Values._cluster "gateway") | default "" }} -{{- $gatewayClassName := $gateway }} +{{- /* $gateway is also used as gatewayClassName */ -}}Then use
$gatewaydirectly at line 20.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/dashboard/templates/httproute.yaml` around lines 6 - 7, The template defines two identical variables ($gateway and $gatewayClassName); remove the redundant $gatewayClassName assignment and update any usages of $gatewayClassName to use $gateway instead (e.g., replace references to $gatewayClassName with $gateway and delete the line setting $gatewayClassName).packages/apps/harbor/templates/httproute.yaml (1)
3-7: Redundant variable assignment.Same pattern as other templates:
$gateway(line 3) and$gatewayClassName(line 7) are assigned identical values.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/apps/harbor/templates/httproute.yaml` around lines 3 - 7, The template defines two identical variables $gateway and $gatewayClassName causing redundancy; remove one (prefer keeping a single variable name, e.g., $gatewayClassName or $gateway), update all references in the template to use the chosen variable, and delete the other assignment (the duplicate default assignment of .Values._namespace.gateway) so only one defaulting expression remains (ensure functions or blocks that reference $gateway or $gatewayClassName are updated accordingly).packages/system/envoy-gateway/charts/gateway-helm/templates/certgen.yaml (1)
30-44: Args handling is verbose but functional.The conditional logic to append
--disable-topology-injectorwhen topology injector is disabled works correctly. However, the if/else creates duplicatedcommandblocks.Potential simplification
containers: {{- $args := .Values.certgen.job.args }} {{- if not .Values.topologyInjector.enabled }} {{- $args = append $args "--disable-topology-injector" }} {{- end }} - {{- if $args }} - - args: - {{- toYaml $args | nindent 8 }} - command: + - command: - envoy-gateway - certgen - {{- else }} - - command: - - envoy-gateway - - certgen + {{- with $args }} + args: + {{- toYaml . | nindent 8 }} {{- end }}This puts
argsaftercommandwhich is valid YAML and avoids duplication.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/envoy-gateway/charts/gateway-helm/templates/certgen.yaml` around lines 30 - 44, The template duplicates the `command` block because of the if/else around args; keep `command` rendered once and only conditionally render the `args` block when `$args` is non-empty. In the certgen template, compute `$args` (from `.Values.certgen.job.args` and appending `"--disable-topology-injector"` when `.Values.topologyInjector.enabled` is false) as you already do, then unconditionally emit the `- command: - envoy-gateway - certgen` lines and, if `$args` is set, emit the `- args:` block with `{{- toYaml $args | nindent 8 }}`; remove the current if/else that duplicates `command`.packages/system/envoy-gateway/charts/gateway-helm/templates/namespaced-infra-manager-rbac.yaml (1)
1-35: Clarify relationship with infra-manager-rbac.yaml whenwatch.type == "NamespaceSelector".When
watch.type == "NamespaceSelector"is configured alongsidedeploy.type == "GatewayNamespace", both files create tokenreview ClusterRoles:
infra-manager-rbac.yamlcreates*-cluster-infra-manager(includes both basic + tokenreview rules)- This file creates
*-infra-manager-tokenreview(tokenreview only)While functionally harmless (redundant permissions for the same ServiceAccount), this could be confusing when operators audit RBAC rules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/system/envoy-gateway/charts/gateway-helm/templates/namespaced-infra-manager-rbac.yaml` around lines 1 - 35, When watch.type == "NamespaceSelector" and deploy.type == "GatewayNamespace" the template creates a tokenreview ClusterRole '{{ include "eg.fullname" $ }}-infra-manager-tokenreview' that duplicates the tokenreview rules already present in the other ClusterRole ('*-cluster-infra-manager' from infra-manager-rbac.yaml); update this template to skip creating the tokenreview ClusterRole when $kube.watch.type == "NamespaceSelector" (i.e., add a conditional around the ClusterRole/ClusterRoleBinding blocks checking that not (eq $kube.watch.type "NamespaceSelector")), or alternatively add a clear comment in the template explaining why both exist to avoid audit confusion. Ensure you reference $kube (from .Values.config.envoyGateway.provider.kubernetes) and the ClusterRole name '{{ include "eg.fullname" $ }}-infra-manager-tokenreview' when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/apps/harbor/templates/httproute.yaml`:
- Around line 1-9: The Ingress template is created unconditionally causing
conflicts when Gateway API is enabled; update the ingress template (the file
that currently renders the Ingress resource) to wrap the whole Ingress manifest
with a guard that checks the same $gatewayAPI value used in httproute.yaml (and
optionally $gateway) so the Ingress is skipped when gateway-api is "true" (e.g.,
use an if that evaluates ne $gatewayAPI "true" around the Ingress manifest);
ensure you reference the same .Values._cluster "gateway-api" lookup used to
populate $gatewayAPI so behavior matches the httproute rendering.
In `@packages/system/dashboard/templates/httproute.yaml`:
- Line 11: The conditional at the top of the template only checks eq $gatewayAPI
"true" and has "dashboard" $exposeServices but omits verifying $gateway is
non-empty; update the if clause to also include (ne $gateway "") so it matches
the Grafana/Alerta templates and prevents rendering an empty gatewayClassName
when $gateway is empty (adjust the conditional that currently references
$gatewayAPI, $exposeServices to also check $gateway).
- Around line 1-11: Ingress is still created unconditionally when "dashboard" is
in expose-services, causing conflicts with the Gateway API httproute which uses
$gatewayAPI; update the dashboard Ingress template to skip creating the Ingress
when Gateway API is enabled by wrapping the Ingress resource block with a
conditional that checks both that "dashboard" is in $exposeServices and that
$gatewayAPI is not "true" (use the same $gatewayAPI and $exposeServices symbols
as in httproute.yaml and the has/and/ne helpers to guard the Ingress).
In `@packages/system/envoy-gateway/charts/gateway-helm/templates/_helpers.tpl`:
- Around line 126-133: The current helper uses a hardcoded fallback "master" for
$imageTag which causes non-deterministic upgrades; change the default to a
pinned value coming from chart values (e.g. replace the literal "master" default
with something like .Values.global.images.ratelimit.tag or prefer a digest field
.Values.global.images.ratelimit.digest when present) and update values.yaml to
expose that pinned tag/digest; adjust the logic around
$repositoryParts/$imageTag to prefer the provided digest/tag from
.Values.global.images.ratelimit before falling back to the chart-level default.
- Around line 69-76: The template currently splits
.Values.deployment.envoyGateway.image.repository with splitn and uses
$repositoryParts._1, which drops namespace segments (e.g.,
"envoyproxy/gateway"); change the logic to preserve the full repository path
when .Values.global.imageRegistry is set by assigning $repositoryName to the
entire .Values.deployment.envoyGateway.image.repository (instead of using
splitn/$repositoryParts._1) and then build the final image string with printf
"%s/%s:%s" using $registryName, the full $repositoryName, and $imageTag.
In
`@packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-hpa.yaml`:
- Line 15: The error message in the Helm template's required check for
maxReplicas contains a typo (`.Values.hps.maxReplicas`); update the required
call message to reference the correct path `.Values.hpa.maxReplicas` so the
check on maxReplicas (the line using required and .Values.hpa.maxReplicas)
reports the correct configuration key.
In
`@packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-service.yaml`:
- Around line 1-30: This change edits the vendored upstream Helm chart's Service
template for the "envoy-gateway" Service (envoy-gateway-service.yaml) and must
be reverted; instead, remove local modifications from the vendored chart and
implement your desired service adjustments by either adding a Service override
in the wrapper chart's values (use .Values.service, .Values.deployment.ports,
.Values.topologyInjector.enabled, and .Values.service.trafficDistribution to
merge/override) or by adding a custom templates/Service definition in the
parent/wrapper chart that defines the Service for "envoy-gateway"; also file an
upstream issue against the Envoy Gateway repo so the change can be considered
upstream.
---
Nitpick comments:
In `@packages/apps/harbor/templates/httproute.yaml`:
- Around line 3-7: The template defines two identical variables $gateway and
$gatewayClassName causing redundancy; remove one (prefer keeping a single
variable name, e.g., $gatewayClassName or $gateway), update all references in
the template to use the chosen variable, and delete the other assignment (the
duplicate default assignment of .Values._namespace.gateway) so only one
defaulting expression remains (ensure functions or blocks that reference
$gateway or $gatewayClassName are updated accordingly).
In `@packages/extra/gateway/templates/gateway.yaml`:
- Around line 13-14: The hardcoded replicas: 2 under envoyDeployment should be
made configurable via values.yaml; add a new value (e.g.,
envoyDeployment.replicas or replicaCount) in values.yaml and replace the literal
2 in gateway.yaml with the Helm template variable (reference the envoyDeployment
object and its replicas field) so deployments can be adjusted per-environment;
ensure you provide a sensible default in values.yaml and update any README or
comments to document the new envoyDeployment.replicas option.
In `@packages/system/dashboard/templates/httproute.yaml`:
- Around line 6-7: The template defines two identical variables ($gateway and
$gatewayClassName); remove the redundant $gatewayClassName assignment and update
any usages of $gatewayClassName to use $gateway instead (e.g., replace
references to $gatewayClassName with $gateway and delete the line setting
$gatewayClassName).
In `@packages/system/envoy-gateway/charts/gateway-helm/templates/certgen.yaml`:
- Around line 30-44: The template duplicates the `command` block because of the
if/else around args; keep `command` rendered once and only conditionally render
the `args` block when `$args` is non-empty. In the certgen template, compute
`$args` (from `.Values.certgen.job.args` and appending
`"--disable-topology-injector"` when `.Values.topologyInjector.enabled` is
false) as you already do, then unconditionally emit the `- command: -
envoy-gateway - certgen` lines and, if `$args` is set, emit the `- args:` block
with `{{- toYaml $args | nindent 8 }}`; remove the current if/else that
duplicates `command`.
In
`@packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-poddisruptionbudget.yaml`:
- Line 8: The conditional uses a redundant "and" with a single argument; change
the template conditional that currently uses "{{- if and
.Values.podDisruptionBudget.minAvailable }}" to a simple "{{- if
.Values.podDisruptionBudget.minAvailable }}" so the PodDisruptionBudget
conditional correctly evaluates the presence/truthiness of
.Values.podDisruptionBudget.minAvailable (update the corresponding if/endif
block around that symbol).
In
`@packages/system/envoy-gateway/charts/gateway-helm/templates/namespaced-infra-manager-rbac.yaml`:
- Around line 1-35: When watch.type == "NamespaceSelector" and deploy.type ==
"GatewayNamespace" the template creates a tokenreview ClusterRole '{{ include
"eg.fullname" $ }}-infra-manager-tokenreview' that duplicates the tokenreview
rules already present in the other ClusterRole ('*-cluster-infra-manager' from
infra-manager-rbac.yaml); update this template to skip creating the tokenreview
ClusterRole when $kube.watch.type == "NamespaceSelector" (i.e., add a
conditional around the ClusterRole/ClusterRoleBinding blocks checking that not
(eq $kube.watch.type "NamespaceSelector")), or alternatively add a clear comment
in the template explaining why both exist to avoid audit confusion. Ensure you
reference $kube (from .Values.config.envoyGateway.provider.kubernetes) and the
ClusterRole name '{{ include "eg.fullname" $ }}-infra-manager-tokenreview' when
making the change.
In `@packages/system/gateway-rd/Makefile`:
- Line 4: The Makefile currently only contains the include line (include
../../../hack/package.mk) which causes checkmake to warn when scanning this file
alone; add lightweight local phony passthrough targets (e.g., .PHONY plus
targets like build, test, lint, install) that simply forward to the same targets
provided by the included package.mk so checkmake sees concrete targets. Ensure
the passthrough targets are non-functional stubs that invoke the included
makefile's targets (using the standard make invocation) and keep them minimal so
behavior doesn't change.
In `@packages/system/gateway-rd/templates/cozyrd.yaml`:
- Line 1: The template currently uses .Files.Glob "cozyrds/*" which can pull
non-YAML artifacts; narrow the glob to manifest files only by replacing that
pattern with YAML-specific globs (e.g., use .Files.Glob "cozyrds/*.yaml" and
also handle "cozyrds/*.yml" or iterate over both patterns) in the same template
range expression so only .yaml/.yml files are rendered; update the .Files.Glob
call(s) in the template to those YAML-only patterns (and add a second loop if
needed) to ensure safe, predictable output.
In `@packages/system/monitoring/templates/alerta/httproute.yaml`:
- Around line 3-7: The template repeats and risks nil access: remove the
redundant declaration of $gatewayClassName and consolidate gateway derivation by
using a single variable (e.g., $gateway) for both purposes, and add a safe
default when reading .Values._namespace.host (use default "" or fall back to
.Values.host) before computing $alertaHost; update references to use the single
$gateway variable and ensure $host is defined with a default so
.Values._namespace.host cannot be nil.
In `@packages/system/monitoring/templates/grafana/httproute.yaml`:
- Around line 3-7: The template defines duplicate variables $gateway and
$gatewayClassName from .Values._namespace.gateway and reads
.Values._namespace.host without a safe default; remove the redundancy by using
the single $gateway variable wherever gatewayClassName is needed (e.g., replace
uses of $gatewayClassName with $gateway) and protect host access by giving
.Values._namespace.host a default (e.g., use a default fallback when assigning
$host) to avoid nil access in templates such as in the $host and $grafanaHost
assignments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2e8d5e5d-0469-4c6c-a271-57f6b3d81a9c
⛔ Files ignored due to path filters (9)
packages/extra/gateway/logos/gateway.svgis excluded by!**/*.svgpackages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_backends.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_backendtrafficpolicies.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_clienttrafficpolicies.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_envoyextensionpolicies.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_envoypatchpolicies.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_envoyproxies.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_httproutefilters.yamlis excluded by!**/generated/**packages/system/envoy-gateway/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_securitypolicies.yamlis excluded by!**/generated/**
📒 Files selected for processing (53)
hack/e2e-install-cozystack.batspackages/apps/harbor/templates/httproute.yamlpackages/apps/tenant/templates/gateway.yamlpackages/core/platform/sources/envoy-gateway.yamlpackages/core/platform/sources/gateway-application.yamlpackages/core/platform/templates/apps.yamlpackages/core/platform/templates/bundles/system.yamlpackages/core/platform/values.yamlpackages/extra/bootbox/templates/matchbox/httproute.yamlpackages/extra/gateway/Chart.yamlpackages/extra/gateway/Makefilepackages/extra/gateway/templates/gateway.yamlpackages/extra/gateway/values.schema.jsonpackages/extra/gateway/values.yamlpackages/extra/seaweedfs/templates/tlsroute.yamlpackages/system/bucket/templates/httproute.yamlpackages/system/cozystack-api/templates/api-tlsroute.yamlpackages/system/dashboard/templates/httproute.yamlpackages/system/envoy-gateway/Chart.yamlpackages/system/envoy-gateway/Makefilepackages/system/envoy-gateway/charts/gateway-helm/.helmignorepackages/system/envoy-gateway/charts/gateway-helm/Chart.yamlpackages/system/envoy-gateway/charts/gateway-helm/README.mdpackages/system/envoy-gateway/charts/gateway-helm/crds/gatewayapi-crds.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/NOTES.txtpackages/system/envoy-gateway/charts/gateway-helm/templates/_helpers.tplpackages/system/envoy-gateway/charts/gateway-helm/templates/_rbac.tplpackages/system/envoy-gateway/charts/gateway-helm/templates/certgen-rbac.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/certgen.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-config.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-deployment.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-hpa.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-poddisruptionbudget.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-rbac.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-service.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-serviceaccount.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/envoy-proxy-topology-injector-webhook.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/infra-manager-rbac.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/leader-election-rbac.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/namespace.yamlpackages/system/envoy-gateway/charts/gateway-helm/templates/namespaced-infra-manager-rbac.yamlpackages/system/envoy-gateway/charts/gateway-helm/values.yamlpackages/system/envoy-gateway/values.yamlpackages/system/gateway-rd/Chart.yamlpackages/system/gateway-rd/Makefilepackages/system/gateway-rd/cozyrds/gateway.yamlpackages/system/gateway-rd/templates/cozyrd.yamlpackages/system/gateway-rd/values.yamlpackages/system/keycloak/templates/httproute.yamlpackages/system/kubevirt-cdi/templates/cdi-uploadproxy-tlsroute.yamlpackages/system/kubevirt/templates/vm-exportproxy-tlsroute.yamlpackages/system/monitoring/templates/alerta/httproute.yamlpackages/system/monitoring/templates/grafana/httproute.yaml
✅ Files skipped from review due to trivial changes (14)
- packages/extra/gateway/values.yaml
- packages/system/gateway-rd/values.yaml
- packages/system/envoy-gateway/values.yaml
- packages/extra/gateway/values.schema.json
- packages/system/envoy-gateway/Chart.yaml
- packages/core/platform/sources/envoy-gateway.yaml
- packages/extra/gateway/Chart.yaml
- packages/system/gateway-rd/Chart.yaml
- packages/system/envoy-gateway/charts/gateway-helm/.helmignore
- packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-serviceaccount.yaml
- packages/core/platform/values.yaml
- packages/core/platform/sources/gateway-application.yaml
- packages/system/gateway-rd/cozyrds/gateway.yaml
- packages/system/envoy-gateway/charts/gateway-helm/Chart.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- hack/e2e-install-cozystack.bats
- packages/core/platform/templates/bundles/system.yaml
| {{- $gatewayAPI := (index .Values._cluster "gateway-api") | default "false" }} | ||
| {{- $clusterIssuer := (index .Values._cluster "issuer-name") | default "letsencrypt-prod" }} | ||
| {{- $gateway := .Values._namespace.gateway | default "" }} | ||
| {{- $host := .Values._namespace.host }} | ||
| {{- $harborHost := .Values.host | default (printf "%s.%s" .Release.Name $host) }} | ||
|
|
||
| {{- $gatewayClassName := .Values._namespace.gateway | default "" }} | ||
|
|
||
| {{- if and (eq $gatewayAPI "true") (ne $gateway "") }} |
There was a problem hiding this comment.
Routing conflict: Harbor Ingress will still be deployed when Gateway API is enabled.
The existing packages/apps/harbor/templates/ingress.yaml (shown in context snippets) creates an Ingress unconditionally without checking the gateway-api flag. When gateway-api: "true", both the Ingress and these Gateway API resources will be deployed for the same Harbor hostname, causing routing conflicts.
Consider adding a guard to ingress.yaml to skip Ingress creation when Gateway API is enabled:
{{- if ne $gatewayAPI "true" }}🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/apps/harbor/templates/httproute.yaml` around lines 1 - 9, The
Ingress template is created unconditionally causing conflicts when Gateway API
is enabled; update the ingress template (the file that currently renders the
Ingress resource) to wrap the whole Ingress manifest with a guard that checks
the same $gatewayAPI value used in httproute.yaml (and optionally $gateway) so
the Ingress is skipped when gateway-api is "true" (e.g., use an if that
evaluates ne $gatewayAPI "true" around the Ingress manifest); ensure you
reference the same .Values._cluster "gateway-api" lookup used to populate
$gatewayAPI so behavior matches the httproute rendering.
|
|
||
| {{- $dashboardHost := index $hostnames "dashboard" | default (printf "dashboard.%s" $host) }} | ||
|
|
||
| {{- if and (eq $gatewayAPI "true") (has "dashboard" $exposeServices) }} |
There was a problem hiding this comment.
Inconsistent condition compared to other Gateway API templates.
This template only checks eq $gatewayAPI "true" and has "dashboard" $exposeServices, but doesn't verify that $gateway is non-empty. In contrast, the Grafana and Alerta templates (also in this PR) use (ne $gateway "") as part of their condition.
If $gateway is empty, line 20 will render an empty gatewayClassName, which is invalid. Consider aligning with the other templates:
-{{- if and (eq $gatewayAPI "true") (has "dashboard" $exposeServices) }}
+{{- if and (eq $gatewayAPI "true") (has "dashboard" $exposeServices) (ne $gateway "") }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{- if and (eq $gatewayAPI "true") (has "dashboard" $exposeServices) }} | |
| {{- if and (eq $gatewayAPI "true") (has "dashboard" $exposeServices) (ne $gateway "") }} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/dashboard/templates/httproute.yaml` at line 11, The
conditional at the top of the template only checks eq $gatewayAPI "true" and has
"dashboard" $exposeServices but omits verifying $gateway is non-empty; update
the if clause to also include (ne $gateway "") so it matches the Grafana/Alerta
templates and prevents rendering an empty gatewayClassName when $gateway is
empty (adjust the conditional that currently references $gatewayAPI,
$exposeServices to also check $gateway).
| {{- if .Values.deployment.envoyGateway.image.repository -}} | ||
| {{/* if global.imageRegistry is defined, it takes precedence always */}} | ||
| {{- if .Values.global.imageRegistry -}} | ||
| {{- $repositoryParts := splitn "/" 2 .Values.deployment.envoyGateway.image.repository -}} | ||
| {{- $registryName := .Values.global.imageRegistry -}} | ||
| {{- $repositoryName := $repositoryParts._1 -}} | ||
| {{- $imageTag := default .Chart.AppVersion .Values.deployment.envoyGateway.image.tag -}} | ||
| {{- printf "%s/%s:%s" $registryName $repositoryName $imageTag -}} |
There was a problem hiding this comment.
Preserve the full repository path when global.imageRegistry is applied.
On Line 72–Line 75, the current splitn "/" 2 + _1 logic drops the first segment unconditionally. For repos like envoyproxy/gateway, this renders as <registry>/gateway:<tag> (namespace lost), which can break pulls.
💡 Suggested fix
{{- if .Values.global.imageRegistry -}}
-{{- $repositoryParts := splitn "/" 2 .Values.deployment.envoyGateway.image.repository -}}
-{{- $registryName := .Values.global.imageRegistry -}}
-{{- $repositoryName := $repositoryParts._1 -}}
+{{- $repo := .Values.deployment.envoyGateway.image.repository -}}
+{{- $repoParts := splitList "/" $repo -}}
+{{- $firstPart := first $repoParts -}}
+{{- $hasRegistryPrefix := or (contains "." $firstPart) (contains ":" $firstPart) (eq $firstPart "localhost") -}}
+{{- $repositoryName := ternary (join "/" (rest $repoParts)) $repo $hasRegistryPrefix -}}
+{{- $registryName := .Values.global.imageRegistry -}}
{{- $imageTag := default .Chart.AppVersion .Values.deployment.envoyGateway.image.tag -}}
{{- printf "%s/%s:%s" $registryName $repositoryName $imageTag -}}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/envoy-gateway/charts/gateway-helm/templates/_helpers.tpl`
around lines 69 - 76, The template currently splits
.Values.deployment.envoyGateway.image.repository with splitn and uses
$repositoryParts._1, which drops namespace segments (e.g.,
"envoyproxy/gateway"); change the logic to preserve the full repository path
when .Values.global.imageRegistry is set by assigning $repositoryName to the
entire .Values.deployment.envoyGateway.image.repository (instead of using
splitn/$repositoryParts._1) and then build the final image string with printf
"%s/%s:%s" using $registryName, the full $repositoryName, and $imageTag.
| {{- $imageParts := splitn "/" 2 .Values.global.images.ratelimit.image -}} | ||
| {{/* if global.imageRegistry is defined, it takes precedence always */}} | ||
| {{- $registryName := default $imageParts._0 .Values.global.imageRegistry -}} | ||
| {{- $repositoryTag := $imageParts._1 -}} | ||
| {{- $repositoryParts := splitn ":" 2 $repositoryTag -}} | ||
| {{- $repositoryName := $repositoryParts._0 -}} | ||
| {{- $imageTag := default "master" $repositoryParts._1 -}} | ||
| {{- printf "%s/%s:%s" $registryName $repositoryName $imageTag -}} |
There was a problem hiding this comment.
Avoid floating master as the ratelimit image fallback tag.
Line 132 defaults to master when the image string has no tag. That introduces non-deterministic upgrades and supply-chain drift. Please switch this fallback to a pinned value sourced from chart values (or a digest-based default).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/envoy-gateway/charts/gateway-helm/templates/_helpers.tpl`
around lines 126 - 133, The current helper uses a hardcoded fallback "master"
for $imageTag which causes non-deterministic upgrades; change the default to a
pinned value coming from chart values (e.g. replace the literal "master" default
with something like .Values.global.images.ratelimit.tag or prefer a digest field
.Values.global.images.ratelimit.digest when present) and update values.yaml to
expose that pinned tag/digest; adjust the logic around
$repositoryParts/$imageTag to prefer the provided digest/tag from
.Values.global.images.ratelimit before falling back to the chart-level default.
| {{- if .Values.hpa.minReplicas }} | ||
| minReplicas: {{ .Values.hpa.minReplicas }} | ||
| {{- end }} | ||
| maxReplicas: {{ required ".Values.hps.maxReplicas is required when hpa is enabled" .Values.hpa.maxReplicas }} |
There was a problem hiding this comment.
Typo in error message: hps should be hpa.
The required error message references .Values.hps.maxReplicas but the actual value path is .Values.hpa.maxReplicas. This will confuse users when debugging.
🔧 Proposed fix
- maxReplicas: {{ required ".Values.hps.maxReplicas is required when hpa is enabled" .Values.hpa.maxReplicas }}
+ maxReplicas: {{ required ".Values.hpa.maxReplicas is required when hpa is enabled" .Values.hpa.maxReplicas }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/system/envoy-gateway/charts/gateway-helm/templates/envoy-gateway-hpa.yaml`
at line 15, The error message in the Helm template's required check for
maxReplicas contains a typo (`.Values.hps.maxReplicas`); update the required
call message to reference the correct path `.Values.hpa.maxReplicas` so the
check on maxReplicas (the line using required and .Values.hpa.maxReplicas)
reports the correct configuration key.
ccb04ca to
5fcac56
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/system/envoy-gateway/Makefile`:
- Line 14: The Makefile currently deletes charts/gateway-helm/crds which mutates
the vendored chart; remove the `rm -rf charts/gateway-helm/crds` step and
instead ensure CRDs are copied from the pulled chart into a separate location
used by packaging (e.g., copy charts/gateway-helm/crds -> templates/crds/ or a
build/artifacts dir) so charts/gateway-helm remains an untouched upstream copy;
update the Makefile targets that relied on that removal to use the new copy step
(reference the Makefile target that runs `helm pull` and the path
charts/gateway-helm/crds and templates/crds to locate where to change).
In
`@packages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_backends.yaml`:
- Around line 4-7: The CRD metadata for backends.gateway.envoyproxy.io is
missing the Helm-safe annotation; add the annotation "helm.sh/resource-policy:
keep" under metadata.annotations for the CRD (the block where name:
backends.gateway.envoyproxy.io is defined) to prevent Helm from deleting the CRD
on uninstall, and apply the same annotation to every other copied CRD file in
this templates directory so all CRDs are preserved across Helm uninstall.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fdfa7ce6-ea9b-4d6e-aa48-d3b96edffe59
📒 Files selected for processing (9)
packages/system/envoy-gateway/Makefilepackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_backends.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_backendtrafficpolicies.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_clienttrafficpolicies.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_envoyextensionpolicies.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_envoypatchpolicies.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_envoyproxies.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_httproutefilters.yamlpackages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_securitypolicies.yaml
| rm -rf templates/crds | ||
| mkdir -p templates/crds | ||
| cp charts/gateway-helm/crds/generated/*.yaml templates/crds/ | ||
| rm -rf charts/gateway-helm/crds |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Keep the vendored chart pristine after helm pull.
The last line rewrites the upstream chart inside charts/. Even if we need copied CRDs under templates/crds/, charts/gateway-helm should stay as-pulled so future bumps remain auditable and we do not drift from the repo’s vendoring pattern.
♻️ Proposed fix
cp charts/gateway-helm/crds/generated/*.yaml templates/crds/
- rm -rf charts/gateway-helm/crdsBased on learnings, vendored charts under packages/system/*/charts/ should remain an upstream copy and not be modified directly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rm -rf charts/gateway-helm/crds | |
| cp charts/gateway-helm/crds/generated/*.yaml templates/crds/ |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/system/envoy-gateway/Makefile` at line 14, The Makefile currently
deletes charts/gateway-helm/crds which mutates the vendored chart; remove the
`rm -rf charts/gateway-helm/crds` step and instead ensure CRDs are copied from
the pulled chart into a separate location used by packaging (e.g., copy
charts/gateway-helm/crds -> templates/crds/ or a build/artifacts dir) so
charts/gateway-helm remains an untouched upstream copy; update the Makefile
targets that relied on that removal to use the new copy step (reference the
Makefile target that runs `helm pull` and the path charts/gateway-helm/crds and
templates/crds to locate where to change).
| metadata: | ||
| annotations: | ||
| controller-gen.kubebuilder.io/version: v0.18.0 | ||
| name: backends.gateway.envoyproxy.io |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Helm v3, when a CustomResourceDefinition is rendered from a chart's templates/directory instead ofcrds/, is it deleted on uninstall, and is metadata.annotations["helm.sh/resource-policy"]="keep" the supported way to prevent that?
💡 Result:
In Helm v3, a CustomResourceDefinition (CRD) rendered from a chart's templates/ directory is treated as a regular templated resource and will be deleted on helm uninstall. CRDs in the crds/ directory are handled specially: they are installed if missing but never deleted by Helm. The annotation metadata.annotations["helm.sh/resource-policy"]="keep" is the supported way to prevent deletion of templated resources (including CRDs from templates/) during helm uninstall, helm upgrade, or helm rollback. Official Helm documentation explicitly describes this usage, and GitHub issues confirm it works for CRDs when applied via the chart template (though manual kubectl annotate may have limitations in some versions). To use it, add to your CRD YAML in templates/: metadata: annotations: helm.sh/resource-policy: keep Note: Resources with this annotation become orphaned after uninstall; Helm no longer manages them.
Citations:
- 1: Moving CRDs from templates/ to crds/ results in CRDs being deleted helm/helm#7279
- 2: https://v3.helm.sh/docs/chart_best_practices/custom_resource_definitions/
- 3: helm3 deleted CRDs with uninstall helm/helm#11564
- 4: https://helm.sh/docs/chart_best_practices/custom_resource_definitions
- 5: Annotation
"helm.sh/resource-policy": keepnot honored when uninstalling helm/helm#10890 - 6: https://helm.sh/docs/howto/charts_tips_and_tricks
Add helm.sh/resource-policy: keep to prevent Helm from deleting the CRD on uninstall.
This CRD is rendered from templates/ rather than the special crds/ directory, so Helm treats it as a normal templated resource. On uninstall, Helm will delete it and cascade-delete every instance of this API. Add the annotation to prevent deletion, or move these CRDs to a dedicated release.
Suggested change
metadata:
annotations:
+ helm.sh/resource-policy: keep
controller-gen.kubebuilder.io/version: v0.18.0Apply this fix to all copied CRDs in this directory.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| metadata: | |
| annotations: | |
| controller-gen.kubebuilder.io/version: v0.18.0 | |
| name: backends.gateway.envoyproxy.io | |
| metadata: | |
| annotations: | |
| helm.sh/resource-policy: keep | |
| controller-gen.kubebuilder.io/version: v0.18.0 | |
| name: backends.gateway.envoyproxy.io |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/system/envoy-gateway/templates/crds/gateway.envoyproxy.io_backends.yaml`
around lines 4 - 7, The CRD metadata for backends.gateway.envoyproxy.io is
missing the Helm-safe annotation; add the annotation "helm.sh/resource-policy:
keep" under metadata.annotations for the CRD (the block where name:
backends.gateway.envoyproxy.io is defined) to prevent Helm from deleting the CRD
on uninstall, and apply the same annotation to every other copied CRD file in
this templates directory so all CRDs are preserved across Helm uninstall.
Aleksei Sviridkin (lexfrei)
left a comment
There was a problem hiding this comment.
Architecture
The overall design is sound: per-tenant GatewayClass (name = namespace), mergeGateways: true for one LB Service per tenant, Envoy Gateway as controller with Cilium staying as CNI. The dual-stack coexistence with ingress-nginx and the migration path (ingress: true → both → gatewayAPI: true) are well thought out.
However, the execution has several issues — some are security concerns, others are functional bugs from missing conditions or sources. The iterative nature of 31 commits (1 feat + 15 fixes) suggests these were discovered during debugging but not all resolved.
Blocking issues
- Security:
acme-challengeGateway accepts routes from any namespace — allows tenants to intercept HTTP traffic on port 80 - external-dns missing
gateway-tlsroutesource — 5 TLSRoute services (kubernetes-api, vm-exportproxy, cdi-uploadproxy, seaweedfs-filer, openbao) won't get DNS records - Empty
solvers:in ClusterIssuer — reachable when bothingress: falseandgatewayAPI: false - Keycloak sticky session disabled unconditionally — regression for ingress-only deployments
- Envoy Gateway deployed unconditionally — resource overhead and attack surface when
gatewayAPI: false - Alerta/Grafana duplicate HTTP listeners — architectural inconsistency with dashboard/keycloak pattern
| port: 80 | ||
| allowedRoutes: | ||
| namespaces: | ||
| from: All |
There was a problem hiding this comment.
allowedRoutes.namespaces.from: All on a port 80 listener means any namespace can attach HTTPRoutes to this Gateway. With mergeGateways: true, a tenant can create an HTTPRoute with a specific hostname that takes priority over the catch-all redirect-to-https, intercepting HTTP traffic or blocking ACME challenges.
Compare with Harbor's Gateway which correctly uses from: Same.
This should be from: Same or use a label selector. For cert-manager cross-namespace ACME challenges, a ReferenceGrant is the proper mechanism.
There was a problem hiding this comment.
Fixed. Split into two Gateways:
acme-challenge:from: Selectorwithcozystack.io/system: "true"label — accepts ACME challenge HTTPRoutes only from system namespaces (cozy-dashboard, cozy-keycloak, etc.)http-redirect:from: Same— accepts the redirect-to-https route and same-namespace ACME challenges (tenant bucket certs)
The ClusterIssuer solver now references both Gateways as parentRefs, so cert-manager challenge HTTPRoutes attach to whichever Gateway allows the namespace. Random tenants can't attach to either.
| {{include "cozystack.platform.package.optional.default" (list "cozystack.external-dns" $) }} | ||
| {{- $externalDnsComponents := dict -}} | ||
| {{- if .Values.gateway.gatewayAPI -}} | ||
| {{- $externalDnsValues := dict "external-dns" (dict "sources" (list "service" "ingress" "gateway-httproute")) -}} |
There was a problem hiding this comment.
Missing gateway-tlsroute source. Services exposed via TLSRoute won't get DNS records:
kubernetes-api(cozystack-api/templates/api-tlsroute.yaml)vm-exportproxy(kubevirt/templates/vm-exportproxy-tlsroute.yaml)cdi-uploadproxy(kubevirt-cdi/templates/cdi-uploadproxy-tlsroute.yaml)seaweedfs-filer(extra/seaweedfs/templates/tlsroute.yaml)openbao(openbao/charts/openbao/templates/server-tlsroute.yaml)
Add "gateway-tlsroute" to the sources list.
There was a problem hiding this comment.
Fixed. Added gateway-tlsroute to the external-dns sources list alongside gateway-httproute. TLSRoute services (kubernetes-api, vm-exportproxy, cdi-uploadproxy, seaweedfs) will now get DNS records.
| {{- $_ := set $tenantComponents "tenant" (dict "values" $tenantClusterValues) }} | ||
| {{include "cozystack.platform.package" (list "cozystack.tenant-application" "default" $ $tenantComponents) }} | ||
| {{include "cozystack.platform.package.default" (list "cozystack.ingress-application" $) }} | ||
| {{include "cozystack.platform.package.default" (list "cozystack.gateway-application" $) }} |
There was a problem hiding this comment.
gateway-application (Envoy Gateway controller + CRDs + per-tenant chart) is deployed via package.default — always installed unless explicitly in disabledPackages. This installs Envoy Gateway pods, webhooks, and ~31k lines of CRDs on every cluster, even when gatewayAPI: false.
Compare with how ingress-application is also always deployed — but ingress-nginx is the established default. Envoy Gateway is a new, optional controller.
Consider either package.optional.default or a conditional on gateway.gatewayAPI.
There was a problem hiding this comment.
Fixed. Wrapped in {{- if .Values.gateway.gatewayAPI }} condition. Envoy Gateway controller, CRDs, and per-tenant chart are only deployed when gatewayAPI is explicitly enabled.
| apiTokenSecretRef: | ||
| name: cloudflare-api-token-secret | ||
| key: api-token | ||
| {{- else }} |
There was a problem hiding this comment.
When solver != "dns01": if both ingressEnabled == "false" (line 19 condition fails) and gatewayAPI == "false" or gateway == "" (line 24 condition fails), the solvers: list is empty. An ACME ClusterIssuer with zero solvers is invalid.
Default values have ingress: true, so this is unlikely but reachable through explicit user configuration. Add a fail with a clear message or a default solver.
There was a problem hiding this comment.
Fixed. Added a fallback: when both ingress-enabled and gatewayAPI are false (or gateway name is empty), an ingress solver is added as default to prevent an invalid empty solvers list. The fallback only triggers when neither ingress nor gateway solver is active.
| value: https://{{ $ingressHost }} | ||
| - name: JAVA_OPTS_APPEND | ||
| value: "-Djgroups.dns.query=keycloak-headless.cozy-keycloak.svc.{{ $clusterDomain }}" | ||
| - name: KC_SPI_STICKY_SESSION_ENCODER_INFINISPAN_SHOULD_ATTACH_ROUTE |
There was a problem hiding this comment.
This env var was added specifically for Gateway API (commit message: "Disable Keycloak sticky session route attachment... removes dependency on nginx cookie affinity"). But it's set unconditionally, affecting ingress-only deployments too.
With ingress-nginx, Keycloak relies on nginx.ingress.kubernetes.io/affinity: "cookie" for session affinity. Disabling route attachment unconditionally may break multi-replica Keycloak session handling under ingress-nginx.
Gate this on the gateway-api cluster value.
There was a problem hiding this comment.
Fixed. Wrapped in {{- if eq ((index .Values._cluster "gateway-api") | default "false") "true" }} condition. Ingress-only deployments keep the default Keycloak behavior with sticky sessions via nginx cookie affinity.
| labels: | ||
| cozystack.io/gateway: {{ $gateway }} | ||
| listeners: | ||
| - name: http |
There was a problem hiding this comment.
This Gateway creates a per-component HTTP listener on port 80 with its own redirect-to-https HTTPRoute. But the architecture already has a shared acme-challenge Gateway (in extra/gateway/templates/gateway.yaml) that handles HTTP→HTTPS redirect for all traffic.
Compare with dashboard and keycloak which correctly have HTTPS-only listeners and rely on the shared redirect. Alerta and Grafana should follow the same pattern.
There was a problem hiding this comment.
Fixed. Removed HTTP listener and redirect-to-https HTTPRoute from Alerta Gateway. Now HTTPS-only, consistent with dashboard, keycloak, bucket, and harbor. HTTP redirect is handled by the shared http-redirect Gateway in extra/gateway.
| labels: | ||
| cozystack.io/gateway: {{ $gateway }} | ||
| listeners: | ||
| - name: http |
There was a problem hiding this comment.
Same issue as Alerta — per-component HTTP listener on port 80 duplicates the shared acme-challenge Gateway's redirect functionality. Should have HTTPS-only listener like dashboard and keycloak.
There was a problem hiding this comment.
Fixed. Same as Alerta — removed HTTP listener and redirect route. HTTPS-only listener now.
|
|
||
| # Wait for monitoring to reconcile with gateway config | ||
| if ! kubectl wait hr/monitoring -n tenant-root --timeout=3m --for=condition=ready; then | ||
| flux reconcile hr monitoring -n tenant-root --force |
There was a problem hiding this comment.
flux reconcile requires flux CLI to be installed. Line 219 in the same PR explicitly says: "Use annotation instead of flux reconcile --force to avoid blocking on dependencies" and uses kubectl annotate instead. This line should follow the same pattern for consistency and to avoid an undeclared dependency on flux CLI.
There was a problem hiding this comment.
Fixed. Replaced flux reconcile hr monitoring --force with kubectl annotate hr monitoring -n tenant-root reconcile.fluxcd.io/requestedAt="$(date +%s)" --overwrite. Consistent with line 219 approach — non-blocking, no flux CLI dependency.
Aleksei Sviridkin (lexfrei)
left a comment
There was a problem hiding this comment.
Round 2
The previous findings are addressed — thanks for the quick turnaround. However, the fixes introduce new issues.
New blocking issues
- Two Gateways with identical port 80 listeners —
acme-challengeandhttp-redirectboth declareport: 80, protocol: HTTP, hostname: (none). Gateway API spec requires unique port/protocol/hostname tuples per merged listener set. Envoy Gateway'smergeGatewaysmay handle this today, but it's implementation-specific behavior not guaranteed by the spec. - Fallback solver silently broken — new code adds
ingress: { class: nginx }solver when ingress is disabled. nginx-ingress-controller won't be running, so ACME challenges will fail at runtime with no error at render time.
New non-blocking issues
- Dual parentRefs create noise — every ACME challenge HTTPRoute will have one
Accepted: Falseparent (system services fail onhttp-redirect, tenant services fail onacme-challenge). Functional but noisy. - Tenant ACME challenges became more fragile — now depend on namespace co-location with gateway (worked from any namespace before).
| listeners: | ||
| - name: http | ||
| protocol: HTTP | ||
| port: 80 |
There was a problem hiding this comment.
acme-challenge (line 52) and http-redirect (line 72) both declare port: 80, protocol: HTTP with no hostname filter. Gateway API spec requires unique (port, protocol, hostname) tuples across all listeners in a merged set.
Envoy Gateway's mergeGateways may handle this by merging the two listeners into one physical listener with separate allowedRoutes — but this is implementation-specific behavior. The Gateway API conformance tests would flag this as a conflict. If Envoy Gateway tightens spec compliance in a future version, one of these Gateways will get Accepted: False.
Consider consolidating into a single Gateway with a broader allowedRoutes (e.g., from: Selector with a label that covers both system and gateway namespaces), or document explicitly why two Gateways on the same port are needed and that this relies on Envoy Gateway's merge behavior.
There was a problem hiding this comment.
Added a block comment documenting the two-Gateway architecture, why it relies on Envoy Gateway's mergeGateways, and the consolidation path if spec compliance tightens (requires OR logic in label selectors or operator-managed labels).
| kind: Gateway | ||
| {{- end }} | ||
| {{- if and (ne $ingressEnabled "true") (or (ne $gatewayAPI "true") (eq $gateway "")) }} | ||
| - http01: |
There was a problem hiding this comment.
This fallback is new code introduced by this fix. When ingress: false AND gatewayAPI: false (or gateway is empty), it adds:
- http01:
ingress:
class: nginxBut if ingress is disabled, nginx-ingress-controller is not running. The solver exists (so the ClusterIssuer is valid YAML), but ACME challenges will silently fail at runtime — no ingress controller will pick up the challenge.
Replace with:
{{- fail "At least one of gateway.ingress or gateway.gatewayAPI must be enabled for ACME HTTP-01 solver" }}
This catches the misconfiguration at helm template time instead of silently failing in production.
There was a problem hiding this comment.
Fixed. Replaced the silent fallback with {{- fail "At least one of gateway.ingress or gateway.gatewayAPI must be enabled for ACME HTTP-01 solver" }}. Misconfiguration is now caught at helm template time.
| - name: acme-challenge | ||
| namespace: {{ $gateway }} | ||
| kind: Gateway | ||
| - name: http-redirect |
There was a problem hiding this comment.
Adding http-redirect as a second parentRef means every challenge HTTPRoute will have one rejected parent:
- System services (e.g., dashboard in
cozy-dashboard):acme-challengeaccepts (Selector matches),http-redirectrejects (from: Same, different namespace) - Tenant services (e.g., grafana in
tenant-root):http-redirectaccepts (from: Same, same namespace),acme-challengerejects (Selector, no system label)
cert-manager handles partial acceptance correctly, so this is functional. But every challenge will show Accepted: False on one parent in kubectl get httproute -o yaml, which adds noise during debugging.
Consider adding a comment in this template explaining the dual-parentRef strategy and why one rejection per challenge is expected.
There was a problem hiding this comment.
Added a comment in the template explaining the dual-parentRef strategy and that one rejection per challenge is expected. cert-manager handles partial acceptance correctly.
| from: Selector | ||
| selector: | ||
| matchLabels: | ||
| cozystack.io/system: "true" |
There was a problem hiding this comment.
The two-Gateway model (acme-challenge for system namespaces + http-redirect for catch-all redirect) is architecturally sound but non-obvious. A future maintainer debugging ACME failures will need to understand:
- Why two Gateways exist on port 80
- That
acme-challengeserves system namespaces (cozy-*) via Selector - That
http-redirectserves tenant namespaces via Same + catch-all redirect - That cert-manager challenges use dual parentRefs to cover both cases
- That one parentRef rejection per challenge is expected
A block comment here explaining this architecture would save significant debugging time.
There was a problem hiding this comment.
Added a block comment in extra/gateway/templates/gateway.yaml covering all five points: why two Gateways exist, Selector vs Same roles, dual parentRefs in cert-manager solver, and expected single rejection per challenge.
- Upgrade Gateway API CRDs from v1.2.0 to v1.5.0 (experimental channel) - Add gateway-api-crds as a component in all networking variants, with Cilium depending on it to ensure CRDs are available first - Add gateway.gatewayAPI platform config (enables Cilium gatewayAPI.enabled=true), can be used alongside gateway.ingress - Enable Gateway API support in cert-manager Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
When gateway.gatewayAPI is true, pass gateway-httproute source to the system external-dns package via components values override. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
…ayAPI Previously enableGatewayAPI was hardcoded to true in cert-manager values, causing startup failures in isp-hosted variant where Gateway API CRDs are not installed. Now passed via component override only when gateway.gatewayAPI is enabled. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add cozystack-gateway system package that creates a shared Gateway resource with per-hostname HTTPS listeners and TLS passthrough listeners for system services. Listeners are dynamically generated from expose-services config. Platform changes: - Add gateway-api flag to _cluster config for component awareness - Add gateway.name/namespace to platform values for parentRef config - Add publishing.hostnames map for per-service hostname overrides - Wire cozystack-gateway package into system bundle (conditional) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
…vices System services (dashboard, keycloak) need HTTP→HTTPS redirect through the acme-challenge Gateway. Since system namespaces have the cozystack.io/system label, their HTTPRoutes can attach to the acme-challenge Gateway (from: Selector). Tenant services use the http-redirect Gateway (from: Same) for redirect. This keeps the security model intact: only system namespaces can attach to acme-challenge, only the gateway's own namespace can attach to http-redirect. Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
…nfiguration Add architecture documentation comment to extra/gateway explaining the two-Gateway model on port 80, mergeGateways reliance, and why one rejected parentRef per ACME challenge is expected. Add comment to cert-manager-issuers explaining dual parentRef strategy. Replace silent ingress fallback with fail() when both ingress and gatewayAPI are disabled — catches misconfiguration at template time. Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
c6ddefb to
b7deea9
Compare
Regenerated values.schema.json, cozyrds/tenant.yaml, and Go types using cozyvalues-gen v1.3.0 to match CI pre-commit checks. Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
bf2c705 to
f361e37
Compare
Aleksei Sviridkin (lexfrei)
left a comment
There was a problem hiding this comment.
Round 3 — Cross-model review (Opus + Haiku second opinion)
Previous findings are resolved. This round focused on design validation with a second model providing an independent review.
New finding: child tenant ACME challenges silently broken
The ClusterIssuer solver hardcodes namespace: tenant-root (from _cluster.gateway). When a child tenant has gateway: true, it gets its own Gateways in its namespace, but the solver doesn't reference them. Challenge HTTPRoutes created in tenant-child cannot attach to tenant-root's Gateways (acme-challenge rejects via Selector, http-redirect rejects via Same). ACME fails silently.
This is a fundamental limitation of cert-manager's cluster-scoped gatewayHTTPRoute solver combined with namespace-scoped Gateway API allowedRoutes. With ingress, class: nginx is namespace-agnostic so this doesn't happen.
This does NOT need to be fixed in this PR — the root tenant (primary use case) works, and the fix is non-trivial (per-tenant Issuer, DNS-01, or upstream cert-manager changes). But it MUST be documented so users don't hit it silently.
Cleanup: dead PackageSource
cozystack.envoy-gateway PackageSource is never referenced in any bundle. Envoy Gateway is deployed through cozystack.gateway-application. The dead file should be removed to avoid confusion.
| gatewayHTTPRoute: | ||
| parentRefs: | ||
| - name: acme-challenge | ||
| namespace: {{ $gateway }} |
There was a problem hiding this comment.
The solver always references Gateways in {{ $gateway }} which resolves to _cluster.gateway = tenant-root (hardcoded in platform values). This works for:
- System services (cozy-dashboard, cozy-keycloak) —
acme-challengeaccepts via Selector (system label) ✅ - Root tenant services (grafana in tenant-root) —
http-redirectaccepts via Same ✅
But breaks for child tenants with gateway: true:
- Child tenant
tenant-childhas its ownacme-challenge+http-redirectintenant-childnamespace - cert-manager creates challenge HTTPRoute in
tenant-child, referencing Gateways intenant-root acme-challengeintenant-root: Selector requirescozystack.io/system: "true"—tenant-childdoesn't have it → rejectedhttp-redirectintenant-root:from: Same—tenant-child≠tenant-root→ rejected- ACME challenge fails silently
This is a fundamental limitation of ClusterIssuer + namespace-scoped Gateway API (ingress doesn't have this issue because class: nginx is namespace-agnostic).
Doesn't need to be fixed here, but must be documented — add a comment explaining that Gateway API ACME HTTP-01 only works for the root tenant and system services. Child tenants should use DNS-01 solver or namespace-scoped Issuers.
There was a problem hiding this comment.
Added a comment in the solver template documenting the limitation. This will also be covered in the Gateway API documentation on the cozystack website — explaining that child tenants with gateway: true should use dns01 solver or namespace-scoped Issuers for ACME HTTP-01.
| apiVersion: cozystack.io/v1alpha1 | ||
| kind: PackageSource | ||
| metadata: | ||
| name: cozystack.envoy-gateway |
There was a problem hiding this comment.
This PackageSource (cozystack.envoy-gateway) is never referenced in any bundle template (system.yaml). Envoy Gateway is actually deployed through cozystack.gateway-application which has its own envoy-gateway-system component pointing to the same chart, same namespace, same release name.
This is dead code. If someone later tries to use cozystack.envoy-gateway as a standalone Package, it would conflict with gateway-application's component (same Helm release in same namespace). Should be removed.
There was a problem hiding this comment.
Already removed in an earlier commit — the file packages/core/platform/sources/envoy-gateway.yaml was deleted as part of the cozystack-gateway cleanup. Envoy Gateway is deployed exclusively through gateway-application.
There was a problem hiding this comment.
Correction: the file was not actually deleted in earlier commits — my mistake. Removed it now. packages/core/platform/sources/envoy-gateway.yaml is deleted, Envoy Gateway is deployed exclusively through gateway-application.
…child tenant limitation The rebase on main reverted cert-manager-issuers to the old inline format. Restore the helper-based template with gateway solver support. Add comment documenting that Gateway API ACME HTTP-01 only works for the root tenant and system services. Child tenants with gateway: true should use dns01 solver or namespace-scoped Issuers. Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add networking/gateway-api.md covering: - Architecture overview with mermaid diagrams - Per-tenant model with Envoy Gateway and mergeGateways - Platform and tenant configuration examples - TLS certificate management (ACME HTTP-01, self-signed, dns01) - Child tenant ACME limitation and workarounds - HTTP-to-HTTPS redirect architecture - ExternalIPs configuration - Supported services tables (HTTPRoute and TLSRoute) - Comparison with ingress-nginx Update install/cozystack/platform.md: - Add Gateway API tabs alongside Ingress in networking setup (MetalLB and Public IP sections) - Add Gateway API tab in root tenant services setup - Add Gateway API link to Next Steps Related: cozystack/cozystack#2213 Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
|
Documentation PR for Gateway API: cozystack/website#460 Covers architecture, configuration, TLS certificates, install instructions with tabbed Ingress/Gateway API examples. |
Add networking/gateway-api.md covering: - Architecture overview with mermaid diagrams - Per-tenant model with Envoy Gateway and mergeGateways - Platform and tenant configuration examples - TLS certificate management (ACME HTTP-01, self-signed, dns01) - Child tenant ACME limitation and workarounds - HTTP-to-HTTPS redirect architecture - ExternalIPs configuration - Supported services tables (HTTPRoute and TLSRoute) - Comparison with ingress-nginx Update install/cozystack/platform.md: - Add Gateway API tabs alongside Ingress in networking setup (MetalLB and Public IP sections) - Add Gateway API tab in root tenant services setup - Add Gateway API link to Next Steps Related: cozystack/cozystack#2213 Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add networking/gateway-api.md covering: - Architecture overview with mermaid diagrams - Per-tenant model with Envoy Gateway and mergeGateways - Platform and tenant configuration examples - TLS certificate management (ACME HTTP-01, self-signed, dns01) - Child tenant ACME limitation and workarounds - HTTP-to-HTTPS redirect architecture - ExternalIPs configuration - Supported services tables (HTTPRoute and TLSRoute) - Comparison with ingress-nginx Update install/cozystack/platform.md: - Add Gateway API tabs alongside Ingress in networking setup (MetalLB and Public IP sections) - Add Gateway API tab in root tenant services setup - Add Gateway API link to Next Steps Related: cozystack/cozystack#2213 Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Add networking/gateway-api.md covering: - Prerequisites (Cozystack v1.2.0+) - Architecture overview with mermaid diagrams - Per-tenant model with Envoy Gateway and mergeGateways - Platform and tenant configuration examples - TLS certificate management (ACME HTTP-01, self-signed, dns01) - Child tenant ACME limitation and workarounds - HTTP-to-HTTPS redirect architecture - ExternalIPs configuration - Supported services tables (HTTPRoute and TLSRoute) - Comparison with ingress-nginx Update install/cozystack/platform.md: - Add Gateway API tabs alongside Ingress in networking setup (MetalLB and Public IP sections) - Add Gateway API tab in root tenant services setup - Add Gateway API link to Next Steps Related: cozystack/cozystack#2213 Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
cozystack.envoy-gateway is never referenced in any bundle template. Envoy Gateway is deployed through cozystack.gateway-application which has its own envoy-gateway-system component. Keeping the standalone source would conflict if someone tried to use it (same Helm release in same namespace). Assisted-By: Claude AI Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Aleksei Sviridkin (lexfrei)
left a comment
There was a problem hiding this comment.
LGTM
|
Andrei Kvapil (@kvaps) |
|
thanks, closed in favor #2213 |
…2470) ## What this PR does Replaces ingress-nginx as the default ingress layer for cozystack-native services with **Cilium Gateway API**, materialised per-tenant via a new `TenantGateway` CRD reconciled by `cozystack-controller`. Supersedes #2213 / #2208. Stacked on top of #2468. ## Architecture The chart renders one `gateway.cozystack.io/v1alpha1 TenantGateway` CR per tenant that owns its own Gateway. The cozystack-controller reconciles the actual `Gateway`, per-tenant `Issuer`, `Certificate` (wildcard or per-listener), and HTTP→HTTPS redirect `HTTPRoute` from there. Helm does not render `Gateway` or `Certificate` directly — that prevents the Helm-vs-controller race on `Gateway.spec.listeners` that route-driven listener additions would otherwise cause. ``` ┌── TenantGateway CR (rendered by extra/gateway chart for owning tenants) │ └──▶ cozystack-controller reconciles: ├─ Gateway (with dynamic listeners, label-based allowedRoutes) ├─ Issuer (per-tenant ACME account) ├─ Certificate(s) — wildcard + per-child SANs in DNS-01, per-listener in HTTP-01 ├─ HTTPRoute (controller-owned http→https redirect) ├─ Namespace label patching (cozy-* system namespaces from spec.AttachedNamespaces) └─ Watches HTTPRoute / TLSRoute attachments to materialise listeners ``` ## Inheritance: tenants attach by default, opt into their own Gateway A tenant gets a dedicated Gateway (own LB Service, own LB IP, own Certificate) only when it explicitly asks via `tenant.spec.gateway=true`. Every other tenant in the tree publishes through the Gateway of the nearest ancestor that owns one — same shape as the existing `_namespace.ingress` inheritance. The `apps/tenant` chart writes a `namespace.cozystack.io/gateway` label on each tenant namespace, carrying either the tenant's own name (when owning a Gateway) or the inherited ancestor name (when inheriting). The Gateway's listener `allowedRoutes.namespaces.selector` is keyed on that label, so the same selector admits routes from every namespace pointing at the owner — descendants and cozy-* system namespaces alike. cozystack-controller patches the same label onto every namespace in `tgw.Spec.AttachedNamespaces` and garbage-collects labels it wrote when an entry is removed from the attach list (annotation-gated GC so labels written by the apps/tenant chart are never stripped). In DNS-01 mode, the controller extends the parent's wildcard Certificate with `<child-apex>` + `*.<child-apex>` SANs per inheriting child, and adds a `*.<child-apex>` HTTPS listener per child apex referencing the same cert. Without this expansion the parent's single-level wildcard cannot match a child route's hostname (`harbor.alice.example.org` is two labels past the parent's `*.example.org`). Opt-out into own Gateway is the right choice when: a tenant needs its own LB IP (DNS pinned, firewall rule), the apex is not derived from the parent (custom `host` value the ancestor's cert can't cover), or the tenant wants its own ACME account / cert authority. Otherwise leave `gateway` unset and inherit. ## End-to-end flow 1. Platform: `gateway.enabled=true` toggles cert-manager solver, switches per-system-app templates from Ingress to HTTPRoute / TLSRoute. 2. Per tenant: explicit `tenant.spec.gateway=true` renders a `TenantGateway` CR and triggers the controller to materialise the owner Gateway. Tenants with the field unset inherit through `_namespace.gateway` propagation in the tenant chart. 3. Adding a published app under any tenant (owner or inheriting) is just deploying its HTTPRoute. The controller picks up the hostname, adds a per-listener Certificate (HTTP-01) or extends the wildcard cert's SANs (DNS-01), and updates the owner Gateway's listeners. Defaults stay legacy: `gateway.enabled=false`, `tenant.spec.gateway=false`. Existing clusters upgrade unchanged. ## External IP allocation — mechanism-agnostic The per-tenant Gateway's auto-created `LoadBalancer` Service draws its IP from whatever LB allocator the cluster admin has configured at the platform layer — same shape as ingress-nginx today. The tenant API stays mechanism-agnostic — no `gatewayIP` field, no allocator-specific manifest in the tenant chart. Cozystack itself ships MetalLB installed but does not render any `IPAddressPool` / `L2Advertisement` / `BGPAdvertisement` from this chart; admins set up the allocator that suits their environment (MetalLB pool with L2 / BGP, Cilium LB-IPAM with announcer, robotlb against a cloud provider, or `Service.spec.externalIPs` pinning). If a tenant needs a specific address (DNS already pinned, firewall rule, etc.), the operator pre-allocates it on the admin side: either pre-create the Service with `loadBalancerIP` set, or hand the tenant a reference to a named admin-managed pool. Per-Service IP uniqueness is the allocator's responsibility — same as for any other LoadBalancer Service. ## Cert mode: HTTP-01 (default) vs DNS-01 (opt-in) `publishing.certificates.solver` selects how the controller sources TLS certs. **HTTP-01 (default)** — per-app HTTPS listener + per-app Certificate. New apps require zero platform-side config: deploy the HTTPRoute, controller does the rest. Inheriting children's hostnames pick up per-listener certs the same way the owner's own apps do. **DNS-01 (opt-in)** — single wildcard Certificate (`<apex>` + `*.<apex>`) covering every published app under the apex, plus per-child SANs (`<child-apex>` + `*.<child-apex>`) for every inheriting tenant. Pick this for clusters where many apps share the apex and Let's Encrypt rate limits matter. The DNS provider account must be able to write TXT records under every apex level the parent serves — for deeply-nested inherited children that requires zone delegation or a wide-scope provider credential. | `publishing.certificates.dns01.provider` | Required `publishing.certificates.dns01.<provider>` keys | | --- | --- | | `cloudflare` (default) | `cloudflare.secretName`, `cloudflare.secretKey` | | `route53` | `route53.region`, `route53.secretName` (and `accessKeyID` if not on IRSA) | | `digitalocean` | `digitalocean.secretName` | | `rfc2136` | `rfc2136.nameserver`, `rfc2136.tsigKeyName`, `rfc2136.secretName` | The platform chart wires every provider's keys into `_cluster.dns01-*`; the per-tenant gateway chart and the cluster-wide `cluster-issuers.yaml` both read them. ## Strengthens admission coverage on apps.cozystack.io/* The `cozystack-tenant-host-policy` probe in `gateway.bats` initially looked like a flake — the apply that was supposed to be VAP-rejected just succeeded. Investigation traced it to the custom REST handler at `pkg/registry/apps/application/rest.go`. genericapiserver hands every storage method a callback into the validating admission chain: `createValidation` for Create, `updateValidation` for Update, `deleteValidation` for Delete. `Update` was correctly invoking its callback since 23e399b, but `Create` and `Delete` accepted the parameter and never called it — admission was silently bypassed for both verbs on every `apps.cozystack.io` resource (Tenant, MariaDB, Postgres, Kubernetes, …). This PR wires the missing `createValidation` and `deleteValidation` calls into Create and Delete. Every ValidatingAdmissionPolicy and ValidatingWebhook against `apps.cozystack.io/*` now fires on all three verbs as the API contract requires. Per KEP-3488 aggregated APIServers are responsible for enforcing admission on their own kinds; cozystack-api is configured to do so via `genericoptions.NewRecommendedOptions`, the only thing missing was the explicit `*Validation` callbacks in the custom REST. ## Security — five guards, grouped by what they defend Tenants in Cozystack interact with the platform exclusively through `apps.cozystack.io/*` resources (Tenant, Bucket, Kubernetes, …) served by `cozystack-api`. Tenant RBAC (`cozy:tenant:*` aggregated to a RoleBinding in the tenant's own namespace) does not grant write access to `gateway.networking.k8s.io/*`, core `Namespaces`, or `cozystack.io/Package`. The protections below split into three groups by who they defend against: **Tenant-user-input gates.** Layer 4 (`cozystack-tenant-host-policy`). `Tenant.spec.host` is the user-supplied field that surfaces as a security boundary at the hostname layer; gated on every Create / Update via `cozystack-api`'s admission chain. **Defense-in-depth.** Layers 1, 2, 5, 7. These do not protect against tenant-user input (tenants don't hold the relevant RBAC). They guard against bugs in cozystack-controller / Flux, supply-chain compromise of an app chart that emits Gateway API resources, and confused-deputy mistakes by a cluster admin. Fail-closed via `failurePolicy: Fail` + `validationActions: [Deny]`. The five layers themselves: 1. **Listener `allowedRoutes` label selector** — `namespace.cozystack.io/gateway = <owner-tenant-name>` (controller-written for cozy-* via `AttachedNamespaces`, Helm-written for tenant namespaces via the tenant chart, both feeding into the same Gateway selector). 2. **`cozystack-gateway-hostname-policy`** — VAP on `Gateway` CREATE/UPDATE. Restricts listener hostnames to the namespace's `namespace.cozystack.io/host` apex. 4. **`cozystack-tenant-host-policy`** — VAP on `Tenant` CREATE/UPDATE. 5. **`cozystack-namespace-host-label-policy`** — VAP on core `Namespace` CREATE/UPDATE. Immutability gate on the host label. 7. **`cozystack-route-hostname-policy`** — VAP on `HTTPRoute` and `TLSRoute` CREATE/UPDATE. Closes the hostname-hijack vector across namespaces under inheritance — a child cannot claim a hostname outside its own apex. The previous Layer 3 (`cozystack-gateway-attached-namespaces-policy`) and Layer 6 (render-time `fail` in cozystack-basics) banned `tenant-*` entries in `publishing.gateway.attachedNamespaces` on the assumption that the attach list was the hijack vector. Under inheritance the attach surface is the label selector, not the static list; the hijack vector is closed by Layers 4/5/7 independently. Both Layer 3 and Layer 6 are dropped — kept in numbering above for continuity with the earlier review threads. ## Foreign-takeover guards Five reconcile paths refuse to silently rewrite a pre-existing object that shares the controller-derived name but carries no `OwnerReference` back to the TenantGateway: `Gateway`, redirect `HTTPRoute`, per-tenant `Issuer`, wildcard `Certificate`, per-listener `Certificate`. An operator who hand-pinned a Certificate or Issuer at the controller's derived name (private CA, manual cert pinning) gets an explicit `Ready=False/ReconcileError` condition instead of having their config silently destroyed and the resource re-issued from a different ACME account. The namespace-label patching path applies the same discipline: cozystack-controller only writes / strips `namespace.cozystack.io/gateway` on namespaces it annotates with `cozystack.io/gateway-attached-by`. Labels written by the apps/tenant chart (no annotation) are never touched, so inheritance for tenant namespaces survives every reconcile. ## Tests - **Go controller tests** — 50+ cases in `internal/controller/tenantgateway/`: idempotency, OwnerReference cascade, multi-parentRef status, HTTP-01↔DNS-01 mode-transition cert lifecycle, foreign-takeover refusals (Gateway / HTTPRoute / Issuer / Certificate / per-listener Certificate), DNS-01 provider matrix (cloudflare / route53 / digitalocean / rfc2136), HTTPS Kinds=[HTTPRoute] restriction, fail-closed CEL on missing host label, status condition wiring. Plus, for inheritance: label-based allowedRoutes selector shape, namespace label patch + garbage-collect via owner annotation, Helm-owned label safety, DNS-01 wildcard SAN expansion per inheriting child, per-child wildcard listener materialisation. - **Admission-chain tests** — `pkg/registry/apps/application/rest_admission_test.go` pins `createValidation`, `updateValidation`, `deleteValidation` paths via sentinel callbacks. - **helm-unittest** across `extra/gateway` (TenantGateway CR rendering, both modes), `core/platform` (DNS-01 wiring), `apps/tenant` (gateway-effective decision matrix including inheritance), `system/cozystack-basics` (three VAPs + Layer 7 VAP + tenant-root namespace label including the conditional `namespace.cozystack.io/gateway: tenant-root` + cozystack-values-secret `_namespace.gateway` propagation), `system/cert-manager-issuers` (full DNS-01 provider matrix), `system/cozystack-api` and `system/dashboard` (Ingress↔TLSRoute/HTTPRoute toggle), `system/monitoring` (Grafana / Alerta Ingress↔HTTPRoute toggle). - **e2e bats** — `hack/e2e-apps/gateway.bats` (17 cases) covering GatewayClass Accepted, Gateway Programmed + LB Service, HTTPRoute Accepted with status, VAP rejects cross-tenant hostname hijack, VAP rejects `tenant.spec.host` writes from non-trusted SA (regression-detector for the admission-chain fix), VAP rejects `namespace.cozystack.io/host` label changes from non-trusted SA, route-hostname VAP rejects HTTPRoute hostnames outside the apex, and three new cases for inheritance — Package admission accepts `tenant-*` in `attachedNamespaces` (regression guard for the dropped ban), child tenant without explicit gateway inherits `_namespace.gateway` from a Gateway-owning parent in lockstep across the namespace label and the cozystack-values Secret, and the cross-namespace attach end-to-end where a child tenant's HTTPRoute reaches Accepted on the parent's Gateway via the inheritance label. - **CI wiring** — `make test-controllers` target invoked from `pull-requests.yaml` runs `go test ./internal/...`. ## Backward compatibility The only always-on change is the flip of `cilium.envoy.enabled` + `cilium.gatewayAPI.enabled` to `true` (extra `cilium-envoy` DaemonSet, ~100 MB RAM/node idle). Everything else is gated behind `gateway.enabled=false` / `tenant.spec.gateway=false`. Existing clusters see no behavioural change until an operator explicitly opts in. The cozystack-api admission-chain fix is unconditional. Existing operators who do not define any VAP / admission webhook on `apps.cozystack.io` resources see no observable change. Operators who DO have their own webhooks/VAPs against these kinds will start seeing them fire on Create and Delete (as intended) — review your policies before merging if you maintain custom admission for `apps.cozystack.io/*`. Operators who set `publishing.gateway.attachedNamespaces` to include `tenant-*` entries used to hit a render-time fail or an admission rejection — now the apply succeeds and the listed tenant namespaces simply pick up the gateway-attach label alongside the cozy-* system namespaces. This matches the inheritance model and was kvaps' explicit ask in the May 25 review. ## Release note ```release-note - Adds Gateway API support via Cilium, materialised per-tenant via the new `gateway.cozystack.io/v1alpha1 TenantGateway` CRD reconciled by `cozystack-controller`. Opt-in via `publishing.gateway.enabled=true` at the platform level; per-tenant opt-in via `tenant.spec.gateway=true` for dedicated Gateway / LB IP, otherwise the tenant inherits the nearest ancestor's Gateway via a label-based `allowedRoutes` selector (same shape as the existing `_namespace.ingress` inheritance). Includes HTTP-01 (default) and DNS-01 (opt-in, with cloudflare / route53 / digitalocean / rfc2136 providers) cert solvers; DNS-01 extends the parent's wildcard Certificate with per-child apex SANs and adds a `*.<child-apex>` listener per inheriting tenant. Tenant API is mechanism-agnostic for IP allocation — the LoadBalancer Service draws its IP from the cluster's admin-configured allocator (MetalLB / Cilium LB-IPAM / robotlb / externalIPs). - Strengthens admission coverage on `apps.cozystack.io/*` resources: `cozystack-api` now invokes `createValidation` and `deleteValidation` callbacks on Create and Delete (Update was already wired). Operators with custom ValidatingAdmissionPolicies or webhooks targeting `apps.cozystack.io/*` will see them fire on all three verbs as the API contract requires; review your policies before merging if you maintain such admission. ```
…ion (#509) ## What this PR does Adds a new `networking/gateway-api.md` page to the `next/` docs trunk describing the Cilium-backed Gateway API feature that landed in cozystack/cozystack#2470. Also extends the platform-package reference with the `gateway.*` and `publishing.certificates.dns01.*` value rows the same upstream PR introduces. The page covers, as the feature shipped: - a new platform-level toggle (`gateway.enabled`) and a per-tenant toggle (`tenant.spec.gateway`) that opts a tenant into its own dedicated Gateway (own LB IP, own per-tenant Issuer, own Certificate); - inheritance through the `namespace.cozystack.io/gateway` label: tenants without `spec.gateway: true` attach their routes to the nearest ancestor's Gateway via a label-keyed `allowedRoutes.namespaces.selector` (same shape as `_namespace.ingress` inheritance); - mechanism-agnostic LoadBalancer IP allocation: the auto-created Service draws its IP from whatever LB allocator the cluster admin has wired up (MetalLB / Cilium LB-IPAM / robotlb / `Service.spec.externalIPs`). Cozystack ships MetalLB installed but does not auto-render any `IPAddressPool` / `L2Advertisement` / `BGPAdvertisement` / `CiliumLoadBalancerIPPool`. There is no `gatewayIP` field on the Tenant CR — to pin an address, the operator pre-creates the Service with `loadBalancerIP` set or hands the tenant a reference to a named admin-managed pool; - a migration away from `Ingress` for every cozystack-native exposed service (dashboard, keycloak, grafana, alerta via HTTPRoute; kubernetes-api, vm-exportproxy, cdi-uploadproxy via TLSRoute passthrough; harbor and bucket attached to the owning tenant's Gateway); - a per-tenant cert-manager `Issuer` (own ACME account) plus per-listener Certificates (HTTP-01) or a single wildcard Certificate extended with per-child-apex SANs and a `*.<child-apex>` listener per inheriting tenant (DNS-01). Four DNS providers supported (cloudflare, route53, digitalocean, rfc2136); - a five-layer security model grouped by who it defends against: Layer 3 gates tenant-user input on `Tenant.spec.host` via `cozystack-api`'s admission chain; Layers 1, 2, 4, 5 are defense-in-depth via VAPs and the listener label selector (tenants do not hold `gateway.networking.k8s.io/*` RBAC by design); - foreign-takeover guards on six reconcile paths (Gateway, redirect HTTPRoute, per-tenant Issuer, wildcard Certificate, per-listener Certificate, plus the namespace-label patching path which only writes / strips `namespace.cozystack.io/gateway` on namespaces it annotates with `cozystack.io/gateway-attached-by`). ### Sections - **Overview** — opt-in defaults, inheritance, coexistence with ingress-nginx. - **Architecture** — reconciliation flow, traffic path, listener layout (HTTP-80 / HTTPS / TLS-passthrough), `allowedRoutes` selector mechanics per listener role. - **Enabling Gateway API** — platform-level Package example with full `attachedNamespaces` list, per-tenant Tenant example for owning and inheriting tenants, custom-apex case. - **Inheritance** — how the namespace label is set, how cross-namespace ParentRef is gated, how DNS-01 mode extends the parent's wildcard cert. - **Cert mode** — HTTP-01 (default, per-listener) vs DNS-01 (opt-in, wildcard + per-child SANs), provider matrix, listener-cap considerations. - **Per-service routing** — HTTPRoute / TLSRoute tables: service → namespace → route name → backend → listener. - **Security** — mermaid diagram and one section per layer, plus HostnameConflict resolution and the foreign-takeover guards. - **Certificates** — per-tenant Issuer, supported ACME servers, Let's Encrypt rate limits and mitigations. - **Migration from ingress-nginx** — step-by-step for new and existing clusters. - **Known limitations** — multi-tenant shared LB IP (deferred until Cilium ListenerSet, cilium#42756), TLSRoute `v1alpha2`, DNS-01 multi-apex DNS-provider requirement, supported ACME issuers, upstream application gaps. - **Troubleshooting** — concrete `kubectl` recipes for the most likely stuck states (TenantGateway ReconcileError, Gateway not Programmed, Certificate not Ready, HostnameConflict, admission denials, inheriting child route not accepted, `<pending>` LB IP). ### Target branch `next/` — the version-agnostic trunk. When the page ships with the upstream release that contains cozystack/cozystack#2470, it materialises automatically under the corresponding `vX.Y/` via `make release-next`. ### Not included The legacy `v1/networking/gateway-api.md` page on the abandoned `docs/gateway-api` branch (from the Envoy Gateway proposal in cozystack/cozystack#2213) is unrelated to this PR. That proposal targeted a different architecture that has since been superseded. ### Release note ```release-note NONE ```
What this PR does
Adds Gateway API (HTTPRoute/TLSRoute) support alongside existing Ingress resources for all system and tenant-level components, enabling a migration path from ingress-nginx to Gateway API using Envoy Gateway as the controller.
Why Envoy Gateway instead of Cilium Gateway API
Cilium's built-in Gateway API controller creates a separate LoadBalancer Service per Gateway resource. With
publishing.externalIPs, IPs get spread across services instead of being concentrated on one — breaking the externalIPs use case entirely.Cilium limitations that drove this decision:
mergeGateways— each Gateway = separate LB Service, no way to share IP on same portsXListenerSet(GEP-1713) — cannot delegate listeners to shared GatewaystreamIdleTimeoutDurationSecondsEnvoy Gateway solves all of these. Cilium remains the CNI (eBPF, network policy, LB IPAM).
Architecture
Per-tenant gateway model — mirrors ingress-nginx:
gateway: truein tenant → deploys per-tenant GatewayClass + EnvoyProxymergeGateways: true→ all per-component Gateways in the tenant merge into one LB ServiceCiliumLoadBalancerIPPoolassigns externalIPs to the merged Servicegateway: false→ inherit parent's GatewayClassTLS certificates via cert-manager + Gateway API
ACME HTTP-01 challenges are served through a shared
acme-challengeGateway with a wildcard HTTP listener (allowedRoutes: All). This avoids conflict with HTTP→HTTPS redirects:acme-challengeGateway is the sole HTTP listener on port 80redirect-to-httpsHTTPRoute handles HTTP→HTTPS redirect/.well-known/acme-challenge/TOKEN) takes priority over the redirect (implicit PathPrefix/)http01.ingresswhen ingress enabled,http01.gatewayHTTPRoutewhen gatewayAPI enabled, both when both enabled. dns01 unchanged.Envoy Gateway cluster domain
Envoy Gateway subchart hardcodes
kubernetesClusterDomain: cluster.local, but cozystack uses a configurable domain (e.g.cozy.local). The subchart templates are patched duringmake updateto readglobal.clusterDomain(populated fromnetworking.clusterDomainvia cozystack-values secret) with fallback to the default.Package structure (mirrors ingress-nginx)
packages/system/ingress-nginx/packages/system/envoy-gateway/packages/extra/ingress/packages/extra/gateway/packages/system/ingress-rd/packages/system/gateway-rd/sources/ingress-nginx.yamlsources/envoy-gateway.yamlsources/ingress-application.yamlsources/gateway-application.yamltenant/templates/ingress.yamltenant/templates/gateway.yamlComponents covered
All Gateway API resources are conditional on
gateway.gatewayAPI: true. Existing Ingress templates are untouched — both can coexist simultaneously.Release note
Summary by CodeRabbit