fix(orchestrator): fail DB creation job on actual errors instead of silently succeeding #407
…ilently succeeding
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
Review Summary by Qodo
Fix DB creation job error handling and make backoffLimit configurable
Walkthroughs
Description
• Replace blanket error suppression with proper database creation error handling
• Distinguish between "database already exists" (acceptable) and real failures (exit non-zero)
• Make job backoffLimit configurable via dbCreationJobBackoffLimit in values.yaml
• Enable Kubernetes to properly retry on actual database creation failures
Diagram
flowchart LR
A["DB Creation Job"] -->|"Old: || echo WARNING"| B["Always Succeeds"]
A -->|"New: Check if exists"| C{"Database exists?"}
C -->|"Yes"| D["Skip & Succeed"]
C -->|"No"| E["Exit 1"]
E -->|"Retry via backoffLimit"| A
File Changes
1. charts/backstage/templates/sonataflows.yaml
…job-should-be-able-to-retry-properly-and-fail-if-needed-it-shoud-not-silently-succeed-if-there-are-errors
….3, adding dbCreationJobBackoffLimit parameter.
/cherry-pick release-1.10
@rm3l: once the present PR merges, I will cherry-pick it on top of release-1.10.
/review
PR Reviewer Guide 🔍
Warning
Here are some key observations to aid the review process:
…json and values.schema.tmpl.json with minimum and maximum constraints. Update sonataflows.yaml to simplify database creation command.
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
/agentic_review
Persistent review updated to latest commit 635fd18
…exists in PostgreSQL and matches the behavior of the internal branch
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
/agentic_review
Persistent review updated to latest commit 8f6b80f
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
cc/ @rm3l for review
…job-should-be-able-to-retry-properly-and-fail-if-needed-it-shoud-not-silently-succeed-if-there-are-errors
Updated the Job name to use a shorter format and added a TTL of 300 seconds after the job finishes. Removed Helm hook annotations for cleaner configuration.
…job-should-be-able-to-retry-properly-and-fail-if-needed-it-shoud-not-silently-succeed-if-there-are-errors
Signed-off-by: Fortune-Ndlovu <fndlovu@redhat.com>
…pgrade compatibility
The CI "Test Latest Release" check fails because helm upgrade tries to patch the existing Job's spec.template, which Kubernetes rejects as immutable. The old chart created the Job without ttlSecondsAfterFinished, so it persists indefinitely and blocks the upgrade. Adding helm.sh/hook and helm.sh/hook-delete-policy annotations makes Helm delete the old Job before creating the new one on upgrade.
…job-should-be-able-to-retry-properly-and-fail-if-needed-it-shoud-not-silently-succeed-if-there-are-errors
/agentic_review
Code review by qodo was updated up to the latest commit 13b5e0f
Add hook-succeeded to the Helm hook delete policy so that successful Jobs are cleaned up immediately while failed Jobs are kept for log inspection. TTL still handles cleanup for ArgoCD users after 5 minutes.
/agentic_review
Code review by qodo was updated up to the latest commit 41fbb3e
/agentic_review
Code review by qodo was updated up to the latest commit 333fd0c
  args:
- - "psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d {{ .Values.orchestrator.sonataflowPlatform.externalDBName }} -c 'CREATE DATABASE sonataflow;' || echo WARNING: Could not create database"
+ - |
+   psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d postgres -c 'CREATE DATABASE sonataflow;' 2>&1 || {
Suggested change:
- psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d postgres -c 'CREATE DATABASE sonataflow;' 2>&1 || {
+ psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d {{ .Values.orchestrator.sonataflowPlatform.externalDBName }} -c 'CREATE DATABASE sonataflow;' 2>&1 || {
- - "psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d {{ .Values.orchestrator.sonataflowPlatform.externalDBName }} -c 'CREATE DATABASE sonataflow;' || echo WARNING: Could not create database"
+ - |
+   psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d postgres -c 'CREATE DATABASE sonataflow;' 2>&1 || {
+     if psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d postgres -tc "SELECT 1 FROM pg_database WHERE datname='sonataflow'" | grep -q 1; then
Suggested change:
- if psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d postgres -tc "SELECT 1 FROM pg_database WHERE datname='sonataflow'" | grep -q 1; then
+ if psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d {{ .Values.orchestrator.sonataflowPlatform.externalDBName }} -tc "SELECT 1 FROM pg_database WHERE datname='sonataflow'" | grep -q 1; then
  # Note that when this chart is published to https://github.com/openshift-helm-charts/charts
  # it will follow the RHDH versioning 1.y.z
- version: 5.14.0
+ version: 5.14.1
Suggested change:
- version: 5.14.1
+ version: 6.0.0
Let's denote the fact that it is a potentially breaking change.
  "helm.sh/hook": post-install,post-upgrade
  "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
The problem with the hooks here is that the helm install/upgrade command will hang by default until the Job is done, which might take some time. I think it would be safer to switch to the previous approach, where you changed the job name (might even be better to include a chart version in the name).
With hindsight, I think it should be fine in case of an upgrade even if the previous Job is still there, as it won't be running anyway.



Description of the change
The create-sonataflow-database Job was using "|| echo WARNING", which swallowed all psql errors and caused the Job to always report success. This prevented Kubernetes from retrying via backoffLimit when there were real failures (e.g. wrong credentials).
This PR replaces the blanket error suppression with proper handling that tolerates "database already exists" but exits non-zero on real failures. It also makes backoffLimit configurable via values.yaml.
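The handling described above can be sketched as a small shell function. This is a minimal illustration assembled from the psql commands quoted in the review, not the exact script in the chart: the function names create_sonataflow_db and run_psql and the PSQL override variable are hypothetical, introduced only so the logic can be read (and exercised) in isolation.

```shell
#!/bin/sh
# Sketch only. PSQL is a hypothetical override hook (not part of the chart);
# it defaults to the real psql client.
PSQL="${PSQL:-psql}"

# run_psql DB ARGS...: run psql against DB using the connection settings
# taken from the POSTGRES_* environment variables.
run_psql() {
  db="$1"; shift
  "$PSQL" -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" -U "${POSTGRES_USER}" -d "$db" "$@"
}

# create_sonataflow_db ADMIN_DB: try to create the database; if that fails,
# succeed only when the database is already present, otherwise return 1.
create_sonataflow_db() {
  admin_db="$1"
  if run_psql "$admin_db" -c 'CREATE DATABASE sonataflow;' 2>&1; then
    echo "Database sonataflow created"
    return 0
  fi
  # Creation failed: distinguish "already exists" from a real error.
  if run_psql "$admin_db" -tc "SELECT 1 FROM pg_database WHERE datname='sonataflow'" | grep -q 1; then
    echo "Database sonataflow already exists, skipping"
    return 0
  fi
  echo "ERROR: failed to create database sonataflow" >&2
  return 1
}
```

In a Job this would be invoked as something like "create_sonataflow_db postgres || exit 1", so that a non-zero exit lets Kubernetes retry up to backoffLimit. Which database to connect to (postgres or the configured externalDBName) is the point raised in the review comments and is left as a parameter here.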
Which issue(s) does this PR fix or relate to
https://redhat.atlassian.net/browse/RHDHBUGS-2577
How to test changes / Special notes to the reviewer
# Verify schema rejects invalid values
helm template rhdh charts/backstage \
  --set orchestrator.enabled=true \
  --set orchestrator.sonataflowPlatform.dbCreationJobBackoffLimit=-1
Expected: Helm rejects with a schema validation error.
Checklist
… Chart.yaml according to Semantic Versioning.
… values.yaml and added to the corresponding README.md. The pre-commit utility can be used to generate the necessary content. Run pre-commit run --all-files to run the hooks and then push any resulting changes. The pre-commit Workflow will enforce this and warn you if needed.
… pre-commit hook.
… ct lint command.
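The schema check under "How to test changes" expects a command to fail, which is easy to get wrong in a script. One way to automate it, as a rough sketch, is a small helper that succeeds only when the wrapped command exits non-zero. The name expect_failure is hypothetical and not part of the chart or its CI.

```shell
#!/bin/sh
# expect_failure CMD ARGS...: hypothetical helper for negative tests.
# Returns 0 when CMD exits non-zero, 1 when CMD unexpectedly succeeds,
# and 2 when CMD is not installed (so a missing binary is not mistaken
# for a passing negative test).
expect_failure() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "SKIP: $1 not found" >&2
    return 2
  fi
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command unexpectedly succeeded: $*" >&2
    return 1
  fi
  echo "OK: command was rejected as expected"
}

# Usage with the command from the PR description (requires helm and the chart):
# expect_failure helm template rhdh charts/backstage \
#   --set orchestrator.enabled=true \
#   --set orchestrator.sonataflowPlatform.dbCreationJobBackoffLimit=-1
```

Note that this only checks the exit status. It does not confirm that the failure was a schema validation error rather than some other problem, so the error output is still worth inspecting by hand.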