Skip to content

revert: grafana provisioning refactor — unsupported templated enum step field (AROSLSRE-1379)#5887

Merged
raelga merged 1 commit into
mainfrom
revert-5843-grafana-refactor-grafana
Jul 2, 2026
Merged

revert: grafana provisioning refactor — unsupported templated enum step field (AROSLSRE-1379)#5887
raelga merged 1 commit into
mainfrom
revert-5843-grafana-refactor-grafana

Conversation

@raelga

@raelga raelga commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Reverts #5843

ARO-27904 · AROSLSRE-1379

What

Reverts "fix: Refactor grafana provisioning" (#5843), restoring Grafana provisioning via the Bicep deployment (global-infra.bicep + global-infra.tmpl.bicepparam) instead of the GrafanaManage pipeline action.

Why

#5843 moved Grafana provisioning into a GrafanaManage pipeline action whose values are declared as pipeline step fields sourced from config templates:

action: GrafanaManage
zoneRedundancy: "{{ .monitoring.grafanaZoneRedundantMode }}"

This breaks EV2 manifest generation. During aro ev2 manifests generate, the pipeline is precompiled with dunder sentinels substituted for every config value, then validated strictly against pipeline.schema.v1.json:

generate.go:118  dunderCfg := EV2Mapping(system, NewDunderPlaceholders())   // every value -> __key__
generate.go:327  NewPipelineFromFile(path, dunderCfg)
pipeline.go:67   PreprocessContent(...)   // {{ .monitoring.grafanaZoneRedundantMode }} -> __monitoring.grafanaZoneRedundantMode__
pipeline.go:72   ValidatePipelineSchema() // strict, no placeholder allowance

zoneRedundancy is a strict enum (Enabled/Disabled), so the substituted sentinel __monitoring.grafanaZoneRedundantMode__ fails validation:

at '/resourceGroups/0/steps/1/zoneRedundancy': value must be one of 'Enabled', 'Disabled'

This fails regardless of the config value — it is not a missing-default problem. (majorVersion/grafanaName are plain type: string, so their sentinels pass, which is why only zoneRedundancy errored.) Surfaced in sdp-pipelines build 170669006 / PR #16296188.

You cannot source a strict-enum pipeline step field from a {{ }} config template. Reverting to Bicep provisioning restores the supported pattern: the same values flow through .tmpl.bicepparam as simple field-access substitutions, rendered with real values at deploy time and type-checked by Bicep — never validated as pipeline step fields. This is consistent with ARO-27904 / #5863 ("logic lives in config; bicepparam templates are simple field access").

Testing

  • EV2 manifest generation for public/int no longer hits the zoneRedundancy enum failure once the GrafanaManage step is removed.
  • Reverts cleanly to the prior Bicep-based provisioning; grafana params (param grafanaZoneRedundantMode = '{{ .monitoring.grafanaZoneRedundantMode }}', etc.) are simple field access, compliant with the ARO-27904 / feat(templatize): enforce simple field access in bicepparam templates #5863 validator.

Special notes for your reviewer

  • This supersedes fix(config): set grafanaZoneRedundantMode for public clouds (AROSLSRE-1379) #5886, which incorrectly attempted to fix the failure by adding a config default — that does not help, because the dunder precompile rejects the sentinel regardless of the config value.
  • Follow-up: if Grafana is re-refactored onto a pipeline action later, enum-constrained step fields must be literals (or the pipeline schema must gain placeholder tolerance), not config templates.

PR Checklist

  • PR is scoped to a single task (no mixed concerns)
  • Title follows Conventional Commits format
  • Summary explains the "Why" behind the change
  • Linked to relevant ticket/issue
  • Screenshots included (if graph/UI/metrics changes)
  • Self-reviewed the diff
  • CI/CD checks are passing (ignore Tide)
  • Draft PR used for WIP (if applicable)
  • Commit history is clean (rebased/squashed)
  • Tricky code blocks are commented
  • Specific reviewers tagged
  • All comment threads resolved before merge

Copilot AI review requested due to automatic review settings July 2, 2026 08:10
@openshift-ci openshift-ci Bot added the approved label Jul 2, 2026
@janboll

janboll commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

/lgtm
/approve

@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janboll, raelga

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@raelga raelga changed the title Revert "fix: Refactor grafana provisioning" revert: grafana provisioning refactor — unsupported templated enum step field (AROSLSRE-1379) Jul 2, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reverts #5843 by removing the new Grafana reconciliation pipeline/step wiring and rolling back multiple tooling dependencies/config values to earlier versions, while leaving Grafana-related infra temporarily paused (no-op outputs and commented pipeline steps).

Changes:

  • Removed Grafana reconciliation pipeline entries and the GrafanaManage step execution path from templatize.
  • Rolled back github.com/Azure/ARO-Tools/* (and related Azure SDK) versions across several Go modules.
  • Paused Grafana outputs/steps in global pipelines and reverted Grafana major version configuration to 11.

Reviewed changes

Copilot reviewed 36 out of 46 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
topology.yaml Removes the dev-only Grafana pipeline entry from the service topology list.
tooling/yamlwrap/go.sum Rolls back ARO-Tools dependency checksums.
tooling/yamlwrap/go.mod Rolls back ARO-Tools yamlwrap/testutil versions.
tooling/templatize/pkg/pipeline/run.go Removes execution support for GrafanaManageStep.
tooling/templatize/pkg/pipeline/grafana_reconcile.go Deletes the Grafana manage-step runner implementation.
tooling/templatize/go.sum Rolls back ARO-Tools and related dependency checksums.
tooling/templatize/go.mod Rolls back ARO-Tools module versions and drops indirect resourcegraph dep.
tooling/secret-sync/go.sum Rolls back ARO-Tools dependency checksums.
tooling/secret-sync/go.mod Rolls back ARO-Tools secret-sync and indirect deps.
tooling/pipeline-documentation/go.sum Rolls back ARO-Tools pipelines checksum.
tooling/pipeline-documentation/go.mod Rolls back ARO-Tools pipelines version.
tooling/helmtest/go.sum Rolls back ARO-Tools dependency checksums.
tooling/helmtest/go.mod Rolls back ARO-Tools config/pipelines/testutil and indirect deps.
tooling/hcpctl/go.sum Rolls back cmdutils checksum and downgrades armresourcegraph checksum.
tooling/hcpctl/go.mod Rolls back cmdutils and downgrades armresourcegraph to v0.9.0.
tooling/grafanactl/main.go Removes the manage command from the grafanactl CLI wiring.
tooling/grafanactl/go.sum Rolls back grafanactl/cmdutils checksums.
tooling/grafanactl/go.mod Rolls back grafanactl and cmdutils dependencies.
tooling/aro-hcp-exporter/go.sum Downgrades armresourcegraph checksums to v0.9.0.
tooling/aro-hcp-exporter/go.mod Downgrades armresourcegraph to v0.9.0.
test/go.sum Rolls back ARO-Tools and related dependency checksums for tests.
test/go.mod Rolls back ARO-Tools config/prow-job-executor versions.
mgmt-agent/go.sum Rolls back ARO-Tools testutil checksum.
mgmt-agent/go.mod Rolls back ARO-Tools testutil version.
docs/pipelines.md Removes documentation reference to the Grafana test pipeline.
dev-infrastructure/templates/output-grafana.bicep Deletes the Grafana-specific output template.
dev-infrastructure/templates/output-global.bicep Emits empty Grafana outputs to keep dependent steps as no-ops while paused.
dev-infrastructure/templates/grafana-roles.bicep Deletes the Grafana role assignment template wrapper.
dev-infrastructure/templates/global-infra.bicep Introduces (currently commented-out) Grafana reconciliation modules and related params.
dev-infrastructure/modules/grafana/integration-lookup.bicep Adds a deployment-script module to look up existing Grafana integrations.
dev-infrastructure/modules/grafana/instance.bicep Refactors Grafana module to create/manage the Grafana resource and role assignments.
dev-infrastructure/modules/grafana/grafanaIntegrationsLookup.ps1 Adds a deployment-script PowerShell implementation for integration lookup.
dev-infrastructure/grafana-pipeline.yaml Deletes the standalone Grafana test pipeline.
dev-infrastructure/global-pipeline.yaml Removes Grafana reconcile/roles steps; comments out downstream Grafana steps.
dev-infrastructure/global-pipeline-stg.yaml Removes Grafana reconcile/roles steps; comments out downstream Grafana steps.
dev-infrastructure/configurations/output-grafana.tmpl.bicepparam Deletes Grafana output parameter template.
dev-infrastructure/configurations/grafana-roles.tmpl.bicepparam Deletes Grafana roles parameter template.
dev-infrastructure/configurations/grafana-roles-stg.tmpl.bicepparam Deletes STG Grafana roles parameter template.
dev-infrastructure/configurations/global-infra.tmpl.bicepparam Adds Grafana-related params to global infra parameters.
config/rendered/dev/pers/westus3.yaml Reverts rendered Grafana major version to 11.
config/rendered/dev/perf/westus3.yaml Reverts rendered Grafana major version to 11.
config/rendered/dev/dev/westus3.yaml Reverts rendered Grafana major version to 11.
config/rendered/dev/cspr/westus3.yaml Reverts rendered Grafana major version to 11.
config/rendered/dev/ci01/centralus.yaml Reverts rendered Grafana major version to 11.
config/rendered/dev/ci00/centralus.yaml Reverts rendered Grafana major version to 11.
config/config.yaml Reverts configured Grafana major version to 11.

//

// why do we read the registered workspace IDs and feed it into grafana creation again?
// becaue the grafana ARM resource expects the list to be provided or otherwise wipes the
Comment on lines +14 to +19
# Ensure the Az.ResourceGraph module is available
if (-not (Get-Module -ListAvailable -Name Az.ResourceGraph)) {
Write-Output "Az.ResourceGraph module not found. Installing..."
Install-Module -Name Az.ResourceGraph -Force -Scope CurrentUser
Import-Module Az.ResourceGraph
}
Comment on lines +31 to +46
$result = Search-AzGraph -Query $query

if (-not $result) {
# Grafana does not exist, return empty array
Write-Output "No Grafana integrations found or Grafana does not exist."
$output = @()
} else {
# Extract workspace IDs from the azureMonitorWorkspaceIntegrations list
Write-Output "Grafana integrations found: $($result | ConvertTo-Json -Depth 10)"
$output = @()
foreach ($item in $result.grafanaIntegrations) {
if ($item.azureMonitorWorkspaceResourceId) {
$output += $item.azureMonitorWorkspaceResourceId
}
}
}
@raelga

raelga commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

/retest

@raelga

raelga commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Force-merge justification

This is a clean revert. The tide queue itself is healthy — this FM is not to rescue a stuck queue; it's to expedite a rollout-blocking revert so it doesn't sit behind long E2E while int/stg/prod stays blocked. It meets all four FM criteria:

1. int/stg/prod is blocked by the problem.
EV2 manifest generation fails deterministically for public clouds, so no ARO-HCP change can roll out to int/stg/prod until this lands:

failed to precompile pipeline: failed to validate pipeline schema:
  at '/resourceGroups/0/steps/1/zoneRedundancy': value must be one of 'Enabled', 'Disabled'

Blocked in sdp-pipelines build 170669006 (PR https://dev.azure.com/msazure/AzureRedHatOpenShift/_git/sdp-pipelines/pullrequest/16296188). This is a schema/manifest-gen failure, not an environmental flake — it fails on every rollout attempt.

The GitHub merge queue is not materially disrupted, so FM'ing this revert won't jump ahead of contended batch testing: the Prow batch view for this repo (https://prow.ci.openshift.org/?repo=Azure%2FARO-HCP&type=batch&job=*e2e-p*) shows no e2e-parallel batches currently running — all recent batch runs are terminal (success/failure), with 0 in a running/pending/triggered state at the time of this FM. The block is downstream in the rollout pipeline, not in the queue.

2. High confidence the revert fixes the problem.
Root cause = the reverted PR #5843, which moved Grafana provisioning to a GrafanaManage pipeline step field sourced from a config template (zoneRedundancy: "{{ .monitoring.grafanaZoneRedundantMode }}"). EV2 precompiles the pipeline with dunder sentinels substituted for every config value, then validates strictly (pipeline.go PreprocessContentValidatePipelineSchema); zoneRedundancy is a strict enum, so the sentinel __monitoring.grafanaZoneRedundantMode__ fails regardless of config value. Reverting #5843 removes the GrafanaManage step entirely. Mechanism confirmed by @janboll in-thread ("pipeline is validated before … the schema does not contain templating for the redundancy value").

3. The revert builds and passes all fast testing.
Clean revert; all fast tests are greenverify, test-unit, lint, integration, bicep-lint, mega-linter, secrets-validation, images, e2e-images, image-updater-images, config-change-detection, Analyze (go/python), CodeQL, license/cla. The only outstanding check is the long-running ci/prow/e2e-parallel, which is exactly the job an FM is intended to bypass. (lint/verify had one earlier "Pod scheduling timeout" infra flake and passed cleanly on the automatic retest.)

4. The revert is clean.
Straight git revert of #5843 (no manual edits). Authored by @raelga rather than the original author (@janboll), but it is a clean revert — satisfying "clean OR original author".

Durable fix in flight: Grafana provisioning returns to the Bicep path where the same values are passed as simple field-access bicepparams (aligned with ARO-27904 / #5863). Any future re-refactor onto a pipeline action must use literal enum step fields, not config templates, and will go through the queue normally.

@raelga raelga merged commit 00e8de2 into main Jul 2, 2026
17 of 19 checks passed
@raelga raelga deleted the revert-5843-grafana-refactor-grafana branch July 2, 2026 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants