Skip to content

chore(grafana): move workspace lookup to pipeline shell step (ARO-28040)#5837

Closed
raelga wants to merge 5 commits into
Azure:mainfrom
raelga:rael/aro-28040-remove-grafana-deploymentscript
Closed

chore(grafana): move workspace lookup to pipeline shell step (ARO-28040)#5837
raelga wants to merge 5 commits into
Azure:mainfrom
raelga:rael/aro-28040-remove-grafana-deploymentscript

Conversation

@raelga

@raelga raelga commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

ARO-28040

What

Redesigns Grafana reconciliation so grafanactl owns the Azure Managed Grafana create/update flow end-to-end.

The new Grafana command discovers all succeeded Azure Monitor Workspaces (Prometheus), creates or updates the Managed Grafana resource with azureMonitorWorkspaceIntegrations, and reconciles Grafana datasources in the same pipeline step. Bicep no longer creates the Grafana instance or emits AMW lookup outputs, while Grafana RBAC remains declarative in bicep through resource lookups and naming conventions.

This supersedes and reverts the temporary Grafana no-op mitigation in #5838 / ARO-28044: Grafana deployment behavior is re-enabled through grafanactl instead of bicep plus region-pipeline datasource steps.

Why

AME Azure Policy enforces allowSharedKeyAccess=false, which blocks ARM deploymentScripts because they require shared-key-backed storage. The old deploymentScript worked around a bicep limitation: AMW discovery had to produce a complete integration list for the Grafana ARM PUT.

Moving Grafana create/update into grafanactl removes that bicep output contract entirely. grafanactl can look up AMWs directly, apply azureMonitorWorkspaceIntegrations, and reconcile datasources without passing discovered workspace IDs back into bicep.

Testing

Planned for this redesign:

  • cd tooling/grafanactl && PATH=$HOME/sdk/go1.25.7/bin:$PATH GO=$HOME/sdk/go1.25.7/bin/go go build ./...
  • PATH=$HOME/sdk/go1.25.7/bin:$PATH GO=$HOME/sdk/go1.25.7/bin/go go test ./...
  • cd config && PATH=$HOME/sdk/go1.25.7/bin:$PATH GO=$HOME/sdk/go1.25.7/bin/go make materialize
  • PATH=$HOME/sdk/go1.25.7/bin:$PATH GO=$HOME/sdk/go1.25.7/bin/go make validate-config-pipelines
  • PATH=$HOME/sdk/go1.25.7/bin:$PATH GO=$HOME/sdk/go1.25.7/bin/go make yamlfmt
  • /home/rael/bin/bicep build ...
  • /home/rael/bin/bicep lint ...
  • git diff --exit-code

Special notes for your reviewer

cc @jboll as owner.

Key review point: role management intentionally stays in bicep. Only Grafana resource create/update, AMW integration discovery, and datasource reconciliation move into grafanactl so the AMW lookup no longer needs to become a bicep output.

PR Checklist

  • PR is scoped to a single task (no mixed concerns)
  • Title follows Conventional Commits format
  • Summary explains the "Why" behind the change
  • Linked to relevant ticket/issue
  • Screenshots included (if graph/UI/metrics changes)
  • Self-reviewed the diff
  • CI/CD checks are passing (ignore Tide)
  • Draft PR used for WIP (if applicable)
  • Commit history is clean (rebased/squashed)
  • Tricky code blocks are commented
  • Specific reviewers tagged
  • All comment threads resolved before merge

If E2E tests are included:

  • E2E tests follow Principles of Good E2E Test Case Design
  • If new E2E use case is covered (via a new test or new check/verifier),
    demonstrate that the test is able to detect a defect/error and fail with
    proper error message and logs which communicates nature of the problem.

Copilot AI review requested due to automatic review settings June 29, 2026 09:11

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the ARM deploymentScripts-based Grafana Azure Monitor workspace integration lookup/preservation flow and replaces it with a global pipeline Shell step that patches the Managed Grafana resource after the global infra deployment. This is intended to avoid deployment scripts that require shared-key-backed storage access, which is blocked by policy.

Changes:

  • Remove the integration-lookup Bicep module (and its PowerShell payload) and stop supplying grafanaIntegrations via the Grafana ARM resource.
  • Add a pipeline Shell step plus a Bash script to read and PATCH existing Azure Monitor workspace integrations.
  • Add an additional role assignment for the Grafana manager identity to support the new flow.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
dev-infrastructure/templates/global-infra.bicep Removes the integration lookup module and stops passing integrations into Grafana deployment.
dev-infrastructure/scripts/grafana-integrations.sh New Bash script to query and PATCH Grafana workspace integrations.
dev-infrastructure/modules/grafana/integration-lookup.bicep Deletes the ARM deploymentScripts lookup module.
dev-infrastructure/modules/grafana/instance.bicep Removes grafanaIntegrations from the Grafana resource and adds a new role assignment for the manager identity.
dev-infrastructure/modules/grafana/grafanaIntegrationsLookup.ps1 Deletes the PowerShell payload used by the deployment script.
dev-infrastructure/global-pipeline.yaml Adds a global Shell step to run grafana-integrations.sh after global infra deploys.
dev-infrastructure/global-pipeline-stg.yaml Mirrors the same Shell step addition for the STG-global pipeline.

Comment thread dev-infrastructure/scripts/grafana-integrations.sh Outdated
Comment thread dev-infrastructure/global-pipeline.yaml Outdated
Comment thread dev-infrastructure/global-pipeline-stg.yaml Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread dev-infrastructure/scripts/grafana-integrations.sh Outdated
Comment thread dev-infrastructure/scripts/grafana-integrations.sh Outdated
Comment thread dev-infrastructure/global-pipeline.yaml Outdated
Comment thread dev-infrastructure/global-pipeline-stg.yaml Outdated
@raelga raelga force-pushed the rael/aro-28040-remove-grafana-deploymentscript branch from 1173390 to 9487b34 Compare June 29, 2026 11:05
Copilot AI review requested due to automatic review settings June 29, 2026 13:32
@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raelga

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@raelga

raelga commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

Redesign: grafanactl-based reconcile (supersedes preserve/PATCH)

Thanks @copilot — all the inline comments target the previous grafana-integrations.sh preserve-and-PATCH approach (reading integrations after infra runs, capture-mode stderr/JSON fragility, output-before-infra bootstrap ordering). That whole approach has been removed in the latest push, so those concerns no longer apply:

  • dev-infrastructure/scripts/grafana-integrations.shdeleted.
  • output-before-infra / post-infra PATCH steps — removed from global-pipeline.yaml and global-pipeline-stg.yaml.
  • Datasource sync — removed from region-pipeline.yaml.

Instead, a new tooling/grafanactl reconcile command owns Grafana create/update and datasources idempotently. It dynamically discovers all Azure Monitor Workspaces at runtime (the lookup that bicep couldn't express after this change), so there's no read-after-clear window — reconcile is the source of truth and converges regardless of prior state. Role management stays in bicep via resource lookups + naming conventions (templates/grafana-rbac.bicep).

Validation so far: go build/go vet ./... clean for tooling/grafanactl; bicep build clean for global-infra.bicep, grafana-rbac.bicep, instance.bicep. The prior e2e-parallel failure was an unrelated transient (HCP cluster-delete node timeout / context canceled during cleanup), not a grafana code issue; re-running with this push.

No further action or commit needed from the reviewer on the removed grafana-integrations.sh threads — this is resolved by the redesign.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Comment thread dev-infrastructure/templates/grafana-rbac.bicep Outdated
Comment thread tooling/grafanactl/cmd/reconcile/options.go Outdated
Copilot AI review requested due to automatic review settings June 29, 2026 16:07
@raelga raelga force-pushed the rael/aro-28040-remove-grafana-deploymentscript branch from 9a79563 to 29b4a83 Compare June 29, 2026 16:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

Comment thread dev-infrastructure/global-pipeline.yaml
Comment thread dev-infrastructure/global-pipeline-stg.yaml
raelga added 3 commits June 29, 2026 17:57
The ARO-28044 mitigation (Azure#5838) commented out the grafana rollout and
reader steps to unblock pipelines. This change supersedes that mitigation
with a redesigned, deploymentScript-free grafana reconcile, so re-enable
grafana first by reverting the mitigation in full.
Introduce a grafanactl reconcile command that owns managed grafana
create/update and datasource wiring with dynamic Azure Monitor Workspace
discovery, replacing the grafana deploymentScript-based integration lookup.
Zone redundancy mirrors determineZoneRedundancy() in common.bicep: Auto and
Enabled only enable zone redundancy when the target region exposes
availability zones.
…ntScript (ARO-28040)

Replace the grafana integration deploymentScript lookup with a grafanactl
reconcile pipeline step and dedicated grafana-rbac bicep template. Removes
integration-lookup.bicep and grafanaIntegrationsLookup.ps1, slims
instance.bicep / global-infra.bicep, and gates the Front Door role
assignment on azureFrontDoorManage && the profile name being set.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.

Comment thread dev-infrastructure/global-pipeline.yaml Outdated
Comment thread dev-infrastructure/global-pipeline-stg.yaml Outdated
Comment thread tooling/grafanactl/cmd/reconcile/options.go Outdated
Address PR review feedback:
- The global pipeline grafana step invoked ./grafanactl, but the binary is
  gitignored and not committed, so the step would fail with file-not-found.
  Prepend 'make build' (prod + stg) so the binary is built before invocation.
- managedGrafana() always set Tags to a non-nil empty map; an empty tags map on
  the ARM PUT clears existing tags on the Managed Grafana resource. Leave Tags
  nil unless a cross-tenant security group value is provided.
@raelga

raelga commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the latest Copilot review in 99b78fc:

  • grafanactl binary not built (global-pipeline.yaml + global-pipeline-stg.yaml) — the Shell step invoked ./grafanactl, but the binary is .gitignored and not committed. Prepended make build && so the binary is built in-step before invocation (matches the make ... convention used by other pipeline shell steps; the datasource reconcile needs no bundled observability files, so make build is sufficient over make bundle).
  • managedGrafana() wiped existing tagsTags was always a non-nil empty map, and an empty map on the ARM PUT clears existing tags. Now left nil unless a cross-tenant security group value is provided.

Two earlier Copilot comments are already satisfied in the current revision: the AFD condition mismatch in grafana-rbac.bicep (both frontDoorProfile and grafanaAfdPermissions share azureFrontDoorManage && !empty(azureFrontDoorProfileName)), and the Auto/Enabled zone-redundancy semantics (resolveZoneRedundancy gates both on availability-zone presence, matching determineZoneRedundancy in common.bicep).

Validation: go build ./... + go vet clean, make validate-config-pipelines passes, make yamlfmt no-op on the changed files.

No further action needed from Copilot on these — resolved.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 2 comments.

Comment on lines +30 to +37
grafanaCmd := &cobra.Command{
Use: "grafana",
Short: "Reconcile an Azure Managed Grafana instance",
Long: "Create or update an Azure Managed Grafana instance, discover all succeeded Azure Monitor Workspaces, and reconcile Azure Monitor Workspace integrations.",
RunE: func(cmd *cobra.Command, args []string) error {
return opts.Run(cmd.Context())
},
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 0e96817. Added Args: cobra.NoArgs to the reconcile grafana subcommand, so unexpected positional arguments are now rejected instead of silently ignored. No further action or commit needed from you — this is resolved.

Comment on lines +164 to +167
logger.Info("Creating or updating Managed Grafana", "workspace-count", len(workspaceIDs))
poller, err := o.grafanaClient.BeginCreate(ctx, o.ResourceGroup, o.GrafanaName, o.managedGrafana(workspaceIDs), nil)
if err != nil {
return fmt.Errorf("failed to start Managed Grafana create/update: %w", err)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid edge case — fixed in 0e96817. Context: on main the prior PowerShell deployment script captured the existing Grafana integrations and re-applied them, so it self-healed; this command instead derives the set from a live NewListBySubscriptionPager query, which could transiently return zero. Run now guards that case: when discovery returns zero workspaces it does a GET on the Grafana, and if the instance already exists with integrations it logs a warning and skips the update to avoid wiping them. A genuine first-time create (Grafana does not exist yet, returns 404) still proceeds with an empty set, so bootstrap is unaffected. No new flag was needed. No further action or commit needed from you — this is resolved.

…tion wipe

Add cobra.NoArgs to the reconcile grafana subcommand so unexpected
positional arguments are rejected instead of silently ignored.

Guard the create/update: if Azure Monitor Workspace discovery returns
zero workspaces (e.g. a transient permission/listing glitch) and the
Grafana already has integrations, skip the update to avoid wiping them.
A genuine first-time create (Grafana does not exist yet) still proceeds.
@janboll

janboll commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

We need to add this to ARO-Tools and integrate into the pipeline.
I started this work, see #5843 / Azure/ARO-Tools#260

Closing this for now, will re-iterate and integrate your work with the other PRs

@janboll janboll closed this Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants