feat(aro-hcp): add periodic Grafana datasource cleanup job (AROSLSRE-1138)#80947
cssjr wants to merge 5 commits into
Conversation
…1138)

Add a monthly Prow periodic that runs grafanactl clean datasources and clean fixup-datasources against the DEV Grafana instance to remove orphaned Prometheus datasources left by personal dev environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Skipping CI for Draft Pull Request.

No actionable comments were generated in the recent review.

Walkthrough

A new Grafana datasource deprovisioning step is added with Azure authentication and grafanactl cleanup commands, and a monthly periodic CI job is configured to run it.

Estimated code review effort: 2 (Simple), ~10 minutes

Pre-merge checks: 15 passed

/pj-rehearse

@cssjr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

The Azure SDK's DefaultAzureCredential with RequireAzureTokenCredentials requires AZURE_TOKEN_CREDENTIALS to select credential sources. Setting it to "prod" enables EnvironmentCredential (AZURE_CLIENT_ID/SECRET/TENANT), which is how all ARO-HCP Prow steps authenticate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
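
The credential selection described in this commit message can be sketched in shell. This is a minimal illustration, not the step's actual script: the function name `load_azure_env` and the credential file names (`client-id`, `client-secret`, `tenant`) are hypothetical, since the PR text does not show them. The environment variable names are the ones the Azure SDK's EnvironmentCredential reads.

```shell
#!/usr/bin/env bash
# Sketch: export the variables that EnvironmentCredential reads, plus the
# AZURE_TOKEN_CREDENTIALS selector mentioned in the commit message.
# The file names (client-id, client-secret, tenant) are hypothetical.
load_azure_env() {
  local secret_dir="$1"
  local name

  # Fail early if any credential file is missing or unreadable.
  for name in client-id client-secret tenant; do
    if [ ! -r "${secret_dir}/${name}" ]; then
      echo "missing credential file: ${secret_dir}/${name}" >&2
      return 1
    fi
  done

  AZURE_CLIENT_ID="$(cat "${secret_dir}/client-id")"
  AZURE_CLIENT_SECRET="$(cat "${secret_dir}/client-secret")"
  AZURE_TENANT_ID="$(cat "${secret_dir}/tenant")"

  # Per the commit message, "prod" enables EnvironmentCredential.
  AZURE_TOKEN_CREDENTIALS="prod"

  export AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID AZURE_TOKEN_CREDENTIALS
}
```

A step script would call `load_azure_env` with the directory holding the mounted secret files before invoking any tool that authenticates through DefaultAzureCredential.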

/pj-rehearse

@cssjr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

/pj-rehearse

@cssjr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

/approve

/lgtm

[REHEARSALNOTIFIER]
Interacting with pj-rehearse
Comment: Once you are satisfied with the results of the rehearsals, comment:

/lgtm

@cssjr:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cssjr, roivaz

/pj-rehearse

@cssjr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

/pj-rehearse ack

@cssjr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

/retest

@cssjr: The following test failed, say

/hold

On hold while we consider an alternative approach which may not require a Prow job.

Abandoning in favor of Azure/ARO-Tools#258 |
Summary
- grafanactl clean datasources and grafanactl clean fixup-datasources against the DEV Grafana instance (arohcp-dev in resource group global, subscription 1d3378d3-...)
- cleanup-sweeper step pattern in the step registry
- #aro-hcp-failures-dev Slack channel

New files
- ci-operator/step-registry/aro-hcp/deprovision/grafana-datasources/: step registry entry (commands script, ref YAML, metadata, OWNERS)

Modified files
- ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml: added clean-grafana-datasources entry with monthly cron (0 6 1 * *)
- ci-operator/jobs/Azure/ARO-HCP/Azure-ARO-HCP-main-periodics.yaml: auto-regenerated via make jobs

Context
- grafanactl lives in Azure/ARO-Tools, entry point in Azure/ARO-HCP

Test plan
- pj-rehearse validates the job can be scheduled and run
- clean datasources and clean fixup-datasources execute successfully

🤖 Generated with Claude Code
Summary by CodeRabbit
This PR extends the ARO-HCP CI configuration with a monthly automated Grafana cleanup for the DEV Azure Managed Grafana instance, preventing orphaned Prometheus datasource entries from accumulating.
What changes (practically):
- ci-operator/config/Azure/ARO-HCP/Azure-ARO-HCP-main__periodic-cleanup.yaml to schedule clean-grafana-datasources on the 1st of every month at 06:00 UTC (0 6 1 * *). It reports failure and error states to #aro-hcp-failures-dev and runs the new deprovision step.
- aro-hcp-deprovision-grafana-datasources under ci-operator/step-registry/aro-hcp/deprovision/grafana-datasources/ with:
  - CLUSTER_PROFILE_DIR from VAULT_SECRET_PROFILE, reads Azure service principal credentials plus GLOBAL_INFRA_SUBSCRIPTION_ID from mounted vault-secret files, logs into Azure via az login --service-principal, builds grafanactl to /tmp/grafanactl, and then runs (in order):
    1. grafanactl clean datasources
    2. grafanactl clean fixup-datasources
  - arohcp-dev and resource group global (with GRAFANA_NAME/GRAFANA_RESOURCE_GROUP configured in the step ref).
  - Managed_Prometheus_* datasources not backed by a live Azure Monitor Workspace.
  - geoberle and deads2k (alongside the existing team reviewer/approver placeholders).
- ci-operator/jobs/Azure/ARO-HCP/Azure-ARO-HCP-main-periodics.yaml via make jobs so the new periodic entry is reflected in generated Prow job definitions.

Rehearsal request:
- /pj-rehearse to validate the periodic job scheduling/execution before merge.
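
The command order summarized above could look roughly like the following sketch. It is not the step's actual commands script. The function name `cleanup_grafana_datasources` and the `GRAFANACTL` override are hypothetical, the use of `az account set` for the subscription ID is an assumption (the summary only says the ID is read), and the flags that point grafanactl at a Grafana instance are omitted because the PR text does not show them.

```shell
#!/usr/bin/env bash
# Sketch of the command order only; see the stated assumptions in the lead-in.
cleanup_grafana_datasources() {
  # Path where the summary says grafanactl is built; GRAFANACTL is a
  # hypothetical override used for testing.
  local grafanactl="${GRAFANACTL:-/tmp/grafanactl}"

  # Log in as the service principal (standard az CLI flags).
  az login --service-principal \
    --username "${AZURE_CLIENT_ID}" \
    --password "${AZURE_CLIENT_SECRET}" \
    --tenant "${AZURE_TENANT_ID}" >/dev/null || return 1

  # Assumption: the subscription ID is applied with `az account set`.
  az account set --subscription "${GLOBAL_INFRA_SUBSCRIPTION_ID}" || return 1

  # Run the two cleanup passes in the order given in the summary,
  # stopping if the first pass fails.
  "${grafanactl}" clean datasources || return 1
  "${grafanactl}" clean fixup-datasources
}
```

Returning on the first failure keeps a failed login or a failed first pass from being masked by a later command, so the job's exit status reflects the first thing that went wrong.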