feat: Add PrometheusRule CRD for Prometheus-to-CloudWatch alarm migration#63
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bonclay7 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @bonclay7. Thanks for your PR. I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…tion Add a custom PrometheusRule resource that enables Kubernetes users to migrate their existing Prometheus alerting rules to CloudWatch alarms without rewriting PromQL expressions. Uses the native CloudWatch PromQL EvaluationCriteria API for alarm creation. Components: - CRD types and deepcopy generation (apis/v1alpha1) - PromQL-to-CloudWatch alarm converter using EvaluationCriteria API - Full resource manager with create/update/delete reconciliation - Per-alarm status feedback via status.alarmStatuses[] - Recording rule detection and skip (status.skippedRuleCount) - Helm CRD, RBAC rules, and kustomize configuration - E2E test helpers and resource fixtures - Sample PrometheusRule manifests for common use cases - Design document with architecture and reconciliation flow Action model: - alarmActions[]: SNS, Lambda, SSM Incident Manager, Auto Scaling - okActions[]: actions on recovery - insufficientDataActions[]: actions on missing data - cloudWatch.tags: AWS resource tags on alarms SDK: aws-sdk-go-v2/service/cloudwatch v1.43.12 → v1.56.0 The controller converts each alerting rule into a CloudWatch PromQL alarm using the naming convention <prefix>-<group>-<alertName>, maps Prometheus 'for' duration to PendingPeriod/RecoveryPeriod, routes notifications via configurable action ARNs, and continuously reconciles desired state.
394686d to
e2fe78e
Compare
|
Hey @bonclay7 Feel free to reach out if you have any questions. Thanks! |
|
So @michaelhtm, the |
|
Product here. I would appreciate it we could prioritize this since it was called out by a number of customers to be blocking. |
|
@bonclay7 @mhausenblas Looking over the PR this new PrometheusRule CR deviates quite a bit from other custom resources (CRs) provided by ACK. For the ACK project, we aim to provide CRs that map to AWS control plane resources. For higher level constructs like this we generally look to tools such as KRO. In this case I think a good potential solution would be to update our existing MetricAlarm resource to support the new EvaluationCriteria field and provide users a KRO RGD template that they can use to translate existing Prometheus CRs to ACK MetricAlarms. KRO's external references can be used to pull in target PrometheusRules and translate them into a collection of ACK MetricAlarms. |
Add PrometheusRule CRD for Prometheus-to-CloudWatch alarm migration
Enables Kubernetes users to migrate Prometheus alerting rules to CloudWatch PromQL alarms by changing only the
apiVersion/kindand adding a cloudWatch section. No PromQL rewrite required.What it does
Applies a PrometheusRule CR → controller converts each alerting rule into a CloudWatch PromQL alarm via PutMetricAlarm with EvaluationCriteria. Recording rules are detected and skipped. The controller continuously reconciles desired state and reports per-alarm status back to the CR.
Example
Changes
Tested