5 changes: 5 additions & 0 deletions packages/aws_ec2_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules and SLOs to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: Create new SLO and Alert assets
48 changes: 47 additions & 1 deletion packages/aws_ec2_otel/docs/README.md
@@ -20,4 +20,50 @@

Member

In this README, we have a "What's included" section that shows only dashboards, which looks misleading.
I would suggest moving the dashboard part below, along with the other assets, and removing the "What's included" section.

## Compatibility

Requires Kibana `^9.4.0`.
Requires Kibana `^9.5.0`.

Member

I think we should also mention the upstream collector version here, since this is an alpha receiver and I expect changes there.
@tommyers-elastic any thoughts?


## Alerting Rule Template
Alert rule templates provide pre-defined configurations for creating alert rules in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/reference/fleet/alerting-rule-templates).

Alert rule templates require Elastic Stack version 9.2.0 or later.

**The following alert rule templates are available:**

<details>
<summary>View the alert rule templates</summary>

| Name | Description |
|---|---|
| [AWS EC2 OTel] CPU surplus credits charged | Alerts when burstable unlimited-mode instances are charged for surplus CPU credits, indicating credit balance exhaustion and overage billing. |
| [AWS EC2 OTel] High CPU utilization | Alerts when sustained average CPU utilization exceeds a threshold, indicating compute saturation and potential scheduling latency. |
| [AWS EC2 OTel] High disk read throughput | Alerts when local disk read throughput exceeds a threshold over the lookback window. Important: the DiskReadBytes metric only measures temporary local disks (instance store) that are physically attached to the host. It does NOT measure EBS volumes, which is how most EC2 instances store their data. If your instances use EBS (the common case), this alert will stay silent — track EBS disk activity using the EBS metrics (AWS/EBS namespace) instead. |
| [AWS EC2 OTel] High disk write throughput | Alerts when local disk write throughput exceeds a threshold over the lookback window. Important: the DiskWriteBytes metric only measures temporary local disks (instance store) that are physically attached to the host. It does NOT measure EBS volumes, which is how most EC2 instances store their data. If your instances use EBS (the common case), this alert will stay silent — track EBS disk activity using the EBS metrics (AWS/EBS namespace) instead. |
| [AWS EC2 OTel] Instance status check failed | Alerts when the EC2 instance status check fails, indicating a guest OS or instance-level problem (exhausted memory, corrupt network config, failed boot). Remediation is typically reboot. |

Check notice on line 43 in packages/aws_ec2_otel/docs/README.md (GitHub Actions / Lint user-facing content): Elastic.WordChoice: Consider using 'start, run' instead of 'boot', unless the term is in the UI.
| [AWS EC2 OTel] Low CPU credit balance | Alerts when CPU credit balance on burstable (T-family) instances drops below a threshold, indicating imminent throttling and application slowdown. |
| [AWS EC2 OTel] System status check failed | Alerts when the EC2 system status check fails, indicating underlying AWS host, hardware, or network impairment. Remediation is typically recover (stop/start on new hardware). |

</details>



## SLO Templates
SLO templates provide pre-defined configurations for creating SLOs in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos).

SLO templates require Elastic Stack version 9.4.0 or later.

**The following SLO templates are available:**

<details>
<summary>View the SLO templates</summary>

| Name | Description |
|---|---|
| [AWS EC2 OTel] Status check availability 99.5% rolling 30 days | Per-instance SLO that treats each 5-minute window as healthy when the CloudWatch StatusCheckFailed metric (Maximum statistic) is below 1, meaning neither the system nor instance status check reported a failure. Targets 99.5% of rolling 30-day windows as healthy. This is the primary EC2 platform-availability signal: a failed check indicates AWS-detected impairment requiring recover or reboot. Application-only failures while checks remain at 0 are outside this data source and should be monitored separately. |

</details>
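
As a rough illustration of what the status check availability SLO above tolerates, the sketch below works out the error budget for a 99.5% target over a rolling 30 days with 5-minute windows. The function and field names are hypothetical and not part of the package, and the arithmetic assumes every window in the period reports data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimesliceBudget:
    total_windows: int          # number of windows in the rolling period
    allowed_bad: float          # windows that can be unhealthy without breaching the target
    allowed_bad_minutes: float  # the same budget expressed in minutes


def timeslice_budget(target: float, period_days: int, window_minutes: int) -> TimesliceBudget:
    """Compute how many unhealthy windows a timeslice-style SLO tolerates."""
    total = period_days * 24 * 60 // window_minutes
    allowed = total * (1.0 - target)
    return TimesliceBudget(total, allowed, allowed * window_minutes)


# Values taken from the SLO description: 99.5%, rolling 30 days, 5-minute windows.
budget = timeslice_budget(target=0.995, period_days=30, window_minutes=5)
print(budget.total_windows)                  # 8640
print(round(budget.allowed_bad, 1))          # 43.2
print(round(budget.allowed_bad_minutes, 1))  # 216.0
```

Under these assumptions, roughly 43 unhealthy 5-minute windows (about 3.6 hours) per instance fit inside the budget before the 99.5% target is missed.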


4 changes: 2 additions & 2 deletions packages/aws_ec2_otel/manifest.yml
@@ -1,7 +1,7 @@
format_version: 3.5.0
format_version: 3.6.0
name: aws_ec2_otel
title: "AWS EC2 OpenTelemetry Assets"
version: 0.2.0
version: 0.3.0
source:
license: "Elastic-2.0"
description: "AWS EC2 OpenTelemetry assets for CloudWatch metrics collected by the OpenTelemetry Collector"
5 changes: 5 additions & 0 deletions packages/aws_ecs_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: Create new Alert assets
23 changes: 22 additions & 1 deletion packages/aws_ecs_otel/docs/README.md
@@ -20,4 +20,25 @@ The dashboard reads metric-stream style OpenTelemetry documents from `metrics-aw

## Compatibility

Requires Kibana `^9.4.0`.
Requires Kibana `^9.5.0`.
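
For readers unfamiliar with caret ranges, the sketch below shows how `^9.5.0` is conventionally read under semver range rules: any `9.x.y` at or above `9.5.0`, and nothing from `10.0.0` onward. The helper names are hypothetical, and the sketch deliberately ignores pre-release tags and `0.x` bases, so it is an approximation rather than the check Kibana or Fleet actually runs.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, base: str) -> bool:
    """Return True when `version` falls inside `^base` (base major must be >= 1)."""
    v, b = parse(version), parse(base)
    if b[0] == 0:
        # Caret ranges behave differently for 0.x bases; out of scope for this sketch.
        raise ValueError("sketch only handles bases with major >= 1")
    # Same major version, and not older than the base.
    return v[0] == b[0] and v >= b


for candidate in ("9.4.3", "9.5.0", "9.6.1", "10.0.0"):
    print(candidate, satisfies_caret(candidate, "9.5.0"))
```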

## Alerting Rule Template
Alert rule templates provide pre-defined configurations for creating alert rules in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/reference/fleet/alerting-rule-templates).

Alert rule templates require Elastic Stack version 9.2.0 or later.

**The following alert rule templates are available:**

<details>
<summary>View the alert rule templates</summary>

| Name | Description |
|---|---|
| [AWS ECS OTel] CPU utilization high | Alerts when average CPU utilization of reserved task CPU exceeds a threshold. Sustained high CPU throttles workloads and raises application latency. |
| [AWS ECS OTel] Memory utilization high | Alerts when average memory utilization of reserved task memory exceeds a threshold. Sustained high memory risks OOM kills and task churn. |

</details>


4 changes: 2 additions & 2 deletions packages/aws_ecs_otel/manifest.yml
@@ -1,7 +1,7 @@
format_version: 3.5.0
format_version: 3.6.0
name: aws_ecs_otel
title: "AWS ECS OpenTelemetry Assets"
version: 0.2.0
version: 0.3.0
source:
license: "Elastic-2.0"
description: "AWS ECS OpenTelemetry assets for CloudWatch metrics collected by the OpenTelemetry Collector"
5 changes: 5 additions & 0 deletions packages/aws_elb_metrics_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules and SLOs to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: move assets to aws_elb_metrics_otel
48 changes: 47 additions & 1 deletion packages/aws_elb_metrics_otel/docs/README.md
@@ -20,4 +20,50 @@ The dashboard reads metric-stream style OpenTelemetry documents from `metrics-aw

## Compatibility

Requires Kibana `^9.4.0`.
Requires Kibana `^9.5.0`.

## Alerting Rule Template
Alert rule templates provide pre-defined configurations for creating alert rules in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/reference/fleet/alerting-rule-templates).

Alert rule templates require Elastic Stack version 9.2.0 or later.

**The following alert rule templates are available:**

<details>
<summary>View the alert rule templates</summary>

| Name | Description |
|---|---|
| [AWS ELB OTel] High ELB 5XX error rate | Alerts when load-balancer-generated 5XX error rate exceeds a tunable threshold. Indicates edge/infrastructure failures such as no healthy targets, connection timeouts, or LB capacity issues. |
| [AWS ELB OTel] Rejected connections detected | Alerts when the ALB rejects connections because it reached its connection ceiling. A hard capacity-class failure requiring immediate attention. |
| [AWS ELB OTel] High target 5XX error rate | Alerts when target-generated 5XX error rate exceeds a tunable threshold for any load balancer target group. Indicates application or backend failures behind the ALB. |
| [AWS ELB OTel] High target response time (average) | Alerts when average target response time exceeds a tunable threshold. Indicates typical backend latency degradation even when error rates remain low. |
| [AWS ELB OTel] High target response time (tail) | Alerts when peak (maximum) target response time exceeds a tunable threshold. Serves as a tail-latency proxy where CloudWatch percentiles are unavailable. |
| [AWS ELB OTel] Unhealthy targets detected | Alerts when any target in a target group is failing health checks (UnHealthyHostCount \> 0). Early warning before healthy capacity collapses to zero. |

</details>



## SLO Templates
SLO templates provide pre-defined configurations for creating SLOs in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos).

SLO templates require Elastic Stack version 9.4.0 or later.

**The following SLO templates are available:**

<details>
<summary>View the SLO templates</summary>

| Name | Description |
|---|---|
| [AWS ELB OTel] Request availability 99.5% rolling 30 days | Tracks Application Load Balancer request availability by keeping the combined ELB-generated and target-generated 5XX error rate below 0.5% in each 1-minute interval. Scoped per load balancer and region; aggregates all target groups behind the load balancer. A rolling 30-day target of 99.5% ensures users receive successful responses at the edge. |
| [AWS ELB OTel] Target response time average 99.5% rolling 30 days | Tracks typical backend latency for Application Load Balancer target groups by keeping average TargetResponseTime below 1 second in each 1-minute interval. Scoped per target group, load balancer, and region. A rolling 30-day target of 99.5% ensures users experience responsive service even when errors are absent. |

</details>
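
To make the request availability definition above concrete, here is a minimal sketch of how 1-minute intervals could be classified as good or bad and rolled up into an attainment figure. The `Minute` type, the choice of request count as the denominator, and the treatment of zero-traffic minutes as good are all assumptions made for illustration; the SLO template itself defines the real query.

```python
from typing import Iterable, NamedTuple


class Minute(NamedTuple):
    requests: int    # total requests seen in the interval (assumed denominator)
    elb_5xx: int     # 5XX responses generated by the load balancer
    target_5xx: int  # 5XX responses generated by targets


def is_good(m: Minute, max_error_rate: float = 0.005) -> bool:
    """A minute is good when the combined 5XX rate stays below the threshold."""
    if m.requests == 0:
        return True  # assumption: an interval with no traffic counts as good
    return (m.elb_5xx + m.target_5xx) / m.requests < max_error_rate


def attainment(minutes: Iterable[Minute]) -> float:
    """Fraction of intervals classified as good."""
    slices = list(minutes)
    if not slices:
        return 1.0
    return sum(1 for m in slices if is_good(m)) / len(slices)


sample = [
    Minute(requests=1000, elb_5xx=0, target_5xx=0),  # 0.0% errors: good
    Minute(requests=1000, elb_5xx=2, target_5xx=2),  # 0.4% errors: good
    Minute(requests=1000, elb_5xx=3, target_5xx=3),  # 0.6% errors: bad
    Minute(requests=0, elb_5xx=0, target_5xx=0),     # no traffic: good by assumption
]
print(attainment(sample))  # 0.75
```

The SLO is met when this fraction stays at or above 0.995 across the rolling 30-day period.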


4 changes: 2 additions & 2 deletions packages/aws_elb_metrics_otel/manifest.yml
@@ -1,7 +1,7 @@
format_version: 3.5.0
format_version: 3.6.0
name: aws_elb_metrics_otel
title: "AWS ELB Metrics OpenTelemetry Assets"
version: 0.2.0
version: 0.3.0
source:
license: "Elastic-2.0"
description: "AWS ELB metrics OpenTelemetry assets for CloudWatch metrics collected by the OpenTelemetry Collector"
5 changes: 5 additions & 0 deletions packages/aws_lambda_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules and SLOs to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: Create new SLO and Alert assets
51 changes: 50 additions & 1 deletion packages/aws_lambda_otel/docs/README.md
@@ -20,4 +20,53 @@

## Compatibility

Requires Kibana `^9.4.0`.
Requires Kibana `^9.5.0`.

## Alerting Rule Template
Alert rule templates provide pre-defined configurations for creating alert rules in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/reference/fleet/alerting-rule-templates).

Alert rule templates require Elastic Stack version 9.2.0 or later.

**The following alert rule templates are available:**

<details>
<summary>View the alert rule templates</summary>

| Name | Description |
|---|---|
| [AWS Lambda OTel] Dead letter errors | Alerts when Lambda fails to write events to the configured dead-letter queue, meaning failed events may be lost. |

Check notice on line 39 in packages/aws_lambda_otel/docs/README.md (GitHub Actions / Lint user-facing content): Elastic.WordChoice: Consider using 'can, might' instead of 'may', unless the term is in the UI.
| [AWS Lambda OTel] Destination delivery failures | Alerts when Lambda fails to deliver events to configured on-success or on-failure destinations. |
| [AWS Lambda OTel] High async event age | Alerts when async-invoked Lambda functions show high AsyncEventAge, indicating events are aging in the internal queue. |
| [AWS Lambda OTel] High concurrent executions | Alerts when peak concurrent executions approach capacity limits, predicting imminent throttling. |
| [AWS Lambda OTel] High average duration | Alerts when average Lambda invocation duration exceeds a configurable threshold over a 15-minute window. |
| [AWS Lambda OTel] High tail duration | Alerts when peak (Maximum) Lambda invocation duration exceeds a configurable threshold, indicating slow handler execution or downstream latency. |
| [AWS Lambda OTel] High error rate | Alerts when a Lambda function exceeds a configurable error rate (Errors / Invocations) over a 15-minute window. Evaluates the top 10 functions by error rate. |
| [AWS Lambda OTel] High iterator age | Alerts when stream-based Lambda consumers show high IteratorAge, indicating the function is falling behind the record arrival rate. |
| [AWS Lambda OTel] High throttle rate | Alerts when a Lambda function exceeds a configurable throttle rate (Throttles / (Invocations + Throttles)) over a 15-minute window. |

</details>
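
The two ratios quoted in the table above, Errors / Invocations and Throttles / (Invocations + Throttles), can be sketched as plain functions. The function names and the choice to return 0.0 when there is no traffic are assumptions for illustration; the rule templates compute these values in their own queries.

```python
def error_rate(errors: float, invocations: float) -> float:
    """Errors / Invocations over the evaluation window."""
    if invocations <= 0:
        return 0.0  # assumption: no invocations means nothing to alert on
    return errors / invocations


def throttle_rate(throttles: float, invocations: float) -> float:
    """Throttles / (Invocations + Throttles) over the evaluation window."""
    attempts = invocations + throttles
    if attempts <= 0:
        return 0.0  # assumption: no attempts means nothing to alert on
    return throttles / attempts


# Hypothetical 15-minute sums for one function.
print(error_rate(errors=5, invocations=1000))       # 0.005
print(throttle_rate(throttles=50, invocations=950)) # 0.05
```

The throttle denominator adds Throttles to Invocations because CloudWatch does not count throttled requests as invocations, so the sum approximates the total number of attempts.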



## SLO Templates
SLO templates provide pre-defined configurations for creating SLOs in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos).

SLO templates require Elastic Stack version 9.4.0 or later.

**The following SLO templates are available:**

<details>
<summary>View the SLO templates</summary>

| Name | Description |
|---|---|
| [AWS Lambda OTel] Average duration 99.5% rolling 30 days | Tracks per-function execution latency from CloudWatch Duration (Average statistic). Each 1-minute window is good when average duration stays below 3000 ms; 99.5% of windows must be good over a rolling 30-day period. Sustained duration regressions degrade user-facing response times and increase Lambda billing. |
| [AWS Lambda OTel] Invocation success rate 99.5% rolling 30 days | Tracks per-function invocation reliability from CloudWatch Errors and Invocations (Sum statistics). Each 1-minute window is good when the error rate is below 0.5%; 99.5% of windows must be good over a rolling 30-day period. Rising error rates indicate function code failures that directly break synchronous and asynchronous workloads. |

</details>


4 changes: 2 additions & 2 deletions packages/aws_lambda_otel/manifest.yml
@@ -1,7 +1,7 @@
format_version: 3.5.0
format_version: 3.6.0
name: aws_lambda_otel
title: "AWS Lambda OpenTelemetry Assets"
version: 0.2.0
version: 0.3.0
source:
license: "Elastic-2.0"
description: "AWS Lambda OpenTelemetry assets for CloudWatch metrics collected by the OpenTelemetry Collector"
5 changes: 5 additions & 0 deletions packages/aws_rds_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules and SLOs to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: Create new SLO and Alert assets
52 changes: 51 additions & 1 deletion packages/aws_rds_otel/docs/README.md
@@ -20,4 +20,54 @@ The dashboard reads metric-stream style OpenTelemetry documents from `metrics-aw

## Compatibility

Requires Kibana `^9.4.0`.
Requires Kibana `^9.5.0`.

## Alerting Rule Template
Alert rule templates provide pre-defined configurations for creating alert rules in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/reference/fleet/alerting-rule-templates).

Alert rule templates require Elastic Stack version 9.2.0 or later.

**The following alert rule templates are available:**

<details>
<summary>View the alert rule templates</summary>

| Name | Description |
|---|---|
| [AWS RDS OTel] Burst balance low | Alerts when gp2 burst balance falls below a percentage floor. Depleted burst credits throttle IOPS and typically precede disk queue depth and latency spikes. |
| [AWS RDS OTel] Checkpoint lag high | Alerts when checkpoint lag exceeds a threshold. Uses the Maximum statistic for worst-case lag. Rising checkpoint lag indicates the instance cannot keep up with write/redo volume. |
| [AWS RDS OTel] CPU utilization high | Alerts when average CPU utilization is sustained above a threshold. Latency rises sharply above ~80% CPU; correlate with SwapUsage for memory-related CPU pressure. |
| [AWS RDS OTel] Database connections high | Alerts when peak database connections exceed a threshold. CloudWatch does not publish max_connections — set the threshold against your engine limit and normal baseline. |
| [AWS RDS OTel] Disk queue depth high | Alerts when average disk queue depth is sustained above a threshold. High queue depth with plateauing IOPS is the canonical storage I/O saturation signature. |
| [AWS RDS OTel] Free storage space low | Alerts when free storage space on an RDS instance falls below an absolute byte floor. Storage exhaustion is an outage-class risk; total volume size is not published by CloudWatch so percentage thresholds cannot be derived from this source. |
| [AWS RDS OTel] Freeable memory low | Alerts when freeable memory on an RDS instance falls below an absolute byte floor. Persistent low memory leads to swapping and latency; correlate with DatabaseConnections and SwapUsage. |
| [AWS RDS OTel] Read latency high | Alerts when peak read I/O latency exceeds a threshold. Uses the Maximum statistic for worst-case tail latency. Correlate with DiskQueueDepth and ReadIOPS for storage bottlenecks. |
| [AWS RDS OTel] Replica lag high | Alerts when read replica lag exceeds a threshold. Uses the Maximum statistic for worst-case lag. Rising lag means stale read traffic and failover targets behind the primary. |
| [AWS RDS OTel] Swap usage high | Alerts when swap usage exceeds an absolute byte threshold. Non-zero or rising swap indicates memory pressure spilling to disk and degrading performance. |
| [AWS RDS OTel] Write latency high | Alerts when peak write I/O latency exceeds a threshold. Uses the Maximum statistic for worst-case tail latency. Correlate with DiskQueueDepth and WriteIOPS for storage bottlenecks. |

</details>



## SLO Templates
SLO templates provide pre-defined configurations for creating SLOs in Kibana.

For more information, refer to the [Elastic documentation](https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos).

SLO templates require Elastic Stack version 9.4.0 or later.

**The following SLO templates are available:**

<details>
<summary>View the SLO templates</summary>

| Name | Description |
|---|---|
| [AWS RDS OTel] Average read latency 99.5% rolling 30 days | Tracks per-I/O read storage latency from CloudWatch RDS metrics. At least 99.5% of 1-minute intervals per DB instance should show average read latency below 10 milliseconds, protecting read-heavy application workloads from storage bottlenecks. |

</details>


4 changes: 2 additions & 2 deletions packages/aws_rds_otel/manifest.yml
@@ -1,7 +1,7 @@
format_version: 3.5.0
format_version: 3.6.0
name: aws_rds_otel
title: "AWS RDS OpenTelemetry Assets"
version: 0.2.0
version: 0.3.0
source:
license: "Elastic-2.0"
description: "AWS RDS OpenTelemetry assets for CloudWatch metrics collected by the OpenTelemetry Collector"
5 changes: 5 additions & 0 deletions packages/aws_sqs_otel/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
changes:
- description: Add alert rules and SLOs to README
type: enhancement
link: https://github.com/elastic/integrations/pull/19637
- version: "0.2.0"
changes:
- description: Create new SLO and Alert assets