PE-7693 | Add ElastiCache/Redis metrics via Alloy + Redis dashboard#60
Merged
Add a `prometheus.exporter.cloudwatch "elasticache"` block discovering AWS/ElastiCache clusters via the `environment` tag, with Sum statistics for the CacheHits/CacheMisses/Evictions counters and Average/Maximum for the gauges. Wire a `create_elasticache_labels` relabel (`service=server`, node identity preserved via `dimension_CacheClusterId`) and an `elasticache` scrape job into the existing pipeline. Add the Redis dashboard (8 panels, per-node breakout) and document the new exporter in the README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
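The split between Sum for the counters and Average/Maximum for the gauges can be illustrated with a small sketch. It uses made-up one-minute datapoints for a single 5-minute period and models how a CloudWatch statistic reduces them to one value per period; it is not Alloy or exporter code, and all numbers and names are hypothetical.

```python
"""Illustrative only: how the CloudWatch statistic choice changes what a period reports."""
from statistics import mean

# Hypothetical one-minute datapoints within a single 5-minute period.
CACHE_HITS = [120, 80, 100, 90, 110]               # counter: hits observed in each minute
MEMORY_USAGE_PCT = [61.0, 62.5, 64.0, 90.0, 63.5]  # gauge: level sampled each minute

REDUCERS = {"Sum": sum, "Average": mean, "Maximum": max}


def aggregate(datapoints, statistic):
    """Reduce one period's datapoints the way the named statistic does."""
    return REDUCERS[statistic](datapoints)


# Counter: Sum is the total for the period; Average is only the per-minute mean,
# which understates the total by a factor of the number of datapoints.
hits_sum = aggregate(CACHE_HITS, "Sum")       # 500
hits_avg = aggregate(CACHE_HITS, "Average")   # 100

# Gauge: Average smooths the period, Maximum keeps the short spike to 90%.
mem_avg = aggregate(MEMORY_USAGE_PCT, "Average")  # about 68.2
mem_max = aggregate(MEMORY_USAGE_PCT, "Maximum")  # 90.0

if __name__ == "__main__":
    print(f"hits: Sum={hits_sum} Average={hits_avg}")
    print(f"memory %: Average={mem_avg:.1f} Maximum={mem_max}")
```

The same reasoning applies to CacheMisses and Evictions: a ratio such as hits / (hits + misses) is only meaningful when both inputs are period totals.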
The dashboard shipped with a hand-typed placeholder uid (`a1b2c3d4-redis-server-cache-0001`), inconsistent with the UUIDs used by the sibling database/service dashboards. Swap in a generated UUIDv4 to lock in a stable, collision-free identity before the dashboard is imported into Grafana.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
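Swapping a placeholder uid for a UUIDv4 takes only a few lines. The Python sketch below is not the command the commit actually used; it assumes the uid lives in the top-level `uid` field of the Grafana dashboard JSON model, reuses the placeholder string named above, and its helper names (`assign_uuid4_uid`, `rewrite_dashboard_file`) are hypothetical.

```python
"""Sketch: replace a Grafana dashboard's placeholder uid with a generated UUIDv4."""
import json
import uuid

# The hand-typed placeholder described in the commit message.
PLACEHOLDER_UID = "a1b2c3d4-redis-server-cache-0001"


def assign_uuid4_uid(dashboard, placeholder=PLACEHOLDER_UID):
    """Return a copy of the dashboard model with a fresh UUIDv4 in place of the placeholder.

    A dashboard whose uid is not the placeholder is returned unchanged (as a copy),
    so re-running the script never replaces a uid that is already locked in.
    """
    updated = dict(dashboard)
    if updated.get("uid") == placeholder:
        updated["uid"] = str(uuid.uuid4())
    return updated


def rewrite_dashboard_file(path):
    """Apply assign_uuid4_uid to a dashboard JSON file in place and return its uid."""
    with open(path, encoding="utf-8") as f:
        dashboard = json.load(f)
    updated = assign_uuid4_uid(dashboard)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(updated, f, indent=2)
        f.write("\n")
    return updated["uid"]
```

Usage would be `rewrite_dashboard_file("dashboards/server/redis.json")`. Note that `json.dump` with `indent=2` may reflow the file differently from how it was committed, so a targeted string replacement of the uid produces a smaller diff.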
There was no automated gate ensuring the .alloy config files are
formatted and valid; formatting relied on contributors running the Alloy
VS Code extension locally, and a malformed or invalid config could only
fail at container boot in a cloud environment.
Add an "Alloy Check" workflow that runs on pull requests (and pushes to
main) and:
- runs `alloy fmt -t` on every config/*.alloy file, failing if any file
is not formatted correctly
- runs `alloy validate` over the whole config directory so cross-file
pipeline references are checked together
Both checks use the exact Alloy version read from the Dockerfile, so CI
validates against the same binary that runs in production. The validate
step passes dummy values for the sys.env() references; it inspects config
structure and does not connect to any endpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
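The dummy-value step can be sketched as follows: scan the config files for `sys.env("...")` calls and give each referenced variable a placeholder before running `alloy validate` on the directory. This is a hedged Python sketch, not the workflow's actual script; the regex assumes each call takes a double-quoted string literal, and the `dummy-<name>` values and helper names are hypothetical.

```python
"""Sketch: give every sys.env() reference a dummy value, then run `alloy validate`."""
import os
import re
import subprocess
from pathlib import Path

# Matches sys.env("NAME"); assumes the argument is a double-quoted literal.
SYS_ENV_RE = re.compile(r'sys\.env\(\s*"([^"]+)"\s*\)')


def sys_env_names(sources):
    """Return the sorted, de-duplicated variable names referenced in the sources."""
    names = set()
    for text in sources:
        names.update(SYS_ENV_RE.findall(text))
    return sorted(names)


def dummy_env(names, base_env):
    """Copy base_env and fill in a placeholder for each name it does not define."""
    env = dict(base_env)
    for name in names:
        env.setdefault(name, f"dummy-{name.lower()}")
    return env


def validate_config_dir(config_dir, alloy_bin="alloy"):
    """Run `alloy validate` on the directory with dummy values for its sys.env() names."""
    sources = [p.read_text(encoding="utf-8") for p in sorted(Path(config_dir).glob("*.alloy"))]
    env = dummy_env(sys_env_names(sources), os.environ)
    # check=True raises CalledProcessError when alloy exits non-zero.
    subprocess.run([alloy_bin, "validate", config_dir], env=env, check=True)
```

Using `setdefault` means a variable that CI already defines keeps its real value; only unset names get a placeholder.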
The Alloy team documents `alloy fmt --test` and `alloy validate` as the CI contract (exit codes signal pass/fail) but ships no dedicated setup/fmt/validate GitHub Action, so we invoke the CLI ourselves. Running it through `docker run` required overriding the image entrypoint and a mounted-volume find loop; installing the released binary is simpler and faster. Download the `alloy-linux-amd64` release matching the version pinned in the Dockerfile (keeping CI in parity with production), then run `alloy fmt -t` per file and `alloy validate` over the config directory. The format loop uses `find` rather than a bash globstar so it can never silently match zero files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
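The two properties this commit relies on, a CI binary pinned to the Dockerfile's Alloy version and a file list that cannot silently come back empty, can be sketched in Python. The sketch assumes the Dockerfile pins its base image as `FROM grafana/alloy:vX.Y.Z` and that the release asset is `alloy-linux-amd64.zip` under the standard GitHub release download path; both are assumptions, and the helper names are hypothetical. It covers only the top-level `config/*.alloy` pattern named in the workflow description.

```python
"""Sketch: derive the CI Alloy version from the Dockerfile and refuse an empty file list."""
import re
from pathlib import Path

# Assumption: the Dockerfile pins the base image as "FROM grafana/alloy:vX.Y.Z".
FROM_ALLOY_RE = re.compile(r"^FROM\s+grafana/alloy:(v\d+\.\d+\.\d+)\b", re.MULTILINE)


def pinned_alloy_version(dockerfile_text):
    """Return the vX.Y.Z tag of the grafana/alloy base image."""
    match = FROM_ALLOY_RE.search(dockerfile_text)
    if match is None:
        raise ValueError("no 'FROM grafana/alloy:vX.Y.Z' line found in the Dockerfile")
    return match.group(1)


def release_asset_url(version, asset="alloy-linux-amd64.zip"):
    """Build the download URL for a release asset (assumed GitHub release layout)."""
    return f"https://github.com/grafana/alloy/releases/download/{version}/{asset}"


def alloy_files(config_dir):
    """List config_dir/*.alloy, failing loudly instead of checking zero files."""
    files = sorted(Path(config_dir).glob("*.alloy"))
    if not files:
        raise FileNotFoundError(f"no .alloy files found in {config_dir!r}")
    return files
```

With the version and file list in hand, the workflow runs `alloy fmt -t` once per file and a single `alloy validate` over the directory; raising on an empty list is what keeps a misconfigured path from passing as "nothing to check".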
mihoward21 (Contributor) approved these changes on Jun 5, 2026 and left a comment:
I think this all looks good. Have you run it in testing yet, or anywhere else? Fine to just "test in prod" if that's easier.
PR author (Contributor) replied:
Want to test it on prod directly.
Why
We had no visibility into our Redis (ElastiCache) clusters: there were zero `aws_elasticache_*` metrics in Grafana, so memory pressure, evictions, or a lagging replica were effectively invisible until something broke. This PR closes that gap.
What you get
- Redis dashboard (`dashboards/server/redis.json`): 8 panels, broken out per node (primary vs. replica) so on-call can spot an unhealthy node at a glance.
- "Alloy Check" CI workflow: runs `alloy fmt` and `alloy validate` on every PR. Previously formatting was manual and nothing validated the config before it reached a running container; now a malformed or invalid config fails CI before it can merge.
Rollout (action required)
Config is baked into the Docker image, so metrics only start flowing once a new image is deployed: after merge, cut a release `vX.Y.Z-<sha>`, then redeploy Alloy across all environments. Redis alerting intentionally stays in CloudWatch for now; this PR makes no alerting changes.
How it was verified
- The ElastiCache clusters in `us-west-2` (production / staging / testing) are tagged `environment` and expose every scraped metric at per-node granularity.
- `alloy fmt` and `alloy validate` pass in CI against the prod-pinned Alloy `v1.13.2`.
🤖 Generated with Claude Code