Support for Redis AWS IAM auth #8078
Code Review
This pull request introduces opt-in AWS IAM authentication for ElastiCache Redis connections and adds Redis Cluster mode support across multiple services. The reviewer identified critical issues: the PubSub subscriber Redis clients in both the server and workflows services are not registered for IAM token refresh, which will cause subscription failures once the initial 15-minute token expires. The reviewer also noted that if the AUTH command fails during a refresh, the stored password is not updated, which can lead to permanent authentication failures on reconnection. Finally, the reviewer suggested calling `unref()` on the interval timer in `startTokenRefreshTimer` so that it does not block clean process termination during graceful shutdown.
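A refresh timer that follows the review's suggestion (an unref'd timer) and the PR's rotation schedule (15-minute TTL minus a 3-minute buffer, with jitter and retry/backoff) can be sketched as below. This is a minimal sketch, not the PR's code: it borrows the name `startTokenRefreshTimer` from the review, but `TokenRefreshOptions`, `nextRefreshDelayMs`, and `backoffDelayMs` are hypothetical. It chains `setTimeout` calls instead of using `setInterval` so each cycle draws fresh jitter.

```typescript
/** Illustrative options; not the PR's actual configuration shape. */
export interface TokenRefreshOptions {
  ttlMs: number; // token lifetime (15 minutes for ElastiCache IAM tokens)
  bufferMs: number; // refresh this long before expiry (3 minutes in the PR)
  maxJitterMs: number; // random spread so instances don't refresh in lockstep
  maxRetries: number; // extra attempts per cycle after the first failure
  baseBackoffMs: number; // first retry delay; doubles on each further retry
}

/** Delay until the next refresh: TTL minus buffer, minus random jitter (never negative). */
export function nextRefreshDelayMs(opts: TokenRefreshOptions, random: () => number = Math.random): number {
  const jitter = Math.floor(random() * opts.maxJitterMs);
  return Math.max(0, opts.ttlMs - opts.bufferMs - jitter);
}

/** Exponential backoff for retry number `attempt` (0-based). */
export function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

// Unref'd sleep: a pending retry never keeps the process alive on shutdown.
const sleep = (ms: number) =>
  new Promise<void>(resolve => {
    setTimeout(resolve, ms).unref();
  });

/**
 * Runs `refresh` shortly before each token expires, retrying with backoff on failure.
 * `refresh` should re-authenticate every client, including PubSub subscribers.
 * Returns a function that stops the timer.
 */
export function startTokenRefreshTimer(
  refresh: () => Promise<void>,
  opts: TokenRefreshOptions,
  onError: (err: unknown) => void = err => console.error("Redis IAM token refresh failed", err),
): () => void {
  let stopped = false;
  let timer: NodeJS.Timeout | undefined;

  const schedule = (): void => {
    if (stopped) return;
    timer = setTimeout(() => void runCycle(), nextRefreshDelayMs(opts));
    timer.unref(); // don't block graceful shutdown
  };

  const runCycle = async (): Promise<void> => {
    for (let attempt = 0; attempt <= opts.maxRetries && !stopped; attempt++) {
      try {
        await refresh();
        break;
      } catch (err) {
        onError(err);
        if (attempt < opts.maxRetries) await sleep(backoffDelayMs(attempt, opts.baseBackoffMs));
      }
    }
    schedule();
  };

  schedule();
  return () => {
    stopped = true;
    if (timer) clearTimeout(timer);
  };
}
```

Passing one `refresh` callback that walks every registered client (standalone, cluster, and subscriber connections) is one way to get the "central place" the maintainer asked for, so no client is left out of rotation.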
n1ru4l left a comment
Thank you for the contribution! Could you please address the comments I pointed out? There is quite a bit of duplication, and I would prefer a central place where we manage these.
Hey @n1ru4l, sorry about the delay,
@mish-elle Please feel free to bump the
n1ru4l left a comment
Hey @mish-elle, I have some more feedback for you that I would like to see implemented. Aside from these points, the implementation looks solid to me. 🙏
Force-pushed from 42658e7 to 6a59c87.
Thanks for the detailed feedback, @n1ru4l! I believe I addressed everything mentioned; please let me know if we need additional changes.
@mish-elle Thank you, we are close to getting this in! 🥳 Could you please have a stab at the TypeScript and linting issues?
@mish-elle I added two more comments:
Once these two are addressed, we are good to merge 👍
Actually, there seems to be a minor issue: the subscriber reconnects with a stale token, which causes a WRONG user/pass pair error and crashes the container. Looking into it 🤔
This means
Thanks @mish-elle! I approved the PR. It seems like it needs a slight rebase, and then we can run CI and merge :)
Add opt-in AWS IAM authentication for ElastiCache Redis connections and Redis Cluster mode support.

When IAM is enabled, services authenticate to Redis using short-lived SigV4 pre-signed tokens instead of static passwords, with automatic token refresh before expiry.

New environment variables:
- REDIS_AWS_IAM_AUTH_ENABLED: enable IAM authentication for Redis
- REDIS_AWS_IAM_CACHE_NAME: ElastiCache cache instance name for the signer
- REDIS_AWS_REGION: optional override for the Redis region
- REDIS_CLUSTER_MODE_ENABLED: enable Redis Cluster mode
- REDIS_USERNAME: optional Redis username for ACL-based authentication
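Because ElastiCache has no dedicated signer, the token is a SigV4 pre-signed URL built by hand. The following is a minimal sketch using only `node:crypto`, assuming the token format described in AWS's ElastiCache IAM documentation as we understand it: a pre-signed `GET http://<cache-name>/?Action=connect&User=<user-id>` for the `elasticache` service, valid for 900 seconds, returned without the `http://` prefix. `generateElastiCacheIamToken` and its parameter types are hypothetical, hashing an empty payload is an assumption, and the output should be checked against a real cache; the PR's `iam-aws.ts` may use an AWS SDK signer instead.

```typescript
import { createHash, createHmac } from "node:crypto";

/** Hypothetical credential shape (normally from the AWS SDK credential chain). */
export interface AwsCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

export interface ElastiCacheTokenParams {
  cacheName: string; // used as the signed "host"
  userId: string; // Redis ACL user ID
  region: string;
  credentials: AwsCredentials;
  expiresInSeconds?: number; // defaults to 900 (15 minutes)
  now?: Date; // injectable clock for testing
}

// SigV4 wants RFC 3986 encoding; encodeURIComponent leaves !'()* unescaped.
const rfc3986 = (s: string) =>
  encodeURIComponent(s).replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
const sha256Hex = (data: string) => createHash("sha256").update(data, "utf8").digest("hex");
const hmac = (key: string | Buffer, data: string) => createHmac("sha256", key).update(data, "utf8").digest();

export function generateElastiCacheIamToken(p: ElastiCacheTokenParams): string {
  const service = "elasticache";
  const amzDate = (p.now ?? new Date()).toISOString().replace(/[:-]|\.\d{3}/g, ""); // e.g. 20240101T000000Z
  const dateStamp = amzDate.slice(0, 8);
  const scope = `${dateStamp}/${p.region}/${service}/aws4_request`;

  const query: Record<string, string> = {
    Action: "connect",
    User: p.userId,
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": `${p.credentials.accessKeyId}/${scope}`,
    "X-Amz-Date": amzDate,
    "X-Amz-Expires": String(p.expiresInSeconds ?? 900),
    "X-Amz-SignedHeaders": "host",
  };
  if (p.credentials.sessionToken) query["X-Amz-Security-Token"] = p.credentials.sessionToken;

  // Canonical query string: keys sorted, keys and values RFC 3986 encoded.
  const canonicalQuery = Object.keys(query)
    .sort()
    .map(k => `${rfc3986(k)}=${rfc3986(query[k])}`)
    .join("&");

  const canonicalRequest = [
    "GET",
    "/",
    canonicalQuery,
    `host:${p.cacheName}\n`, // canonical headers block ends with a newline
    "host", // signed headers
    sha256Hex(""), // assumed: hash of the empty payload
  ].join("\n");

  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");

  // Derive the signing key: date -> region -> service -> "aws4_request".
  const kDate = hmac(`AWS4${p.credentials.secretAccessKey}`, dateStamp);
  const kSigning = hmac(hmac(hmac(kDate, p.region), service), "aws4_request");
  const signature = createHmac("sha256", kSigning).update(stringToSign, "utf8").digest("hex");

  // The token is the pre-signed URL with the "http://" scheme stripped.
  return `${p.cacheName}/?${canonicalQuery}&X-Amz-Signature=${signature}`;
}
```

The resulting string is used as the Redis password, with the same user ID as the username.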
- Fix refreshIamAuth to set password BEFORE AUTH call (prevents auth failures)
- Add timer initialization for pubsub Redis client
- Enhance test coverage with unhappy paths and organized test structure
- Improve JSDoc comments for AWS IAM interfaces and functions
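The first fix in this commit is about ordering. As we understand ioredis, a reconnect authenticates with whatever credentials are stored in the client's `options`, so those must be updated before `AUTH` is sent; otherwise a failed or interrupted refresh leaves the expired token in place, and the next reconnect fails with the wrong user/pass error reported for the subscriber above. This is a sketch, not the PR's `refreshIamAuth`: `AuthableClient`, `AuthableCluster`, and `reauthWithToken` are hypothetical structural types and names that we assume ioredis' `Redis` and `Cluster` satisfy.

```typescript
/** Minimal structural view of an ioredis standalone client (assumed shape). */
export interface AuthableClient {
  options: { username?: string; password?: string };
  auth(username: string, password: string): Promise<unknown>;
}

/** Minimal structural view of an ioredis Cluster (assumed shape). */
export interface AuthableCluster {
  options: { redisOptions?: { username?: string; password?: string } };
  nodes(role?: "all" | "master" | "slave"): AuthableClient[];
}

const isCluster = (c: AuthableClient | AuthableCluster): c is AuthableCluster =>
  typeof (c as AuthableCluster).nodes === "function";

async function reauthNode(node: AuthableClient, username: string, token: string): Promise<void> {
  // 1. Store the new credentials first, so any reconnect uses the fresh token...
  node.options.username = username;
  node.options.password = token;
  // 2. ...then re-authenticate the live connection.
  await node.auth(username, token);
}

/** Hypothetical helper: re-authenticate a standalone client or every cluster node. */
export async function reauthWithToken(
  client: AuthableClient | AuthableCluster,
  username: string,
  token: string,
): Promise<void> {
  if (!isCluster(client)) {
    await reauthNode(client, username, token);
    return;
  }
  // Mutate in place (assumed: connections the cluster opens later read this object).
  const redisOptions = client.options.redisOptions ?? (client.options.redisOptions = {});
  redisOptions.username = username;
  redisOptions.password = token;
  // Per-node AUTH for every node currently in the topology.
  await Promise.all(client.nodes("all").map(node => reauthNode(node, username, token)));
}
```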
- Add IAM authentication support for AWS-managed Redis
- Refactor redis-config-validation to redis-config with enhanced schema
- Update all services to use centralized Redis config
- Add ClickHouse and feature flags support to workflows
- Implement tracing configuration across services
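When `REDIS_AWS_IAM_AUTH_ENABLED=1`, the centralized Redis config requires `REDIS_TLS_ENABLED=1`, a `REDIS_AWS_IAM_CACHE_NAME`, and either `REDIS_AWS_REGION` or `AWS_REGION`. A hand-written sketch of those rules follows. `validateRedisIamEnv` and `resolveRedisRegion` are hypothetical names, the real `redis-config` module likely expresses this as a schema, and treating only the string `1` as "enabled" is an assumption.

```typescript
export type Env = Record<string, string | undefined>;

/** Returns a list of problems; an empty list means the IAM settings are consistent. */
export function validateRedisIamEnv(env: Env): string[] {
  const errors: string[] = [];
  if (env.REDIS_AWS_IAM_AUTH_ENABLED !== "1") return errors; // IAM disabled: nothing to enforce

  if (env.REDIS_TLS_ENABLED !== "1") {
    errors.push("REDIS_TLS_ENABLED=1 is required (ElastiCache IAM requires TLS)");
  }
  if (!env.REDIS_AWS_IAM_CACHE_NAME) {
    errors.push("REDIS_AWS_IAM_CACHE_NAME must be set");
  }
  if (!env.REDIS_AWS_REGION && !env.AWS_REGION) {
    errors.push("REDIS_AWS_REGION or AWS_REGION must be set");
  }
  return errors;
}

/** Region used for signing: REDIS_AWS_REGION overrides AWS_REGION. */
export function resolveRedisRegion(env: Env): string | undefined {
  return env.REDIS_AWS_REGION || env.AWS_REGION;
}
```

Collecting all errors instead of failing on the first one lets a misconfigured deployment report every missing variable at once.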
…word-before-AUTH ordering
Force-pushed from 0321c99 to 740e987.
Thanks @mish-elle, I ran the full CI checks in #8177. Once they clear, I'll merge this one :)
Closes #8177
Background
Self-hosters running Hive on AWS with AWS ElastiCache Redis currently have no way to use IAM-based authentication for Redis connections, which forces them to use static passwords.
This PR adds opt-in AWS IAM authentication support for ElastiCache Redis across all services that communicate with Redis: `api`, `schema`, `server`, `tokens`, `usage`, and `workflows`. It also adds Redis Cluster mode support.

This PR is part of the following issue. We will open separate PRs for each IAM integration to keep the scope of each PR small.
Description
The ElastiCache IAM token generation logic lives in two new modules inside `service-common`:

- `service-common/src/iam-aws.ts`: Generic AWS SigV4 pre-signed token generation and a reusable periodic token refresh timer with retry/backoff logic. This is necessary because ElastiCache does not have a dedicated signer like MSK or RDS.
- `service-common/src/iam-redis.ts`: ElastiCache-specific helpers built on top of `iam-aws`. They handle token generation (`generateIamAuthToken`), in-place re-authentication for both standalone and cluster connections (`refreshIamAuth`), periodic token rotation (`startIamTokenRefresh`), and credential resolution (`resolveRedisCredentials`).

When Redis IAM auth is enabled:
- Credentials are resolved with `resolveRedisCredentials()`.
- `startIamTokenRefresh()` starts a timer that rotates the token every ~12 minutes (15-minute SigV4 TTL minus a 3-minute buffer), with jitter to prevent thundering-herd refreshes across instances.
- Fresh tokens are sent with `AUTH` commands to re-authenticate active connections (including per-node AUTH for cluster mode).

Redis Cluster mode is added as an opt-in feature via `REDIS_CLUSTER_MODE_ENABLED=1`. When enabled, services connect using `Redis.Cluster` from ioredis with `dnsLookup` configured to pass addresses through directly (required for ElastiCache's DNS-based endpoint resolution).

Note that the underlying ioredis library does not support dynamic passwords, so a token refresh timer is the workaround until it does.
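The cluster connection options described above can be sketched as follows, assuming ioredis' `Cluster` constructor options (`dnsLookup`, `redisOptions`). `RedisConnSettings` and `buildClusterOptions` are hypothetical. The `dnsLookup` passthrough is the pattern the ioredis README recommends for TLS-enabled ElastiCache clusters, where resolving endpoints to IP addresses makes certificate hostname checks fail.

```typescript
/** Hypothetical connection settings assembled from the REDIS_* environment variables. */
export interface RedisConnSettings {
  username?: string; // REDIS_USERNAME
  password?: string; // static password, or the current IAM token
  tlsEnabled: boolean; // REDIS_TLS_ENABLED (must be true for IAM auth)
}

// Mirrors the (hostname, callback) shape of ioredis' `dnsLookup` option.
type DnsLookup = (
  address: string,
  callback: (err: Error | null, address: string) => void,
) => void;

/** Builds options for ioredis' `Cluster` constructor (sketch). */
export function buildClusterOptions(s: RedisConnSettings) {
  // Hand hostnames back unresolved, so TLS connects to the ElastiCache hostname
  // (and its certificate validates) instead of a resolved IP address.
  const dnsLookup: DnsLookup = (address, callback) => callback(null, address);
  return {
    dnsLookup,
    redisOptions: {
      username: s.username,
      password: s.password,
      ...(s.tlsEnabled ? { tls: {} } : {}),
    },
  };
}
```

With ioredis this would be passed as `new Cluster([{ host: endpoint, port: 6379 }], buildClusterOptions(settings))`, where `endpoint` is the cluster's configuration endpoint; when IAM auth is enabled, `password` holds the current token and `tlsEnabled` must be true.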
New environment variables introduced
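| Variable | Description |
| --- | --- |
| `AWS_REGION` | AWS region; used for Redis IAM signing when `REDIS_AWS_REGION` is not set. |
| `REDIS_USERNAME` | Optional Redis username for ACL-based authentication. |
| `REDIS_CLUSTER_MODE_ENABLED` | Set to `1` to connect using Redis Cluster mode. |
| `REDIS_AWS_IAM_AUTH_ENABLED` | Set to `1` to enable IAM authentication for Redis. |
| `REDIS_AWS_REGION` | Optional override for the Redis region (falls back to `AWS_REGION`). |
| `REDIS_AWS_IAM_CACHE_NAME` | ElastiCache cache instance name used by the signer. |

Environment Variable Validation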
When `REDIS_AWS_IAM_AUTH_ENABLED=1`, the environment validation enforces:

- `REDIS_TLS_ENABLED=1` (ElastiCache IAM requires TLS)
- `REDIS_AWS_IAM_CACHE_NAME` must be set
- `REDIS_AWS_REGION` or `AWS_REGION` must be set

Pnpm-lock file generation
As with the MSK PR, the CI pipelines will fail because of the pnpm-lock file. We're downgrading dependencies and not building a few packages due to constraints in our environment. Could a maintainer help us generate the pnpm-lock file?
Checklist