Automate Kubernetes the easy way. Deploy once, explore GPU, Spot, Graviton, cost optimization, ODCR, disruption budgets, observability, and more. Minimal add-on management. Auto Mode handles core compute, storage, and networking add-ons for you.
- Overview
- Prerequisites
- Claude Code Skills (Plugin)
- Quick Start
- Examples
- Cleanup
- Configuration
- Components
- Learn More
- Contributing
- License and Disclaimer
Amazon EKS Auto Mode simplifies Kubernetes cluster management by automating compute, storage, and networking decisions. Under the hood it runs Karpenter, the AWS Load Balancer Controller, and the EBS CSI driver as managed components. You get the benefits without installing or upgrading any of them.
This repository is an educational companion. Each example demonstrates a specific EKS Auto Mode pattern (Graviton, GPU, Spot, ODCR targeting, disruption budgets, etc.) with a self-contained README explaining the "why" alongside the "how." Deploy the base cluster once, then apply individual examples to explore.
Key capabilities covered:
- Graviton (ARM64) and x86 workloads side by side
- GPU and Inferentia2 (Neuron) ML inference
- Spot and On-Demand mixed pools with overprovision headroom
- On-Demand Capacity Reservation targeting
- Static capacity pools and disruption budgets
- HPA and KEDA-driven autoscaling
- KMS encryption for ephemeral node storage
- CloudWatch Container Insights observability
- 5-layer resource tagging for cost allocation
Required Tools:
Note: This project currently provides Linux-specific commands in the examples. Windows compatibility will be added in future updates.
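A quick pre-flight check can catch a missing tool before a deployment starts. The sketch below is a minimal helper, not part of this repository; the tool names in the usage comment are an assumption based on the commands used later in this README (git, terraform, kubectl) plus the AWS CLI.

```shell
#!/usr/bin/env bash
# Print the name of every tool in the argument list that is not on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical usage (adjust the list to the tools you actually need):
#   missing_tools git terraform kubectl aws
```

An empty result means every listed tool was found.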
This repo ships as a Claude Code plugin with two AI-assisted skills for EKS Auto Mode:
| Skill | Audience | What it covers |
|---|---|---|
| eks-automode-onboard | Newcomers | Concepts, deployment, example selection, troubleshooting |
| eks-automode-maintain | Repo maintainers | Rendering chain, 5-layer tagging, docs sync, PR checklist |
/plugin marketplace add https://github.com/aws-samples/sample-aws-eks-auto-mode.git
/plugin install eks-automode@sample-aws-eks-auto-mode

git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cp -r sample-aws-eks-auto-mode/skills/eks-automode-onboard ~/.claude/skills/
cp -r sample-aws-eks-auto-mode/skills/eks-automode-maintain ~/.claude/skills/

- Clone Repository:
git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cd sample-aws-eks-auto-mode
- Deploy Cluster:
cd terraform
terraform init
terraform apply -auto-approve
# Configure kubectl
$(terraform output -raw configure_kubectl)
- Apply an example (e.g., Graviton):
kubectl apply -f examples/graviton/

Each example has its own README with detailed explanations of the underlying mechanics.
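After applying an example, one way to see what Auto Mode provisioned is to list the nodes together with a few well-known labels. The helper below is a sketch, not something shipped in this repo; it assumes the nodes carry the standard kubernetes.io/arch and node.kubernetes.io/instance-type labels and the Karpenter capacity-type label.

```shell
#!/usr/bin/env bash
# List nodes with extra columns for architecture, capacity type, and
# instance type, so a Graviton or Spot example can be checked at a glance.
show_nodes() {
  kubectl get nodes \
    -L kubernetes.io/arch \
    -L karpenter.sh/capacity-type \
    -L node.kubernetes.io/instance-type
}

# Usage (requires a configured kubectl context):
#   show_nodes
```

For the Graviton example, the architecture column would be expected to show arm64 once the pods are scheduled.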
| Example | Description |
|---|---|
| Graviton | ARM64 workloads on cost-effective Graviton instances. Deploys a 2048 game. |
| Spot | Fault-tolerant workloads on EC2 Spot with diverse instance families. Deploys a 2048 game. |
| GPU | GPU-accelerated ML inference (Qwen 3 on NVIDIA GPUs). |
| Neuron | ML inference on AWS Inferentia2 (DeepSeek-R1-Qwen3-8B served by vLLM). |
| Example | Description |
|---|---|
| Cost Optimization | OD/Spot mixed pools with weighted priorities and pause-pod overprovision headroom. |
| Example | Description |
|---|---|
| Capacity Reservation | Pin workloads to On-Demand Capacity Reservations (ODCRs) so reserved capacity is consumed. |
| Static Capacity | Maintain a fixed fleet of always-on nodes using spec.replicas, immune to consolidation. |
| Batch Jobs | Protect long-running jobs from eviction using do-not-disrupt annotations and dedicated NodePools. |
| Disruption Budgets | Limit simultaneous node drains during consolidation to prevent cascading failures. |
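To make the disruption-budget idea concrete, the sketch below builds a JSON merge patch that caps how many nodes of a NodePool may be disrupted at once. It is an illustration only, not the manifest used by the Disruption Budgets example; the function name is hypothetical and the patch targets a NodePool you created yourself.

```shell
#!/usr/bin/env bash
# Print a JSON merge patch that sets a single disruption budget.
# $1: maximum nodes that may be disrupted at once, as a count or a
#     percentage (for example "2" or "10%").
budget_patch() {
  printf '{"spec":{"disruption":{"budgets":[{"nodes":"%s"}]}}}\n' "$1"
}

# Hypothetical usage against a custom NodePool named my-pool:
#   kubectl patch nodepool my-pool --type merge -p "$(budget_patch 10%)"
```

A merge patch replaces the whole budgets list, so any existing budgets on the NodePool would be overwritten by this single entry.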
| Example | Description |
|---|---|
| Pod Autoscaling | HPA for CPU-based scaling plus KEDA for event-driven scaling (SQS queue depth). |
| Example | Description |
|---|---|
| Observability | CloudWatch Container Insights integration for metrics, pod logs, and Application Signals tracing. |
A standalone cleanup script handles the full teardown lifecycle. It drains Kubernetes-controller-managed AWS resources (ALBs, EBS volumes, EC2 instances) before terraform destroy, then sweeps for any orphans that survived.
# Recommended: interactive cleanup (prompts per resource)
./scripts/cleanup.sh
# Non-interactive: delete everything
./scripts/cleanup.sh --yes
# Preview what would be deleted
./scripts/cleanup.sh --dry-run
# Delete everything except storage (PVCs/EBS)
./scripts/cleanup.sh --yes --keep-storage
# Orphan sweep only (terraform already destroyed)
./scripts/cleanup.sh --skip-terraform --cluster-name <name> --region <region>

The script runs in three phases:
- Pre-drain deletes Ingresses, LoadBalancer Services, PVCs, Helm releases, NodePools/NodeClaims while the cluster API is alive so controllers can fire finalizers and release AWS resources.
- Terraform destroy runs terraform init + destroy for both the main and KEDA terraform roots.
- Orphan sweep scans for resources tagged with the cluster name (or matching known patterns for untaggable resources like Auto Mode internal volumes) and prompts for deletion.
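The ordering of the first two phases can be sketched in a few lines of shell. This is a simplified illustration of the idea, not the contents of scripts/cleanup.sh: it omits LoadBalancer Services, Helm releases, the KEDA root, and the orphan sweep, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

teardown() {
  # Phase 1: pre-drain while the cluster API is still reachable, so
  # controllers can process finalizers and release AWS resources.
  run kubectl delete ingress --all --all-namespaces
  run kubectl delete pvc --all --all-namespaces
  run kubectl delete nodepools.karpenter.sh --all
  # Phase 2: destroy Terraform-managed resources (main root only here).
  run terraform -chdir=terraform destroy -auto-approve
}

# Preview without touching anything:
#   DRY_RUN=1 teardown
```

Running the pre-drain first matters because the controllers that own the load balancers and volumes stop reconciling once the cluster is gone.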
Why not just terraform destroy? A bare terraform destroy doesn't drain Kubernetes-managed resources first. ALBs, EBS volumes, EC2 instances, and ENIs created by in-cluster controllers (ALB controller, EBS CSI, Karpenter) are not in Terraform state. They persist as orphans after the cluster is gone. The cleanup script handles these.
Manual alternative (not recommended)
cd terraform
terraform init
terraform destroy --auto-approve

This only destroys Terraform-managed resources. You will need to manually find and delete any orphaned load balancers, volumes, instances, security groups, IAM roles, OIDC providers, and CloudWatch log groups.
All inputs are defined in terraform/variables.tf. Override them with -var flags or a terraform.tfvars file.
| Variable | Description | Default |
|---|---|---|
| name | Name of the VPC and EKS cluster | automode-cluster |
| region | AWS region to deploy into | us-west-2 |
| eks_cluster_version | EKS Kubernetes version | 1.34 |
| vpc_cidr | VPC CIDR block (RFC 1918) | 10.0.0.0/16 |
| tags | Tags applied to every taggable resource (provider default_tags, EKS primary SG, NodeClass EC2/EBS/ENI, StorageClass EBS, ALB) | {"auto-delete" = "never"} |
| base_domain | Public Route53 hosted zone for HTTPS exposure. Leave empty for internal-only (safe-by-default). | "" |
| subdomain | Optional prefix under base_domain (e.g., automode gives automode.example.com). Ignored when base_domain is empty. | "" |
| ephemeral_storage_kms_key_id | KMS key ID for encrypting ephemeral node storage. Leave empty for default encryption. | "" |
| enable_observability | Enable CloudWatch Container Insights addon (metrics, logs, Application Signals). Incurs CloudWatch costs. | false |
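For overrides you want to keep between runs, a terraform.tfvars file is often more convenient than repeated -var flags. The helper below is a small sketch that writes such a file; the function name and the example values are hypothetical, while the variable names come from the table above.

```shell
#!/usr/bin/env bash
# Write a minimal terraform.tfvars file.
# $1: output path, $2: cluster name, $3: AWS region
write_tfvars() {
  cat > "$1" <<EOF
name                 = "$2"
region               = "$3"
enable_observability = false
EOF
}

# Hypothetical usage from the repository root:
#   write_tfvars terraform/terraform.tfvars my-automode us-east-1
```

Terraform loads terraform.tfvars from the working directory automatically, so no extra flag is needed on terraform apply.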
Example: public exposure with observability:
terraform apply \
-var='base_domain=example.com' \
-var='subdomain=automode' \
-var='enable_observability=true'

EKS Auto Mode fully automates the operational overhead of running Kubernetes on AWS. Rather than requiring you to install, configure, and upgrade individual cluster add-ons, Auto Mode runs them as managed components inside the EKS control plane. Specifically, it:
- Provisions, scales, and consolidates compute. Powered by Karpenter, it matches pending pods to optimal EC2 instances, bin-packs efficiently, and removes underutilized nodes automatically.
- Manages pod networking. Handles VPC CNI configuration, IP address allocation, and security group enforcement without any DaemonSet you need to maintain.
- Handles persistent storage. Provisions and attaches EBS volumes from PersistentVolumeClaims via the managed EBS CSI driver.
- Automates load balancing. Creates ALBs and NLBs from Ingress and Service resources, including TLS termination with ACM certificates.
- Runs CoreDNS. Cluster DNS is a managed component with no installation or tuning required.
- Manages Pod Identity Agent. Enables fine-grained IAM roles for pods without manual IRSA configuration.
- Monitors node health. Detects unhealthy nodes and automatically repairs or replaces them.
- Handles AMI selection and patching. Picks the correct AMI for each instance type, applies security patches, and remediates drift.
All of these run in the control plane. You never install, configure, or upgrade them. You interact through standard Kubernetes APIs (NodePool, NodeClass, Ingress, StorageClass, etc.) and EKS handles the rest.
The provisioning path:
Pod pending → Karpenter matches NodePool constraints (instance families, AZs, capacity type)
→ NodePool references a NodeClass (subnet selection, security groups, tags, storage)
→ Karpenter launches an EC2 instance matching the constraints
→ kubelet registers the node and the pod is scheduled
NodePools define what to launch (instance types, architectures, capacity type, taints/labels). NodeClasses define how to launch (subnets, SGs, ephemeral storage, tags pushed to EC2/EBS/ENI).
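The split between "what" and "how" is easier to see in a manifest. The sketch below renders a minimal NodePool that constrains architecture and capacity type and points at a NodeClass; it is not one of this repo's example manifests, and the NodeClass name default is an assumption (it exists only when the cluster's built-in node pools are enabled).

```shell
#!/usr/bin/env bash
# Print a minimal NodePool manifest.
# $1: NodePool name, $2: CPU architecture (arm64 or amd64)
render_nodepool() {
  cat <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: $1
spec:
  template:
    spec:
      # "How" to launch: delegated to a NodeClass (name is an assumption).
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      # "What" to launch: constraints matched against pending pods.
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["$2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
EOF
}

# Hypothetical usage:
#   render_nodepool graviton-demo arm64 | kubectl apply -f -
```

Subnets, security groups, and tags do not appear here because they belong to the referenced NodeClass.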
EKS Auto Mode automates ALB and NLB setup:
- Application Load Balancer (ALB): IngressClass-based, supports shared ALB groups across namespaces. Docs
- Network Load Balancer (NLB): Native Kubernetes Service type LoadBalancer. Docs
Subnet tagging requirement: If subnet IDs are not explicit in IngressClassParams, subnets need kubernetes.io/role/elb: "1" (public) or kubernetes.io/role/internal-elb: "1" (private). The Terraform code in this repo adds these tags automatically.
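If you bring your own VPC instead of the one created by this repo's Terraform, the tags have to be added by hand. The following is a minimal sketch using the AWS CLI; the function names and the subnet ID in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Map a subnet kind to the role tag key that load balancer discovery uses.
subnet_role_tag() {
  case "$1" in
    public)  echo "kubernetes.io/role/elb" ;;
    private) echo "kubernetes.io/role/internal-elb" ;;
    *) echo "usage: subnet_role_tag public|private" >&2; return 1 ;;
  esac
}

# Tag one subnet. $1: subnet ID, $2: public|private
tag_subnet() {
  local key
  key="$(subnet_role_tag "$2")" || return 1
  aws ec2 create-tags --resources "$1" --tags "Key=${key},Value=1"
}

# Hypothetical usage:
#   tag_subnet subnet-0123456789abcdef0 private
```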
This stack is safe by default: every example workload exposes an internal-scheme load balancer reachable only via kubectl port-forward. Nothing is published to the public internet without an explicit opt-in.
To expose the example workloads on a real domain over HTTPS, set var.base_domain (and optionally var.subdomain) to a public Route53 hosted zone you already own:
terraform apply -var='base_domain=example.com' -var='subdomain=automode'

When base_domain is set, Terraform will:
- Look up the existing public hosted zone (it does not create one; the zone must already exist and be the authoritative DNS for that name).
- Issue an ACM wildcard certificate *.<subdomain>.<base_domain> validated via DNS records added to the zone.
- Install external-dns bound to a Pod Identity IAM role scoped to only that hosted zone (not Route53FullAccess).
- Switch the cluster-wide IngressClass alb to internet-facing with a shared ALB group so all example Ingresses share one load balancer.
- Render each example with a public hostname and the appropriate annotations.
Workload hostnames once enabled:
| Example | URL |
|---|---|
| examples/graviton | https://2048-graviton.<full_domain> |
| examples/spot | https://2048-spot.<full_domain> |
| examples/gpu | https://gpu.<full_domain> |
| examples/neuron | https://neuron.<full_domain> |
The ALB controller discovers the certificate by matching each Ingress's host: field against the wildcard certificate in ACM, so no certificateArn is configured anywhere.
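The hostname scheme can be reproduced with a small helper, which is handy for scripting smoke tests against the public URLs. This is a sketch under one assumption not stated in this README: when subdomain is empty, <full_domain> is taken to be base_domain itself. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Compose the full domain. $1: base_domain, $2: optional subdomain
full_domain() {
  if [ -n "${2:-}" ]; then
    echo "$2.$1"
  else
    # Assumption: with no subdomain, workloads sit directly under base_domain.
    echo "$1"
  fi
}

# Compose a workload URL. $1: host prefix, $2: base_domain, $3: optional subdomain
example_url() {
  echo "https://$1.$(full_domain "$2" "${3:-}")"
}

# Hypothetical usage:
#   curl -sI "$(example_url 2048-graviton example.com automode)"
```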
To revert to safe-by-default, unset var.base_domain and re-apply.
EKS Auto Mode includes the EBS CSI driver as a managed component. No installation required.
- Only volumes provisioned from a StorageClass using ebs.csi.eks.amazonaws.com can mount on Auto Mode nodes.
- Existing volumes need migration via volume snapshots.
- Custom KMS encryption may require additional IAM permissions.
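A StorageClass that satisfies the first point might look like the sketch below. It is not the StorageClass this repo's Terraform creates; the function name is hypothetical and the gp3 type and encryption parameters are illustrative choices.

```shell
#!/usr/bin/env bash
# Print a StorageClass that uses the Auto Mode EBS provisioner.
# $1: StorageClass name
render_storageclass() {
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $1
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
EOF
}

# Hypothetical usage:
#   render_storageclass auto-ebs | kubectl apply -f -
```

WaitForFirstConsumer delays volume creation until a pod is scheduled, so the volume is created in the same Availability Zone as the node that will mount it.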
- EKS Auto Mode documentation: official AWS guide covering setup, NodePools, NodeClasses, and managed components
- Karpenter documentation: the provisioner that powers Auto Mode's compute layer; useful for understanding NodePool/NodeClass semantics
- karpenter-blueprints: additional Karpenter patterns beyond what this repo covers
- platform-engineering-on-eks: broader platform engineering patterns on EKS
See SECURITY_CONSIDERATIONS.md for Checkov scan results and documented exceptions.
Contributions welcome! Please read our Contributing Guidelines and Code of Conduct.
This project is licensed under the MIT License; see the LICENSE file.
This repository is intended for demonstration and learning purposes only. It is not intended for production use. The code provided here is for educational purposes and should not be used in a live environment without proper testing, validation, and modifications.
Use at your own risk. The authors are not responsible for any issues, damages, or losses that may result from using this code in production.
In these samples, there may be use of third-party models ("Third-Party Models") that AWS does not own, and that AWS does not exercise control over. By using any prototype or proof of concept from AWS you acknowledge that the Third-Party Models are "Third-Party Content" under your agreement for services with AWS. You should perform your own independent assessment of the Third-Party Models. You should also take measures to ensure that your use of the Third-Party Models complies with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the Third-Party Models. AWS does not make any representations or warranties regarding the Third-Party Models, including that use of the Third-Party Models and the associated outputs will result in a particular outcome or result. You also acknowledge that outputs generated by the Third-Party Models are Your Content/Customer Content, as defined in the AWS Customer Agreement or the agreement between you and AWS for AWS Services. You are responsible for your use of outputs from the Third-Party Models.