Setting up EKS Auto Mode using Terraform

Automate Kubernetes the easy way. Deploy once, explore GPU, Spot, Graviton, cost optimization, ODCR, disruption budgets, observability, and more. Minimal add-on management. Auto Mode handles core compute, storage, and networking add-ons for you.

Overview

Amazon EKS Auto Mode simplifies Kubernetes cluster management by automating compute, storage, and networking decisions. Under the hood it runs Karpenter, the AWS Load Balancer Controller, and the EBS CSI driver as managed components. You get the benefits without installing or upgrading any of them.

This repository is an educational companion. Each example demonstrates a specific EKS Auto Mode pattern (Graviton, GPU, Spot, ODCR targeting, disruption budgets, etc.) with a self-contained README explaining the "why" alongside the "how." Deploy the base cluster once, then apply individual examples to explore.

Key capabilities covered:

Graviton (ARM64) and x86 workloads side by side
GPU and Inferentia2 (Neuron) ML inference
Spot and On-Demand mixed pools with overprovision headroom
On-Demand Capacity Reservation targeting
Static capacity pools and disruption budgets
HPA and KEDA-driven autoscaling
KMS encryption for ephemeral node storage
CloudWatch Container Insights observability
5-layer resource tagging for cost allocation

Prerequisites

Required Tools: the commands in this README rely on git, Terraform, and kubectl.

Note: This project currently provides Linux-specific commands in the examples. Windows compatibility will be added in future updates.

Claude Code Skills (Plugin)

This repo ships as a Claude Code plugin with two AI-assisted skills for EKS Auto Mode:

Skill	Audience	What it covers
`eks-automode-onboard`	Newcomers	Concepts, deployment, example selection, troubleshooting
`eks-automode-maintain`	Repo maintainers	Rendering chain, 5-layer tagging, docs sync, PR checklist

Install

/plugin marketplace add https://github.com/aws-samples/sample-aws-eks-auto-mode.git

/plugin install eks-automode@sample-aws-eks-auto-mode

Alternative: manual install

git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cp -r sample-aws-eks-auto-mode/skills/eks-automode-onboard ~/.claude/skills/
cp -r sample-aws-eks-auto-mode/skills/eks-automode-maintain ~/.claude/skills/

Quick Start

Clone Repository:

git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cd sample-aws-eks-auto-mode

Deploy Cluster:

cd terraform
terraform init
terraform apply -auto-approve

# Configure kubectl
$(terraform output -raw configure_kubectl)

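Apply an example (e.g., Graviton). The path below is relative to the repository root, so run cd .. first if you are still in terraform/: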

kubectl apply -f examples/graviton/

Examples

Each example has its own README with detailed explanations of the underlying mechanics.

Compute Patterns

Example	Description
Graviton	ARM64 workloads on cost-effective Graviton instances. Deploys a 2048 game.
Spot	Fault-tolerant workloads on EC2 Spot with diverse instance families. Deploys a 2048 game.
GPU	GPU-accelerated ML inference (Qwen 3 on NVIDIA GPUs).
Neuron	ML inference on AWS Inferentia2 (DeepSeek-R1-Qwen3-8B served by vLLM).

Cost Optimization

Example	Description
Cost Optimization	OD/Spot mixed pools with weighted priorities and pause-pod overprovision headroom.

Advanced Scheduling

Example	Description
Capacity Reservation	Pin workloads to On-Demand Capacity Reservations (ODCRs) so reserved capacity is consumed.
Static Capacity	Maintain a fixed fleet of always-on nodes using `spec.replicas`, immune to consolidation.
Batch Jobs	Protect long-running jobs from eviction using `do-not-disrupt` annotations and dedicated NodePools.
Disruption Budgets	Limit simultaneous node drains during consolidation to prevent cascading failures.

Autoscaling

Example	Description
Pod Autoscaling	HPA for CPU-based scaling plus KEDA for event-driven scaling (SQS queue depth).

Observability

Example	Description
Observability	CloudWatch Container Insights integration for metrics, pod logs, and Application Signals tracing.

Cleanup

A standalone cleanup script handles the full teardown lifecycle. It drains Kubernetes-controller-managed AWS resources (ALBs, EBS volumes, EC2 instances) before terraform destroy, then sweeps for any orphans that survived.

# Recommended: interactive cleanup (prompts per resource)
./scripts/cleanup.sh

# Non-interactive: delete everything
./scripts/cleanup.sh --yes

# Preview what would be deleted
./scripts/cleanup.sh --dry-run

# Delete everything except storage (PVCs/EBS)
./scripts/cleanup.sh --yes --keep-storage

# Orphan sweep only (terraform already destroyed)
./scripts/cleanup.sh --skip-terraform --cluster-name <name> --region <region>

The script runs in three phases:

Pre-drain deletes Ingresses, LoadBalancer Services, PVCs, Helm releases, NodePools/NodeClaims while the cluster API is alive so controllers can fire finalizers and release AWS resources.
Terraform destroy runs terraform init + destroy for both the main and KEDA terraform roots.
Orphan sweep scans for resources tagged with the cluster name (or matching known patterns for untaggable resources like Auto Mode internal volumes) and prompts for deletion.

Why not just terraform destroy? A bare terraform destroy doesn't drain Kubernetes-managed resources first. ALBs, EBS volumes, EC2 instances, and ENIs created by in-cluster controllers (ALB controller, EBS CSI, Karpenter) are not in Terraform state. They persist as orphans after the cluster is gone. The cleanup script handles these.

Manual alternative (not recommended)

cd terraform
terraform init
terraform destroy -auto-approve

This only destroys Terraform-managed resources. You will need to manually find and delete any orphaned load balancers, volumes, instances, security groups, IAM roles, OIDC providers, and CloudWatch log groups.

Configuration

All inputs are defined in terraform/variables.tf. Override them with -var flags or a terraform.tfvars file.

Variable	Description	Default
`name`	Name of the VPC and EKS cluster	`automode-cluster`
`region`	AWS region to deploy into	`us-west-2`
`eks_cluster_version`	EKS Kubernetes version	`1.34`
`vpc_cidr`	VPC CIDR block (RFC 1918)	`10.0.0.0/16`
`tags`	Tags applied to every taggable resource (provider default_tags, EKS primary SG, NodeClass EC2/EBS/ENI, StorageClass EBS, ALB)	`{"auto-delete" = "never"}`
`base_domain`	Public Route53 hosted zone for HTTPS exposure. Leave empty for internal-only (safe-by-default).	`""`
`subdomain`	Optional prefix under `base_domain` (e.g., `automode` gives `automode.example.com`). Ignored when `base_domain` is empty.	`""`
`ephemeral_storage_kms_key_id`	KMS key ID for encrypting ephemeral node storage. Leave empty for default encryption.	`""`
`enable_observability`	Enable CloudWatch Container Insights addon (metrics, logs, Application Signals). Incurs CloudWatch costs.	`false`

Example: public exposure with observability:

terraform apply \
  -var='base_domain=example.com' \
  -var='subdomain=automode' \
  -var='enable_observability=true'
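
The same overrides can also be kept in a terraform.tfvars file, as mentioned above. The following is a minimal sketch, assuming it is run from the terraform/ directory. The variable names come from the table above, and example.com is a placeholder domain.

```shell
# Write the overrides to terraform.tfvars, which Terraform loads automatically.
# example.com is a placeholder; use a public Route53 hosted zone you already own.
cat > terraform.tfvars <<'EOF'
base_domain          = "example.com"
subdomain            = "automode"
enable_observability = true
EOF

# A plain apply then picks the values up without any -var flags:
#   terraform apply
```

Keeping the values in a file makes repeated applies and the eventual destroy use the same inputs.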

Components

How EKS Auto Mode Works

EKS Auto Mode removes most of the operational overhead of running Kubernetes on AWS. Rather than requiring you to install, configure, and upgrade individual cluster add-ons, Auto Mode runs them as managed components. Specifically, it:

Provisions, scales, and consolidates compute. Powered by Karpenter, it matches pending pods to optimal EC2 instances, bin-packs efficiently, and removes underutilized nodes automatically.
Manages pod networking. Handles VPC CNI configuration, IP address allocation, and security group enforcement without any DaemonSet you need to maintain.
Handles persistent storage. Provisions and attaches EBS volumes from PersistentVolumeClaims via the managed EBS CSI driver.
Automates load balancing. Creates ALBs and NLBs from Ingress and Service resources, including TLS termination with ACM certificates.
Runs CoreDNS. Cluster DNS is a managed component with no installation or tuning required.
Manages Pod Identity Agent. Enables fine-grained IAM roles for pods without manual IRSA configuration.
Monitors node health. Detects unhealthy nodes and automatically repairs or replaces them.
Handles AMI selection and patching. Picks the correct AMI for each instance type, applies security patches, and remediates drift.

All of these are managed by EKS, whether they run on the AWS side or as system processes on the nodes. You never install, configure, or upgrade them. You interact through standard Kubernetes APIs (NodePool, NodeClass, Ingress, StorageClass, etc.) and EKS handles the rest.

NodePool → NodeClass → EC2 Flow

The provisioning path:

Pod pending → Karpenter matches NodePool constraints (instance families, AZs, capacity type)
           → NodePool references a NodeClass (subnet selection, security groups, tags, storage)
           → Karpenter launches an EC2 instance matching the constraints
           → kubelet registers the node and the pod is scheduled

NodePools define what to launch (instance types, architectures, capacity type, taints/labels). NodeClasses define how to launch (subnets, SGs, ephemeral storage, tags pushed to EC2/EBS/ENI).
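
To make the split between "what" and "how" concrete, here is a rough sketch of a NodePool manifest written to a local file. It is not taken from this repo's examples. The name graviton-sketch, the CPU limit, and the instance categories are hypothetical, and it assumes a NodeClass named default exists in the cluster.

```shell
# Write a hypothetical NodePool manifest; names and values are illustrative only.
cat > nodepool-sketch.yaml <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-sketch            # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:                # "how to launch": subnets, SGs, storage, tags
        group: eks.amazonaws.com
        kind: NodeClass
        name: default              # assumes a NodeClass called "default" exists
      requirements:                # "what to launch"
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]
  limits:
    cpu: "64"                      # cap on total vCPUs this pool may provision
EOF

# Apply it once kubectl points at the cluster:
#   kubectl apply -f nodepool-sketch.yaml
```

A pending pod with a matching nodeSelector (for example kubernetes.io/arch: arm64) would then be a candidate for this pool.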

Load Balancer Configuration

EKS Auto Mode automates ALB and NLB setup:

Application Load Balancer (ALB): IngressClass-based, supports shared ALB groups across namespaces (see the AWS docs).
Network Load Balancer (NLB): Native Kubernetes Service type LoadBalancer (see the AWS docs).

Subnet tagging requirement: If subnet IDs are not explicit in IngressClassParams, subnets need kubernetes.io/role/elb: "1" (public) or kubernetes.io/role/internal-elb: "1" (private). The Terraform code in this repo adds these tags automatically.
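
For orientation, an IngressClass for Auto Mode's managed ALB support might look roughly like the sketch below. This is not the manifest this repo's Terraform creates. The name alb-sketch is hypothetical and was chosen to avoid clashing with the repo's alb class. No subnet IDs are set, so the subnet tags described above would be used for discovery.

```shell
# Write a hypothetical IngressClassParams + IngressClass pair to a local file.
cat > ingressclass-sketch.yaml <<'EOF'
apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  name: alb-sketch                 # hypothetical name
spec:
  scheme: internal                 # internal-only, matching the safe default
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-sketch
spec:
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: alb-sketch
EOF

# Apply it once kubectl points at the cluster:
#   kubectl apply -f ingressclass-sketch.yaml
```

An Ingress opts in by setting spec.ingressClassName to the class name.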

Public exposure (opt-in)

This stack is safe by default: every example workload exposes an internal-scheme load balancer reachable only via kubectl port-forward. Nothing is published to the public internet without an explicit opt-in.

To expose the example workloads on a real domain over HTTPS, set var.base_domain (and optionally var.subdomain) to a public Route53 hosted zone you already own:

terraform apply -var='base_domain=example.com' -var='subdomain=automode'

When base_domain is set, Terraform will:

Look up the existing public hosted zone (it does not create one; the zone must already exist and be the authoritative DNS for that name).
Issue an ACM wildcard certificate *.<subdomain>.<base_domain> validated via DNS records added to the zone.
Install external-dns bound to a Pod Identity IAM role scoped to only that hosted zone (not Route53FullAccess).
Switch the cluster-wide IngressClass alb to internet-facing with a shared ALB group so all example Ingresses share one load balancer.
Render each example with a public hostname and the appropriate annotations.

Workload hostnames once enabled:

Example	URL
`examples/graviton`	`https://2048-graviton.<full_domain>`
`examples/spot`	`https://2048-spot.<full_domain>`
`examples/gpu`	`https://gpu.<full_domain>`
`examples/neuron`	`https://neuron.<full_domain>`
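
The hostnames in the table can be previewed before applying. The helper functions below are hypothetical and not part of the repo. They assume that <full_domain> is <subdomain>.<base_domain>, and that an empty subdomain means workloads sit directly under base_domain.

```shell
# Compose the domain the examples are published under.
# Assumption: an empty subdomain means workloads sit directly under base_domain.
full_domain() {
  base_domain="$1"
  subdomain="${2:-}"
  if [ -n "$subdomain" ]; then
    printf '%s.%s\n' "$subdomain" "$base_domain"
  else
    printf '%s\n' "$base_domain"
  fi
}

# Print the public URL for each example prefix listed in the table.
example_urls() {
  domain="$(full_domain "$1" "${2:-}")"
  for prefix in 2048-graviton 2048-spot gpu neuron; do
    printf 'https://%s.%s\n' "$prefix" "$domain"
  done
}

# Placeholder values; substitute your own zone and prefix.
example_urls example.com automode
```

With the placeholder values, the first line printed is https://2048-graviton.automode.example.com.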

The ALB controller picks the right certificate by matching each Ingress's host: field against the wildcard certificate, which the ALB then serves via SNI. No certificateArn is configured anywhere.

To revert to safe-by-default, unset var.base_domain and re-apply.

EBS CSI Driver

EKS Auto Mode includes the EBS CSI driver as a managed component. No installation required.

Only volumes provisioned from a StorageClass using ebs.csi.eks.amazonaws.com can be mounted on Auto Mode nodes.
Existing volumes need migration via volume snapshots.
Custom KMS encryption may require additional IAM permissions.
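
A StorageClass that uses the Auto Mode provisioner could be sketched as follows. The name auto-ebs-sketch and the gp3 and encryption parameters are illustrative assumptions, not the StorageClass this repo deploys.

```shell
# Write a hypothetical StorageClass manifest that uses the Auto Mode provisioner.
cat > storageclass-sketch.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs-sketch            # hypothetical name
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's AZ
parameters:
  type: gp3
  encrypted: "true"
EOF

# Apply it once kubectl points at the cluster:
#   kubectl apply -f storageclass-sketch.yaml
```

PersistentVolumeClaims that set storageClassName to this class would then be provisioned by the managed driver.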

AWS Documentation

Learn More

EKS Auto Mode documentation: official AWS guide covering setup, NodePools, NodeClasses, and managed components
Karpenter documentation: the provisioner that powers Auto Mode's compute layer; useful for understanding NodePool/NodeClass semantics
karpenter-blueprints: additional Karpenter patterns beyond what this repo covers
platform-engineering-on-eks: broader platform engineering patterns on EKS

Security Considerations

See SECURITY_CONSIDERATIONS.md for Checkov scan results and documented exceptions.

Contributing

Contributions welcome! Please read our Contributing Guidelines and Code of Conduct.

License and Disclaimer

License

This project is licensed under the MIT License. See the LICENSE file for details.

Disclaimer

This repository is intended for demonstration and learning purposes only. It is not intended for production use. The code provided here is for educational purposes and should not be used in a live environment without proper testing, validation, and modifications.

Use at your own risk. The authors are not responsible for any issues, damages, or losses that may result from using this code in production.

These samples may use third-party models ("Third-Party Models") that AWS does not own, and that AWS does not exercise control over. By using any prototype or proof of concept from AWS you acknowledge that the Third-Party Models are "Third-Party Content" under your agreement for services with AWS. You should perform your own independent assessment of the Third-Party Models. You should also take measures to ensure that your use of the Third-Party Models complies with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the Third-Party Models. AWS does not make any representations or warranties regarding the Third-Party Models, including that use of the Third-Party Models and the associated outputs will result in a particular outcome or result. You also acknowledge that outputs generated by the Third-Party Models are Your Content/Customer Content, as defined in the AWS Customer Agreement or the agreement between you and AWS for AWS Services. You are responsible for your use of outputs from the Third-Party Models.
