Scale-to-zero NAT instances for AWS. Stop paying for NAT when nothing is running.
nat-zero is a Terraform module that replaces always-on NAT with on-demand NAT instances. When a workload launches in a private subnet, a NAT instance starts automatically. When the last workload stops, the NAT shuts down and its Elastic IP is released. Idle cost: ~$0.80/month per AZ.
Built around a NAT Zero AMI baked in-repo and promoted through a dedicated workflow. Orchestrated by a single Go Lambda (~55 ms cold start, 29 MB memory). Integration-tested against real AWS infrastructure on every PR.

       AZ-A (active)              AZ-B (idle)
    ┌──────────────────┐      ┌──────────────────┐
    │ Workloads        │      │ No workloads     │
    │ ↓ route table    │      │ No NAT instance  │
    │ Private ENI      │      │ No EIP           │
    │ ↓                │      │                  │
    │ NAT Instance     │      │ Cost: ~$0.80/mo  │
    │ ↓                │      │ (EBS only)       │
    │ Public ENI + EIP │      │                  │
    │ ↓                │      └──────────────────┘
    │ Internet Gateway │
    └──────────────────┘
             ▲
    EventBridge → Lambda (reconciler, concurrency=1)

| State | nat-zero | fck-nat | NAT Gateway |
|---|---|---|---|
| Idle (no workloads) | ~$0.80/mo | ~$7-8/mo | ~$36+/mo |
| Active (workloads running) | ~$7-8/mo | ~$7-8/mo | ~$36+/mo |
AWS NAT Gateway costs ~$36/month per AZ even when idle. fck-nat brings that down to ~$7-8/month, but the instance and EIP stay allocated 24/7. nat-zero stops the instance and releases the Elastic IP when idle, which avoids both the instance charge and the ~$3.60/month public IPv4 charge.
Best for dev/staging environments, CI/CD runners, batch jobs, and side projects where workloads run intermittently. If you need a simpler always-on NAT instance, fck-nat is still a sensible option.
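
The monthly figures in the comparison above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the unit prices are assumed us-east-1 list prices (gp3 storage, public IPv4, t4g.nano on-demand, NAT Gateway hourly) and a 720-hour month, none of which come from this module. Check current AWS pricing for your region before relying on the output.

```go
package main

import "fmt"

// Assumed us-east-1 list prices in USD. These are illustrative inputs,
// not values read from the module or from AWS at runtime.
const (
	hoursPerMonth  = 720.0  // 30-day month, consistent with the ~$3.60 IPv4 figure
	gp3PerGBMonth  = 0.08   // gp3 storage, per GB-month
	publicIPv4Hour = 0.005  // public IPv4 address, per hour
	t4gNanoHour    = 0.0042 // t4g.nano on-demand, per hour
	natGatewayHour = 0.045  // NAT Gateway, per hour (data processing excluded)
	rootVolumeGB   = 10.0   // module default for block_device_size
)

// idleCost is what remains when the instance is stopped and the EIP released:
// only the root EBS volume.
func idleCost() float64 { return rootVolumeGB * gp3PerGBMonth }

// activeCost adds a full month of instance and public IPv4 hours.
func activeCost() float64 {
	return idleCost() + hoursPerMonth*(t4gNanoHour+publicIPv4Hour)
}

// natGatewayCost is the hourly NAT Gateway charge plus its public IPv4 address.
func natGatewayCost() float64 {
	return hoursPerMonth * (natGatewayHour + publicIPv4Hour)
}

func main() {
	fmt.Printf("idle:        $%.2f\n", idleCost())       // idle:        $0.80
	fmt.Printf("active:      $%.2f\n", activeCost())     // active:      $7.42
	fmt.Printf("NAT Gateway: $%.2f\n", natGatewayCost()) // NAT Gateway: $36.00
}
```

An environment that is active only part of the month lands between the idle and active figures, since the instance and IPv4 charges accrue per hour.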
An EventBridge rule captures EC2 instance state changes. A Lambda function (concurrency=1, single writer) runs a reconciliation loop on each event:
- Observe — query workloads, NAT instances, and EIPs in the AZ
- Decide — compare actual state to desired state
- Act — take at most one mutating action, then return
The event is just a trigger — the reconciler always computes the correct action from current state. With reserved_concurrent_executions=1, events are processed sequentially, eliminating race conditions.
| Workloads? | NAT State | Action |
|---|---|---|
| Yes | None / terminated | Create NAT |
| Yes | Stopped | Start NAT |
| Yes | Stopping | Wait |
| Yes | Running, no EIP | Attach EIP |
| No | Running / pending | Stop NAT |
| No | Stopped, has EIP | Release EIP |
| — | Multiple NATs | Terminate duplicates |
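
The decision table above maps naturally onto a pure function from observed state to a single action. The following is a minimal sketch of that idea, not the module's actual code: the names `NATState`, `Observation`, `Action`, and `decide` are hypothetical, and combinations the table does not list (for example, workloads present while the NAT is pending) are assumed to need no action.

```go
package main

import "fmt"

// NATState is a hypothetical summary of the NAT instance found in one AZ.
type NATState int

const (
	NATNone NATState = iota // no instance, or only terminated ones
	NATPending
	NATRunning
	NATStopping
	NATStopped
)

// Observation is a hypothetical snapshot gathered in the Observe step.
type Observation struct {
	Workloads int      // workload instances currently in the AZ
	NATCount  int      // live NAT instances found in the AZ
	NAT       NATState // state of the NAT instance, if any
	HasEIP    bool     // whether an Elastic IP is currently attached
}

// Action is the single step the reconciler takes before returning.
type Action string

const (
	ActNone                Action = "none"
	ActCreate              Action = "create NAT"
	ActStart               Action = "start NAT"
	ActWait                Action = "wait"
	ActAttachEIP           Action = "attach EIP"
	ActStop                Action = "stop NAT"
	ActReleaseEIP          Action = "release EIP"
	ActTerminateDuplicates Action = "terminate duplicates"
)

// decide mirrors the table row by row. Rows the table does not list fall
// through to ActNone (an assumption of this sketch).
func decide(o Observation) Action {
	if o.NATCount > 1 {
		return ActTerminateDuplicates
	}
	if o.Workloads > 0 {
		switch o.NAT {
		case NATNone:
			return ActCreate
		case NATStopped:
			return ActStart
		case NATStopping:
			return ActWait
		case NATRunning:
			if !o.HasEIP {
				return ActAttachEIP
			}
		}
		return ActNone
	}
	switch o.NAT {
	case NATRunning, NATPending:
		return ActStop
	case NATStopped:
		if o.HasEIP {
			return ActReleaseEIP
		}
	}
	return ActNone
}

func main() {
	fmt.Println(decide(Observation{Workloads: 2, NAT: NATNone}))                           // create NAT
	fmt.Println(decide(Observation{Workloads: 1, NATCount: 1, NAT: NATRunning}))           // attach EIP
	fmt.Println(decide(Observation{NATCount: 1, NAT: NATStopped, HasEIP: true}))           // release EIP
	fmt.Println(decide(Observation{Workloads: 1, NATCount: 2, NAT: NATRunning}))           // terminate duplicates
}
```

Because `decide` depends only on the snapshot, a repeated or out-of-order event yields the same answer as long as the observed state is the same, which is the property the reconciliation model relies on.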
Each NAT uses two persistent ENIs (public + private) created by Terraform. They survive stop/start cycles, keeping route tables intact.
See Architecture for the full reconciliation model and event flow diagrams.

    module "nat_zero" {
      source = "github.com/MachineDotDev/nat-zero"

      name                        = "my-nat"
      vpc_id                      = module.vpc.vpc_id
      availability_zones          = ["us-east-1a", "us-east-1b"]
      public_subnets              = module.vpc.public_subnets
      private_subnets             = module.vpc.private_subnets
      private_route_table_ids     = module.vpc.private_route_table_ids
      private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks

      tags = { Environment = "dev" }
    }

See Examples for spot instances, custom AMIs, and building from source.
The module intentionally supports exactly three ways to supply the Lambda binary:
- Default release artifact
  - Normal path for end users
  - The module downloads the versioned lambda.zip and reads the matching lambda.zip.base64sha256 from the tagged GitHub release
  - The checksum file exists so Terraform can know the Lambda code hash during plan, before it downloads the zip during apply
  - When a new release publishes a different checksum, Terraform sees the source_code_hash change during plan and knows the Lambda must be updated
- Pre-built local zip via lambda_binary_path
  - Best for CI, unreleased branch testing, or custom binaries
  - Terraform hashes the local file during plan
- Apply-time build via build_lambda_locally = true
  - Local development only
  - Requires Go and zip
  - May require a second apply after Lambda code changes
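
When building a zip for lambda_binary_path, it can help to see the hash Terraform will compare. The sketch below computes the base64-encoded SHA-256 digest that Terraform's `filebase64sha256` function produces and that the AWS provider expects for `source_code_hash`. It assumes the release's lambda.zip.base64sha256 file holds the same encoding, which the file name suggests but this README does not state; the helper name `base64SHA256` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"os"
)

// base64SHA256 returns the raw SHA-256 digest of data, base64-encoded with the
// standard padded alphabet (not the hex form printed by sha256sum).
func base64SHA256(data []byte) string {
	sum := sha256.Sum256(data)
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// With a path argument, hash that file; otherwise hash empty input as a demo.
	var data []byte
	if len(os.Args) > 1 {
		b, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "read failed:", err)
			os.Exit(1)
		}
		data = b
	}
	fmt.Println(base64SHA256(data))
}
```

Running it against two builds of the same source shows whether the zip is byte-for-byte reproducible; if the hash differs between builds, Terraform will plan a Lambda update each time.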
| Audience | Recommended module ref | Recommended Lambda path | Why |
|---|---|---|---|
| Normal end users | Release tag such as ?ref=v0.4.0 | Default release artifact | Stable module code, stable versioned Lambda artifact, and clean plan/apply behavior |
| CI, branch testing, unreleased validation | Branch or commit ref | lambda_binary_path | Lets Terraform see Lambda code changes during plan before the branch has been released |
| Local module development | Working tree | build_lambda_locally = true | Fastest iteration loop while changing Go code inside this repo |
ref=main is suitable for development, but it is not the stable consumption path for end users. If main has unreleased Go changes, the default Lambda artifact still comes from the latest tagged release until a new release is cut.
| Scenario | Time to connectivity |
|---|---|
| First workload (cold create) | ~10.7 s |
| Restart from stopped | ~8.5 s |
| NAT already running | Instant |
The Lambda is a compiled Go ARM64 binary. Cold start: 55 ms. Typical invocation: 400-600 ms. Peak memory: 29 MB. The startup delay is dominated by EC2 instance boot, not the Lambda.
See Performance for detailed timings and cost breakdowns.
- EventBridge scope: Captures all EC2 state changes in the account; Lambda filters by VPC ID.
- Startup delay: First workload in an idle AZ waits ~10 seconds for internet. Design scripts to retry outbound connections.
- Dual ENI: Persistent public + private ENIs survive stop/start cycles.
- AMI compatibility: The module defaults to the NAT Zero AMI track. Custom AMIs are supported only if they follow the same deterministic dual-ENI model. fck-nat AMIs are intentionally unsupported because their bootstrap interrogates IMDS/AWS to discover attached ENIs before nat-zero's EIP lifecycle has completed.
- Retries: Failed Lambda invocations are retried up to 2 times. EventBridge invokes the function asynchronously, and Lambda retries failed asynchronous invocations.
- Clean destroy: A cleanup action terminates NAT instances before terraform destroy removes ENIs.
- Config versioning: Changing AMI or instance type auto-replaces NAT instances on next workload event.
- EC2 events only: Currently nat-zero responds only to EC2 instance state changes. If you have a use case for other event sources (ECS tasks, Lambda, etc.), PRs are welcome.
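
The startup-delay note above recommends retrying outbound connections. A minimal sketch of one way to do that is shown below, using capped exponential backoff. The `retry` helper and the `simulateBoot` stand-in are hypothetical names for illustration; in a real workload the operation would be an actual outbound request with its own timeout, and the delays would be sized to cover the ~10 second cold start.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry calls op until it succeeds, the context is done, or attempts run out.
// The delay doubles after each failure and is capped at maxDelay.
func retry(ctx context.Context, attempts int, delay, maxDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return errors.Join(ctx.Err(), err)
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// simulateBoot stands in for a real connectivity probe: it fails a fixed
// number of times (as if the NAT were still booting) and then succeeds.
func simulateBoot(failures int) (calls int, err error) {
	op := func() error {
		calls++
		if calls <= failures {
			return errors.New("network unreachable")
		}
		return nil
	}
	err = retry(context.Background(), 5, 10*time.Millisecond, 40*time.Millisecond, op)
	return calls, err
}

func main() {
	calls, err := simulateBoot(2)
	fmt.Println(calls, err) // 3 <nil>
}
```

For a real workload, something like 8 attempts starting at 1 second and capped at 5 seconds gives roughly half a minute of headroom, comfortably more than the cold-create time reported in the performance table. Those numbers are a suggestion, not a tested recommendation from this module.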
| Name | Version |
|---|---|
| terraform | >= 1.4 |
| aws | >= 5.0 |
| http | >= 3.0 |
| null | >= 3.0 |
| time | >= 0.9 |
No modules.
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| ami_id | Explicit AMI ID to use (overrides AMI lookup entirely) | string | null | no |
| ami_name_pattern | AMI name pattern used when resolving the default nat-zero AMI. Override this to use your own shared AMI. | string | "nat-zero-al2023-minimal-arm64-20260306-064438" | no |
| ami_owner_account | Owner account ID used when resolving the default nat-zero AMI by name pattern. Override this to use your own shared AMI. | string | "590144423513" | no |
| availability_zones | List of availability zones to deploy NAT instances in | list(string) | n/a | yes |
| block_device_iops | Provisioned IOPS for the gp3 root EBS volume. 3000 is the gp3 baseline included at no extra cost. | number | 3000 | no |
| block_device_size | Size in GB of the root EBS volume | number | 10 | no |
| block_device_throughput | Provisioned throughput in MB/s for the gp3 root EBS volume. 125 is the gp3 baseline included at no extra cost. | number | 125 | no |
| build_lambda_locally | Build the Lambda binary from Go source during apply instead of downloading a pre-compiled release. This is primarily for local development and may require a second apply after code changes. | bool | false | no |
| enable_logging | Create a CloudWatch log group for the Lambda function | bool | true | no |
| encrypt_root_volume | Encrypt the root EBS volume. | bool | true | no |
| ignore_tag_key | Tag key used to mark instances the Lambda should ignore | string | "nat-zero:ignore" | no |
| ignore_tag_value | Tag value used to mark instances the Lambda should ignore | string | "true" | no |
| instance_type | Instance type for the NAT instance | string | "t4g.nano" | no |
| lambda_binary_path | Optional path to a pre-built Lambda zip on disk. Use this to build the artifact outside Terraform and avoid apply-time compilation. | string | null | no |
| lambda_memory_size | Memory allocated to the Lambda function in MB (also scales CPU proportionally) | number | 128 | no |
| log_retention_days | CloudWatch log retention in days (only used when enable_logging is true) | number | 14 | no |
| market_type | Whether to use spot or on-demand instances | string | "on-demand" | no |
| name | Name prefix for all resources created by this module | string | n/a | yes |
| nat_tag_key | Tag key used to identify NAT instances | string | "nat-zero:managed" | no |
| nat_tag_value | Tag value used to identify NAT instances | string | "true" | no |
| private_route_table_ids | Route table IDs for the private subnets (one per AZ) | list(string) | n/a | yes |
| private_subnets | Private subnet IDs (one per AZ) for NAT instance private ENIs | list(string) | n/a | yes |
| private_subnets_cidr_blocks | CIDR blocks for the private subnets (one per AZ, used in security group rules) | list(string) | n/a | yes |
| public_subnets | Public subnet IDs (one per AZ) for NAT instance public ENIs | list(string) | n/a | yes |
| tags | Additional tags to apply to all resources | map(string) | {} | no |
| vpc_id | The VPC ID where NAT instances will be deployed | string | n/a | yes |
| Name | Description |
|---|---|
| eventbridge_rule_arn | ARN of the EventBridge rule capturing EC2 state changes |
| lambda_function_arn | ARN of the nat-zero Lambda function |
| lambda_function_name | Name of the nat-zero Lambda function |
| launch_template_ids | Launch template IDs for NAT instances (one per AZ) |
| nat_private_eni_ids | Private ENI IDs for NAT instances (one per AZ) |
| nat_public_eni_ids | Public ENI IDs for NAT instances (one per AZ) |
| nat_security_group_ids | Security group IDs for NAT instances (one per AZ) |
Contributions welcome. Please open an issue or submit a pull request.
MIT