Remove legacy Gossip support

## Background

Gossip was added to kOps in **v1.7** (July 2017) as a way to run a cluster without depending on Route53 or any external DNS provider. The cluster name's `.k8s.local` suffix was the trigger; protokube on every node would discover peers via cloud-provider tag lookups and use a gossip overlay to converge on the live set of control-plane and api-server IPs. That set was written to `/etc/hosts` so workers could resolve `api.internal.<cluster>` without ever touching DNS.
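To make the trigger and the `/etc/hosts` step concrete, here is a minimal sketch in Go. It is not the protokube implementation: `usesGossipName` and `hostsLines` are hypothetical helpers, and the IPs are made up. It assumes only what the paragraph above states, namely that the `.k8s.local` suffix selects gossip and that the converged IP set ends up as hosts entries for `api.internal.<cluster>`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gossipSuffix is the cluster-name suffix described above as the gossip trigger.
const gossipSuffix = ".k8s.local"

// usesGossipName reports whether a cluster name carries the gossip suffix.
// Hypothetical helper for illustration; not the kOps function.
func usesGossipName(clusterName string) bool {
	return strings.HasSuffix(clusterName, gossipSuffix)
}

// hostsLines renders one hosts-file line per IP for api.internal.<cluster>.
// The IPs are copied and sorted so the output is stable between runs.
func hostsLines(clusterName string, ips []string) []string {
	sorted := append([]string(nil), ips...)
	sort.Strings(sorted)
	host := "api.internal." + clusterName
	lines := make([]string, 0, len(sorted))
	for _, ip := range sorted {
		lines = append(lines, ip+"\t"+host)
	}
	return lines
}

func main() {
	name := "dev.k8s.local" // example cluster name
	if usesGossipName(name) {
		// Example IPs standing in for the set the gossip overlay converged on.
		for _, line := range hostsLines(name, []string{"10.0.1.12", "10.0.0.7"}) {
			fmt.Println(line)
		}
	}
}
```

In a None-DNS cluster none of this runs on workers, which is the gap the rest of this issue is about.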

It was a pragmatic and welcome feature. For years it was the easiest way to spin up a kOps cluster: no zone, no NS records, no special access for the DNS provider. A lot of CI fleets, dev clusters, and air-gapped-ish setups have lived on it.

A condensed history of how we got here:

- **v1.7 (July 2017):** gossip introduced as the path for "no real DNS available." Initial implementation backed by `weaveworks/mesh`.
- **v1.16 (February 2020):** a second gossip implementation (`memberlistmesh`) shipped as an alternative to `weaveworks/mesh`, partly to give users a choice and partly for resilience against either implementation going unmaintained.
- **v1.26 (March 2023):** `--dns=none` introduced as a topology choice (Hetzner first).
- **v1.28 (September 2023):** release notes started telling users that `--dns=none` is the path forward and pointed to the upcoming gossip deactivation.
- **v1.29 (May 2024):** `kops create cluster` started defaulting to `dns=none` for non-AWS/non-GCE clouds, and started emitting an explicit deprecation warning in CLI output: `"Gossip is deprecated, using None DNS instead"`. This was the first time the message reached operators at the command line rather than only in release notes.
- **v1.29.1 (July 2024):** the `dns=none` default extended to AWS and GCE too. From this point on, every new cluster on every cloud is None-DNS by default.
- **v1.30 (August 2024):** the `IsGossip()` predicate was renamed to `UsesLegacyGossip()` to signal long-term direction without breaking anyone.
- **v1.35 (March 2026):** `kops create cluster` no longer creates a gossip cluster at all; even the `.k8s.local` naming convention now produces a None-DNS cluster. Existing gossip clusters keep updating; new ones cannot be created via the standard flow.

This issue tracks finishing the removal.

## Why now

Two reasons, both mostly outside the project's control.

### Unmaintained upstream dependencies

Both gossip implementations are pulling in code that nobody's shipping fixes for:

- `github.com/weaveworks/mesh` - last commit 2019. Weaveworks itself wound down in 2024; the repository is archived in spirit if not in label. Anything CVE-relevant in its transitive graph (e.g. older `golang.org/x/*` versions, older crypto helpers) is on us to vendor and patch.
- `github.com/jacksontj/memberlistmesh` - personal fork from 2019 of HashiCorp's `memberlist`. Same unmaintained-vendor problem; the upstream `memberlist` has moved on, the fork has not.

Each of these brings a chunk of code that runs on every kOps node, in a privileged daemon (protokube), that we don't really get to update.
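For anyone who wants to check whether a given Go module still pulls in either dependency, a rough sketch follows. It assumes plain line matching on `go.mod` text is good enough for a first pass; `findUnmaintained` is a hypothetical helper, and the sample `go.mod` content and its versions are invented for the example. A real audit would more likely rely on `go list -m all`, which also covers indirect requirements.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unmaintained lists the two module paths named in this issue.
var unmaintained = []string{
	"github.com/weaveworks/mesh",
	"github.com/jacksontj/memberlistmesh",
}

// findUnmaintained returns the listed modules that appear as requirements
// in the given go.mod text, in the order they are encountered.
// It matches whole module paths line by line; it does not resolve
// replace directives or the transitive graph.
func findUnmaintained(goMod string) []string {
	var found []string
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Single-line form: "require <path> <version>".
		if len(fields) > 0 && fields[0] == "require" {
			fields = fields[1:]
		}
		// A requirement needs at least a path and a version.
		if len(fields) < 2 {
			continue
		}
		for _, m := range unmaintained {
			if fields[0] == m {
				found = append(found, m)
			}
		}
	}
	return found
}

func main() {
	// Invented go.mod content; versions are placeholders.
	sample := `module example.test/demo

require (
	github.com/weaveworks/mesh v0.0.0
	github.com/hashicorp/memberlist v0.0.0
)
`
	for _, m := range findUnmaintained(sample) {
		fmt.Println("still required:", m)
	}
}
```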

### Security pressure beyond just maintenance

protokube in gossip mode needs broad cloud-provider permissions, on **every node, including workers**, just to discover its peers. Workers in a None-DNS cluster don't need any of that. The kOps default permissions reflect this gap: gossip workers carry permissions that expose considerable cluster topology to anyone who can read the node's instance role.

We have been chipping away at this for several releases and have always been comfortable with a long deprecation runway. That's changing for a specific reason: **AI-powered security review tools are now surfacing both of the points above prominently**: the unmaintained mesh dependencies on every node, and the over-broad worker permissions. Whatever a Claude / Codex / Copilot-style scanner finds quickly and consistently is, by construction, a low-friction discovery for someone with malicious intent. The same automation that makes the security posture obvious to defenders makes it obvious to attackers. The deprecation can no longer be paced by what's comfortable for us; it has to be paced by the realistic window before this surface is actively exploited.

We would rather not rush this. Gossip enabled real work for a long time and we don't take user-facing breakage lightly. But the cost of holding the line is now higher than the cost of finishing the migration.

## Proposal

Open for discussion, **not a decision yet**:

- **kOps 1.36:** ship a simple hybrid mode that lets a gossip cluster keep gossip on the control plane while bootstrapping workers off the API load balancer. This gives operators a single `kops reconcile` path to take protokube (and the unmaintained mesh dependencies, and the over-broad worker IAM) off worker nodes without flipping topology in the same window. New gossip clusters remain refused by `kops create cluster`; existing ones keep updating.
- **kOps 1.37:** remove gossip code paths entirely. `protokube` ships without `weaveworks/mesh` or `memberlistmesh`.
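As a discussion aid, the intended end state of each step can be sketched as a small decision function. Everything here is hypothetical: the `Mode` and `Role` types, their values, and `runsGossip` are illustration only and do not correspond to existing kOps fields or flags. The sketch assumes only what the two bullets above propose, that hybrid mode keeps gossip on the control plane and removes it from workers.

```go
package main

import "fmt"

// Role is a hypothetical label for the kind of node.
type Role string

const (
	ControlPlane Role = "control-plane"
	Worker       Role = "worker"
)

// Mode is a hypothetical label for how the cluster locates its API server.
type Mode string

const (
	LegacyGossip Mode = "gossip" // gossip on every node (today's legacy clusters)
	Hybrid       Mode = "hybrid" // proposed for 1.36
	NoneDNS      Mode = "none"   // no gossip anywhere
)

// runsGossip reports whether a node with the given role would still
// take part in the gossip overlay under the given mode.
func runsGossip(mode Mode, role Role) bool {
	switch mode {
	case LegacyGossip:
		return true
	case Hybrid:
		// Workers bootstrap off the API load balancer instead.
		return role == ControlPlane
	default:
		return false
	}
}

func main() {
	for _, mode := range []Mode{LegacyGossip, Hybrid, NoneDNS} {
		for _, role := range []Role{ControlPlane, Worker} {
			fmt.Printf("%s / %s: gossip=%t\n", mode, role, runsGossip(mode, role))
		}
	}
}
```

The point of the hybrid row is that the worker column already matches the None-DNS row, so the later topology flip only has to touch the control plane.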

A 1.36→1.37 gap of one minor release is short, but the warning has been in front of operators since v1.29 (May 2024). The proposal here is the final step of a deprecation that has been visible in `kops create cluster` output for over two years.

If a longer runway is needed for specific operator groups (for example, long-lived gossip clusters that can't easily acquire an API load balancer, or environments where the hybrid bridge in 1.36 is insufficient), please say so on this issue with concrete details. Counter-proposals welcome.

CC @justinsb @rifelpet @ameukam 
