Refactor tunnel handling into pluggable backends and add SSH jump-host tunneling #529
- Add TunnelType, RatholeExecutableURL, and RatholeCommand fields to Network config
- Add templates/rathole-template.yaml with Deployment/ConfigMap/Service/Ingress for a rathole server using WebSocket transport (compatible with nginx ingress)
- Embed rathole template; add DefaultRatholeExecutableURL and DefaultRatholeCommand constants; add annRatholeClientCmds annotation constant
- Modify executeWstunnelTemplate to select rathole template when TunnelType==rathole
- Modify addWstunnelClientAnnotation to generate rathole client TOML + bootstrap command when rathole mode is active; clear stale wstunnel annotation
- Update clearConflictingNetworkAnnotations to also remove annRatholeClientCmds in full-mesh mode
- Update cleanupWstunnelResources to also delete the rathole ConfigMap on teardown
- Add tests covering rathole template rendering, annotation generation, and the updated clearConflictingNetworkAnnotations behaviour

Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/76bae3af-62af-4408-96ac-81a236bbae50
Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
…ment, log skipped UDP ports Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/76bae3af-62af-4408-96ac-81a236bbae50 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
@copilot lint action is failing
- Rewrite if-else chain in addWstunnelClientAnnotation as a switch statement (gocritic)
- Replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf(...) (staticcheck QF1012)
- Fix struct field alignment in rathole_test.go (goimports)

Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/d9236412-33e3-4b2c-af8a-71a1d7613670
Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
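For readers unfamiliar with the two code-level lint rules named in this commit, here is a minimal Go sketch of the patterns they ask for. The helpers `tunnelLabel` and `renderServices` are hypothetical and are not taken from the PR; only the `p8080`-style service naming follows what the test description in this thread mentions.

```go
package main

import (
	"fmt"
	"strings"
)

// tunnelLabel is a hypothetical helper, not code from the PR. It shows a
// switch replacing an if-else chain, the shape gocritic suggests.
func tunnelLabel(tunnelType string) string {
	switch tunnelType {
	case "", "wstunnel":
		return "wstunnel"
	case "rathole":
		return "rathole"
	default:
		return "unknown"
	}
}

// renderServices writes one TOML table header per port straight into the
// builder with fmt.Fprintf, instead of b.WriteString(fmt.Sprintf(...)),
// which is the rewrite staticcheck QF1012 proposes.
func renderServices(ports []int) string {
	var b strings.Builder
	for _, p := range ports {
		fmt.Fprintf(&b, "[client.services.p%d]\n", p)
	}
	return b.String()
}

func main() {
	fmt.Println(tunnelLabel("rathole"))
	fmt.Print(renderServices([]int{8080, 9090}))
}
```

Both rewrites are behaviour-preserving; they only change how the same result is expressed.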
Fixed in 4fca6a0. Four lint issues addressed:
To implement Option A using Traefik Ingress (Kubernetes), you need to use an IngressRouteTCP resource. This allows Traefik to handle the TLS termination for your DNS name on port 443 while passing raw TCP traffic to the Rathole service.
1. The Traefik IngressRouteTCP
Standard Kubernetes Ingress only supports HTTP. To do this correctly, you must use the Traefik Custom Resource Definition (CRD) for TCP.
apiVersion: traefik.io/v1alpha1
kind: IngressRouteTCP
metadata:
  name: triton-rathole-tcp
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`triton.yourdomain.com`)
      services:
        - name: rathole-server-service
          port: 1234 # The port Rathole is listening on inside the cluster
  tls:
    secretName: your-certs-secret # Your Let's Encrypt or manual SSL cert
2. Why this solves the latency issue
3. Your Rathole Configuration
With Traefik handling the TLS termination, your Rathole Server should listen for plain TCP, and your Rathole Client (where Triton is) will connect via TLS.
[server]
bind_addr = "0.0.0.0:1234" # Traefik sends plain TCP here
[server.services.triton]
token = "your_secure_token"
Rathole Client (client.toml):
[client]
remote_addr = "triton.yourdomain.com:443"
transport.type = "tls" # Client wraps in TLS to talk to Traefik
transport.tls.hostname = "triton.yourdomain.com"
[client.services.triton]
local_addr = "triton-service:8000" # Your actual Triton service address
4. Critical Networking Tip: initcwnd
Even with this high-performance setup, you are still bound by the laws of TCP "Slow Start." When your batch hits 15KB, it is slightly larger than the default 14.6KB (10 segments) that a Linux server is allowed to send in its "first flight."
# Increase the initial window to ~44KB (30 segments) to ensure a 15KB batch clears in 1 RTT
sudo ip route change default via <gateway> dev eth0 initcwnd 30 initrwnd 30
Summary Checklist
Whether you need to pass a certificate or CA to the Rathole client depends on how "official" your SSL certificate is.
1. If you are using a Public CA (Let's Encrypt, ZeroSSL, etc.)
If Traefik is using a valid certificate from a trusted authority (like the ones managed by cert-manager in Kubernetes), you usually don't need to pass a CA file.
2. If you are using a Self-Signed Cert
If you generated your own certificate for triton.yourdomain.com, the client will reject the connection because it can't verify the "Chain of Trust."
[client]
remote_addr = "triton.yourdomain.com:443"
transport.type = "tls"
transport.tls.hostname = "triton.yourdomain.com"
transport.tls.trusted_root = "/path/to/your/ca.crt" # Path to the CA that signed Traefik's cert
3. The "Insecure" Shortcut (Not recommended for Prod)
If you just want to test if this fixes your 15KB latency cliff and don't want to mess with certificates yet, you can tell Rathole to skip verification.
[client.transport.tls]
insecure = true
Why this matters for your Latency Cliff
Even if you use a certificate, the TLS Handshake only happens once when the Rathole client connects to the server.
Summary Checklist for your Client:
@copilot use rathole but with tls, via cert-manager self signed ca provided by the admin. Sign the client cert with that ca too.
Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/a0ffa622-8d08-4ef5-8c1b-b5bc31967057 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
Done in 10d34ee. Here's what changed: Kubernetes side (
Server Compute side (client annotation): waits for cert-manager to issue
[client.transport]
type = "tls"
[client.transport.tls]
hostname = "rathole-<name>.<dns>"
trusted_root = "/tmp/rathole-ca.crt"
cert = "/tmp/rathole-client.crt"
key = "/tmp/rathole-client.key"
The CA cert, client cert, and client key are all base64-embedded in the bootstrap command annotation.
Config:
Network:
  TunnelType: "rathole"
  RatholeCAIssuerName: "my-admin-ca" # cert-manager ClusterIssuer name
  RatholeCAIssuerKind: "ClusterIssuer" # default; or "Issuer"
WebSocket transport is retained as a backward-compatible fallback when
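As a rough illustration of the two mechanisms described in this comment (the `ClusterIssuer` default and base64-embedding PEM material in a command string), here is a small Go sketch. The `Network` struct below mirrors only the three fields shown above, and `issuerKind` and `embedFile` are hypothetical helpers; the bootstrap command the PR actually generates is not shown in this thread and may look different.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// Network mirrors only the three config fields shown above; the real struct
// in the PR has more fields and may differ in detail.
type Network struct {
	TunnelType          string
	RatholeCAIssuerName string
	RatholeCAIssuerKind string
}

// issuerKind applies the documented default of "ClusterIssuer" when the
// kind is left empty.
func (n Network) issuerKind() string {
	if n.RatholeCAIssuerKind == "" {
		return "ClusterIssuer"
	}
	return n.RatholeCAIssuerKind
}

// embedFile returns a shell fragment that recreates content at path from a
// base64 literal. This is one plausible way to carry PEM data inside a
// command string, not necessarily the PR's approach.
func embedFile(path string, content []byte) string {
	return fmt.Sprintf("echo %s | base64 -d > %s",
		base64.StdEncoding.EncodeToString(content), path)
}

func main() {
	n := Network{TunnelType: "rathole", RatholeCAIssuerName: "my-admin-ca"}
	fmt.Println(n.issuerKind())

	// Placeholder byte strings stand in for real PEM files.
	cmds := []string{
		embedFile("/tmp/rathole-ca.crt", []byte("CA PEM placeholder")),
		embedFile("/tmp/rathole-client.crt", []byte("client cert placeholder")),
		embedFile("/tmp/rathole-client.key", []byte("client key placeholder")),
	}
	fmt.Println(strings.Join(cmds, " && "))
}
```

Embedding the key in an annotation means anyone who can read the pod object can read the client key, so a design like this relies on RBAC around the pod for confidentiality.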
@copilot lint action is failing
Lint is clean on the current branch (0 issues with golangci-lint v2.11.2). The |
@copilot address the latest review comments
…olling, UDP filter, verb validation, test fix Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/35ce6acd-f8e9-43b3-8aaa-e3a97405d8d4 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
….Count before Sprintf Agent-Logs-Url: https://github.com/interlink-hq/interLink/sessions/35ce6acd-f8e9-43b3-8aaa-e3a97405d8d4 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
Fixed in 43a7d19. Latest review comments addressed:
Add a comprehensive test suite for the rathole tunnel port-forwarding feature introduced in PR #529.

## test/portforward/ (new)
- Docker Compose two-network topology: nginx backend + 4 rathole containers (TCP and WebSocket modes) on isolated networks.
- pytest suite (test_tunnel.py, 27 tests): verify TCP and WS tunnels forward HTTP traffic, multi-port isolation, and annotation TOML format.
- pytest suite (test_tls_k8s.py, 10 tests): verify cert-manager Certificates, Traefik IngressRouteTCP, TLS secrets, pod annotation, and cleanup lifecycle.
- conftest.py with Docker tunnel URL fixtures + Kubernetes API fixtures.
- pyproject.toml with all dependencies (kubernetes>=28.0).

## test/vk-test-set (in submodule)
- templates/130-port-forwarding.yaml: pod with containerPort 8080 running a Python HTTP server; validates Running phase + server startup log.
- vktestset/port_forward_test.py (10 tests): end-to-end infrastructure test using a module-scoped fixture; verifies:
  * Pod reaches Running phase on VK node
  * interlink.eu/rathole-client-commands annotation is set
  * Shadow namespace ({namespace}-wstunnel) is created
  * Rathole server Deployment and Service are provisioned
  * cert-manager Certificates (server + client) reach Ready=True
  * TLS Secrets contain ca.crt, tls.crt, tls.key
  * Traefik IngressRouteTCP is created
  * Annotation encodes valid TOML with [client], remote_addr :443, TLS transport, and p8080 service

## plugins/slurm (submodule)
- prepare.go: inject interlink.eu/rathole-client-commands into the SLURM job script prefix (mirrors existing wstunnel-client-commands handling) so the rathole client bootstrap command runs alongside the job.

## CI scripts
- k3s-test-setup.sh: Traefik v3 CRDs, cert-manager, CA ClusterIssuer chain, interlink namespace, VK RBAC with cert-manager/traefik perms, VK config with Network.TunnelType=rathole and RatholeCAIssuerName, rathole compose.
- k3s-test-run.sh: portforward pytest section (isolated venv).
- k3s-test-cleanup.sh: TLS resource state capture + compose teardown.
- integration-test-k3s.yaml: updated artifact paths.

Signed-off-by: dciangot <dciangot@example.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: rocky Cloud User <rocky@ood-test.cloudcnaf>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: rocky Cloud User <rocky@ood-test.cloudcnaf>
Add a new 'Running individual test suites manually' section to Developers.md covering the three independent test suites:

1. VK pod integration tests (test/vk-test-set/): venv setup, pytest commands, filter flags, port_forward_test.py assertion table, troubleshooting tips, Rocky/CentOS 9 oauthlib workaround note.
2. Docker tunnel unit tests (test/portforward/): Docker Compose bring-up, pytest commands by class, network topology diagram, env-var table, teardown.
3. Quick health-check commands: kubectl one-liners to verify cluster state, VK connectivity, shadow namespaces, and cert-manager Certificates.

Also update:
- 'What the scripts do' table: document rathole Docker Compose environment and cert-manager/Traefik setup added to k3s-test-setup.sh.
- 'Artefacts and logs' listing: add portforward-test-results.log, rathole-server-tcp.log, rathole-server-ws.log, and test-results.log.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: rocky Cloud User <rocky@ood-test.cloudcnaf>
The submodule was pointing to commit 557428cce047584fe7f2c30f87e22301fcb023e8 which does not exist in the remote vk-test-set repository, causing the e2e CI job to fail during submodule checkout. Update to 3404e2d21bef92a04beb683ce69b43c3edbbe171 (HEAD of main branch).
This PR decouples tunnel logic from addWstunnelClientAnnotation/executeWstunnelTemplate and introduces a backend abstraction so new tunnel types can be added without expanding central switch logic. It also adds an SSH tunnel backend that supports forwarding through a jump host using key material from Kubernetes Secrets.

Backend abstraction + registry
- TunnelBackend interface (tunnel_backend.go) covering:
- newTunnelBackend(...) with explicit support for:
  - "" / wstunnel
  - rathole
  - ssh
- An unsupported TunnelType now fails fast with a clear error.

Provider integration
- Provider.tunnelBackend and ensureTunnelBackend() for one-time backend resolution/binding.
- executeWstunnelTemplate now asks backend for template content (KubernetesTemplate()), with fallback to existing wstunnel template.
- createDummyPod now delegates backend-specific server resources via ServerResources(...).
- cleanupWstunnelResources now delegates backend-specific cleanup via CleanupResources(...).
- addWstunnelClientAnnotation now delegates client command generation to backend in non-full-mesh mode and sets the backend-provided annotation key.

Backend implementations
- WstunnelBackend (backend_wstunnel.go) to preserve existing default behavior.
- RatholeBackend (backend_rathole.go) encapsulating:
- SSHBackend (backend_ssh.go) with:
  - … (SSHJumpHost)
  - … (id_rsa or id_ed25519)
  - … (SSHCommand) with strict %s verb-count validation
  - … interlink.eu/ssh-client-commands.

Config surface expansion
- Network config with SSH fields:
  - SSHJumpHost
  - SSHJumpKeySecretName
  - SSHJumpKeySecretNamespace
  - SSHRemoteHost
  - SSHCommand
- … ssh.

Annotation conflict handling
- … annSSHClientCmds.

Tests
- … tunnel_backend_test.go)

Example (new delegation path in annotation generation):
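As a rough, self-contained sketch of the registry and delegation pattern this description outlines (not the PR's actual code): `AnnotationKey`, `ClientCommand`, `staticBackend`, and `annotate` are hypothetical names, the interface is reduced to two methods, the wstunnel annotation key's prefix is assumed to match the other two, and the rule of exactly one `%s` in `SSHCommand` is an assumption about what the verb-count validation enforces.

```go
package main

import (
	"fmt"
	"strings"
)

// TunnelBackend is a reduced stand-in for the interface in tunnel_backend.go.
// Both method names are hypothetical; the PR text names only
// KubernetesTemplate, ServerResources and CleanupResources.
type TunnelBackend interface {
	AnnotationKey() string
	ClientCommand(remote string) (string, error)
}

// staticBackend stands in for the wstunnel and rathole backends.
type staticBackend struct{ key, command string }

func (b staticBackend) AnnotationKey() string                { return b.key }
func (b staticBackend) ClientCommand(string) (string, error) { return b.command, nil }

type sshBackend struct{ commandTemplate string }

func (sshBackend) AnnotationKey() string { return "interlink.eu/ssh-client-commands" }

// ClientCommand counts %s verbs before formatting, so a malformed template
// is rejected instead of producing %!s(MISSING) or %!(EXTRA ...) output.
func (b sshBackend) ClientCommand(remote string) (string, error) {
	if n := strings.Count(b.commandTemplate, "%s"); n != 1 {
		return "", fmt.Errorf("SSHCommand must contain exactly one %%s, found %d", n)
	}
	return fmt.Sprintf(b.commandTemplate, remote), nil
}

// newTunnelBackend resolves a backend by TunnelType and fails fast on
// unsupported values.
func newTunnelBackend(tunnelType, sshCommand string) (TunnelBackend, error) {
	switch tunnelType {
	case "", "wstunnel":
		return staticBackend{"interlink.eu/wstunnel-client-commands", "placeholder wstunnel command"}, nil
	case "rathole":
		return staticBackend{"interlink.eu/rathole-client-commands", "placeholder rathole command"}, nil
	case "ssh":
		return sshBackend{commandTemplate: sshCommand}, nil
	default:
		return nil, fmt.Errorf("unsupported TunnelType %q", tunnelType)
	}
}

// annotate is the delegation step: the caller no longer switches on the
// tunnel type, it stores whatever the backend returns under the backend's key.
func annotate(annotations map[string]string, b TunnelBackend, remote string) error {
	cmd, err := b.ClientCommand(remote)
	if err != nil {
		return err
	}
	annotations[b.AnnotationKey()] = cmd
	return nil
}

func main() {
	b, err := newTunnelBackend("ssh", "ssh -N -J jump.example.org %s")
	if err != nil {
		panic(err)
	}
	ann := map[string]string{}
	if err := annotate(ann, b, "compute.example.org"); err != nil {
		panic(err)
	}
	fmt.Println(ann)
}
```

Adding a new tunnel type under this shape means one new case in `newTunnelBackend` plus one new type; `annotate` stays unchanged.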