cli: fix --resolve-host to preserve hostname for SNI, S3 signing, and Host header#454
cli: fix --resolve-host to preserve hostname for SNI, S3 signing, and Host header#454maniche1024 wants to merge 13 commits into
Conversation
|
Hi, @klauspost @harshavardhana , I've implemented transport-level DNS rotation and SNI preservation to fix gateway hotspots. I've included benchmark results in the description showing a ~16% throughput improvement. Ready for review when you have a moment, thanks in advance! |
|
I am not a particular fan. Mainly because round-robin is quite ineffective if there is even the slightest imbalance in the servers. Then you see throughput significantly below your capacity. I would much rather see that Add to type Client struct {
*minio.Client
Host *url.URL
}
// EndpointURL returns the endpoint URL.
func (c *Client) EndpointURL() *url.URL {
if c.Host != nil {
return c.Host
}
return c.Client.EndpointURL()
}Does that make sense? |
|
Hi Klaus, thanks for the guidance. Yeah it does make sense but I’ve located the client initialization logic in cli/client.go (I believe you might have been referring to this rather than pkg/bench, as that's where minio.New is called). I am currently refactoring getClient and newClient to use the Client wrapper you suggested so that EndpointURL() returns the original hostname for SNI and Host headers, even when connecting to the resolved IPs. I'll perform some validation tests and update the PR once verified. Does this sound like the right direction? |
|
Think I fully understood the context of the changes you were suggesting when started working on it. It definitely required more work than I had initially anticipated but thankfully was able to figure out eventually. Providing a summary of changes below:
Ran several tests to ensure:
Let me know your thoughts on this or if you need more information |
|
@harshavardhana You are probably the best to evaluate this. Sounds reasonable to me. |
|
@klauspost @harshavardhana just a quick bump on the Warp changes PR when you have a moment. I'm keen to get this merged so I can start on the next phase. Let me know if there's anything I can clarify! |
|
@klauspost thanks for the approval! But looks like I don't have merge privileges, so either would need that or please merge the PR. |
|
@maniche1024 I would want @harshavardhana to also take a look. He can merge if he approves. |
|
Hi @ramondeklein Thanks for the feedback. I’ve addressed all the review comments by removing the Client wrapper and updating the IP-pinning logic in the relevant transport layer files. I've also updated the Dialer to be proxy-aware, it now only performs IP pinning when no proxy is configured. I’ve verified the fix with my previous test suite, including cases for hosts supplied with custom ports. Everything is behaving as expected, and the results remain consistent. Hope things look good this time! |
ramondeklein
left a comment
There was a problem hiding this comment.
I think the proper way to deal with IP pinning is to only use a custom dialer and SNI when IP pinning is actually being used. It has too many side effects. It's probably fine not to support HTTP(S) proxies when using IP pinning, but it should work without it.
|
@ramondeklein I’ve pushed updates addressing all your recent comments, including the proxy-safety logic and the kTLS dialer optimization. The logic is now unified across client_tls, client_ktls, and client_transport. Ready for another look! |
Critical:
|
…erministic IP-level load distribution at the transport layer. Previously, Warp relied on the OS or a one-time resolution, which could lead to uneven traffic distribution when using a single hostname sitting in front of multiple gateways/IPs Key improvements: - Added a thread-safe DNS cache (sync.Map) and atomic counter to rotate between IPs for every new connection. - Fixed TLS SNI verification: when dialing a resolved IP, the original hostname is now explicitly set in TLSClientConfig.ServerName to prevent certificate hostname mismatches. - Applied rotation logic to both standard TLS and kTLS transport paths. - Optimized performance by reducing redundant DNS lookups during high-concurrency benchmarks.
…ing when using --resolve-host flag
fa5e768 to
a1137f2
Compare
|
Hi @ramondeklein, thanks for the thorough review again, I was out for about a month and half due to personal reasons hence could not address your review comments. Did it now after coming back to work, here's a summary:
I have also tested my changes thoroughly and everything looks good and I see expected results. |
|
Hi @ramondeklein — just wanted to follow up on this PR. I've addressed all the review comments you raised (SNI fix, endpoint hostname for signing, proxy bypass, and the stale description), and have pushed the updated changes. Would really appreciate another look when you get a chance. Happy to answer any questions or make further adjustments. Thanks! |
|
Hi @harshavardhana @0xMALVEE — wanted to flag this PR for your attention. @klauspost has approved it and I've fully addressed @ramondeklein's review comments (SNI fix, hostname-based S3 signing, proxy bypass, stale description). Happy to answer any questions. Would appreciate a look when you get a chance! |
ramondeklein
left a comment
There was a problem hiding this comment.
Previous Review Remarks — Status
| # | Reviewer remark | Status |
|---|---|---|
Variable naming in client.go:163 |
✅ Addressed | |
Rename host param to endpoint |
✅ Addressed | |
| Port number producing malformed address | ✅ Fixed (SplitHostPort before JoinHostPort) |
|
Same port issue in client_ktls.go |
✅ Fixed | |
| Remove SSL cert warning from flag | ✅ Removed | |
Remove wrapper type in ops.go |
✅ Removed | |
| Move dialer outside closure in ktls | ✅ Fixed (netD hoisted) |
|
SNI ServerName always being set |
✅ Fixed (both TLS and kTLS only set when originalHost != "") |
|
originalHost containing multiple URLs |
✅ Fixed | |
| DNS resolved twice (flags.go + newClient) | ✅ Fixed (clientTransport now passes empty params) |
|
| Proxy bypass logic | ❌ Still broken (see below) |
Remaining Issues
1. Proxy bypass logic is inverted — withResolveHost (client_transport.go:94) and getDialAddr (client_ktls.go:81)
The comment says "Proxy connections are not rewritten (if addr doesn't match our target, it's a proxy)" but the code does the opposite:
if host != targetHost {
dialAddr = net.JoinHostPort(targetHost, port) // rewrites addr → targetIP
}When a proxy is configured, Go's http.Transport calls DialContext with the proxy's address as addr. Since proxy.corp.com != targetIP, the condition is true and the proxy connection is incorrectly rewritten to the S3 IP instead. The same bug exists in getDialAddr in client_ktls.go.
The correct check should compare against the original hostname, not the resolved IP:
// extract once outside the closure
origHostname, _, _ := net.SplitHostPort(originalHost)
if origHostname == "" {
origHostname = originalHost
}
// inside DialContext:
if host == origHostname {
dialAddr = net.JoinHostPort(targetHost, port)
}This was specifically flagged in the follow-up round and the author claimed to fix it, but the condition is still inverted.
2. IPv6 address formatting bug — parseHostPairs (client.go:368-369)
resolved = ip.String() + ":" + port // BUG: produces "::1:8080" for IPv6ip.String() for IPv6 returns a bare address like ::1. String-concatenating :8080 produces ::1:8080, which is an invalid host:port (brackets are missing). Should be:
resolved = net.JoinHostPort(ip.String(), port)3. clientTransportDefault passes resolvedHost as originalHost — client_default.go:29
return newClientTransport(ctx, withResolveHost(resolvedHost, resolvedHost, dialer, false))
// ^--- should be originalHostThe function only receives resolvedHost, so the original hostname is unavailable here. The second argument should be the original hostname. The function signature needs an update:
func clientTransportDefault(ctx *cli.Context, localIP, resolvedHost, originalHost string) http.RoundTripper {with the caller in clientTransportWithLocalIP updated accordingly.
Minor
- The SNI extraction snippet is duplicated verbatim in
client_tls.goandclient_ktls.go— a small helper would remove the repetition. net.LookupIPat startup pins IPs for the entire benchmark run. Worth a comment acknowledging this is intentional (no TTL-based refresh).
… threading - Fix inverted proxy bypass condition in withResolveHost and getDialAddr: rewrite only when addr matches the original hostname, not when it doesn't match the resolved IP (which incorrectly rewrote proxy addrs) - Fix IPv6 address formatting in parseHostPairs: use net.JoinHostPort instead of string concatenation to correctly bracket IPv6 addresses - Thread originalHost through clientTransportDefault so withResolveHost receives the hostname instead of the resolved IP as its second arg - Extract sniFromHost() helper to deduplicate SNI extraction from client_tls.go and client_ktls.go - Add comment acknowledging DNS resolution is intentionally pinned at startup with no TTL-based refresh
|
Hi @ramondeklein , thanks again for the thorough reviews and for sticking with this PR through multiple rounds — really appreciated. On the proxy bypass: apologies this slipped through twice. My reasoning was "if the address doesn't match our target IP, it must be a proxy — skip it", which felt logically sound but was actually backwards. The condition host != targetHost ended up rewriting everything that wasn't already the resolved IP — which is nearly every connection, including proxy ones. The correct framing is the inverse as you mentioned: "if the address is our original hostname, rewrite it; otherwise leave it alone." Fixed to host == origHostname, where origHostname is extracted from originalHost outside the closure. Same fix applied to getDialAddr in client_ktls.go. Other fixes in this commit: IPv6 formatting — replaced ip.String() + ":" + port with net.JoinHostPort() in parseHostPairs to correctly bracket IPv6 addresses Hope everything looks good this time. |
📝 WalkthroughWalkthroughThis PR refines the ChangesHost-based dial resolution with SNI correctness
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cli/client_default.go`:
- Around line 28-30: The branch that decides to return newClientTransport(...)
is gating on resolvedHost, but resolvedHost can be set even when --resolve-host
was not intended (parseHostPairs), causing the withLocalAddr(localIP) path to be
skipped; change the condition to check originalHost (or check both originalHost
and resolvedHost) so dial rewriting is only activated when the original host
mapping was provided by the user, and ensure newClientTransport is invoked with
withResolveHost(originalHost, resolvedHost, dialer, false) only when the
originalHost mapping is present; update the conditional around resolvedHost in
the code that constructs the client transport (the if that calls
newClientTransport and withResolveHost) to use originalHost (or a combined
check) so withLocalAddr(localIP) is not inadvertently bypassed.
In `@cli/client.go`:
- Around line 109-112: The current offset calculation uses i + off%len(pairs)
which applies the same constant shift to every entry; change it to compute a
rotated index using (i + off) % len(pairs) so each entry is shifted relative to
others (e.g., compute idx := (i+off)%len(pairs) and use idx when setting
lastFinished[i] = now.Add(time.Duration(idx))). Update the code that sets
lastFinished to use this rotated index (variables: off, lastFinished, pairs,
now).
In `@cli/flags.go`:
- Line 220: Update the sample config text in yml-samples/put.yml to match the
corrected flag help for --resolve-host: remove the stale "can break certs"
warning and the suggestion to use --insecure, and instead describe that
resolve-host forces hostname resolution to IP(s) (including multiple A/AAAA
records) without implying it breaks TLS; ensure the sample wording mirrors the
Usage string change so users see consistent behavior between the flag help
(Usage: "Resolve the host(s) ip(s) (including multiple A/AAAA records)") and the
sample config entry for resolve-host.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c260c4cc-9308-4853-ba2a-ba242717c9c3
📒 Files selected for processing (6)
cli/client.gocli/client_default.gocli/client_ktls.gocli/client_tls.gocli/client_transport.gocli/flags.go
| if resolvedHost != "" { | ||
| return newClientTransport(ctx, withResolveHost(resolvedHost, originalHost, dialer, false)) | ||
| } |
There was a problem hiding this comment.
Line 28: Gate resolve-host transport setup on originalHost, not resolvedHost.
resolvedHost is populated even when --resolve-host is disabled (from parseHostPairs), so this branch can skip the withLocalAddr(localIP) path unintentionally. Use originalHost (or both fields) as the activation condition for dial rewriting.
Suggested fix
func clientTransportDefault(ctx *cli.Context, localIP, resolvedHost, originalHost string) http.RoundTripper {
dialer := makeDialer(localIP)
- if resolvedHost != "" {
+ if originalHost != "" && resolvedHost != "" {
return newClientTransport(ctx, withResolveHost(resolvedHost, originalHost, dialer, false))
}
return newClientTransport(ctx, withLocalAddr(localIP))
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cli/client_default.go` around lines 28 - 30, The branch that decides to
return newClientTransport(...) is gating on resolvedHost, but resolvedHost can
be set even when --resolve-host was not intended (parseHostPairs), causing the
withLocalAddr(localIP) path to be skipped; change the condition to check
originalHost (or check both originalHost and resolvedHost) so dial rewriting is
only activated when the original host mapping was provided by the user, and
ensure newClientTransport is invoked with withResolveHost(originalHost,
resolvedHost, dialer, false) only when the originalHost mapping is present;
update the conditional around resolvedHost in the code that constructs the
client transport (the if that calls newClientTransport and withResolveHost) to
use originalHost (or a combined check) so withLocalAddr(localIP) is not
inadvertently bypassed.
| off := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(len(pairs)) | ||
| for i := range lastFinished { | ||
| lastFinished[i] = now.Add(time.Duration(i + off%len(hosts))) | ||
| lastFinished[i] = now.Add(time.Duration(i + off%len(pairs))) | ||
| } |
There was a problem hiding this comment.
Random start offset is not actually randomizing the initial host.
On Line 111, i + off%len(pairs) adds the same constant shift to all entries, so relative order is unchanged and index 0 still starts earliest.
Suggested fix
{
// Start with a random host
now := time.Now()
off := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(len(pairs))
for i := range lastFinished {
- lastFinished[i] = now.Add(time.Duration(i + off%len(pairs)))
+ lastFinished[(i+off)%len(pairs)] = now.Add(time.Duration(i))
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| off := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(len(pairs)) | |
| for i := range lastFinished { | |
| lastFinished[i] = now.Add(time.Duration(i + off%len(hosts))) | |
| lastFinished[i] = now.Add(time.Duration(i + off%len(pairs))) | |
| } | |
| off := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(len(pairs)) | |
| for i := range lastFinished { | |
| lastFinished[(i+off)%len(pairs)] = now.Add(time.Duration(i)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cli/client.go` around lines 109 - 112, The current offset calculation uses i
+ off%len(pairs) which applies the same constant shift to every entry; change it
to compute a rotated index using (i + off) % len(pairs) so each entry is shifted
relative to others (e.g., compute idx := (i+off)%len(pairs) and use idx when
setting lastFinished[i] = now.Add(time.Duration(idx))). Update the code that
sets lastFinished to use this rotated index (variables: off, lastFinished,
pairs, now).
| cli.BoolFlag{ | ||
| Name: "resolve-host", | ||
| Usage: "Resolve the host(s) ip(s) (including multiple A/AAAA records). This can break SSL certificates, use --insecure if so", | ||
| Usage: "Resolve the host(s) ip(s) (including multiple A/AAAA records)", |
There was a problem hiding this comment.
Sync --resolve-host wording with sample config docs.
Line 220 removes the stale SSL warning, but yml-samples/put.yml still tells users resolve-host can break certs and suggests --insecure. Please update that sample text to match this corrected behavior to avoid conflicting guidance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cli/flags.go` at line 220, Update the sample config text in
yml-samples/put.yml to match the corrected flag help for --resolve-host: remove
the stale "can break certs" warning and the suggestion to use --insecure, and
instead describe that resolve-host forces hostname resolution to IP(s)
(including multiple A/AAAA records) without implying it breaks TLS; ensure the
sample wording mirrors the Usage string change so users see consistent behavior
between the flag help (Usage: "Resolve the host(s) ip(s) (including multiple
A/AAAA records)") and the sample config entry for resolve-host.
Summary
This PR fixes --resolve-host so it correctly preserves the original hostname for TLS SNI, S3 request signing, and the Host header when dialing resolved IPs. Previously, using --resolve-host caused TLS handshake failures and S3 signature mismatches because the resolved IP was used in place of the hostname throughout. The fix threads the original hostname alongside the resolved IP, using each for the right purpose.
The Problem: Connection Hotspots & SNI Mismatches
When benchmarking S3-compatible storage sitting behind multiple gateways, Warp can experience unequal traffic distribution. While Warp has a
--resolve-hostflag, using it often leads to two issues in modern architectures:Observed behavior (Before): In a test against 5 gateways, traffic was pinned to only 3, with a severe skew:
Resolution: Changes' Summary
cli/client.go — Introduces a hostPair struct pairing each resolved IP with its original hostname. parseHostPairs() produces one pair per IP; getClient() uses originalHost as the S3 endpoint (for correct request signing and virtual-host bucket routing) and the resolved IP only for dialing.
cli/client_transport.go — Adds a withResolveHost() transport option that rewrites dial addresses from the logical hostname to the resolved IP, with correct proxy bypass handling.
cli/client_tls.go and cli/client_ktls.go — SNI (ServerName) is now derived from originalHost (the hostname), not the resolved IP. Both standard TLS and Kernel TLS paths use withResolveHost() in resolve-host mode.
cli/client_default.go — Plain HTTP transport also uses withResolveHost() when a resolved host is provided.
cli/flags.go — Removed stale warning ("This can break SSL certificates, use --insecure if so") since TLS now works correctly without --insecure
Evidence: Test Results
Post-implementation tests demonstrate perfectly balanced connection distribution across all 5 gateways without any HTTP 400 or TLS handshake errors.
Observed behavior (After):
Performance Impact & Verification
Comparative analysis between the stock Warp binary and this PR demonstrates that deterministic IP rotation significantly improves performance by preventing gateway saturation.
Test Environment: 600 concurrency, 1KB PUT operations, 5-gateway backend.
Summary by CodeRabbit
Bug Fixes
--resolve-hostfunctionality to correctly preserve hostname information for S3 signing and TLS certificate validation while using resolved IP addresses for connections.Documentation
--resolve-hostflag help text to remove the note about potential SSL certificate issues, reflecting improved flag behavior.