
feat(kerberos): honour DNS SRV port and fail over across multiple KDCs #698

Draft
Richard Markiewicz (thenextman) wants to merge 1 commit into master from feat/kdc-multi-failover

Conversation

@thenextman

Member

ℹ️ 1. This doesn't come from any specific client issue or concern; it's more of a feature gap that I'd like to close before it becomes an issue and a personal itch to scratch.

ℹ️ 2. Primarily engineered with Claude and dev testing still to be completed.

So at this point the PR is mainly meant to validate the correctness of the approach, the underlying code, etc.


Issues

  • DNS SRV port ignored on Windows. detect_kdc_hosts_from_dns_windows hardcoded :88 and read only the first SRV record.
  • Multiple KDCs never used. Resolution collapsed to the first candidate (detect_kdc_host().first()), and the send path tried one KDC with no failover if it was down.
  • krb5.conf returned only the first kdc = line. Additional KDCs for a realm were silently dropped.
  • No KDC stickiness across an auth. Each Kerberos message (AS ×2, TGS, referral hops) re-resolved independently, so it could bounce between KDCs under round-robin DNS.

Changes

  1. Honour the SRV port everywhere. Windows now walks the DnsQuery_W results, filters to SRV records and uses the record's wPort.
  2. Resolve and fail over across multiple KDCs. New detect_kdc_urls() returns the full ordered candidate list (DNS SRV / Windows KdcNames / krb5.conf); send_for_realm tries each in order. detect_kdc_url() kept (first candidate) for backward compatibility.
  3. krb5.conf multi-KDC. Krb5Conf::get_all_values returns every kdc = entry, one host per line.
  4. Per-realm KDC pinning with fallback. The KDC that answers for a realm is cached and reused for the rest of the exchange, re-resolved only if it later fails. Failover advances only on a transport failure; any KDC reply, including a KRB-ERROR, is returned as-is.
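
As a rough illustration of items 2 and 4, the pin-then-fail-over behaviour could be sketched as below. This is not the PR's code: `Attempt`, `KdcClient`, and the closure-based `send` parameter are hypothetical stand-ins, and only the names `send_for_realm` and `kdc_cache` are taken from the description above. The sketch assumes the candidate list has already been resolved by the caller.

```rust
use std::collections::HashMap;

/// Outcome of one attempt against a single KDC (hypothetical type).
#[derive(Debug)]
enum Attempt {
    /// The KDC answered. The bytes may be a normal reply or a KRB-ERROR;
    /// either way they are returned to the caller as-is.
    Reply(Vec<u8>),
    /// Connect, read, or timeout failure: the only case that triggers failover.
    TransportFailure,
}

struct KdcClient {
    /// Realm name -> URL of the KDC that last answered for that realm.
    kdc_cache: HashMap<String, String>,
}

impl KdcClient {
    /// Tries the pinned KDC first (if any), then each candidate in order.
    /// `send` stands in for the real transport call.
    fn send_for_realm<F>(
        &mut self,
        realm: &str,
        candidates: &[String],
        mut send: F,
    ) -> Option<Vec<u8>>
    where
        F: FnMut(&str) -> Attempt,
    {
        let mut failed_pin: Option<String> = None;

        if let Some(pinned) = self.kdc_cache.get(realm).cloned() {
            match send(pinned.as_str()) {
                Attempt::Reply(bytes) => return Some(bytes),
                Attempt::TransportFailure => {
                    // The pin is soft: drop it and fall back to the full list.
                    self.kdc_cache.remove(realm);
                    failed_pin = Some(pinned);
                }
            }
        }

        for url in candidates {
            // Do not retry the KDC that just failed as the pinned entry.
            if failed_pin.as_deref() == Some(url.as_str()) {
                continue;
            }
            if let Attempt::Reply(bytes) = send(url.as_str()) {
                // Pin the first KDC that answers for this realm.
                self.kdc_cache.insert(realm.to_owned(), url.clone());
                return Some(bytes);
            }
        }
        None
    }
}
```

With two candidates where the first is unreachable, the first call tries both and pins the second; a later call for the same realm goes straight to the pinned KDC.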

Decisions

  1. krb5.conf = one host per kdc = line, with no whitespace splitting. I confirmed this matches the MIT documentation. Although we saw at least one real-world case of multiple KDCs on a single line, I think it's better to follow the officially documented format here.
  2. Failover timeout keeps the existing 5s per-attempt (KDC_CONNECT_TIMEOUT).
  3. Stickiness pins on first success and then falls back on failure. Not a hard pin and does not re-resolve every message (see below).
  4. &mut self and a plain HashMap, not RefCell: the generator's futures are + Send, and a RefCell is !Sync, so holding one behind &self would make those futures !Send and break the bound. &mut self threaded through all callers without borrow conflicts.
  5. Referral routing unchanged. self.realm stays the home realm; the per-realm cache is keyed independently, so the home-vs-child KDC decision is preserved.
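
To make decision 1 concrete, here is a minimal sketch of collecting every kdc = entry for a realm without splitting values on whitespace. It is not the crate's `Krb5Conf` parser: `kdc_entries` is a hypothetical free function, and it assumes a simplified krb5.conf layout with one `REALM = {` block per realm inside `[realms]`.

```rust
/// Returns every `kdc = ...` value for `realm`, in file order.
/// Hypothetical helper for illustration; handles only a simplified layout.
fn kdc_entries(conf: &str, realm: &str) -> Vec<String> {
    let mut in_realms = false; // inside the [realms] section
    let mut in_realm = false; // inside the block of the requested realm
    let mut kdcs = Vec::new();

    for raw in conf.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        if line.starts_with('[') {
            in_realms = line == "[realms]";
            in_realm = false;
            continue;
        }
        if !in_realms {
            continue;
        }
        if line == "}" {
            in_realm = false;
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let (key, value) = (key.trim(), value.trim());
            if value == "{" {
                // Start of a realm block, e.g. `EXAMPLE.COM = {`.
                in_realm = key == realm;
            } else if in_realm && key == "kdc" {
                // One host per line: the value is kept whole, never split.
                kdcs.push(value.to_string());
            }
        }
    }
    kdcs
}
```

Under this rule a line such as `kdc = a.other.com b.other.com` yields a single entry containing the space, rather than two hosts.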

I was unsure what to do about pinning a KDC across requests. As I understand it, a message sequence is not tied to a specific KDC, i.e. requests in a single exchange can round-robin between different KDCs; that seems to be an inherent part of the design. However, I think we should avoid the lookup on every request if possible.

MIT does a staggered, parallel request (i.e. it tries all KDCs with a stagger and backoff). There's no pinning after successfully talking to a KDC, although the used KDC is passed back to the client in an out parameter so the client can pin if they desire to. This seems an optimal solution but also has significant complexity.

In Windows, as far as I can tell, this is handled in netlogon rather than the Kerberos client. The documentation says:

  • "The Netlogon service caches the discovered domain controller… Caching this information encourages the consistent use of the same domain controller."
  • It returns "the information… from the domain controller that responds first" (a responsiveness-based pick), then caches it.
  • Negative caching exists too (the shutdown-DC KB: a client can land in "negative cache mode").

So it seems Windows supports the spirit of "pinning" to a DC, but it's the OS doing that across the whole machine, not a per-auth context cache on the Kerberos side.

I consider what we have here a reasonable middle ground: not a faithful copy of MIT or Microsoft, but its own thing. It does carry some complexity overhead; simply dropping the pinning and re-resolving on every request would simplify the code somewhat.

Copilot AI left a comment


Pull request overview

This PR enhances the Kerberos KDC discovery + transport path to (a) honor DNS SRV-provided ports, (b) return and iterate across multiple KDC candidates for failover, and (c) add per-realm KDC “stickiness” within an authentication exchange to reduce re-resolution churn and improve resilience.

Changes:

  • Add detect_kdc_urls() (multi-candidate) alongside existing single-candidate helpers, and thread multi-KDC selection into the send path.
  • Extend krb5.conf parsing to preserve multiple kdc = entries for a realm (in order).
  • Update DNS SRV resolution to retain SRV ports (including Windows DnsQuery_W), and introduce per-realm KDC caching during Kerberos exchanges.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/lib.rs | Re-exports the new multi-candidate KDC resolver API. |
| src/krb.rs | Adds Krb5Conf::get_all_values and tests to support multi-kdc realms. |
| src/kerberos/tests.rs | Updates Kerberos struct construction for the new kdc_cache field. |
| src/kerberos/mod.rs | Implements per-realm KDC pinning + failover logic and adds select_kdc_urls tests. |
| src/kdc.rs | Adds detect_kdc_urls() and uses multi-kdc krb5.conf values on non-Windows. |
| src/dns.rs | Preserves SRV ports and adds srv_records_to_kdc_urls + tests; Windows now walks all SRV records. |


Comment thread src/krb.rs Outdated
Comment thread src/dns.rs
Comment on lines +408 to +412
fn srv_records_to_kdc_urls(scheme: &str, records: &[(String, u16)]) -> Vec<String> {
    records
        .iter()
        .map(|(target, port)| format!("{scheme}://{}:{port}", target.trim_end_matches('.')))
        .collect()

Member Author


Documented as a known limitation, not for this PR.
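
For reference, the helper quoted in this thread can be exercised on its own. The function body below is copied from the excerpt, with the closing brace (outside the quoted line range) assumed; the sample records and the "tcp" scheme are illustrative values, not taken from the PR.

```rust
/// Builds one KDC URL per SRV record, keeping the record's port and
/// dropping the trailing dot of a fully qualified target name.
fn srv_records_to_kdc_urls(scheme: &str, records: &[(String, u16)]) -> Vec<String> {
    records
        .iter()
        .map(|(target, port)| format!("{scheme}://{}:{port}", target.trim_end_matches('.')))
        .collect()
}

/// Illustrative input: one target with a trailing dot and one on a
/// non-default port (hypothetical host names).
fn sample_urls() -> Vec<String> {
    let records = vec![
        ("dc1.example.com.".to_string(), 88),
        ("dc2.example.com".to_string(), 1088),
    ];
    srv_records_to_kdc_urls("tcp", &records)
}
```

Here `sample_urls()` yields `tcp://dc1.example.com:88` and `tcp://dc2.example.com:1088`, in the same order as the input records.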

Comment thread src/kerberos/mod.rs Outdated

2 participants