
fix: deduplicate CA certificates in ClientTrafficPolicy mTLS#8909

Open
yuehaii wants to merge 10 commits into envoyproxy:main from yuehaii:dup-ca-refs

Conversation

@yuehaii

@yuehaii yuehaii commented May 4, 2026

What type of PR is this?
fix: deduplicate CA certificates in ClientTrafficPolicy mTLS

What this PR does / why we need it:
When a single Secret's ca.crt field contains a PEM bundle with duplicate certificate blocks, the translator concatenates all of the bytes without checking for duplicates.
The resulting xDS TLSCertificate resource carries a CA bundle with repeated CERTIFICATE PEM blocks. BoringSSL's X509_STORE rejects duplicate X509_STORE_add calls for the same certificate, and Envoy responds to the update with a NACK. Envoy discards the entire update and continues using the previous configuration.

Which issue(s) this PR fixes:
Fixes #8847

Test:

go test ./internal/gatewayapi -count=1 -run "TestDeduplicatePEMCerts|TestBuildListenerTLSParametersDedupCACerts|TestTranslate/clienttrafficpolicy-mtls-dup-ca-refs" -v
=== RUN TestDeduplicatePEMCerts
=== RUN TestDeduplicatePEMCerts/single_cert_unchanged
=== RUN TestDeduplicatePEMCerts/two_distinct_certs_unchanged
=== RUN TestDeduplicatePEMCerts/duplicate_cert_within_single_bundle_is_deduplicated
=== RUN TestDeduplicatePEMCerts/first_occurrence_kept_when_cert_appears_twice_in_multi-cert_bundle
=== RUN TestDeduplicatePEMCerts/three_copies_collapsed_to_one
=== RUN TestDeduplicatePEMCerts/empty_input_returns_empty_output
--- PASS: TestDeduplicatePEMCerts (0.00s)
--- PASS: TestDeduplicatePEMCerts/single_cert_unchanged (0.00s)
--- PASS: TestDeduplicatePEMCerts/two_distinct_certs_unchanged (0.00s)
--- PASS: TestDeduplicatePEMCerts/duplicate_cert_within_single_bundle_is_deduplicated (0.00s)
--- PASS: TestDeduplicatePEMCerts/first_occurrence_kept_when_cert_appears_twice_in_multi-cert_bundle (0.00s)
--- PASS: TestDeduplicatePEMCerts/three_copies_collapsed_to_one (0.00s)
--- PASS: TestDeduplicatePEMCerts/empty_input_returns_empty_output (0.00s)
=== RUN TestBuildListenerTLSParametersDedupCACerts
--- PASS: TestBuildListenerTLSParametersDedupCACerts (0.00s)
=== RUN TestTranslate
=== RUN TestTranslate/clienttrafficpolicy-mtls-dup-ca-refs
1.77794380634598e+09 info gatewayapi/route.go:355 setting 500 direct response in routes due to errors in processing destinations {"routes": ["httproute/envoy-gateway/httproute-1/rule/0/match/-1"], "error": "service envoy-gateway/service-1 not found"}
--- PASS: TestTranslate (0.02s)
--- PASS: TestTranslate/clienttrafficpolicy-mtls-dup-ca-refs (0.01s)
=== RUN TestTranslateWithExtensionKinds
--- PASS: TestTranslateWithExtensionKinds (0.00s)
PASS
ok github.com/envoyproxy/gateway/internal/gatewayapi 1.708s

Release Notes: Yes

Signed-off-by: hai.yue <20416005+yuehaii@users.noreply.github.com>
@yuehaii yuehaii requested a review from a team as a code owner May 4, 2026 10:52
@netlify

netlify Bot commented May 4, 2026

Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit ec96dcf
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/6a156c35a422420008dfff9e
😎 Deploy Preview https://deploy-preview-8909--cerulean-figolla-1f9435.netlify.app

Signed-off-by: hai.yue <20416005+yuehaii@users.noreply.github.com>
@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.76%. Comparing base (e3ab54c) to head (5203e9a).

Files with missing lines Patch % Lines
internal/gatewayapi/tls.go 82.60% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8909      +/-   ##
==========================================
+ Coverage   74.75%   74.76%   +0.01%     
==========================================
  Files         252      252              
  Lines       40567    40590      +23     
==========================================
+ Hits        30326    30349      +23     
+ Misses       8169     8167       -2     
- Partials     2072     2074       +2     


yuehaii added 2 commits May 5, 2026 09:18
Signed-off-by: hai.yue <20416005+yuehaii@users.noreply.github.com>
Signed-off-by: Hai <20416005+yuehaii@users.noreply.github.com>
@yuehaii
Author

yuehaii commented May 8, 2026

Regarding the CI failures in two conformance tests and three e2e tests: these failures should be unrelated to this fix, which only de-duplicates the certificates.

error fetching Gateway: context deadline exceeded
Test: TestMultipleGC/Internet_GC_Test/PolicyStatusAggregatesAcrossGatewayClasses/backendtrafficpolicy_status_aggregates_across_gateway_classes
Messages: error waiting for Gateway to have at least one IP address in status

Error: Received unexpected error:
error fetching GatewayClass: Get "https://127.0.0.1:41743/apis/gateway.networking.k8s.io/v1/gatewayclasses/envoy-gateway": context deadline exceeded
Test: TestGatewayAPIConformance
Messages: error waiting for envoy-gateway GatewayClass to have Accepted condition to be set: error fetching GatewayClass: Get "https://127.0.0.1:41743/apis/gateway.networking.k8s.io/v1/gatewayclasses/envoy-gateway": context deadline exceeded
--- FAIL: TestGatewayAPIConformance (180.13s)

--- FAIL: TestE2E/MultiHeaderBasedConsistentHashLoadBalancing/combo1 (6.17s)
--- FAIL: TestE2E/MultiHeaderBasedConsistentHashLoadBalancing/combo2 (6.16s)
--- FAIL: TestE2E/MultiHeaderBasedConsistentHashLoadBalancing/combo3 (6.17s)

These failures appear to be environmental rather than caused by the code change:

  • context deadline exceeded on Kubernetes API calls suggests the test cluster was unreachable or overloaded
  • The Gateway not getting an IP address suggests an infrastructure provisioning issue in the test environment
  • The consistent hash LB failures are most likely due to a flaky test or a cluster state issue

@guydc
Contributor

guydc commented May 11, 2026

/retest

@yuehaii
Author

yuehaii commented May 12, 2026

  • Build and Test / ci-checks (pull_request)

@guydc, thanks for retesting. I checked the CI failure. This time only one case failed: TestE2E/RateLimitCIDRInvertMatchAlwaysEnforce/rate_limit_all_IPs_except_CIDR.

=== RUN TestE2E/RateLimitCIDRInvertMatchAlwaysEnforce/rate_limit_all_IPs_except_CIDR
helpers.go:812: 2026-05-11T14:35:14.21268824Z: Conditions matched expectations
helpers.go:812: 2026-05-11T14:35:14.212739255Z: Route gateway-conformance-infra/cidr-invert-ratelimit Parents matched expectations
ratelimit.go:187: 2026-05-11T14:35:14.216066424Z: Making GET request to host via http://172.18.0.203/
ratelimit.go:198: 2026-05-11T14:35:14.216110886Z: Making GET request to host via http://172.18.0.203/
ratelimit.go:1688: 2026-05-11T14:35:14.216157042Z: Making GET request to host via http://172.18.0.203/
ratelimit.go:206: failed to get expected response for the first two requests: expected X-Ratelimit-Limit header to be set, actual headers: map[Content-Length:[580] Content-Type:[application/json] Date:[Mon, 11 May 2026 14:35:14 GMT] X-Content-Type-Options:[nosniff] content-length:[580] content-type:[application/json] date:[Mon, 11 May 2026 14:35:14 GMT] x-content-type-options:[nosniff]]
ratelimit.go:209: failed to get expected response for the last (third) request: expected status code to be one of [429], got 200. CRes: &{200 580 HTTP/1.1 map[Content-Length:[580] Content-Type:[application/json] Date:[Mon, 11 May 2026 14:35:14 GMT] X-Content-Type-Options:[nosniff] X-Ratelimit-Limit:[2, 2;w=3600] X-Ratelimit-Remaining:[0] X-Ratelimit-Reset:[1486]] []}
ratelimit.go:225: failed to get expected response for the last (third) request: context deadline exceeded

--- FAIL: TestE2E/RateLimitCIDRInvertMatchAlwaysEnforce (60.27s)
--- FAIL: TestE2E/RateLimitCIDRInvertMatchAlwaysEnforce/rate_limit_all_IPs_except_CIDR (60.22s)

Below is the code related to the test failure. The failure was caused by the e2e server not returning the expected response, and it is unrelated to the current fix. Maybe we can bypass this CI failure.

https://github.com/envoyproxy/gateway/blob/cbb4337459bf2adfd3c9e6c8bcffdd3ba4cdcffa/test/e2e/tests/ratelimit.go#L206C5-L206C84
t.Errorf("failed to get expected response for the first two requests: %v", err)

https://github.com/envoyproxy/gateway/blob/cbb4337459bf2adfd3c9e6c8bcffdd3ba4cdcffa/test/e2e/tests/ratelimit.go#L209C5-L209C86
t.Errorf("failed to get expected response for the last (third) request: %v", err)

}
irCACert.Certificate = append(irCACert.Certificate, validCaCertBytes...)
}
irCACert.Certificate = deduplicatePEMCerts(irCACert.Certificate)
Contributor


  • Can some info be added to the PR on why we should dedup instead of reject?
  • Also, if we choose to dedup, can we avoid the proactive append instead?

Author


OpenSSL works well when the same (CA) cert occurs more than once. I think Envoy Gateway should also tolerate this scenario.
https://security.stackexchange.com/questions/62055/how-to-delete-duplicates-in-ca-bundle-certificate-file

That's a good suggestion. I will de-duplicate during the append process.

Author


Hi @arkodg, good day. May I know whether you have any more questions?

yuehaii and others added 2 commits May 18, 2026 21:31
Signed-off-by: hai.yue <20416005+yuehaii@users.noreply.github.com>
@zirain
Member

zirain commented May 20, 2026

@codex

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.


yuehaii added 2 commits May 21, 2026 08:43
Signed-off-by: Hai <20416005+yuehaii@users.noreply.github.com>
zirain
zirain previously approved these changes May 21, 2026
tls:
alpnProtocols: null
caCertificate:
certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR3VENDQXFtZ0F3SUJBZ0lVWTNmMFI2SExoVGNJU2FIaURhUkF0d1dHMDJrd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2J6RUxNQWtHQTFVRUJoTUNWVk14Q3pBSkJnTlZCQWdNQWxaQk1SRXdEd1lEVlFRSERBaFRiMjFsUTJsMAplVEVUTUJFR0ExVUVDZ3dLUlc1MmIzbFFjbTk0ZVRFUU1BNEdBMVVFQ3d3SFIyRjBaWGRoZVRFWk1CY0dBMVVFCkF3d1FiWFJzY3k1bGVHRnRjR3hsTG1OdmJUQWdGdzB5TlRFeU1UTXhOREl3TXpGYUdBOHlNVEkxTVRFeE9URTAKTWpBek1Wb3diekVMTUFrR0ExVUVCaE1DVlZNeEN6QUpCZ05WQkFnTUFsWkJNUkV3RHdZRFZRUUhEQWhUYjIxbApRMmwwZVRFVE1CRUdBMVVFQ2d3S1JXNTJiM2xRY205NGVURVFNQTRHQTFVRUN3d0hSMkYwWlhkaGVURVpNQmNHCkExVUVBd3dRYlhSc2N5NWxlR0Z0Y0d4bExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0MKQVFvQ2dnRUJBT3Jxc21LS20xd1NZNlF2dm9vQkY1dGZVanByYWNwb3lERTVJVGVxMVcyeHVMRHBzWG9CTTZJVApzSG5LSWM3a3gzcEZmTk5YZG5TVE1PYU5CNEoycEIvUG5EdjhMK3FWN2dPYS91bGdjMnpJUVpCaVFtWEt0WTNFClNHL0ZQZmM1YmZ0bC9KcktHaExkaTdSNm40SkxIbGw0OVgyZ0xZc2hvM3ZNTXprQXJRUTVpaVVCRDlnSE1GVU4KV1loOTRONnQyQUxxY3A4cTZQZ2Y5R0Y5Z2Y2OGMzSWtnNlBQL3FTR2dQTTRMWHZQYlgyM0ZqUjBPdzcyTE8ybwphbDdTMVNwWFM5MGJyV2VXWThwMFBocVd1QnBSeWl4T1JFcWltY1BndXdGTisyYi9KZVJpeFlldTJjZisrYXU5ClhVWkFyYmc2UDJsNDNjSC9EZUNCbWdtTU9lVjExY0VDQXdFQUFhTlRNRkV3SFFZRFZSME9CQllFRksvTTVyK0gKMVNQdFhlOHBzd2srcmh3T1JFUW5NQjhHQTFVZEl3UVlNQmFBRksvTTVyK0gxU1B0WGU4cHN3aytyaHdPUkVRbgpNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRnh5NmNpNVl3eVhndDcrCisxdjF0LzVsSS9vTytGNVZROWxCY29HK3UzRDduTGFtdkpEdFl0TkdxbFFZUVEwYXpXeDFqaEpjTzNQMnhGcUIKdTNJUU5GQ3VrdlZNVDFnd05UUDQwMFRQcmptOWFJblVYcE0wRUZtbXp0R0o4akJWdkVmc0kxK0ZUWnR4ckdzSQpBeGZpM1JzYjJISWZCVStzOEFpRTdOb2lVSFlJWi9pL21wNGswczlZVzNBQk9sT2duY0w1eEsyeWdhMGJGeW1HCjNmYVBrSEVURHVpaUVoMnR6cThuUThETWl1TVZwUEtzeEZJOWtVQ1FhcVZZZ2lKVjVrVjVZZGRyVHgyMC9wVWwKQ1RNVS9nd1pSUzVBMlJ1ZHQvWDZKMXNhRUFROXhHcTVNQ25wczFnazVXV09sRTJhZXZQYkRsTmV6RkVRcjJabAp4OXJKOTR3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCi0tLS0tQkVHSU4gQ0VSVElGSUNBVEUtLS0tLQpNSUlEd1RDQ0FxbWdBd0lCQWdJVVkzZjBSNkhMaFRjSVNhSGlEYVJBdHdXRzAya3dEUVlKS29aSWh2Y05BUUVMCkJRQXdiekVMTUFrR0ExVUVCaE1DVlZNeEN6QUpCZ05WQkF
nTUFsWkJNUkV3RHdZRFZRUUhEQWhUYjIxbFEybDAKZVRFVE1CRUdBMVVFQ2d3S1JXNTJiM2xRY205NGVURVFNQTRHQTFVRUN3d0hSMkYwWlhkaGVURVpNQmNHQTFVRQpBd3dRYlhSc2N5NWxlR0Z0Y0d4bExtTnZiVEFnRncweU5URXlNVE14TkRJd016RmFHQTh5TVRJMU1URXhPVEUwCk1qQXpNVm93YnpFTE1Ba0dBMVVFQmhNQ1ZWTXhDekFKQmdOVkJBZ01BbFpCTVJFd0R3WURWUVFIREFoVGIyMWwKUTJsMGVURVRNQkVHQTFVRUNnd0tSVzUyYjNsUWNtOTRlVEVRTUE0R0ExVUVDd3dIUjJGMFpYZGhlVEVaTUJjRwpBMVVFQXd3UWJYUnNjeTVsZUdGdGNHeGxMbU52YlRDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDCkFRb0NnZ0VCQU9ycXNtS0ttMXdTWTZRdnZvb0JGNXRmVWpwcmFjcG95REU1SVRlcTFXMnh1TERwc1hvQk02SVQKc0huS0ljN2t4M3BGZk5OWGRuU1RNT2FOQjRKMnBCL1BuRHY4TCtxVjdnT2EvdWxnYzJ6SVFaQmlRbVhLdFkzRQpTRy9GUGZjNWJmdGwvSnJLR2hMZGk3UjZuNEpMSGxsNDlYMmdMWXNobzN2TU16a0FyUVE1aWlVQkQ5Z0hNRlVOCldZaDk0TjZ0MkFMcWNwOHE2UGdmOUdGOWdmNjhjM0lrZzZQUC9xU0dnUE00TFh2UGJYMjNGalIwT3c3MkxPMm8KYWw3UzFTcFhTOTBicldlV1k4cDBQaHFXdUJwUnlpeE9SRXFpbWNQZ3V3Rk4rMmIvSmVSaXhZZXUyY2YrK2F1OQpYVVpBcmJnNlAybDQzY0gvRGVDQm1nbU1PZVYxMWNFQ0F3RUFBYU5UTUZFd0hRWURWUjBPQkJZRUZLL001citICjFTUHRYZThwc3drK3Jod09SRVFuTUI4R0ExVWRJd1FZTUJhQUZLL001citIMVNQdFhlOHBzd2srcmh3T1JFUW4KTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQUZ4eTZjaTVZd3lYZ3Q3KworMXYxdC81bEkvb08rRjVWUTlsQmNvRyt1M0Q3bkxhbXZKRHRZdE5HcWxRWVFRMGF6V3gxamhKY08zUDJ4RnFCCnUzSVFORkN1a3ZWTVQxZ3dOVFA0MDBUUHJqbTlhSW5VWHBNMEVGbW16dEdKOGpCVnZFZnNJMStGVFp0eHJHc0kKQXhmaTNSc2IySElmQlUrczhBaUU3Tm9pVUhZSVovaS9tcDRrMHM5WVczQUJPbE9nbmNMNXhLMnlnYTBiRnltRwozZmFQa0hFVER1aWlFaDJ0enE4blE4RE1pdU1WcFBLc3hGSTlrVUNRYXFWWWdpSlY1a1Y1WWRkclR4MjAvcFVsCkNUTVUvZ3daUlM1QTJSdWR0L1g2SjFzYUVBUTl4R3E1TUNucHMxZ2s1V1dPbEUyYWV2UGJEbE5lekZFUXIyWmwKeDlySjk0dz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
Contributor


I understand that this is a side-effect from your change where an existing test used the same cert twice. Can you fix the existing test input please to use different certs, so that the test continues to check the same thing essentially?

Author

@yuehaii yuehaii May 25, 2026


Thanks for the input, @guydc. I have updated the test to use different certs.

Signed-off-by: hai.yue <20416005+yuehaii@users.noreply.github.com>
Signed-off-by: Hai <20416005+yuehaii@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

Duplicate CAs in caCertificateRefs silently break the trust store

4 participants