Skip to content

Azure: Require catalog-provided credentials for Key Vault client#16541

Open
wombatu-kun wants to merge 1 commit into
apache:mainfrom
wombatu-kun:issue/16465-keyvault-catalog-credentials
Open

Azure: Require catalog-provided credentials for Key Vault client#16541
wombatu-kun wants to merge 1 commit into
apache:mainfrom
wombatu-kun:issue/16465-keyvault-catalog-credentials

Conversation

@wombatu-kun
Copy link
Copy Markdown
Contributor

Closes #16465.

Summary

AzureKeyManagementClient built its Key Vault KeyClient from an arbitrary azure.keyvault.url and authenticated using ambient Azure credentials — when no adls.token-credential-provider was configured, the credential chain fell back to DefaultAzureCredentialBuilder().build() (managed identity / environment / Azure CLI). A malicious or mis-scoped catalog configuration could therefore redirect Key Vault authentication traffic to an attacker-controlled endpoint and exfiltrate the client's ambient bearer token, as reported in #16465.

Following the direction suggested on the issue, the Key Vault client now authenticates only with a credential supplied through configuration (a catalog-vended adls.token, or an explicitly configured adls.token-credential-provider) and no longer silently falls back to ambient DefaultAzureCredential. When no such credential is provided it fails fast with a ValidationException. With this change a malicious catalog can at most recover a credential it already issued, rather than the client's ambient identity.

What changed

AzureProperties gains keyVaultTokenCredential(), which returns the catalog-/operator-supplied credential and is empty when only ambient credentials would be available; the inline TokenCredential builder previously used for ADLS is extracted into a shared helper and reused. AzureKeyManagementClient now resolves its credential through this method and throws a ValidationException when none is configured. ADLS storage credential behavior is unchanged — this is a Key Vault–only change.

Note: the supplied token must be scoped for Key Vault (for the public cloud, https://vault.azure.net). adls.token is reused as the carrier rather than introducing a new property; a dedicated azure.keyvault.token could be added as a follow-up if preferred.

Tests

TestAzureProperties adds unit coverage for keyVaultTokenCredential(): resolution from adls.token, resolution from a custom token-credential provider, and — critically — that it is empty (no ambient fallback) when nothing is configured. The integration TestAzureKeyManagementClient now asserts both sides of the new contract against a live vault: the old URL-only configuration fails with ValidationException, and wrap/unwrap works when a Key Vault-scoped token is provided via adls.token.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the AZURE label May 23, 2026
@nandorKollar
Copy link
Copy Markdown
Contributor

Can using DefaultAzureCredential cause similar problem with ADSL client? It seems to me, that when adls.token-credential-provider is not set, then we use default credentials chain there too, the pattern seems similar as with vault client.

@wombatu-kun
Copy link
Copy Markdown
Contributor Author

@nandorKollar good catch — yes, the same DefaultAzureCredential fallback exists on the ADLS path.

That ADLS exposure is actually called out in #16465 itself: the report notes the Key Vault case "is less exposed than the ADLS location issue because it is config-driven rather than metadata-driven, but it is still an endpoint-trust break." So the ADLS case is known as a separate concern and considered the more exposed of the two — there the target storage host comes from table-metadata locations rather than a single catalog config property.

I kept this PR scoped to Key Vault on purpose, because the ADLS mitigation can't be the same shape. For Key Vault, requiring a catalog-vended credential is reasonable since ambient Key Vault auth is a niche setup. For ADLS, authenticating to your own storage via a managed identity (DefaultAzureCredential) is a mainstream, documented production configuration, so simply removing the ambient fallback would be a significant breaking change. The ADLS case more likely needs a different mitigation (e.g. validating/allow-listing the storage host a bearer token is sent to), which deserves its own design discussion rather than being folded into this PR.

Since #16465 came in through the private Apache security list, I'd suggest the ADLS location case be triaged the same way rather than expanded here. But let's land this Key Vault PR first, and I'm happy to help drive the ADLS follow-up afterwards.

@nandorKollar
Copy link
Copy Markdown
Contributor

@nandorKollar good catch — yes, the same DefaultAzureCredential fallback exists on the ADLS path.

That ADLS exposure is actually called out in #16465 itself: the report notes the Key Vault case "is less exposed than the ADLS location issue because it is config-driven rather than metadata-driven, but it is still an endpoint-trust break." So the ADLS case is known as a separate concern and considered the more exposed of the two — there the target storage host comes from table-metadata locations rather than a single catalog config property.

I kept this PR scoped to Key Vault on purpose, because the ADLS mitigation can't be the same shape. For Key Vault, requiring a catalog-vended credential is reasonable since ambient Key Vault auth is a niche setup. For ADLS, authenticating to your own storage via a managed identity (DefaultAzureCredential) is a mainstream, documented production configuration, so simply removing the ambient fallback would be a significant breaking change. The ADLS case more likely needs a different mitigation (e.g. validating/allow-listing the storage host a bearer token is sent to), which deserves its own design discussion rather than being folded into this PR.

Since #16465 came in through the private Apache security list, I'd suggest the ADLS location case be triaged the same way rather than expanded here. But let's land this Key Vault PR first, and I'm happy to help drive the ADLS follow-up afterwards.

Great, thanks! I didn't notice that the #16465 mentioned ADLS too! It sounds more important, I don't think the Key Vault path is often used in Iceberg, unlike storage access path.

}

if (allProperties.containsKey(ADLS_TOKEN_CREDENTIAL_PROVIDER)) {
return Optional.of(AdlsTokenCredentialProviders.from(allProperties).credential());
Copy link
Copy Markdown
Contributor

@nandorKollar nandorKollar May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: probably AdlsTokenCredentialProviders is not the best name here, as this is connecting to Key Vault now.

Would it make sense to handle the two separately? Is it a possible use case that someone would like to use for example managed identities to connect to the key vault, and at the same time use SAS tokens for the storage accounts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The Key Vault client trusts any configured vault URL and reuses ambient Azure credentials

2 participants