feat: Add S3 Tables credential vending support for federated catalogs #4052
aritragster wants to merge 4 commits into
dimas-b
left a comment
Thanks for your contribution, @aritragster !
The idea in this PR is very interesting, but I guess it might be best to have a broad discussion about it on the dev ML.
Also, I'm not sure vending storage credentials for federated catalogs is a well-understood use case, which is another point for having a dev discussion.
| // Capture tableId from remote catalog response for S3 Tables ARN construction | ||
| Optional<String> capturedTableId = capturedConfigHolder().getTableId(); | ||
| List<String> resourceArns = List.of(); | ||
| if (capturedTableId.isPresent()) { | ||
| resourceArns = List.of(constructS3TablesArn(capturedTableId.get())); |
This is a smart idea 😉 However, I'm not sure it is viable from the internal APIs POV. This approach appears to circumvent the usual way of passing locations to the storage integration code and instead introduces a new optional way of passing similar information via a different channel (CredentialVendingContext).
I believe it would be beneficial to holistically refactor all storage integrations call paths so that location data is handled uniformly.
Note: such a refactoring will probably overlap with #3699.
hey @dimas-b thanks for the initial review on this.
The idea was to get some early feedback if this makes sense, or we should think of a different approach. I also started the discussion on the ML: https://lists.apache.org/thread/sk84l9wdgpkyvk50v0d3wclh7f3575xn
I took a look at #3699 and the direction there looks like a natural fit for what S3 Tables needs. The key difference with S3 Tables is that the resource identifier is an ARN (not an s3:// path) and the IAM actions use the s3tables: namespace, but the overall credential vending flow is the same. If #3699's refactored abstractions can accommodate ARN-based resources alongside path-based ones, S3 Tables support should fit in cleanly.
Happy to rebase this PR on top of #3699 once it lands, or collaborate with @tokoko to make sure the refactoring accounts for the S3 Tables case. In the meantime I'll keep this draft up as a reference for the S3 Tables-specific logic (policy generation, signingName detection, tableId capture).
Let me know what you'd prefer — happy to help either way.
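To make the "ARN-based resources alongside path-based ones" point concrete, here is a minimal sketch of a single value type that carries either kind of resource. The class name `StorageResourceSketch` and its methods are hypothetical and appear neither in this PR nor in #3699; only the `arn:aws:s3tables` prefix and the `s3tables:` namespace come from the discussion.

```java
/** Hypothetical value type; not part of this PR or of #3699. */
final class StorageResourceSketch {
  enum Kind { PATH, S3_TABLES_ARN }

  // Prefix taken from the review discussion; everything else is illustrative.
  static final String S3_TABLES_ARN_PREFIX = "arn:aws:s3tables";

  private final Kind kind;
  private final String value;

  private StorageResourceSketch(Kind kind, String value) {
    this.kind = kind;
    this.value = value;
  }

  /** Classifies a raw location string by its prefix. */
  static StorageResourceSketch parse(String raw) {
    if (raw == null || raw.isEmpty()) {
      throw new IllegalArgumentException("location must not be empty");
    }
    Kind kind = raw.startsWith(S3_TABLES_ARN_PREFIX) ? Kind.S3_TABLES_ARN : Kind.PATH;
    return new StorageResourceSketch(kind, raw);
  }

  Kind kind() {
    return kind;
  }

  String value() {
    return value;
  }

  /** IAM action namespace; assumes every PATH resource in this sketch is an s3:// location. */
  String iamNamespace() {
    return kind == Kind.S3_TABLES_ARN ? "s3tables" : "s3";
  }
}
```

With a type like this, the vending code could branch on `kind()` once instead of threading a separate ARN channel through the call path.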
@aritragster : just to sync up: I tend to prefer completing #3699 first... I hope it does not cause too much delay on this PR. Please let us know if this timeline is problematic.
#3699 may not provide all the features needed by this PR, but at least I hope we can perform the main refactoring in #3699 and later add remaining features for S3 tables in this PR.
Hi @aritragster, thanks for working on it. The PR is in draft now. Is it ready for review?
Thanks for the draft here. I’m aligned with the direction, and I think this is already useful in showing that S3 Tables can fit the federated catalog path at the protocol level. My bias would be to treat this as a good proof point, and then use the next iteration to tighten both the shape and the vending semantics a bit more. In particular, I wonder if the cleaner direction here would be:
On the credential vending side, there is also a concrete next-step item that feels worth making explicit:
Also worth aligning with Tornike's PR #3699 on the unified call path for storage locations, seems like natural synergy. Looking forward to seeing this evolve.
40e568e to
8934c19
singhpk234
left a comment
Thanks for the update @aritragster !
I left a first round of comments.
| private IamPolicy s3TablesPolicyString(CredentialVendingContext context, boolean canWrite) { | ||
| IamStatement.Builder statement = | ||
| IamStatement.builder() | ||
| .effect(IamEffect.ALLOW) | ||
| .addAction("s3tables:GetTableData") | ||
| .addAction("s3tables:GetTableMetadataLocation"); | ||
|
|
||
| if (canWrite) { | ||
| statement | ||
| .addAction("s3tables:UpdateTableMetadataLocation") | ||
| .addAction("s3tables:PutTableData"); | ||
| } | ||
|
|
||
| List<String> arns = context.resourceArns().orElse(List.of()); | ||
| for (String arn : arns) { | ||
| statement.addResource(IamResource.create(arn)); | ||
| } | ||
|
|
||
| return IamPolicy.builder().addStatement(statement.build()).build(); |
I think having a dedicated storage integration would be more helpful than writing if/else branches for S3 Tables.
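As a rough illustration of what a dedicated S3 Tables policy path could look like, here is a minimal, dependency-free sketch. The PR itself uses the AWS SDK `IamPolicy` builder; the class `S3TablesPolicySketch` and its methods are hypothetical and build the JSON by hand only so the example stays self-contained. The four action names are the ones in the diff above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-alone builder; the PR uses the AWS SDK IamPolicy builder instead. */
final class S3TablesPolicySketch {
  static final List<String> READ_ACTIONS =
      List.of("s3tables:GetTableData", "s3tables:GetTableMetadataLocation");
  static final List<String> WRITE_ACTIONS =
      List.of("s3tables:UpdateTableMetadataLocation", "s3tables:PutTableData");

  /** Read actions are always granted; write actions only when canWrite is true. */
  static List<String> actionsFor(boolean canWrite) {
    List<String> actions = new ArrayList<>(READ_ACTIONS);
    if (canWrite) {
      actions.addAll(WRITE_ACTIONS);
    }
    return actions;
  }

  /** Builds a single-statement Allow policy scoped to the given table ARNs. */
  static String policyJson(List<String> tableArns, boolean canWrite) {
    // Unlike the excerpt above, this sketch refuses to emit a statement with no resources.
    if (tableArns.isEmpty()) {
      throw new IllegalArgumentException("at least one table ARN is required");
    }
    return "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":"
        + jsonArray(actionsFor(canWrite))
        + ",\"Resource\":"
        + jsonArray(tableArns)
        + "}]}";
  }

  private static String jsonArray(List<String> values) {
    return values.stream()
        .map(S3TablesPolicySketch::quote)
        .collect(Collectors.joining(",", "[", "]"));
  }

  // No JSON escaping is implemented, so values that would need it are rejected.
  private static String quote(String value) {
    if (value.indexOf('"') >= 0 || value.indexOf('\\') >= 0) {
      throw new IllegalArgumentException("unsupported character in: " + value);
    }
    return "\"" + value + "\"";
  }
}
```

Keeping this logic in its own class means the generic S3 integration never needs to know about the `s3tables:` namespace.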
| @JsonSerialize(as = ImmutableAwsS3TablesStorageConfigurationInfo.class) | ||
| @JsonDeserialize(as = ImmutableAwsS3TablesStorageConfigurationInfo.class) | ||
| @JsonTypeName("AwsS3TablesStorageConfigurationInfo") | ||
| public abstract class AwsS3TablesStorageConfigurationInfo extends PolarisStorageConfigurationInfo { |
Why not extend AWSS3StorageIntegrations?
This was an intentional choice. S3 Tables doesn't use most of the S3-specific fields (endpoint, pathStyleAccess, stsUnavailable, userARN, endpointInternal). Extending AwsStorageConfigurationInfo would inherit all of those as nullable fields that are never set, which is what gave me pause. The only shared fields are roleARN, region, externalId, and KMS, and those are explicitly declared on the S3 Tables config class. Happy to reconsider if you feel strongly about it.
| /** | ||
| * A delegating wrapper around the Iceberg {@link RESTClient} that intercepts responses to extract | ||
| * the {@code config} section from loadTable responses. When a {@link LoadTableResponse} is | ||
| * received, the config map (containing {@code tableId} for S3 Tables) is captured and stored in the | ||
| * request-scoped {@link CapturedConfigHolder}. | ||
| */ | ||
| public class ConfigCapturingHTTPClient implements RESTClient { |
Rather than overriding the REST client, I wonder if we can inject the @RequestScoped bean we introduced (i.e. CapturedConfigHolder) where we do the actual loadTable? I am assuming that, as part of federation, we would be calling the IRC's loadTable anyway?
| boolean isS3Tables = | ||
| catalogEntity.getStorageConfigurationInfo() != null | ||
| && catalogEntity.getStorageConfigurationInfo().getStorageType() | ||
| == PolarisStorageConfigurationInfo.StorageType.S3_TABLES; |
Let's move this to a private function below. Also, let's write down the assumption; I would say that the call to the federated catalog has succeeded and now we want to .....
I don't fully understand the contextConfigholder requirement though; let me think more about that in the meantime.
Done on the private function - extracted to isS3TablesCatalog() with a comment documenting the assumption that the federated catalog loadTable call has already succeeded.
On the CapturedConfigHolder: the problem it solves is that the S3 Tables tableId only exists in the config section of the remote loadTable REST response. When Polaris calls baseCatalog.loadTable(), the Iceberg RESTCatalog internally deserializes the HTTP response, consumes the config map (for token refresh, endpoint config, etc.), and returns a BaseTable object. The BaseTable/TableMetadata doesn't expose the config section - by the time our handler sees the table, the tableId is gone.
The CapturedConfigHolder is a request-scoped bean that bridges this gap. The ConfigCapturingHTTPClient wraps the RESTClient at the HTTP layer (the only point where the config is visible), extracts the tableId, and stashes it in the holder. After baseCatalog.loadTable() returns, the handler reads the tableId from the holder to construct the table ARN.
Let me know what you think of this.
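The holder half of that description can be sketched roughly as follows. This is not the PR's `CapturedConfigHolder`: the class name `CapturedConfigHolderSketch` and the `capture` method are assumptions, and the request-scoping annotation is left out so the example compiles without a CDI container. Only the `tableId` key and the `getTableId()` accessor are taken from the diff.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-request holder; in the PR the real class is a request-scoped bean. */
final class CapturedConfigHolderSketch {
  /** Config key for the S3 Tables table identifier in the loadTable response config. */
  static final String TABLE_ID_CONFIG_KEY = "tableId";

  private final AtomicReference<Map<String, String>> config =
      new AtomicReference<>(Map.of());

  /** Called at the HTTP layer, the only point where the response config is visible. */
  void capture(Map<String, String> responseConfig) {
    config.set(responseConfig == null ? Map.of() : Map.copyOf(responseConfig));
  }

  /** Called by the handler after loadTable() returns, to build the table ARN. */
  Optional<String> getTableId() {
    return Optional.ofNullable(config.get().get(TABLE_ID_CONFIG_KEY));
  }
}
```

The handler would call `getTableId()` only after `baseCatalog.loadTable()` has returned, which is the ordering assumption documented in the code comment.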
|
|
||
| FILE("org.apache.iceberg.hadoop.HadoopFileIO", false), | ||
|
|
||
| S3_TABLES("org.apache.iceberg.aws.s3.S3FileIO", true), |
I think the issue here is not the dedicated S3 Tables path itself. That part feels directionally right. The issue is that StorageTypeFileIO no longer gives you a stable discriminator once both S3 and S3_TABLES map to the same S3FileIO, but the reverse lookup still assumes FileIO impl uniquely determines storage type. Would it make sense to switch this validation path to a discriminator that remains explicit, instead of depending on shared FileIO impl here?
Fixed. Set validateAllowedStorageType=false on S3_TABLES so it's excluded from the FileIO reverse lookup map. S3FileIO still resolves to S3. S3_TABLES validation happens via storage type, not FileIO impl.
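The effect of that flag on the reverse lookup can be sketched as below. This is a simplified stand-in for `StorageTypeFileIO`, not its actual code: the enum name, the `fromFileIoImpl` method, and the `true` flag on `S3` are assumptions, while the `S3_TABLES` and `FILE` entries mirror the diff and the fix described above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/** Simplified, hypothetical stand-in for StorageTypeFileIO. */
enum StorageTypeFileIOSketch {
  S3("org.apache.iceberg.aws.s3.S3FileIO", true), // flag value for S3 is an assumption
  S3_TABLES("org.apache.iceberg.aws.s3.S3FileIO", false),
  FILE("org.apache.iceberg.hadoop.HadoopFileIO", false);

  private final String fileIoImpl;
  private final boolean validateAllowedStorageType;

  // Reverse lookup built only from validated entries. If S3_TABLES were flagged true,
  // Collectors.toMap would throw on the duplicate S3FileIO key.
  private static final Map<String, StorageTypeFileIOSketch> BY_FILE_IO =
      Arrays.stream(values())
          .filter(t -> t.validateAllowedStorageType)
          .collect(Collectors.toMap(t -> t.fileIoImpl, t -> t));

  StorageTypeFileIOSketch(String fileIoImpl, boolean validateAllowedStorageType) {
    this.fileIoImpl = fileIoImpl;
    this.validateAllowedStorageType = validateAllowedStorageType;
  }

  /** Resolves a FileIO implementation class name back to a storage type, if validated. */
  static Optional<StorageTypeFileIOSketch> fromFileIoImpl(String impl) {
    return Optional.ofNullable(BY_FILE_IO.get(impl));
  }
}
```

The sketch also shows why the reviewer asked for an explicit discriminator: the lookup stays unambiguous only as long as at most one validated entry maps to each FileIO class.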
| tableLocations = Set.of(tableArn); | ||
| LOGGER | ||
| .atDebug() | ||
| .addKeyValue("tableIdentifier", tableIdentifier) |
This is the main remaining gap for me. If we already know the catalog is S3_TABLES, I wonder if this should just fail closed once tableId/table ARN cannot be derived. Otherwise the dedicated path is better, but the unresolved-scope/create-time story still feels incomplete to me, because the downstream S3 Tables policy path is still being reached without validated table-ARN-shaped input.
Agreed. I changed it from warn-and-continue to BadRequestException. If the catalog is S3_TABLES and no tableId was captured, we now refuse to vend credentials rather than proceeding with unscoped permissions.
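A minimal sketch of the fail-closed behaviour, under the assumption that the decision can be expressed as a pure function. The class `FailClosedSketch` and this method signature are hypothetical, and `IllegalStateException` stands in for the `BadRequestException` used in the PR so the example has no dependencies.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper showing the fail-closed decision in isolation. */
final class FailClosedSketch {
  /**
   * Returns the locations to scope credentials to. For S3 Tables catalogs the only
   * acceptable outcome is a table ARN; a missing tableId aborts vending instead of
   * falling back to the default locations.
   */
  static Set<String> resolveTableLocations(
      boolean isS3Tables,
      String bucketArn,
      Optional<String> capturedTableId,
      Set<String> defaultLocations) {
    if (!isS3Tables) {
      return defaultLocations;
    }
    String tableId =
        capturedTableId
            .filter(id -> !id.isBlank())
            .orElseThrow(
                () ->
                    new IllegalStateException(
                        "no tableId captured; refusing to vend unscoped credentials"));
    return Set.of(bucketArn + "/table/" + tableId);
  }
}
```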
I do think this is directionally better now. Pulling S3 Tables into a more dedicated path makes sense to me, and I think this is in a better place than the earlier version where the logic was drifting further into the generic S3 vending path. That said, I still think there are two issues that block this for me:
So overall, I'm aligned with the structural direction here, but I still don't think the contract is explicit enough yet for merge.
| break; | ||
| case S3_TABLES: | ||
| if (defaultBaseLocation == null | ||
| || !defaultBaseLocation.startsWith("arn:aws:s3tables")) { |
Maybe move arn:aws:s3tables to a constant?
| // No-op stub for S3 Tables — real validation deferred to a follow-up PR. | ||
| return Map.of(); |
Can you please file an issue and reference the issue link here?
"follow-up PR" is not a meaningful reference.
Happy to assign it to you since you already have something working.
| AZURE(List.of("abfs://", "wasb://", "abfss://", "wasbs://")), | ||
| GCS("gs://"), | ||
| FILE("file://"), | ||
| S3_TABLES("arn:aws:s3tables"), |
you can move it before ?
| Set<PolarisStorageActions> actions, | ||
| Optional<String> refreshCredentialsEndpoint) { | ||
| Optional<String> refreshCredentialsEndpoint, | ||
| Optional<String> capturedTableId) { |
Do we need to pass this as an argument? Since this is a request-scoped bean, it applies to this request only, right?
| new ConfigCapturingHTTPClient( | ||
| HTTPClient.builder(config) | ||
| .uri(config.get(org.apache.iceberg.CatalogProperties.URI)) | ||
| .build(), | ||
| capturedConfigHolder)); |
I might have to think more about this one: is there a better way to capture the config part of the loadTable response?
I thought about this one a bit. The HTTP interception is really the only way to get the tableId without either changing the Iceberg library or making an extra API call.
The problem is that the S3 Tables Iceberg REST endpoint returns tableId in the config section of the loadTable response, but Iceberg's RESTCatalog consumes that config internally (for token refresh, endpoint routing, etc.) and never exposes it on the BaseTable or TableMetadata. So by the time baseCatalog.loadTable() returns, the tableId is just gone.
The alternatives I considered:
- A supplementary s3tables:GetTable API call, but that adds latency and requires extra IAM permissions on the Polaris server.
- An upstream Iceberg change to expose response config, but that's a long lead time and not guaranteed to be accepted.
The wrapper itself is pretty thin (~40 lines of delegation) and only captures data, never modifies the response. Happy to revisit if you see a cleaner path though.
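The capture-only delegation pattern can be sketched as follows. To avoid misrepresenting Iceberg's `RESTClient` interface, the sketch uses two hypothetical types, `MiniRestClient` and `MiniResponse`, which are much smaller than the real API; the point is only that the wrapper forwards the call, copies the config to a sink, and returns the delegate's response untouched.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical response shape: a metadata location plus the config section. */
final class MiniResponse {
  final String metadataLocation;
  final Map<String, String> config;

  MiniResponse(String metadataLocation, Map<String, String> config) {
    this.metadataLocation = metadataLocation;
    this.config = config;
  }
}

/** Hypothetical, drastically reduced stand-in for a REST client interface. */
interface MiniRestClient {
  MiniResponse loadTable(String tableName);
}

/** Delegating wrapper that observes the response config and never modifies the response. */
final class CapturingClientSketch implements MiniRestClient {
  private final MiniRestClient delegate;
  private final Consumer<Map<String, String>> configSink;

  CapturingClientSketch(MiniRestClient delegate, Consumer<Map<String, String>> configSink) {
    this.delegate = delegate;
    this.configSink = configSink;
  }

  @Override
  public MiniResponse loadTable(String tableName) {
    MiniResponse response = delegate.loadTable(tableName);
    if (response != null && response.config != null && !response.config.isEmpty()) {
      // Hand the sink an immutable copy so later changes cannot leak into the holder.
      configSink.accept(Map.copyOf(response.config));
    }
    return response; // same instance the delegate produced
  }
}
```

In the PR the sink role is played by the request-scoped `CapturedConfigHolder`.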
| /** Checks whether the resolved catalog entity is configured with S3_TABLES storage type. */ | ||
| private boolean isS3TablesCatalog(CatalogEntity catalogEntity) { | ||
| PolarisStorageConfigurationInfo storageConfig = catalogEntity.getStorageConfigurationInfo(); | ||
| return storageConfig != null | ||
| && storageConfig.getStorageType() == PolarisStorageConfigurationInfo.StorageType.S3_TABLES; | ||
| } | ||
|
|
||
| /** | ||
| * Constructs an S3 Tables ARN from the catalog's default-base-location and a tableId. The | ||
| * default-base-location for S3 Tables catalogs is the table bucket ARN (e.g., | ||
| * arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket). The resulting table ARN is | ||
| * bucket-arn/table/tableId. | ||
| */ | ||
| private String constructS3TablesArn(CatalogEntity catalogEntity, String tableId) { | ||
| String baseLocation = catalogEntity.getBaseLocation(); | ||
| return baseLocation + "/table/" + tableId; | ||
| } | ||
|
|
||
| /** | ||
| * Validates that a constructed S3 Tables ARN falls under one of the catalog's allowed locations. | ||
| * This prevents a malicious remote catalog from returning a tableId that would construct an ARN | ||
| * outside the catalog's authorized scope. | ||
| */ | ||
| private void validateS3TablesArn( | ||
| TableIdentifier tableIdentifier, String tableArn, CatalogEntity catalogEntity) { | ||
| PolarisStorageConfigurationInfo storageConfig = catalogEntity.getStorageConfigurationInfo(); | ||
| if (storageConfig == null) { | ||
| return; | ||
| } | ||
| List<String> allowedLocations = storageConfig.getAllowedLocations(); | ||
| boolean isAllowed = | ||
| allowedLocations.stream().anyMatch(allowed -> tableArn.startsWith(allowed + "/")); | ||
| if (!isAllowed) { | ||
| throw new ForbiddenException( | ||
| "Table '%s' has ARN '%s' which is outside the catalog's allowed locations: %s", | ||
| tableIdentifier, tableArn, allowedLocations); | ||
| } | ||
| } | ||
|
|
Can we capture all of these in a dedicated util class?
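One possible shape for such a util class is sketched below. It is not the PR's `S3TablesUtil`; the class name, method names, and exceptions are hypothetical. It also adds one check that the excerpt above does not have: it rejects any tableId outside a conservative character set, on the assumption (not confirmed in this thread) that S3 Tables table IDs contain only letters, digits, and hyphens. The motivation is that a tableId such as `*` would otherwise pass the prefix check and widen the IAM resource.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper class; not the S3TablesUtil from the PR. */
final class S3TablesArnSketch {
  static final String S3_TABLES_ARN_PREFIX = "arn:aws:s3tables";

  // Assumption: table IDs are UUID-like. Anything else (including IAM wildcards) is rejected.
  private static final Pattern SAFE_TABLE_ID = Pattern.compile("[A-Za-z0-9-]+");

  /** Builds bucket-arn/table/tableId after checking both inputs. */
  static String constructTableArn(String bucketArn, String tableId) {
    if (bucketArn == null || !bucketArn.startsWith(S3_TABLES_ARN_PREFIX)) {
      throw new IllegalArgumentException("not an S3 Tables bucket ARN: " + bucketArn);
    }
    if (tableId == null || !SAFE_TABLE_ID.matcher(tableId).matches()) {
      throw new IllegalArgumentException("unexpected tableId: " + tableId);
    }
    return bucketArn + "/table/" + tableId;
  }

  /** True if the table ARN sits under one of the allowed bucket ARNs. */
  static boolean isUnderAllowedLocation(String tableArn, List<String> allowedLocations) {
    // The trailing "/" keeps "bucket/my-bucket-2" from matching "bucket/my-bucket".
    return allowedLocations.stream().anyMatch(allowed -> tableArn.startsWith(allowed + "/"));
  }
}
```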
| // For S3 Tables catalogs, replace s3:// table locations with the constructed table ARN. | ||
| // s3tables:* IAM actions require ARN resources, not s3:// paths. | ||
| // Assumption: the federated catalog loadTable call has already succeeded at this point, | ||
| // and the CapturedConfigHolder contains the tableId from the remote response. | ||
| CatalogEntity catalogEntity = CatalogEntity.of(getResolvedCatalogEntity()); | ||
| boolean isS3Tables = isS3TablesCatalog(catalogEntity); | ||
|
|
||
| if (isS3Tables && capturedTableId.isPresent()) { | ||
| String tableArn = constructS3TablesArn(catalogEntity, capturedTableId.get()); | ||
| validateS3TablesArn(tableIdentifier, tableArn, catalogEntity); | ||
| tableLocations = Set.of(tableArn); | ||
| LOGGER | ||
| .atDebug() | ||
| .addKeyValue("tableIdentifier", tableIdentifier) | ||
| .addKeyValue("tableArn", tableArn) | ||
| .log("Replaced table locations with S3 Tables ARN for credential vending"); | ||
| } else if (isS3Tables) { | ||
| // Fail closed: S3 Tables catalogs require a tableId to construct the table ARN | ||
| // for scoped credential vending. Without it, we cannot generate a properly scoped | ||
| // IAM session policy. | ||
| throw new BadRequestException( | ||
| "Cannot vend credentials for S3 Tables table '%s': " | ||
| + "no tableId was captured from the remote catalog response. " | ||
| + "Ensure the remote S3 Tables endpoint returns tableId in the loadTable config.", | ||
| tableIdentifier); | ||
| } |
I am a bit confused: wouldn't this sit completely inside the following check, since this is what marks the catalog as a federated catalog?
!(baseCatalog instanceof IcebergCatalog)
Can you resolve the conflicts, @aritragster?
singhpk234
left a comment
This is looking close, @aritragster!
Can you please rebase this PR?
| AwsS3TablesStorageConfigurationInfo storageConfig = config(); | ||
| String region = storageConfig.getRegion(); | ||
| int durationSeconds = realmConfig.getConfig(STORAGE_CREDENTIAL_DURATION_SECONDS); | ||
|
|
||
| StorageAccessConfig.Builder accessConfig = StorageAccessConfig.builder(); | ||
|
|
||
| // Generate s3tables:* session policy | ||
| IamPolicy policy = | ||
| buildS3TablesPolicy(storageConfig, allowedReadLocations, allowedWriteLocations); | ||
|
|
||
| // Role session name | ||
| boolean includePrincipalName = | ||
| realmConfig.getConfig(FeatureConfiguration.INCLUDE_PRINCIPAL_NAME_IN_SUBSCOPED_CREDENTIAL); | ||
| String roleSessionName = | ||
| includePrincipalName | ||
| ? AwsRoleSessionNameSanitizer.sanitize("polaris-" + polarisPrincipal.getName()) | ||
| : "PolarisAwsS3TablesCredentialsStorageIntegration"; | ||
|
|
||
| AssumeRoleRequest.Builder request = | ||
| AssumeRoleRequest.builder() | ||
| .externalId(storageConfig.getExternalId()) | ||
| .roleArn(storageConfig.getRoleARN()) | ||
| .roleSessionName(roleSessionName) | ||
| .policy(policy.toJson()) | ||
| .durationSeconds(durationSeconds); | ||
|
|
||
| // Session tags support | ||
| List<String> sessionTagFieldNames = | ||
| realmConfig.getConfig(FeatureConfiguration.SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL); | ||
| Set<SessionTagField> enabledSessionTagFields = | ||
| sessionTagFieldNames.stream() | ||
| .map(SessionTagField::fromConfigName) | ||
| .flatMap(Optional::stream) | ||
| .collect(Collectors.toCollection(() -> EnumSet.noneOf(SessionTagField.class))); | ||
|
|
||
| if (!enabledSessionTagFields.isEmpty()) { | ||
| List<Tag> sessionTags = | ||
| buildSessionTags( | ||
| polarisPrincipal.getName(), credentialVendingContext, enabledSessionTagFields); | ||
| if (!sessionTags.isEmpty()) { | ||
| request.tags(sessionTags); | ||
| request.transitiveTagKeys(sessionTags.stream().map(Tag::key).collect(Collectors.toList())); | ||
| } | ||
| } | ||
|
|
||
| credentialsProvider.ifPresent( | ||
| cp -> request.overrideConfiguration(b -> b.credentialsProvider(cp))); | ||
|
|
||
| @SuppressWarnings("resource") | ||
| StsClient stsClient = | ||
| stsClientProvider.stsClient(StsDestination.of(storageConfig.getStsEndpointUri(), region)); | ||
|
|
||
| AssumeRoleResponse response = stsClient.assumeRole(request.build()); | ||
| accessConfig.put(StorageAccessProperty.AWS_KEY_ID, response.credentials().accessKeyId()); | ||
| accessConfig.put( | ||
| StorageAccessProperty.AWS_SECRET_KEY, response.credentials().secretAccessKey()); | ||
| accessConfig.put(StorageAccessProperty.AWS_TOKEN, response.credentials().sessionToken()); | ||
| Optional.ofNullable(response.credentials().expiration()) | ||
| .ifPresent( | ||
| i -> { | ||
| accessConfig.put( | ||
| StorageAccessProperty.EXPIRATION_TIME, String.valueOf(i.toEpochMilli())); | ||
| accessConfig.put( | ||
| StorageAccessProperty.AWS_SESSION_TOKEN_EXPIRES_AT_MS, | ||
| String.valueOf(i.toEpochMilli())); | ||
| }); | ||
|
|
||
| if (region != null) { | ||
| accessConfig.put(StorageAccessProperty.CLIENT_REGION, region); | ||
| } | ||
|
|
||
| refreshCredentialsEndpoint.ifPresent( | ||
| endpoint -> | ||
| accessConfig.put(StorageAccessProperty.AWS_REFRESH_CREDENTIALS_ENDPOINT, endpoint)); | ||
|
|
||
| return accessConfig.build(); |
This is mostly the same as S3. Can we move this to AWSUtil or something similar? We can pass the policy string and principal name as arguments, since those are the parts that differ from S3.
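A rough sketch of how the shared part could be factored out, assuming the STS call itself is injected as a function. Everything here is hypothetical (`AwsStsUtilSketch`, its nested types, and the neutral property keys); the real integrations use the AWS SDK `AssumeRoleRequest` and Polaris `StorageAccessProperty` constants shown in the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical shared helper; the inputs that differ per integration are parameters. */
final class AwsStsUtilSketch {
  /** Inputs for one assume-role call; policyJson and roleSessionName vary by caller. */
  static final class Request {
    final String roleArn;
    final String roleSessionName;
    final String policyJson;
    final int durationSeconds;

    Request(String roleArn, String roleSessionName, String policyJson, int durationSeconds) {
      this.roleArn = roleArn;
      this.roleSessionName = roleSessionName;
      this.policyJson = policyJson;
      this.durationSeconds = durationSeconds;
    }
  }

  /** Stand-in for the credential triple an STS call returns. */
  static final class Credentials {
    final String accessKeyId;
    final String secretAccessKey;
    final String sessionToken;

    Credentials(String accessKeyId, String secretAccessKey, String sessionToken) {
      this.accessKeyId = accessKeyId;
      this.secretAccessKey = secretAccessKey;
      this.sessionToken = sessionToken;
    }
  }

  /** Runs the injected STS call and maps the result to access properties. */
  static Map<String, String> vend(
      Request request, String region, Function<Request, Credentials> stsCall) {
    Credentials creds = stsCall.apply(request);
    Map<String, String> out = new LinkedHashMap<>();
    // Illustrative keys only; they are not the StorageAccessProperty names.
    out.put("accessKeyId", creds.accessKeyId);
    out.put("secretAccessKey", creds.secretAccessKey);
    out.put("sessionToken", creds.sessionToken);
    if (region != null) {
      out.put("region", region);
    }
    return out;
  }
}
```

Both the S3 and S3 Tables integrations could then build their own policy string and session name and delegate the rest to one place.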
| public class CapturedConfigHolder { | ||
|
|
||
| /** Config key for the S3 Tables table identifier in the loadTable response config section. */ | ||
| public static final String TABLE_ID_CONFIG_KEY = "tableId"; |
This is an S3 Tables key; can you prefix it accordingly? Alternatively, you could use a
Map<String, Map<String, String>> as well.
| /** | ||
| * A delegating wrapper around the Iceberg {@link RESTClient} that intercepts responses to extract | ||
| * the {@code config} section from loadTable responses. When a {@link LoadTableResponse} is | ||
| * received, the config map (containing {@code tableId} for S3 Tables) is captured and stored in the |
Let's make this generic! Just say that it intercepts the raw REST responses.
Coming back after the latest round. You addressed my 4-01 blockers and singhpk234 is on the finish line for local issues. Before this lands, three structural things.
1. My 4-01 point was that Before this PR, the The next S3-family storage type hits the same shape. The structural fix is a dedicated discriminator (the Two ways to land this, my preference is A:
The one outcome I would push back on is skipping A without filing the follow-up.
2. You laid this out on 4-27: Three issues compound here:
You already considered the upstream Iceberg fix on 4-27 and ruled it out as long-lead and not guaranteed to land. I would flip that call. Concrete ask:
Shipping it without the upstream link signs Polaris up for fork-shaped debt inside OSS. singhpk234's 4-25 comment on
3. Sequencing vs #3699. You offered on 3-25 to rebase on #3699 once it lands, or to collaborate with tokoko so the refactoring accounts for S3 Tables. +1 to that and to dimas-b 4-13. Let's hold to that sequencing rather than flip to merge-first. If this merges first, #3699 either carries S3 Tables semantics through the refactor or leaves the S3 Tables path as a legacy branch. Either way the "one coherent vending abstraction" outcome gets worse. If #3699's abstraction can express ARN-based resources alongside path-based ones, which you flagged as a fit on 3-25, S3 Tables drops in cleanly. Better landing than branching-in-S3 now and retrofitting later.
Not trying to block. Iteration since the 3-24 draft has moved from
- Add StorageType.S3_TABLES with 'arn:' prefix for ARN-based locations
- Add AwsS3TablesStorageConfigurationInfo config class
- Add AwsS3TablesCredentialsStorageIntegration generating s3tables:* IAM policies
- Add AwsS3TablesStorageConfigInfo OpenAPI schema
- Wire S3_TABLES into CatalogEntity, StorageIntegrationProvider, StorageTypeFileIO
- Add CapturedConfigHolder and ConfigCapturingHTTPClient for tableId capture from federated loadTable responses (needed for table ARN construction)
- Add ARN validation to prevent credential scoping outside allowed locations
- 17 unit tests covering config, integration, holder, and HTTP client

- StorageTypeFileIO: set S3_TABLES validateAllowedStorageType=false to avoid collision with S3 in FileIO reverse lookup (flyingImer)
- IcebergCatalogHandler: fail closed with BadRequestException when S3_TABLES catalog has no captured tableId (flyingImer)
- PolarisStorageConfigurationInfo: narrowed S3_TABLES prefix from 'arn:' to 'arn:aws:s3tables' (singhpk234)
- CatalogEntity: moved S3_TABLES block next to S3, updated base location validation prefix (singhpk234)
- IcebergCatalogHandler: extracted isS3TablesCatalog() private method with assumption comment (singhpk234)

- Extract arn:aws:s3tables to S3_TABLES_ARN_PREFIX constant
- Move S3_TABLES next to S3 in enum/switch ordering
- Extract S3 Tables helpers to dedicated S3TablesUtil class
- Remove capturedTableId method parameter (read from request-scoped bean)
- Guard S3 Tables logic explicitly within federated catalog path
- Add IAM action constants and TABLE_ID_CONFIG_KEY constant
- Reference apache#4302 for validateAccessToLocations follow-up

- Extract shared STS assume-role logic to AwsStsUtil (used by both S3 and S3 Tables integrations, eliminating code duplication)
- Rename TABLE_ID_CONFIG_KEY to S3TABLES_TABLE_ID_CONFIG_KEY
- Make ConfigCapturingHTTPClient Javadoc generic, add TODO linking upstream Iceberg issue apache/iceberg#16399
- Add TODO on StorageTypeFileIO linking apache#4486
- Move S3_TABLES case before GCS in provider switch ordering
- Fix duplicate CLIENT_REGION/refreshEndpoint in S3 integration (util handles it in STS path, else-block handles non-STS path)
- Add @nullable annotations on AwsStsUtil parameters
- Fix IcebergRESTFederatedCatalogFactory constructor name mismatch
- Clean up dead code in IcebergCatalogHandler (inline ARN logic replaced by S3TablesUtil.resolveTableLocations)
- Add AwsStsUtilTest (12 tests) and S3TablesUtilTest (9 tests)
659e915 to
c8817c5
@flyingImer @singhpk234 I've addressed all outstanding feedback, rebased on latest main, and filed the upstream issues. Here's the summary:
Issues Filed
Review Feedback Addressed
@singhpk234's feedback:
@flyingImer's feedback:
Additional cleanup:
New Tests
On Sequencing with #3699
I've been tracking that PR. It currently has no formal approvals, an unresolved cache design discussion ( This PR is self-contained and additive (~1,200 lines including tests, primarily new files). When #3699 lands with its Merging this first doesn't add debt to the refactor path. It just means S3 Tables credential vending ships sooner.
I'm fine merging this before #3699 ... please resolve conflicts for CI to run and enable the next review round 👍
Summary
When Polaris federates to an S3 Tables Iceberg REST endpoint, the credential vending flow must generate S3 Tables IAM policies (using s3tables: actions and table-level ARNs) instead of standard S3 policies.
This is a draft PR to socialize the approach with the community. Feedback on the design is welcome.
Related to #577
Problem
S3 Tables uses a different IAM action namespace (s3tables:) and ARN format (arn:aws:s3tables:REGION:ACCOUNT:bucket/BUCKET/table/TABLE_ID) compared to standard S3. When Polaris federates to an S3 Tables Iceberg REST endpoint, the existing credential vending flow generates S3 policies that don't work for S3 Tables data access.
Approach
- signingName property ("s3tables") from the remote endpoint's connection configuration, persisted on the catalog entity
- tableId returned in loadTable responses from the remote endpoint using a ConfigCapturingHTTPClient wrapper and request-scoped CapturedConfigHolder
- tableId
- s3tables:GetTableData, s3tables:GetTableMetadataLocation
- s3tables:UpdateTableMetadataLocation, s3tables:PutTableData
- signingName: "s3tables" catalogs use ARN-based default-base-location (not s3:// paths)
Files Changed (15 files, ~570 lines)
- AwsCredentialsStorageIntegration: S3 Tables policy generation, branching on signingName
- CredentialVendingContext: added resourceArns and signingName fields
- AwsStorageConfigurationInfo: added signingName getter
- CatalogEntity: setSigningNameFromConnectionConfig() + validation
- CapturedConfigHolder / ConfigCapturingHTTPClient: new classes to capture tableId from remote responses
- IcebergCatalogHandler: ARN construction from captured config
- StorageAccessConfigProvider: pass resourceArns through to credential vending
- IcebergRESTExternalCatalogFactory: wire in config capture
- PolarisAdminService: persist signingName at catalog creation
AI Disclosure
This implementation was developed with AI assistance (Kiro).
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)