feat: Add S3 Tables credential vending support for federated catalogs by aritragster · Pull Request #4052 · apache/polaris

aritragster · 2026-03-24T21:18:18Z

Summary

When Polaris federates to an S3 Tables Iceberg REST endpoint, the credential vending flow must generate S3 Tables IAM policies (using s3tables: actions and table-level ARNs) instead of standard S3 policies.

This is a draft PR to socialize the approach with the community. Feedback on the design is welcome.

Related to #577

Problem

S3 Tables uses a different IAM action namespace (s3tables:) and ARN format (arn:aws:s3tables:REGION:ACCOUNT:bucket/BUCKET/table/TABLE_ID) compared to standard S3. When Polaris federates to an S3 Tables Iceberg REST endpoint, the existing credential vending flow generates S3 policies that don't work for S3 Tables data access.

Approach

Detect S3 Tables catalogs via the signingName property ("s3tables") from the remote endpoint's connection configuration, persisted on the catalog entity
Capture the tableId returned in loadTable responses from the remote endpoint using a ConfigCapturingHTTPClient wrapper and request-scoped CapturedConfigHolder
Construct proper S3 Tables ARNs from the catalog's base location and captured tableId
Generate per-table scoped session policies with granular read/write actions:
- Read: s3tables:GetTableData, s3tables:GetTableMetadataLocation
- Write (additional): s3tables:UpdateTableMetadataLocation, s3tables:PutTableData
Validate at catalog creation time that signingName: "s3tables" catalogs use ARN-based default-base-location (not s3:// paths)

Files Changed (15 files, ~570 lines)

AwsCredentialsStorageIntegration — S3 Tables policy generation, branching on signingName
CredentialVendingContext — added resourceArns and signingName fields
AwsStorageConfigurationInfo — added signingName getter
CatalogEntity — setSigningNameFromConnectionConfig() + validation
CapturedConfigHolder / ConfigCapturingHTTPClient — new classes to capture tableId from remote responses
IcebergCatalogHandler — ARN construction from captured config
StorageAccessConfigProvider — pass resourceArns through to credential vending
IcebergRESTExternalCatalogFactory — wire in config capture
PolarisAdminService — persist signingName at catalog creation
Test: 3 new unit tests for S3 Tables policy generation (read-only, read-write, fallback)

AI Disclosure

This implementation was developed with AI assistance (Kiro).

Checklist

🛡️ Don't disclose security issues! (contact security@apache.org)
🔗 Clearly explained why the changes are needed, or linked related issues: Related to Amazon S3 Tables Integration #577
🧪 Added/updated tests with good coverage, or manually tested (and explained how)
💡 Added comments for complex logic
🧾 Updated CHANGELOG.md (if needed)
📚 Updated documentation in site/content/in-dev/unreleased (if needed)

dimas-b

Thanks for your contribution, @aritragster !

The idea in this PR is very interesting, but I guess it might be best to have a broach discussion about it on the dev ML.

Also, I'm not sure vending storage credentials for federated catalogs is a well-understood use case, which is another point for having a dev discussion.

dimas-b · 2026-03-24T22:41:38Z

+    // Capture tableId from remote catalog response for S3 Tables ARN construction
+    Optional<String> capturedTableId = capturedConfigHolder().getTableId();
+    List<String> resourceArns = List.of();
+    if (capturedTableId.isPresent()) {
+      resourceArns = List.of(constructS3TablesArn(capturedTableId.get()));


This is a smart idea 😉 However, I'm not sure it is viable from the internal APIs POV. This approach appears to circumvent the usual way of passing locations to the storage integration code and instead introduces a new optional way of passing similar information via a different channel (CredentialVendingContext).

I believe it would be beneficial to holistically refactor all storage integrations call paths so that location data is handled uniformly.

Note: such a refactoring will probably overlap with #3699.

hey @dimas-b thanks for the initial review on this.

The idea was to get some early feedback if this makes sense, or we should think of a different approach. I also started the discussion on the ML: https://lists.apache.org/thread/sk84l9wdgpkyvk50v0d3wclh7f3575xn

I took a look at #3699 and the direction there looks like a natural fit for what S3 Tables needs. The key difference with S3 Tables is that the resource identifier is an ARN (not an s3:// path) and the IAM actions use the s3tables: namespace, but the overall credential vending flow is the same. If #3699's refactored abstractions can accommodate ARN-based resources alongside path-based ones, S3 Tables support should fit in cleanly.

Happy to rebase this PR on top of #3699 once it lands, or collaborate with @tokoko to make sure the refactoring accounts for the S3 Tables case. In the meantime I'll keep this draft up as a reference for the S3 Tables-specific logic (policy generation, signingName detection, tableId capture).

Let me know what you'd prefer — happy to help either way.

@aritragster : just to sync up: I tend to prefer completing #3699 first... I hope it does not cause too much delay on this PR. Please let us know if this is timeline is problematic.

#3699 may not provide all the features needed by this PR, but at least I hope we can perform the main refactoring in #3699 and later add remaining features for S3 tables in this PR.

flyrain · 2026-03-27T21:43:08Z

Hi @aritragster , thanks for working on it. The PR is in draft now. Is it ready for review?

flyingImer · 2026-03-28T00:23:26Z

Thanks for the draft here. I’m aligned with the direction, and I think this is already useful in showing that S3 Tables can fit the federated catalog path at the protocol level.

My bias would be to treat this as a good proof point, and then use the next iteration to tighten both the shape and the vending semantics a bit more.

In particular, I wonder if the cleaner direction here would be:

keep the S3 Tables-specific policy generation in a dedicated storage-integration path, rather than growing more branching in the existing S3 logic
align that with the broader push toward a more unified call path for storage locations, rather than introducing another parallel path here

On the credential vending side, there is also a concrete next-step item that feel worth making explicit:

IIUC, only loadTable currently derives S3 Tables resourceArns, while the create paths still flow through with empty resource scope. I think the next iteration should make those create-time semantics explicit: either derive the usable scope there as well, or fail/defer vending rather than returning partially-scoped credentials. This seems especially relevant given the S3 Tables REST docs note that stage-create is not supported.

Also worth aligning with Tornike's PR #3699 on the unified call path for storage locations, seems like natural synergy.

Looking forward to seeing this evolve.

singhpk234

Thank for the update @aritragster !
i left first round of comments

singhpk234 · 2026-03-25T03:24:17Z

+  private IamPolicy s3TablesPolicyString(CredentialVendingContext context, boolean canWrite) {
+    IamStatement.Builder statement =
+        IamStatement.builder()
+            .effect(IamEffect.ALLOW)
+            .addAction("s3tables:GetTableData")
+            .addAction("s3tables:GetTableMetadataLocation");
+
+    if (canWrite) {
+      statement
+          .addAction("s3tables:UpdateTableMetadataLocation")
+          .addAction("s3tables:PutTableData");
+    }
+
+    List<String> arns = context.resourceArns().orElse(List.of());
+    for (String arn : arns) {
+      statement.addResource(IamResource.create(arn));
+    }
+
+    return IamPolicy.builder().addStatement(statement.build()).build();


I think having a dedidcated storage integration would be really helpful than writing if else for s3 tables

singhpk234 · 2026-04-01T13:29:36Z

+@JsonSerialize(as = ImmutableAwsS3TablesStorageConfigurationInfo.class)
+@JsonDeserialize(as = ImmutableAwsS3TablesStorageConfigurationInfo.class)
+@JsonTypeName("AwsS3TablesStorageConfigurationInfo")
+public abstract class AwsS3TablesStorageConfigurationInfo extends PolarisStorageConfigurationInfo {


why not extend AWSS3StorageIntegrations ?

This was Intentional choice. S3 Tables doesn't use most of the S3-specific fields (endpoint, pathStyleAccess, stsUnavailable, userARN, endpointInternal). Extending AwsStorageConfigurationInfo would inherit all of those as nullable fields that are never set, which made me think about this. The only shared fields are roleARN, region, externalId, and KMS — and those are explicitly declared on the S3 Tables config class. Happy to reconsider if you feel strongly about it.

singhpk234 · 2026-04-01T13:35:30Z

+/**
+ * A delegating wrapper around the Iceberg {@link RESTClient} that intercepts responses to extract
+ * the {@code config} section from loadTable responses. When a {@link LoadTableResponse} is
+ * received, the config map (containing {@code tableId} for S3 Tables) is captured and stored in the
+ * request-scoped {@link CapturedConfigHolder}.
+ */
+public class ConfigCapturingHTTPClient implements RESTClient {


rather than overriding the rest client i wonder if we can inject the @RequestScopedBean we introduced i.e CapturedConfigHolder where we are doing actual loadTable ? i am assuming as part of federaton we would be calling irc's loadTable anyways ?

singhpk234 · 2026-04-01T13:37:13Z

+      boolean isS3Tables =
+          catalogEntity.getStorageConfigurationInfo() != null
+              && catalogEntity.getStorageConfigurationInfo().getStorageType()
+                  == PolarisStorageConfigurationInfo.StorageType.S3_TABLES;


lets move to a private function below, also lets writes an assumption, i would say that call to federated catalog has succeeded and now we want to .....
I don't fully understand the contextConfigholder requirement though let me think more about that as well meanwhile.

Done on the private function - extracted to isS3TablesCatalog() with a comment documenting the assumption that the federated catalog loadTable call has already succeeded.

On the CapturedConfigHolder: the problem it solves is that the S3 Tables tableId only exists in the config section of the remote loadTable REST response. When Polaris calls baseCatalog.loadTable(), the Iceberg RESTCatalog internally deserializes the HTTP response, consumes the config map (for token refresh, endpoint config, etc.), and returns a BaseTable object. The BaseTable/TableMetadata doesn't expose the config section - by the time our handler sees the table, the tableId is gone.

The CapturedConfigHolder is a request-scoped bean that bridges this gap. The ConfigCapturingHTTPClient wraps the RESTClient at the HTTP layer (the only point where the config is visible), extracts the tableId, and stashes it in the holder. After baseCatalog.loadTable() returns, the handler reads the tableId from the holder to construct the table ARN.

Let me know what you think of this.

flyingImer · 2026-04-01T20:19:44Z


  FILE("org.apache.iceberg.hadoop.HadoopFileIO", false),

+  S3_TABLES("org.apache.iceberg.aws.s3.S3FileIO", true),


I think the issue here is not the dedicated S3 Tables path itself. That part feels directionally right. The issue is that StorageTypeFileIO no longer gives you a stable discriminator once both S3 and S3_TABLES map to the same S3FileIO, but the reverse lookup still assumes FileIO impl uniquely determines storage type. Would it make sense to switch this validation path to a discriminator that remains explicit, instead of depending on shared FileIO impl here?

Fixed. Set validateAllowedStorageType=false on S3_TABLES so it's excluded from the FileIO reverse lookup map. S3FileIO still resolves to S3. S3_TABLES validation happens via storage type, not FileIO impl.

flyingImer · 2026-04-01T20:19:44Z

+        tableLocations = Set.of(tableArn);
+        LOGGER
+            .atDebug()
+            .addKeyValue("tableIdentifier", tableIdentifier)


This is the main remaining gap for me. If we already know the catalog is S3_TABLES, I wonder if this should just fail closed once tableId/table ARN cannot be derived. Otherwise the dedicated path is better, but the unresolved-scope/create-time story still feels incomplete to me, because the downstream S3 Tables policy path is still being reached without validated table-ARN-shaped input.

Agreed. I changed it from warn-and-continue to BadRequestException. If the catalog is S3_TABLES and no tableId was captured, we now refuse to vend credentials rather than proceeding with unscoped permissions.

flyingImer · 2026-04-01T20:27:38Z

I do think this is directionally better now.

Pulling S3 Tables into a more dedicated path makes sense to me, and I think this is in a better place than the earlier version where the logic was drifting further into the generic S3 vending path.

That said, I still think there are two issues that block this for me:

StorageTypeFileIO is no longer a stable discriminator once both S3 and S3_TABLES point at the same S3FileIO, but the validation path still reverse-maps from FileIO impl back to storage type.
Once we know the catalog is S3_TABLES, the no-tableId path still warns and continues instead of failing closed. So IIUC, the create-time/unresolved-scope gap is still not really closed yet.

So overall, I'm aligned with the structural direction here, but I still don't think the contract is explicit enough yet for merge.

singhpk234 · 2026-04-25T01:07:56Z

            break;
+          case S3_TABLES:
+            if (defaultBaseLocation == null
+                || !defaultBaseLocation.startsWith("arn:aws:s3tables")) {


arn:aws:s3tables may be move this to the constants ?

singhpk234 · 2026-04-25T01:10:11Z

+    // No-op stub for S3 Tables — real validation deferred to a follow-up PR.
+    return Map.of();


can you please add an issue and reference the issue link here ?
follow-up PR is not a meaningful name
happy to assign it to you since you have already something working

singhpk234 · 2026-04-25T01:10:52Z

    AZURE(List.of("abfs://", "wasb://", "abfss://", "wasbs://")),
    GCS("gs://"),
    FILE("file://"),
+    S3_TABLES("arn:aws:s3tables"),


you can move it before ?

singhpk234 · 2026-04-25T01:13:40Z

      Set<PolarisStorageActions> actions,
-      Optional<String> refreshCredentialsEndpoint) {
+      Optional<String> refreshCredentialsEndpoint,
+      Optional<String> capturedTableId) {


do we need to pass this as Args ? since this a request scoped bean its applicable to this Request only right ?

singhpk234 · 2026-04-25T01:15:56Z

+                new ConfigCapturingHTTPClient(
+                    HTTPClient.builder(config)
+                        .uri(config.get(org.apache.iceberg.CatalogProperties.URI))
+                        .build(),
+                    capturedConfigHolder));


Might have to think more about this one : if there is a better way to capture the config part of loadTable response

I thought about this one a bit. The HTTP interception is really the only way to get the tableId without either changing the Iceberg library or making an extra API call.
The problem is that the S3 Tables Iceberg REST endpoint returns tableId in the config section of the loadTable response, but Iceberg's RESTCatalog consumes that config internally (for token refresh, endpoint routing, etc.) and never exposes it on the BaseTable or TableMetadata. So by the time baseCatalog.loadTable() returns, the tableId is just gone.

The alternatives I considered:
A supplementary s3tables:GetTable API call, but that adds latency and requires extra IAM permissions on the Polaris server.
An upstream Iceberg change to expose response config, but that's a long lead time and not guaranteed to be accepted.
The wrapper itself is pretty thin (~40 lines of delegation) and only captures data, never modifies the response. Happy to revisit if you see a cleaner path though.

singhpk234 · 2026-04-25T01:23:42Z

+  /** Checks whether the resolved catalog entity is configured with S3_TABLES storage type. */
+  private boolean isS3TablesCatalog(CatalogEntity catalogEntity) {
+    PolarisStorageConfigurationInfo storageConfig = catalogEntity.getStorageConfigurationInfo();
+    return storageConfig != null
+        && storageConfig.getStorageType() == PolarisStorageConfigurationInfo.StorageType.S3_TABLES;
+  }
+
+  /**
+   * Constructs an S3 Tables ARN from the catalog's default-base-location and a tableId. The
+   * default-base-location for S3 Tables catalogs is the table bucket ARN (e.g.,
+   * arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket). The resulting table ARN is
+   * bucket-arn/table/tableId.
+   */
+  private String constructS3TablesArn(CatalogEntity catalogEntity, String tableId) {
+    String baseLocation = catalogEntity.getBaseLocation();
+    return baseLocation + "/table/" + tableId;
+  }
+
+  /**
+   * Validates that a constructed S3 Tables ARN falls under one of the catalog's allowed locations.
+   * This prevents a malicious remote catalog from returning a tableId that would construct an ARN
+   * outside the catalog's authorized scope.
+   */
+  private void validateS3TablesArn(
+      TableIdentifier tableIdentifier, String tableArn, CatalogEntity catalogEntity) {
+    PolarisStorageConfigurationInfo storageConfig = catalogEntity.getStorageConfigurationInfo();
+    if (storageConfig == null) {
+      return;
+    }
+    List<String> allowedLocations = storageConfig.getAllowedLocations();
+    boolean isAllowed =
+        allowedLocations.stream().anyMatch(allowed -> tableArn.startsWith(allowed + "/"));
+    if (!isAllowed) {
+      throw new ForbiddenException(
+          "Table '%s' has ARN '%s' which is outside the catalog's allowed locations: %s",
+          tableIdentifier, tableArn, allowedLocations);
+    }
+  }
+


can we capture all of these in a dedicate util class

singhpk234 · 2026-04-25T01:26:27Z

+      // For S3 Tables catalogs, replace s3:// table locations with the constructed table ARN.
+      // s3tables:* IAM actions require ARN resources, not s3:// paths.
+      // Assumption: the federated catalog loadTable call has already succeeded at this point,
+      // and the CapturedConfigHolder contains the tableId from the remote response.
+      CatalogEntity catalogEntity = CatalogEntity.of(getResolvedCatalogEntity());
+      boolean isS3Tables = isS3TablesCatalog(catalogEntity);
+
+      if (isS3Tables && capturedTableId.isPresent()) {
+        String tableArn = constructS3TablesArn(catalogEntity, capturedTableId.get());
+        validateS3TablesArn(tableIdentifier, tableArn, catalogEntity);
+        tableLocations = Set.of(tableArn);
+        LOGGER
+            .atDebug()
+            .addKeyValue("tableIdentifier", tableIdentifier)
+            .addKeyValue("tableArn", tableArn)
+            .log("Replaced table locations with S3 Tables ARN for credential vending");
+      } else if (isS3Tables) {
+        // Fail closed: S3 Tables catalogs require a tableId to construct the table ARN
+        // for scoped credential vending. Without it, we cannot generate a properly scoped
+        // IAM session policy.
+        throw new BadRequestException(
+            "Cannot vend credentials for S3 Tables table '%s': "
+                + "no tableId was captured from the remote catalog response. "
+                + "Ensure the remote S3 Tables endpoint returns tableId in the loadTable config.",
+            tableIdentifier);
+      }


I am a bit confused, wouldn't this be completely inside, since this is what marks this catalog as a federated catalog ?

!(baseCatalog instanceof IcebergCatalog)

singhpk234 · 2026-04-30T16:15:41Z

can you resolve the conflicts @aritragster

singhpk234

This is looking close @aritragster !
can you please rebase this pr !

singhpk234 · 2026-04-30T16:18:00Z

+    AwsS3TablesStorageConfigurationInfo storageConfig = config();
+    String region = storageConfig.getRegion();
+    int durationSeconds = realmConfig.getConfig(STORAGE_CREDENTIAL_DURATION_SECONDS);
+
+    StorageAccessConfig.Builder accessConfig = StorageAccessConfig.builder();
+
+    // Generate s3tables:* session policy
+    IamPolicy policy =
+        buildS3TablesPolicy(storageConfig, allowedReadLocations, allowedWriteLocations);
+
+    // Role session name
+    boolean includePrincipalName =
+        realmConfig.getConfig(FeatureConfiguration.INCLUDE_PRINCIPAL_NAME_IN_SUBSCOPED_CREDENTIAL);
+    String roleSessionName =
+        includePrincipalName
+            ? AwsRoleSessionNameSanitizer.sanitize("polaris-" + polarisPrincipal.getName())
+            : "PolarisAwsS3TablesCredentialsStorageIntegration";
+
+    AssumeRoleRequest.Builder request =
+        AssumeRoleRequest.builder()
+            .externalId(storageConfig.getExternalId())
+            .roleArn(storageConfig.getRoleARN())
+            .roleSessionName(roleSessionName)
+            .policy(policy.toJson())
+            .durationSeconds(durationSeconds);
+
+    // Session tags support
+    List<String> sessionTagFieldNames =
+        realmConfig.getConfig(FeatureConfiguration.SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL);
+    Set<SessionTagField> enabledSessionTagFields =
+        sessionTagFieldNames.stream()
+            .map(SessionTagField::fromConfigName)
+            .flatMap(Optional::stream)
+            .collect(Collectors.toCollection(() -> EnumSet.noneOf(SessionTagField.class)));
+
+    if (!enabledSessionTagFields.isEmpty()) {
+      List<Tag> sessionTags =
+          buildSessionTags(
+              polarisPrincipal.getName(), credentialVendingContext, enabledSessionTagFields);
+      if (!sessionTags.isEmpty()) {
+        request.tags(sessionTags);
+        request.transitiveTagKeys(sessionTags.stream().map(Tag::key).collect(Collectors.toList()));
+      }
+    }
+
+    credentialsProvider.ifPresent(
+        cp -> request.overrideConfiguration(b -> b.credentialsProvider(cp)));
+
+    @SuppressWarnings("resource")
+    StsClient stsClient =
+        stsClientProvider.stsClient(StsDestination.of(storageConfig.getStsEndpointUri(), region));
+
+    AssumeRoleResponse response = stsClient.assumeRole(request.build());
+    accessConfig.put(StorageAccessProperty.AWS_KEY_ID, response.credentials().accessKeyId());
+    accessConfig.put(
+        StorageAccessProperty.AWS_SECRET_KEY, response.credentials().secretAccessKey());
+    accessConfig.put(StorageAccessProperty.AWS_TOKEN, response.credentials().sessionToken());
+    Optional.ofNullable(response.credentials().expiration())
+        .ifPresent(
+            i -> {
+              accessConfig.put(
+                  StorageAccessProperty.EXPIRATION_TIME, String.valueOf(i.toEpochMilli()));
+              accessConfig.put(
+                  StorageAccessProperty.AWS_SESSION_TOKEN_EXPIRES_AT_MS,
+                  String.valueOf(i.toEpochMilli()));
+            });
+
+    if (region != null) {
+      accessConfig.put(StorageAccessProperty.CLIENT_REGION, region);
+    }
+
+    refreshCredentialsEndpoint.ifPresent(
+        endpoint ->
+            accessConfig.put(StorageAccessProperty.AWS_REFRESH_CREDENTIALS_ENDPOINT, endpoint));
+
+    return accessConfig.build();


This is mostly same as S3, i would say can we move this to AWSUtil or something ? we can pass the policy string & prinicpal name as args for this since these are the one which differ from s3

singhpk234 · 2026-04-30T16:19:44Z

+public class CapturedConfigHolder {
+
+  /** Config key for the S3 Tables table identifier in the loadTable response config section. */
+  public static final String TABLE_ID_CONFIG_KEY = "tableId";


This is s3Table key, can you prefix it with that or you can have

Map<String, Map<String, String>> as well

singhpk234 · 2026-04-30T16:20:27Z

+/**
+ * A delegating wrapper around the Iceberg {@link RESTClient} that intercepts responses to extract
+ * the {@code config} section from loadTable responses. When a {@link LoadTableResponse} is
+ * received, the config map (containing {@code tableId} for S3 Tables) is captured and stored in the


Lets make this generic ! just say intercepting the raw rest responses !

flyingImer · 2026-04-30T17:45:22Z

Coming back after the latest round. You addressed my 4-01 blockers and singhpk234 is on the finish line for local issues. Before this lands, three structural things.

1. StorageTypeFileIO: the current resolution opts out of the contract, it does not fix it.

My 4-01 point was that S3 and S3_TABLES both mapping to S3FileIO breaks the reverse-lookup assumption (FileIO impl -> storage type). The resolution here sets validateAllowedStorageType=false on S3_TABLES so it is excluded from the reverse map. The contract is still "FileIO impl uniquely determines storage type", with S3_TABLES as an exception.

Before this PR, the validate=false escape was only used by IN_MEMORY (test FileIO). Extending it to a production storage type is a different character of decision.

The next S3-family storage type hits the same shape. The structural fix is a dedicated discriminator (the StorageType enum is already there) and dropping FileIO class -> storage type reverse lookup in the validation path.

Two ways to land this, my preference is A:

A (preferred): do the discriminator refactor in this PR. StorageType enum already exists, the scope is the validation path's reverse lookup.
B (acceptable): file a follow-up issue and add a TODO next to S3_TABLES(..., false) citing the issue, with a commit to do the refactor before another S3-family type lands. This matches singhpk234's 4-25 pattern for AwsS3TablesCredentialsStorageIntegration:218.

The one outcome I would push back on is skipping A without filing the follow-up.

2. ConfigCapturingHTTPClient patches an Iceberg-library API gap from inside Polaris.

You laid this out on 4-27: tableId sits in the remote loadTable response's config, but RESTCatalog consumes config internally and never exposes it on BaseTable or TableMetadata. So we wrap the HTTP client to intercept the raw response.

Three issues compound here:

brittle against Iceberg library version bumps, since internal config consumption is not a stable boundary
narrow: it hooks on LoadTableResponse specifically. Any other response type from another endpoint needs its own wrapper.
the problem is upstream in Iceberg, the maintenance cost lands in Polaris

You already considered the upstream Iceberg fix on 4-27 and ruled it out as long-lead and not guaranteed to land. I would flip that call. Concrete ask:

file the Iceberg issue now (happy to help draft if useful)
add a TODO or @see on ConfigCapturingHTTPClient class Javadoc linking the upstream issue, so future readers see this is a temporary shim
keep the interception in this PR on that footing, not as a permanent mechanism

Shipping it without the upstream link signs Polaris up for fork-shaped debt inside OSS.

singhpk234's 4-25 comment on IcebergRESTExternalCatalogFactory:69 is the same instinct at the local level. Worth lifting it to the boundary.

3. Sequencing vs #3699.

You offered on 3-25 to rebase on #3699 once it lands, or to collaborate with tokoko so the refactoring accounts for S3 Tables. +1 to that and to dimas-b 4-13. Let's hold to that sequencing rather than flip to merge-first.

If this merges first, #3699 either carries S3 Tables semantics through the refactor or leaves the S3 Tables path as a legacy branch. Either way the "one coherent vending abstraction" outcome gets worse.

If #3699's abstraction can express ARN-based resources alongside path-based ones, which you flagged as a fit on 3-25, S3 Tables drops in cleanly. Better landing than branching-in-S3 now and retrofitting later.

Not trying to block. Iteration since the 3-24 draft has moved from if/else branching inside S3 logic to a dedicated S3 Tables integration path. But the discriminator contract and the interception layer are shapes we live with long after this merges, and I would rather get the shape right once.

- Add StorageType.S3_TABLES with 'arn:' prefix for ARN-based locations - Add AwsS3TablesStorageConfigurationInfo config class - Add AwsS3TablesCredentialsStorageIntegration generating s3tables:* IAM policies - Add AwsS3TablesStorageConfigInfo OpenAPI schema - Wire S3_TABLES into CatalogEntity, StorageIntegrationProvider, StorageTypeFileIO - Add CapturedConfigHolder and ConfigCapturingHTTPClient for tableId capture from federated loadTable responses (needed for table ARN construction) - Add ARN validation to prevent credential scoping outside allowed locations - 17 unit tests covering config, integration, holder, and HTTP client

- StorageTypeFileIO: set S3_TABLES validateAllowedStorageType=false to avoid collision with S3 in FileIO reverse lookup (flyingImer) - IcebergCatalogHandler: fail closed with BadRequestException when S3_TABLES catalog has no captured tableId (flyingImer) - PolarisStorageConfigurationInfo: narrowed S3_TABLES prefix from 'arn:' to 'arn:aws:s3tables' (singhpk234) - CatalogEntity: moved S3_TABLES block next to S3, updated base location validation prefix (singhpk234) - IcebergCatalogHandler: extracted isS3TablesCatalog() private method with assumption comment (singhpk234)

- Extract arn:aws:s3tables to S3_TABLES_ARN_PREFIX constant - Move S3_TABLES next to S3 in enum/switch ordering - Extract S3 Tables helpers to dedicated S3TablesUtil class - Remove capturedTableId method parameter (read from request-scoped bean) - Guard S3 Tables logic explicitly within federated catalog path - Add IAM action constants and TABLE_ID_CONFIG_KEY constant - Reference apache#4302 for validateAccessToLocations follow-up

- Extract shared STS assume-role logic to AwsStsUtil (used by both S3 and S3 Tables integrations, eliminating code duplication) - Rename TABLE_ID_CONFIG_KEY to S3TABLES_TABLE_ID_CONFIG_KEY - Make ConfigCapturingHTTPClient Javadoc generic, add TODO linking upstream Iceberg issue apache/iceberg#16399 - Add TODO on StorageTypeFileIO linking apache#4486 - Move S3_TABLES case before GCS in provider switch ordering - Fix duplicate CLIENT_REGION/refreshEndpoint in S3 integration (util handles it in STS path, else-block handles non-STS path) - Add @nullable annotations on AwsStsUtil parameters - Fix IcebergRESTFederatedCatalogFactory constructor name mismatch - Clean up dead code in IcebergCatalogHandler (inline ARN logic replaced by S3TablesUtil.resolveTableLocations) - Add AwsStsUtilTest (12 tests) and S3TablesUtilTest (9 tests)

aritragster · 2026-05-18T23:26:56Z

@flyingImer @singhpk234 I've addressed all outstanding feedback, rebased on latest main, and filed the upstream issues. Here's the summary:

Issues Filed

Iceberg upstream: Expose server-assigned resource identifiers (tableId) in LoadTableResponse REST API iceberg#16399 — Expose tableId natively in LoadTableResponse (removes need for ConfigCapturingHTTPClient long-term)
Polaris follow-up: Refactor StorageTypeFileIO discriminator to support S3 Tables without opt-out #4486 — Refactor StorageTypeFileIO discriminator to support S3 Tables without opt-out

Review Feedback Addressed

@singhpk234's feedback:

S3_TABLES case moved before GCS in PolarisStorageIntegrationProviderImpl switch
TABLE_ID_CONFIG_KEY renamed to S3TABLES_TABLE_ID_CONFIG_KEY
Shared STS assume-role logic extracted to AwsStsUtil — both AwsCredentialsStorageIntegration (S3) and AwsS3TablesCredentialsStorageIntegration now use it, eliminating duplication
ConfigCapturingHTTPClient Javadoc made generic (removed S3 Tables-specific reference)
Rebased on latest main, all conflicts resolved

@flyingImer's feedback:

TODO added on ConfigCapturingHTTPClient linking Expose server-assigned resource identifiers (tableId) in LoadTableResponse REST API iceberg#16399
TODO added on StorageTypeFileIO S3_TABLES entry linking Refactor StorageTypeFileIO discriminator to support S3 Tables without opt-out #4486

Additional cleanup:

Fixed IcebergRESTFederatedCatalogFactory constructor name mismatch (pre-existing bug exposed by rebase)
Removed dead inline ARN logic from IcebergCatalogHandler (already encapsulated in S3TablesUtil.resolveTableLocations)
Added @Nullable annotations on AwsStsUtil parameters
Fixed duplicate CLIENT_REGION/refreshCredentialsEndpoint setting in S3 integration (util handles STS path, else-block handles non-STS path)

New Tests

AwsStsUtilTest — 12 tests covering assume-role behavior (parameters, credentials, expiration, region, refresh endpoint, session names, session tags)
S3TablesUtilTest — 9 tests covering ARN construction, validation, fail-closed behavior, and type detection

On Sequencing with #3699

I've been tracking that PR. It currently has no formal approvals, an unresolved cache design discussion (LoadingCache vs Supplier-based Caffeine), and a pending dev ML vote on SPI removals. There's no clear timeline for it to land.

This PR is self-contained and additive (~1,200 lines including tests, primarily new files). When #3699 lands with its LocationGrant model and StorageIntegration SPI, my S3 Tables integration becomes a clean new implementation of that interface — the AwsS3TablesCredentialsStorageIntegration maps directly to the new abstraction, and AwsStsUtil remains shared. I commit to doing that adaptation when #3699 merges.

Merging this first doesn't add debt to the refactor path. It just means S3 Tables credential vending ships sooner.

dimas-b · 2026-05-22T00:08:44Z

I'm fine merging this before #3699 ... please resolve conflicts for CI to run and enable the next review round 👍

github-project-automation Bot added this to Basic Kanban Board Mar 24, 2026

github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Mar 24, 2026

dimas-b reviewed Mar 24, 2026

View reviewed changes

aritragster marked this pull request as draft March 25, 2026 03:59

singhpk234 requested review from flyrain and singhpk234 March 27, 2026 21:40

aritragster force-pushed the s3-tables-credential-vending branch from 40e568e to 8934c19 Compare March 31, 2026 17:06

singhpk234 reviewed Apr 1, 2026

View reviewed changes

flyingImer reviewed Apr 1, 2026

View reviewed changes

tokoko mentioned this pull request Apr 11, 2026

core: Credential Vending Refactor #3699

Open

singhpk234 reviewed Apr 25, 2026

View reviewed changes

aritragster mentioned this pull request Apr 27, 2026

Implement validateAccessToLocations for S3 Tables storage integration #4302

Open

3 tasks

singhpk234 marked this pull request as ready for review April 30, 2026 16:15

singhpk234 reviewed Apr 30, 2026

View reviewed changes

This was referenced May 18, 2026

Refactor StorageTypeFileIO discriminator to support S3 Tables without opt-out #4486

Open

Expose server-assigned resource identifiers (tableId) in LoadTableResponse REST API apache/iceberg#16399

Open

aritragster added 4 commits May 18, 2026 13:53

aritragster force-pushed the s3-tables-credential-vending branch from 659e915 to c8817c5 Compare May 18, 2026 23:23


		FILE("org.apache.iceberg.hadoop.HadoopFileIO", false),

		S3_TABLES("org.apache.iceberg.aws.s3.S3FileIO", true),

		// No-op stub for S3 Tables — real validation deferred to a follow-up PR.
		return Map.of();

Conversation

aritragster commented Mar 24, 2026

Summary

Problem

Approach

Files Changed (15 files, ~570 lines)

AI Disclosure

Checklist

Uh oh!

dimas-b left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

flyrain commented Mar 27, 2026

Uh oh!

flyingImer commented Mar 28, 2026

Uh oh!

singhpk234 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

flyingImer commented Apr 1, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

singhpk234 commented Apr 30, 2026

Uh oh!

singhpk234 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

flyingImer commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

flyingImer commented Apr 30, 2026 •

edited

Loading