feat(gcp): Add Hierarchical Namespace (HNS) support for GCS bucket #3996

Open
varunarya002 wants to merge 1 commit into apache:main from varunarya002:gcp_hns_bucket

Conversation

@varunarya002 varunarya002 commented Mar 13, 2026

Problem

GCS buckets with Hierarchical Namespace (HNS) enabled treat folders as first-class resources, not name prefixes. Iceberg table creation and Spark ingestion fail on HNS buckets because:

  1. Folder objects must exist before files can be written into them.
  2. Folder creation requires storage.folders.create — not granted by the standard write roles in our vended credentials.

Approach

Two complementary changes, with a clear split between server-side and client-side responsibility:

1. Server-side: pre-createTable hook to materialize the folder hierarchy

A new SPI, prepareLocations(List<String>), is exposed on PolarisStorageIntegration with a no-op default. GCS overrides it.

IcebergCatalogHandler.createTableDirect() and createTableStaged() now invoke integration.prepareLocations([tableLocation, metadataLocation, dataLocation]) before delegating to Iceberg's Catalog.createTable().

The GCS implementation (GcsStorageLocationPreparer):

  • Calls Storage.get(bucket, HIERARCHICAL_NAMESPACE) to detect HNS per bucket (auto-detection — HNS is bucket-immutable, so it's safe to call once per request; caching is a follow-up).
  • For HNS buckets only, walks the path hierarchy (warehouse → warehouse/ns1 → warehouse/ns1/table1) and calls StorageControlClient.createFolder() for each segment, idempotent against AlreadyExistsException.
  • For non-HNS buckets, no-op.

Generic URI parsing and hierarchy-building live in HierarchicalFolderLocationPreparer (base class) so Azure ADLS Gen2 can reuse them later.
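
To illustrate the hierarchy walk described above, here is a minimal sketch of how a table location might be expanded into the folder paths an HNS bucket needs. The class and method names (FolderHierarchySketch, folderHierarchy, bucketOf) are hypothetical and are not taken from the PR; the actual folder creation through StorageControlClient is left out, so this shows only the cloud-agnostic path logic.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: expands a gs:// location into the folders an HNS bucket would need. */
final class FolderHierarchySketch {

  /** Returns cumulative folder paths, shallowest first, for the path part of a location. */
  static List<String> folderHierarchy(String location) {
    String path = URI.create(location).getPath();
    List<String> folders = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String segment : (path == null ? "" : path).split("/")) {
      if (segment.isEmpty()) {
        continue; // skips the leading slash and any doubled slashes
      }
      if (current.length() > 0) {
        current.append('/');
      }
      current.append(segment);
      folders.add(current.toString());
    }
    return folders;
  }

  /** The bucket is the URI authority, e.g. "my-bucket" in gs://my-bucket/warehouse. */
  static String bucketOf(String location) {
    return URI.create(location).getAuthority();
  }

  public static void main(String[] args) {
    String location = "gs://my-bucket/warehouse/ns1/table1";
    System.out.println(bucketOf(location));
    System.out.println(folderHierarchy(location));
  }
}
```

Each returned path would then be passed to a create-folder call that tolerates "already exists", which is what makes repeated preparation of overlapping locations safe.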

2. Client-side: vended-credential scope adds folders.create + folders.get for HNS buckets

GcpCredentialsStorageIntegration.generateAccessBoundaryRules() now takes a Predicate<String> isHnsBucket. For each write bucket where the predicate is true, it adds a third access-boundary rule containing exactly two permissions: storage.folders.create and storage.folders.get, scoped via resource.name.startsWith('projects/_/buckets/<bucket>/folders/<path>').

This allows Spark to create deeper partition folders (e.g. for year=2024/month=01/) at write time without needing admin credentials, while non-HNS catalogs are unaffected (no additional permissions vended).

The narrow permission set (vs. the predefined roles/storage.folderAdmin role, which includes setIamPolicy/getIamPolicy/delete/rename/list) preserves least-privilege.
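
The predicate gating described above can be sketched roughly as follows. This is a simplified model under stated assumptions: FolderRule is a hypothetical stand-in for the real access-boundary rule type, the input map of write paths per bucket is invented for illustration, and no escaping of path characters inside the CEL string is attempted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only: gates a narrow folder rule on a per-bucket HNS predicate. */
final class HnsFolderRuleSketch {

  /** Hypothetical stand-in for an access-boundary rule (CEL condition plus permissions). */
  static final class FolderRule {
    final String bucket;
    final String expression;
    final List<String> permissions;

    FolderRule(String bucket, String expression, List<String> permissions) {
      this.bucket = bucket;
      this.expression = expression;
      this.permissions = permissions;
    }
  }

  static final List<String> FOLDER_PERMISSIONS =
      List.of("storage.folders.create", "storage.folders.get");

  /** Emits one folder rule per write bucket that the predicate reports as HNS. */
  static List<FolderRule> folderRules(
      Map<String, List<String>> writePathsByBucket, Predicate<String> isHnsBucket) {
    List<FolderRule> rules = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : writePathsByBucket.entrySet()) {
      String bucket = entry.getKey();
      if (!isHnsBucket.test(bucket)) {
        continue; // non-HNS buckets get no additional rule
      }
      List<String> conditions = new ArrayList<>();
      for (String path : entry.getValue()) {
        conditions.add(
            "resource.name.startsWith('projects/_/buckets/" + bucket + "/folders/" + path + "')");
      }
      rules.add(new FolderRule(bucket, String.join(" || ", conditions), FOLDER_PERMISSIONS));
    }
    return rules;
  }

  public static void main(String[] args) {
    Map<String, List<String>> writes = new LinkedHashMap<>();
    writes.put("hns-bucket", List.of("warehouse/ns1/table1"));
    writes.put("flat-bucket", List.of("warehouse/ns2/table2"));
    for (FolderRule rule : folderRules(writes, "hns-bucket"::equals)) {
      System.out.println(rule.bucket + ": " + rule.expression + " " + rule.permissions);
    }
  }
}
```

Passing the HNS check in as a Predicate<String> keeps the rule builder free of network calls, so tests can exercise mixed HNS and non-HNS catalogs with a plain lambda.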

Tests

  • HNS folder rule emission gated by predicate
  • HNS with multiple buckets and partial writes
  • HNS without writes (no folder rules)
  • HNS with separate metadata and data buckets
  • HNS predicate-gating (mixed HNS/non-HNS in one catalog)
  • Folder rule uses narrow folders.create + folders.get permissions (verified explicitly)
  • Non-HNS bucket emits no folder rule
  • GcsStorageLocationPreparer HNS detection and folder creation (full suite preserved from previous round)

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed
  • 🧪 Added/updated tests with good coverage
  • 💡 Added comments for complex logic
  • 🧾 Updated `CHANGELOG.md`
  • 📚 Updated documentation in `site/content/in-dev/unreleased`

@github-project-automation github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Mar 13, 2026
@varunarya002 varunarya002 changed the title feat(gcp): Add Hierarchical Namespace (HNS) support for GCS bucket cr… feat(gcp): Add Hierarchical Namespace (HNS) support for GCS bucket Mar 13, 2026
Contributor

@dimas-b dimas-b left a comment

Thanks for your contribution, @varunarya002 !

Since this PR affects REST API parameters, please open a corresponding discussion on the dev ML, which is a usual practice in Polaris.

(this is not a complete review from my side 😅 ... just noting a couple of points for a start)

Comment thread spec/polaris-management-service.yml Outdated
gcsServiceAccount:
  type: string
  description: a Google cloud storage service account
hierarchicalNamespace:
Contributor

nit: a similar field in Azure is called simply hierarchical

Author

The hierarchicalNamespace field has been removed from the OpenAPI spec entirely. HNS is now auto-detected per bucket at credential-vending time, so no user-facing flag is needed (which also means a single catalog can serve a mix of HNS and non-HNS buckets correctly). If we add the flag back in a follow-up for any reason, we'll use hierarchical to match the Azure naming.

Contributor

dimas-b commented Apr 1, 2026

@varunarya002 : Do you have capacity to push this PR forward? If you do not have time, it's fine. We can build something based off your work and still give you attribution.

@varunarya002
Author

Hi @dimas-b. I will start a discussion about this change on the dev ML.

@varunarya002 varunarya002 force-pushed the gcp_hns_bucket branch 5 times, most recently from 918e6d4 to f43e793 April 5, 2026 08:32
JsonNode parsedRules = mapper.convertValue(credentialAccessBoundary, JsonNode.class);
JsonNode refRules = readResource(mapper, "gcp-testGenerateAccessBoundaryHnsEnabled.json");
assertThat(parsedRules)
    .usingRecursiveComparison(
Contributor

nit: does simple equals not work in this case?

Author

The recursive-comparison pattern matches the existing convention in this file — see testGenerateAccessBoundary (line 202) and its peers, which all use usingRecursiveComparison + the recursiveEquals helper. I kept the new HNS tests on the same pattern for consistency.

That said, the new HNS tests also have stricter assertions where it matters — e.g. testGenerateAccessBoundaryFolderRuleUsesNarrowPermissions explicitly asserts containsExactlyInAnyOrder("storage.folders.create", "storage.folders.get") and that the expression doesn't reference managedFolders/. The JSON fixtures cover the structural rule emission shape.


public StorageLocationPreparer create(@Nonnull PolarisStorageConfigurationInfo storageConfig) {
  if (storageConfig instanceof GcpStorageConfigurationInfo && storageConfiguration != null) {
    return new GcsStorageLocationPreparer(storageConfiguration.gcpCredentialsSupplier(clock));
Contributor

Could we use StorageAccessConfigProvider for storage credentials?

Author

It's not feasible because of different trust levels (admin creds for HNS folder creation vs subscoped client tokens).

Contributor

Hmmm... sorry, but I'm a bit confused now... Clients (e.g. Spark) should be able to create folders (for new data files with Iceberg partitioning) using the vended credentials, right?

Author

Yes, exactly — Spark does create folders with the vended credentials, and the new design preserves that. Two distinct folder-creation responsibilities, with different credential trust levels:

1. Server-side, at table creation time — using the catalog's service-account credentials (admin-level), GcsStorageLocationPreparer pre-creates the table's known top-level folders (table/, table/metadata/, table/data/). This happens in IcebergCatalogHandler.createTableDirect/createTableStaged before Iceberg writes the initial table metadata.

2. Client-side, at write time — Spark needs to create deeper folders that the server can't enumerate ahead of time (partition subdirectories like data/year=2024/month=01/). For this, the vended downscoped credentials now include storage.folders.create + storage.folders.get (narrowly scoped to the catalog's write locations) when the target bucket has HNS enabled. That's the change in GcpCredentialsStorageIntegration.generateAccessBoundaryRules().

Earlier I conflated these two paths in my StorageAccessConfigProvider reply — apologies for the confusion. The vended credentials do let Spark create folders; they just don't include the unrelated admin operations that roles/storage.folderAdmin would have granted (e.g. setIamPolicy).

() ->
    Optional.ofNullable(
        tableProperties.get(
            IcebergTableLikeEntity.USER_SPECIFIED_WRITE_METADATA_LOCATION_KEY)))
Contributor

This does not look specific to GCS. Could we handle this part in a more general way?

Author

Extracted AbstractStorageLocationPreparer base class with generic URI parsing, hierarchy building, bucket grouping.

import java.util.Map;

public interface StorageLocationPreparer {
  void prepareTableLocation(String tableLocation, Map<String, String> tableProperties);
Contributor

I wonder if we could simplify this SPI 🤔 I suppose we could simply request a certain set of folders to be created. Do you envision more sophisticated work to be done (in some future cases) for preparing table locations?

Author

Changed to void prepareLocations(List<String>), so no Iceberg concepts leak into the SPI.

}

public StorageLocationPreparer create(@Nonnull PolarisStorageConfigurationInfo storageConfig) {
  if (storageConfig instanceof GcpStorageConfigurationInfo && storageConfiguration != null) {
Contributor

@dimas-b dimas-b Apr 8, 2026

Let's leverage CDI via storage type-based @Identifier annotations... similar to ServiceProducers.polarisAuthorizerFactory()

Author

Factory uses Instance<StorageLocationPreparer> lookup by StorageType.name(). GCS preparer annotated with @Identifier("GCS").

Contributor

dimas-b commented Apr 16, 2026

CredentialAccessBoundary.AccessBoundaryRule.AvailabilityCondition.newBuilder()
    .setExpression(String.join(" || ", folderConditions))
    .build());
builder.setAvailablePermissions(List.of("inRole:roles/storage.folderAdmin"));
Contributor

We add this unconditionally in this code, but it is required only for HNS, right?

I suppose it would be more robust to check resolveHnsStatus() here or have an HNS flag in the Storage Config.... I'm kind of leaning toward the flag, even though we can auto-detect this... WDYT?

My thinking is that users are generally aware of the HNS flag in their GCS storage (or at least they should be) and it is not likely to change at runtime.

Contributor

I agree with @dimas-b 's comment - is it a concern that we are introducing this permission for any GCS backed tables? That IMO goes against the principle of granting least-privileged access, so I think it would be helpful to review if that exposes any unnecessary actions for non HNS enabled tables.

Author

Addressed in the latest force-push. Two changes here:

  1. The folder rule is now gated on per-bucket HNS auto-detection: generateAccessBoundaryRules() takes a Predicate<String> isHnsBucket and only emits the folder rule for buckets where it returns true. GcpCredentialsStorageIntegration.isHnsBucket() queries Storage.get(bucket, HIERARCHICAL_NAMESPACE) per bucket. We went with auto-detection rather than a config flag so a single catalog can serve a mix of HNS and non-HNS buckets correctly. A catalog's default-base-location is often one bucket, but write.data.path/write.metadata.path can point elsewhere; auto-detection handles that per bucket without admins having to enumerate which buckets are HNS.

  2. Permissions narrowed — replaced inRole:roles/storage.folderAdmin with the two individual permissions we actually need: storage.folders.create and storage.folders.get. The role would have additionally granted setIamPolicy/getIamPolicy/delete/rename/list on both folders/ and managedFolders/ resources, none of which clients need for write paths.

Also dropped the managedFolders/ resource condition — we use the Storage Control API's folders/ resource path, not managed folders.

Author

Addressed. The folder rule now uses individual permissions (storage.folders.create + storage.folders.get) instead of inRole:roles/storage.folderAdmin, and is only emitted for buckets where HNS is detected via an injected Predicate<String> isHnsBucket. Non-HNS GCS-backed tables get no extra permissions, restoring least-privilege.

The managedFolders/ resource condition was also dropped since we use HNS folders (folders/), not managed folders.

Contributor

@sungwy sungwy left a comment

Hi @varunarya002 thank you so much for putting together this PR. The new StorageLocationPreparer abstraction is a highly reusable idea, and I think it is directionally correct. I took a first pass at reviewing this PR and left some comments.

Also, could we add a bit more context in the PR description about this model? We are effectively introducing a pre-createTable hook into the table creation flow, so it might be worth calling that out explicitly.

 * resolve folder hierarchies, group by bucket, and delegate to {@link
 * #createFoldersForBucket(String, List)} for storage-specific operations.
 */
public abstract class AbstractStorageLocationPreparer implements StorageLocationPreparer {
Contributor

@sungwy sungwy Apr 20, 2026

I feel this name is a bit too abstract (pun intended 🙂). Would it make sense to give it a more descriptive name like HierarchicalFolderLocationPreparer or ObjectStoreFolderPreparer?

I also wonder whether introducing this abstraction is a bit premature. Do we expect these methods to be reusable across other cloud providers, or is this really tailored to the GCS HNS case for now?

If it is mainly the latter, would it make sense to fold this into GcsStorageLocationPreparer for now and only extract a shared abstraction once we have a second provider implementation and can validate the actual common ground?

Author

Renamed to HierarchicalFolderLocationPreparer — using your suggestion verbatim. We kept the abstract base (rather than folding it into GcsStorageLocationPreparer) because Azure ADLS Gen2 is the obvious next reuse case (also has hierarchical-namespace folder semantics), and the generic URI parsing + path-hierarchy logic is genuinely cloud-agnostic.

  return NO_OP;
}
String key = storageConfig.getStorageType().name();
Instance<StorageLocationPreparer> selected = preparers.select(Identifier.Literal.of(key));
Contributor

I like the StorageLocationPreparer as an abstraction, but instead of introducing it as a separate standalone selection path, would it make more sense for it to be exposed by PolarisStorageIntegration instead?

PolarisStorageIntegration is already selected from the storage configuration type, so grouping storage-specific behavior there seems cleaner and makes the config-driven selection more obvious and consistent.

I do think it still makes sense to model StorageLocationPreparer as a separate capability. I’m just wondering whether PolarisStorageIntegration should be the single interface that exposes that capability, rather than introducing a parallel factory and dispatch path for storage-specific behavior.

Author

Done. prepareLocations(List<String>) is now a method on PolarisStorageIntegration with a no-op default; non-GCS storage types pay nothing.

GcpCredentialsStorageIntegration takes a Consumer<List<String>> folderPreparer constructor parameter and delegates prepareLocations to it. PolarisStorageIntegrationProviderImpl (which already switches on storage type) injects GcsStorageLocationPreparer when constructing the GCP integration.

This deletes StorageLocationPreparerFactory and the @Identifier("GCS") CDI lookup entirely — single selection mechanism (the provider's type switch), and the module boundary is respected: polaris-core only declares the interface signature, the runtime-side preparer with StorageControlClient stays in runtime/service. IcebergCatalogHandler calls integration.prepareLocations(locations) directly.
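
The delegation shape described in this reply could look roughly like the sketch below. StorageIntegration, GcpIntegration and S3Integration are simplified, hypothetical stand-ins for PolarisStorageIntegration and its subclasses; only the no-op default and the Consumer<List<String>> constructor parameter come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: a no-op default hook that one storage type overrides by delegation. */
final class PrepareLocationsSketch {

  /** Simplified stand-in for PolarisStorageIntegration; the real class has more members. */
  abstract static class StorageIntegration {
    /** No-op by default, so storage types without folder semantics do nothing. */
    public void prepareLocations(List<String> locations) {}
  }

  /** Stand-in for the GCP integration: forwards to an injected folder preparer. */
  static final class GcpIntegration extends StorageIntegration {
    private final Consumer<List<String>> folderPreparer;

    GcpIntegration(Consumer<List<String>> folderPreparer) {
      this.folderPreparer = folderPreparer;
    }

    @Override
    public void prepareLocations(List<String> locations) {
      folderPreparer.accept(locations);
    }
  }

  /** Stand-in for a storage type that keeps the inherited no-op. */
  static final class S3Integration extends StorageIntegration {}

  public static void main(String[] args) {
    List<String> prepared = new ArrayList<>();
    StorageIntegration gcp = new GcpIntegration(prepared::addAll);
    gcp.prepareLocations(List.of("gs://b/ns/t", "gs://b/ns/t/metadata", "gs://b/ns/t/data"));
    new S3Integration().prepareLocations(List.of("s3://b/ns/t"));
    System.out.println(prepared); // only the GCP locations were forwarded
  }
}
```

The caller never checks the storage type; it calls prepareLocations on whatever integration the provider selected, which is the single selection path the reply refers to.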

Contributor

dimas-b commented May 5, 2026

Hi @varunarya002 : Thanks again for this PR. Do you have capacity to resolve conflicts and comments?

Iceberg table creation and Spark ingestion against HNS-enabled GCS buckets
previously failed: HNS buckets treat folders as first-class resources that
must exist before nested files can be written, and the vended credentials
lacked the IAM permissions to create those folders.

Two-part design:

1. Server-side pre-createTable hook
   - New `PolarisStorageIntegration.prepareLocations(List<String>)` method
     with a no-op default. Non-GCS storage types pay nothing.
   - `IcebergCatalogHandler.createTableDirect()` and `createTableStaged()`
     invoke it with `[tableLocation, metadata, data]` before Iceberg writes
     the initial table metadata.
   - GCS impl (`GcsStorageLocationPreparer`) auto-detects HNS per bucket
     and walks the path hierarchy via `StorageControlClient.createFolder()`.
     Idempotent against `AlreadyExistsException`.
   - Generic URI parsing and hierarchy-building live in
     `HierarchicalFolderLocationPreparer` so Azure ADLS Gen2 can reuse it.

2. Vended-credential scope on HNS buckets only
   - `GcpCredentialsStorageIntegration.generateAccessBoundaryRules()` now
     accepts a `Predicate<String> isHnsBucket`. Auto-detection at
     credential-vending time via `Storage.get(bucket).getHierarchicalNamespace()`.
   - For HNS write buckets only, emits a narrowly-scoped folder rule with
     permissions `storage.folders.create` + `storage.folders.get` (vs. the
     overly-broad `roles/storage.folderAdmin` role), conditioned on
     `resource.name.startsWith('.../folders/<path>')`.
   - Non-HNS buckets receive no additional permissions (least-privilege).
   - `managedFolders/` condition dropped — we don't use managed folders.

Co-Authored-By: claude-flow <ruv@ruv.net>
@varunarya002
Author

Thanks @dimas-b and @sungwy for the thorough review. Just pushed a v2 that addresses every open thread. Summary of changes since the last round:

Design changes

  • HNS is now auto-detected per bucket at credential-vending time via Storage.get(bucket, HIERARCHICAL_NAMESPACE). generateAccessBoundaryRules() accepts a Predicate<String> isHnsBucket and the folder rule is emitted only for HNS buckets. Non-HNS buckets get no additional permissions — least-privilege preserved.
  • The folder rule uses narrow individual permissions storage.folders.create + storage.folders.get, replacing the previous overly-broad roles/storage.folderAdmin. Dropped the managedFolders/ resource condition entirely (we use the Storage Control API's folders/ resource, not managed folders).
  • StorageLocationPreparer is now exposed via PolarisStorageIntegration.prepareLocations(List<String>) with a no-op default. The standalone StorageLocationPreparerFactory and @Identifier("GCS") CDI lookup are deleted — single selection path through PolarisStorageIntegrationProviderImpl's existing storage-type switch. Module boundary respected: only the interface signature lives in polaris-core; the runtime-side preparer with StorageControlClient stays in runtime/service.
  • AbstractStorageLocationPreparer renamed to HierarchicalFolderLocationPreparer (using @sungwy's suggested name). Kept as a base class because Azure ADLS Gen2 is the obvious next reuse case — the URI parsing + path-hierarchy logic is genuinely cloud-agnostic.

PR description rewrite: now explicitly explains the pre-createTable hook model and the server-side / client-side responsibility split, per @sungwy's ask.

Rebase: branch is rebased onto current apache/main (was 200+ commits behind). Force-pushed as one squashed commit. Per-thread replies posted on every open comment.

Tests added for predicate gating, narrow-permission shape, multi-bucket scenarios, and the mixed HNS/non-HNS-in-one-catalog case. All build green locally on Java 21 (:polaris-core:test and :polaris-runtime-service:test --tests "*GcsStorageLocationPreparerTest*").

Ready for another look when you have time.

Contributor

@dimas-b dimas-b left a comment

Thanks for pushing this PR forward, @varunarya002 ! Some more comments below.

 * subsequent writes fail with a clear 403, rather than silently over-permissioning.
 */
@VisibleForTesting
boolean isHnsBucket(String bucket) {
Contributor

I guess it might be worth covering this with a FeatureConfiguration flag (enabled by default) in case some prior deployments do not wish to execute the extra call for some internal reason... WDYT?

Contributor

Alternatively we could add a GCS storage config property hierarchical with values yes, no, auto (default)... WDYT? This could be done in a separate PR, of course.
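
One possible reading of the yes/no/auto suggestion is sketched below. Nothing here exists in the PR: the HierarchicalMode enum, its parse method and the isHierarchical helper are hypothetical names, and the detector predicate stands in for the per-bucket lookup that auto mode would perform.

```java
import java.util.Locale;
import java.util.function.Predicate;

/** Illustrative only: a tri-state "hierarchical" setting that falls back to detection. */
final class HierarchicalModeSketch {

  /** Hypothetical values for a GCS storage config property named "hierarchical". */
  enum HierarchicalMode {
    YES,
    NO,
    AUTO;

    /** Missing or blank values default to AUTO; unknown values throw IllegalArgumentException. */
    static HierarchicalMode parse(String value) {
      if (value == null || value.isBlank()) {
        return AUTO;
      }
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
  }

  /** Only AUTO consults the detector, so YES and NO avoid the extra bucket lookup. */
  static boolean isHierarchical(HierarchicalMode mode, String bucket, Predicate<String> detector) {
    switch (mode) {
      case YES:
        return true;
      case NO:
        return false;
      default:
        return detector.test(bucket);
    }
  }

  public static void main(String[] args) {
    Predicate<String> detector = bucket -> bucket.startsWith("hns-");
    System.out.println(isHierarchical(HierarchicalMode.parse(null), "hns-data", detector));
    System.out.println(isHierarchical(HierarchicalMode.parse("no"), "hns-data", detector));
  }
}
```

With this shape, deployments that do not want the extra call can pin the setting to yes or no, while the default keeps the auto-detection behavior.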

  return;
}
PolarisStorageIntegration<?> integration =
    storageIntegrationProvider().getStorageIntegrationForConfig(storageConfig);
Contributor

This looks like a conflict WRT #3699 , which removes this method 🤔

I tend to think that #3699 is beneficial and would like that refactoring to land.

@varunarya002 : Will @tokoko 's approach work for you?

 * resolve folder hierarchies, group by bucket, and delegate to {@link
 * #createFoldersForBucket(String, List)}.
 */
public abstract class HierarchicalFolderLocationPreparer implements Consumer<List<String>> {
Contributor

Let's define a dedicated interface for Consumer<List<String>> for ease of navigation in IDEs.

  controlClient.createFolder(request);
  LOGGER.atDebug().addKeyValue("folder", folderPath).log("Created HNS folder");
} catch (AlreadyExistsException e) {
  LOGGER.atDebug().addKeyValue("folder", folderPath).log("HNS folder already exists, skipping");
Contributor

Is folder actually logged even though the message does not reference it? (just to double check).

 * Hierarchical Namespace enabled) override this. Passed locations include the table location and
 * its metadata/data subpaths.
 */
public void prepareLocations(@Nonnull List<String> locations) {}
Contributor

Passing a note from artificial helpers:

High: the new pre-create hook runs before Polaris has validated the requested table locations, so a rejected create request can still mutate GCS. In both direct and staged flows, prepareStorageForTable(...) is invoked before any of the location checks that happen later in IcebergCatalog (validateLocationsForTableLike, overlap checks, metadata-in-table-dir checks). That means a caller can point location, write.metadata.location, or write.data.location at a path Polaris should reject, and Polaris will still try to create HNS folders there with the service account first. See runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:252, runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:535, runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:658, and the later validation in runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java:1541.

  integration.prepareLocations(resolveStorageLocations(effectiveLocation, props));
}

private @Nullable String resolveTableLocation(
Contributor

Passing feedback from artificial helpers:

High: resolveTableLocation() guesses the default table path from catalog.getBaseLocation(), but actual table creation does not use that rule. For implicit locations, Iceberg resolves through namespace-aware logic in defaultWarehouseLocation(...), including namespace-specific locations; staged create also derives the final location from the catalog builder. As written, HNS folder creation will target the wrong path whenever the namespace has its own location or other catalog-side location logic applies, so the new feature breaks exactly for non-default namespace layouts. See runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:274, runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:576, and the real location resolution in runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java:366.
