feat(gcp): Add Hierarchical Namespace (HNS) support for GCS bucket #3996
feat(gcp): Add Hierarchical Namespace (HNS) support for GCS bucket #3996varunarya002 wants to merge 1 commit into
Conversation
0bde95d to
119b101
Compare
dimas-b
left a comment
There was a problem hiding this comment.
Thanks for your contribution, @varunarya002 !
Since this PR affects REST API parameters, please open a corresponding discussion on the dev ML, which is a usual practice in Polaris.
(this is not a complete review from my side 😅 ... just noting a couple of points for a start)
| gcsServiceAccount: | ||
| type: string | ||
| description: a Google cloud storage service account | ||
| hierarchicalNamespace: |
There was a problem hiding this comment.
nit: a similar field in Azure is called simply hierarchical
There was a problem hiding this comment.
The hierarchicalNamespace field has been removed from the OpenAPI spec entirely. HNS is now auto-detected per bucket at credential-vending time, so no user-facing flag is needed (which also means a single catalog can serve a mix of HNS and non-HNS buckets correctly). If we add the flag back in a follow-up for any reason, we'll use hierarchical to match the Azure naming.
|
@varunarya002 : Do you have capacity to push this PR forward? If you do not have time, it's fine. We can build something based of your work and still give you attribution. |
|
Hi @dimas-b. I will be starting discussion about this change on dev ML. |
918e6d4 to
f43e793
Compare
| JsonNode parsedRules = mapper.convertValue(credentialAccessBoundary, JsonNode.class); | ||
| JsonNode refRules = readResource(mapper, "gcp-testGenerateAccessBoundaryHnsEnabled.json"); | ||
| assertThat(parsedRules) | ||
| .usingRecursiveComparison( |
There was a problem hiding this comment.
nit: does simple equals not work in this case?
There was a problem hiding this comment.
The recursive-comparison pattern matches the existing convention in this file — see testGenerateAccessBoundary (line 202) and its peers, which all use usingRecursiveComparison + the recursiveEquals helper. I kept the new HNS tests on the same pattern for consistency.
That said, the new HNS tests also have stricter assertions where it matters — e.g. testGenerateAccessBoundaryFolderRuleUsesNarrowPermissions explicitly asserts containsExactlyInAnyOrder("storage.folders.create", "storage.folders.get") and that the expression doesn't reference managedFolders/. The JSON fixtures cover the structural rule emission shape.
|
|
||
| public StorageLocationPreparer create(@Nonnull PolarisStorageConfigurationInfo storageConfig) { | ||
| if (storageConfig instanceof GcpStorageConfigurationInfo && storageConfiguration != null) { | ||
| return new GcsStorageLocationPreparer(storageConfiguration.gcpCredentialsSupplier(clock)); |
There was a problem hiding this comment.
Could we use StorageAccessConfigProvider for storage credentials?
There was a problem hiding this comment.
It's not feasible because of different trust levels (admin creds for HNS folder creation vs subscoped client tokens).
There was a problem hiding this comment.
Hmmm... sorry, but I'm a bit confused now... Clients (e.g. Spark) should be able to create folders (for new data files with Iceberg partitioning) using the vended credentials, right?
There was a problem hiding this comment.
Yes, exactly — Spark does create folders with the vended credentials, and the new design preserves that. Two distinct folder-creation responsibilities, with different credential trust levels:
1. Server-side, at table creation time — using the catalog's service-account credentials (admin-level), GcsStorageLocationPreparer pre-creates the table's known top-level folders (table/, table/metadata/, table/data/). This happens in IcebergCatalogHandler.createTableDirect/createTableStaged before Iceberg writes the initial table metadata.
2. Client-side, at write time — Spark needs to create deeper folders that the server can't enumerate ahead of time (partition subdirectories like data/year=2024/month=01/). For this, the vended downscoped credentials now include storage.folders.create + storage.folders.get (narrowly scoped to the catalog's write locations) when the target bucket has HNS enabled. That's the change in GcpCredentialsStorageIntegration.generateAccessBoundaryRules().
Earlier I conflated these two paths in my StorageAccessConfigProvider reply — apologies for the confusion. The vended credentials do let Spark create folders; they just don't include the unrelated admin operations that roles/storage.folderAdmin would have granted (e.g. setIamPolicy).
| () -> | ||
| Optional.ofNullable( | ||
| tableProperties.get( | ||
| IcebergTableLikeEntity.USER_SPECIFIED_WRITE_METADATA_LOCATION_KEY))) |
There was a problem hiding this comment.
This does not look specific to GCS. Could we handle this part in a more general way?
There was a problem hiding this comment.
Extracted AbstractStorageLocationPreparer base class with generic URI parsing, hierarchy building, bucket grouping.
| import java.util.Map; | ||
|
|
||
| public interface StorageLocationPreparer { | ||
| void prepareTableLocation(String tableLocation, Map<String, String> tableProperties); |
There was a problem hiding this comment.
I wonder if we could simplify this SPI 🤔 I suppose we could simply request a certain set of folders to be created. Do you envision more sophisticated work to be done (in some future cases) for preparing table locations?
There was a problem hiding this comment.
Changed to void prepareLocations(List) — no Iceberg concepts leak into the SPI.
| } | ||
|
|
||
| public StorageLocationPreparer create(@Nonnull PolarisStorageConfigurationInfo storageConfig) { | ||
| if (storageConfig instanceof GcpStorageConfigurationInfo && storageConfiguration != null) { |
There was a problem hiding this comment.
Let's leverage CDI via storage type-based @Identifier annotations... similar to ServiceProducers.polarisAuthorizerFactory()
There was a problem hiding this comment.
Factory uses Instance lookup by StorageType.name(). GCS preparer annotated with @Identifier("GCS").
1ccdfce to
561f90d
Compare
|
Related |
| CredentialAccessBoundary.AccessBoundaryRule.AvailabilityCondition.newBuilder() | ||
| .setExpression(String.join(" || ", folderConditions)) | ||
| .build()); | ||
| builder.setAvailablePermissions(List.of("inRole:roles/storage.folderAdmin")); |
There was a problem hiding this comment.
We add this unconditionally in this code, but it is required only for HNS, right?
I suppose it would be more robust to check resolveHnsStatus() here or have an HNS flag in the Storage Config.... I'm kind of leaning toward the flag, even though we can auto-detect this... WDYT?
My thinking is that users are generally aware of the the HNS flag in their GCS storage (or al least they should be) and it is not likely to change in runtime.
There was a problem hiding this comment.
I agree with @dimas-b 's comment - is it a concern that we are introducing this permission for any GCS backed tables? That IMO goes against the principle of granting least-privileged access, so I think it would be helpful to review if that exposes any unnecessary actions for non HNS enabled tables.
There was a problem hiding this comment.
Addressed in the latest force-push. Two changes here:
-
The folder rule is now gated on per-bucket HNS auto-detection —
generateAccessBoundaryRules()takes aPredicate<String> isHnsBucketand only emits the folder rule for buckets where it returnstrue.GcpCredentialsStorageIntegration.isHnsBucket()queriesStorage.get(bucket, HIERARCHICAL_NAMESPACE)per bucket. We went with auto-detection rather than a config flag so a single catalog can serve a mix of HNS and non-HNS buckets correctly (a catalog'sdefault-base-locationis often one bucket, butwrite.data.path/write.metadata.pathcan point elsewhere — the auto-detect handles that per-bucket without admins having to enumerate which buckets are HNS). -
Permissions narrowed — replaced
inRole:roles/storage.folderAdminwith the two individual permissions we actually need:storage.folders.createandstorage.folders.get. The role would have additionally grantedsetIamPolicy/getIamPolicy/delete/rename/liston bothfolders/andmanagedFolders/resources, none of which clients need for write paths.
Also dropped the managedFolders/ resource condition — we use the Storage Control API's folders/ resource path, not managed folders.
There was a problem hiding this comment.
Addressed. The folder rule now uses individual permissions (storage.folders.create + storage.folders.get) instead of inRole:roles/storage.folderAdmin, and is only emitted for buckets where HNS is detected via an injected Predicate<String> isHnsBucket. Non-HNS GCS-backed tables get no extra permissions, restoring least-privilege.
The managedFolders/ resource condition was also dropped since we use HNS folders (folders/), not managed folders.
sungwy
left a comment
There was a problem hiding this comment.
Hi @varunarya002 thank you so much for putting together this PR. The new StorageLocationPreparer abstraction is a highly reusable idea, and I think it is directionally correct. I took a first pass at reviewing this PR and left some comments.
Also, could we add a bit more context in the PR description about this model? We are effectively introducing a pre-createTable hook into the table creation flow, so it might be worth calling that out explicitly.
| * resolve folder hierarchies, group by bucket, and delegate to {@link | ||
| * #createFoldersForBucket(String, List)} for storage-specific operations. | ||
| */ | ||
| public abstract class AbstractStorageLocationPreparer implements StorageLocationPreparer { |
There was a problem hiding this comment.
I feel this name is a bit too abstract (pun intended 🙂). Would it make sense to give it a more descriptive name like HierarchicalFolderLocationPreparer or ObjectStoreFolderPreparer?
I also wonder whether introducing this abstraction is a bit premature. Do we expect these methods to be reusable across other cloud providers, or is this really tailored to the GCS HNS case for now?
If it is mainly the latter, would it make sense to fold this into GcsStorageLocationPreparer for now and only extract a shared abstraction once we have a second provider implementation and can validate the actual common ground?
There was a problem hiding this comment.
Renamed to HierarchicalFolderLocationPreparer — using your suggestion verbatim. We kept the abstract base (rather than folding it into GcsStorageLocationPreparer) because Azure ADLS Gen2 is the obvious next reuse case (also has hierarchical-namespace folder semantics), and the generic URI parsing + path-hierarchy logic is genuinely cloud-agnostic.
| return NO_OP; | ||
| } | ||
| String key = storageConfig.getStorageType().name(); | ||
| Instance<StorageLocationPreparer> selected = preparers.select(Identifier.Literal.of(key)); |
There was a problem hiding this comment.
I like the StorageLocationPreparer as an abstraction, but instead of introducing it as a separate standalone selection path, would it make more sense for it to be exposed by PolarisStorageIntegration instead?
PolarisStorageIntegration is already selected from the storage configuration type, so grouping storage-specific behavior there seems cleaner and makes the config-driven selection more obvious and consistent.
I do think it still makes sense to model StorageLocationPreparer as a separate capability. I’m just wondering whether PolarisStorageIntegration should be single the interface that exposes that capability, rather than introducing a parallel factory and dispatch path for storage-specific behavior.
There was a problem hiding this comment.
Done. prepareLocations(List<String>) is now a method on PolarisStorageIntegration with a no-op default; non-GCS storage types pay nothing.
GcpCredentialsStorageIntegration takes a Consumer<List<String>> folderPreparer constructor parameter and delegates prepareLocations to it. PolarisStorageIntegrationProviderImpl (which already switches on storage type) injects GcsStorageLocationPreparer when constructing the GCP integration.
This deletes StorageLocationPreparerFactory and the @Identifier("GCS") CDI lookup entirely — single selection mechanism (the provider's type switch), and the module boundary is respected: polaris-core only declares the interface signature, the runtime-side preparer with StorageControlClient stays in runtime/service. IcebergCatalogHandler calls integration.prepareLocations(locations) directly.
| CredentialAccessBoundary.AccessBoundaryRule.AvailabilityCondition.newBuilder() | ||
| .setExpression(String.join(" || ", folderConditions)) | ||
| .build()); | ||
| builder.setAvailablePermissions(List.of("inRole:roles/storage.folderAdmin")); |
There was a problem hiding this comment.
I agree with @dimas-b 's comment - is it a concern that we are introducing this permission for any GCS backed tables? That IMO goes against the principle of granting least-privileged access, so I think it would be helpful to review if that exposes any unnecessary actions for non HNS enabled tables.
|
Hi @varunarya002 : Thanks again for this PR. Do you have capacity to resolve conflicts and comments? |
Iceberg table creation and Spark ingestion against HNS-enabled GCS buckets
previously failed: HNS buckets treat folders as first-class resources that
must exist before nested files can be written, and the vended credentials
lacked the IAM permissions to create those folders.
Two-part design:
1. Server-side pre-createTable hook
- New `PolarisStorageIntegration.prepareLocations(List<String>)` method
with a no-op default. Non-GCS storage types pay nothing.
- `IcebergCatalogHandler.createTableDirect()` and `createTableStaged()`
invoke it with `[tableLocation, metadata, data]` before Iceberg writes
the initial table metadata.
- GCS impl (`GcsStorageLocationPreparer`) auto-detects HNS per bucket
and walks the path hierarchy via `StorageControlClient.createFolder()`.
Idempotent against `AlreadyExistsException`.
- Generic URI parsing and hierarchy-building live in
`HierarchicalFolderLocationPreparer` so Azure ADLS Gen2 can reuse it.
2. Vended-credential scope on HNS buckets only
- `GcpCredentialsStorageIntegration.generateAccessBoundaryRules()` now
accepts a `Predicate<String> isHnsBucket`. Auto-detection at
credential-vending time via `Storage.get(bucket).getHierarchicalNamespace()`.
- For HNS write buckets only, emits a narrowly-scoped folder rule with
permissions `storage.folders.create` + `storage.folders.get` (vs. the
overly-broad `roles/storage.folderAdmin` role), conditioned on
`resource.name.startsWith('.../folders/<path>')`.
- Non-HNS buckets receive no additional permissions (least-privilege).
- `managedFolders/` condition dropped — we don't use managed folders.
Co-Authored-By: claude-flow <ruv@ruv.net>
561f90d to
895d8ce
Compare
|
Thanks @dimas-b and @sungwy for the thorough review. Just pushed a v2 that addresses every open thread. Summary of changes since the last round: Design changes
PR description rewrite: now explicitly explains the pre-createTable hook model and the server-side / client-side responsibility split, per @sungwy's ask. Rebase: branch is rebased onto current Tests added for predicate gating, narrow-permission shape, multi-bucket scenarios, and the mixed HNS/non-HNS-in-one-catalog case. All build green locally on Java 21 ( Ready for another look when you have time. |
dimas-b
left a comment
There was a problem hiding this comment.
Thanks for pushing this PR forward, @varunarya002 ! Some more comments below.
| * subsequent writes fail with a clear 403, rather than silently over-permissioning. | ||
| */ | ||
| @VisibleForTesting | ||
| boolean isHnsBucket(String bucket) { |
There was a problem hiding this comment.
I guess it might be worth covering this with a FeatureConfiguration flag (enabled by default) in case some prior deployments do not wish to execute the extra call for some internal reason... WDYT?
There was a problem hiding this comment.
Alternatively we could add a GCS storage config property hierarchical with values yes, no, auto (default)... WDYT? This could be done in a separate PR, of course.
| return; | ||
| } | ||
| PolarisStorageIntegration<?> integration = | ||
| storageIntegrationProvider().getStorageIntegrationForConfig(storageConfig); |
There was a problem hiding this comment.
This looks like a conflict WRT #3699 , which removes this method 🤔
I tend to think that #3699 is beneficial and would like that refactoring to land.
@varunarya002 : Will @tokoko 's approach work for you?
| * resolve folder hierarchies, group by bucket, and delegate to {@link | ||
| * #createFoldersForBucket(String, List)}. | ||
| */ | ||
| public abstract class HierarchicalFolderLocationPreparer implements Consumer<List<String>> { |
There was a problem hiding this comment.
Let's define a dedicated interface for Consumer<List<String>> for ease of navigation in IDEs.
| controlClient.createFolder(request); | ||
| LOGGER.atDebug().addKeyValue("folder", folderPath).log("Created HNS folder"); | ||
| } catch (AlreadyExistsException e) { | ||
| LOGGER.atDebug().addKeyValue("folder", folderPath).log("HNS folder already exists, skipping"); |
There was a problem hiding this comment.
Is folder actually logged even though the message does not reference it? (just to double check).
| * Hierarchical Namespace enabled) override this. Passed locations include the table location and | ||
| * its metadata/data subpaths. | ||
| */ | ||
| public void prepareLocations(@Nonnull List<String> locations) {} |
There was a problem hiding this comment.
Passing a note from artificial helpers:
High: the new pre-create hook runs before Polaris has validated the requested table locations, so a rejected create request can still mutate GCS. In both
direct and staged flows, prepareStorageForTable(...) is invoked before any of the location checks that happen later in IcebergCatalog
(validateLocationsForTableLike, overlap checks, metadata-in-table-dir checks). That means a caller can point location, write.metadata.location, or
write.data.location at a path Polaris should reject, and Polaris will still try to create HNS folders there with the service account first. See runtime/
service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:252, runtime/service/src/main/java/org/apache/polaris/service/
catalog/iceberg/IcebergCatalogHandler.java:535, runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:658,
and the later validation in runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java:1541.
| integration.prepareLocations(resolveStorageLocations(effectiveLocation, props)); | ||
| } | ||
|
|
||
| private @Nullable String resolveTableLocation( |
There was a problem hiding this comment.
Passing feedback from artificial helpers:
High: resolveTableLocation() guesses the default table path from catalog.getBaseLocation(), but actual table creation does not use that rule. For implicit
locations, Iceberg resolves through namespace-aware logic in defaultWarehouseLocation(...), including namespace-specific locations; staged create also
derives the final location from the catalog builder. As written, HNS folder creation will target the wrong path whenever the namespace has its own
location or other catalog-side location logic applies, so the new feature breaks exactly for non-default namespace layouts. See runtime/service/src/main/
java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java:274, runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/
IcebergCatalogHandler.java:576, and the real location resolution in runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/
IcebergCatalog.java:366.
Problem
GCS buckets with Hierarchical Namespace (HNS) enabled treat folders as first-class resources, not name prefixes. Iceberg table creation and Spark ingestion fail on HNS buckets because:
storage.folders.create— not granted by the standard write roles in our vended credentials.Approach
Two complementary changes, with a clear split between server-side and client-side responsibility:
1. Server-side: pre-createTable hook to materialize the folder hierarchy
A new SPI,
prepareLocations(List<String>), is exposed onPolarisStorageIntegrationwith a no-op default. GCS overrides it.IcebergCatalogHandler.createTableDirect()andcreateTableStaged()now invokeintegration.prepareLocations([tableLocation, metadataLocation, dataLocation])before delegating to Iceberg'sCatalog.createTable().The GCS implementation (
GcsStorageLocationPreparer):Storage.get(bucket, HIERARCHICAL_NAMESPACE)to detect HNS per bucket (auto-detection — HNS is bucket-immutable, so it's safe to call once per request; caching is a follow-up).warehouse→warehouse/ns1→warehouse/ns1/table1) and callsStorageControlClient.createFolder()for each segment, idempotent againstAlreadyExistsException.Generic URI parsing and hierarchy-building lives in
HierarchicalFolderLocationPreparer(base class) so Azure ADLS Gen2 can reuse it later.2. Client-side: vended-credential scope adds folders.create + folders.get for HNS buckets
GcpCredentialsStorageIntegration.generateAccessBoundaryRules()now takes aPredicate<String> isHnsBucket. For each write bucket where the predicate is true, it adds a third access-boundary rule containing exactly two permissions:storage.folders.createandstorage.folders.get, scoped viaresource.name.startsWith('projects/_/buckets/<bucket>/folders/<path>').This allows Spark to create deeper partition folders (e.g. for
year=2024/month=01/) at write time without needing admin credentials, while non-HNS catalogs are unaffected (no additional permissions vended).The narrow permission set (vs. the predefined
roles/storage.folderAdminrole, which includessetIamPolicy/getIamPolicy/delete/rename/list) preserves least-privilege.Tests
folders.create+folders.getpermissions (verified explicitly)GcsStorageLocationPreparerHNS detection and folder creation (full suite preserved from previous round)Checklist