feat: add catalog namespace support and refactor adapter implementation#14
feat: add catalog namespace support and refactor adapter implementation#14kaori-seasons wants to merge 3 commits into
Conversation
0b4ded7 to
dca8979
Compare
This commit introduces a new namespace abstraction layer for Lance catalog integration with Flink. Key components added: - AbstractLanceNamespaceAdapter: Interface defining namespace operations - LanceNamespaceAdapter: Implementation with direct Lance Namespace SDK API calls - LanceNamespaceConfig: Type-safe configuration management with ImplType enum - BaseLanceNamespaceCatalog: Base catalog implementation for Flink integration - LanceNamespaceAdapterITCase: Comprehensive integration tests (23 test cases) - MockLanceNamespace: Mock implementation for standalone testing Features: - Direct API calls to Lance Namespace SDK (no reflection) - Support for both directory and REST namespace implementations - Complete CRUD operations for namespaces and tables - Full metadata management - Production-ready error handling and resource management
dca8979 to
a97baa2
Compare
| public static final String KEY_IMPL = "impl"; | ||
| public static final String KEY_ROOT = "root"; | ||
| public static final String KEY_URI = "uri"; | ||
| public static final String KEY_EXTRA_LEVEL = "extra_level"; |
There was a problem hiding this comment.
feels like it is time that we add something like 3LevelLanceNamespace in https://github.com/lance-format/lance-namespace/pulls so that we can just have a consistent experience across the engines instead of implementing exactly the same thing here. Would you be up to that work?
There was a problem hiding this comment.
Thank you for your reply. Due to the large volume of emails, I only just saw your message. Please allow me some time to review the relevant code.
| * This mock allows tests to run without requiring the actual Lance Namespace | ||
| * library to be available. It simulates the basic behavior of the real API. | ||
| */ | ||
| public class MockLanceNamespace { |
There was a problem hiding this comment.
we should always default to test with DirectoryNamespace and RestNamespace with a DirectoryNamespace backend, since those 2 come out of box with lance-core and DirectoryNamespace is storage only.
| * - Error handling: duplicate creation, non-existing resources and other exception scenarios | ||
| */ | ||
| @DisplayName("Lance Namespace Adapter Integration Test") | ||
| class LanceNamespaceAdapterITCase { |
There was a problem hiding this comment.
this is just calling the adapter. It's not really exercising interaction with Flink through operations like CREATE CATALOG, SHOW DATABASES, SHOW TABLES
- Add BaseLanceNamespaceCatalog with full Flink Catalog API support - Add LanceCatalogFactory for creating namespace adapters - Refactor LanceNamespaceAdapter with improved CRUD operations - Update LanceNamespaceConfig with extra_level and parent support
…e backend - Remove MockLanceNamespace as lance-core natively supports DirectoryNamespace and RestNamespace - Update LanceNamespaceAdapterITCase to use real DirectoryNamespace backend for testing - Add nested RestNamespaceTests enabled via LANCE_REST_URI environment variable - DirectoryNamespace is used for storage tests, RestNamespace for API tests when available
| /** | ||
| * Adapter for Lance Namespace API. | ||
| * | ||
| * Provides unified interface for interacting with Lance Namespace, |
There was a problem hiding this comment.
Lance Namespace is supposed to be the unified interface to implement Flink AbstractCatalog, I don't think you need to wrap it in yet another layer of adapter
fightBoxing
left a comment
There was a problem hiding this comment.
Hi @kaori-seasons, thanks for the substantial work on namespace catalog integration — the overall direction (wrapping the Lance Namespace SDK and exposing a Flink Catalog) is exactly right. I'm requesting changes on a few design and integration items before we can merge.
Blocking
-
Disconnected abstraction —
AbstractLanceNamespaceAdapteris defined but not actually implemented; twoTableMetadataclasses exist; the abstract type has zero usages. See inline onLanceNamespaceAdapter.java. Recommendation: delete the abstract layer (YAGNI) or wire it up properly. -
Class name collision —
catalog/namespace/LanceCatalogFactoryclashes with the existing Flink SPItable/LanceCatalogFactory. The new factory is also a thin pass-through that drops user properties. See inline oncatalog/namespace/LanceCatalogFactory.java. Recommendation: delete it, or rename toLanceNamespaceAdapterFactory. -
No SPI wiring — this catalog can't be used from
CREATE CATALOGSQL yet. I'll pick up the SPI wiring as a follow-up once items 1 & 2 are settled; please coordinate with me on the final constructor signature. See inline onBaseLanceNamespaceCatalog.java. -
Exception swallowing in adapter — many methods catch
Exceptionand return empty collections / dummy values. The most dangerous case isgetTableMetadatareturningnew TableMetadata("/path/to/table", ...)on error, which downstream code will treat as a real path. Please:- let
listX/getXMetadatapropagate failures - in
xExists, only catch the SDK's NotFound exception, not bareException(network/auth errors should not silently mean "not exists")
- let
-
getTablereturn type narrowing — signature returnsCatalogTablebut the interface contract isCatalogBaseTable. This may not compile under some Flink minor versions and is fragile across upgrades — please widen.
Important
- Allocator ownership —
close()always closes the allocator even when the caller injected an external one. Track ownership with a boolean flag. - Mutual exclusion of
parentandextra_levelshould be validated inLanceNamespaceConfigrather than silently ignored. - Rebase needed — your branch is based on a commit older than
f58b1f0. The diff currently appears to delete.github/dependabot.yml,.github/labeler.yml, and.github/workflows/pr-title.yml, which were added onmainafter your base. A rebase will fix this.
Nice to have
- Rename
LanceNamespaceAdapterITCase→LanceNamespaceAdapterTest. It uses@TempDirand runs against a real local backend, so it's a unit test, not an integration test (which by convention requiresmaven-failsafe-plugin, not currently configured inpom.xml). - Replace silent log-only stubs in
alterTable/renameTable/alterDatabasewithUnsupportedOperationExceptionso callers don't think they succeeded.
Division of work
To unblock you, here's the proposed split:
- You (kaori-seasons): items 1, 2, 4, 6, 7, 8 — these are structural and affect the design.
- Me (@fightBoxing): item 3 (SPI wiring) and item 9 (test rename) as a follow-up commit/PR, after your structural changes land.
- Items 5, 10 — whoever's faster, but they belong with the structural pass.
Happy to pair on any of this. Thanks again for the contribution!
| * Provides unified interface for interacting with Lance Namespace, | ||
| * supporting both directory-based and REST-based implementations. | ||
| */ | ||
| public class LanceNamespaceAdapter implements AutoCloseable { |
There was a problem hiding this comment.
🔴 Blocking: Interface/implementation are disconnected
AbstractLanceNamespaceAdapter (in this same package) defines 12 methods plus its own TableMetadata inner class, but this concrete class only implements AutoCloseable — it does not implement the abstract interface. As a result:
AbstractLanceNamespaceAdapterhas zero usages in this PR — it's dead code.- There are two
TableMetadataclasses (one nested in the interface, one nested here) with identical fields but incompatible types — they can't be passed across the boundary. BaseLanceNamespaceCatalogholds a concreteLanceNamespaceAdapterreference, completely bypassing the abstract layer.
Please pick one:
- Option A (recommended, YAGNI): delete
AbstractLanceNamespaceAdapter.javaentirely — there's no second backend yet, so the abstraction has no users. - Option B (keep abstraction): make this class
implements AbstractLanceNamespaceAdapter, remove the duplicateTableMetadata, and haveBaseLanceNamespaceCatalogdepend on the interface.
Also note that LanceNamespaceAdapter.create(properties) always new RootAllocator(), while the two-arg constructor accepts an external allocator — but close() unconditionally closes it. This will close caller-owned allocators by mistake. Please track ownership with a flag.
| * LanceNamespaceAdapter adapter = factory.createAdapter(config); | ||
| * </pre> | ||
| */ | ||
| public class LanceCatalogFactory { |
There was a problem hiding this comment.
🔴 Blocking: Name collision with the existing Flink SPI factory
The repository already has org.apache.flink.connector.lance.table.LanceCatalogFactory, which is the Flink CatalogFactory SPI implementation (registered in META-INF/services/org.apache.flink.table.factories.Factory). After this PR, two classes named LanceCatalogFactory coexist with completely different semantics — IDE auto-import will be ambiguous and readers will conflate them.
On top of that, this class is mostly a thin pass-through:
public LanceNamespaceAdapter createAdapter(LanceNamespaceConfig config) {
Map<String, String> properties = new HashMap<>();
properties.put(KEY_IMPL, config.getImpl());
config.getRoot().ifPresent(...);
// ... rebuilds properties that LanceNamespaceConfig already holds
return LanceNamespaceAdapter.create(properties);
}LanceNamespaceConfig already holds the full properties map internally, so this rebuild is redundant and drops any custom properties not in the hardcoded list (e.g. parent_delimiter, user extensions). The volatile sharedAllocator is also never reassigned — it should be final, not volatile.
Suggested fix: delete this class entirely and let callers use LanceNamespaceAdapter.create(properties) directly. If you want to keep a factory for ergonomic reasons, please rename it to LanceNamespaceAdapterFactory to remove the collision.
| /** | ||
| * Base Lance Catalog implementation integrated with Lance Namespace. | ||
| */ | ||
| public abstract class BaseLanceNamespaceCatalog extends AbstractCatalog { |
There was a problem hiding this comment.
🔴 Blocking: No SPI wiring — this catalog can't be used from Flink SQL
This PR introduces BaseLanceNamespaceCatalog but doesn't:
- provide a concrete subclass that implements
createCatalogTable(...) - extend the existing
connector.lance.table.LanceCatalogFactoryto construct it - add a new
CatalogFactorySPI +META-INF/servicesregistration
That means CREATE CATALOG xx WITH ('type'='lance-namespace', ...) in Flink SQL won't work at all — the catalog can only be instantiated by hand-written Java. Given the PR title is "add catalog namespace support", this is a significant gap.
I (@fightBoxing) will pick up the SPI wiring as a follow-up once the structural items in comments #1 and #2 are settled, so you don't need to do this part — but please coordinate with me on the final shape of the catalog constructor signature so I can wire it in cleanly.
A few smaller issues on the class itself:
getTablereturn type narrowing: signature returnsCatalogTablebut the interface contract isCatalogBaseTable. This may not compile under some Flink minor versions and is fragile across upgrades — please widen.parentPrefixvsextraLevel: silently mutually exclusive withparentPrefixwinning. Please validate inLanceNamespaceConfigthat both aren't set simultaneously, otherwise misconfigurations are invisible.alterTable/renameTable/alterDatabase: silently log-only with no exception — callers think it succeeded. Please throwUnsupportedOperationException(or a specific Flink exception).- All "Partition operations are not supported" methods: throwing
CatalogExceptionfrompartitionExistswill break Flink planners that call it speculatively. Consider returningfalseforpartitionExistsand empty list forlistPartitionswhen the table has no partition spec.
Related to issue-2