feat: add catalog namespace support and refactor adapter implementation #14

Open
kaori-seasons wants to merge 3 commits into lance-format:main from kaori-seasons:support-catalog


@kaori-seasons

Related to issue-2

  • Add namespace abstraction layer with AbstractLanceNamespaceAdapter
  • Implement LanceNamespaceAdapter for direct Lance Namespace SDK API calls
  • Add LanceNamespaceConfig for type-safe configuration management
  • Implement BaseLanceNamespaceCatalog for Flink catalog integration
  • Add comprehensive integration tests (LanceNamespaceAdapterITCase)
  • Add MockLanceNamespace for standalone testing
  • Remove reflection-based calls in favor of direct API invocations
  • Eliminate hardcoded strings with ImplType enum
  • Update all related catalog implementations

This commit introduces a new namespace abstraction layer for Lance catalog integration with Flink.

Key components added:
- AbstractLanceNamespaceAdapter: Interface defining namespace operations
- LanceNamespaceAdapter: Implementation with direct Lance Namespace SDK API calls
- LanceNamespaceConfig: Type-safe configuration management with ImplType enum
- BaseLanceNamespaceCatalog: Base catalog implementation for Flink integration
- LanceNamespaceAdapterITCase: Comprehensive integration tests (23 test cases)
- MockLanceNamespace: Mock implementation for standalone testing

Features:
- Direct API calls to Lance Namespace SDK (no reflection)
- Support for both directory and REST namespace implementations
- Complete CRUD operations for namespaces and tables
- Full metadata management
- Production-ready error handling and resource management
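The ImplType enum mentioned above replaces hardcoded impl strings. A minimal sketch of the idea, assuming the two built-in implementations are selected with the short names "dir" and "rest"; the enum constants and the fromString helper are illustrative, not the PR's actual code:

```java
import java.util.Locale;

/** Hypothetical enum for the namespace "impl" property; the short names are assumptions. */
enum ImplType {
    DIRECTORY("dir"),
    REST("rest");

    private final String shortName;

    ImplType(String shortName) {
        this.shortName = shortName;
    }

    String shortName() {
        return shortName;
    }

    /** Parses the user-supplied "impl" value, rejecting unknown values instead of guessing. */
    static ImplType fromString(String value) {
        if (value == null) {
            throw new IllegalArgumentException("'impl' must be set");
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (ImplType type : values()) {
            if (type.shortName.equals(normalized)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown namespace impl: " + value);
    }
}
```

Rejecting unknown values turns a typo in the impl property into an immediate configuration error. If the SDK also accepts other impl values (for example a custom implementation class name), the parser would need an explicit fallback for them.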
public static final String KEY_IMPL = "impl";
public static final String KEY_ROOT = "root";
public static final String KEY_URI = "uri";
public static final String KEY_EXTRA_LEVEL = "extra_level";

Feels like it is time we add something like a 3LevelLanceNamespace in https://github.com/lance-format/lance-namespace/pulls so that we get a consistent experience across engines instead of implementing exactly the same thing here. Would you be up for that work?

Author


Thank you for your reply. Due to the large volume of emails, I only just saw your message. Please allow me some time to review the relevant code.

* This mock allows tests to run without requiring the actual Lance Namespace
* library to be available. It simulates the basic behavior of the real API.
*/
public class MockLanceNamespace {

We should always default to testing with DirectoryNamespace, and with RestNamespace backed by a DirectoryNamespace, since those two come out of the box with lance-core and DirectoryNamespace is storage-only.

* - Error handling: duplicate creation, non-existing resources and other exception scenarios
*/
@DisplayName("Lance Namespace Adapter Integration Test")
class LanceNamespaceAdapterITCase {

This just calls the adapter. It doesn't really exercise the interaction with Flink through statements like CREATE CATALOG, SHOW DATABASES, and SHOW TABLES.

- Add BaseLanceNamespaceCatalog with full Flink Catalog API support
- Add LanceCatalogFactory for creating namespace adapters
- Refactor LanceNamespaceAdapter with improved CRUD operations
- Update LanceNamespaceConfig with extra_level and parent support
…e backend

- Remove MockLanceNamespace as lance-core natively supports DirectoryNamespace and RestNamespace
- Update LanceNamespaceAdapterITCase to use real DirectoryNamespace backend for testing
- Add nested RestNamespaceTests enabled via LANCE_REST_URI environment variable
- DirectoryNamespace is used for storage tests, RestNamespace for API tests when available
/**
* Adapter for Lance Namespace API.
*
* Provides unified interface for interacting with Lance Namespace,

Lance Namespace is supposed to be the unified interface for implementing Flink's AbstractCatalog; I don't think you need to wrap it in yet another adapter layer.

Collaborator


I've made some changes.
348cd43

Collaborator

@fightBoxing left a comment


Hi @kaori-seasons, thanks for the substantial work on namespace catalog integration; the overall direction (wrapping the Lance Namespace SDK and exposing a Flink Catalog) is exactly right. I'm requesting changes on a few design and integration items before we can merge.

Blocking

  1. Disconnected abstraction: AbstractLanceNamespaceAdapter is defined but no class implements it; two TableMetadata classes exist; the abstract type has zero usages. See inline on LanceNamespaceAdapter.java. Recommendation: delete the abstract layer (YAGNI) or wire it up properly.

  2. Class name collision: catalog/namespace/LanceCatalogFactory clashes with the existing Flink SPI table/LanceCatalogFactory. The new factory is also a thin pass-through that drops user properties. See inline on catalog/namespace/LanceCatalogFactory.java. Recommendation: delete it, or rename it to LanceNamespaceAdapterFactory.

  3. No SPI wiring: this catalog can't be used from CREATE CATALOG SQL yet. I'll pick up the SPI wiring as a follow-up once items 1 and 2 are settled; please coordinate with me on the final constructor signature. See inline on BaseLanceNamespaceCatalog.java.

  4. Exception swallowing in the adapter: many methods catch Exception and return empty collections or dummy values. The most dangerous case is getTableMetadata returning new TableMetadata("/path/to/table", ...) on error, which downstream code will treat as a real path. Please:

    • let listX / getXMetadata propagate failures
    • in xExists, only catch the SDK's NotFound exception, not bare Exception (network/auth errors should not silently mean "not exists")
  5. getTable return type narrowing: the signature returns CatalogTable, but the interface contract is CatalogBaseTable. This may not compile under some Flink minor versions and is fragile across upgrades; please widen it.
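For item 4, a minimal sketch of the intended error-handling split. NamespaceClient and NotFoundException are hypothetical stand-ins, since the exact Lance Namespace SDK exception type is not shown in this PR; the point is which failures map to false and which propagate:

```java
import java.util.List;

/** Hypothetical stand-in for the SDK's "not found" error; the real exception type may differ. */
class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
        super(message);
    }
}

/** Hypothetical, minimal view of the namespace client used by the adapter. */
interface NamespaceClient {
    List<String> listTables(String namespace);

    /** Returns the table location, or throws NotFoundException if the table does not exist. */
    String describeTable(String tableId);
}

class ExistenceChecks {
    private final NamespaceClient client;

    ExistenceChecks(NamespaceClient client) {
        this.client = client;
    }

    /** Only "not found" means "does not exist"; network or auth failures propagate. */
    boolean tableExists(String tableId) {
        try {
            client.describeTable(tableId);
            return true;
        } catch (NotFoundException e) {
            return false;
        }
    }

    /** No catch-all: an empty list must mean "no tables", never "the call failed". */
    List<String> listTables(String namespace) {
        return client.listTables(namespace);
    }
}
```

With this split, an unauthorized or unreachable backend surfaces as an exception instead of being reported as a missing table.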

Important

  6. Allocator ownership: close() always closes the allocator, even when the caller injected an external one. Track ownership with a boolean flag.
  7. Mutual exclusion of parent and extra_level should be validated in LanceNamespaceConfig rather than silently ignored.
  8. Rebase needed: your branch is based on a commit older than f58b1f0. The diff currently appears to delete .github/dependabot.yml, .github/labeler.yml, and .github/workflows/pr-title.yml, which were added on main after your base. A rebase will fix this.
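For item 7, a sketch of fail-fast validation. The "extra_level" key matches the KEY_EXTRA_LEVEL constant in the PR; the "parent" key name and the NamespaceConfigValidation helper are assumptions for illustration:

```java
import java.util.Map;

/** Sketch of fail-fast config validation; the "parent" key name is an assumption. */
final class NamespaceConfigValidation {
    static final String KEY_PARENT = "parent";           // assumed key name
    static final String KEY_EXTRA_LEVEL = "extra_level"; // matches the constant in the PR

    private NamespaceConfigValidation() {}

    /** Rejects configs that set both options instead of silently letting one of them win. */
    static void validate(Map<String, String> properties) {
        boolean hasParent = isSet(properties.get(KEY_PARENT));
        boolean hasExtraLevel = isSet(properties.get(KEY_EXTRA_LEVEL));
        if (hasParent && hasExtraLevel) {
            throw new IllegalArgumentException(
                    "'" + KEY_PARENT + "' and '" + KEY_EXTRA_LEVEL
                            + "' are mutually exclusive; set at most one");
        }
    }

    /** Blank values are treated the same as absent ones. */
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```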

Nice to have

  9. Rename LanceNamespaceAdapterITCase to LanceNamespaceAdapterTest. It uses @TempDir and runs against a real local backend, so it's a unit test, not an integration test (which by convention requires maven-failsafe-plugin, not currently configured in pom.xml).
  10. Replace silent log-only stubs in alterTable/renameTable/alterDatabase with UnsupportedOperationException so callers don't think they succeeded.
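For item 10, a sketch of stubs that fail loudly. The method names mirror the Flink Catalog operations named above, but the String parameters are simplified placeholders; Flink's real signatures take types such as ObjectPath, CatalogBaseTable, and CatalogDatabase:

```java
/**
 * Sketch only: method names mirror the Flink Catalog operations from the review,
 * but the parameters are simplified placeholders, not the real Flink signatures.
 */
class UnsupportedCatalogOps {
    void alterTable(String tablePath) {
        throw new UnsupportedOperationException(
                "ALTER TABLE is not supported by the Lance namespace catalog yet: " + tablePath);
    }

    void renameTable(String tablePath, String newName) {
        throw new UnsupportedOperationException(
                "RENAME TABLE is not supported by the Lance namespace catalog yet: " + tablePath);
    }

    void alterDatabase(String databaseName) {
        throw new UnsupportedOperationException(
                "ALTER DATABASE is not supported by the Lance namespace catalog yet: " + databaseName);
    }
}
```

With this, an ALTER TABLE statement surfaces an error to the user instead of appearing to succeed while changing nothing.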

Division of work

To unblock you, here's the proposed split:

  • You (kaori-seasons): items 1, 2, 4, 6, 7, 8. These are structural and affect the design.
  • Me (@fightBoxing): item 3 (SPI wiring) and item 9 (test rename) as a follow-up commit/PR, after your structural changes land.
  • Items 5 and 10: whoever's faster, but they belong with the structural pass.

Happy to pair on any of this. Thanks again for the contribution!

* Provides unified interface for interacting with Lance Namespace,
* supporting both directory-based and REST-based implementations.
*/
public class LanceNamespaceAdapter implements AutoCloseable {
Collaborator


🔴 Blocking: Interface/implementation are disconnected

AbstractLanceNamespaceAdapter (in this same package) defines 12 methods plus its own TableMetadata inner class, but this concrete class only implements AutoCloseable — it does not implement the abstract interface. As a result:

  1. AbstractLanceNamespaceAdapter has zero usages in this PR — it's dead code.
  2. There are two TableMetadata classes (one nested in the interface, one nested here) with identical fields but incompatible types — they can't be passed across the boundary.
  3. BaseLanceNamespaceCatalog holds a concrete LanceNamespaceAdapter reference, completely bypassing the abstract layer.

Please pick one:

  • Option A (recommended, YAGNI): delete AbstractLanceNamespaceAdapter.java entirely, since there's no second backend yet and the abstraction has no users.
  • Option B (keep abstraction): make this class implements AbstractLanceNamespaceAdapter, remove the duplicate TableMetadata, and have BaseLanceNamespaceCatalog depend on the interface.
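If Option B is chosen, the shape could look like the sketch below: one top-level TableMetadata, an interface the concrete adapter actually implements, and a catalog that depends only on the interface. The interface is trimmed to two methods, the TableMetadata fields are illustrative, and an in-memory map (InMemoryNamespaceAdapter, hypothetical) stands in for the SDK calls:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single top-level metadata type shared by the interface and all implementations. */
final class TableMetadata {
    private final String location;
    private final Map<String, String> properties; // illustrative field set

    TableMetadata(String location, Map<String, String> properties) {
        this.location = location;
        this.properties = Map.copyOf(properties);
    }

    String location() { return location; }
    Map<String, String> properties() { return properties; }
}

/** The abstraction the catalog depends on (trimmed to two methods for the sketch). */
interface AbstractLanceNamespaceAdapter extends AutoCloseable {
    List<String> listTables(String namespace);

    TableMetadata getTableMetadata(String namespace, String table);

    @Override
    void close(); // narrowed: no checked exception
}

/** Concrete adapter that really implements the interface; a map replaces SDK calls. */
class InMemoryNamespaceAdapter implements AbstractLanceNamespaceAdapter {
    private final Map<String, Map<String, TableMetadata>> tables = new HashMap<>();

    void register(String namespace, String table, TableMetadata metadata) {
        tables.computeIfAbsent(namespace, ns -> new HashMap<>()).put(table, metadata);
    }

    @Override
    public List<String> listTables(String namespace) {
        return new ArrayList<>(tables.getOrDefault(namespace, Map.of()).keySet());
    }

    @Override
    public TableMetadata getTableMetadata(String namespace, String table) {
        TableMetadata metadata = tables.getOrDefault(namespace, Map.of()).get(table);
        if (metadata == null) {
            // Fail instead of returning a placeholder path (see blocking item 4).
            throw new IllegalArgumentException("Table not found: " + namespace + "." + table);
        }
        return metadata;
    }

    @Override
    public void close() {}
}

/** The catalog holds the interface type, so another backend could be swapped in. */
class NamespaceCatalogSketch {
    private final AbstractLanceNamespaceAdapter adapter;

    NamespaceCatalogSketch(AbstractLanceNamespaceAdapter adapter) {
        this.adapter = adapter;
    }

    List<String> listTables(String database) {
        return adapter.listTables(database);
    }
}
```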

Also note that LanceNamespaceAdapter.create(properties) always creates a new RootAllocator(), while the two-arg constructor accepts an external allocator, yet close() unconditionally closes it. This will close caller-owned allocators by mistake. Please track ownership with a flag.
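A sketch of the ownership flag. A hypothetical Allocator interface stands in for Arrow's BufferAllocator so the example runs without Arrow on the classpath; in the real adapter, the owning path would be the one that calls new RootAllocator():

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for Arrow's BufferAllocator: closeable without checked exceptions. */
interface Allocator extends AutoCloseable {
    @Override
    void close();
}

class OwnershipAwareAdapter implements AutoCloseable {
    private final Allocator allocator;
    private final boolean ownsAllocator; // true only when this class created the allocator

    /** Caller-supplied allocator: the caller stays responsible for closing it. */
    OwnershipAwareAdapter(Allocator externalAllocator) {
        this(externalAllocator, false);
    }

    private OwnershipAwareAdapter(Allocator allocator, boolean ownsAllocator) {
        this.allocator = allocator;
        this.ownsAllocator = ownsAllocator;
    }

    /** Factory path that creates its own allocator, so the adapter owns it. */
    static OwnershipAwareAdapter createOwning(Supplier<Allocator> allocatorFactory) {
        return new OwnershipAwareAdapter(allocatorFactory.get(), true);
    }

    @Override
    public void close() {
        // Close only what we created; closing a caller-owned allocator breaks the caller.
        if (ownsAllocator) {
            allocator.close();
        }
    }
}
```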

* LanceNamespaceAdapter adapter = factory.createAdapter(config);
* </pre>
*/
public class LanceCatalogFactory {
Collaborator


🔴 Blocking: Name collision with the existing Flink SPI factory

The repository already has org.apache.flink.connector.lance.table.LanceCatalogFactory, which is the Flink CatalogFactory SPI implementation (registered in META-INF/services/org.apache.flink.table.factories.Factory). After this PR, two classes named LanceCatalogFactory coexist with completely different semantics — IDE auto-import will be ambiguous and readers will conflate them.

On top of that, this class is mostly a thin pass-through:

public LanceNamespaceAdapter createAdapter(LanceNamespaceConfig config) {
    Map<String, String> properties = new HashMap<>();
    properties.put(KEY_IMPL, config.getImpl());
    config.getRoot().ifPresent(...);
    // ... rebuilds properties that LanceNamespaceConfig already holds
    return LanceNamespaceAdapter.create(properties);
}

LanceNamespaceConfig already holds the full properties map internally, so this rebuild is redundant and drops any custom properties not in the hardcoded list (e.g. parent_delimiter, user extensions). The volatile sharedAllocator is also never reassigned, so it should be final, not volatile.

Suggested fix: delete this class entirely and let callers use LanceNamespaceAdapter.create(properties) directly. If you want to keep a factory for ergonomic reasons, please rename it to LanceNamespaceAdapterFactory to remove the collision.
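A sketch of the pass-through alternative under the suggested LanceNamespaceAdapterFactory name. NamespaceConfigSketch is a hypothetical stand-in for LanceNamespaceConfig, and adapterProperties returns the map that the real factory would hand to LanceNamespaceAdapter.create(properties):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for LanceNamespaceConfig: keeps the complete user property map. */
final class NamespaceConfigSketch {
    private final Map<String, String> properties;

    NamespaceConfigSketch(Map<String, String> properties) {
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    /** Returns every property, including keys the connector does not know about. */
    Map<String, String> properties() {
        return properties;
    }
}

/** Hypothetical renamed factory: forwards the full map instead of rebuilding a subset. */
final class LanceNamespaceAdapterFactory {
    private LanceNamespaceAdapterFactory() {}

    /** The real factory would pass this map to LanceNamespaceAdapter.create(properties). */
    static Map<String, String> adapterProperties(NamespaceConfigSketch config) {
        return new HashMap<>(config.properties());
    }
}
```

Because the map is forwarded whole rather than rebuilt key by key, options such as parent_delimiter and user extensions reach the SDK unchanged.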

/**
* Base Lance Catalog implementation integrated with Lance Namespace.
*/
public abstract class BaseLanceNamespaceCatalog extends AbstractCatalog {
Collaborator


🔴 Blocking: No SPI wiring, so this catalog can't be used from Flink SQL

This PR introduces BaseLanceNamespaceCatalog but doesn't:

  • provide a concrete subclass that implements createCatalogTable(...)
  • extend the existing connector.lance.table.LanceCatalogFactory to construct it
  • add a new CatalogFactory SPI + META-INF/services registration

That means CREATE CATALOG xx WITH ('type'='lance-namespace', ...) in Flink SQL won't work at all: the catalog can only be instantiated by hand-written Java. Given the PR title is "add catalog namespace support", this is a significant gap.

I (@fightBoxing) will pick up the SPI wiring as a follow-up once the structural items in comments #1 and #2 are settled, so you don't need to do this part, but please coordinate with me on the final shape of the catalog constructor signature so I can wire it in cleanly.

A few smaller issues on the class itself:

  • getTable return type narrowing: signature returns CatalogTable but the interface contract is CatalogBaseTable. This may not compile under some Flink minor versions and is fragile across upgrades — please widen.
  • parentPrefix vs extraLevel: silently mutually exclusive with parentPrefix winning. Please validate in LanceNamespaceConfig that both aren't set simultaneously, otherwise misconfigurations are invisible.
  • alterTable / renameTable / alterDatabase: silently log-only with no exception — callers think it succeeded. Please throw UnsupportedOperationException (or a specific Flink exception).
  • All "Partition operations are not supported" methods: throwing CatalogException from partitionExists will break Flink planners that call it speculatively. Consider returning false for partitionExists and empty list for listPartitions when the table has no partition spec.
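A sketch of the partition suggestion for catalogs whose tables never carry a partition spec. The parameters are simplified placeholders; Flink's Catalog declares partitionExists(ObjectPath, CatalogPartitionSpec) and listPartitions(ObjectPath):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Sketch of speculative-call-friendly partition lookups for unpartitioned tables.
 * Parameters are simplified placeholders, not Flink's ObjectPath/CatalogPartitionSpec.
 */
class UnpartitionedCatalogSketch {
    /** Planners may call this speculatively; "no such partition" is the honest answer. */
    boolean partitionExists(String tablePath, Map<String, String> partitionSpec) {
        return false;
    }

    /** A table without a partition spec has no partitions, which is not an error. */
    List<Map<String, String>> listPartitions(String tablePath) {
        return Collections.emptyList();
    }
}
```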


3 participants