feat: support deferred index creation with WITH (train=false) #558

Open
ivscheianu wants to merge 1 commit into lance-format:main from ivscheianu:support-deferred-index-creation

ivscheianu (Contributor) commented May 25, 2026

Problem

ALTER TABLE ... CREATE INDEX always builds the index eagerly over all existing data before returning. On large datasets this is a long-running, blocking operation. There is currently no way to register an index definition without immediately training it, so the "declare the index" step cannot be separated from the "populate it" step.

Solution

Add a train option to the WITH clause:

ALTER TABLE my_table CREATE INDEX idx_text_fts USING fts (text)
WITH (train=false)

When train=false, no Spark tasks are launched and no data is processed. An empty index is committed directly on the driver with an empty fragment bitmap, so all existing rows appear as unindexed. A subsequent OPTIMIZE call covers them incrementally at a time of the caller's choosing.
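
To illustrate the idea, here is a minimal Python model of the fragment-bitmap behaviour described above. It is only a sketch, not the Lance or Spark implementation: `IndexEntry`, `create_index`, `unindexed_fragments`, and `optimize` are hypothetical names, and a plain `set` of fragment IDs stands in for the bitmap.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Toy stand-in for an index's metadata (hypothetical, not a Lance type)."""
    name: str
    # IDs of the fragments this index covers; a set stands in for the bitmap.
    fragment_bitmap: set = field(default_factory=set)


def create_index(name, table_fragments, train=True):
    """Register an index; cover the existing fragments only when train is true."""
    covered = set(table_fragments) if train else set()
    return IndexEntry(name=name, fragment_bitmap=covered)


def unindexed_fragments(index, table_fragments):
    """Fragments that exist in the table but are not covered by the index."""
    return set(table_fragments) - index.fragment_bitmap


def optimize(index, table_fragments):
    """Extend coverage to every fragment not yet in the bitmap."""
    index.fragment_bitmap |= unindexed_fragments(index, table_fragments)
    return index
```

In this model, `create_index("idx_text_fts", [0, 1, 2], train=False)` yields an entry whose bitmap is empty, so every fragment is reported as unindexed until `optimize` is called.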

The default (train=true) is unchanged, so all existing behaviour is preserved.

As a side fix, Spark-level execution options (train, build_mode, rows_per_range) are now filtered out of the WITH options before the remaining entries are forwarded to the Lance index backend as index parameters.
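
The filtering step can be sketched as follows. This is an illustrative Python model rather than the connector's actual code: the three option names come from this description, while `split_with_options` and the exact-match comparison of keys are assumptions made for the example.

```python
# Option names taken from the PR description; everything else is hypothetical.
EXECUTION_OPTIONS = frozenset({"train", "build_mode", "rows_per_range"})


def split_with_options(options):
    """Split WITH (...) options into execution options and index parameters."""
    execution = {k: v for k, v in options.items() if k in EXECUTION_OPTIONS}
    index_params = {k: v for k, v in options.items() if k not in EXECUTION_OPTIONS}
    return execution, index_params
```

Given `{"train": "false", "some_index_param": "x"}`, where `some_index_param` is a placeholder key, only `some_index_param` would end up in the parameters passed to the index backend.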

Adds a train option to ALTER TABLE CREATE INDEX that skips data processing and commits an
empty index, deferring population to a subsequent OPTIMIZE call.
github-actions bot added the enhancement label May 25, 2026