feat: add distributed zonemap index build#473
Closed
beinan wants to merge 7 commits into
Closed
Conversation
Simplify useLogicalSegmentCommit to only return true for ZONEMAP. BTREE fragment-mode segment commit is not yet supported by the lance-jni layer (missing page_lookup.lance files), so remove the BTREE path and clean up related test assertions and docs. Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
Allow users to control the number of index segments created during distributed zonemap builds via the num_segments DDL option. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6392ab6 to
b18d409
Compare
5 tasks
Contributor
Author
|
Superseded by #516 which is rebased cleanly onto current main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds a distributed zonemap index build path for in .
It keeps the SQL surface unchanged, but changes the execution strategy from the driver-only/single-process flow to Spark-side fragment processing and logical segment commit.
This is intended as the distributed alternative to #466, which stays open as the simpler single-process solution.
What Changed
Validation
Tested with Spark 4.0 / Scala 2.13:
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 73, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.12:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 76, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.5_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 77, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-3.4_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 78, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.0_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.lance:lance-spark-bundle-4.1_2.13:jar:0.4.0-beta.3
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ line 79, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] lance-spark-root [pom]
[INFO] lance-spark-base_2.13 [jar]
[INFO] lance-spark-4.0_2.13 [jar]
[INFO]
[INFO] ---------------------< org.lance:lance-spark-root >---------------------
[INFO] Building lance-spark-root 0.4.0-beta.3 [1/3]
[INFO] from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-root ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-root ---
[INFO]
[INFO] ------------------< org.lance:lance-spark-base_2.13 >-------------------
[INFO] Building lance-spark-base_2.13 0.4.0-beta.3 [2/3]
[INFO] from lance-spark-base_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] 2 problems were encountered while building the effective model for org.apache.yetus:audience-annotations:jar:0.5.0 during dependency collection step for project (use -X to see details)
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-base_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-base_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/beinanwang/o/lance-spark/lance-spark-base_2.13/src/main/resources
[INFO]
[INFO] --- scala:4.9.10:add-source (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 2.3 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- compiler:3.8.1:compile (default) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-base_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-base_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-base_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-base_2.13 ---
[INFO]
[INFO] -------------------< org.lance:lance-spark-4.0_2.13 >-------------------
[INFO] Building lance-spark-4.0_2.13 0.4.0-beta.3 [3/3]
[INFO] from lance-spark-4.0_2.13/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- checkstyle:3.3.1:check (validate) @ lance-spark-4.0_2.13 ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Starting audit...
Audit done.
[INFO] You have 0 Checkstyle violations.
[INFO]
[INFO] --- spotless:2.43.0:apply (spotless-check) @ lance-spark-4.0_2.13 ---
[INFO] Spotless.Java is keeping 1 files clean - 0 were changed to be clean, 0 were already clean, 1 were skipped because caching determined they were already clean
[INFO] Spotless.Scala is keeping 3 files clean - 0 were changed to be clean, 0 were already clean, 3 were skipped because caching determined they were already clean
[INFO]
[INFO] --- antlr4:4.13.1:antlr4 (default) @ lance-spark-4.0_2.13 ---
[INFO] No grammars to process
[INFO] ANTLR 4: Processing source directory /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/antlr4
[INFO]
[INFO] --- build-helper:3.2.0:add-source (add-java-source) @ lance-spark-4.0_2.13 ---
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/java added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/main/scala added.
[INFO] Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/target/generated-sources/antlr4 added.
[INFO]
[INFO] --- resources:3.0.2:resources (default-resources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO]
[INFO] --- scala:4.9.10:compile (scala-compile-first) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.4 s
[INFO]
[INFO] --- compiler:3.8.1:compile (default-compile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- build-helper:3.2.0:add-test-source (add-java-test-source) @ lance-spark-4.0_2.13 ---
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-base_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-3.5_2.12/src/test/java added.
[INFO] Test Source directory: /home/beinanwang/o/lance-spark/lance-spark-4.0_2.13/src/test/java added.
[INFO]
[INFO] --- resources:3.0.2:testResources (default-testResources) @ lance-spark-4.0_2.13 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 70 resources
[INFO]
[INFO] --- scala:4.9.10:testCompile (scala-test-compile) @ lance-spark-4.0_2.13 ---
[INFO] Compiler bridge file: /home/beinanwang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.12.0-bin_2.13.16__65.0-1.12.0_20260104T231500.jar
[INFO] compile in 0.1 s
[INFO]
[INFO] --- compiler:3.8.1:testCompile (default-testCompile) @ lance-spark-4.0_2.13 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- surefire:3.2.5:test (default-test) @ lance-spark-4.0_2.13 ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.235 s -- in org.lance.spark.update.CreateIndexStandardSyntaxTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for lance-spark-root 0.4.0-beta.3:
[INFO]
[INFO] lance-spark-root ................................... SUCCESS [ 1.645 s]
[INFO] lance-spark-base_2.13 .............................. SUCCESS [ 4.527 s]
[INFO] lance-spark-4.0_2.13 ............................... SUCCESS [ 12.492 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.830 s
[INFO] Finished at: 2026-04-23T01:00:54Z
[INFO] ------------------------------------------------------------------------
Notes