Add Spark 4.0.3 shim support #15151

Open
firestarman wants to merge 4 commits into NVIDIA:main from firestarman:spark-403-shim-refresh

Conversation

@firestarman firestarman commented Jun 26, 2026

Collaborator

Fixes #15065.

Description

  • Add Spark 4.0.3 as a supported Scala 2.13 shim profile so the plugin can build and load against Spark 4.0.3.
  • Add the Spark 4.0.3 shim service provider and generated support metadata so runtime shim discovery and qualification documentation include Spark 4.0.3.
  • Split Spark 4.0.3 from the 4.0.1/4.0.2 SparkShims path so 4.0.3 can carry its Spark-specific behavior without affecting earlier 4.0.x shims.
  • Share ACOSH/ASINH compatibility overrides and boundary tests with Spark 4.1.2 because Spark 4.0.3 has the same CPU behavior for large hyperbolic inputs.
  • Update AST fallback handling for ACOSH/ASINH so Spark 4.0.3 uses the non-AST path where GPU AST semantics would not match Spark CPU results.
  • Update docs/download.md so the published support matrix lists Apache Spark 4.0.3 and the Scala 2.13 Spark 4.0.3 support line.
  • Validated with SPARK_HOME=/bigdata/work/tools/spark-4.0.3-bin-hadoop3 mvn -B -s /home/liangcail/.m2/settings_art.xml -f scala2.13/pom.xml -Dbuildver=403 -Dcuda.version=cuda13 verify: BUILD SUCCESS; integration tests reported 35271 passed, 2058 skipped, 513 xfailed, 895 xpassed; Scala tests reported 1759 succeeded, 0 failed.
  • Validated review updates with SPARK_HOME=/bigdata/work/tools/spark-4.0.3-bin-hadoop3 mvn -B -s /home/liangcail/.m2/settings_art.xml -f scala2.13/pom.xml -Dbuildver=403 -Dcuda.version=cuda13 -Dmaven.scalastyle.skip=true -Drat.skip=true -DskipTests -Dmaven.scaladoc.skip=true -pl tests -am package: BUILD SUCCESS.
  • Local NDS performance results show no overall performance regression for Spark 4.0.3.
| Item | Value |
| --- | --- |
| Dataset | /bigdata/tpcds_data/parquet_100f |
| Format | Parquet |
| RAPIDS jar | rapids-4-spark_2.13-26.08.0-SNAPSHOT-cuda13-403-perf.jar |
| Spark 4.0.2 | /bigdata/work/tools/spark-4.0.2-bin-hadoop3 |
| Spark 4.0.3 | /bigdata/work/tools/spark-4.0.3-bin-hadoop3 |
| Runs | 3 per shim |
| Compared runs | Warm runs only: run 2 and run 3 |
| Query status | 618 query JSON records checked, 0 non-Completed |
| Result | No overall performance regression observed for Spark 4.0.3 |

| Metric | Spark 4.0.2 | Spark 4.0.3 | 4.0.3 vs 4.0.2 |
| --- | --- | --- | --- |
| Power run 2 | 450.000s | 454.000s | +0.89% |
| Power run 3 | 464.000s | 455.000s | -1.94% |
| Power run avg | 457.000s | 454.500s | -0.55% |
| Sum of per-query avg times | 455.934s | 453.577s | -0.52% |
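
For readers who want to re-derive the "4.0.3 vs 4.0.2" column, here is a minimal sketch of the arithmetic, assuming the column is the relative change of the Spark 4.0.3 time against the Spark 4.0.2 baseline, rounded to two decimals. The function name `pct_change` and the `ROWS` layout are illustrative and not part of the PR or of any NDS tooling.

```python
# Recompute the relative-change column from the timings reported above.
# Assumption: change = (candidate - baseline) / baseline, shown as a percentage.

def pct_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline, in percent, rounded to 2 places."""
    return round((candidate - baseline) / baseline * 100.0, 2)

# (Spark 4.0.2 seconds, Spark 4.0.3 seconds), copied from the results table.
ROWS = {
    "Power run 2": (450.000, 454.000),
    "Power run 3": (464.000, 455.000),
    "Power run avg": (457.000, 454.500),
    "Sum of per-query avg times": (455.934, 453.577),
}

for metric, (spark_402, spark_403) in ROWS.items():
    print(f"{metric}: {pct_change(spark_402, spark_403):+.2f}%")
```

A negative value means Spark 4.0.3 was faster on that metric. The two warm power runs move in opposite directions, so only the averaged rows give a usable signal.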

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman requested a review from a team as a code owner June 26, 2026 07:38
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Contributor

Too many files changed for review. (207 files found, 100 file limit)

@firestarman firestarman requested a review from a team June 26, 2026 07:42
@firestarman

Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman

Collaborator Author

build

@firestarman firestarman requested a review from res-life June 26, 2026 10:17
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman

Collaborator Author

build

@sameerz added the "feature request" label Jun 29, 2026
@res-life

Collaborator

The Spark 4.0.3 shim metadata is missing from seven shared Scala test suites: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, and RapidsShuffleThreadedWriterSuite. Please add {"spark": "403"} to each suite. Otherwise, -Dbuildver=403 silently excludes these tests from compilation and execution.
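
A gap like this can be caught before a build with a small scan. The sketch below is not part of the PR or of the repository's tooling: it assumes the metadata sits in a leading comment block delimited by `spark-rapids-shim-json-lines` markers, with one JSON object per line, and the marker strings, function names, and sample file contents are all illustrative and should be checked against the actual sources.

```python
import json

# Assumed delimiters of the shim metadata comment block (verify in the repo).
BEGIN = "/*** spark-rapids-shim-json-lines"
END = "spark-rapids-shim-json-lines ***/"


def shim_versions(source: str) -> set:
    """Collect the "spark" build versions declared in a file's metadata block."""
    versions = set()
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True
        elif stripped == END:
            break
        elif inside and stripped:
            # Each metadata line is expected to look like {"spark": "403"}.
            versions.add(json.loads(stripped)["spark"])
    return versions


def missing_shim(sources: dict, buildver: str) -> list:
    """Return the names of files whose metadata does not list buildver."""
    return sorted(
        name for name, text in sources.items()
        if buildver not in shim_versions(text)
    )


# Hypothetical file contents, used for illustration only.
SAMPLE = {
    "IntervalCastSuite.scala": (
        '/*** spark-rapids-shim-json-lines\n'
        '{"spark": "402"}\n'
        '{"spark": "403"}\n'
        'spark-rapids-shim-json-lines ***/\n'
        'package example\n'
    ),
    "OrcEncryptionSuite.scala": (
        '/*** spark-rapids-shim-json-lines\n'
        '{"spark": "402"}\n'
        'spark-rapids-shim-json-lines ***/\n'
        'package example\n'
    ),
}

print(missing_shim(SAMPLE, "403"))
```

In practice the `sources` mapping would be built by reading the test files from disk. Files with no metadata block at all are also reported, since their version set is empty.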

@res-life

Collaborator

docs/download.md still lists Apache Spark support only through 4.0.2, including the Scala 2.13 support line. Since this PR adds Spark 4.0.3 support, please add 4.0.3 to both lists and update the Documentation checklist accordingly.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman

Collaborator Author

Addressed the latest review feedback in 8bff69c8b:

  • Added Spark 4.0.3 shim metadata to the seven shared Scala test suites so they compile for -Dbuildver=403.
  • Updated docs/download.md to list Spark 4.0.3 in the supported Spark versions and Scala 2.13 support line.
  • Updated the PR Documentation checklist and added the focused validation command to the description.

Validation: SPARK_HOME=/bigdata/work/tools/spark-4.0.3-bin-hadoop3 mvn -B -s /home/liangcail/.m2/settings_art.xml -f scala2.13/pom.xml -Dbuildver=403 -Dcuda.version=cuda13 -Dmaven.scalastyle.skip=true -Drat.skip=true -DskipTests -Dmaven.scaladoc.skip=true -pl tests -am package passed.

@firestarman

Collaborator Author

build

@res-life res-life left a comment

Collaborator

LGTM

@NvTimLiu NvTimLiu left a comment

Collaborator

LGTM, +1

Labels

feature request (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Add support for Apache Spark 4.0.3

5 participants