Skip to content

feat: support AQE DPP broadcast reuse for Iceberg native scans#4215

Open
mbutrovich wants to merge 6 commits intoapache:mainfrom
mbutrovich:aqe_dpp_iceberg
Open

feat: support AQE DPP broadcast reuse for Iceberg native scans#4215
mbutrovich wants to merge 6 commits intoapache:mainfrom
mbutrovich:aqe_dpp_iceberg

Conversation

@mbutrovich
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #4022.

Rationale for this change

Under AQE, Spark's PlanAdaptiveDynamicPruningFilters rewrites SubqueryAdaptiveBroadcastExec (SAB) to SubqueryBroadcastExec so DPP filters reuse the join's already-materialized broadcast. For Iceberg native scans this rewrite was a no-op: CometIcebergNativeScanExec kept runtimeFilters inside its @transient originalPlan, where neither Spark's expression walks nor our own transformExpressionsUp passes could see them. The SAB stayed unconverted and the dim table executed a second time as a standalone broadcast.

#4112 fixed the equivalent problem for V1 native Parquet by lifting runtimeFilters to a top-level constructor field and using Spark's standard prepare / waitForSubqueries flow. This PR applies the same design to V2 Iceberg, replacing the earlier prototype in #4033, and aligns CometNativeScanExec and CometIcebergNativeScanExec so both scans go through the same DPP and subquery resolution path.

What changes are included in this PR?

  • Lifted runtimeFilters to a top-level constructor field on CometIcebergNativeScanExec so Spark's productIterator-based expression walks (and our transformExpressionsUp passes) see and rewrite it directly. Mirrors BatchScanExec and matches the V1 design from feat: AQE DPP for native Parquet scans with broadcast reuse #4112.
  • Added CometLeafExec.ensureSubqueriesResolved(), bridging Comet's custom findAllPlanData data-collection path with Spark's standard prepare -> waitForSubqueries flow. Removes the deadlock-prone reflection hack from feat: AQE DPP broadcast reuse for Iceberg native scans #4033 and eliminates ad-hoc double-checked locking.
  • Refactored CometNativeScanExec to use the same flow (dropped its redundant doPrepare override and outer DPP filter loop) so V1 and V2 stay in sync.
  • New Iceberg branch in CometPlanAdaptiveDynamicPruningFilters (3.5+) that converts the SAB inside runtimeFilters to CometSubqueryBroadcastExec (or SubqueryBroadcastExec if the join fell back to vanilla Spark). Matches by buildKeys exprIds to disambiguate multiple broadcast joins, and rewrites to Literal.TrueLiteral when no matching broadcast join exists (e.g., SMJ) so DPP is disabled but results stay correct. On 3.4 Iceberg falls back without reuse: CometSpark34AqeDppFallbackRule walks scan partitionFilters, which BatchScanExec doesn't have.
  • LazyIcebergMetric defers metric value resolution. SparkPlanInfo.fromSparkPlan reads the metrics map for SQL UI events at planning time, before AQE's queryStageOptimizerRules run; without deferral that read would trigger serializedPartitionData against an unconverted SAB.
  • serializedPartitionData rebuilds originalPlan with the current top-level runtimeFilters before serializing. Otherwise Spark's PlanAdaptiveDynamicPruningFilters rewrite is invisible to the @transient originalPlan and serializePartitions re-translates the unresolved InSubqueryExec.
  • Handle the ParallelCollectionRDD shape that BatchScanExec.inputRDD returns when DPP prunes all partitions (matched by class name since it is private[spark]).

How are these changes tested?

  • 14 new AQE DPP tests in CometIcebergNativeSuite covering broadcast reuse, multiple DPP filters sharing a broadcast, buildKeys-based disambiguation across joins, BHJ fallback to vanilla Spark, SMJ (no broadcast) graceful disable, empty broadcast pruning all partitions, cross-stage scalar subqueries, and the V2 SPJ shape variations across Spark versions.
  • Verified on Spark 3.4, 3.5, 4.0, 4.1 for CometIcebergNativeSuite, CometExecSuite, CometDppFallbackRepro3949Suite, and CometShuffleFallbackStickinessSuite.

@mbutrovich mbutrovich marked this pull request as ready for review May 4, 2026 21:37
@mbutrovich mbutrovich requested a review from andygrove May 5, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: Iceberg DPP executes dim table broadcast twice instead of reusing join's broadcast exchange

1 participant