feat(hive-sync): batch and parallelize HiveQL partition operations#18984

Draft
nsivabalan wants to merge 4 commits into apache:master from nsivabalan:hiveql-parallelize-calls

nsivabalan commented Jun 12, 2026

Summary

  • Adds opt-in HiveDriverPool for HiveQL partition sync (issue #18331)
  • Batches TOUCH using existing hoodie.datasource.hive_sync.batch_num (was one giant statement)
  • Fans batched ALTER statements across N pooled Hive Driver workers via dedicated single-thread executors
  • Default off — existing behavior unchanged unless hoodie.datasource.hive_sync.batching.enabled=true

Stack:

Until #18983 merges, the diff here includes the HMS commit. Once it lands, this PR rebases cleanly to a HiveQL-only delta. Reviewing the top three commits (49a70050, 4bb34bca, acefd03e) in isolation gives the HiveQL change plus the post-review cleanup.

Design constraint

Hive's Driver and SessionState are thread-bound — SessionState.start() attaches to the calling thread's ThreadLocal, and a Driver constructed on one thread cannot be safely used from another. This is the opposite of IMetaStoreClient from #18983, which is a Thrift socket we can pool freely.

The HiveDriverPool gives each slot its own dedicated worker thread (a newSingleThreadExecutor). Bootstrap, dispatch, and close all run on that bound thread. The SessionState itself is shared across workers and lazily constructed once; each worker calls SessionState.start(sharedState) on its own thread to attach it to that thread's ThreadLocal. Constructing one SessionState per worker triggered races in Hive's resource-dir machinery during early testing. Review confirmed that the shared-SessionState model is appropriate for partition-only DDL; see the usage contract in the HiveDriverPool javadoc.
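To illustrate the thread-binding idea, here is a minimal sketch, not the actual HiveDriverPool code. The class name ThreadBoundPool and its methods are hypothetical, and a plain Supplier stands in for Driver construction and SessionState.start(...). It only shows that each slot pairs one single-thread executor with one resource that is created, and later used, on that same thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch: one single-thread executor per slot, each owning a thread-bound resource. */
final class ThreadBoundPool<R> implements AutoCloseable {
  private final List<ExecutorService> workers = new ArrayList<>();
  private final List<R> resources = new ArrayList<>();
  private final AtomicInteger next = new AtomicInteger();

  ThreadBoundPool(int size, Supplier<R> factory) throws Exception {
    for (int i = 0; i < size; i++) {
      ExecutorService worker = Executors.newSingleThreadExecutor();
      workers.add(worker);
      // Bootstrap on the worker thread so any thread-local state attaches there.
      Callable<R> bootstrap = factory::get;
      resources.add(worker.submit(bootstrap).get());
    }
  }

  /** Round-robin dispatch: the task always runs on the thread that created its resource. */
  <T> Future<T> submit(Function<R, T> task) {
    int slot = Math.floorMod(next.getAndIncrement(), workers.size());
    R resource = resources.get(slot);
    Callable<T> call = () -> task.apply(resource);
    return workers.get(slot).submit(call);
  }

  @Override
  public void close() {
    // A real pool would also release each resource on its owning thread before shutdown.
    workers.forEach(ExecutorService::shutdown);
  }

  public static void main(String[] args) throws Exception {
    // The "resource" here is just the name of the thread that created it.
    try (ThreadBoundPool<String> pool =
             new ThreadBoundPool<>(2, () -> Thread.currentThread().getName())) {
      for (int i = 0; i < 4; i++) {
        boolean sameThread =
            pool.submit(owner -> owner.equals(Thread.currentThread().getName())).get();
        System.out.println("task " + i + " ran on its owning thread: " + sameThread);
      }
    }
  }
}
```

Because every slot has exactly one thread, a task never observes a resource from a thread other than the one that bootstrapped it, which is the property the Driver/SessionState pairing needs.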

What this PR fixes

  1. TOUCH batching: QueryBasedDDLExecutor.constructPartitionAlterStatements was concatenating every partition into one ALTER TABLE ... TOUCH PARTITION (p1) PARTITION (p2) ... statement. Now split into batches of HIVE_BATCH_SYNC_PARTITION_NUM.
  2. HiveQL parallelism: HiveQueryDDLExecutor.updateHiveSQLs ran the SQL list in a single for loop on one Driver. With the pool, each batch is dispatched to a worker (round-robin) and they execute in parallel.
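The batching in item 1 can be sketched as follows. This is an illustration under assumptions, not the code in QueryBasedDDLExecutor: the class TouchBatcher, the method touchStatements, and the partition-spec strings are hypothetical, and the real method also handles quoting and database qualifiers that are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitting TOUCH into statements of at most batchSize partitions. */
final class TouchBatcher {
  static List<String> touchStatements(String table, List<String> partitionSpecs, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<String> statements = new ArrayList<>();
    for (int start = 0; start < partitionSpecs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, partitionSpecs.size());
      StringBuilder sql = new StringBuilder("ALTER TABLE `").append(table).append("` TOUCH");
      // One PARTITION (...) clause per partition in this batch.
      for (String spec : partitionSpecs.subList(start, end)) {
        sql.append(" PARTITION (").append(spec).append(")");
      }
      statements.add(sql.toString());
    }
    return statements;
  }

  public static void main(String[] args) {
    List<String> specs = Arrays.asList(
        "`dt`='2026-06-01'", "`dt`='2026-06-02'", "`dt`='2026-06-03'");
    // With batchSize 2 this yields two statements: one with two partitions, one with one.
    touchStatements("trips", specs, 2).forEach(System.out::println);
  }
}
```

Each resulting statement is an independent unit of work, which is what makes the round-robin dispatch in item 2 possible.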

SET_LOCATION (UPDATE) remains one statement per partition because Hive SQL doesn't support multi-partition SET LOCATION. But it now benefits from the parallel fan-out.

ALTER TABLE statements (createTable, schema evolution, column-comment updates) continue to run on the session Driver — they're rare and don't benefit from parallelism.

Hive 2.x quirk worth flagging

ALTER PARTITION SET LOCATION in Hive 2.x ignores db.tbl qualifiers and silently uses the connection's current database. The leading USE database statement in the SQL list is therefore load-bearing — when the pool is in use, it peels off any leading USE statements via runOnEachWorker() and runs them on every worker before fanning the rest out. JDBC mode (which shares one Connection) preserves today's contract where USE persists for subsequent statements on the same Connection.
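A rough sketch of the peeling step, assuming the SQL list is generated internally and unpadded. The class UsePeelSketch and the helper leadingUseCount are hypothetical; isUseStatement mirrors the name used in this PR, but the body shown is a guess at a strict 4-character prefix check, and the case-insensitive match is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of separating leading USE statements from the fan-out statements. */
final class UsePeelSketch {
  /** Strict prefix check: the statement must begin with "USE " with no leading whitespace. */
  static boolean isUseStatement(String sql) {
    return sql.regionMatches(true, 0, "USE ", 0, 4);
  }

  /** Counts only the USE statements at the head of the list; later ones are not peeled. */
  static int leadingUseCount(List<String> sqls) {
    int count = 0;
    while (count < sqls.size() && isUseStatement(sqls.get(count))) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    List<String> sqls = Arrays.asList(
        "USE `analytics`",
        "ALTER TABLE `trips` PARTITION (`dt`='2026-06-01') SET LOCATION '/data/trips/2026-06-01'",
        "ALTER TABLE `trips` PARTITION (`dt`='2026-06-02') SET LOCATION '/data/trips/2026-06-02'");
    int split = leadingUseCount(sqls);
    List<String> setup = sqls.subList(0, split);            // would run on every worker
    List<String> fanOut = sqls.subList(split, sqls.size()); // would be spread across workers
    System.out.println("setup=" + setup.size() + " fanOut=" + fanOut.size());
  }
}
```

Running the setup statements on every worker before any fan-out statement is what keeps each worker's current database correct for the Hive 2.x behavior described above.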

Configs

No new configs — reuses everything from #18983:

Key                                             Default
hoodie.datasource.hive_sync.batching.enabled    false
hoodie.datasource.hive_sync.batching.threads    4
hoodie.datasource.hive_sync.batch_num           1000
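For reference, a minimal sketch of opting in using the keys listed above plus hoodie.datasource.hive_sync.mode=hiveql. The class and method names are hypothetical, and how the Properties object is handed to the sync tool or writer depends on the deployment and is not shown.

```java
import java.util.Properties;

/** Hypothetical helper that collects the opt-in settings for batched HiveQL partition sync. */
final class HiveSyncBatchingProps {
  static Properties batchingProps() {
    Properties props = new Properties();
    props.setProperty("hoodie.datasource.hive_sync.mode", "hiveql");
    // Off by default; must be set explicitly to enable the pool.
    props.setProperty("hoodie.datasource.hive_sync.batching.enabled", "true");
    // Values below are the documented defaults, repeated here for visibility.
    props.setProperty("hoodie.datasource.hive_sync.batching.threads", "4");
    props.setProperty("hoodie.datasource.hive_sync.batch_num", "1000");
    return props;
  }

  public static void main(String[] args) {
    batchingProps().forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```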

Post-review hardening (commits 4bb34bca, acefd03e)

  • HiveDriverPool.awaitAll: cancels remaining pending futures on first error (mayInterruptIfRunning=false so in-flight Driver statements run to completion).
  • HiveDriverPool bootstrap: bounded by a 60s BOOTSTRAP_TIMEOUT_SECONDS; prior code blocked forever if Hive init hung.
  • HiveQueryDDLExecutor.driverPool is now Option<HiveDriverPool> (per review comment).
  • SessionState.setCurrentDatabase(...) is called once when the shared SessionState is first constructed, not on every newDriver invocation.
  • Per-statement SQL text removed from logs (batched statements can be many KB; N workers multiply log volume). Replaced with per-call summary.
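The cancel-on-first-error behavior can be sketched roughly as below. This is not the HiveDriverPool.awaitAll implementation: the class AwaitAllSketch is hypothetical, IllegalStateException stands in for the Hudi exception types, and the WARN logging of suppressed errors is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Hypothetical sketch: wait for all futures, cancelling the rest after the first failure. */
final class AwaitAllSketch {
  static void awaitAll(List<Future<?>> futures) {
    Throwable firstError = null;
    for (Future<?> future : futures) {
      try {
        future.get();
      } catch (CancellationException e) {
        // Cancelled by cancelAll after an earlier failure; not a new error.
      } catch (ExecutionException e) {
        if (firstError == null) {
          firstError = e.getCause();
          cancelAll(futures);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        cancelAll(futures);
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    if (firstError != null) {
      throw new IllegalStateException("statement failed", firstError);
    }
  }

  private static void cancelAll(List<Future<?>> futures) {
    for (Future<?> future : futures) {
      // false: do not interrupt a task that is already running.
      future.cancel(false);
    }
  }

  public static void main(String[] args) {
    FutureTask<Void> failing = new FutureTask<>(() -> {
      throw new IllegalStateException("boom");
    });
    failing.run(); // completes exceptionally
    FutureTask<String> pending = new FutureTask<>(() -> "never runs");
    List<Future<?>> futures = Arrays.asList(failing, pending);
    try {
      awaitAll(futures);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getCause().getMessage());
    }
    System.out.println("pending cancelled: " + pending.isCancelled());
  }
}
```

Passing false to cancel means an in-flight task is not interrupted, matching the intent stated above that running Driver statements finish rather than being cut off mid-statement.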

Test plan

  • mvn compile on hudi-sync/hudi-hive-sync: clean, 0 Checkstyle violations, 0 RAT issues
  • mvn test on hudi-sync/hudi-hive-sync: 308 tests, 0 failures, 0 errors (was 305 before post-review tests)
  • TestHiveDriverPool: 9 unit tests (bootstrap, dispatch round-robin, error propagation, concurrent-borrow bounding, close idempotency, runOnEachWorker ordering, cancel-on-first-error)
  • TestHiveSyncTool#testHiveQLSyncWithBatchingEnabled: end-to-end HiveQL sync with batching on
  • TestHiveSyncTool#testHiveQLTouchPartitionsWithBatching: exercises the batched TOUCH path
  • TestHiveSyncTool#testHiveQLSetLocationWithBatching: drives the parallel SET_LOCATION fan-out path
  • Existing 296 tests across all three sync modes (hms / hiveql / jdbc): pass unchanged
  • Manual benchmark on a ~1k-partition table (planned with Uber before flipping default; not blocking — default is off and OOB users are unaffected)

Files touched (top commits only)

  • QueryBasedDDLExecutor.java: batch TOUCH; new runSQLs(List<String>) hook for parallel-friendly subclasses
  • HiveQueryDDLExecutor.java: new constructor accepting Option<HiveDriverPool>; runSQLs peels off USE and dispatches via pool
  • HoodieHiveSyncClient.java: build pool for HIVEQL mode (explicit + legacy default)
  • util/HiveDriverPool.java: new, ~310 lines; eager pool of single-thread executors, each owning a Driver bound to a shared SessionState
  • util/TestHiveDriverPool.java: new; 9 unit tests with mocked DriverFactory
  • TestHiveSyncTool.java: 3 new end-to-end test methods

Follow-ups (separate PRs)

  1. DROP partition parallelization in HiveQL mode — tracked in #19033 (stacked on this PR).
  2. JDBC executor parallelism — needs a JDBC Connection pool, different concerns from Driver/SessionState.
  3. After benchmarks across HMS + HiveQL on a production HMS, post combined numbers on #18331 ([IMPROVEMENT] Hive Sync partition operations lack batching and parallelism, causing 4x-9x slowdown for large tables) and decide whether default-off is still right.

Related: #18331

🤖 Generated with Claude Code

nsivabalan and others added 2 commits June 11, 2026 16:54
Hive sync partition operations on HMS today serialize through a single
IMetaStoreClient and ship entire partition lists in a single Thrift call
for TOUCH/UPDATE. For large tables (~2k partitions) this is ~5-9x slower
than parallel implementations (see apache#18331). The biggest contributors are
(1) one giant alter_partitions call for UPDATE/TOUCH, and (2) per-
partition Thrift round-trips for DROP, all sequential.

This change introduces an opt-in IMetaStoreClientPool gated behind
hoodie.datasource.hive_sync.batching.enabled (default false). When on,
HMSDDLExecutor splits ADD / UPDATE / TOUCH / DROP into batches of
hoodie.datasource.hive_sync.batch_num (existing config, default 1000)
and fans them out across a pool of RetryingMetaStoreClient instances
sized by hoodie.datasource.hive_sync.batching.threads (default 4).

Design invariant: only partition-row operations go through the pool.
Table-row operations (createTable, alter_table, last-commit-time-synced,
writer-version, table-comments) stay on the existing session client, so
there is no lost-update risk on table parameters. The sync flow remains
serial-parallel-serial (phase 1: table setup, phase 2: parallel
partition fan-out, phase 3: table finalization).

Sequential fallback is preserved when the flag is off or when
HIVE_SYNC_USE_SPARK_CATALOG is on (incompatible with the pool's direct
RetryingMetaStoreClient.getProxy path).

Tests: TestIMetaStoreClientPool covers borrow/return, concurrent
borrows, close idempotency. TestHiveSyncTool.testHMSSyncWithBatchingEnabled
exercises end-to-end sync against the embedded HMS with batching on.

Related: apache#18331

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to apache#18983 (HMS parallelism). Applies the equivalent treatment to
the HiveQL sync mode (hoodie.datasource.hive_sync.mode=hiveql).

HiveQL had two issues that this change addresses:

1. Batching gaps in QueryBasedDDLExecutor.constructPartitionAlterStatements:
   TOUCH concatenated every partition into one giant ALTER TABLE ... TOUCH
   PARTITION (...) PARTITION (...) ... statement; SET_LOCATION (UPDATE)
   emitted one statement per partition. ADD was already batched.

2. Sequential SQL execution in HiveQueryDDLExecutor.updateHiveSQLs: even
   when batches existed, they ran in a single for-loop on one Hive Driver.

This change introduces HiveDriverPool, an eager pool of single-thread
executors each owning a Hive Driver bound to a shared SessionState.
Gated behind the existing hoodie.datasource.hive_sync.batching.enabled
flag (default off) and sized by hoodie.datasource.hive_sync.batching.threads
(default 4) — no new configs.

Design notes:
- Hive's Driver and SessionState are thread-bound. SessionState.start()
  attaches to the calling thread's ThreadLocal. The pool gives each slot
  its own dedicated worker thread so the Driver stays valid for that
  thread's lifetime. Bootstrap, dispatch, and close all run on the bound
  thread.
- SessionState is shared across workers (lazily constructed once),
  because each worker calls SessionState.start(sharedState) on its own
  thread to attach. Constructing one SessionState per worker triggered
  race conditions in Hive's resource-directory machinery on macOS.
- TOUCH is now batched by HIVE_BATCH_SYNC_PARTITION_NUM. SET_LOCATION
  remains one statement per partition (Hive SQL doesn't support
  multi-partition SET LOCATION) but is now fanned out across workers.
- Hive 2.x's ALTER PARTITION SET LOCATION ignores db.table qualifiers
  and silently uses the connection's current database, so the leading
  USE database statement is load-bearing. The pool peels it off and
  runs it on every worker via runOnEachWorker() before fanning the
  rest out.

Tests:
- TestHiveDriverPool: bootstrap, dispatch round-robin, error
  propagation, concurrent-borrow bounding, close idempotency.
- TestHiveSyncTool.testHiveQLSyncWithBatchingEnabled: end-to-end with
  batching.enabled=true, threads=3, batch_num=3 against embedded HMS.
- TestHiveSyncTool.testHiveQLTouchPartitionsWithBatching: exercises
  the batched TOUCH path specifically.
- Full hudi-hive-sync suite: 305 passed, 0 failures, 0 errors.

Related: apache#18331

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…omments

- Change HiveDriverPool.awaitAll(...) to return void. The List<CommandProcessorResponse>
  it previously returned was always empty and no caller consumed it. Drops the unused
  CommandProcessorResponse import.
- Lift the empty-input short-circuit to the top of HiveQueryDDLExecutor.runSQLs so the
  no-op case skips both the pool and the session Driver branches cleanly.
- Document isUseStatement's strict 4-char prefix expectation so future callers don't
  feed it externally produced (potentially padded) SQL.

No behavior change. Full hudi-hive-sync suite: 305 tests, 0 failures, 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
this(config, metaStoreClient, null);
}

public HiveQueryDDLExecutor(HiveSyncConfig config, IMetaStoreClient metaStoreClient,

Can we make the last arg an Option instead of a HiveDriverPool?
Same for the instance var in the class.

// Optional. When non-null, partition-row operations fan out across this pool;
// table-row operations always use the session `client` field above. See
// IMetaStoreClientPool javadoc for the usage contract.
private final IMetaStoreClientPool partitionClientPool;

Can we make this an Option? Same for the last arg in the constructor.

And can we name the variable iMetaStoreClientPool to align with the class being used?

sharedSessionState = new SessionState(hiveConf,
UserGroupInformation.getCurrentUser().getShortUserName());
}
SessionState.start(sharedSessionState);

Why do we need to reinstantiate here for every new driver?
The db cannot change across partitions, right?
Am I missing anything?

PR review follow-ups for apache#18984:

- HiveQueryDDLExecutor.driverPool -> Option<HiveDriverPool> (PR comment).
  Constructor arg, instance field, and HoodieHiveSyncClient call sites
  updated. Eliminates a stale 'Optional. When non-null' inline doc.

- DefaultDriverFactory: stop redundantly calling setCurrentDatabase on
  every newDriver(). Database is a pool-wide property that never changes
  across workers, so set it once when the shared SessionState is first
  constructed (PR comment).

- HiveDriverPool.awaitAll: on first failure, cancel remaining (not yet
  started) pending futures so workers don't keep running pointless work
  after a fatal error. Cancel uses mayInterruptIfRunning=false so any
  in-flight statement is allowed to run to completion (keeps Driver
  state consistent). Suppressed errors continue to be logged at WARN.
  Adds handling for CancellationException so the cancel-walk doesn't
  itself raise a spurious HoodieHiveSyncException.

- HiveDriverPool bootstrap: bound each Future.get() at 60s
  (BOOTSTRAP_TIMEOUT_SECONDS). Prior code blocked forever if Hive init
  hung — now we surface a HoodieException with a timeout cause.

- Logging: stop logging full SQL text per-statement in runAll/awaitAll
  (batched TOUCH/ADD can be many KB; N workers multiply log volume).
  Replaced with a single per-call summary line. Same treatment applied
  to HiveQueryDDLExecutor.updateHiveSQLs (sequential path).

- New unit test: runOnEachWorkerRunsSetupOnEveryWorker — asserts every
  worker sees the leading USE before any fan-out partition statement.

- New unit test: awaitAllCancelsPendingFuturesOnFirstError — uses a
  size-1 pool to guarantee the 2nd/3rd statements are still pending
  behind the failing 1st, then asserts they are cancelled.

- New end-to-end test: testHiveQLSetLocationWithBatching — drives
  updatePartitionsToTable through the SET_LOCATION fan-out path with
  batching on; asserts partition count and per-partition relative paths
  survive parallel ALTER PARTITION SET LOCATION.

Out of scope (documented as follow-up): DROP partition parallelization
in HIVEQL mode. DROP goes through IMetaStoreClient.dropPartition (Thrift,
not Hive Driver), so it would need IMetaStoreClientPool wired into the
HiveQL path — a separable change from the HiveDriverPool work this PR
delivers.

Tests: full hudi-hive-sync suite passes — 308 tests, 0 failures,
0 errors (was 305 before this commit). New tests:
- TestHiveDriverPool: 9 tests (was 7)
- TestHiveSyncTool: testHiveQLSetLocationWithBatching added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 80.74713% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.66%. Comparing base (8933224) to head (acefd03).
⚠️ Report is 42 commits behind head on master.

Files with missing lines                               Patch %  Lines
...java/org/apache/hudi/hive/util/HiveDriverPool.java  83.21%   18 Missing and 5 partials ⚠️
...rg/apache/hudi/hive/util/IMetaStoreClientPool.java  72.05%   16 Missing and 3 partials ⚠️
.../java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java  84.12%   7 Missing and 3 partials ⚠️
...org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java  73.33%   4 Missing and 4 partials ⚠️
...ava/org/apache/hudi/hive/HoodieHiveSyncClient.java  70.83%   6 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18984      +/-   ##
============================================
- Coverage     68.26%   67.66%   -0.61%     
- Complexity    29513    29845     +332     
============================================
  Files          2542     2564      +22     
  Lines        142627   145456    +2829     
  Branches      17788    18370     +582     
============================================
+ Hits          97369    98417    +1048     
- Misses        37253    38809    +1556     
- Partials       8005     8230     +225     
Flag                        Coverage Δ
common-and-other-modules    44.86% <80.74%> (+0.07%) ⬆️
hadoop-mr-java-client       44.69% <ø> (-0.06%) ⬇️
spark-client-hadoop-common  48.28% <ø> (+0.23%) ⬆️
spark-java-tests            48.18% <3.44%> (-0.59%) ⬇️
spark-scala-tests           44.39% <8.33%> (-0.46%) ⬇️
utilities                   37.11% <5.17%> (-0.15%) ⬇️

Files with missing lines                               Coverage Δ
...ava/org/apache/hudi/hive/HiveSyncConfigHolder.java  99.21% <100.00%> (+0.08%) ⬆️
...rg/apache/hudi/hive/ddl/QueryBasedDDLExecutor.java  87.12% <100.00%> (+1.07%) ⬆️
...ava/org/apache/hudi/hive/HoodieHiveSyncClient.java  50.12% <70.83%> (+1.20%) ⬆️
...org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java  67.36% <73.33%> (+1.69%) ⬆️
.../java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java  80.86% <84.12%> (+0.51%) ⬆️
...rg/apache/hudi/hive/util/IMetaStoreClientPool.java  72.05% <72.05%> (ø)
...java/org/apache/hudi/hive/util/HiveDriverPool.java  83.21% <83.21%> (ø)

... and 157 files with indirect coverage changes

@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Labels

size:XL PR with lines of changes > 1000

3 participants