
[MINOR] Make HoodieTable AutoCloseable and close per-cycle tables in BaseHoodieTableServiceClient #19042

Open
Davis-Zhang-Onehouse wants to merge 1 commit into apache:master from Davis-Zhang-Onehouse:eng-43829-hoodietable-autocloseable
@Davis-Zhang-Onehouse Davis-Zhang-Onehouse commented Jun 19, 2026


Change Logs

A HoodieTable created per table-service work unit on the driver lazily builds a FileSystemViewManager. For a SPILLABLE_DISK view the file-group store spills to an on-disk BitCaskDiskMap (which registers a JVM shutdown hook). HoodieTable had no close(), and the per-cycle tables created in BaseHoodieTableServiceClient were never closed, so their cached views' on-disk maps lingered until JVM exit.

  • HoodieTable now implements AutoCloseable; close() releases the FileSystemViewManager and the table metadata reader (idempotent, never throws).
  • Every create-use-drop HoodieTable in BaseHoodieTableServiceClient is wrapped in try-with-resources / try-finally: preCommit, logCompact, compact, cluster, scheduleTableServiceInternal, scheduleCleaning, purgePendingClustering, clean, rollbackFailedIndexingCommits, rollback, rollbackFailedBootstrap. (commitCompaction/commitLogCompaction are left alone — they receive an Option<HoodieTable> that may be caller-owned.)
  • SpillableMapBasedFileSystemView now drains its ExternalSpillableMaps from closeResources(), which runs under the AbstractTableFileSystemView writeLock, instead of from close() before the lock is taken. The on-disk maps are therefore released while the view still references them, and without racing a concurrent reader.
  • Adds TestHoodieTableResourceLifecycle verifying that HoodieTable.close() releases the spillable view's on-disk BitCaskDiskMap.
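
The close() contract described above (idempotent, never throws, used through try-with-resources) can be illustrated with a minimal sketch. It is not the code in this PR: SketchTable, SketchViewManager and TableLifecycleSketch are hypothetical stand-ins, and the real HoodieTable.close() may be structured differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a view manager that holds on-disk resources. */
final class SketchViewManager {
  final AtomicInteger closeCalls = new AtomicInteger();
  private final boolean failOnClose;

  SketchViewManager(boolean failOnClose) {
    this.failOnClose = failOnClose;
  }

  void close() {
    closeCalls.incrementAndGet();
    if (failOnClose) {
      throw new IllegalStateException("simulated failure while releasing view");
    }
  }
}

/** Hypothetical stand-in for a per-cycle table; not the real HoodieTable. */
final class SketchTable implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final SketchViewManager viewManager;

  SketchTable(SketchViewManager viewManager) {
    this.viewManager = viewManager;
  }

  boolean isClosed() {
    return closed.get();
  }

  // Narrowed signature: no checked exception, so callers need no catch block.
  @Override
  public void close() {
    // compareAndSet makes every call after the first a no-op (idempotent).
    if (!closed.compareAndSet(false, true)) {
      return;
    }
    try {
      viewManager.close();
    } catch (RuntimeException e) {
      // Report and swallow: a cleanup failure must not mask the caller's result.
      System.err.println("Ignoring failure while closing view manager: " + e.getMessage());
    }
  }
}

final class TableLifecycleSketch {
  /** Create-use-drop: the table is closed even if the body throws. */
  static String runCycle(SketchViewManager viewManager) {
    try (SketchTable table = new SketchTable(viewManager)) {
      // The table-service work would use the table here.
      return table.isClosed() ? "closed-too-early" : "cycle-done";
    }
  }
}
```

A table passed in by the caller, as with the Option<HoodieTable> arguments mentioned above, would not be wrapped this way, because the method that receives it does not own its lifecycle.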

Impact

Promptly frees per-cycle driver-side file-system view resources instead of leaving them until JVM exit. No behavior change to table-service results.

Risk level: low

Lifecycle change covered by existing file-system-view close tests plus the new test.

Companion internal PR

This is kept code-consistent with the Onehouse-internal companion (onehouseinc/hudi-internal#1959). The internal repo additionally needs the closeResources() ordering fix as an actual bug fix (its SpillableMapBasedFileSystemView overrides closeResources() and was calling super first, nulling the map fields before draining them); here the same drain-before-super ordering is applied for the writeLock benefit.
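
The drain-before-super ordering can be sketched as follows. This is an illustration under assumptions, not the actual Hudi classes: SketchBaseView, SketchSpillableView and SketchSpillableMap are hypothetical names, and the base class is assumed to null its map fields inside closeResources() while close() holds the write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for an on-disk spillable map. */
final class SketchSpillableMap {
  private boolean closed;

  void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

/** Hypothetical base view: close() takes the write lock, then releases resources. */
class SketchBaseView {
  protected final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  protected SketchSpillableMap fileGroupMap = new SketchSpillableMap();

  public void close() {
    lock.writeLock().lock();
    try {
      closeResources();
    } finally {
      lock.writeLock().unlock();
    }
  }

  protected void closeResources() {
    // Assumed base behavior: drop the reference so it can be garbage collected.
    fileGroupMap = null;
  }
}

/** Hypothetical spillable view that drains its map before the base clears it. */
final class SketchSpillableView extends SketchBaseView {
  final List<String> events = new ArrayList<>();

  @Override
  protected void closeResources() {
    // Drain first: the field is still non-null and the write lock is held.
    if (fileGroupMap != null) {
      fileGroupMap.close();
      events.add("drained under write lock: " + lock.isWriteLockedByCurrentThread());
    }
    // Only now let the base class null the field.
    super.closeResources();
    events.add("fields cleared");
  }
}
```

Calling super.closeResources() first would leave fileGroupMap null by the time the subclass tries to close it, which is the failure mode described for the internal repo.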

Documentation Update

None.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Jun 19, 2026
@codecov-commenter


Codecov Report

❌ Patch coverage is 50.00000% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.04%. Comparing base (d5c6b4d) to head (255eaf2).
⚠️ Report is 90 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ache/hudi/client/BaseHoodieTableServiceClient.java | 46.66% | 21 Missing and 3 partials ⚠️ |
| ...c/main/java/org/apache/hudi/table/HoodieTable.java | 61.53% | 4 Missing and 1 partial ⚠️ |

❗ There is a different number of reports uploaded between BASE (d5c6b4d) and HEAD (255eaf2).

HEAD has 27 fewer uploads than BASE.

| Flag | BASE (d5c6b4d) | HEAD (255eaf2) |
|---|---|---|
| hadoop-mr-java-client | 1 | 0 |
| spark-scala-tests | 10 | 0 |
| spark-java-tests | 15 | 0 |
| utilities | 1 | 0 |

Additional details and impacted files
@@              Coverage Diff              @@
##             master   #19042       +/-   ##
=============================================
- Coverage     68.78%   51.04%   -17.74%     
+ Complexity    29136    20833     -8303     
=============================================
  Files          2515     2457       -58     
  Lines        139938   133008     -6930     
  Branches      17187    15628     -1559     
=============================================
- Hits          96260    67898    -28362     
- Misses        35902    59726    +23824     
+ Partials       7776     5384     -2392     
| Flag | Coverage Δ |
|---|---|
| common-and-other-modules | 44.78% <48.27%> (+0.44%) ⬆️ |
| hadoop-mr-java-client | ? |
| spark-client-hadoop-common | 48.35% <18.96%> (+0.12%) ⬆️ |
| spark-java-tests | ? |
| spark-scala-tests | ? |
| utilities | ? |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...c/main/java/org/apache/hudi/table/HoodieTable.java | 69.72% <61.53%> (-19.77%) ⬇️ |
| ...ache/hudi/client/BaseHoodieTableServiceClient.java | 58.28% <46.66%> (-16.80%) ⬇️ |

... and 1164 files with indirect coverage changes


@Davis-Zhang-Onehouse Davis-Zhang-Onehouse force-pushed the eng-43829-hoodietable-autocloseable branch from 255eaf2 to 005f406 Compare June 19, 2026 15:17
…tem views

A HoodieTable created per table-service work unit on the driver lazily builds a
FileSystemViewManager. For a SPILLABLE_DISK view the file-group store spills to an
on-disk BitCaskDiskMap. HoodieTable had no close() and the per-cycle tables created
in BaseHoodieTableServiceClient were never closed, so their cached views' on-disk
maps lingered until JVM exit.

- HoodieTable now implements AutoCloseable; close() releases the metadata reader and,
  for locally-managed views (SPILLABLE_DISK / MEMORY / EMBEDDED_KV), the
  FileSystemViewManager. For REMOTE_ONLY/REMOTE_FIRST views close() leaves the view
  manager alone: that view talks to an embedded timeline server shared with the write
  client and later table-service cycles, so tearing it down here would break them.
- Every create-use-drop HoodieTable in BaseHoodieTableServiceClient is wrapped in
  try-with-resources / try-finally (preCommit, logCompact, compact, cluster,
  scheduleTableServiceInternal, scheduleCleaning, purgePendingClustering, clean,
  rollbackFailedIndexingCommits, rollback, rollbackFailedBootstrap).
- SpillableMapBasedFileSystemView drains its ExternalSpillableMaps from
  closeResources() (under the AbstractTableFileSystemView writeLock) instead of
  close(), so the on-disk maps are released while still referenced and without racing
  a concurrent reader.
- Adds TestHoodieTableResourceLifecycle verifying the on-disk BitCaskDiskMap is
  released by HoodieTable.close().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
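
The view-type distinction in the commit message above can be sketched as a guard inside close(). This is only an illustration: SketchViewType mirrors the names used in the commit message rather than the real Hudi enum, and SketchManagedTable with its boolean flags is hypothetical.

```java
/** Hypothetical enum mirroring the view types named in the commit message. */
enum SketchViewType {
  MEMORY, SPILLABLE_DISK, EMBEDDED_KV, REMOTE_ONLY, REMOTE_FIRST;

  /** Remote views talk to a shared timeline server, so this table does not own them. */
  boolean isLocallyManaged() {
    return this != REMOTE_ONLY && this != REMOTE_FIRST;
  }
}

/** Hypothetical per-cycle table that only tears down resources it owns. */
final class SketchManagedTable implements AutoCloseable {
  private final SketchViewType viewType;
  private boolean closed;
  private boolean metadataReaderClosed;
  private boolean viewManagerClosed;

  SketchManagedTable(SketchViewType viewType) {
    this.viewType = viewType;
  }

  boolean isMetadataReaderClosed() {
    return metadataReaderClosed;
  }

  boolean isViewManagerClosed() {
    return viewManagerClosed;
  }

  @Override
  public void close() {
    if (closed) {
      return; // idempotent: later calls do nothing
    }
    closed = true;
    // The metadata reader belongs to this table regardless of view type.
    metadataReaderClosed = true;
    // The view manager is closed only when the view is local to this table.
    if (viewType.isLocallyManaged()) {
      viewManagerClosed = true;
    }
  }
}
```

With this guard, closing a table backed by a REMOTE_ONLY or REMOTE_FIRST view releases the metadata reader but leaves the shared view untouched for the write client and later cycles.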
@Davis-Zhang-Onehouse Davis-Zhang-Onehouse force-pushed the eng-43829-hoodietable-autocloseable branch from 005f406 to 839de89 Compare June 19, 2026 17:04
@github-actions github-actions Bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Jun 19, 2026
@hudi-bot


CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

