[FLINK-39753][state/rocksdb] Close ColumnFamilyOptions from getDescriptor() in Compactor#28251
Open
leekeiabstraction wants to merge 1 commit into
Collaborator
davidradl
approved these changes
May 26, 2026
Contributor
davidradl
left a comment
Approving if you add the comment
och5351
approved these changes
May 26, 2026
Contributor
Hi, @leekeiabstraction !
disposeInternal is implemented, but I couldn't find the corresponding close call.
LGTM
…ptor() in Compactor
ColumnFamilyHandle.getDescriptor() allocates a new native ColumnFamilyOptions on every call and does not close it, preventing the shared block cache from being freed. Wrap the call in try-with-resources so the options are closed after reading numLevels().
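The pattern described in the commit message can be sketched without RocksDB on the classpath. This is a minimal illustration, not the actual Compactor code: FakeOptions and FakeHandle are hypothetical stand-ins for a native-backed options object and a handle whose getter returns a fresh copy on every call, and the OPEN counter only models outstanding native allocations.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CompactorCloseSketch {

    /** Hypothetical stand-in for a native-backed object such as ColumnFamilyOptions. */
    static final class FakeOptions implements AutoCloseable {
        /** Counts instances that were created but not yet closed. */
        static final AtomicInteger OPEN = new AtomicInteger();

        private final int numLevels;
        private boolean closed;

        FakeOptions(int numLevels) {
            this.numLevels = numLevels;
            OPEN.incrementAndGet(); // models the native allocation
        }

        int numLevels() {
            return numLevels;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet(); // models freeing the native object
            }
        }
    }

    /** Hypothetical stand-in for a handle that returns a new options copy per call. */
    static final class FakeHandle {
        FakeOptions getOptions() {
            return new FakeOptions(7); // 7 is an arbitrary illustrative value
        }
    }

    /** Leaky variant: reads the value and drops the only reference without closing. */
    static int readNumLevelsLeaky(FakeHandle handle) {
        return handle.getOptions().numLevels();
    }

    /** Fixed variant: try-with-resources closes the copy once the value is read. */
    static int readNumLevelsClosed(FakeHandle handle) {
        try (FakeOptions options = handle.getOptions()) {
            return options.numLevels();
        }
    }

    public static void main(String[] args) {
        FakeHandle handle = new FakeHandle();

        for (int i = 0; i < 3; i++) {
            readNumLevelsLeaky(handle);
        }
        System.out.println("open after leaky reads: " + FakeOptions.OPEN.get());

        FakeOptions.OPEN.set(0); // reset the counter for the second run
        for (int i = 0; i < 3; i++) {
            readNumLevelsClosed(handle);
        }
        System.out.println("open after closed reads: " + FakeOptions.OPEN.get());
    }
}
```

Both variants return the same value, which mirrors the claim that behavior is unchanged. Only the lifetime of the temporary options object differs.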
9bfcda2 to
b065683
Author
@davidradl Thank you for the review. I added the comment just before the try-with-resources block. PTAL
What is the purpose of the change
Fixes a native memory leak in the RocksDB SST merge Compactor.
ColumnFamilyHandle.getDescriptor() copies the column family's options across JNI and returns a fresh native ColumnFamilyOptions on every call. Compactor.compact() read numLevels() from it but never closed it, so the native object leaked on every compaction. Because the leaked options retain a reference to the shared block cache (via BlockBasedTableFactory -> BlockBasedTableOptions -> LRUCache), the cache's shared_ptr is never released, preventing the block cache from being freed even after all tasks stop. This causes task manager RSS to grow and eventually leads to an OOM.
Brief change log
- Wrap cfName.getDescriptor().getOptions() in a try-with-resources block in Compactor.compact() so the native ColumnFamilyOptions is closed after numLevels() is read.
Verifying this change
The leak and the fix were verified with jemalloc profiling (jeprof), running Flink in session mode and repeatedly starting/stopping jobs to trigger the compactor while tracking the rocksdb::BlockFetcher::ReadBlockContents call stack that dominates block-cache allocations. The configured block cache capacity was 833MB.
- Before the fix: ReadBlockContents grew to ~1.54GB, far exceeding the 833MB cache capacity; the jemalloc heap profile reported a total of 2,280,777,636 bytes.
- After the fix: ReadBlockContents usage was consistent with the 833MB capacity; the jemalloc heap profile reported a total of 1,416,132,765 bytes.
This is a ~37% reduction in native memory usage and eliminates the cache-capacity overage, confirming that the LRUCache leak caused by the unclosed ColumnFamilyOptions is resolved. The behavior (output level computation) is unchanged and is covered by existing tests; only the previously leaked native handle is now closed.
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): no
Documentation