
[MINOR] Cap UT_FT_10 Azure install to -T 2 to avoid flaky compiler heap OOM #19008

Merged
danny0405 merged 1 commit into apache:master from wombatu-kun:azure-cap-ut-ft-10-install-threads
Jun 16, 2026


@wombatu-kun

Contributor

Describe the issue this Pull Request addresses

The Azure CI job "UT FT common & other modules" (UT_FT_10) intermittently fails its initial full-reactor `mvn clean install` with `maven-compiler-plugin:compile ... Bad service configuration file, or exception thrown while constructing Processor object: Java heap space`. The most recent failure occurred while compiling hudi-kafka-connect-bundle (for example, the Azure run for PR #19004, buildId 14704, whose change is unrelated test-only code). The job builds the entire reactor with `-T 3` inside a single `-Xmx8g` JVM on a memory-constrained Azure agent, so when several heavy module builds overlap in time, heap demand occasionally exceeds the 8g ceiling and the build runs out of memory. The annotation-processor line in the message only marks where the allocation tips over; it is not the root cause.

Summary and Changelog

This scopes a lower build parallelism to just the job that OOMs, without touching any source code or the shared install options used by the other jobs. UT_FT_10's `clean install` now prepends `-T 2` before `$(MVN_OPTS_INSTALL)` (which contains `-T 3`). Maven uses the first `-T` it sees on the command line (verified: `-T 2 -T 3` resolves to a thread count of 2, `-T 3 -T 2` to 3), so the effective thread count for this job's install becomes 2 while every other job keeps the shared `-T 3`. Lowering the concurrency from 3 to 2 reduces how many heavy compiles and shade operations can run at the same time, and therefore the peak heap.
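The precedence rule described above can be illustrated with a small shell sketch. It does not invoke Maven; `effective_threads` is a hypothetical helper that only emulates the "first `-T` wins" behaviour reported in this PR, and `SHARED_OPTS` is a stand-in for the real `MVN_OPTS_INSTALL` value, of which only the `-T 3` part is known from the description.

```shell
# Hypothetical helper: emulates the "first -T wins" rule described in the PR
# for a list of Maven-style arguments. Illustration only; Maven is not called.
effective_threads() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -T)   # separate form, e.g. "-T 2"
        echo "$2"; return 0 ;;
      -T*)  # attached form, e.g. "-T2"
        echo "${1#-T}"; return 0 ;;
    esac
    shift
  done
  echo 1    # no -T present: Maven builds single-threaded by default
}

# Stand-in for the shared options; the real variable contains more than this.
SHARED_OPTS="-T 3"

# Word splitting of the unquoted variable is intentional (POSIX sh / bash).
effective_threads clean install -T 2 $SHARED_OPTS   # prints 2 (UT_FT_10 after this PR)
effective_threads clean install $SHARED_OPTS        # prints 3 (every other job)
```

The sketch only handles the short `-T` form; on a real build the effective value is best confirmed from Maven's own startup output rather than from an emulation like this.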

The approach was chosen after measuring the heap profile of the full `clean install` locally: peak heap is about 2.4 GB at `-T 1`, about 2.4 GB at `-T 2`, and about 2.9 GB at `-T 3`, all far below the 8 GB ceiling, which shows the failure is a rare concurrency-driven tail spike rather than a systematic over-use of memory. `-T 2` keeps essentially the same wall-clock as `-T 3` (about half a minute slower in the local run) while bringing the measured peak heap back down to the `-T 1` level. An earlier idea to disable annotation processing on the bundle modules was measured and rejected, because it did not change the compile's heap requirement at all. No code was copied from third-party sources.
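The PR does not say how the heap profile was captured. One possible way to approximate such a measurement, assuming a JDK 9+ build JVM with unified GC logging enabled (for example `-Xlog:gc:file=gc.log` passed through `MAVEN_OPTS`), is to take the largest "before collection" heap size from the GC log. The helper name `peak_heap_mb` and the log file name are hypothetical, and the sketch assumes the log reports sizes in `M`.

```shell
# Hypothetical helper: reads unified GC log lines on stdin and prints the
# largest heap occupancy seen right before a collection, in MB.
# Assumes entries of the form "...  2400M->900M(8192M) 12.3ms".
peak_heap_mb() {
  awk '
    match($0, /[0-9]+M->[0-9]+M\([0-9]+M\)/) {
      token = substr($0, RSTART, RLENGTH)  # e.g. "2400M->900M(8192M)"
      before = token + 0                   # leading number: heap used before the GC
      if (before > peak) peak = before
    }
    END { print peak + 0 }                 # prints 0 if no GC entries were found
  '
}

# Usage, once a log exists (not run here):
#   peak_heap_mb < gc.log
```

Occupancy sampled at collection time is only an approximation of the true peak, so numbers obtained this way are comparable across `-T` settings but are not exact.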

Impact

CI-only change scoped to the UT_FT_10 Azure job. No production code, public API, configuration default, or runtime behavior changes. The job's install phase becomes slightly slower (about half a minute in the local measurement) in exchange for lower peak heap; all other CI jobs are unaffected.

Risk Level

none

CI-only change that lowers build parallelism for a single job. It cannot affect build output or test results, and the verified -T precedence guarantees the intended thread count.

Documentation Update

none

Contributor's checklist

…ap OOM

The UT_FT_10 Azure job builds the full reactor with `mvn clean install` under the shared MVN_OPTS_INSTALL `-T 3` inside a single `-Xmx8g` JVM, and intermittently OOMs during compilation (most recently hudi-kafka-connect-bundle) when several heavy concurrent builds align in time. Prepend `-T 2` to this job's install so Maven (first -T wins) caps its concurrency to 2 while every other job keeps the shared -T 3. Local measurement: the full install peaks at ~2.4 GB at -T 1 and -T 2 vs ~2.9 GB at -T 3, all well under the 8g ceiling, and -T 2 keeps essentially the same wall-clock as -T 3.

@hudi-agent hudi-agent left a comment

Contributor


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

No reviewable code files in this PR.

cc @yihua

wombatu-kun pushed a commit to wombatu-kun/hudi that referenced this pull request Jun 15, 2026
@github-actions github-actions Bot added the size:XS PR with lines of changes in <= 10 label Jun 15, 2026
@hudi-bot

Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

@danny0405 danny0405 merged commit 7eaa1c3 into apache:master Jun 16, 2026
70 checks passed
