Restrict Python to <3.14 to prevent CPU RAM OOM in distributed training #128
@claude review this PR
Claude finished @hanaol's task in 33s
PR Review: Restrict Python to <3.14 to prevent CPU RAM OOM
Summary
This is a straightforward, well-scoped fix. The changes are minimal and targeted.
Changes Reviewed
The constraint is correct and clearly communicates the intention. One minor observation: the
Pinning to
The large reduction (-572 lines) reflects the removal of Python 3.14-specific resolved dependencies/markers. This is expected and correct.
Notes
Verdict
Approve: the change is correct, minimal, and well-justified. No issues found.
4b0ef2f to 6b92dd1
@claude re-review this PR
Claude finished @hanaol's task in 1m 0s
PR Re-Review: Restrict Python to <3.14 to prevent CPU RAM OOM
Summary
The PR is correct and minimal. No issues found. All three changed files are consistent with each other and with the stated intent.
Python 3.14.0 caused progressive CPU RAM exhaustion during DDP training. Downgrading to Python 3.13 resolved the issue. Pin .python-version to 3.13 and update uv.lock accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6b92dd1 to 90880bb
Summary
- Restrict requires-python to <3.14 after discovering Python 3.14.0 causes progressive CPU RAM exhaustion during distributed training
- Pin .python-version to 3.13 and update uv.lock accordingly
Background
Training with Python 3.14.0 resulted in processes being OOM-killed by the Linux kernel consistently across all dataset configurations. Downgrading to Python 3.13.9 (with an otherwise identical environment and torch version) resolved the issue. The root cause is likely an incompatibility between Python 3.14's memory management and PyTorch's C extensions under multi-process DDP workloads.
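For environments that bypass the packaging metadata (for example, scripts launched directly rather than through an installed package), the same upper bound could also be checked at runtime before any DDP workers start. The following is a minimal sketch, not part of this PR: the names `MAX_EXCLUSIVE`, `python_is_supported`, and `require_supported_python` are hypothetical, and no lower bound is enforced because the PR does not state one.

```python
import sys

# Assumption: mirrors the "<3.14" upper bound described in this PR.
# The project's lower bound is not stated here, so none is checked.
MAX_EXCLUSIVE = (3, 14)


def python_is_supported(version=None):
    """Return True if (major, minor) is strictly below MAX_EXCLUSIVE.

    `version` is any sequence whose first two items are major and minor;
    it defaults to the running interpreter's sys.version_info.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor < MAX_EXCLUSIVE


def require_supported_python():
    """Hypothetical guard: fail fast instead of being OOM-killed later."""
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise RuntimeError(
            f"Python {found} is not supported; use a version below "
            f"{MAX_EXCLUSIVE[0]}.{MAX_EXCLUSIVE[1]} (e.g. 3.13)."
        )
```

Calling `require_supported_python()` at the top of a training entry point would surface the version mismatch immediately, rather than after memory has been progressively exhausted.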