docs(prepare_model_for_kbit_training): note the ~0.5–1 GB CUDA reserved overhead (#3265) #3267

Open
Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:docs/prepare-for-kbit-memory-note

Conversation

@Anai-Guo

Contributor

Closes #3265.

Why

#3265 reports that, for 7B-class models, prepare_model_for_kbit_training adds ~1 GB of CUDA reserved memory on top of the loaded weights within ~500 ms, and that this overhead is not documented anywhere. On 8 GB unified-memory devices or consumer accelerators (Jetson Orin Nano 8 GB, Apple Silicon, RTX 4060 8 GB) this can be the difference between a recipe that fits and one that OOMs.
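For a sense of scale, here is a rough back-of-the-envelope sketch of the budget on an 8 GB device. The 4-bit weight width and the `other_gb` figure (activations, adapters, optimizer state, OS share of unified memory) are assumptions chosen for illustration, not numbers from #3265; only the ~1 GB overhead comes from the issue.

```python
GB = 1e9  # decimal gigabytes, matching the "~1 GB" figures quoted in this PR


def estimate_budget(total_gb, n_params, bits_per_weight, overhead_gb, other_gb=0.0):
    """Return (needed_gb, fits) for a simple additive memory budget.

    This is a coarse estimate: it ignores quantization metadata,
    fragmentation, and anything else not passed in explicitly.
    """
    weights_gb = n_params * bits_per_weight / 8 / GB
    needed_gb = weights_gb + overhead_gb + other_gb
    return needed_gb, needed_gb <= total_gb


if __name__ == "__main__":
    # Hypothetical recipe: 7B parameters at 4 bits, 3.8 GB for everything else.
    for overhead in (0.0, 1.0):
        needed, fits = estimate_budget(8.0, 7e9, 4, overhead_gb=overhead, other_gb=3.8)
        print(f"overhead={overhead:.1f} GB -> needs {needed:.1f} GB, fits={fits}")
```

With these illustrative numbers the recipe fits when the overhead is ignored and does not fit once ~1 GB is added, which is the situation the issue describes.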

The issue author proposed three fixes: docs, a new kwarg, or a lean variant. This PR takes Fix 1 (docs only, the easiest) so users can at least account for the overhead. The new-kwarg and lean-variant options can land separately if maintainers want them; happy to follow up.

What

Adds a <Tip warning={true}> block to the prepare_model_for_kbit_training docstring noting:

  • the source of the overhead (norm fp32 upcast + transient allocator buffers),
  • a rough magnitude (~0.5–1 GB CUDA reserved on 7B-class),
  • and the device classes most affected.

No behavior change.
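For readers who want to check the magnitude on their own device, a minimal measurement sketch follows. The helper `reserved_delta` is a hypothetical name introduced here, not part of PEFT; it simply reads a memory counter before and after a step. On a CUDA machine the counter would be `torch.cuda.memory_reserved`, as shown in the comments. Because transient buffers may be held by the caching allocator, the delta reflects reserved memory, not live tensors.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def reserved_delta(step: Callable[[], T], read_reserved: Callable[[], int]) -> Tuple[T, int]:
    """Run `step` and return (its result, change in the reserved-bytes counter)."""
    before = read_reserved()
    result = step()
    after = read_reserved()
    return result, after - before


# Intended usage on a CUDA device (assumes `model` is an already loaded
# k-bit quantized model; not executed here):
#
#   import torch
#   from peft import prepare_model_for_kbit_training
#
#   model, delta = reserved_delta(
#       lambda: prepare_model_for_kbit_training(model),
#       torch.cuda.memory_reserved,
#   )
#   print(f"reserved memory grew by {delta / 1e9:.2f} GB")

if __name__ == "__main__":
    # Stand-in demo with a fake counter so the sketch runs without a GPU.
    fake = {"reserved": 0}

    def fake_step() -> str:
        fake["reserved"] += 750_000_000  # pretend the step reserved 0.75 GB
        return "prepared"

    result, delta = reserved_delta(fake_step, lambda: fake["reserved"])
    print(result, f"{delta / 1e9:.2f} GB")
```

Passing the counter in as a callable keeps the helper independent of the backend, so the same pattern could be pointed at another allocator statistic if one is available.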

Test plan

  • Docs build (make docs / nbsphinx) renders the <Tip warning> block correctly.
  • No code paths touched, so no new unit tests.

🤖 Generated with Claude Code
