Documentation re-structure #3300
Thanks a lot for revamping the PEFT docs, which I agree are not very user-friendly at the moment. Could you please resolve the two merge conflicts so that the preview docs can be rendered? I think it makes more sense to review the docs as a whole than to go through the diff (which is probably showing a lot of text that has just moved places). One concern that I have is that links to the PEFT docs could break with the new structure. Thus I have two questions:

The Space was no longer that useful since most methods are compatible with most models. The front page buttons are, at least temporarily, with the exception of the quicktour and method overview buttons. I like the visuals, but there should only be elements that are useful.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

I didn't at the time but now I have. There were 14 occurrences of now-broken links, all of which are now fixed.
I've added a The Hermes PEFT skill (https://github.com/NousResearch/hermes-agent/tree/main/optional-skills/mlops/peft) doesn't seem to link to changed pages in the docs.

This also requires merging of https://huggingface.co/datasets/huggingface/documentation-images/discussions/625.
Done
PR #3300 drafts the idea of embedding the method comparison results into the respective method pages. This calls for a lighter version of the existing Space that takes up less room on the page. This is what `app_embed.py` is. Most of the common processing has moved to the existing and aptly named `processing.py`. I think this is better than having a layout switch in `app.py`, as these apps are meant to be as flat as possible so that they stay readable and maintainable.

I think this is now ready for review. Sorry about the huge PR, but dissolving the guides into the individual method pages made a relatively big splash in terms of changes, even though the individual changes are quite small. @stevhliu it would be super cool if you could take a look as well :) When reviewing the rendered docs on moon-ci-docs, I noticed that the new images are rendered with borders (especially visible in the quicktour) and that the ToC indentation for LoRA variants is broken, but I have no clue how to fix this. @stevhliu do you have an idea?

BenjaminBossan left a comment
Thanks a LOT for working on overhauling the PEFT docs. They always felt lacking and suboptimally structured to me, so I'm very happy to see improvements there.
For this review, I focused on the general sections but haven't reviewed the entries for the individual PEFT methods. This was in order to break down the review into smaller parts, as I'm not going to finish it today. It may also help avoid duplicate effort between me and Steven.
As a more general comment, I saw that some added parts contain manual line breaks, e.g. in overview.md. I would suggest removing them completely.
I like the idea of including a benchmark overview for each PEFT method. Now that we have image generation too, it would be great to add an option to toggle the benchmark, but let's leave that to a future PR. I noticed, however, that not every PEFT method includes the benchmark, e.g. HRA is missing it. Also, some methods like HiRA have the graph but no corresponding data points, but maybe their results were added after the Space was deployed?
I also wonder if we should not fully remove the legend, as the resulting graph can become quite cramped:
There is also a bit of an inconsistency about the legend, e.g. for Lily it only labels the line but not the points. I think it should be removed for simplicity.

<div class="flex flex-col basis-1/4">
There are numerous methods to "adapt" existing models, often extensively integrating into the model. PEFT can be thought of as a framework for arbitrary methods of model adaption (modifying weights, wrapping layers, manipulating KV-caches, ...) while also serving as a reference implementation for many fine-tuning methods.
</div>
<div class="flex flex-col basis-3/4 pl-10 pr-10"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/adapter_installation.png" width="100%"></div>

## Multiple adapters

PEFT supports installing multiple adapters (of the same kind, in this document this would be LoRA) on top of a base model. When you call `get_peft_model` there is only one adapter named `"default"` but you can add as many additional adapters by calling `peft_model.add_adapter(adapter_name=...)`.

Suggested change:
PEFT supports installing multiple adapters (of the same kind, in this document this would be LoRA) on top of a base model. When you call `get_peft_model` there is only one adapter named `"default"` but you can add as many additional adapters as you want by calling `peft_model.add_adapter(adapter_name=...)`.

```python
from peft import AutoPeftModel
model = AutoPeftModel.from_pretrained("smangrul/openai-whisper-large-v2-LORA-colab")
```

## Multiple adapters

In the section above, the docs describe the `AutoPeftModel` API for loading trained adapters. I'm just wondering whether we should at least mention the `PeftModel.from_pretrained(base_model, adapter_id)` API as well.

## Choosing the right method

Not every PEFT method is built equally and some formulations are easier to build in a memory efficient manner. If you are on a memory budget it makes sense to check out the [PEFT method comparison suite](https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison) and filter for **maximum** accelerator memory usage. Average accelerator memory usage can be fairly equal across methods but not every method scales equally with activations and sequence length and is more prone to memory spikes than others.

I think as is, the last sentence doesn't quite make sense, even though it's clear what is meant. Here is a suggestion for a different wording.

Not every PEFT method is built equally and some formulations are easier to build in a memory efficient manner. If you are on a memory budget it makes sense to check out the [PEFT method comparison suite](https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison) and filter for **maximum** accelerator memory usage. Average accelerator memory usage can be fairly equal across methods but not every method scales equally with activations and sequence length; some methods are more prone to memory spikes than others.
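To illustrate why the **maximum** matters more than the average, a small sketch: `record_step_peak` uses PyTorch's CUDA peak-memory counters to measure one training step (it assumes a CUDA device and a caller-supplied `step_fn`), and `memory_summary` condenses per-step readings. Both helper names are hypothetical, and the readings below are made-up numbers for illustration, not measurements of any PEFT method.

```python
import torch


def record_step_peak(step_fn, device="cuda"):
    """Run one training step (step_fn) and return the peak accelerator memory it allocated, in bytes."""
    torch.cuda.reset_peak_memory_stats(device)
    step_fn()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device)


def memory_summary(step_peaks):
    """Summarize per-step peak memory readings into their mean and maximum."""
    return {"mean": sum(step_peaks) / len(step_peaks), "max": max(step_peaks)}


# Hypothetical readings (GiB): same average, but method B spikes on a long sequence
method_a = memory_summary([4.0, 4.0, 4.0, 4.0])
method_b = memory_summary([3.0, 3.0, 3.0, 7.0])
print(method_a)  # {'mean': 4.0, 'max': 4.0}
print(method_b)  # {'mean': 4.0, 'max': 7.0}
```

On a fixed memory budget, method B would run out of memory on the spiking step even though its average looks identical to method A.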

Especially when targeting large layers like language modeling heads or embedding layers to fine-tune specific tokens, it might make sense to look into [using trainable tokens](troubleshooting#using-trainable-tokens).

## Chunked NLL loss

I'd put this section last, I think the other ones below are more generally applicable.

## Quantization

Quantization is one of the best ways to reduce memory consumption *of the base model* and will, depending on the employed quantization, also reduce activation memory. Since the PEFT methods will only take up a small portion of the total number of parameters, PEFT defaults to use a higher precision than the base model. This can also have the effect that adapters can mitigate some of the quality loss incured by quantization methods. Read the [PEFT quantization guide](quantization).

Suggested change:
Quantization is one of the best ways to reduce memory consumption *of the base model* and will, depending on the employed quantization, also reduce activation memory. Since the PEFT methods will only take up a small portion of the total number of parameters, PEFT defaults to use a higher precision than the base model. This can also have the effect that adapters can mitigate some of the quality loss incurred by quantization methods. Read the [PEFT quantization guide](quantization).

## Gradient Checkpointing

You can trade memory for computation by storing only a subset of the activations between layers and recomputing the rest on the fly during the backward pass. Check out the [gradient checkpointing](https://huggingface.co/docs/transformers/grad_checkpointing) documentation of Transformers to learn more.

Maybe it's worth mentioning that users who don't use Transformers or Diffusers may have to implement their own gradient checkpointing logic.
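For models outside Transformers/Diffusers, a minimal sketch of hand-rolled gradient checkpointing with PyTorch's `torch.utils.checkpoint.checkpoint`. The `CheckpointedStack` model is hypothetical; the non-reentrant variant (`use_reentrant=False`) is used because, as far as I know, the reentrant one needs at least one input that requires grad, which is easy to miss when the base model is frozen by PEFT.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedStack(nn.Module):
    """Hypothetical custom model that recomputes each block's activations during backward."""

    def __init__(self, dim=32, depth=4, use_checkpointing=True):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth))
        self.use_checkpointing = use_checkpointing

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Only the block input is kept; activations inside the block are recomputed in backward
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


torch.manual_seed(0)
model = CheckpointedStack()
x = torch.randn(8, 32, requires_grad=True)
model(x).sum().backward()
grad_with_gc = x.grad.clone()

# Same weights without checkpointing: the gradients should match
x.grad = None
model.use_checkpointing = False
model(x).sum().backward()
print(torch.allclose(grad_with_gc, x.grad))  # True
```

Checkpointing changes only what is kept in memory, not the math, which is why both runs yield the same gradients.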

Giving general advice for training large models is hard but for generative
models, especially language models, you can follow these steps:

1. use prompting (few-shot examples in the prompt) to see if the model is

Suggested change:
1. use prompting (e.g. few-shot examples in the prompt) to see if the model is

fine-tuning step is potentially unlearning past knowledge.
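As an illustration of the prompting step, a minimal sketch of assembling a few-shot prompt from question/answer pairs. The `build_few_shot_prompt` helper and the plain `Q:`/`A:` template are hypothetical; chat models usually expect their own chat template instead (e.g. via `tokenizer.apply_chat_template` in Transformers).

```python
def build_few_shot_prompt(examples, query, instruction="Answer the question."):
    """Assemble a plain-text few-shot prompt from (question, answer) pairs and a final query."""
    parts = [instruction, ""]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}\n")
    # Leave the last answer open so the model completes it
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    examples=[("What is 2 + 2?", "4"), ("What is 3 + 5?", "8")],
    query="What is 7 + 6?",
)
print(prompt)
```

If the model already answers such prompts reliably, fine-tuning may not be needed at all.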

The [PEFT method comparison suite](https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison) aims to give a rough overview of (most) implemented methods on selected benchmarks and models.

It could also be useful to mention some criteria here that may guide you in choosing the appropriate PEFT method:
- quantization: not all methods support quantized base models
- feature set: not all features are supported for all methods (e.g. multiple adapters, mixed adapter inference)
- layer types: linear layers are almost always supported, but not all methods support embedding layers (important for expanding vocab) or conv layers (important for some image models)
- inference runtime: PEFT methods generally add runtime overhead but some of that can be mitigated (e.g. some methods allow merging, removing the overhead)

## Layer Tuning

Layer Tuning categorizes methods that target specific layers of a model such as [LayerNorm Tuning](../package_reference/layernorm_tuning).

"target specific layers" doesn't make it quite clear that it means that existing parameters of the base model are made trainable, since you could say that LoRA also targets specific layers. I would state that explicitly.

The PEFT docs currently lack structure, and I was constantly annoyed that whenever I wanted to change something, there were several places that needed touching and they all felt disconnected. So this is my attempt at structuring the docs. Some of these ideas are quite old (discussed in 01/2025) but are still valid.

I've removed most of the code guides without replacement. That's not ideal; I think we should have code examples, but I think they should be method-focused. Maybe one general example of a training workflow is sufficient because most methods follow the same scheme. I'd appreciate some feedback on this.
All details from the method guides (prompting, LoRA, OFT/BOFT, etc.) are now integrated into the respective method pages instead. I would have hesitated to do this if these guides had integrated information across adapters, but they didn't. I think it makes a lot more sense to have one place for each method to gather examples/tips/recommendations, and that is now the `package_reference/<method>` page. This page now also hosts a small Space that shows the MetaMathQA (and potentially other) benchmark results highlighted for that method.

I've moved the LoRA initializations to `package_reference/lora#Initialization` and converted the init methods to `<hfoption>` tags. This collapses them into a list but may reduce searchability within the document; at least Firefox is not able to search 'through' the option tabs. They also don't appear in the ToC, and people specifically searching for, say, PiSSA won't find it directly. I think that's OK though, since the search is able to locate it.

The quicktour is a bit more detailed about what happens under the hood (quick doesn't have to mean simplistic) and includes some new visualizations. I hope that we can integrate more visualizations in the future where it makes sense.