Adaptive Post-Training Quantization tooling (replicating AdpQ)
This repository implements tools and reference code to reproduce the ideas from AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs.
This README explains how to install, run tests, build documentation (including multi-version docs), and contribute.
Install from PyPI (recommended):
python -m pip install auto-adpq
Install the latest development version directly from GitHub:
python -m pip install "git+https://github.com/Tfloow/auto_adpq.git"
To develop locally (editable install):
git clone https://github.com/Tfloow/auto_adpq.git
cd auto_adpq
python -m pip install -e .
Makefile helper:
# Run formatting, linting, coverage and docs targets as defined in Makefile
make
Import the package and use the public API. Example (replace with real API):
from auto_adpq import Auto_AdpQ
Add a short usage snippet here specific to the package functions you expect users to try first.
The simplest way to quantize a model is to follow a script similar to examples/simple_quantization.py.
Test coverage: 91%
- Run tests with pytest:
pytest -q
- Run full coverage report (Makefile target):
make coverage
- Format & lint with ruff (Makefile target):
make ruff
To get logs from the package, enable its logging by setting the environment variable AUTO_ADPQ_DEBUG:
# Linux
export AUTO_ADPQ_DEBUG=1
# Windows
$Env:AUTO_ADPQ_DEBUG = 1
The documentation can be found here.
This project uses Sphinx for documentation. There are two common workflows:
- Build a single-version site (useful for local writing and previews):
python -m pip install -r docs/requirements.txt
python -m sphinx -b html docs docs/_build/html
- Build a multi-version site using sphinx-multiversion (we configure this in docs/conf.py). This produces one static site containing each built branch and tag (useful for publishing versioned docs with a dropdown selector):
python -m pip install -r docs/requirements.txt
sphinx-multiversion docs docs/_build/html-mv
Notes about versions
- The project includes a small template, docs/_templates/versions.html, which renders a versions dropdown when the site is built with sphinx-multiversion.
- Adjust smv_tag_whitelist and smv_branch_whitelist in docs/conf.py to control which tags/branches are included in the build.
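As an illustration of the whitelist settings mentioned above, a docs/conf.py fragment might look like the sketch below. The tag scheme (v0.1.0 style) and the branch name (main) are assumptions chosen for the example, not necessarily what this repository uses; both options are regular expressions matched against tag and branch names.

```python
# docs/conf.py (fragment): hypothetical whitelist values for sphinx-multiversion
import re

extensions = ["sphinx_multiversion"]

# Build only tags that look like release versions, e.g. v0.1.0 (assumed tag scheme).
smv_tag_whitelist = r"^v\d+\.\d+\.\d+$"

# Build only the main branch (assumed branch name).
smv_branch_whitelist = r"^main$"

# Sanity check: the patterns accept and reject the names we expect.
assert re.match(smv_tag_whitelist, "v0.1.0")
assert not re.match(smv_tag_whitelist, "experimental")
assert re.match(smv_branch_whitelist, "main")
```

Tightening the patterns keeps the multi-version build short, since every matching tag and branch is built as its own copy of the site.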
- Solve the datapacking issue #1
- Support efficient inference (maybe wrap in SpQR?)
- Optimize the Pydantic module AdpQQuantizedWeights
  - Currently, creating a new object incurs a major overhead from field validation. Since it is only used internally, we could drop Pydantic, but we would then need to provide proper dump and load functions.
- Support model and integrate with .safetensors
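For the Pydantic item in the list above, one possible direction is a plain dataclass with a few manual checks and explicit dump and load methods. The sketch below is only an illustration: the class name QuantizedWeightsSketch and its fields (group_size, scales, quantized) are hypothetical and do not reflect the real AdpQQuantizedWeights layout.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class QuantizedWeightsSketch:
    """Hypothetical stand-in for AdpQQuantizedWeights; field names are assumed."""

    group_size: int
    scales: List[float]
    quantized: List[int]

    def __post_init__(self) -> None:
        # Cheap manual checks instead of full Pydantic validation.
        if self.group_size <= 0:
            raise ValueError("group_size must be positive")
        if len(self.quantized) % self.group_size != 0:
            raise ValueError("quantized length must be a multiple of group_size")
        if len(self.scales) != len(self.quantized) // self.group_size:
            raise ValueError("expected one scale per group")

    def dump(self) -> str:
        """Serialize to a JSON string."""
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, payload: str) -> "QuantizedWeightsSketch":
        """Rebuild an instance from a JSON string produced by dump()."""
        return cls(**json.loads(payload))


if __name__ == "__main__":
    w = QuantizedWeightsSketch(group_size=2, scales=[0.5, 0.25], quantized=[1, -3, 7, 0])
    assert QuantizedWeightsSketch.load(w.dump()) == w
```

The trade-off is that every invariant Pydantic enforced automatically has to be restated by hand, so a round-trip test like the one in the main guard becomes more important.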
Pre-quantized models are available in this collection. They are simulated models, meaning their weights are stored as bf16 values rather than in the quantized format. Storing them in the custom format would require either an algorithm to reconstruct the full weights at runtime or a custom CUDA kernel, which is quite tough.
Nonetheless, these models reflect the quality and rounding errors that a typical quantized model would exhibit.
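To make the idea of a simulated model concrete, the sketch below quantizes a few weights and immediately maps them back to floats, so the stored values already carry the rounding error. It uses plain symmetric round-to-nearest as a stand-in; it is not the AdpQ algorithm, and the function name fake_quantize is made up for this example.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int = 4) -> List[float]:
    """Round-to-nearest symmetric quantization followed by dequantization.

    Generic illustration of "simulated" quantization; this is NOT the AdpQ algorithm.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # all zeros: nothing to quantize
    scale = max_abs / qmax
    # Quantize to integer codes in [-qmax, qmax], then map back to floats.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [c * scale for c in codes]


if __name__ == "__main__":
    original = [0.70, -0.35, 0.12, 0.0]
    simulated = fake_quantize(original, bits=4)
    for w, s in zip(original, simulated):
        print(f"{w:+.4f} -> {s:+.4f} (error {abs(w - s):.4f})")
```

The returned floats take only as many distinct values as the integer grid allows, which is why a checkpoint saved this way behaves like a quantized model in evaluation while still loading as ordinary floating-point weights.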
| Model Variant | Quantization Method | PPL (Perplexity) |
|---|---|---|
| meta-llama/Llama-3.1-8B | Baseline | 4.8693 |
| | BNB | 5.0733 |
| | AdpQ | 5.3671 |
| meta-llama/Llama-3.1-8B-Instruct | Baseline | 4.9080 |
| | BNB | 4.9993 |
| | AdpQ | 5.0069 |
| | AWQ | 5.0440 |
| | GPTQ | nan |
| meta-llama/Llama-3.2-1B | Baseline | 6.5546 |
| | AdpQ 9% | 6.9491 |
| | BNB | 6.9971 |
| | AdpQ 2% | 7.0380 |
| meta-llama/Llama-3.2-3B-Instruct | Baseline | 5.7864 |
| | AWQ | 5.8339 |
| | AdpQ | 5.9040 |
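For readers comparing rows in the table, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The evaluation setup behind these numbers (dataset, context length) is not stated here; the sketch below only shows the standard definition and a helper, relative_increase, introduced for this example to express the gap to a baseline.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_increase(baseline_ppl: float, quantized_ppl: float) -> float:
    """Relative perplexity increase of a quantized model over its baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl


if __name__ == "__main__":
    # Values taken from the table: Llama-3.1-8B baseline vs. AdpQ.
    print(f"{relative_increase(4.8693, 5.3671):.1%}")
```

Reporting the relative increase makes models of different sizes easier to compare, since their baseline perplexities differ.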
Contributions are welcome. A suggested workflow:
- Fork the repository and create a feature branch.
- Add tests for new functionality.
- Run ruff to format and lint.
- Open a pull request describing the change.
Please include unit tests and keep the public API stable when possible.
- Docs templates: docs/_templates/versions.html, the version switcher used by sphinx-multiversion.
- Makefile targets: make ruff, make coverage, make docs (runs single and multiversion builds).
This work is licensed under the Apache 2.0 License.
