auto-adpq

Adaptive Post-Training Quantization tooling (replicating AdpQ)

This repository implements tools and reference code to reproduce the ideas from AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs.

This README explains how to install, run tests, build documentation (including multi-version docs), and contribute.

Installation

Install from PyPI (recommended):

python -m pip install auto-adpq

Install the latest development version directly from GitHub:

python -m pip install "git+https://github.com/Tfloow/auto_adpq.git"

To develop locally (editable install):

git clone https://github.com/Tfloow/auto_adpq.git
cd auto_adpq
python -m pip install -e .

Makefile helper:

# Run formatting, linting, coverage and docs targets as defined in Makefile
make

Quick usage

Import the package and use its public API:

from auto_adpq import Auto_AdpQ

The simplest way to quantize a model is to follow the script in examples/simple_quantization.py.

Running tests & linters

Test coverage: 91%

  • Run tests with pytest:
pytest -q
  • Run full coverage report (Makefile target):
make coverage
  • Format & lint with ruff (Makefile target):
make ruff

Debug mode

The package can emit logs through Python's logging module. To enable them, set the environment variable AUTO_ADPQ_DEBUG:

# Linux / macOS
export AUTO_ADPQ_DEBUG=1

# Windows (PowerShell)
$Env:AUTO_ADPQ_DEBUG = 1
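
As an illustration, here is a minimal sketch of how an environment flag such as AUTO_ADPQ_DEBUG is commonly wired to Python's logging module. The helper name `configure_logging`, the logger name, and the accepted values are assumptions made for this example. They are not taken from the package, which may read the variable differently.

```python
import logging
import os


def configure_logging(environ=os.environ):
    """Return a logger whose level depends on AUTO_ADPQ_DEBUG.

    Hypothetical helper: the real package may handle the flag differently.
    """
    flag = environ.get("AUTO_ADPQ_DEBUG", "").strip().lower()
    debug = flag in {"1", "true", "yes", "on"}  # assumed set of "enabled" values

    logger = logging.getLogger("auto_adpq_example")  # assumed logger name
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    if not logger.handlers:
        # Attach a single stream handler so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = configure_logging()
logger.debug("only shown when AUTO_ADPQ_DEBUG is set")
```

Passing a plain dictionary instead of `os.environ` makes the behaviour easy to check without touching the real environment.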

Documentation

The documentation can be found here.

Building the documentation

This project uses Sphinx for documentation. There are two common workflows:

  • Build a single-version site (useful for local writing and previews):
python -m pip install -r docs/requirements.txt
python -m sphinx -b html docs docs/_build/html
  • Build a multi-version site using sphinx-multiversion (we configure this in docs/conf.py). This produces one static site containing each built branch and tag (useful for publishing versioned docs with a dropdown selector):
python -m pip install -r docs/requirements.txt
sphinx-multiversion docs docs/_build/html-mv

Notes about versions

  • The project includes a small template docs/_templates/versions.html which renders a versions dropdown when the site is built with sphinx-multiversion.
  • Adjust smv_tag_whitelist and smv_branch_whitelist in docs/conf.py to control which tags/branches are included in the build.
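
For example, a docs/conf.py fragment that restricts the build to release tags and one branch could look like the sketch below. Both options are regular expressions matched against ref names. The tag naming scheme (`v1.2.3`) and the branch name (`main`) are assumptions, so adapt them to the repository's actual refs.

```python
# docs/conf.py (fragment)

# Build only release tags such as v1.2.3 (assumed tag naming scheme).
smv_tag_whitelist = r"^v\d+\.\d+\.\d+$"

# Build only the main branch (assumed branch name).
smv_branch_whitelist = r"^main$"
```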

Tasklist

  • Solve the datapacking issue #1
  • Support efficient inference (maybe wrap in SpQR?)
  • Optimize pydantic module AdpQQuantizedWeights
    • Currently, creating a new object carries a major overhead because Pydantic validates every field. Since the class is only used internally, we could drop Pydantic, but we would then need to provide proper dump and load functions.
  • Support model and integrate with .safetensors
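
One possible direction for the Pydantic task is a plain dataclass with explicit dump and load functions. The sketch below only illustrates the idea: the class name `QuantizedWeightsSketch` and its fields are hypothetical and do not mirror the real AdpQQuantizedWeights fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class QuantizedWeightsSketch:
    """Hypothetical stand-in; the field names are illustrative only."""

    group_size: int
    scales: List[float]
    codes: List[int]

    def dump(self) -> str:
        """Serialize to a JSON string without any validation step."""
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, payload: str) -> "QuantizedWeightsSketch":
        """Rebuild the object, checking only what the loader relies on."""
        data = json.loads(payload)
        if len(data["codes"]) % data["group_size"] != 0:
            raise ValueError("codes length must be a multiple of group_size")
        return cls(**data)


original = QuantizedWeightsSketch(group_size=2, scales=[0.5, 0.25], codes=[1, -3, 7, 0])
restored = QuantizedWeightsSketch.load(original.dump())
```

Construction is cheap because nothing is validated in `__init__`. The only check runs once, at load time, where untrusted data enters.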

Quantized models

Pre-quantized models are available in this collection. They are simulated quantized models: the weights are stored as bf16 values rather than in a packed quantized format. Storing them in the custom format would require either an algorithm that reconstructs the full weights at runtime or a custom CUDA kernel, which is difficult to develop.

Nonetheless, these models reflect the quality loss and rounding errors that a typical quantized model would show.
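
To make "simulated" concrete, the sketch below quantizes weights to integer codes and immediately maps them back to floats, so the stored values carry the rounding error but not the compact format. It uses plain symmetric round-to-nearest per group as a stand-in and is not the AdpQ algorithm. The function name, bit width, and group size are assumptions chosen for the example.

```python
def fake_quantize(weights, bits=4, group_size=4):
    """Quantize to signed integer codes per group, then map back to floats.

    Plain symmetric round-to-nearest, used here as a stand-in for AdpQ.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    simulated = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            code = max(-qmax, min(qmax, round(w / scale)))  # integer code
            simulated.append(code * scale)  # stored back as a float
    return simulated


weights = [0.7, -0.35, 0.1, 0.02, 1.4, -1.4, 0.3, 0.0]
simulated = fake_quantize(weights)
max_error = max(abs(w - s) for w, s in zip(weights, simulated))
```

In each group the rounding error is bounded by about half of that group's scale, which is the kind of error the simulated models preserve.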

Performance

Current performance


| Model Variant | Quantization Method | PPL (Perplexity) |
|---|---|---|
| meta-llama/Llama-3.1-8B | Baseline | 4.8693 |
| | BNB | 5.0733 |
| | AdpQ | 5.3671 |
| meta-llama/Llama-3.1-8B-Instruct | Baseline | 4.9080 |
| | BNB | 4.9993 |
| | AdpQ | 5.0069 |
| | AWQ | 5.0440 |
| | GPTQ | nan |
| meta-llama/Llama-3.2-1B | Baseline | 6.5546 |
| | AdpQ 9% | 6.9491 |
| | BNB | 6.9971 |
| | AdpQ 2% | 7.0380 |
| meta-llama/Llama-3.2-3B-Instruct | Baseline | 5.7864 |
| | AWQ | 5.8339 |
| | AdpQ | 5.9040 |
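
For readers unfamiliar with the PPL column: perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below shows only that definition. The function name is an assumption, and the evaluation setup behind the numbers above (dataset, context length) is not described here.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(0.25)] * 10
ppl = perplexity(uniform)
```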

Contributing

Contributions are welcome. A suggested workflow:

  1. Fork the repository and create a feature branch.
  2. Add tests for new functionality.
  3. Run ruff to format and lint.
  4. Open a pull request describing the change.

Please include unit tests and keep the public API stable when possible.

Development notes

  • Docs templates: docs/_templates/versions.html — version switcher used by sphinx-multiversion.
  • Makefile targets: make ruff, make coverage, make docs (runs single and multiversion builds).

License

This work is licensed under the Apache License 2.0.

About

This repo aims at replicating: "AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs" https://arxiv.org/abs/2405.13358
