auto-adpq

Adaptive Post-Training Quantization tooling (replicating AdpQ)

This repository implements tools and reference code to reproduce the ideas from AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs.

This README explains how to install, run tests, build documentation (including multi-version docs), and contribute.


Installation

Install from PyPI (recommended):

python -m pip install auto-adpq

Install the latest development version directly from GitHub:

python -m pip install "git+https://github.com/Tfloow/auto_adpq.git"

To develop locally (editable install):

git clone https://github.com/Tfloow/auto_adpq.git
cd auto_adpq
python -m pip install -e .
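To check which version pip actually installed without importing the package, you can query the standard library's importlib.metadata. The sketch below assumes the distribution name is auto-adpq, as in the pip commands above. installed_version is a hypothetical helper and is not part of auto_adpq.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "auto-adpq") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper (not part of auto_adpq); it only reads package metadata.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"auto-adpq {found}" if found else "auto-adpq is not installed")
```

Reading metadata instead of importing the package keeps the check fast and avoids pulling in heavy dependencies.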

Makefile helper:

# Run formatting, linting, coverage and docs targets as defined in Makefile
make

Quick usage

Import the package and use its public API, for example:

from auto_adpq import Auto_AdpQ

The simplest way to quantize a model is to follow the script in examples/simple_quantization.py.

Running tests & linters

Test coverage: 91%

Run tests with pytest:

pytest -q

Run full coverage report (Makefile target):

make coverage

Format & lint with ruff (Makefile target):

make ruff

Debug mode

To see the package's log output, enable debug logging by setting the environment variable AUTO_ADPQ_DEBUG before running your script:

# Linux / macOS (bash, zsh)
export AUTO_ADPQ_DEBUG=1

# Windows (PowerShell)
$Env:AUTO_ADPQ_DEBUG = 1
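The README does not document how the package reads this variable. As an illustration of the general pattern only, the sketch below shows one common way an environment flag can gate Python logging. The logger name "auto_adpq", the set of accepted values, and configure_debug_logging are all assumptions and do not reflect auto_adpq's actual code.

```python
import logging
import os

# Variable name taken from this README; everything else here is a hypothetical sketch.
ENV_VAR = "AUTO_ADPQ_DEBUG"
TRUTHY = {"1", "true", "yes", "on"}


def configure_debug_logging(logger_name: str = "auto_adpq") -> logging.Logger:
    """Return a logger that emits DEBUG records only when ENV_VAR is set to a truthy value."""
    logger = logging.getLogger(logger_name)
    enabled = os.environ.get(ENV_VAR, "").strip().lower() in TRUTHY
    logger.setLevel(logging.DEBUG if enabled else logging.WARNING)
    if enabled and not logger.handlers:
        # Attach a single stderr handler so debug records are actually printed.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Reading the flag once when the logger is configured keeps the cost of disabled logging near zero, which matters in tight quantization loops.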

Documentation

The documentation can be found here.

Building the documentation

This project uses Sphinx for documentation. There are two common workflows:

Build a single-version site (useful for local writing and previews):

python -m pip install -r docs/requirements.txt
python -m sphinx -b html docs docs/_build/html

Build a multi-version site using sphinx-multiversion (we configure this in docs/conf.py). This produces one static site containing each built branch and tag (useful for publishing versioned docs with a dropdown selector):

python -m pip install -r docs/requirements.txt
sphinx-multiversion docs docs/_build/html-mv

Notes about versions

The project includes a small template docs/_templates/versions.html which renders a versions dropdown when the site is built with sphinx-multiversion.
Adjust smv_tag_whitelist and smv_branch_whitelist in docs/conf.py to control which tags/branches are included in the build.
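As an example of these options, the first three assignments below are sphinx-multiversion settings that would go in docs/conf.py; each takes a regular expression, or None. The specific patterns (v-prefixed semantic-version tags, a branch named main) are assumptions and may not match this repository's actual values. would_build is a separate, hypothetical helper for previewing which refs a pattern keeps, assuming re.match semantics. Sphinx does not need it.

```python
import re
from typing import Optional

# docs/conf.py (excerpt): illustrative values, not the repository's actual settings.
smv_tag_whitelist = r"^v\d+\.\d+\.\d+$"   # release tags such as v0.1.0
smv_branch_whitelist = r"^main$"          # assumes the default branch is named main
smv_remote_whitelist = None               # None: build from local branches only


def would_build(ref_name: str, pattern: Optional[str]) -> bool:
    """Hypothetical preview helper: True if pattern is set and re.match accepts ref_name."""
    return pattern is not None and re.match(pattern, ref_name) is not None
```

Anchoring the patterns with ^ and $ prevents pre-release tags like v0.1.0-rc1 or feature branches from slipping into the published dropdown.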

Tasklist

Solve the data-packing issue #1
Support efficient inference (maybe wrap in SpQR?)
Optimize pydantic module AdpQQuantizedWeights
- Currently, creating a new object incurs a major overhead because every field is validated. Since the class is only used internally, we could drop Pydantic, but we would then need proper dump and load functions.
Support model and integrate with .safetensors
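The Pydantic item above suggests replacing validation with a plain class plus explicit dump and load functions. Below is a minimal sketch of that idea using the standard library's dataclasses and json. The class name and its fields (bits, group_size, scales, zeros, q_weights) are hypothetical, and the real AdpQQuantizedWeights fields may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class QuantizedWeightsRecord:
    """Hypothetical lightweight stand-in for AdpQQuantizedWeights (fields are assumptions)."""

    bits: int
    group_size: int
    scales: list[float]   # one scale per group
    zeros: list[float]    # one offset per group
    q_weights: list[int]  # integer codes, group_size entries per group

    def __post_init__(self) -> None:
        # A few cheap consistency checks in place of full Pydantic validation.
        if not 1 <= self.bits <= 8:
            raise ValueError("bits must be between 1 and 8")
        if len(self.scales) != len(self.zeros):
            raise ValueError("scales and zeros must have one entry per group")
        if len(self.q_weights) != len(self.scales) * self.group_size:
            raise ValueError("q_weights length must equal n_groups * group_size")
        max_code = 2 ** self.bits - 1
        if any(not 0 <= q <= max_code for q in self.q_weights):
            raise ValueError("quantized codes out of range for the bit width")

    def dump(self) -> str:
        """Serialize the record to a JSON string."""
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, payload: str) -> QuantizedWeightsRecord:
        """Rebuild a record from dump() output; __post_init__ re-runs the checks."""
        return cls(**json.loads(payload))
```

JSON keeps the sketch readable. For real weight tensors, a binary format such as .safetensors (the last item above) would be far more compact.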

Quantized models

Pre-quantized models are available in this collection. They are simulated (fake-quantized) models: their weights have been quantized, but they are stored as bf16 values rather than in the packed quantized format. If I stored them in the custom format, I would need either an algorithm that reconstructs the full weights at runtime or a custom CUDA kernel, both of which are substantial work.

Nonetheless, these models reproduce the quality and rounding errors that a real quantized model would exhibit.
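Simulated quantization maps each weight to an integer code and immediately maps it back to a floating-point value. The model keeps its original dtype but carries the rounding error. The sketch below uses plain group-wise asymmetric round-to-nearest (RTN) quantization as a stand-in. It is not the AdpQ algorithm, and fake_quantize, bits and group_size are illustrative names. Pure Python floats are used for clarity; a real model would operate on tensors and cast the result to bf16.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int = 4, group_size: int = 4) -> List[float]:
    """Quantize each group to 2**bits levels with RTN, then dequantize back to float."""
    if bits < 1 or group_size < 1:
        raise ValueError("bits and group_size must be positive")
    levels = 2 ** bits - 1
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Step between adjacent levels; a constant group needs no real scale.
        scale = (hi - lo) / levels if hi > lo else 1.0
        for w in group:
            q = round((w - lo) / scale)   # integer code
            q = min(max(q, 0), levels)    # clamp to [0, levels]
            out.append(lo + q * scale)    # dequantized value, still a float
    return out
```

Each reconstructed value lies within half a quantization step of the original weight, and each group takes at most 2**bits distinct values. This is the kind of error a simulated model carries while still being stored in a float format.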

Performance

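Model	Quantization method	PPL (perplexity, lower is better)
meta-llama/Llama-3.1-8B	Baseline	4.8693
meta-llama/Llama-3.1-8B	BNB	5.0733
meta-llama/Llama-3.1-8B	AdpQ	5.3671
meta-llama/Llama-3.1-8B-Instruct	Baseline	4.9080
meta-llama/Llama-3.1-8B-Instruct	BNB	4.9993
meta-llama/Llama-3.1-8B-Instruct	AdpQ	5.0069
meta-llama/Llama-3.1-8B-Instruct	AWQ	5.0440
meta-llama/Llama-3.1-8B-Instruct	GPTQ	nan
meta-llama/Llama-3.2-1B	Baseline	6.5546
meta-llama/Llama-3.2-1B	AdpQ 9%	6.9491
meta-llama/Llama-3.2-1B	BNB	6.9971
meta-llama/Llama-3.2-1B	AdpQ 2%	7.0380
meta-llama/Llama-3.2-3B-Instruct	Baseline	5.7864
meta-llama/Llama-3.2-3B-Instruct	AWQ	5.8339
meta-llama/Llama-3.2-3B-Instruct	AdpQ	5.9040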
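Perplexity is the exponential of the mean per-token negative log-likelihood (natural log). The table can therefore be read as relative degradation against each model's baseline. The README does not state the evaluation dataset or context length, so the sketch below only illustrates the metric and the comparison; it does not reproduce the numbers. perplexity and relative_increase are illustrative helper names.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(quantized_ppl: float, baseline_ppl: float) -> float:
    """Fractional perplexity increase of a quantized model over its baseline."""
    return quantized_ppl / baseline_ppl - 1.0


if __name__ == "__main__":
    # Values copied from the Llama-3.1-8B-Instruct rows of the table.
    baseline, adpq, awq = 4.9080, 5.0069, 5.0440
    print(f"AdpQ vs baseline: {relative_increase(adpq, baseline):+.2%}")
    print(f"AWQ  vs baseline: {relative_increase(awq, baseline):+.2%}")
```

Comparing ratios rather than absolute differences makes rows from models with different baseline perplexities easier to compare.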

Contributing

Contributions are welcome. A suggested workflow:

Fork the repository and create a feature branch.
Add tests for new functionality.
Run ruff to format and lint.
Open a pull request describing the change.

Please include unit tests and keep the public API stable when possible.

Development notes

Docs templates: docs/_templates/versions.html, the version switcher used by sphinx-multiversion.
Makefile targets: make ruff, make coverage, make docs (runs both the single-version and multi-version builds).

License

This project is licensed under the Apache License 2.0.
