Support FlatIndex with Vector Quantization #1

Description

@phrozen

Any plans on supporting FlatIndex (or even HNSW) with quantized vectors? (int8 for example)

The distance algorithms are largely the same. FlatIndex could use generics to detect the vector type, or take the quantization option explicitly.

Vector quantization is great for speed and memory savings, but sometimes you just need the best recall possible.

Automatic quantization on add would also be possible, if allowed. The main point, though, is that SIMD-accelerated int8 dot products are fast enough that simply being able to use FlatIndex over int8 vectors would be great for a LOT of use cases. Int8 vectors retain almost all of the semantic coherence of float32 while providing 4x memory savings, and I bet the speedups on FlatIndex and HNSW would be substantial (especially if AVX512 is available for SIMD).
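To make the int8 idea concrete, here is a minimal sketch of symmetric scalar quantization plus an exhaustive (flat) search over the quantized vectors. It is plain Python for illustration only, not this library's API: the names `quantize_int8`, `dot_int8`, and `flat_search` are hypothetical, and the per-vector max-abs scaling is just one assumed quantization scheme among several.

```python
def quantize_int8(vec):
    """Symmetric scalar quantization (assumed scheme): map floats to ints in [-127, 127].

    Returns the integer codes and the per-vector scale needed to
    approximately recover the original values (value ~= code * scale).
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    if max_abs == 0.0:
        return [0] * len(vec), 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dot_int8(a, b):
    """Integer dot product; this inner loop is what SIMD would accelerate."""
    return sum(x * y for x, y in zip(a, b))


def flat_search(query, index, k):
    """Exhaustive search: score every stored (codes, scale) pair, return top-k ids."""
    q_codes, q_scale = quantize_int8(query)
    scored = []
    for i, (codes, scale) in enumerate(index):
        # Rescale the integer result so scores are comparable across vectors.
        score = dot_int8(q_codes, codes) * q_scale * scale
        scored.append((score, i))
    # Highest score first; ties broken by insertion order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# "Quantize on add": each float vector is converted once when it is stored.
index = [quantize_int8(v) for v in [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]]
top2 = flat_search([0.9, 0.1], index, 2)
```

Each dimension is stored in one byte instead of four, which is where the 4x memory saving comes from; the per-vector scale adds one extra float per vector.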

I have taken it way further in tests: binary quantization on high-dimensional vectors (2560 - 4096 dimensions for example, using Qwen3 embed) has a 0.1 recall difference vs float32, with 32x space savings and 60-120x speed increases vs DotProduct/CosineSimilarity (HammingDistance over uint64). That alone makes FlatIndex a contender vs other algorithms while keeping perfect recall.
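For reference, the binary variant can be sketched the same way. This assumes sign-based binarization (bit set when the component is positive), which is a common choice but not necessarily the one used in the tests above. The function names `binarize`, `hamming`, and `flat_search_hamming` are hypothetical, and Python integers stand in for packed uint64 words.

```python
def binarize(vec):
    """Sign-based binary quantization (assumed scheme): one bit per dimension.

    Bits are packed into 64-bit words, least significant bit first.
    """
    words = []
    for start in range(0, len(vec), 64):
        word = 0
        for offset, x in enumerate(vec[start:start + 64]):
            if x > 0:
                word |= 1 << offset
        words.append(word)
    return words


def hamming(a, b):
    """Hamming distance: XOR each word pair and count the set bits (popcount)."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flat_search_hamming(query, index, k):
    """Exhaustive search over binary codes; smaller distance means more similar."""
    q = binarize(query)
    scored = sorted((hamming(q, codes), i) for i, codes in enumerate(index))
    return [i for _, i in scored[:k]]


index = [binarize(v) for v in [[1, 1, 1, 1], [-1, -1, -1, -1], [1, 1, -1, -1]]]
nearest = flat_search_hamming([1, 1, 1, -1], index, 2)
```

The 32x figure follows directly from the storage format: a 4096-dimensional float32 vector takes 4096 × 4 = 16384 bytes, while its binary code takes 4096 / 8 = 512 bytes (64 uint64 words).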
