[FEATURE] Add zvec as a target connector #1701

@Haleshot

What is the use case?

zvec is an embedded vector database from Alibaba, built on their production Proxima engine. There is no server and no daemon: just pip install zvec and point it at a local path. It supports dense and sparse vectors, HNSW indexing with int8 quantization, filtered hybrid search, and built-in multi-vector retrieval and reranking.

CocoIndex already has LanceDB as an embedded vector DB target. zvec fills a similar niche but adds a few capabilities: native sparse vector support, multi-vector retrieval with built-in reranking, and fine-grained resource governance (memory limits, CPU thread caps, and an mmap mode for datasets that exceed RAM).

Describe the solution you'd like
Since zvec and LanceDB are both embedded, path-based vector DBs, the LanceDB target connector looks like a natural implementation template. The mapping seems straightforward: path replaces db_uri, collection_name replaces table_name, zvec.open(path) serves as the shared handle, and collection.optimize() mirrors table.optimize() for periodic index compaction after incremental writes.
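To make the periodic-compaction idea concrete, here is a minimal sketch of a counter that calls optimize() once every N committed transactions. FakeCollection and OptimizeScheduler are hypothetical names used only for illustration; the stand-in collection just counts optimize() calls and does not use the real zvec API.

```python
class FakeCollection:
    """Hypothetical stand-in for a zvec collection; it only counts optimize() calls."""

    def __init__(self) -> None:
        self.optimize_calls = 0

    def optimize(self) -> None:
        self.optimize_calls += 1


class OptimizeScheduler:
    """Hypothetical helper: calls collection.optimize() after every `every_n` commits."""

    def __init__(self, collection: FakeCollection, every_n: int) -> None:
        if every_n <= 0:
            raise ValueError("every_n must be positive")
        self.collection = collection
        self.every_n = every_n
        self._since_last = 0  # commits since the last optimize()

    def on_transaction_committed(self) -> None:
        self._since_last += 1
        if self._since_last >= self.every_n:
            self.collection.optimize()
            self._since_last = 0


coll = FakeCollection()
sched = OptimizeScheduler(coll, every_n=50)
for _ in range(120):
    sched.on_transaction_committed()
print(coll.optimize_calls)  # → 2
```

With a threshold of 50, optimize() runs after commits 50 and 100, and the remaining 20 commits wait for the next cycle. This trades some index freshness for fewer compaction passes.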

A Zvec target spec would need path, collection_name, and optionally num_transactions_before_optimize (same pattern as LanceDB) plus enable_mmap for large-dataset scenarios. For mutations, zvec supports insert, delete by ID, and delete_by_filter; there's no native upsert(), so it'd need a delete-then-insert pattern (similar to how some other connectors handle this). CocoIndex's Vector[Float32, N] maps directly to zvec's VECTOR_FP32 type; scalar metadata goes into Doc.scalars.
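A minimal sketch of the proposed spec and the delete-then-insert upsert follows. It assumes a collection that exposes only insert and delete by ID. ZvecTargetSpec, InMemoryCollection, and upsert are hypothetical names. The in-memory class stands in for a zvec collection and is not zvec's actual API, and the document layout (a vector plus a scalars dict) only loosely mirrors Doc.scalars.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ZvecTargetSpec:
    # Hypothetical spec; field names follow the proposal in this issue.
    path: str
    collection_name: str
    num_transactions_before_optimize: Optional[int] = None
    enable_mmap: bool = False


class InMemoryCollection:
    """Hypothetical stand-in for a zvec collection: insert + delete by ID, no upsert."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # Mimic a store that rejects duplicate primary keys on insert.
        if doc_id in self._docs:
            raise KeyError(f"duplicate id: {doc_id}")
        self._docs[doc_id] = doc

    def delete(self, doc_ids: List[str]) -> None:
        # Deleting an ID that does not exist is a no-op here.
        for doc_id in doc_ids:
            self._docs.pop(doc_id, None)

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(doc_id)

    def __len__(self) -> int:
        return len(self._docs)


def upsert(collection: InMemoryCollection, rows: List[Tuple[str, Dict[str, Any]]]) -> None:
    """Emulate upsert: delete existing versions of the IDs, then insert the new ones."""
    latest = dict(rows)  # if an ID repeats within the batch, the last row wins
    collection.delete(list(latest))
    for doc_id, doc in latest.items():
        collection.insert(doc_id, doc)


spec = ZvecTargetSpec(path="./zvec_data", collection_name="docs",
                      num_transactions_before_optimize=50)
coll = InMemoryCollection()
upsert(coll, [("a", {"vector": [0.1, 0.2], "scalars": {"title": "v1"}})])
upsert(coll, [
    ("a", {"vector": [0.3, 0.4], "scalars": {"title": "v2"}}),
    ("b", {"vector": [0.5, 0.6], "scalars": {"title": "b1"}}),
])
print(len(coll), coll.get("a")["scalars"]["title"])  # → 2 v2
```

Delete-then-insert consists of two separate operations. Unless the backend offers a way to group them, a failure between the two could briefly leave a row missing, so a real connector would likely need to account for that.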

This would be a target/sink only. zvec has no CDC, no change streaming, and no way to enumerate all documents incrementally, so it doesn't make sense as a source connector.

CC: @iaojnh, @Cuiyus, @feihongxu0824. Hope you don't mind the ping; I'm CC'ing you to keep you in the loop on this integration.


❤️ Contributors, please refer to the 📙 Contributing Guide.
Unless the PR can be sent immediately (e.g. just a few lines of code), we recommend leaving a comment on the issue, such as "I'm working on it" or "Can I work on this issue?", to avoid duplicating work. Our Discord server is always open and friendly.
