SwarnDB

The vector database that thinks in graphs.

One store for meaning and structure

Most teams bolt a vector database to a graph database and then spend their lives keeping the two in sync. Two systems, two copies of the data, two failure modes, and a layer of glue code in between that is always a little out of date.

SwarnDB removes the seam. It is a production-grade engine, written in Rust, where a vector and a graph node are the same object: one identity, one storage path, one crash-recovery path. So a single query can move between what is similar and what is connected without ever leaving the engine.

And if all you want is a fast, accurate vector store, that is exactly what you get out of the box. The graph is opt-in, per collection, ready the moment your problem grows past nearest-neighbor.

The one idea

In SwarnDB, the id of a vector is the id of its graph node. There is no foreign key, no mirror table, no eventual consistency between two stores. The thing you searched for and the thing you traverse from are literally the same row.

That single decision is what makes a query like this possible: scope by structure, then rank by meaning, in one plan.

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # mode="hybrid" turns on the first-class typed graph.
    client.collections.create(
        "articles", dimension=384, distance_metric="cosine", mode="hybrid"
    )

    # The id each insert returns is the node's id. Vector and node are one object.
    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3, ...], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4, ...], metadata={"topic": "math"})
    c = client.vectors.insert("articles", vector=[0.2, 0.4, 0.1, ...], metadata={"topic": "physics"})

    # Typed edges carry provenance, not just a pointer.
    client.graph.put_edge("articles", source=a, target=b, edge_type="CITES",
                          provenance={"doc_id": "paper-1"})
    client.graph.put_edge("articles", source=b, target=c, edge_type="CITES",
                          provenance={"doc_id": "paper-2"})

    # One composable hybrid query:
    # seed by similarity -> walk the graph -> rank the frontier exactly by meaning.
    result = (
        client.graph.query("articles")
        .vector_similar([0.1, 0.2, 0.3, ...], k=20)
        .traverse("CITES", direction="outgoing")
        .vector_rank([0.1, 0.2, 0.3, ...], k=10)
        .return_nodes()
    )
    for node in result.nodes:
        print(node.id, node.label)

No second database. No sync job. No copy drift. Just one query that knows both what things mean and how they connect.

What you get

A fast, accurate vector store: the default, with nothing turned on. Every collection is an approximate-nearest-neighbor store backed by HNSW for high-recall in-memory search, with optional SQ8 scalar quantization for a compressed index on larger collections. Four distance metrics (cosine, euclidean, dot, manhattan), per-query ef_search to tune recall against latency, batch search, and metadata pre-filtering with adaptive index selection.

A first-class typed graph in the same engine (opt-in). Flip a collection to mode="hybrid" and store typed, directed edges that carry confidence, a manual-versus-extracted flag, and a full provenance record. Then run one composable query that chains vector similarity, single-hop and k-hop traversal, shortest path, and a graph-first vector_rank that scopes by structure and ranks exactly by meaning.

Attribute-constrained search that is actually correct. scan_by_filter(predicate=...) fixes the candidate set first, then vector_rank(...) ranks it exactly, so the returned top-k is the complete, correct top-k among items that meet the condition. On real public datasets, plain vector search with an attribute condition often returns mostly non-matching items, and on a large share of such queries returns nothing usable at all. Filter-then-search returns the right top-10 every time.

Optional LLM-driven extraction: bring your own key. Point a hybrid collection at any OpenAI-compatible model to turn raw text chunks into typed entities and edges, each with full provenance and a verify / reject / re-extract curation loop. Off by default. You supply and own the key.

Quality-aware and time-filtered traversal. Weight hops by edge confidence, recency, or an explicit numeric property, and restrict a hop to edges valid at a point in time and regime. All opt-in.

15+ vector math operations, built in. Ghost vectors, cone search, SLERP interpolation, k-means, PCA, maximal marginal relevance, centroid computation, analogy completion, drift detection, and more. On a hybrid collection, several of them run exactly over a graph-built frontier.

Built to survive production. Rust-native with SIMD acceleration (AVX2, SSE4.1, NEON, scalar fallback), zero-copy mmap, arena allocators, and lock-free concurrency. Crash-safe by default via write-ahead log, with transparent recovery and fast restart. Dual API: high-throughput gRPC and curl-friendly REST.

macOS Intel (x86_64) is not built by CI. Apple Silicon Macs only for the macOS wheel today. Intel-Mac users can run the manual release script on an x86_64 macOS host or wait for native support.

Windows ARM64 is not built by CI. Windows x86_64 only for the Windows wheel today. Windows on ARM hosts can run the x86_64 wheel under Windows' built-in x86 emulation, or wait for native support.

How a hybrid query flows

A hybrid query is a small pipeline. You seed a candidate set by similarity, let the graph reshape it by structure, then rank what survives, all inside one engine, over one copy of the data.

And because storage is unified, ingestion and recovery are one path too: a write hits the log first, so a crash never costs you committed data.

Performance

Search throughput, recall, and latency on DBpedia 1M (1536-dim float32) with cosine distance and default HNSW parameters (M=16, ef_construction=200), measured on a 32-core, 64 GB host with 8 concurrent searcher threads, 1,000 queries per ef_search setting averaged across 3 iterations:

ef_search	QPS	Recall@10	p50 (ms)	p95 (ms)	p99 (ms)
25	2,398	0.9816	3.16	4.91	6.06
50	2,214	0.9894	3.33	5.26	6.77
100	1,801	0.9921	4.16	6.85	8.02
200	1,233	0.9935	6.18	10.19	12.26
400	760	0.9960	10.00	16.83	20.48
800	437	0.9974	17.42	30.43	35.90

~0.99 recall@10 at over 2,200 QPS, single host. Loading that same 1M set via bulk_insert_from_path peaks at 7.45 GiB resident (the file is memory-mapped, so working memory is bounded by the index, not the input), and a 200k collection comes back queryable 5.5 seconds after a hard SIGKILL.

For worker-saturation curves, ingestion rates, restart and recovery timings, and full reproduction recipes, see Benchmarks.

Quick Start

Pull and run from Docker Hub:

docker run -d -p 8080:8080 -p 50051:50051 sarthiai/swarndb

Install the SDK:

pip install swarndb

Connect, create a plain vector collection, insert a few vectors, and search:

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # Vector-only by default.
    client.collections.create("articles", dimension=384, distance_metric="cosine")

    # Each insert returns the assigned id.
    client.vectors.insert("articles", vector=[0.1, 0.2, 0.3, ...], metadata={"topic": "physics"})
    client.vectors.insert("articles", vector=[0.3, 0.1, 0.4, ...], metadata={"topic": "math"})

    # Search for the nearest neighbors.
    results = client.search.query("articles", vector=[0.1, 0.2, 0.3, ...], k=10)
    for r in results.results:
        print(r.id, r.score)  # distance score, lower is more similar

The same operations are available over REST, and async support is available via AsyncSwarnDBClient with the same API surface.

See the Docker Guide for persistence, configuration, and Docker Compose, and the API Reference for REST.

Turn text into a graph (optional)

Bring your own OpenAI-compatible key, and SwarnDB will read your text chunks, propose typed entities and edges, and let you preview the cost before a single token is spent, then curate what it found.

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    client.extraction.set_llm_config(
        "articles",
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",
        model_name="openai/gpt-4o-mini",
        temperature=0.0,
        max_tokens=2048,
    )
    client.extraction.set_ontology("articles", base_template="research-papers", replace=False)

    estimate = client.extraction.cost_preview("articles", chunks)
    print(f"Estimated cost: ${estimate.estimated_cost_usd}")

    result = client.extraction.start_extraction("articles", chunks)

Every auto-generated edge keeps its source document, source chunk, model, confidence, and verification status, so you always know where a fact came from. See LLM Extraction.

Architecture

SwarnDB is organized as eight Rust crates with clean dependency boundaries:

Crate	Role
`vf-core`	Core types, distance functions, SIMD kernels
`vf-storage`	WAL, segment management, memory-mapped I/O, collections
`vf-index`	HNSW and brute-force index implementations
`vf-query`	Filter evaluation, query execution, batch processing, hybrid vector-and-graph query engine
`vf-quantization`	Scalar, product, and binary quantization; IVF partitioning
`vf-graph`	First-class typed graph: typed nodes and edges with provenance, traversal, and composable hybrid queries
`vf-extraction`	Optional LLM-driven extraction of typed entities and edges from text (bring your own key)
`vf-server`	gRPC and REST servers, authentication, health checks

Key Capabilities

Ingestion

Single insert for one-at-a-time writes via gRPC or REST
Streaming bulk insert with batched gRPC streams, configurable batch lock size, WAL flush interval, and optional parallel HNSW construction
File-based bulk insert via bulk_insert_from_path: the server reads a .npy or flat .f32 file from any path it can read and ingests directly from the kernel page cache, without copying the payload through gRPC
Deferred indexing during bulk loads, finalized by a single optimize() call that rebuilds the HNSW index and metadata index, with the virtual graph rebuilt on the same call when rebuild_graph=true
Bulk insert checkpoints and resume via per-batch checkpoints and an opaque resume_token, so interrupted loads pick up from the last committed batch

Restart and Recovery

Fast restart for plain HNSW collections, queryable within seconds of the server opening its ports
Parallel collection load at startup, so a multi-collection database comes up in parallel rather than serially
Incremental delta replay or full write-ahead log replay on unclean shutdown, applied transparently before traffic resumes
Operational endpoints for orchestration: /healthz, /readyz, /startupz; a global /recovery_status; a per-collection GET /api/v1/collections/{collection}/persistence_status; and Prometheus metrics at /metrics

Vector Operations

HNSW index with configurable ef_construction, ef_search, and M
Scalar quantization (SQ8) as the per-collection quantization mode: 8-bit encoding that rescores candidates against full-precision vectors, keeping recall close to plain HNSW, with fast-restart parity
IVF + Product Quantization for billion-scale datasets with bounded memory
Batch search with multi-query execution and shared overhead
Pre-filtering with adaptive index selection (B-tree, hash, bitmap) for metadata-filtered queries
Per-query ef_search to tune the recall/latency tradeoff at query time

Graph (first-class, hybrid)

Opt in per collection via mode="hybrid" at create time. All of the following is off until you do.

Typed edges with provenance linking content and entity nodes, each carrying a type, confidence, a manual-versus-extracted flag, and a provenance record
Optional LLM extraction (BYOK) turning text chunks into typed entities and edges with any OpenAI-compatible model
Composable hybrid queries chaining vector similarity, single-hop traverse, k-hop expansion, shortest path, and a graph-first vector_rank
Quality-aware and temporal traversal weighting hops by confidence, recency, or property, and restricting hops to edges valid at a point in time and regime
Vector math over a graph-built frontier so analogy, diversity, cone, and centroid operations run exactly over the candidate set the graph produced
Manual edge CRUD and bulk import with create, read, update, verify, reject, audit history, and CSV/JSONL bulk loading

The virtual graph (SwarnDB's automatic similarity graph) is also available as a separate mode (mode="auto_similarity"), off by default.

Math Engine

A library of vector math operations available through both gRPC and REST. The core operations:

Operation	What it does	Where to use it
Ghost vectors	Synthetic vectors representing absent concepts in a space	Search for something you have no example of yet, like the ideal product that fills a gap in your catalog
Cone search	Angular proximity search within a cone aperture	"More like this, but only in this direction," with a tunable strictness dial, for tightly themed search
SLERP interpolation	Spherical linear interpolation between vectors	Blend two preferences into a smooth in-between, such as a style halfway between two products
Centroid computation	Weighted and unweighted centroids of vector sets	Roll many items into one profile, like a customer's overall taste from their history or a topic's signature
Vector drift detection	Track how vector representations change over time	Catch when meaning shifts, a user's interests moving, content going off-topic, or an embedding model going stale
K-means clustering	Partition vectors into k clusters	Group items into natural buckets for customer segmentation, content topics, or catalog organization
PCA	Dimensionality reduction via principal component analysis	Shrink vectors for faster search and smaller storage, or project to 2D for maps and dashboards
Analogy completion	Vector arithmetic for analogy tasks (A:B :: C:?)	"This is to that as X is to ?" for relationship-based recommendations and attribute reasoning
Maximal marginal relevance	Diversity-aware result re-ranking	Keep results relevant but varied so you never show ten near-duplicates, and to pick broad context for RAG
Vector normalization	L2 normalization for angular similarity	The prep step that makes similarity fair, comparing by direction of meaning rather than vector length

On a hybrid collection, several of these also run as graph-first ranking steps (analogy, diversity, cone, isolation, centroid, interpolation), operating exactly over the candidate set a graph query has already produced, which is where the count climbs past fifteen.

SIMD Acceleration

All distance computations are SIMD-accelerated with runtime dispatch:

Instruction Set	Platform	Width
AVX2	x86_64	256-bit
SSE4.1	x86_64	128-bit
NEON	ARM / Apple Silicon	128-bit
Scalar	All platforms	Portable fallback

Specialized kernels include fused cosine distance (dot product plus norms in a single pass), batched multi-vector distance computation, and SIMD gather for PQ distance-table lookups.

Configuration

All configuration is via environment variables. See .env.example for the full list.

Variable	Default	Description
`SWARNDB_HOST`	`0.0.0.0`	Bind address
`SWARNDB_GRPC_PORT`	`50051`	gRPC listener port
`SWARNDB_REST_PORT`	`8080`	REST listener port
`SWARNDB_DATA_DIR`	`./data`	Data storage directory
`SWARNDB_LOG_LEVEL`	`info`	Log verbosity (`trace`, `debug`, `info`, `warn`, `error`)
`SWARNDB_API_KEYS`	(empty)	Comma-separated API keys; empty disables auth
`SWARNDB_MAX_CONNECTIONS`	`1000`	Maximum concurrent connections
`SWARNDB_REQUEST_TIMEOUT_MS`	`10000`	Request timeout in milliseconds

API Reference

SwarnDB exposes dual API surfaces: gRPC on port 50051 and REST on port 8080.

Operation	gRPC Service	REST Endpoint
Collection CRUD	`CollectionService`	`POST/GET/DELETE /api/v1/collections`
Vector CRUD	`VectorService`	`POST/GET/DELETE /api/v1/collections/{id}/vectors`
Search	`SearchService`	`POST /api/v1/collections/{id}/search`
Batch search	`SearchService`	`POST /api/v1/search/batch`
Graph operations	`GraphService`	`POST/GET /api/v1/collections/{id}/graph/*`
Math operations	`MathService`	`POST /api/v1/collections/{id}/math/*`
Health / Readiness	`HealthService`	`GET /health`, `GET /ready`

For complete API documentation, see API Reference.

Documentation

Get started

Guide	Description
Getting Started	Installation, first steps, basic usage
Core Concepts	Collections, vectors, metadata, indexing, collection modes

Vector search

Guide	Description
Quantization	SQ8 and other quantization modes, when to choose each, how to enable them
Vector Math	All 15+ vector math operations with examples

The graph

Guide	Description
Typed Graph: Overview	What the typed graph is and which graph to use (start here)
Typed Graph: Complete Guide	The full how-to reference: typed node and edge CRUD, all hybrid-query steps, predicates, curation
LLM Extraction	Optional LLM-driven extraction of typed entities and edges from text, bring your own key
Virtual Graph	The virtual graph (the automatic similarity graph): concepts, traversal, thresholds

Ingestion and operations

Guide	Description
Bulk Ingestion	Insert modes, optimize(), large file-based loads via bulk_insert_from_path
Configuration	Environment variables and tuning guide
Docker Guide	Docker setup, persistence, Compose, and building from source
Deployment	Docker, Kubernetes, and Helm deployment
Benchmarks	Reference workloads, hardware, measured numbers, reproduction recipes
Known Issues	Current limitations and their recommended mitigations

API and SDK

Guide	Description
API Reference	Complete gRPC and REST API documentation
Python SDK	SDK installation, client usage, async support

Issues and Feedback

Found a bug or have a feature request? Open an issue on GitHub Issues.

License

Elastic License 2.0 (ELv2) Source-available. You are free to use, embed, modify, and redistribute SwarnDB for any purpose, including commercial use inside your products. You may not offer SwarnDB itself as a hosted or managed service that substitutes for the features of this software, and you may not remove or obscure the license notices.

In plain language: build on top of SwarnDB, ship it inside your products, modify it for your own use. Do not repackage it and sell it as "MyVectorDBSolution."

The SwarnDB project is envisioned, developed and maintained by Chirotpal

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.github/workflows		.github/workflows
assets		assets
benchmark		benchmark
crates		crates
data/test		data/test
docs		docs
helm/swarndb		helm/swarndb
k8s		k8s
proto/swarndb/v1		proto/swarndb/v1
sdk/python		sdk/python
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
deny.toml		deny.toml
docker-compose.benchmark.yml		docker-compose.benchmark.yml
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SwarnDB

One store for meaning and structure

The one idea

What you get

How a hybrid query flows

Performance

Quick Start

Turn text into a graph (optional)

Architecture

Key Capabilities

Ingestion

Restart and Recovery

Vector Operations

Graph (first-class, hybrid)

Math Engine

SIMD Acceleration

Configuration

API Reference

Documentation

Issues and Feedback

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SwarnDB

One store for meaning and structure

The one idea

What you get

How a hybrid query flows

Performance

Quick Start

Turn text into a graph (optional)

Architecture

Key Capabilities

Ingestion

Restart and Recovery

Vector Operations

Graph (first-class, hybrid)

Math Engine

SIMD Acceleration

Configuration

API Reference

Documentation

Issues and Feedback

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages