MySQL-compatible · Embedded or Server · Hybrid Vector + Full-text Search · COW Sandbox
⚡ 1,523 QPS streaming write+search (10× Milvus, 3× Elasticsearch)
🌿 FORK/MERGE sandboxes for safe agent exploration
Vector + full-text + scalar in one SQL query
🐬 Full ACID, MySQL protocol, works with LangChain/LlamaIndex/Dify
English | 中文版 | 日本語
30-Second Try · Quick Start · Why seekdb · Ecosystem · Contributing
If you find seekdb useful, consider giving it a star; it helps others discover the project.
Read the launch blog → · Reproduce the benchmark →
pip install -U pyseekdb  # pyseekdb is the Python SDK for seekdb
No servers, no schemas, no embedding setup. Embedded mode runs in-process; switch to server / OceanBase mode with one line. More examples →
Agent workloads are continuous write + millisecond-later read. seekdb's async index pipeline (Change Stream) decouples DML from index build, and its two-level HNSW (incremental + snapshot) makes newly-written vectors immediately searchable.
The write path commits and returns without waiting on index construction. The Change Stream pipeline consumes the redo log asynchronously and updates the delta HNSW. Queries hit both delta and snapshot indexes with fine-grained read locks, which is why P99 stays flat under concurrency.
The result: 1,523 QPS with 21.7 ms concurrent P99. That is 10.7× the QPS of Milvus, with P99 jitter of just 1.1× when concurrency rises (vs ~10× for ES / Milvus on the same workload).
Source: src/share/change_stream/ · src/share/vector_index/
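To make the delta + snapshot idea concrete, here is a minimal sketch in Python. It is not seekdb's implementation: the class and method names are hypothetical, a deque stands in for the redo log, and brute-force distance replaces HNSW on both levels. It only illustrates how a write can return before any index work happens and how a query can consult both levels.

```python
import heapq
import math
from collections import deque


class TwoLevelIndex:
    """Toy two-level vector index (hypothetical, brute force instead of HNSW)."""

    def __init__(self):
        self.snapshot = {}         # id -> vector, rebuilt rarely
        self.delta = {}            # id -> vector, updated continuously
        self.change_log = deque()  # stand-in for the redo log / change stream

    def write(self, vec_id, vector):
        # Commit path: append to the log and return; no index work here.
        self.change_log.append((vec_id, list(vector)))

    def apply_changes(self):
        # Background step: drain the log into the delta level.
        while self.change_log:
            vec_id, vector = self.change_log.popleft()
            self.delta[vec_id] = vector

    def compact(self):
        # Occasional step: fold the delta into a new snapshot.
        self.snapshot = {**self.snapshot, **self.delta}
        self.delta = {}

    def search(self, query, k):
        # Query both levels; the delta wins when an id exists in both.
        merged = {**self.snapshot, **self.delta}
        scored = ((math.dist(query, vec), vec_id) for vec_id, vec in merged.items())
        return [vec_id for _, vec_id in heapq.nsmallest(k, scored)]
```

In this toy, a write becomes searchable once `apply_changes()` has run, so freshness depends on how quickly the background consumer drains the log. A real system would run that consumer on its own thread and protect both levels with locks.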
FORK DATABASE snapshots an entire database in seconds, with no data copy.
Agents experiment freely (write, query, even break tables); then MERGE TABLE
commits the work back, or DROP DATABASE discards it. Kernel-level COW,
not application-layer save/restore.
-- Snapshot in seconds, no data copy
FORK DATABASE agent_state TO agent_sandbox_42;
-- Agent reads/writes freely on the sandbox...
USE agent_sandbox_42;
INSERT INTO memory (session_id, embedding, content) VALUES (...);
-- Accept the work back to mainline (strategies: FAIL / THEIRS / OURS)
MERGE TABLE agent_sandbox_42.memory INTO agent_state.memory STRATEGY THEIRS;
-- ...or throw it away:
DROP DATABASE agent_sandbox_42;
Source: tools/deploy/mysql_test/test_suite/fork_table/
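An agent framework would typically generate these statements per sandbox rather than hard-code them. The sketch below shows one way to do that in Python. The helper names are hypothetical, the statement shapes are copied from the SQL example above, and defaulting to the FAIL strategy is an assumption made here, not documented seekdb behaviour.

```python
import re

# Strategy names as listed in the SQL example above.
MERGE_STRATEGIES = {"FAIL", "THEIRS", "OURS"}
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name):
    # Accept only plain identifiers so names can be spliced into SQL safely.
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def fork_statement(base_db, sandbox_db):
    """Statement that snapshots base_db into a new sandbox database."""
    return f"FORK DATABASE {_ident(base_db)} TO {_ident(sandbox_db)};"


def merge_statement(base_db, sandbox_db, table, strategy="FAIL"):
    """Statement that merges one sandbox table back into the base database."""
    if strategy not in MERGE_STRATEGIES:
        raise ValueError(f"unknown merge strategy: {strategy!r}")
    base, sandbox, tbl = _ident(base_db), _ident(sandbox_db), _ident(table)
    return f"MERGE TABLE {sandbox}.{tbl} INTO {base}.{tbl} STRATEGY {strategy};"


def discard_statement(sandbox_db):
    """Statement that throws the sandbox away."""
    return f"DROP DATABASE {_ident(sandbox_db)};"
```

Each returned string can be passed to any MySQL-protocol client. Validating identifiers up front matters because database and table names cannot be bound as query parameters.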
Vector + full-text + scalar filter pushed into one execution plan. No N+1 client-side merging, no glue code to combine results.
SELECT id, title, l2_distance(emb, '[0.12,0.34,...]') AS dist
FROM docs
WHERE MATCH(content) AGAINST('quarterly report')
AND author_id = 42
AND created_at > '2026-01-01'
ORDER BY dist APPROXIMATE LIMIT 10;
Built on the proven OceanBase SQL engine. Works as an embedded library, a single-node server, or in an OceanBase distributed cluster. Full ACID, real-time writes, and the entire MySQL ecosystem out of the box.
Choose your platform:
☁️ Cloud (Zero Install)
One curl, a running database: no signup, no credit card.
curl -X POST https://d0.seekdb.ai/api/v1/instances
Free for 7 days. Learn more →
Python (Recommended for AI/ML)
pip install -U pyseekdb
🐳 Docker (Quick Testing)
docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -p 2886:2886 \
  -v ./data:/var/lib/oceanbase \
  oceanbase/seekdb:latest
Please refer to the documentation for this Docker image for details.
📦 Binary (Standalone)
# Linux (one-line install, may need sudo)
curl -fsSL https://obportal.s3.ap-southeast-1.amazonaws.com/download-center/opensource/seekdb/seekdb_install.sh | bash
# macOS (Homebrew)
brew tap oceanbase/seekdb
brew install seekdb
See deployment docs for DEB/RPM offline install and configuration details.
For the full Python SDK walkthrough (connection modes, embedding functions, metadata filters), see the pyseekdb User Guide.
🤖 Agent Memory Pattern (continuous write + immediate retrieval)
The canonical agent loop: write an observation, retrieve relevant context milliseconds later, repeat. seekdb's async index pipeline keeps both sides fast under sustained concurrency.
import pyseekdb

client = pyseekdb.Client(path="./agent_state.db")
memory = client.get_or_create_collection(name="episodic")

# `agent` stands for your own agent loop; it is not part of pyseekdb
for step in agent.run():
    # Persist the observation
    memory.upsert(ids=[step.id], documents=[step.observation])
    # Retrieve relevant context milliseconds after the write,
    # served by the incremental HNSW (no waiting on a background rebuild)
    relevant = memory.query(query_texts=step.next_query, n_results=5)
    agent.act(relevant)
SQL: Schema + Hybrid Search
-- Table with vector column, full-text index, and HNSW vector index
CREATE TABLE articles (
  id INT PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding VECTOR(384),
  FULLTEXT INDEX idx_fts (content) WITH PARSER ik,
  VECTOR INDEX idx_vec (embedding) WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;
-- Hybrid search: vector similarity + full-text match in one query
SELECT id, title,
  l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;
Python developers can access this via SQLAlchemy or any MySQL driver.
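When calling the hybrid query from Python, the embedding has to be turned into the bracketed text literal shown in the SQL. The sketch below is a hedged illustration: the helper names are hypothetical, the `%s` placeholders follow the DB-API style used by drivers such as PyMySQL, and it assumes the server accepts bound parameters in these positions, which has not been verified against seekdb.

```python
def to_vector_literal(values):
    """Format numbers as the '[v1,v2,...]' text used in the SQL examples."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Same shape as the hybrid search above, with placeholders instead of literals.
HYBRID_SQL = (
    "SELECT id, title, l2_distance(embedding, %s) AS dist "
    "FROM articles "
    "WHERE MATCH(content) AGAINST(%s) "
    "ORDER BY dist APPROXIMATE "
    "LIMIT %s"
)


def hybrid_query(embedding, keywords, limit=10):
    """Return (sql, params), ready for cursor.execute(sql, params)."""
    return HYBRID_SQL, (to_vector_literal(embedding), keywords, int(limit))
```

With a DB-API cursor `cur` from a MySQL driver, a call would look like `cur.execute(*hybrid_query(vec, "quarterly report"))`. Binding the keywords as a parameter avoids splicing user text into the statement.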
🎯 Agentic AI: Memory, Sandbox & State
Agents need a state store that handles continuous memory writes, millisecond-later retrieval, branching for exploration, and rollback when things go wrong. seekdb is built for exactly this:
- Streaming-friendly storage: write a memory, query it in the next millisecond
- COW sandboxes: FORK DATABASE for safe experimentation, MERGE to accept, DROP to roll back
- Hybrid retrieval: vector + full-text + relational in one SQL query
- MySQL protocol: works with LangChain, LlamaIndex, Dify out of the box
Personal assistants · Enterprise automation · Vertical agents · Agent platforms
🧩 Other Use Cases
seekdb's hybrid retrieval + multi-model engine also fits classic AI workloads:
- RAG & Knowledge Retrieval: vector + full-text + scalar filters with multi-level access control. Enterprise QA, customer support, industry insights, personal knowledge bases.
- Semantic Search: embedding-based search across text, images, and other modalities. Product search, text-to-image, image-to-product.
- 💻 AI-Assisted Coding: semantic code search, multi-project isolation, time-travel queries for IDE plugins and code agents. Local IDEs, web IDEs, design-to-web.
- Enterprise Application Intelligence: MySQL-compatible AI layer for legacy systems, with row/column hybrid storage. Document intelligence, business insights, finance systems.
- 📱 On-Device & Edge AI: embedded / micro-server modes for resource-constrained devices. In-vehicle systems, AI education, companion robots, healthcare devices.
+ Camel-AI · DB-GPT · FastGPT · Firecrawl · Spring-AI-Alibaba · Cloudflare Workers AI · Jina AI · Ragas · Instructor · Baseten; see the User Guide for the full list.
- Read the docs: Quickstart, API reference, integration guides
- Launch blog: the architecture behind 10.7× the QPS of Milvus
- Open an issue: report bugs, request features
- Contribute: help build the agent-era state store
Before building, please install the required toolchain and dependencies for your operating system. See Install Toolchain for detailed instructions.
# Clone the repository
git clone https://github.com/oceanbase/seekdb.git
cd seekdb
# Initialize dependencies and build in debug mode
bash build.sh debug --init --make
# Copy the binary into a fresh working directory and start it
mkdir -p ~/seekdb/bin
cp build_debug/src/observer/seekdb ~/seekdb/bin
cd ~/seekdb
./bin/seekdb
In this example, the working directory is $HOME/seekdb; use a fresh directory for testing. See the Developer Guide for detailed instructions.
We welcome contributions! See our Contributing Guide to get started.
If seekdb is useful to you, a star helps others find it. ⭐
seekdb is built by the OceanBase team, whose database engine runs in production at Alipay, Taobao, DiDi, Xiaomi, and more. Fully open-source under the Apache License, Version 2.0.
