oceanbase/seekdb


Write. Search. Fork. The State Store for AI Agents.


MySQL-compatible · Embedded or Server · Hybrid Vector + Full-text Search · COW Sandbox

⚡ 1,523 QPS streaming write+search (10× Milvus, 3× Elasticsearch)
🌿 FORK/MERGE sandboxes for safe agent exploration
🔍 Vector + full-text + scalar in one SQL query
🐬 Full ACID, MySQL protocol, works with LangChain/LlamaIndex/Dify


If you find seekdb useful, consider giving it a star; it helps others discover the project.


⚡ Performance at a Glance

Figure: seekdb benchmark, 10.7× the QPS of Milvus and 3.2× that of Elasticsearch

📖 Read the launch blog → · 🔍 Reproduce the benchmark →


⏱️ 30-Second Try

pip install -U pyseekdb   # pyseekdb is the Python SDK for seekdb

No servers, no schemas, no embedding setup. Embedded mode runs in-process; switching to server or OceanBase mode takes one line. More examples are in the More Examples section below.


✨ Why seekdb for Agents?

πŸ”₯ Streaming Write + Concurrent Search, Without the P99 Spike

Agent workloads are continuous write + millisecond-later read. seekdb's async index pipeline (Change Stream) decouples DML from index build, and its two-level HNSW (incremental + snapshot) makes newly-written vectors immediately searchable.

seekdb async index pipeline architecture

The write path commits and returns without waiting on index construction. The Change Stream pipeline consumes the redo log asynchronously and updates the delta HNSW. Queries hit both delta and snapshot indexes with fine-grained read locks β€” this is why P99 stays flat under concurrency.

The result: 1,523 QPS with 21.7 ms concurrent P99 β€” 10.7Γ— the QPS of Milvus, and P99 jitter of just 1.1Γ— when concurrency rises (vs ~10Γ— for ES / Milvus on the same workload).

Source: src/share/change_stream/ Β· src/share/vector_index/
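The delta + snapshot split can be illustrated with a small in-memory model. This is a sketch under loud assumptions, not seekdb's code: both levels here are exact brute-force L2 scans rather than HNSW graphs, the "change stream" is a plain queue rather than a redo-log consumer, and the names `TwoLevelIndex`, `apply_changes`, and `compact` are hypothetical. What it does show is the read path: a query merges candidates from the small delta and the large snapshot, so a row is searchable as soon as the pipeline has applied it, with no snapshot rebuild in between.

```python
# Toy model of a two-level (delta + snapshot) vector index.
# Hypothetical sketch, NOT seekdb's implementation: both levels are brute-force
# L2 scans (seekdb uses HNSW), and the "change stream" is an in-memory queue.
import heapq
import math
from collections import deque


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TwoLevelIndex:
    def __init__(self):
        self.change_stream = deque()  # committed writes not yet indexed
        self.delta = {}               # small level holding recent vectors
        self.snapshot = {}            # large level, compacted only occasionally

    def write(self, row_id, vec):
        # Write path: log the change and return; no index work happens here.
        self.change_stream.append((row_id, vec))

    def apply_changes(self):
        # Background consumer: drain the log into the cheap-to-update delta level.
        while self.change_stream:
            row_id, vec = self.change_stream.popleft()
            self.delta[row_id] = vec

    def compact(self):
        # Occasionally fold the delta into the snapshot and start a fresh delta.
        self.snapshot.update(self.delta)
        self.delta.clear()

    def search(self, query, k):
        # Read path: consult both levels; the delta wins if an id is in both.
        merged = {**self.snapshot, **self.delta}
        return heapq.nsmallest(k, merged, key=lambda rid: l2(merged[rid], query))


idx = TwoLevelIndex()
idx.write("a", [0.0, 0.0])
idx.apply_changes()
idx.compact()                        # "a" now lives in the snapshot
idx.write("b", [1.0, 1.0])
idx.apply_changes()                  # "b" is only in the delta, never compacted
print(idx.search([0.9, 0.9], k=1))   # ['b']
```

The write call does no index work at all, which is the property the paragraph above attributes to the Change Stream: index maintenance cost moves off the commit path, and readers pay only for merging two candidate sets.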

🌿 Copy-on-Write Sandboxes for Agent Exploration

FORK DATABASE snapshots an entire database in seconds, with no data copy. Agents experiment freely (write, query, even break tables); then MERGE TABLE commits the work back, or DROP DATABASE discards it. This is kernel-level copy-on-write, not application-layer save/restore.

-- Snapshot in seconds, no data copy
FORK DATABASE agent_state TO agent_sandbox_42;

-- Agent reads/writes freely on the sandbox...
USE agent_sandbox_42;
INSERT INTO memory (session_id, embedding, content) VALUES (...);

-- Accept the work back to mainline (strategies: FAIL / THEIRS / OURS)
MERGE TABLE agent_sandbox_42.memory INTO agent_state.memory STRATEGY THEIRS;
-- ...or throw it away:
DROP DATABASE agent_sandbox_42;

Source: tools/deploy/mysql_test/test_suite/fork_table/
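The FORK/MERGE workflow can be modelled in a few lines with Python's `collections.ChainMap`, whose layered lookups give copy-on-write behaviour: a fork shares the frozen layers and writes only to its own top dict. This is a hypothetical sketch, not seekdb's kernel mechanism, and it rests on assumptions the text above does not spell out: a table is a dict keyed by primary key, a conflict means both branches wrote the same key with different rows after the fork, THEIRS keeps the sandbox row, OURS keeps the mainline row, and FAIL aborts before writing anything. Deletes and nested forks are not modelled; `new_table`, `Branch`, and `merge` are illustrative names.

```python
# Toy copy-on-write FORK / MERGE model built on collections.ChainMap.
# Hypothetical sketch: names and merge rules are assumptions, not seekdb internals.
from collections import ChainMap


class MergeConflict(Exception):
    """Raised by the FAIL strategy when both branches changed the same key."""


class Branch:
    """A table branch: its own writes (maps[0]) layered over shared frozen layers."""

    def __init__(self, view):
        self.view = view
        self.writes = view.maps[0]  # the only dict this branch ever mutates

    def get(self, key):
        return self.view.get(key)

    def put(self, key, row):
        self.writes[key] = row

    def fork(self):
        # Freeze the current layers; mainline and sandbox each get a new empty
        # overlay. No rows are copied: both reference the same underlying dicts.
        frozen = self.view
        self.view = frozen.new_child()
        self.writes = self.view.maps[0]
        return Branch(frozen.new_child())


def new_table(rows):
    return Branch(ChainMap(dict(rows)))


def merge(sandbox, mainline, strategy="FAIL"):
    if strategy not in ("FAIL", "THEIRS", "OURS"):
        raise ValueError(f"unknown strategy: {strategy}")
    # Assumed conflict rule: same key written on both sides since the fork.
    conflicts = {k for k, row in sandbox.writes.items()
                 if k in mainline.writes and mainline.writes[k] != row}
    if conflicts and strategy == "FAIL":
        raise MergeConflict(sorted(conflicts))  # abort before writing anything
    for key, row in sandbox.writes.items():
        if key in conflicts and strategy == "OURS":
            continue  # OURS: keep the mainline's row
        mainline.put(key, row)  # THEIRS, or no conflict: take the sandbox's row


main = new_table({1: "hello"})
box = main.fork()
box.put(2, "draft")
print(main.get(2))  # None: the mainline cannot see sandbox writes
merge(box, main, "THEIRS")
print(main.get(2))  # draft
```

In this model, dropping the sandbox is simply discarding the `Branch` object: its overlay dict was never referenced by the mainline, so nothing needs to be undone.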

πŸ” Hybrid Search in a Single SQL

Vector + full-text + scalar filter pushed into one execution plan. No N+1 client-side merging, no glue code to combine results.

SELECT id, title, l2_distance(emb, '[0.12,0.34,...]') AS dist
FROM docs
WHERE MATCH(content) AGAINST('quarterly report')
  AND author_id = 42
  AND created_at > '2026-01-01'
ORDER BY dist APPROXIMATE LIMIT 10;
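To pin down what that statement returns, here is a brute-force Python equivalent over in-memory rows. It is only a semantic sketch: seekdb evaluates the predicates and the ranking inside one execution plan, while this code filters and sorts exactly. `MATCH ... AGAINST` is approximated (an assumption) by "at least one query term appears as a whitespace-separated word", which ignores relevance scoring and real tokenization; the dict keys mirror the columns in the query, and the sample rows are made up for illustration.

```python
# Brute-force, exact equivalent of the hybrid query's result semantics.
# Illustrative only: not how seekdb executes it.
import math
from datetime import date


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(text, query):
    # Crude stand-in for MATCH ... AGAINST: any query term appears as a word.
    words = set(text.lower().split())
    return any(term in words for term in query.lower().split())


def hybrid_search(docs, query_vec, query_text, author_id, created_after, k=10):
    # WHERE: full-text match AND scalar filters
    hits = [
        d for d in docs
        if matches(d["content"], query_text)
        and d["author_id"] == author_id
        and d["created_at"] > created_after
    ]
    # ORDER BY dist LIMIT k (exact here; APPROXIMATE lets seekdb use the ANN index)
    scored = sorted((l2_distance(d["emb"], query_vec), d["id"], d["title"]) for d in hits)
    return [(doc_id, title, dist) for dist, doc_id, title in scored[:k]]


docs = [
    {"id": 1, "title": "Q1", "content": "Quarterly report for Q1",
     "author_id": 42, "created_at": date(2026, 2, 1), "emb": [0.1, 0.3]},
    {"id": 2, "title": "Memo", "content": "lunch menu",
     "author_id": 42, "created_at": date(2026, 2, 1), "emb": [0.12, 0.34]},
    {"id": 3, "title": "Old", "content": "quarterly report",
     "author_id": 42, "created_at": date(2025, 6, 1), "emb": [0.12, 0.34]},
]
# Row 2 fails the text match and row 3 the date filter, so only row 1 remains.
print(hybrid_search(docs, [0.12, 0.34], "quarterly report", 42, date(2026, 1, 1)))
```

Note that row 2 has the closest vector of all, yet it is excluded: the filters are applied as part of the same query rather than as a client-side post-processing step over a vector-only top-k.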

🐬 MySQL-Compatible, ACID, Embeddable

Built on the proven OceanBase SQL engine. Works as an embedded library, a single-node server, or in the OceanBase distributed cluster. Full ACID, real-time writes, and the entire MySQL ecosystem out of the box.


🎬 Quick Start

Installation

Choose your platform:

☁️ Cloud (Zero Install)

One curl command gives you a running database: no signup, no credit card.

curl -X POST https://d0.seekdb.ai/api/v1/instances

Free for 7 days.

🐍 Python (Recommended for AI/ML)

pip install -U pyseekdb

🐳 Docker (Quick Testing)

docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -p 2886:2886 \
  -v "$(pwd)/data:/var/lib/oceanbase" \
  oceanbase/seekdb:latest

See the documentation for this Docker image for details.

πŸ“¦ Binary (Standalone)
# Linux (one-line install, may need sudo)
curl -fsSL https://obportal.s3.ap-southeast-1.amazonaws.com/download-center/opensource/seekdb/seekdb_install.sh | bash

# macOS (Homebrew)
brew tap oceanbase/seekdb
brew install seekdb

See deployment docs for DEB/RPM offline install and configuration details.

πŸ“ More Examples

For the full Python SDK walkthrough β€” connection modes, embedding functions, metadata filters β€” see the pyseekdb User Guide.

πŸ€– Agent Memory Pattern (continuous write + immediate retrieval)

The canonical agent loop: write an observation, retrieve relevant context milliseconds later, repeat. seekdb's async index pipeline keeps both sides fast under sustained concurrency.

import pyseekdb

# `agent` and its `step` objects stand for your own agent framework
# (hypothetical here; they are not part of pyseekdb).
client = pyseekdb.Client(path="./agent_state.db")
memory = client.get_or_create_collection(name="episodic")

for step in agent.run():
    # Persist the observation
    memory.upsert(ids=[step.id], documents=[step.observation])

    # Retrieve relevant context milliseconds after the write,
    # served by the incremental HNSW (no waiting on a background rebuild)
    relevant = memory.query(query_texts=step.next_query, n_results=5)

    agent.act(relevant)

🗄️ SQL: Schema + Hybrid Search
-- Table with vector column, full-text index, and HNSW vector index
CREATE TABLE articles (
  id        INT PRIMARY KEY,
  title     TEXT,
  content   TEXT,
  embedding VECTOR(384),
  FULLTEXT INDEX idx_fts (content) WITH PARSER ik,
  VECTOR   INDEX idx_vec (embedding) WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;

-- Hybrid search: vector similarity + full-text match in one query
SELECT id, title,
       l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;

Python developers can access this via SQLAlchemy or any MySQL driver.
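With a MySQL driver, the hybrid query is best sent with bound parameters rather than string concatenation. The sketch below assumes a DB-API driver with `%s` placeholders and client-side interpolation, such as PyMySQL, and formats the query vector in the same `'[x, y, ...]'` text form used in the SQL examples above. It has not been run against seekdb, the helper names `vector_literal` and `hybrid_query` are hypothetical, and the vector you pass must match the column's dimension (384 in the `articles` table). Only the query-building part executes here; the connection lines are commented out, and their host, port, user, and database values are placeholders.

```python
# Hypothetical helpers for sending the hybrid query through a DB-API driver
# that uses %s placeholders (for example PyMySQL). Not an official seekdb API.

HYBRID_SQL = (
    "SELECT id, title, l2_distance(embedding, %s) AS dist "
    "FROM articles "
    "WHERE MATCH(content) AGAINST(%s) "
    "ORDER BY dist APPROXIMATE "
    "LIMIT %s"
)


def vector_literal(vec):
    """Render floats in the '[0.12, 0.34, ...]' text form used by the SQL examples."""
    return "[" + ", ".join(repr(float(x)) for x in vec) + "]"


def hybrid_query(vec, text, k=10):
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    return HYBRID_SQL, (vector_literal(vec), text, int(k))


sql, params = hybrid_query([0.12, 0.34], "quarterly report", k=10)
print(params)  # ('[0.12, 0.34]', 'quarterly report', 10)

# Usage against a live server (placeholder connection values):
# import pymysql
# conn = pymysql.connect(host="127.0.0.1", port=2881, user="root", database="test")
# with conn.cursor() as cur:
#     cur.execute(sql, params)
#     rows = cur.fetchall()
```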

πŸ“š Use Cases

🎯 Agentic AI β€” Memory, Sandbox & State

Agents need a state store that handles continuous memory writes, millisecond-later retrieval, branching for exploration, and rollback when things go wrong. seekdb is built for exactly this:

  • Streaming-friendly storage: write a memory, query it milliseconds later
  • COW sandboxes: FORK DATABASE for safe experimentation, MERGE to accept, DROP to roll back
  • Hybrid retrieval: vector + full-text + relational in one SQL query
  • MySQL protocol: works with LangChain, LlamaIndex, Dify out of the box

Personal assistants · Enterprise automation · Vertical agents · Agent platforms

🧩 Other Use Cases

seekdb's hybrid retrieval + multi-model engine also fits classic AI workloads:

  • πŸ“– RAG & Knowledge Retrieval β€” vector + full-text + scalar filters with multi-level access control. Enterprise QA, customer support, industry insights, personal knowledge bases.
  • πŸ” Semantic Search β€” embedding-based search across text, images, and other modalities. Product search, text-to-image, image-to-product.
  • πŸ’» AI-Assisted Coding β€” semantic code search, multi-project isolation, time-travel queries for IDE plugins and code agents. Local IDEs, web IDEs, design-to-web.
  • ⬆️ Enterprise Application Intelligence β€” MySQL-compatible AI layer for legacy systems, with row/column hybrid storage. Document intelligence, business insights, finance systems.
  • πŸ“± On-Device & Edge AI β€” embedded / micro-server modes for resource-constrained devices. In-vehicle systems, AI education, companion robots, healthcare devices.

🌟 Ecosystem & Integrations

LangChain LlamaIndex Dify LangGraph Coze HuggingFace

+ Camel-AI · DB-GPT · FastGPT · Firecrawl · Spring-AI-Alibaba · Cloudflare Workers AI · Jina AI · Ragas · Instructor · Baseten. See the User Guide for the full list.




πŸ› οΈ Development

Build from Source

Before building, please install the required toolchain and dependencies for your operating system. See Install Toolchain for detailed instructions.

# Clone the repository
git clone https://github.com/oceanbase/seekdb.git
cd seekdb
bash build.sh debug --init --make
mkdir -p ~/seekdb/bin
cp build_debug/src/observer/seekdb ~/seekdb/bin
cd ~/seekdb
./bin/seekdb

In this example the working directory is $HOME/seekdb; use a fresh directory for testing. See the Developer Guide for detailed instructions.

Contributing

We welcome contributions! See our Contributing Guide to get started.



πŸ“„ License

seekdb is built by the OceanBase team β€” the same database engine running in production at Alipay, Taobao, DiDi, Xiaomi, and more. Fully open-source under the Apache License, Version 2.0.

About

The AI-native state store for agents. MySQL-compatible, embedded or server, hybrid vector + full-text search, COW sandboxes (FORK/MERGE).
