birand/brisedb

brisedb

A distributed key-value store with volume-based blob storage, replication across multiple machines and drives, and an HTTP API optimised for values between 1 MB and 1 GB. Inspired by SeaweedFS, but simple.


Features

  • Persistent storage — append-only volume files + WAL for crash recovery
  • On-disk hash index — O(1) key lookup with 2 disk seeks; keys do not need to fit in RAM
  • Multi-drive support — spread volumes across multiple local directories
  • Remote volume servers — blob data can live on separate machines over HTTP
  • Replication — each blob written to N servers simultaneously; reads auto-failover
  • HTTP blob API — streaming PUT/GET/DELETE with Range header and TTL
  • TCP command protocol — Redis-inspired text protocol for key management
  • Transactions — nested BEGIN / COMMIT / ROLLBACK with per-connection isolation
  • TTL / expiry — SET key value EX 60, lazy eviction + background sweep
  • Pub/Sub — SUBSCRIBE / PUBLISH / UNSUBSCRIBE
  • WAL replication — stream changes to read-only replicas in real time
  • Compaction — rewrite index and WAL, discarding deleted and expired entries

Quick start

# Build everything
go build ./...

# Start a single-node server (TCP :6380, HTTP blob API :6381)
./server --data ./data --http :6381

# Store and retrieve a value via TCP
echo "SET hello world" | nc localhost 6380   # +OK
echo "GET hello"       | nc localhost 6380   # +world

# Store and retrieve a blob via HTTP
curl -X PUT http://localhost:6381/v1/blobs/myfile --data-binary @photo.jpg
curl     http://localhost:6381/v1/blobs/myfile -o photo-copy.jpg

Architecture

┌─────────────────────────────────────────────────────┐
│                    brisedb master                   │
│                                                     │
│  TCP server (:6380)     HTTP blob API (:6381)       │
│       │                        │                    │
│       └──────────┬─────────────┘                    │
│                  │                                  │
│             BriseDB core                            │
│          ┌────────────────┐                         │
│          │  hash index    │  index.hash (on disk)   │
│          │  WAL           │  wal.log                │
│          │  VolumePool    │                         │
│          └───────┬────────┘                         │
│                  │                                  │
│        ┌─────────┼──────────┐                       │
│   local vol   remote     remote                     │
│   (drives)    server 1   server 2                   │
└─────────────────────────────────────────────────────┘
                   │ HTTP
           ┌───────┴───────┐
      volume-server   volume-server
      (:8081)          (:8082)

Each volume file (vol-000001.data) is an append-only binary file. A volume group (created when replication is enabled) holds N identical copies across N servers — because all members receive the same writes in order, offsets are identical, allowing any member to serve a read.
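The offset-equality property described above can be illustrated with a small in-memory model. This is only a sketch: `memVolume`, `volumeGroup`, and their methods are hypothetical names invented for the example, and real members would be files or remote servers rather than byte slices.

```go
package main

import (
	"errors"
	"fmt"
)

// memVolume is a hypothetical in-memory stand-in for one append-only volume file.
type memVolume struct {
	data []byte
	down bool // simulates an unreachable server
}

// volumeGroup models N replicas that receive identical writes in identical order.
type volumeGroup struct {
	members []*memVolume
}

// write appends payload to every member and returns the offset shared by all of them.
func (g *volumeGroup) write(payload []byte) (int64, error) {
	if len(g.members) == 0 {
		return 0, errors.New("empty volume group")
	}
	offset := int64(len(g.members[0].data))
	for _, m := range g.members {
		if m.down || int64(len(m.data)) != offset {
			return 0, errors.New("member unavailable or out of sync")
		}
	}
	for _, m := range g.members {
		m.data = append(m.data, payload...)
	}
	return offset, nil
}

// read serves the byte range from the first reachable member that holds it.
func (g *volumeGroup) read(offset, size int64) ([]byte, error) {
	if offset < 0 || size < 0 {
		return nil, errors.New("invalid range")
	}
	for _, m := range g.members {
		if m.down || offset+size > int64(len(m.data)) {
			continue
		}
		return m.data[offset : offset+size], nil
	}
	return nil, errors.New("no replica could serve the read")
}

func main() {
	g := &volumeGroup{members: []*memVolume{{}, {}}}
	g.write([]byte("first"))
	off, _ := g.write([]byte("second"))
	g.members[0].down = true // the first replica becomes unreachable
	got, err := g.read(off, 6)
	fmt.Println(off, string(got), err) // 5 second <nil>
}
```

Because every member sees the same sequence of appends, one (offset, size) pair is enough to address the blob on any replica, which is what makes the failover read possible.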


Binaries

| Binary | Description |
| --- | --- |
| ./server | Master node — TCP protocol + optional HTTP blob API |
| ./volume-server | Standalone blob storage node (HTTP only) |
| ./replica | Read-only replica that streams WAL from master |
| ./cli | Interactive REPL for local embedded use |

TCP protocol

Connect with nc, redis-cli -3, or the Go client (pkg/client).

Key-value

SET key value             → +OK
SET key value EX 60       → +OK   (TTL 60 seconds)
GET key                   → +value | +nil
DELETE key                → +OK
TTL key                   → +<seconds> | +-1 (no TTL) | +-2 (not found)
PERSIST key               → +1 | +0
COUNT value               → +<n>   (number of keys whose value equals value)
KEYS pattern              → +<n>
                             +key1
                             +key2 ...
SCAN cursor COUNT n       → +nextCursor n
                             +key1 ...
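Client code has to tell the reply forms above apart. The sketch below parses GET and TTL replies; the helper names and the `ttlState` type are hypothetical, and it assumes each reply arrives as one newline-terminated line starting with `+`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseGet interprets a GET reply line: "+value" or "+nil".
func parseGet(line string) (value string, found bool, err error) {
	line = strings.TrimRight(line, "\r\n")
	if !strings.HasPrefix(line, "+") {
		return "", false, errors.New("unexpected reply: " + line)
	}
	if line[1:] == "nil" {
		return "", false, nil
	}
	return line[1:], true, nil
}

// ttlState names the outcomes of the TTL command.
type ttlState int

const (
	ttlUnknown ttlState = iota // reply could not be parsed
	ttlExpires                 // key has a TTL; seconds is valid
	ttlNone                    // reply +-1: key exists without a TTL
	ttlMissing                 // reply +-2: key not found
)

// parseTTL interprets a TTL reply line: "+<seconds>", "+-1" or "+-2".
func parseTTL(line string) (seconds int64, state ttlState, err error) {
	line = strings.TrimRight(line, "\r\n")
	if !strings.HasPrefix(line, "+") {
		return 0, ttlUnknown, errors.New("unexpected reply: " + line)
	}
	n, err := strconv.ParseInt(line[1:], 10, 64)
	if err != nil {
		return 0, ttlUnknown, err
	}
	switch {
	case n == -1:
		return 0, ttlNone, nil
	case n == -2:
		return 0, ttlMissing, nil
	case n < 0:
		return 0, ttlUnknown, errors.New("unexpected TTL: " + line)
	}
	return n, ttlExpires, nil
}

func main() {
	v, ok, _ := parseGet("+world\n")
	fmt.Println(v, ok) // world true
	secs, state, _ := parseTTL("+60")
	fmt.Println(secs, state == ttlExpires) // 60 true
}
```

In this sketch a stored value that is literally `nil` cannot be distinguished from a missing key; how the server itself handles that case is not covered here.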

Transactions

BEGIN
SET x 1
SET y 2
COMMIT            → +OK  (writes to volumes + WAL atomically)

BEGIN
SET x bad
ROLLBACK          → +OK  (discards all changes)

Transactions nest — each BEGIN pushes a new savepoint. COMMIT merges into the parent transaction or the main store.
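One way to picture the nesting is a stack of pending write sets, one per open BEGIN. The model below is a sketch under that assumption; `txStore` is a hypothetical type, deletes are omitted for brevity, and it does not touch volumes or the WAL.

```go
package main

import (
	"errors"
	"fmt"
)

// txStore is a hypothetical in-memory model: a main map plus a stack of
// pending write sets, one per open BEGIN.
type txStore struct {
	main  map[string]string
	stack []map[string]string
}

func newTxStore() *txStore { return &txStore{main: map[string]string{}} }

// Begin pushes a new savepoint.
func (s *txStore) Begin() { s.stack = append(s.stack, map[string]string{}) }

// Set writes into the innermost open transaction, or the main store if none is open.
func (s *txStore) Set(k, v string) {
	if n := len(s.stack); n > 0 {
		s.stack[n-1][k] = v
		return
	}
	s.main[k] = v
}

// Get checks the innermost transaction first, then its parents, then the main store.
func (s *txStore) Get(k string) (string, bool) {
	for i := len(s.stack) - 1; i >= 0; i-- {
		if v, ok := s.stack[i][k]; ok {
			return v, true
		}
	}
	v, ok := s.main[k]
	return v, ok
}

// Commit merges the innermost write set into its parent, or into the main store.
func (s *txStore) Commit() error {
	n := len(s.stack)
	if n == 0 {
		return errors.New("no open transaction")
	}
	top := s.stack[n-1]
	s.stack = s.stack[:n-1]
	target := s.main
	if n > 1 {
		target = s.stack[n-2]
	}
	for k, v := range top {
		target[k] = v
	}
	return nil
}

// Rollback discards the innermost write set.
func (s *txStore) Rollback() error {
	if len(s.stack) == 0 {
		return errors.New("no open transaction")
	}
	s.stack = s.stack[:len(s.stack)-1]
	return nil
}

func main() {
	s := newTxStore()
	s.Begin()
	s.Set("x", "1")
	s.Begin() // nested savepoint
	s.Set("x", "bad")
	s.Rollback() // drops only the inner change
	s.Commit()   // merges x=1 into the main store
	fmt.Println(s.Get("x")) // 1 true
}
```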

Pub/Sub

SUBSCRIBE news sports      → +SUBSCRIBE news 1
PUBLISH  news "breaking"   → +1
# subscribers receive:       +MESSAGE news breaking
UNSUBSCRIBE news           → +UNSUBSCRIBE news 0
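The fan-out behaviour can be sketched with Go channels. This is an in-memory illustration rather than the server's implementation: `broker` and `message` are hypothetical names, and it assumes the `+1` returned by PUBLISH is the number of subscribers that received the message.

```go
package main

import (
	"fmt"
	"sync"
)

// message mirrors the "+MESSAGE <channel> <payload>" line sent to subscribers.
type message struct{ Channel, Payload string }

// broker is a hypothetical in-memory fan-out table: channel name -> subscriber queues.
type broker struct {
	mu   sync.Mutex
	subs map[string][]chan message
}

func newBroker() *broker { return &broker{subs: map[string][]chan message{}} }

// subscribe registers a buffered queue for one channel name.
func (b *broker) subscribe(channel string) <-chan message {
	ch := make(chan message, 16)
	b.mu.Lock()
	b.subs[channel] = append(b.subs[channel], ch)
	b.mu.Unlock()
	return ch
}

// publish delivers payload to every subscriber of channel and returns the
// number of deliveries.
func (b *broker) publish(channel, payload string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs[channel] {
		select {
		case ch <- message{channel, payload}:
			delivered++
		default: // queue full: drop instead of blocking the publisher
		}
	}
	return delivered
}

func main() {
	b := newBroker()
	news := b.subscribe("news")
	fmt.Println(b.publish("news", "breaking")) // 1
	msg := <-news
	fmt.Println(msg.Channel, msg.Payload) // news breaking
}
```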

Admin

COMPACT    → rewrite index + WAL, reclaim space from deleted/expired keys
VOLINFO    → list volume files: +<id> <drive> <size_bytes>
STOP       → close this connection

HTTP blob API

Start the HTTP server with --http :6381 or set http_addr in the config file.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| PUT | /v1/blobs/{key} | Store blob; optional ?ttl=30s |
| GET | /v1/blobs/{key} | Retrieve blob; supports Range header |
| HEAD | /v1/blobs/{key} | Metadata only (no body) |
| DELETE | /v1/blobs/{key} | Remove blob |
| GET | /v1/keys?pattern=* | List keys matching glob |
| GET | /v1/scan?cursor=0&count=100 | Paginated key scan |
| GET | /v1/status | Health check + volume stats |
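The blob endpoints can also be called from Go with the standard net/http package. The sketch below only builds the requests; `blobURL`, `newPutRequest`, and `newResumeRequest` are hypothetical helpers, and escaping the key with `url.PathEscape` is an assumption about how the server decodes paths.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// blobURL builds /v1/blobs/{key} under base, with an optional ?ttl= query.
func blobURL(base, key, ttl string) string {
	u := strings.TrimRight(base, "/") + "/v1/blobs/" + url.PathEscape(key)
	if ttl != "" {
		u += "?ttl=" + url.QueryEscape(ttl)
	}
	return u
}

// newPutRequest prepares a PUT whose body is streamed from r when the request is sent.
func newPutRequest(base, key, ttl string, r io.Reader) (*http.Request, error) {
	return http.NewRequest(http.MethodPut, blobURL(base, key, ttl), r)
}

// newResumeRequest prepares a GET that asks for the bytes from offset to the end.
func newResumeRequest(base, key string, offset int64) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, blobURL(base, key, ""), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	return req, nil
}

func main() {
	put, _ := newPutRequest("http://localhost:6381", "session:abc", "1h", strings.NewReader("token"))
	fmt.Println(put.Method, put.URL)

	get, _ := newResumeRequest("http://localhost:6381", "bigfile", 1000000)
	fmt.Println(get.Method, get.URL, get.Header.Get("Range"))
}
```

Passing either request to `http.DefaultClient.Do` would send it. For large blobs the PUT body can be an `*os.File`, so the payload is streamed rather than loaded into memory.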

Examples

# Store with TTL
curl -X PUT "http://localhost:6381/v1/blobs/session:abc?ttl=1h" \
     --data-binary @token.bin

# Retrieve full blob
curl http://localhost:6381/v1/blobs/session:abc -o token.bin

# Resume a partial download (Range)
curl -H "Range: bytes=1000000-" \
     http://localhost:6381/v1/blobs/bigfile >> partial.bin

# Check metadata without downloading
curl -I http://localhost:6381/v1/blobs/myfile
# X-Brise-Size: 104857600
# X-Brise-TTL: 3542   (remaining seconds; -1 = no expiry)

# List keys
curl "http://localhost:6381/v1/keys?pattern=session:*"
# {"keys":["session:abc","session:xyz"],"count":2}

# Paginated scan
curl "http://localhost:6381/v1/scan?cursor=0&count=100"
# {"next_cursor":100,"keys":[...],"count":100}
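A client can walk the whole key space by feeding each `next_cursor` back into the next scan call. The loop below is a sketch: `scanPage` mirrors the JSON fields shown above, `fetchFunc` is a hypothetical hook standing in for the HTTP call, and the stop condition (a short page or a cursor that does not advance) is an assumption, since the end-of-scan signal is not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scanPage mirrors the JSON body returned by /v1/scan.
type scanPage struct {
	NextCursor int      `json:"next_cursor"`
	Keys       []string `json:"keys"`
	Count      int      `json:"count"`
}

// fetchFunc is a hypothetical hook returning the raw JSON body for one page,
// e.g. the response of GET /v1/scan?cursor=<cursor>&count=<count>.
type fetchFunc func(cursor, count int) ([]byte, error)

// scanAll collects keys page by page.
func scanAll(fetch fetchFunc, count int) ([]string, error) {
	if count < 1 {
		count = 100
	}
	var all []string
	cursor := 0
	for {
		body, err := fetch(cursor, count)
		if err != nil {
			return nil, err
		}
		var page scanPage
		if err := json.Unmarshal(body, &page); err != nil {
			return nil, err
		}
		all = append(all, page.Keys...)
		// Assumed end-of-scan signal: a short page or a cursor that stops advancing.
		if len(page.Keys) < count || page.NextCursor <= cursor {
			return all, nil
		}
		cursor = page.NextCursor
	}
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	// fake serves pages out of the slice above instead of calling a server.
	fake := func(cursor, count int) ([]byte, error) {
		if cursor > len(keys) {
			cursor = len(keys)
		}
		end := cursor + count
		if end > len(keys) {
			end = len(keys)
		}
		return json.Marshal(scanPage{NextCursor: end, Keys: keys[cursor:end], Count: end - cursor})
	}
	all, err := scanAll(fake, 2)
	fmt.Println(all, err) // [a b c d e] <nil>
}
```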

Multi-machine cluster

1. Start volume servers on storage nodes

# node2
./volume-server --addr :8081 --data /mnt/ssd1

# node3
./volume-server --addr :8082 --data /mnt/ssd2

2. Configure and start the master

{
  "addr": ":6380",
  "http_addr": ":6381",
  "data_dir": "/var/brisedb",
  "volume_servers": ["http://node2:8081", "http://node3:8082"],
  "replication_factor": 2,
  "max_volume_size": 2147483648
}

./server --config cluster.json

With replication_factor: 2, every blob is written to 2 backends simultaneously (local + one remote, assigned round-robin per volume group). If a server goes down, reads automatically fail over to the surviving replica.
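The round-robin assignment can be pictured as rotating through the remote list by volume group. The function below is a sketch of that idea only: `backendsFor` and the `"local"` label are hypothetical, and it assumes non-negative group IDs and that the local backend is always the first member.

```go
package main

import "fmt"

// backendsFor returns the members of one volume group: the local backend plus
// (factor-1) remote servers, starting at a position that rotates with groupID.
func backendsFor(groupID int, remotes []string, factor int) []string {
	members := []string{"local"}
	want := factor - 1
	if want > len(remotes) {
		want = len(remotes) // cannot place more copies than there are servers
	}
	for i := 0; i < want; i++ {
		members = append(members, remotes[(groupID+i)%len(remotes)])
	}
	return members
}

func main() {
	remotes := []string{"http://node2:8081", "http://node3:8082"}
	for group := 0; group < 3; group++ {
		fmt.Println(group, backendsFor(group, remotes, 2))
	}
	// 0 [local http://node2:8081]
	// 1 [local http://node3:8082]
	// 2 [local http://node2:8081]
}
```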

3. Start a read replica (optional)

./replica --master localhost:6380 --data /var/brisedb-replica --addr :6382

The replica streams WAL entries from the master in real time and serves read-only queries on its own TCP port.


Configuration reference

All fields are optional; command-line flags override file values.

{
  "addr": ":6380",
  "http_addr": ":6381",
  "data_dir": "brisedb-data",
  "drives": ["/mnt/disk2/volumes", "/mnt/disk3/volumes"],
  "volume_servers": ["http://10.0.0.2:8081"],
  "replication_factor": 2,
  "max_volume_size": 2147483648
}

| Field | Default | Description |
| --- | --- | --- |
| addr | :6380 | TCP server listen address |
| http_addr | (disabled) | HTTP blob API listen address |
| data_dir | brisedb-data | Primary data directory |
| drives | [] | Extra local drive directories for volume files |
| volume_servers | [] | Remote HTTP volume server base URLs |
| replication_factor | 1 | Blob copies per write (1 = no replication) |
| max_volume_size | 2147483648 | Max bytes per volume file before rotation (2 GiB) |
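Since every field is optional, a loader can start from the defaults in the table and let the file overwrite only what it sets. The sketch below shows that pattern with encoding/json; the `Config` struct and function names are hypothetical, and command-line overrides are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the JSON fields of the configuration file.
type Config struct {
	Addr              string   `json:"addr"`
	HTTPAddr          string   `json:"http_addr"`
	DataDir           string   `json:"data_dir"`
	Drives            []string `json:"drives"`
	VolumeServers     []string `json:"volume_servers"`
	ReplicationFactor int      `json:"replication_factor"`
	MaxVolumeSize     int64    `json:"max_volume_size"`
}

// defaultConfig returns the defaults listed in the reference table.
func defaultConfig() Config {
	return Config{
		Addr:              ":6380",
		DataDir:           "brisedb-data",
		ReplicationFactor: 1,
		MaxVolumeSize:     2147483648, // 2 GiB
	}
}

// parseConfig overlays the JSON document on the defaults; fields missing from
// the JSON keep their default values.
func parseConfig(data []byte) (Config, error) {
	cfg := defaultConfig()
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{"http_addr": ":6381", "replication_factor": 2}`))
	fmt.Printf("%+v %v\n", cfg, err)
}
```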

Go client

package main

import (
    "fmt"
    "log"

    "github.com/birand/brisedb/pkg/client"
)

func main() {
    c, err := client.Dial("localhost:6380")
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    c.Set("name", "Alice")
    val, ok := c.Get("name") // "Alice", true
    fmt.Println(val, ok)
    c.SetEX("token", "xyz", 60) // expires in 60 s
    c.Delete("name")

    // Transactions
    c.Begin()
    c.Set("x", "1")
    c.Set("y", "2")
    c.Commit()

    // Pub/Sub: print every message received on the "news" channel
    sub, _ := c.Subscribe("news")
    go func() {
        for msg := range sub {
            fmt.Println(msg.Channel, msg.Payload)
        }
    }()
    c.Publish("news", "hello")

    // WAL replication stream: this loop runs until the stream is closed
    r, _ := client.DialReplica("localhost:6380")
    for entry := range r.Entries() {
        fmt.Println(entry.Type, entry.Key)
    }
}

Storage layout

brisedb-data/
├── index.hash          # persistent hash index (key → NeedleAddr)
├── wal.log             # write-ahead log (needle addresses, no values)
├── vol-registry.json   # volume group registry (globalID → servers)
└── volumes/
    ├── vol-000000.data
    ├── vol-000001.data
    └── ...
  • index.hash — open-addressing hash table, append-only. Default 1 M buckets (8 MiB). COMPACT rewrites it, removing tombstones and expired entries.
  • wal.log — one JSON line per committed SET/DELETE. Only used when the hash index is empty (crash recovery / first open). COMPACT rewrites it to match the live index.
  • Volume files — [8-byte big-endian size][payload bytes] per entry. Never overwritten; values accumulate until the file reaches max_volume_size, then a new file is created in the same drive.
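The volume entry framing described above, an 8-byte big-endian size followed by the payload, can be encoded and decoded with encoding/binary. This is a sketch on an in-memory byte slice; `appendEntry` and `readEntry` are hypothetical helpers, not functions from the brisedb packages.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendEntry appends [8-byte big-endian size][payload] to vol and returns the
// extended slice. The entry's offset is len(vol) before the call.
func appendEntry(vol, payload []byte) []byte {
	var hdr [8]byte
	binary.BigEndian.PutUint64(hdr[:], uint64(len(payload)))
	vol = append(vol, hdr[:]...)
	return append(vol, payload...)
}

// readEntry returns the payload of the entry that starts at offset.
func readEntry(vol []byte, offset int64) ([]byte, error) {
	if offset < 0 || offset+8 > int64(len(vol)) {
		return nil, errors.New("offset out of range")
	}
	size := binary.BigEndian.Uint64(vol[offset : offset+8])
	start := offset + 8
	if size > uint64(int64(len(vol))-start) {
		return nil, errors.New("truncated entry")
	}
	return vol[start : start+int64(size)], nil
}

func main() {
	var vol []byte
	vol = appendEntry(vol, []byte("hello"))
	second := int64(len(vol)) // offset of the next entry: 8 + 5 = 13
	vol = appendEntry(vol, []byte("world!"))
	got, err := readEntry(vol, second)
	fmt.Println(second, string(got), err) // 13 world! <nil>
}
```

Because each entry carries its own length, a reader needs only the entry's starting offset to recover the value.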

Benchmarks

Measured on Apple M1, Go 1.24, local disk (-benchtime=1s).

Hot-path operations

| Benchmark | ops/s | ns/op | allocs/op |
| --- | --- | --- | --- |
| Set (single key, WAL flush) | ~109 K | 9 221 | 7 |
| Get (cache miss) | ~615 K | 1 624 | 3 |
| Get (miss / not found) | ~42 M | 28 | 0 |
| Delete | ~691 K | 1 482 | 3 |
| SetBlob 1 KB | ~92 K | 13 028 | 10 |
| SetBlob 64 KB | ~12 K | 104 181 | 10 |
| SetBlob 1 MB | ~866 | 1 513 876 | 13 |
| SetBlobStream 1 MB (zero-copy) | ~1 797 | 1 023 929 | |
| GetBlob 1 KB | ~427 K | 2 420 | 2 |
| GetBlob 64 KB | ~149 K | 7 983 | 2 |
| GetBlob 1 MB (cached) | ~1 M | 1 062 | 1 |
| Transaction (1 key) | ~106 K | 11 221 | 19 |
| Transaction (10 keys) | ~10 K | 103 723 | 104 |
| Rollback | ~5 M | 231 | 6 |
| HashIndex Set | ~280 K | 3 566 | 1 |
| HashIndex Get | ~950 K | 1 059 | 1 |
| HashIndex GetMiss | ~135 M | 7 | 0 |
| Set (8 goroutines, WAL group-commit) | ~884 K | 9 036 | 7 |
| Get (8 goroutines) | ~421 K | 2 369 | 3 |
| Mixed 80%R/20%W (8 goroutines) | ~355 K | 2 816 | 5 |

ForEach / Compact — adaptive strategy

ForEach selects its scan strategy automatically based on index size:

  • Small index (< 100 K keys): sequential data-region scan → map[string]latest — fastest path, O(N) RAM.
  • Large index (≥ 100 K keys): bucket-chain traversal — O(max chain depth) RAM (a handful of strings), prevents OOM on large datasets.

| Operation | Keys | Strategy | RAM | Time |
| --- | --- | --- | --- | --- |
| ForEach | 1 K | sequential scan | 139 KB | 1.1 ms |
| ForEach | 10 K | sequential scan | 1.1 MB | 10.7 ms |
| ForEach | 100 K | bucket-chain | 1.5 MB | 107 ms |
| Compact | 100 K | bucket-chain | 9.9 MB | 943 ms |

At 100 K keys the bucket-chain path uses 1.5 MB vs. 19.6 MB for the map-based approach (−92%). It also runs faster (107 ms vs. 148 ms), most likely because it skips the allocation and hashing work needed to build a 100 K-entry map.
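The small-index path can be pictured as replaying the append-only data region into a last-write-wins map. The sketch below shows that replay and the threshold check; `record`, `chooseStrategy`, and `latestByScan` are hypothetical names, and the bucket-chain traversal itself is not shown.

```go
package main

import "fmt"

// largeIndexThreshold is the switch-over point quoted in the text (100 K keys).
const largeIndexThreshold = 100_000

// record is a hypothetical entry of the append-only data region.
type record struct {
	Key     string
	Value   string
	Deleted bool // tombstone written by DELETE
}

// chooseStrategy picks the scan strategy from the number of keys in the index.
func chooseStrategy(keyCount int) string {
	if keyCount < largeIndexThreshold {
		return "sequential scan"
	}
	return "bucket-chain"
}

// latestByScan replays the records in write order. Later records overwrite
// earlier ones and tombstones remove the key, so the map ends up holding the
// live set. Memory grows with the number of keys, hence O(N) RAM.
func latestByScan(records []record) map[string]string {
	latest := make(map[string]string, len(records))
	for _, r := range records {
		if r.Deleted {
			delete(latest, r.Key)
			continue
		}
		latest[r.Key] = r.Value
	}
	return latest
}

func main() {
	records := []record{
		{Key: "a", Value: "1"},
		{Key: "b", Value: "2"},
		{Key: "a", Value: "3"},   // overwrites a=1
		{Key: "b", Deleted: true}, // removes b
	}
	fmt.Println(chooseStrategy(len(records)), latestByScan(records)) // sequential scan map[a:3]
}
```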

Run benchmarks:

go test ./pkg/brisedb/ -bench=. -benchtime=1s -benchmem

Development

# Run all tests
go test ./...

# Verbose output for one package
go test -v ./pkg/brisedb/

# Run a specific test
go test ./pkg/brisedb/ -run TestTransaction

# Build all binaries
go build ./cmd/server/ ./cmd/volume-server/ ./cmd/replica/ ./cmd/cli/
