distribKV is a distributed key-value store written in Go. It supports sharding, replication, client-side routing, LSM-based storage (BadgerDB), and more.
Note
This project is for educational purposes — to understand how a key-value store works under the hood along with the theoretical concepts of sharding, replication, LSM trees, consistency models, bloom filters, and distributed systems.
- Sharding & Replication with configurable leaders and replicas
- Client-side Routing with HTTP redirection
- LSM-Tree Storage using BadgerDB
- Benchmarking Tools for performance testing
- Concurrency via Goroutines
- Persistence with Write-Ahead Logging and Compaction
- Built with Go 1.24 and designed for extensibility
This distributed key-value store is built with modern distributed systems principles, ensuring scalability, fault tolerance, and high availability.
-
Sharding(static)
The distribKV project utilizes a static sharding technique. In static sharding, the dataset is partitioned across predefined shards based on a fixed configuration. Each shard is responsible for a specific subset of the data, and the mapping between data keys and shards is established at the time of system configuration. In distribKV, this configuration is specified in the sharding.toml file. -
Replication
Maintains multiple copies (replicas) of data across nodes for fault tolerance and high availability. Uses a leader–follower model or single leader replication, where the leader node handles all writes and followers synchronize data.
-
Eventual Consistency
After a write, data is eventually replicated to all nodes, ensuring convergence over time. -
Leader-based Replication
A single leader node handles all writes. Follower nodes replicate updates to maintain consistency.
-
Key-based Routing
Clients use consistent hashing to determine the shard responsible for a key. -
Redirection
If a request hits the wrong shard or non-leader replica, it is redirected using HTTP307or308status codes to the correct node.
-
BadgerDB (LSM Tree) and SSTables
Uses BadgerDB, an embeddable key-value store inspired by RocksDB and LevelDB. It implements a Log-Structured Merge Tree (LSM) architecture for high write throughput and efficient storage management.- LSM Tree: Writes are first stored in memory (MemTable) and periodically flushed to disk as immutable Sorted String Tables (SSTables).
- Write-Ahead Log (WAL): Ensures durability by logging changes before applying them.
- Compaction: Periodically merges SSTables to optimize read performance and reclaim storage space.
- Bloom Filters:Bloom filters are built for each SSTable to help quickly reject non-existent keys, avoiding unnecessary disk reads.
-
Horizontal Scaling (Scale-Out)
The system supports adding more nodes (shards or replicas) to distribute the load. Keys are partitioned across shards using consistent hashing, allowing efficient horizontal growth. -
Vertical Scaling (Scale-Up)
Each node can independently handle increased load by using efficient concurrency with Goroutines and optimizing storage with BadgerDB’s low-overhead design. -
Shard-Based Partitioning
Data is divided across shards by hashing keys. Each shard can be independently scaled with its own replicas and leaders. -
Replication-Based Scaling
Read operations are offloaded to replicas, reducing the load on leader nodes and improving throughput.
-
Goroutines
Efficient parallelism using lightweight Go goroutines for concurrentGET/SETrequests. -
Benchmarking
Tools for measuring:- Throughput (requests/sec)
- Latency (response times)
- Load performance
-
Replication
Replicas allow the system to survive leader failures without data loss. -
Data Synchronization
Periodic syncing ensures eventual consistency between leader and replicas.
“You can only choose two: Consistency, Availability, Partition Tolerance.”
distribKV favors:
- ✅ Availability
- ✅ Partition Tolerance
- 🚫 (Relaxed) Consistency
Strong consistency can be added with protocols like Raft or Paxos in the future (LOL).
This project draws inspiration from:
