Skip to content

AnshuMishra01/KV-store

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KVStore — LSM-Tree Storage Engine (C++)

A persistent key-value storage engine built from scratch in C++17, implementing the same LSM-tree architecture used by LevelDB, RocksDB, and Cassandra.

Benchmark Results

500,000 sequential writes, 100,000 random reads, 100-byte values.

Metric Baseline (v1) Current LevelDB (published)
Sequential write 26,666 ops/s 224,823 ops/s ~566,000 ops/s
Random read (hit) 37,347 ops/s 27,075 ops/s ~62,000 ops/s
Negative read (miss) 1,419,825 ops/s 2,978,277 ops/s bloom-gated
Read avg latency 26.68 us 36.83 us 16 us
Read p99 latency 40.83 us 61.42 us
Disk usage 61.3 MB 60.0 MB

8.4x write improvement from baseline to current (WAL open/close fix).

Architecture

          WRITE PATH                         READ PATH
          ----------                         ---------
PUT(k,v) -> WAL append (disk)                GET(k)
          -> MemTable insert (RAM)             |
                  |                            v
           (when full)                   MemTable (RAM)
                  v                            | miss
           Flush to SSTable (disk)             v
                  |                      Bloom filter check
           (background)                        | maybe
                  v                            v
           Compaction                    SSTable scan (disk)
           (merge SSTables)                    | miss
                                               v
                                         Older SSTables...

Components

File What it does
include/kvstore.h / src/kvstore.cpp Public API + orchestration (flush, compaction, recovery)
include/memtable.h / src/memtable.cpp In-memory sorted store (std::map)
include/wal.h / src/wal.cpp Write-ahead log for crash recovery (binary format)
include/sstable.h / src/sstable.cpp Immutable sorted on-disk tables (data + sparse index + bloom + footer)
include/bloom.h / src/bloom.cpp Bloom filter (~1% FP rate, FNV-1a hash, double hashing)
include/binio.h Shared binary I/O helpers (length-prefixed strings, fixed-width ints)
include/tombstone.h Delete marker (NUL-padded sentinel value)

SSTable File Format

+----------------------------+
|  DATA BLOCK                |  [key_len:4][key][val_len:4][val] ...sorted...
+----------------------------+
|  INDEX BLOCK (sparse)      |  [count:4] then [key_len:4][key][offset:8] every 16th key
+----------------------------+
|  BLOOM BLOCK               |  [m:8][k:4][nbytes:4][bit_array]
+----------------------------+
|  FOOTER                    |  [index_offset:8][bloom_offset:8]  (last 16 bytes)
+----------------------------+

Build & Run

make            # build static library
make test       # run tests
make bench      # run benchmarks (results appended to bench/results.md)

Requires C++17 and pthreads.

Key Implementation Decisions

  1. WAL before RAM — durable log first, then in-memory insert. Crash between the two loses nothing.
  2. Binary format everywhere — length-prefixed [len:4][bytes], no delimiters. Values can contain any bytes.
  3. Immutable SSTables — written once, never modified. Enables lock-free reads during compaction.
  4. Tombstone deletes — delete = write a marker. Compaction reclaims space.
  5. Sparse index — every 16th key checkpointed. Binary search + bounded forward scan.
  6. Bloom filter per SSTable — 0.95% measured false positive rate. Skips disk entirely for absent keys.
  7. Background compaction — full merge on a separate thread, mutex only for snapshot/swap.
  8. Atomic MANIFEST — write to temp file + rename. Never seen half-written.

What's Next

  • fsync for true power-loss durability
  • Size-tiered compaction (merge same-size groups instead of full merge)
  • Block-based index segments (4KB blocks instead of fixed key count)
  • Background flush (currently holds the lock)
  • Write batching / group commit
  • Skip list or arena allocator for MemTable (cache-friendly)
  • Raw POSIX write() instead of std::ofstream
  • Java implementation for comparison

About

persistent key-value storage engine built on the LSM-tree architecture in C++, 224k writes/sec

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors