A basic Bitcask key-value store implementation in Rust, built as a learning project to explore log-structured storage engine internals.
This is intentionally simple: single-threaded, synchronous (blocking) compaction, no concurrency support. The goal is clarity over production-readiness.
Bitcask is an append-only log-structured storage engine. All writes go to the current active segment file. An in-memory hash index (KeyDir) maps every key to its on-disk location (file_id + offset), giving O(1) reads with a single disk seek.
- Tombstones: Deletes append a tombstone entry (
value = None) to the active segment rather than modifying existing data. The key is removed from the in-memory index immediately. Tombstones are cleaned up during compaction. - Empty keys: Allowed. An empty string is a valid key.
- Empty values: A zero-length value is treated as a tombstone (delete), not as a stored empty value.
- Compaction: Synchronous and blocking. Rewrites all live keys into a new segment, removes stale segments, and updates the index.
- Crash recovery: On startup, all segment files are scanned in order to rebuild the KeyDir. Corrupted or partially written trailing entries are skipped with a warning logged.
- Integrity: Each entry includes a CRC32 checksum covering the full payload. Corruption is detected on read.
[crc32:4][timestamp:8][key_size:4][value_size:4][key][value]
All integers are big-endian. CRC32 covers everything after the checksum bytes.
This is a Cargo workspace with three crates:
storage/— on-disk primitives:Entry(serialized KV record) andSegment(append-only data file named{file_id:08}.data)engine/— Bitcask engine:KeyDirin-memory index,BitcaskEnginewithget/put/delete/compactoperationscli/— command-line interface (using clap):put,get,deletesubcommands
cargo build # build all crates
cargo test # run all tests
cargo test -p storage # storage tests only
cargo test -p engine # engine tests onlycargo run -p cli -- put <key> <value>
cargo run -p cli -- get <key>
cargo run -p cli -- delete <key>Data is stored in ./data by default. Use --dir <path> to specify a different location:
cargo run -p cli -- --dir /tmp/mydb put foo barEnable debug logging with --debug:
cargo run -p cli -- --debug put foo barWithout --debug, warnings are still printed (e.g. corrupted segment entries detected on startup).
For the release binary:
cargo build -p cli --release
./target/release/kv put foo bar
./target/release/kv get foo
./target/release/kv delete foo
./target/release/kv --debug get fooCompared to the original Bitcask paper and production implementations, the following are intentionally left out:
- Hint files — startup always rebuilds the KeyDir by scanning all segment files. Hint files (a compact index written alongside segments) would make startup faster but are not implemented.
- Concurrency — the engine has no locking. Concurrent reads or writes are not safe.
- Async I/O — all operations are synchronous and blocking.
- Automatic compaction — no background thread or size-based trigger. Compaction must be called explicitly.
- Iterator / key listing — no way to list all keys or perform range scans.
- TTL / expiry — a timestamp is stored per entry but is not used for expiration.
- Fsync per write — no per-write durability guarantee. Data in the OS page cache may be lost on a crash.
- Atomic compaction — a crash mid-compaction leaves both old and new segment files on disk. The engine recovers safely on next startup but does not auto-clean the orphaned files.