Skip to content

DeamonDev/bitcask-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bitcask-rs

A basic Bitcask key-value store implementation in Rust, built as a learning project to explore log-structured storage engine internals.

This is intentionally simple: single-threaded, synchronous (blocking) compaction, no concurrency support. The goal is clarity over production-readiness.

How it works

Bitcask is an append-only log-structured storage engine. All writes go to the current active segment file. An in-memory hash index (KeyDir) maps every key to its on-disk location (file_id + offset), giving O(1) reads with a single disk seek.

Design decisions

  • Tombstones: Deletes append a tombstone entry (value = None) to the active segment rather than modifying existing data. The key is removed from the in-memory index immediately. Tombstones are cleaned up during compaction.
  • Empty keys: Allowed. An empty string is a valid key.
  • Empty values: A zero-length value is treated as a tombstone (delete), not as a stored empty value.
  • Compaction: Synchronous and blocking. Rewrites all live keys into a new segment, removes stale segments, and updates the index.
  • Crash recovery: On startup, all segment files are scanned in order to rebuild the KeyDir. Corrupted or partially written trailing entries are skipped with a warning logged.
  • Integrity: Each entry includes a CRC32 checksum covering the full payload. Corruption is detected on read.

Entry binary format

[crc32:4][timestamp:8][key_size:4][value_size:4][key][value]

All integers are big-endian. CRC32 covers everything after the checksum bytes.

Workspace structure

This is a Cargo workspace with three crates:

  • storage/ — on-disk primitives: Entry (serialized KV record) and Segment (append-only data file named {file_id:08}.data)
  • engine/ — Bitcask engine: KeyDir in-memory index, BitcaskEngine with get/put/delete/compact operations
  • cli/ — command-line interface (using clap): put, get, delete subcommands

Build and test

cargo build            # build all crates
cargo test             # run all tests
cargo test -p storage  # storage tests only
cargo test -p engine   # engine tests only

CLI usage

cargo run -p cli -- put <key> <value>
cargo run -p cli -- get <key>
cargo run -p cli -- delete <key>

Data is stored in ./data by default. Use --dir <path> to specify a different location:

cargo run -p cli -- --dir /tmp/mydb put foo bar

Enable debug logging with --debug:

cargo run -p cli -- --debug put foo bar

Without --debug, warnings are still printed (e.g. corrupted segment entries detected on startup).

For the release binary:

cargo build -p cli --release
./target/release/kv put foo bar
./target/release/kv get foo
./target/release/kv delete foo
./target/release/kv --debug get foo

Not implemented

Compared to the original Bitcask paper and production implementations, the following are intentionally left out:

  • Hint files — startup always rebuilds the KeyDir by scanning all segment files. Hint files (a compact index written alongside segments) would make startup faster but are not implemented.
  • Concurrency — the engine has no locking. Concurrent reads or writes are not safe.
  • Async I/O — all operations are synchronous and blocking.
  • Automatic compaction — no background thread or size-based trigger. Compaction must be called explicitly.
  • Iterator / key listing — no way to list all keys or perform range scans.
  • TTL / expiry — a timestamp is stored per entry but is not used for expiration.
  • Fsync per write — no per-write durability guarantee. Data in the OS page cache may be lost on a crash.
  • Atomic compaction — a crash mid-compaction leaves both old and new segment files on disk. The engine recovers safely on next startup but does not auto-clean the orphaned files.

About

My rusty implementation of bitcask storage engine

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages