Skip to content

quicklybly/lsm-database

Repository files navigation

LSM-Tree Based Key-Value Database Engine

A thread-safe LSM-tree storage engine written in pure in Java 21+. Built with Project Panama (Foreign Memory API), lock-free concurrency, and background flush/compaction.

Features

Feature Implementation
LSM-tree architecture MemTable (write) -> SSTable flush (disk) -> compaction
Off-heap memory MemorySegment + Arena for memory-mapped SSTable access, zero-copy reads
Thread-safe state management Immutable DaoState snapshots via AtomicReference + CAS transitions
Non-blocking reads get() and range iterators work against a consistent snapshot, never block on flush/compact
Background flush Auto-triggered when MemTable exceeds flushThresholdBytes, runs in dedicated executor
Background compaction Merges SSTables in a separate executor, readers continue on old tables until completion
Merge iterator Priority-based merge across active MemTable, flushing MemTable, and N SSTables
Tombstone support Logical deletes propagated through merge, removed during compaction
Crash recovery Atomic file moves; interrupted compaction recovered on next startup
Binary search in SSTable O(log N) point lookups via trailing index

SSTable File Format

[Entry 0: key_size (8 bytes) | key | value_size (8 bytes) | value ]
[Entry 1: key_size (8 bytes) | key | value_size (8 bytes) | value ]
...
[Entry N: key_size (8 bytes) | key | value_size = -1 (tombstone)  ]
[Index:   offset_0 (8 bytes) | offset_1 | ... | offset_N          ]
[Footer:  entry_count (8 bytes) | index_start_offset (8 bytes)    ]
  • Trailing index allows single-pass sequential writes during flush
  • Footer provides O(1) access to entry count and index location
  • Value size -1 marks a tombstone (deleted entry)

Thread Safety Model

State is managed as an immutable record, replaced atomically:

public record DaoState(
        MemTable activeMemTable,    // accepts writes
        MemTable flushingMemTable,  // being written to disk (or null)
        List<SSTable> ssTables      // immutable list of on-disk tables
) {
}

private final AtomicReference<DaoState> state;

State transitions:

upsert():   state.get().activeMemTable().put(entry)

flush:      [active=A, flushing=null] -> [active=new, flushing=A]
            write A to disk
            [active=B, flushing=A]    -> [active=B, flushing=null, tables+=[new SSTable]]

compact:    merge SSTables from snapshot -> write compacted file
            atomically replace old tables, preserve tables added by concurrent flushes

Two single-thread executors guarantee ordering:

  • flushExecutor -- at most one flush in progress
  • compactExecutor -- at most one compaction in progress

Reads (get, get(from, to)) take a snapshot via state.get() and never block.

Usage

Config config = new Config(
        Path.of("/data/mydb"),
        64 * 1024 * 1024        // flush threshold: 64 MB
);

Dao<MemorySegment, Entry<MemorySegment>> dao = new MemorySegmentDao(config);

dao.upsert(new BaseEntry<>(key, value));
dao.upsert(new BaseEntry<>(key, null));     // tombstone

Entry<MemorySegment> entry = dao.get(key);

Iterator<Entry<MemorySegment>> range = dao.get(fromKey, toKey);

dao.compact();
dao.close();    // blocks until background flush/compact complete

Running Tests

# Requires -Xmx64m
./gradlew clean test

License

MIT License - see LICENSE file for details.

About

LSM-tree key-value store in pure Java with SSTable persistence, build on project panama and lock-free concurrency model

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages