Skip to content

skandrigi/artemis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Artemis

A replicated, linearizable, Kafka-style commit log in Go: an append-only partitioned log with offset-based reads and per-consumer committed offsets, verified for correctness and benchmarked with the Maelstrom distributed-systems testing harness (a Jepsen-based fault and network simulator).

Go License: MIT CI

Overview

Artemis stores an append-only log per key, where a key is the unit of partitioning (the analogue of a Kafka topic-partition).

  • Append-only partitioned log. Each key holds an ordered, dense sequence of messages. Writes never mutate or delete existing entries.
  • Offset-based reads. Every message gets a 0-based offset within its key. Consumers fetch from an offset and receive entries in order, so they can detect lost or reordered writes.
  • Committed offsets per consumer group. A per-key watermark records the highest offset a consumer has finished processing.
  • Replication across nodes. Any node can serve any key. State is shared either through a single owning node or through a linearizable key/value store.
  • Availability under partition. Reads and commits remain correct while the network is faulty; the harness injects partitions and checks for anomalies.

Correctness is enforced by Maelstrom, which checks for lost writes, offset monotonicity violations, and stale reads across thousands of operations per run.

Evolution

Artemis was built in three stages, each a working system verified by the harness.

Stage What it adds Backend
1. Single-node log Correct offset and commit semantics in isolation In-memory
2. Replicated log Multi-node correctness via a linearizable store lin-kv compare-and-swap
3. Optimized log Same correctness at a fraction of the message cost Per-key owner routing

Stage 2 makes the system correct under replication but pays for it: every append is a read-modify-write against a shared key, which contends under load. Stage 3 keeps the guarantees while cutting coordination traffic by partitioning key ownership across nodes (details below).

Architecture

Log and offset model

A key maps to a slice of messages; an offset is an index into that slice. send appends and returns the new offset. poll returns a tail slice from a requested offset. commit_offsets advances a per-key watermark, and list_committed_offsets reads it back. The watermark only moves forward, so retried or reordered commits are safe.

sequenceDiagram
    participant C as Client
    participant N as Node
    participant L as Log for key k1

    C->>N: send msg 100
    N->>L: append 100
    L-->>N: offset 0
    N-->>C: send_ok offset 0

    C->>N: poll from offset 0
    N->>L: read tail from offset 0
    L-->>N: offset 0 holds 100
    N-->>C: poll_ok

    C->>N: commit at offset 0
    N->>L: advance watermark to 0
    N-->>C: commit_offsets_ok
Loading

Keeping commits linearizable

Offset assignment is the operation that must be linearizable. In the replicated backend the log lives in a key/value store and each append is a compare-and-swap: read the current list, append, and swap it back, retrying when another node won the race. The CAS turns read-and-extend into a single atomic step, which is what prevents two nodes from claiming the same offset.

sequenceDiagram
    participant N1 as Node n1
    participant N2 as Node n2
    participant KV as lin-kv store

    par concurrent appends to k1
        N1->>KV: read log for k1
        KV-->>N1: length 1
    and
        N2->>KV: read log for k1
        KV-->>N2: length 1
    end

    N1->>KV: cas append at offset 1
    KV-->>N1: ok offset 1

    N2->>KV: cas append at offset 1
    KV-->>N2: precondition failed
    Note over N2: retry with fresh read
    N2->>KV: read log for k1
    KV-->>N2: length 2
    N2->>KV: cas append at offset 2
    KV-->>N2: ok offset 2
Loading

Linearizable vs sequential stores

Offset assignment requires a linearizable store (lin-kv): a sequentially consistent store could let two nodes read the same list and both believe they claimed the same offset, producing a lost write. Committed offsets are weaker; they only need to never move backward, so a monotonic watermark is safe even on a sequentially consistent store (seq-kv). Artemis defaults to lin-kv and exposes the choice through configuration.

Throughput-optimized backend

The optimized backend removes the shared store entirely. Each key is owned by exactly one node, chosen by a stable hash over the sorted node set. The owner is the single writer and keeps the log in memory, so offsets are totally ordered with no compare-and-swap. A node that receives a request for a key it does not own forwards just that key's portion to the owner. This is the production analogue of a partition leader, minus follower replicas.

Performance

Measured with Maelstrom: 2 nodes (except single), 4 clients, a 20s workload at an offered rate of 1000 ops/s. Messages-per-op and availability are read directly from results.edn; latency percentiles are computed from the recorded per-operation invoke and complete timestamps in history.edn. Every run passed with no anomalies and zero failed operations.

Stage msgs/op inter-node msgs/op p50 p95 p99 max availability
Single node 2.19 0.00 0.3 ms 0.6 ms 1.1 ms 44 ms 99.970%
Replicated (CAS) 8.55 6.37 0.4 ms 2.0 ms 4.0 ms 238 ms 99.964%
Optimized (owner) 3.34 1.18 0.3 ms 1.0 ms 2.0 ms 93 ms 99.959%

Sustained throughput over the 20s window was roughly 820 to 850 successful ops/s across all three stages (load-bound by the offered rate, not by the server).

Optimization: before and after

Replacing shared-key compare-and-swap with per-key owner routing, at identical node count and load:

Metric Replicated (before) Optimized (after) Change
Messages per op 8.55 3.34 -61%
Inter-node messages per op 6.37 1.18 -82%
p95 latency 2.0 ms 1.0 ms -50%
p99 latency 4.0 ms 2.0 ms -50%
Max latency 238 ms 93 ms -61%

The win comes from eliminating CAS round trips and their retries. Correctness is unchanged: both backends pass every Maelstrom check.

Latency by stage

Latency by stage

The chart is rendered from each run's recorded operation latencies. Maelstrom's native latency plots are produced by the bench-* targets on any machine with gnuplot installed.

Quickstart

make build              # compile to bin/artemis
make test               # unit tests with the race detector
make bench              # in-memory hot-path microbenchmarks
make demo               # readable send/poll/commit/list trace, no harness needed

The demo starts a real node as a subprocess and drives it over the Maelstrom wire protocol:

$ make demo
== Kafka-style log demo (single node, in-memory backend) ==
send   key=k1  msg=100  -> offset 0
send   key=k1  msg=200  -> offset 1
send   key=k1  msg=300  -> offset 2
send   key=k2  msg=900  -> offset 0
poll   from {k1:1, k2:0} -> {"k1":[[1,200],[2,300]],"k2":[[0,900]]}
commit {k1:2, k2:0}       -> commit_offsets_ok
list   {k1, k2}          -> {"k1":2,"k2":0}

Reproduce each benchmark (requires a local Maelstrom install; override its path with MAELSTROM=/path/to/maelstrom/maelstrom):

make bench-single       # stage 1: single node, in-memory
make bench-replicated   # stage 2: two nodes, lin-kv compare-and-swap
make bench-optimized    # stage 3: two nodes, per-key owner routing

Configuration

The backend is selected by flag, each backed by an environment variable so it also applies when the harness launches the binary.

Flag Env Values Default
-backend KAFKA_BACKEND memory, kv memory
-kv KAFKA_KV lin, seq lin

Comparison with Apache Kafka

Artemis implements the core log abstraction, not a production broker. What it matches and what it does not:

Capability Artemis Apache Kafka
Partitioned append-only log Yes Yes
Per-partition offsets Yes Yes
Consumer committed offsets Yes Yes
Multi-node replication Yes Yes
Linearizable offset assignment Yes Yes
On-disk segmented storage No (in-memory or kv) Yes
In-sync replica sets and leader election No (single owner or kv) Yes
Exactly-once / transactions No Yes
Network-facing client protocol No (Maelstrom RPC) Yes
Log compaction and retention No Yes

Roadmap

The gaps above are the natural next steps: durable segmented storage with recovery, a replica set with leader election in place of single-owner partitions, an idempotent producer path toward exactly-once, and a stable client protocol.

Design decisions and tradeoffs

  • Linearizable only where it counts. Offset assignment uses a linearizable store; committed offsets use a forward-only watermark that is safe under weaker consistency. Spending strong consistency only on the operation that needs it keeps the common path cheap.
  • Single-writer ownership over shared CAS. Routing each key to one owning node gives total ordering for free and removes compare-and-swap contention, cutting inter-node traffic by 82% with no loss of correctness. The tradeoff is no follower replicas, so durability is future work.
  • Full-set, idempotent operations. Appends are unique per offset and commit watermarks are monotonic, so retries are safe and do not double-count.
  • Verification over assertion. Every performance and correctness claim here comes from Maelstrom output, not from hand-written tests of the happy path.

Project layout

.
├── main.go                       # flag/env config, wires a node to a backend
├── cmd/demo/                     # standalone trace driver (no harness)
├── internal/
│   ├── logstore/                 # pure log + offset logic (unit tested, benched)
│   └── server/                   # handlers and both backends
├── docs/latency.svg              # latency chart from run history
├── Makefile
└── .github/workflows/ci.yml      # build, vet, test, and a harness run on push

License

MIT

About

A replicated, linearizable, Kafka-style commit log in Go

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors