A replicated, linearizable, Kafka-style commit log in Go: an append-only partitioned log with offset-based reads and per-consumer committed offsets, verified for correctness and benchmarked with the Maelstrom distributed-systems testing harness (a Jepsen-based fault and network simulator).
Artemis stores an append-only log per key, where a key is the unit of partitioning (the analogue of a Kafka topic-partition).
- Append-only partitioned log. Each key holds an ordered, dense sequence of messages. Writes never mutate or delete existing entries.
- Offset-based reads. Every message gets a 0-based offset within its key. Consumers fetch from an offset and receive entries in order, so they can detect lost or reordered writes.
- Committed offsets per consumer group. A per-key watermark records the highest offset a consumer has finished processing.
- Replication across nodes. Any node can serve any key. State is shared either through a single owning node or through a linearizable key/value store.
- Availability under partition. Reads and commits remain correct while the network is faulty; the harness injects partitions and checks for anomalies.
Correctness is enforced by Maelstrom, which checks for lost writes, offset monotonicity violations, and stale reads across thousands of operations per run.
Artemis was built in three stages, each a working system verified by the harness.
| Stage | What it adds | Backend |
|---|---|---|
| 1. Single-node log | Correct offset and commit semantics in isolation | In-memory |
| 2. Replicated log | Multi-node correctness via a linearizable store | lin-kv compare-and-swap |
| 3. Optimized log | Same correctness at a fraction of the message cost | Per-key owner routing |
Stage 2 makes the system correct under replication but pays for it: every append is a read-modify-write against a shared key, which contends under load. Stage 3 keeps the guarantees while cutting coordination traffic by partitioning key ownership across nodes (details below).
A key maps to a slice of messages; an offset is an index into that slice.
send appends and returns the new offset. poll returns a tail slice from a
requested offset. commit_offsets advances a per-key watermark, and
list_committed_offsets reads it back. The watermark only moves forward, so
retried or reordered commits are safe.
sequenceDiagram
participant C as Client
participant N as Node
participant L as Log for key k1
C->>N: send msg 100
N->>L: append 100
L-->>N: offset 0
N-->>C: send_ok offset 0
C->>N: poll from offset 0
N->>L: read tail from offset 0
L-->>N: offset 0 holds 100
N-->>C: poll_ok
C->>N: commit at offset 0
N->>L: advance watermark to 0
N-->>C: commit_offsets_ok
Offset assignment is the operation that must be linearizable. In the replicated backend the log lives in a key/value store and each append is a compare-and-swap: read the current list, append, and swap it back, retrying when another node won the race. The CAS turns read-and-extend into a single atomic step, which is what prevents two nodes from claiming the same offset.
sequenceDiagram
participant N1 as Node n1
participant N2 as Node n2
participant KV as lin-kv store
par concurrent appends to k1
N1->>KV: read log for k1
KV-->>N1: length 1
and
N2->>KV: read log for k1
KV-->>N2: length 1
end
N1->>KV: cas append at offset 1
KV-->>N1: ok offset 1
N2->>KV: cas append at offset 1
KV-->>N2: precondition failed
Note over N2: retry with fresh read
N2->>KV: read log for k1
KV-->>N2: length 2
N2->>KV: cas append at offset 2
KV-->>N2: ok offset 2
Offset assignment requires a linearizable store (lin-kv): a sequentially
consistent store could let two nodes read the same list and both believe they
claimed the same offset, producing a lost write. Committed offsets are weaker;
they only need to never move backward, so a monotonic watermark is safe even on
a sequentially consistent store (seq-kv). Artemis defaults to lin-kv and
exposes the choice through configuration.
The optimized backend removes the shared store entirely. Each key is owned by exactly one node, chosen by a stable hash over the sorted node set. The owner is the single writer and keeps the log in memory, so offsets are totally ordered with no compare-and-swap. A node that receives a request for a key it does not own forwards just that key's portion to the owner. This is the production analogue of a partition leader, minus follower replicas.
Measured with Maelstrom: 2 nodes (except single), 4 clients, a 20s workload at
an offered rate of 1000 ops/s. Messages-per-op and availability are read
directly from results.edn; latency percentiles are computed from the recorded
per-operation invoke and complete timestamps in history.edn. Every run passed
with no anomalies and zero failed operations.
| Stage | msgs/op | inter-node msgs/op | p50 | p95 | p99 | max | availability |
|---|---|---|---|---|---|---|---|
| Single node | 2.19 | 0.00 | 0.3 ms | 0.6 ms | 1.1 ms | 44 ms | 99.970% |
| Replicated (CAS) | 8.55 | 6.37 | 0.4 ms | 2.0 ms | 4.0 ms | 238 ms | 99.964% |
| Optimized (owner) | 3.34 | 1.18 | 0.3 ms | 1.0 ms | 2.0 ms | 93 ms | 99.959% |
Sustained throughput over the 20s window was roughly 820 to 850 successful ops/s across all three stages (load-bound by the offered rate, not by the server).
Replacing shared-key compare-and-swap with per-key owner routing, at identical node count and load:
| Metric | Replicated (before) | Optimized (after) | Change |
|---|---|---|---|
| Messages per op | 8.55 | 3.34 | -61% |
| Inter-node messages per op | 6.37 | 1.18 | -82% |
| p95 latency | 2.0 ms | 1.0 ms | -50% |
| p99 latency | 4.0 ms | 2.0 ms | -50% |
| Max latency | 238 ms | 93 ms | -61% |
The win comes from eliminating CAS round trips and their retries. Correctness is unchanged: both backends pass every Maelstrom check.
The chart is rendered from each run's recorded operation latencies. Maelstrom's
native latency plots are produced by the bench-* targets on any machine with
gnuplot installed.
make build # compile to bin/artemis
make test # unit tests with the race detector
make bench # in-memory hot-path microbenchmarks
make demo # readable send/poll/commit/list trace, no harness neededThe demo starts a real node as a subprocess and drives it over the Maelstrom wire protocol:
$ make demo
== Kafka-style log demo (single node, in-memory backend) ==
send key=k1 msg=100 -> offset 0
send key=k1 msg=200 -> offset 1
send key=k1 msg=300 -> offset 2
send key=k2 msg=900 -> offset 0
poll from {k1:1, k2:0} -> {"k1":[[1,200],[2,300]],"k2":[[0,900]]}
commit {k1:2, k2:0} -> commit_offsets_ok
list {k1, k2} -> {"k1":2,"k2":0}
Reproduce each benchmark (requires a local Maelstrom install; override its path
with MAELSTROM=/path/to/maelstrom/maelstrom):
make bench-single # stage 1: single node, in-memory
make bench-replicated # stage 2: two nodes, lin-kv compare-and-swap
make bench-optimized # stage 3: two nodes, per-key owner routingThe backend is selected by flag, each backed by an environment variable so it also applies when the harness launches the binary.
| Flag | Env | Values | Default |
|---|---|---|---|
-backend |
KAFKA_BACKEND |
memory, kv |
memory |
-kv |
KAFKA_KV |
lin, seq |
lin |
Artemis implements the core log abstraction, not a production broker. What it matches and what it does not:
| Capability | Artemis | Apache Kafka |
|---|---|---|
| Partitioned append-only log | Yes | Yes |
| Per-partition offsets | Yes | Yes |
| Consumer committed offsets | Yes | Yes |
| Multi-node replication | Yes | Yes |
| Linearizable offset assignment | Yes | Yes |
| On-disk segmented storage | No (in-memory or kv) | Yes |
| In-sync replica sets and leader election | No (single owner or kv) | Yes |
| Exactly-once / transactions | No | Yes |
| Network-facing client protocol | No (Maelstrom RPC) | Yes |
| Log compaction and retention | No | Yes |
The gaps above are the natural next steps: durable segmented storage with recovery, a replica set with leader election in place of single-owner partitions, an idempotent producer path toward exactly-once, and a stable client protocol.
- Linearizable only where it counts. Offset assignment uses a linearizable store; committed offsets use a forward-only watermark that is safe under weaker consistency. Spending strong consistency only on the operation that needs it keeps the common path cheap.
- Single-writer ownership over shared CAS. Routing each key to one owning node gives total ordering for free and removes compare-and-swap contention, cutting inter-node traffic by 82% with no loss of correctness. The tradeoff is no follower replicas, so durability is future work.
- Full-set, idempotent operations. Appends are unique per offset and commit watermarks are monotonic, so retries are safe and do not double-count.
- Verification over assertion. Every performance and correctness claim here comes from Maelstrom output, not from hand-written tests of the happy path.
.
├── main.go # flag/env config, wires a node to a backend
├── cmd/demo/ # standalone trace driver (no harness)
├── internal/
│ ├── logstore/ # pure log + offset logic (unit tested, benched)
│ └── server/ # handlers and both backends
├── docs/latency.svg # latency chart from run history
├── Makefile
└── .github/workflows/ci.yml # build, vet, test, and a harness run on push