MicroExchange

A low-latency order book and matching engine in C++17. Processes 7.2 million orders/sec at 138ns average latency.

This is the core data structure inside every stock exchange — it takes buy/sell orders, matches them by price-time priority, and outputs trades.

Orders In ──> Feed Handler ──> Matching Engine ──> Order Book ──> Trades Out
                (stdin/binary)     (strategy)      (bids + asks)     (async logger)

Performance

Workload	Throughput	Avg Latency
1.29M orders	7.22M orders/sec	138 ns
2.57M orders	4.85M/sec	206 ns
3.86M orders	4.33M/sec	230 ns
5.14M orders	3.56M/sec	280 ns

Binary order loading: ~170x faster than text parsing (15ms vs 2.4s for 1.29M orders).

All benchmarks on Apple Silicon, macOS, single-threaded matching.

Features

Order types: LIMIT, MARKET, CANCEL
Price-time priority: best price first, FIFO within each price level
Partial fills: large orders match across multiple price levels
Pluggable matching strategies: compile-time strategy pattern (CRTP) — swap in pro-rata, etc.
Async trade logging: producer-consumer pattern with background writer thread (mutex + condition_variable)
Binary protocol: orders loaded as raw structs, zero parsing overhead

Build

Requires a C++17 compiler with pthreads support.

make            # builds to build/microexchange
make clean      # removes build artifacts

Compiler flags: -O3 -flto -pthread (-flto inlines Order accessors across translation units — measured ~2x gain).

Usage

Text mode (interactive / piped)

# Type orders manually
./build/microexchange

# Pipe from a file
./build/microexchange < data/order.txt

Text input format (space-separated):

orderId amount type companyId quantity side
1 150.0 LIMIT 17 100 SELL
2 148.0 LIMIT 17 50 SELL
3 151.0 LIMIT 17 120 BUY

Binary mode

For benchmarking, orders can be loaded as raw binary (32-byte structs read directly into memory — no parsing).

Generate test data

python3 benchmark/generate_orders.py 1000000 > data/order_1m.txt

Project Structure

microexchange/
├── Makefile
├── PROGRESS.md            # detailed optimization journal with all benchmarks
├── SPEC.md                # original design spec
├── src/
│   ├── main.cpp           # entry point, stdin parsing, benchmarking
│   ├── Order.h / .cpp     # Order struct (24B, packed, trivially copyable)
│   ├── Orderbook.h / .cpp # order book (sorted vector of PriceLevels)
│   ├── PriceTimeStrategy.h / .cpp  # FIFO price-time matching
│   ├── MatchingStrategy.h # strategy interface
│   ├── MatchingEngine.h   # routes orders, delegates to strategy
│   ├── Trade.h            # trade output struct
│   ├── TradeLogger.h / .cpp  # async file writer (background thread)
├── benchmark/
│   ├── generate_orders.py # random order generator
│   └── results/           # all benchmark runs preserved
└── data/                  # test order files (text + binary)

Architecture

Order (24 bytes)

class Order {
    double   amount;      // 8B — limit price
    int      orderId;     // 4B
    int      quantity;    // 4B
    uint16_t companyId;   // 2B — symbol as integer ID (not string)
    Ordertype type;       // 1B — uint8_t enum
    Side     side;        // 1B — uint8_t enum
};                        // 20B + 4B padding = 24B

Fields ordered by alignment to eliminate internal padding. std::string replaced with uint16_t (24B → 2B). Enums use uint8_t storage (4B → 1B each).

Order Book

Each side (bids/asks) is a sorted vector of PriceLevels. Each PriceLevel holds a contiguous vector<Order> with a head index — front removal is O(1) (increment index, no data movement). Best price is at back() so removing an empty level is pop_back() — also O(1).

Matching Engine

Compile-time strategy pattern. PriceTimeStrategy walks from best price inward, matching FIFO within each level. Trades are emitted via an inlined sink — no heap allocation per order.

Trade Logger

Background thread with mutex + condition_variable. The matching thread pushes trades into a shared deque; the logger thread wakes, swaps the queue in O(1), releases the lock, then writes to disk. The matching thread never blocks on I/O.

Optimization Journey

Started at 431K orders/sec with cout in the hot path. Reached 7.2M orders/sec through 9 measured iterations. Key wins:

What worked	Why
Remove I/O from hot path	`cout` costs 1-5μs per call
Shrink struct (64→24B)	2-3 orders per cache line instead of 1
Contiguous storage (vector+head)	Cache-friendly, no scattered heap chunks
Reusable trade buffer	Eliminated per-order malloc
Binary protocol	Zero parsing — `read()` directly into struct

What didn't work	Why
map → sorted vector (level container)	~30 active levels, lookup wasn't hot
CRTP (drop virtual dispatch)	Virtual call is ~2-5ns — negligible
Lock-free logger attempts	Mutex uncontended, already ~20ns

Full details with all benchmark data in PROGRESS.md.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
benchmark		benchmark
src		src
tools		tools
.gitignore		.gitignore
Makefile		Makefile
PROGRESS.md		PROGRESS.md
README.md		README.md
SPEC.md		SPEC.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MicroExchange

Performance

Features

Build

Usage

Text mode (interactive / piped)

Binary mode

Generate test data

Project Structure

Architecture

Order (24 bytes)

Order Book

Matching Engine

Trade Logger

Optimization Journey

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MicroExchange

Performance

Features

Build

Usage

Text mode (interactive / piped)

Binary mode

Generate test data

Project Structure

Architecture

Order (24 bytes)

Order Book

Matching Engine

Trade Logger

Optimization Journey

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages