Skip to content

AnshuMishra01/MicroExchange

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MicroExchange

A low-latency order book and matching engine in C++17. Processes 7.2 million orders/sec at 138ns average latency.

This is the core data structure inside every stock exchange — it takes buy/sell orders, matches them by price-time priority, and outputs trades.

Orders In ──> Feed Handler ──> Matching Engine ──> Order Book ──> Trades Out
                (stdin/binary)     (strategy)      (bids + asks)     (async logger)

Performance

Workload Throughput Avg Latency
1.29M orders 7.22M orders/sec 138 ns
2.57M orders 4.85M/sec 206 ns
3.86M orders 4.33M/sec 230 ns
5.14M orders 3.56M/sec 280 ns

Binary order loading: ~170x faster than text parsing (15ms vs 2.4s for 1.29M orders).

All benchmarks on Apple Silicon, macOS, single-threaded matching.

Features

  • Order types: LIMIT, MARKET, CANCEL
  • Price-time priority: best price first, FIFO within each price level
  • Partial fills: large orders match across multiple price levels
  • Pluggable matching strategies: compile-time strategy pattern (CRTP) — swap in pro-rata, etc.
  • Async trade logging: producer-consumer pattern with background writer thread (mutex + condition_variable)
  • Binary protocol: orders loaded as raw structs, zero parsing overhead

Build

Requires a C++17 compiler with pthreads support.

make            # builds to build/microexchange
make clean      # removes build artifacts

Compiler flags: -O3 -flto -pthread (-flto inlines Order accessors across translation units — measured ~2x gain).

Usage

Text mode (interactive / piped)

# Type orders manually
./build/microexchange

# Pipe from a file
./build/microexchange < data/order.txt

Text input format (space-separated):

orderId amount type companyId quantity side
1 150.0 LIMIT 17 100 SELL
2 148.0 LIMIT 17 50 SELL
3 151.0 LIMIT 17 120 BUY

Binary mode

For benchmarking, orders can be loaded as raw binary (32-byte structs read directly into memory — no parsing).

Generate test data

python3 benchmark/generate_orders.py 1000000 > data/order_1m.txt

Project Structure

microexchange/
├── Makefile
├── PROGRESS.md            # detailed optimization journal with all benchmarks
├── SPEC.md                # original design spec
├── src/
│   ├── main.cpp           # entry point, stdin parsing, benchmarking
│   ├── Order.h / .cpp     # Order struct (24B, packed, trivially copyable)
│   ├── Orderbook.h / .cpp # order book (sorted vector of PriceLevels)
│   ├── PriceTimeStrategy.h / .cpp  # FIFO price-time matching
│   ├── MatchingStrategy.h # strategy interface
│   ├── MatchingEngine.h   # routes orders, delegates to strategy
│   ├── Trade.h            # trade output struct
│   ├── TradeLogger.h / .cpp  # async file writer (background thread)
├── benchmark/
│   ├── generate_orders.py # random order generator
│   └── results/           # all benchmark runs preserved
└── data/                  # test order files (text + binary)

Architecture

Order (24 bytes)

class Order {
    double   amount;      // 8B — limit price
    int      orderId;     // 4B
    int      quantity;    // 4B
    uint16_t companyId;   // 2B — symbol as integer ID (not string)
    Ordertype type;       // 1B — uint8_t enum
    Side     side;        // 1B — uint8_t enum
};                        // 20B + 4B padding = 24B

Fields ordered by alignment to eliminate internal padding. std::string replaced with uint16_t (24B → 2B). Enums use uint8_t storage (4B → 1B each).

Order Book

Each side (bids/asks) is a sorted vector of PriceLevels. Each PriceLevel holds a contiguous vector<Order> with a head index — front removal is O(1) (increment index, no data movement). Best price is at back() so removing an empty level is pop_back() — also O(1).

Matching Engine

Compile-time strategy pattern. PriceTimeStrategy walks from best price inward, matching FIFO within each level. Trades are emitted via an inlined sink — no heap allocation per order.

Trade Logger

Background thread with mutex + condition_variable. The matching thread pushes trades into a shared deque; the logger thread wakes, swaps the queue in O(1), releases the lock, then writes to disk. The matching thread never blocks on I/O.

Optimization Journey

Started at 431K orders/sec with cout in the hot path. Reached 7.2M orders/sec through 9 measured iterations. Key wins:

What worked Why
Remove I/O from hot path cout costs 1-5μs per call
Shrink struct (64→24B) 2-3 orders per cache line instead of 1
Contiguous storage (vector+head) Cache-friendly, no scattered heap chunks
Reusable trade buffer Eliminated per-order malloc
Binary protocol Zero parsing — read() directly into struct
What didn't work Why
map → sorted vector (level container) ~30 active levels, lookup wasn't hot
CRTP (drop virtual dispatch) Virtual call is ~2-5ns — negligible
Lock-free logger attempts Mutex uncontended, already ~20ns

Full details with all benchmark data in PROGRESS.md.

License

MIT

About

3.7M → 5.2M → 7.2M orders/sec, offloading I/O to a separate thread -> a 40% throughput improvement (Still working on it)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors