Mini 2 — Distributed gRPC Network

A 9-node distributed gRPC cluster over the NYC 311 dataset. Node A is the client-facing coordinator and holds no data; the other eight nodes form a tree beneath it, and the six leaf nodes own the data shards. The cluster supports fan-out forwarding with chunked streaming, SOA (struct-of-arrays) analytical queries (count / group-by / top-k), a pluggable LRU/LFU result cache, FIFO or priority job scheduling, and inserts routed to the correct shard by created_date.

Tree Overlay

           A  (portal — port 50051, receives client queries, no data)
         / | \
        B  H  I
       /|\
      C D E
         / \
        F   G

Node	Port	Role	Peers
A	50051	coordinator	B, H, I
B	50052	inner	C, D, E
C	50053	leaf (data)	—
D	50054	leaf (data)	—
E	50055	inner	F, G
F	50056	leaf (data)	—
G	50057	leaf (data)	—
H	50058	leaf (data)	—
I	50059	leaf (data)	—
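
The table can also be read as an adjacency map. The sketch below is a Python illustration, not code from the repository: it copies the Peers column into a dictionary and walks the tree to list the data-owning leaves that a fan-out from A would eventually reach. The names `PEERS`, `port_of`, and `leaves_under` are hypothetical.

```python
# Children per node, copied from the Peers column of the table above.
PEERS = {
    "A": ["B", "H", "I"],
    "B": ["C", "D", "E"],
    "E": ["F", "G"],
    "C": [], "D": [], "F": [], "G": [], "H": [], "I": [],
}

def port_of(node_id: str) -> int:
    """Ports in the table run alphabetically from 50051 (A) to 50059 (I)."""
    return 50051 + ord(node_id) - ord("A")

def leaves_under(node_id: str) -> list[str]:
    """Depth-first walk returning the leaf nodes below node_id."""
    children = PEERS[node_id]
    if not children:
        return [node_id]
    found = []
    for child in children:
        found.extend(leaves_under(child))
    return found

print(leaves_under("A"))  # ['C', 'D', 'F', 'G', 'H', 'I']
print(port_of("E"))       # 50055
```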

C++ server (build/bin/server): runs one node per process — ./build/bin/server <id>.
C++ client (build/bin/client) and Python client (client_py/client.py): external callers that normally talk to A on port 50051.

Repository Structure

mini2/
├── proto/mini2.proto              # gRPC schema
│
├── config/
│   ├── node_A.yaml … node_I.yaml  # per-node config (host, port, peers, queue, cache)
│   ├── insert_routes.yaml         # created_date → owning leaf node
│   ├── global_agency_ids.csv      # shared dictionaries (agency / problem / borough / status)
│   ├── global_problem_ids.csv
│   ├── global_borough_ids.csv
│   └── global_status_ids.csv
│
├── common/                         # shared C++ library (dataset, queries)
│   ├── include/
│   │   ├── csv_parser.hpp
│   │   ├── dataset.hpp             # AOS dataset + dictionary encoding
│   │   ├── dataset_SOA.hpp         # struct-of-arrays layout
│   │   ├── dataset_utils.hpp
│   │   ├── iQuery.hpp / query_base.hpp
│   │   ├── query_omp.hpp           # OpenMP-parallel queries
│   │   ├── query_SOA.hpp           # SOA count / group-by / top-k
│   │   └── timer.hpp
│   └── src/                        # implementations
│
├── server_cpp/
│   ├── server.cpp                  # entry point: load config, start gRPC server
│   ├── Mini2ServiceImpl.{h,cc}     # all RPC handlers
│   ├── RequestJobQueue.{h,cc}      # FIFO / priority job scheduler
│   ├── ForwardResponseCache.{h,cc} # LRU / LFU result cache
│   └── InsertRouteConfig.{h,cc}    # routes Insert to the right leaf
│
├── client_cpp/client.cpp           # C++ CLI client
├── client_py/client.py             # Python CLI client
│
├── tests/
│   ├── README.md                   # extensive client command examples
│   ├── run_cluster.sh              # bring up all 9 nodes locally
│   ├── run_test_client.sh
│   ├── test_client.py              # Ping / Query / Forward / Insert / Delete
│   ├── test_cache_demo.py          # cache hit/miss demo
│   ├── test_queue_demo.py          # FIFO vs priority scheduler demo
│   ├── benchmark_cache.py
│   ├── benchmark_chunk_sizes.py
│   └── run_benchmarks.py           # full benchmark runner (StartForwardChunks)
│
├── benchmarks/                     # input shards + workloads (gitignored CSVs)
├── tools/                          # dataset sharding helpers
└── CMakeLists.txt

RPC Surface (`proto/mini2.proto`)

RPC	Type	Purpose
`Ping`	unary	Liveness — coordinator returns the list of reachable nodes.
`Query`	unary	Filter records on the receiving node only (no fan-out).
`Forward`	unary	Coordinator fans out to peers, aggregates full result.
`Insert`	unary	Insert one record; routed to owning leaf via `insert_routes.yaml`.
`Delete`	unary	Delete by predicate; fan-out, returns per-node delete counts.
`StartForwardChunks`	unary	Start a chunked Forward session; returns a `session_id`.
`GetForwardChunk`	unary	Pull one chunk of a session by index.
`CancelChunks`	unary	Cancel an in-flight chunked session.
`CountQuery`	unary	SOA count (created-date range, by-agency, by-status).
`GroupByQuery`	unary	SOA group-by (borough/zipcode × complaint, agency counts, etc.).
`TopKQuery`	unary	SOA top-K complaints in a created-date range.
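
Because each leaf only sees its own shard, an analytical query such as `TopKQuery` has to combine per-leaf partial results somewhere on the way back to the coordinator. How the server does this is not described here. The following is a minimal Python sketch of one plausible merge, assuming each leaf returns a full count per `problem_id` for the requested range; the function name and the sample numbers are made up for illustration.

```python
from collections import Counter

def merge_top_k(partials: list[dict[int, int]], k: int) -> list[tuple[int, int]]:
    """Sum per-leaf counts keyed by problem_id, then keep the k largest."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    # Sort by count descending, then id ascending so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical partial results from three leaves: {problem_id: count}.
leaf_c = {7: 120, 3: 80, 9: 5}
leaf_d = {3: 60, 9: 50}
leaf_f = {7: 10, 12: 40}

print(merge_top_k([leaf_c, leaf_d, leaf_f], k=2))  # [(3, 140), (7, 130)]
```

Merging full counts is exact. Merging only each leaf's local top-k would be cheaper but can miss an item that ranks just below k on every leaf, so which trade-off fits depends on the workload.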

QueryRequest carries optional filters (agency_id, borough_id, zip_code, lat/lon bbox), a chunk_size, and two streaming-flow flags:

leaf_buffered_streaming — leaf buffers all matches, parent pulls in chunks.
internal_full_streaming — every tree edge pulls chunk-by-chunk with unary RPCs.
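
The leaf_buffered_streaming flow can be pictured with a small in-memory model. This Python sketch involves no gRPC at all and is only an assumption about the shape of the protocol: `start` stands in for `StartForwardChunks`, `get_chunk` for `GetForwardChunk`, and `cancel` for `CancelChunks`. The class name `ChunkSession` and the choice to return the chunk count from `start` are hypothetical.

```python
import uuid

class ChunkSession:
    """In-memory stand-in for a leaf that buffers matches and serves chunks."""

    def __init__(self):
        self._sessions = {}  # session_id -> list of chunks

    def start(self, matches: list, chunk_size: int) -> tuple[str, int]:
        """Buffer all matches and return (session_id, number_of_chunks)."""
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        session_id = uuid.uuid4().hex
        chunks = [matches[i:i + chunk_size]
                  for i in range(0, len(matches), chunk_size)]
        self._sessions[session_id] = chunks
        return session_id, len(chunks)

    def get_chunk(self, session_id: str, index: int) -> list:
        """Return one chunk by index; raises KeyError for unknown sessions."""
        return self._sessions[session_id][index]

    def cancel(self, session_id: str) -> None:
        """Drop the buffered chunks of a session, if it still exists."""
        self._sessions.pop(session_id, None)

leaf = ChunkSession()
sid, n_chunks = leaf.start(list(range(10)), chunk_size=4)
pulled = [leaf.get_chunk(sid, i) for i in range(n_chunks)]
print(pulled)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
leaf.cancel(sid)
```

In the internal_full_streaming variant, every inner node would play both roles: it pulls chunks from its children and serves them to its parent one at a time.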

Record Schema (NYC 311)

Field	Type	Notes
`id`	uint32
`created_date`	int64	epoch seconds (used for sharding)
`closed_date`	int64	epoch seconds
`agency_id`	uint32	dict-encoded
`problem_id`	uint32	dict-encoded
`status_id`	uint32	dict-encoded
`borough_id`	uint32	dict-encoded
`zip_code`	uint32
`latitude`	float
`longitude`	float
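
To make "dict-encoded" concrete, here is a minimal Python sketch of the schema together with a string-to-id dictionary. It is an illustration only: the repository's C++ structures are not shown in this README, ids starting at 0 in first-seen order is an assumption, and all sample values below are invented.

```python
from dataclasses import dataclass

class Dictionary:
    """Maps strings to dense integer ids in first-seen order (assumed scheme)."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def encode(self, name: str) -> int:
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def decode(self, ident: int) -> str:
        return self._names[ident]

@dataclass
class Record:
    id: int
    created_date: int   # epoch seconds
    closed_date: int    # epoch seconds
    agency_id: int      # dict-encoded
    problem_id: int     # dict-encoded
    status_id: int      # dict-encoded
    borough_id: int     # dict-encoded
    zip_code: int
    latitude: float
    longitude: float

agencies = Dictionary()
rec = Record(
    id=1, created_date=1577836800, closed_date=1577923200,
    agency_id=agencies.encode("NYPD"), problem_id=0, status_id=0,
    borough_id=0, zip_code=11201, latitude=40.6943, longitude=-73.9903,
)
print(rec.agency_id, agencies.decode(rec.agency_id))  # 0 NYPD
```

Storing small integers instead of repeated strings keeps records fixed-size and makes filters such as `--agency-id 10` a plain integer comparison.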

Per-Node Configuration (`config/node_*.yaml`)

Key	Values	Notes
`node_id`	`"A"`–`"I"`	Logical id, must match peer references in other configs.
`host`, `port`	string, int	Bind address.
`coordinator_only`	bool (A only)	Skips dataset load on coordinator.
`queue_mode`	`fifo` | `priority`	Job scheduler (priority uses `QueryRequest` priority class).
`enable_cache`	bool	Toggle `ForwardResponseCache`.
`cache_policy`	`lru` | `lfu`	Eviction policy when cache is enabled.
`cache_max_entries`	int	Cache capacity.
`dataset_mode`	`aos` | `soa` | `both`	AOS for Forward/Query/Insert; SOA for Count/GroupBy/TopK.
`dataset_path`	path	Per-leaf shard CSV (one shard per leaf).
`*_dict_path`	path	Shared dictionary CSVs in `config/`.
`peers`	list of `{id,host,port}`	Children in the tree topology.
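
As an illustration of what `cache_policy: lru` with a given `cache_max_entries` means, the following Python sketch implements least-recently-used eviction. It is not the `ForwardResponseCache` implementation, which is C++ and not shown here; the string keys standing in for request filters are an assumption.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once max_entries is exceeded."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                   # cache miss
        self._data.move_to_end(key)       # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry

cache = LRUCache(max_entries=2)
cache.put("agency=10", 111)
cache.put("borough=1", 222)
cache.get("agency=10")           # touch, so "borough=1" is now the oldest
cache.put("agency=12", 333)      # capacity exceeded, evicts "borough=1"
print(cache.get("borough=1"))    # None
print(cache.get("agency=10"))    # 111
```

An LFU policy would instead track a hit counter per entry and evict the entry with the fewest hits, which favours results that are requested repeatedly over results that were merely requested recently.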

config/insert_routes.yaml maps created_date ranges to the owning leaf; a default node handles records past the last threshold. Edit this file whenever you re-shard the dataset.
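
The threshold lookup can be sketched with a binary search. The YAML schema is not shown in this README, so the layout below is an assumption: a sorted list of exclusive upper bounds, one owner per bound, and a default node for anything later. The threshold values and the owner assignment are invented for the example.

```python
import bisect

# Hypothetical routes: each leaf owns created_date values below its bound.
THRESHOLDS = [1420070400, 1483228800, 1546300800, 1577836800, 1609459200]
OWNERS = ["C", "D", "F", "G", "H"]
DEFAULT_NODE = "I"  # handles records at or past the last threshold

def route_insert(created_date: int) -> str:
    """Return the id of the leaf that should store a record."""
    # bisect_right counts the thresholds that are <= created_date.
    idx = bisect.bisect_right(THRESHOLDS, created_date)
    return OWNERS[idx] if idx < len(OWNERS) else DEFAULT_NODE

print(route_insert(1400000000))  # C
print(route_insert(1600000000))  # H
print(route_insert(1700000000))  # I
```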

Build

Dependencies (macOS via Homebrew shown):

brew install grpc protobuf yaml-cpp cmake
# OpenMP is optional; on macOS:
brew install libomp

Build:

cmake -S . -B build
cmake --build build -j
# Outputs:
#   build/bin/server   — pass node id as argv[1]
#   build/bin/client   — C++ CLI client

Running the Cluster

All-in-one helper (local, all 9 nodes)

./tests/run_cluster.sh        # logs in /tmp/mini2_cluster_logs

Manual (one node per process)

mkdir -p logs
for n in A B C D E F G H I; do
  nohup ./build/bin/server $n > logs/node_$n.log 2>&1 &
  echo $! > logs/node_$n.pid
done
# Wait for all 9 ports to be LISTEN (50051–50059):
until [ "$(lsof -nP -iTCP:50051 -iTCP:50052 -iTCP:50053 -iTCP:50054 \
  -iTCP:50055 -iTCP:50056 -iTCP:50057 -iTCP:50058 -iTCP:50059 \
  -sTCP:LISTEN 2>/dev/null | grep -c LISTEN)" = "9" ]; do sleep 3; done
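
Where `lsof` is not available, the same readiness check can be approximated by trying to connect to each port. This is a Python sketch using only the standard library; it is not part of the repository, and the function names are hypothetical.

```python
import socket
import time

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_ports(host, ports, deadline_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Poll until every port accepts connections or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    pending = set(ports)
    while pending:
        pending = {p for p in pending if not port_open(host, p)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

Calling `wait_for_ports("localhost", range(50051, 50060))` would block until all nine node ports accept connections, and return False if they are not all up within the deadline.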

Stop:

kill $(cat logs/node_*.pid) 2>/dev/null

Client Examples

C++ client (`build/bin/client`)

# Liveness
./build/bin/client -s localhost:50051 ping

# Local query (A has no data → 0)
./build/bin/client -s localhost:50051 query --request-id q-local

# Fan-out filters
./build/bin/client -s localhost:50051 forward --request-id f-all
./build/bin/client -s localhost:50051 forward --agency-id 10  --request-id f-agency
./build/bin/client -s localhost:50051 forward --borough-id 1  --request-id f-borough
./build/bin/client -s localhost:50051 forward \
  --lat-min 40.7 --lat-max 40.8 --lon-min -74.0 --lon-max -73.9 \
  --request-id f-geo

# SOA analytics (require dataset_mode soa or both on leaves)
./build/bin/client -s localhost:50051 count-created-date-range \
  --created-date-start 1577836800 --created-date-end 1609459199 \
  --request-id count-2020

./build/bin/client -s localhost:50051 count-by-agency-and-created-date-range \
  --agency-id 10 --created-date-start 1577836800 --created-date-end 1609459199 \
  --request-id count-agency10-2020

./build/bin/client -s localhost:50051 top-k-complaints \
  --created-date-start 1577836800 --created-date-end 1609459199 --top-k 10 \
  --request-id topk-2020

# Insert / Delete
./build/bin/client -s localhost:50051 insert --record '{...}'
./build/bin/client -s localhost:50051 delete --agency-id 10

See tests/README.md for a comprehensive command reference with example outputs.
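
The `--created-date-start` and `--created-date-end` values in the analytics examples are epoch seconds covering calendar year 2020 in UTC. A small Python helper (hypothetical, not part of either client) can derive such bounds for any year:

```python
from datetime import datetime, timezone

def year_bounds_utc(year: int) -> tuple[int, int]:
    """First and last epoch second of a calendar year in UTC."""
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp()) - 1
    return start, end

print(year_bounds_utc(2020))  # (1577836800, 1609459199)
```

The helper returns the last second of the year, matching the values used in the examples; whether the server treats the end bound as inclusive is not stated here.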

Python client (`client_py/client.py`)

pip install -r client_py/requirements.txt
python client_py/client.py -s localhost:50051 ping
python client_py/client.py -s localhost:50051 forward --agency-id 10 --request-id f-agency

Tests & Benchmarks

Script	Purpose
`tests/test_client.py`	Smoke tests — Ping/Query/Forward/Insert/Delete.
`tests/test_cache_demo.py`	Demonstrates LRU/LFU cache hit/miss behaviour.
`tests/test_queue_demo.py`	Demonstrates FIFO vs priority scheduling.
`tests/run_benchmarks.py`	Full StartForwardChunks benchmark suite.
`tests/benchmark_cache.py`	Cache-focused benchmark.
`tests/benchmark_chunk_sizes.py`	Sweep of `chunk_size` values.
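
The difference between the two `queue_mode` settings that `tests/test_queue_demo.py` demonstrates can be sketched in a few lines of Python. This is not the `RequestJobQueue` implementation; in particular, the convention that a lower number is served first and the job names are assumptions.

```python
import heapq
import itertools
from collections import deque

class JobQueue:
    """Toy scheduler with 'fifo' and 'priority' modes."""

    def __init__(self, mode: str = "fifo"):
        if mode not in ("fifo", "priority"):
            raise ValueError("mode must be 'fifo' or 'priority'")
        self.mode = mode
        self._fifo = deque()
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: arrival order

    def push(self, job: str, priority: int = 0) -> None:
        if self.mode == "fifo":
            self._fifo.append(job)  # priority is ignored in FIFO mode
        else:
            # Lower number is served first (assumption); equal priorities
            # keep their arrival order thanks to the sequence number.
            heapq.heappush(self._heap, (priority, next(self._seq), job))

    def pop(self) -> str:
        if self.mode == "fifo":
            return self._fifo.popleft()
        return heapq.heappop(self._heap)[2]

def drain(mode: str) -> list[str]:
    q = JobQueue(mode)
    for job, prio in [("bulk-scan", 5), ("dashboard", 1), ("report", 5)]:
        q.push(job, prio)
    return [q.pop() for _ in range(3)]

print(drain("fifo"))      # ['bulk-scan', 'dashboard', 'report']
print(drain("priority"))  # ['dashboard', 'bulk-scan', 'report']
```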

To see the options of the full benchmark runner, for example:

python tests/run_benchmarks.py --help

Sharding Helpers (`tools/`)

split_time_shards.py — split a CSV by created_date into N shards.
split_sorted_311_time_shards.py — same, requires presorted input (faster).
analyze_sharding.py — print shard sizes / boundaries.

After re-sharding, update config/insert_routes.yaml and the per-leaf dataset_path to match the new boundaries.
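
To show how shard boundaries relate to the routing thresholds, here is a Python sketch of an equal-count split by created_date. It is an assumption about the general approach, not the logic of the scripts in tools/, which may split differently (for example by equal time ranges). The function name and sample timestamps are made up.

```python
def split_by_time(created_dates: list[int], n_shards: int):
    """Sort timestamps and cut them into n_shards nearly equal slices.

    Returns (shards, boundaries), where boundaries[i] is the first
    created_date of shard i + 1.
    """
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    ordered = sorted(created_dates)
    base, extra = divmod(len(ordered), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)  # spread the remainder
        shards.append(ordered[start:start + size])
        start += size
    boundaries = [s[0] for s in shards[1:] if s]
    return shards, boundaries

shards, boundaries = split_by_time([50, 10, 40, 20, 30, 60, 70], n_shards=3)
print(shards)      # [[10, 20, 30], [40, 50], [60, 70]]
print(boundaries)  # [40, 60]
```

With this scheme the boundaries are exactly the values that a threshold-based route table would need. Note that rows sharing one timestamp can straddle a cut, so a real splitter would have to move the cut to the next distinct created_date.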

Phases

Phase	Scope
1 — Basecamp	All 9 nodes start, `Ping` works across the tree.
2 — Data	`Query` / `Forward` fan-out, chunked streaming responses.
3 — Fairness	Concurrent clients, `CancelChunks`, FIFO/priority queue.
4 — Analytics	SOA `Count` / `GroupBy` / `TopK` with shared dictionaries.
