Adaptive Overlay Replication

Implementation for an adaptive replication overlay for CRDV, a PostgreSQL-based CRDT store. The system is split into a control and data plane. The control plane is built on top of the HyParView gossip protocol for dynamic and automatic cluster management, replication topology, as well as failure detection and recovery. The data plane uses PostgreSQL logical replication to propagate writes between CRDV instances according to the overlay topology managed by the control plane.

Architecture

Each node consists of a Controller and a CRDV instance running as a pair. The controller manages cluster membership and drives the replication topology via its HyParView server/client and DB Manager. CRDV replication information is stored in ClusterInfo. The data plane maintains logical replication subscriptions based on the overlay copmuted by the HyParView-based control plane.

Deployment

Local Deployment

The cluster can be deployed locally with different (not so beautifully) hard-coded docker-compose files in /deployment.

AWS

Rebuild and republish the controller docker image:

docker build -f deployment/Dockerfile.controller -t <DOCKER_HUB_USER>/adaptive-overlay-replication-controller:latest . \
&& docker push <DOCKER_HUB_USER>/adaptive-overlay-replication-controller:latest

Provide instances with terraform (deployment/aws/terraform).
Copy the output IP addresses into /script/cluster.sh.
Use the different scripts (/scripts) to start, stop, inspect the deployment or change configurations or to re-deploy.

In order to run the benchmarks, first deploy the cluster as described above. Then start the controller and to create the cluster and overlay. Wait shortly for the stabilization period, then start the metric server (/crdv/deploy/metrics?server.py). Now simulate clients with the benchmarker CLI. (Optionally) Before collecting data about replication latency, stop the metric server, then collect to avoid adding unnecessary network traffic.

Simulating Failures

To simulate network partitions for the local docker cluster, use docker network commands. Simulating a partition that would isolate node 3 (controller and crdv instance) for 5 seconds:

docker network disconnect deployment_default controller3 \
    && sleep 5 \
    && docker network connect deployment_default controller3

On a AWS deployment, use the network_partition_server.py (crdv/deployment/network_partition_server.py).

Small example:

# Block a single IP for 10 seconds
curl -X POST "http://<instance-ip>:8083/block?ip=<another-ip>&time=10"

# Complete network isolation
curl -X POST "http://<instance-ip>:8083/block?ip=all&time=10"

Name		Name	Last commit message	Last commit date
Latest commit History 262 Commits
cmd		cmd
controller		controller
crdv @ 5e4d690		crdv @ 5e4d690
deployment		deployment
internal		internal
output		output
pkg		pkg
proto		proto
public/assets		public/assets
results		results
scripts		scripts
test/integration		test/integration
.env.local.single		.env.local.single
.gitignore		.gitignore
.gitmodules		.gitmodules
.ignore		.ignore
.python-version		.python-version
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
golangci.yaml		golangci.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Adaptive Overlay Replication

Architecture

Deployment

Local Deployment

AWS

Simulating Failures

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Adaptive Overlay Replication

Architecture

Deployment

Local Deployment

AWS

Simulating Failures

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages