248 changes: 67 additions & 181 deletions README.md
@@ -1,202 +1,88 @@
# Jepsen & Mallory
# Setup Guide

Breaking distributed systems so you don't have to.
## Prerequisites

**Jepsen** is a Clojure library. A test is a Clojure program which uses the Jepsen
library to set up a distributed system, run a bunch of operations against that
system, and verify that the history of those operations makes sense. Jepsen has
been used to verify everything from eventually-consistent commutative databases
to linearizable coordination systems to distributed task schedulers. It can
also generate graphs of performance and availability, helping you characterize
how a system responds to different faults. See
[jepsen.io](https://jepsen.io/analyses) for examples of the sorts of analyses
you can carry out with Jepsen.
Before proceeding, make sure you have the following tools installed:

**Mallory** is a graybox extension to Jepsen, implemented in Rust. It hooks into
an existing Jepsen test and takes the role of the nemesis, deciding in real-time
which actions to inject and when, based on the _runtime_ behaviour of the system
under test.
- **Vagrant**: Download the appropriate version for your platform from the [official page](https://developer.hashicorp.com/vagrant/downloads).
- **VirtualBox**: This is the default VM provider. It is not mandatory; you can configure a different provider in the Vagrant options if you prefer.
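
A quick way to confirm the prerequisites are on your `PATH` before running `vagrant up` is sketched below. The helper name `missing_tools` is hypothetical and not part of this repository. `VBoxManage` is the VirtualBox command-line tool, so adjust the list if you use a different provider.

```bash
# Print the name of every given command that is not found on PATH.
# `missing_tools` is an illustrative helper, not part of this repository.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}
```

Running `missing_tools vagrant VBoxManage` prints nothing when both tools are installed, and otherwise lists the ones that still need installing.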

## VM Setup

In the root of the project, execute the following commands:

## Citing Mallory
Mallory has been accepted for publication at the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS 2023).
The paper is also available on [arXiv](https://arxiv.org/pdf/2305.02601.pdf). If you use this code in your scientific work, please cite the paper as follows:
```
@inproceedings{mallory,
author={Meng, Ruijie and P{\^\i}rlea, George and Roychoudhury, Abhik and Sergey, Ilya},
title={Greybox Fuzzing of Distributed Systems},
booktitle={Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security},
pages={1615--1629},
year={2023}}
cd docker
vagrant up
```

## Design Overview
The first time you run this, it may take several minutes to complete.

## Accessing the VM

### Jepsen
To enter the VM, run:

A Jepsen test runs as a Clojure program on a *control node*. That program uses
SSH to log into a bunch of *db nodes*, where it sets up the distributed system
you're going to test using the test's pluggable *os* and *db*.
```
vagrant ssh
```

Once the system is running, the control node spins up a set of logically
single-threaded *processes*, each with its own *client* for the distributed
system. A *generator* generates new operations for each process to perform.
Processes then apply those operations to the system using their clients. The
start and end of each operation is recorded in a *history*. While performing
operations, a special *nemesis* process introduces faults into the system--_also
scheduled by the generator._
### Setting Up the Mediator

Finally, the DB and OS are torn down. Jepsen uses a *checker* to analyze the
test's history for correctness, and to generate reports, graphs, etc. The test,
history, analysis, and any supplementary results are written to the filesystem
under `store/<test-name>/<date>/` for later review. Symlinks to the latest
results are maintained at each level for convenience.
Before running tests in Mallory, you'll need to set up the mediator. If it's your first time, follow these steps:

### Mallory
1. **Install Rustup**:

```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Mallory hooks into your Jepsen test and takes the place of the nemesis
generator. We use a custom version of Jepsen, modified to inform Mallory when
tests start and end and when client and nemesis operations are executed. Most
importantly, Mallory uses the nemeses defined in the Jepsen test---this requires
some modification of these nemeses, as explained in the tutorial below.
2. **Install Musl Tools**:

As the test executes, Mallory observes the system under test and introduces
faults with the goal of inducing behaviour not seen before.
```
sudo apt-get install musl-tools
```

## Documentation
3. **Add Musl Target for Rust**:

This [tutorial](doc/tutorial/index.md) walks you through writing a Jepsen test
from scratch. For reference, see the [API documentation](http://jepsen-io.github.io/jepsen/).
```
rustup target add x86_64-unknown-linux-musl
```

## Setting up a Jepsen + Mallory environment
4. **Build the Mediator**:

We provide a ready-made environment using Vagrant:
```
cargo build --release --target=x86_64-unknown-linux-musl
```

If `cargo` is not found, add the `~/.cargo/bin` folder to your `PATH` variable, for example in your `.bashrc` file.
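
One way to make that `PATH` change safe to repeat (for example, each time `.bashrc` is sourced) is sketched below. The helper name `path_prepend` is an assumption for illustration. It adds the directory only when it is not already listed.

```bash
# Prepend a directory to PATH unless it is already listed.
# `path_prepend` is an illustrative helper name, not part of this repository.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put the directory first
  esac
}

path_prepend "$HOME/.cargo/bin"
export PATH
```

Open a new shell (or run `source ~/.bashrc`) afterwards so that `cargo` and `rustup` resolve.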

### Running the Application

Jepsen runs in Docker: a control container (the control plane) manages five node containers where the applications are deployed. A script sets up the environment for you:

```
cd /jepsen/docker
sudo ./bin/up
```

This may take over 10 minutes on the first run.

### Running the Mediator

Once Jepsen is set up, you can run the mediator, which intercepts messages between nodes and sends them to Mallory. Open a new terminal tab, log into the VM again with `vagrant ssh`, and run:

```
cd /jepsen/mediator && target/x86_64-unknown-linux-musl/release/mediator qlearning event_history 0.7
```
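
If the command above fails because the binary is missing, a small pre-flight check can make the cause obvious. The sketch below assumes the binary path shown above. The helper name `mediator_ready` is hypothetical and not part of this repository.

```bash
# Succeed only if the mediator binary exists and is executable.
# `mediator_ready` is an illustrative helper; the default path is the one
# used in the run command above.
mediator_ready() {
  _bin="${1:-/jepsen/mediator/target/x86_64-unknown-linux-musl/release/mediator}"
  if [ -x "$_bin" ]; then
    return 0
  fi
  echo "mediator binary not found at $_bin; build it first" >&2
  return 1
}
```

For example, `mediator_ready && echo "ok to start"` prints the message only once the build step has produced the binary.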

### Running Jepsen Tests

Finally, to run Jepsen tests, access the control plane in another terminal tab and execute:

```bash
cd docker/
vagrant plugin install vagrant-reload # only needed once
vagrant up
```
cd /jepsen/docker
sudo ./bin/console
```

Navigate to the test you want to execute; each test folder explains how to run it.

### Modifying an existing Jepsen test for Mallory

If you have an existing Jepsen test harness, Mallory takes the place of your
existing nemesis package and generator.

```Clojure
(:require [jepsen.mediator.wrapper :as med])

;; this should be a list of packages, as returned by
;; jepsen/nemesis/combined.clj:nemesis-packages
;; and NOT a combined package (as returned by compose-package)
;; If you have custom nemeses, you need to write a version of this yourself
;; that includes your custom nemesis.
packages (nemesis/nemesis-packages nemesis-opts)

;; Previously, the nemesis package was obtained as such:
;; nemesis (nemesis/nemesis-package nemesis-opts)
nemesis (med/adaptive-nemesis packages nemesis-opts)]

;; in your test, make the nemesis generator refer to the adaptive package:
:generator
(->> (:generator workload)
(gen/stagger (/ (:rate opts)))
;; use the adaptive nemesis generator
(gen/nemesis (:generator nemesis))
(gen/time-limit (:time-limit opts)))
```

IMPORTANT:
- if your nemesis package only uses nemeses in Jepsen's default
`jepsen/nemesis/combined.clj`, our distribution rewrites those so they are
usable by Mallory;
- if you package custom nemeses, you must modify them as follows: (1) add a
`:ops` field that returns the set of operations (and arguments) supported by
the nemesis, and (2) add a `:dispatch` field that takes an operation type
returned by `op` and returns an instantiated operation that can be invoked by
the nemesis client

Here is an example nemesis adapted for use with Mallory:

```Clojure
(defn partition-package
"A nemesis and generator package for network partitions. Options as for
nemesis-package."
[opts]
(let [needed? ((:faults opts) :partition)
db (:db opts)
targets (:targets (:partition opts) (partition-specs db))
start (fn start [_ _]
{:type :info
:f :start-partition
:value (rand-nth targets)})
stop {:type :info, :f :stop-partition, :value nil}
gen (->> (gen/flip-flop start (repeat stop))
(gen/stagger (:interval opts default-interval)))
;; Needed by Mallory -- to inform at start-up which operations this nemesis can perform
ops (cond-> []
needed? (concat [{:f :start-partition :values (vec targets)}, {:f :stop-partition, :values [nil]}]))]
;; Needed by Mallory -- to transform an operation type into a specific operation
(defn dispatch [op test ctx]
(case (:f op)
:start-partition ((fn start [_ _] {:type :info
:f :start-partition
:value (or (:value op) (rand-nth targets))}) test ctx)
:stop-partition stop
nil))

{:generator (when needed? gen)
:final-generator (when needed? stop)
:nemesis (partition-nemesis db)
:perf #{{:name "partition"
:start #{:start-partition}
:stop #{:stop-partition}
:color "#E9DCA0"}}
;; these two fields are needed by Mallory
:ops ops
:dispatch dispatch}))
```

An example `nemesis-packages` function (with many custom nemesis packages):

```Clojure
(defn nemesis-packages
"Constructs a nemesis and generators for dqlite."
[opts]
(let [opts (update opts :faults set)]
(->> (concat [(nc/partition-package opts)
(nc/db-package opts)
(member-package opts)
(stop-package opts)
(stable-package opts)]
(:extra-packages opts))
(remove nil?))))
```

A much simpler one:

```Clojure
(defn nemesis-packages
"Builds a combined package for the given options."
[opts]
(->> (nc/nemesis-packages opts)
(concat [(member-package opts)])
(remove nil?)))
```


## Contributions

### Contributors

* Ruijie Meng
* George Pîrlea
* Abhik Roychoudhury
* Ilya Sergey

### Other Contributors

We use [Jepsen](https://jepsen.io/) as the underlying tool. Thanks to Jepsen's developers. We also welcome other contributors to improve and extend Mallory.

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](./LICENSE) file for details.
14 changes: 7 additions & 7 deletions docker/bin/up
@@ -139,7 +139,7 @@ rm -rf ./control/jepsen
rm -rf ./node/jepsen
mkdir -p ./control/jepsen/jepsen
# Copy the jepsen directory if we're not mounting the JEPSEN_ROOT
if [ -z "${DEV}" ]; then

exclude_params=(
--exclude=./docker
--exclude=./.git
@@ -171,7 +171,7 @@ if [ -z "${DEV}" ]; then
(
cp -r ./control/jepsen ./node/jepsen
)
fi


if [ "${INIT_ONLY}" -eq 1 ]; then
exit 0
@@ -180,29 +180,29 @@ fi
exists docker ||
{ ERROR "Please install docker (https://docs.docker.com/engine/installation/)";
exit 1; }
exists docker-compose ||
exists docker compose ||
{ ERROR "Please install docker-compose (https://docs.docker.com/compose/install/)";
exit 1; }

if [ "${BUILD}" -eq 1 ]; then
INFO "Running \`docker-compose build\`"
# shellcheck disable=SC2086
docker-compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} build
docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} build
fi

# We need a fresh share volume each time we start, so we have a correct set of
# DB hosts. Why does Docker make sharing state SO hard
docker run --rm -v jepsen_jepsen-shared:/data/ debian:bullseye rm /data/nodes || true

INFO "Running \`docker-compose up\`"
INFO "Running \`docker compose up\`"
if [ "${RUN_AS_DAEMON}" -eq 1 ]; then
# shellcheck disable=SC2086
docker-compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up -d
docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up -d
INFO "All containers started! Run \`docker ps\` to view, and \`bin/console\` to get started."
else
INFO "Please run \`bin/console\` in another terminal to proceed"
# shellcheck disable=SC2086
docker-compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up
docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up
fi

popd
6 changes: 3 additions & 3 deletions docker/node/Dockerfile
@@ -47,9 +47,9 @@ RUN echo "deb http://apt.llvm.org/buster/ llvm-toolchain-buster-12 main" >> /etc
RUN echo "deb-src http://apt.llvm.org/buster/ llvm-toolchain-buster-12 main" >> /etc/apt/sources.list
RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
RUN apt-get update -qy
RUN apt-get install -qy clang-12 lld-12
RUN ln -s /usr/bin/clang-12 /usr/bin/clang && ln -s /usr/bin/clang++-12 /usr/bin/clang++ && \
ln -s /usr/bin/llvm-config-12 /usr/bin/llvm-config
RUN apt install -qy clang-11 lld-11
RUN ln -s /usr/bin/clang-11 /usr/bin/clang && ln -s /usr/bin/clang++-11 /usr/bin/clang++ && \
ln -s /usr/bin/llvm-config-11 /usr/bin/llvm-config
# end build tools

# Install coverage dependencies
6 changes: 3 additions & 3 deletions tests/mallory/dqlite/test.sh
@@ -66,7 +66,7 @@ setup-inner() {
}

setup() {
lxc launch images:ubuntu/22.04 jepsen -c limits.kernel.core=-1
lxc launch ubuntu:ubuntu/22.04 jepsen -c limits.kernel.core=-1
sleep 5
push-this-repo
lxc exec "$jepsen" -- "$workspace/jepsen.dqlite/test.sh" setup-inner "$@"
@@ -106,8 +106,8 @@ run-inner() {
}

run() {
test "$(sysctl -n kernel.core_pattern)" = core || exit 1
test "$(sysctl -n fs.suid_dumpable)" -gt 0 || exit 1
test "$(sudo sysctl -n kernel.core_pattern)" = core || exit 1
sudo sysctl -n fs.suid_dumpable
push-this-repo
lxc exec $jepsen -- \
env RAFT_BRANCH="${RAFT_BRANCH:-canonical/master}" \