This is the artifact of the submitted manuscript Understanding and Detecting Scalability Faults in Large-Scale Distributed Systems. This artifact contains two parts: (1) raw data description and (2) experiment reproduction.
Source code for ScaleLens can be found here and the Docker images for experiments can be found here.
Quick links: empirical data | evaluation data | mini-scale experiments | full-scale experiments | ablation analysis
This section provides scripts to generate all data presented in the paper. The Python package pandas is required (install it).
This section presents the raw data used for investigating the anti-patterns of scalability faults. In this section, 444 scalability faults were analyzed and categorized into 4 root-cause categories and 11 anti-patterns.
A CSV file that contains all 444 scalability faults, as well as tags indicating their anti-patterns, can be found here.
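As a quick sanity check on the CSV before running the repository scripts, the per-tag counts can be tallied with the standard library alone. The sketch below is illustrative only: the column names `fault_id`, `category`, and `anti_pattern` and the sample rows are assumptions, not the actual schema of the file, so adjust them to match the real header.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real CSV. The column names
# "fault_id", "category", and "anti_pattern" are assumptions.
SAMPLE_CSV = """fault_id,category,anti_pattern
FAULT-1,category-a,pattern-x
FAULT-2,category-a,pattern-y
FAULT-3,category-b,pattern-x
"""


def count_by_column(csv_text, column):
    """Count how many rows carry each distinct value in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


pattern_counts = count_by_column(SAMPLE_CSV, "anti_pattern")
for pattern, count in pattern_counts.most_common():
    print(f"{pattern}: {count}")
```

To apply this to the real file, read its contents into a string and pass the actual column name; the scripts in this repository remain the authoritative way to regenerate the paper's numbers.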
The scripts below run from a clone of this repository:
git clone https://github.com/ucd-plse/scalability
cd scalability
To generate Table 1 in the paper, run the following command:
REPO_DIR=$(git rev-parse --show-toplevel)
python $REPO_DIR/scripts/table-1-breakdown.py
To generate all data for the anti-patterns in the empirical study, run the following command:
REPO_DIR=$(git rev-parse --show-toplevel)
python $REPO_DIR/scripts/pattern-analysis.py
In this section, we describe the raw data from our evaluation. For RQ1, we provide raw outputs from ScaleLens and ScaleCheck. For RQ2, we provide our analysis details.
Raw outputs from ScaleLens are available here. There is one JSON file per system, listing all DCFs, their dimensions and the relationships between them, and the anti-patterns they are associated with.
ScaleCheck is our baseline to compare with ScaleLens. Raw outputs from ScaleCheck are available here. For each system, there is one file containing the list of DCFs detected by ScaleCheck.
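One way to compare the two tools' outputs for a system is a plain set comparison of the DCFs each reports. The sketch below assumes each DCF can be reduced to a comparable string identifier; the identifiers shown are hypothetical, and how to extract them from the ScaleLens JSON and the ScaleCheck list is left open because it depends on the actual file formats.

```python
def compare_dcf_sets(scalelens_dcfs, scalecheck_dcfs):
    """Split DCF identifiers into those found by both tools or only one."""
    lens = set(scalelens_dcfs)
    check = set(scalecheck_dcfs)
    return {
        "both": sorted(lens & check),
        "scalelens_only": sorted(lens - check),
        "scalecheck_only": sorted(check - lens),
    }


# Hypothetical identifiers, for illustration only.
example = compare_dcf_sets(
    ["pkg.ClassA#methodOne", "pkg.ClassB#methodTwo"],
    ["pkg.ClassB#methodTwo", "pkg.ClassC#methodThree"],
)
for group, dcfs in example.items():
    print(group, dcfs)
```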
We provide our ablation analysis of the results from ScaleLens on the latest versions of Cassandra, HDFS, and Ignite. The analysis details are available here.
This section provides instructions to reproduce the experiments conducted in the study. The ScaleLens pipeline (source code) is shipped as Docker images on Docker Hub (link); each image bundles the JAR, scripts, workloads, and the target system source so no further assembly is required.
The mini experiments are intended as a quick way to exercise the full ScaleLens pipeline end-to-end on a single machine. We recommend running a mini experiment as a smoke test before launching the larger experiments in §2.2, to confirm that ScaleLens behaves as expected in your environment.
We provide three self-contained Docker images:
ucdavisplse/scalelens:CA-3.11.0-mini for CASSANDRA-3.11.0 (add-node workload)
ucdavisplse/scalelens:HD-3.1.0-mini for HDFS-3.1.0 (add-dn-block workload)
ucdavisplse/scalelens:IG-2.8.0-mini for IGNITE-2.8.0 (add-node workload)
The mini experiments have been tested on a 32 GB / 8-core Ubuntu 20.04 host, where each run completes in approximately 20 minutes (video). Other platforms might also work but have not been validated.
To run a mini experiment, start the container from one of the three images. For CASSANDRA-3.11.0:
docker run -it --pull always ucdavisplse/scalelens:CA-3.11.0-mini
This drops you into a shell at /home/scaleview/scaleview-core inside the container. From there, run:
bash run.sh ./experiments/CA-3.11.0.yaml CA-3.11.0-workspace
After the experiment completes, the results are in CA-3.11.0-workspace/ inside the container:
sdeps.json: DCFs detected by ScaleView, with their correlated dimensions
sdeps-patterns.json: anti-pattern labels assigned by ScalePick
statistics.txt: summary counts
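When using the mini run as a smoke test, it can help to confirm that all three result files were actually produced. The following is a minimal sketch, not part of the ScaleLens tooling: the helper name `missing_outputs` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_OUTPUTS = ("sdeps.json", "sdeps-patterns.json", "statistics.txt")


def missing_outputs(workspace):
    """Return the expected result files that are absent from `workspace`."""
    root = Path(workspace)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


# Run from the directory that contains the workspace, inside the container.
missing = missing_outputs("CA-3.11.0-workspace")
if missing:
    print("Missing outputs:", ", ".join(missing))
else:
    print("All expected outputs are present.")
```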
To run with HD-3.1.0-mini instead:
docker run -it --pull always ucdavisplse/scalelens:HD-3.1.0-mini
From inside the container, run:
bash run.sh ./experiments/HD-3.1.0.yaml HD-3.1.0-workspace
Outputs land in HD-3.1.0-workspace/.
To run with IG-2.8.0-mini instead:
docker run -it --pull always ucdavisplse/scalelens:IG-2.8.0-mini
From inside the container, run:
bash run.sh ./experiments/IG-2.8.0.yaml IG-2.8.0-workspace
Outputs land in IG-2.8.0-workspace/.
Each SYSTEM-VERSION studied in the paper is also published as a full image. To run an experiment, for example with Cassandra 4.1.0, start the image:
docker run -it --pull always ucdavisplse/scalelens:CA-4.1.0
You will be dropped into a shell at /home/scaleview/scaleview-core. From inside the container, run the experiment:
bash run.sh experiments/CA-4.1.0.yaml workspace
To run other experiments, replace CA-4.1.0 with any experiment ID (SYSTEM-VERSION) listed in the experiments directory here. Outputs are written to workspace/ inside the container; full-scale runs require substantially more memory and time than the mini variants in §2.1.
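The substitution described above follows a fixed pattern, which can be sketched as a small helper that prints the two commands for a given experiment ID. The helper name `experiment_commands` is hypothetical, and the sketch assumes every full image is tagged with its experiment ID exactly as CA-4.1.0 is; only CA-4.1.0 is confirmed here, so check other IDs against the experiments directory.

```python
IMAGE_REPO = "ucdavisplse/scalelens"


def experiment_commands(experiment_id, workspace="workspace"):
    """Return (host command, in-container command) for one experiment ID."""
    host = f"docker run -it --pull always {IMAGE_REPO}:{experiment_id}"
    inside = f"bash run.sh experiments/{experiment_id}.yaml {workspace}"
    return host, inside


for command in experiment_commands("CA-4.1.0"):
    print(command)
```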
The ablation analysis (Table III) is reproduced from the static evaluation data shipped in this repository, so the only setup is a clone:
git clone https://github.com/ucd-plse/scalability
cd scalability
To list statistics of all fragments without applying ScaleView:
REPO_DIR=$(git rev-parse --show-toplevel)
cd $REPO_DIR/evaluation-data/ablation-study/all-fragments && python run_stat.py
To see how ScaleLens narrows down to DCFs:
REPO_DIR=$(git rev-parse --show-toplevel)
cd $REPO_DIR/evaluation-data/ablation-study/scalelens && python run_stat.py