
Sid00011/AdaptiveFlow-

AdaptiveFlow

A self-tuning scheduler for streaming data workflows on heterogeneous clusters.

Research project — M1 Informatique, Université Claude Bernard Lyon 1. Targets the intersection of three Master's specializations: DiPaC (distributed scheduling), DataScale (data-aware pipelines), Autonomic Systems (MAPE-K control loop).

Summary

A streaming-workflow scheduler that combines three angles (static heuristics, reinforcement learning, and autonomic control) through four components:

Component           Role
HEFT-LC             Static baseline: load-aware HEFT with tie-break tolerance ε
DLS                 Data-Locality Scheduler: locality-aware variant
RLScheduler         Tabular Q-learning agent that learns per-task placements
MAPE-K controller   Autonomic loop that adjusts ε online from observed metrics
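The HEFT-LC row can be illustrated with a minimal sketch. It assumes the usual HEFT structure (rank tasks by upward rank, then place each task by earliest finish time, EFT) and assumes that "tie-break tolerance ε" means: every node whose EFT is within a factor (1 + ε) of the best EFT counts as a tie, and the least-loaded tied node wins. The names upward_rank and pick_node and the dict-based DAG format are hypothetical and not taken from src/schedulers.py.

```python
def upward_rank(cost, succ):
    """HEFT priority: rank(t) = cost(t) + max over successors s of (comm(t, s) + rank(s)).

    cost: task -> mean compute cost
    succ: task -> {successor: mean communication cost}
    """
    rank = {}

    def visit(t):
        if t not in rank:
            rank[t] = cost[t] + max((c + visit(s) for s, c in succ[t].items()), default=0.0)
        return rank[t]

    for t in cost:
        visit(t)
    return rank


def pick_node(eft, load, eps):
    """Choose a node for one task (assumed HEFT-LC tie-break rule).

    eft:  node -> earliest finish time of the task on that node
    load: node -> current load on that node
    eps:  tie-break tolerance; eps = 0 reduces to plain min-EFT placement
    """
    best = min(eft.values())
    # Nodes whose EFT is within a factor (1 + eps) of the best are treated as ties.
    ties = [n for n, t in eft.items() if t <= (1.0 + eps) * best]
    # Among the ties, prefer the least-loaded node; then lower EFT; then node name.
    return min(ties, key=lambda n: (load[n], eft[n], n))


eft = {"a": 10.0, "b": 10.4, "c": 12.0}
load = {"a": 5, "b": 1, "c": 0}
print(pick_node(eft, load, 0.0), pick_node(eft, load, 0.05), pick_node(eft, load, 0.25))
# prints: a b c
```

Under this reading, a larger ε trades a slightly later finish time for better load balance, which is the knob the MAPE-K controller tunes.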

Workflows arrive over time as a Poisson stream (50% MapReduce, 30% ETL, 20% random DAGs), each task carrying compute cost, input/output partitions and data volumes. The cluster models per-node bandwidth, data residency, and node failures.
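The arrival process can be sketched as follows: a Poisson stream is generated from exponentially distributed inter-arrival times, and each arrival draws its workflow type from the stated 50/30/20 mix. The arrival rate, the horizon, and the function name workflow_stream are assumptions for illustration; the repository's WorkflowStream class in src/dag_generator.py may differ.

```python
import random

# Workflow mix from the README; the dictionary keys are hypothetical labels.
MIX = {"mapreduce": 0.5, "etl": 0.3, "random": 0.2}


def workflow_stream(rate, horizon, seed=0):
    """Yield (arrival_time, workflow_kind) pairs for a Poisson process of the given rate."""
    rng = random.Random(seed)  # explicit seed for reproducible streams
    kinds, weights = list(MIX), list(MIX.values())
    t = 0.0
    while True:
        # Exponential inter-arrival times produce a Poisson arrival process.
        t += rng.expovariate(rate)
        if t > horizon:
            return
        yield t, rng.choices(kinds, weights=weights)[0]


arrivals = list(workflow_stream(rate=2.0, horizon=1000.0, seed=42))
print(len(arrivals))  # about rate * horizon = 2000 arrivals
```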

Key results

Stable-workload scheduler comparison (108 runs):

Scheduler   Makespan (s)   Imbalance (CV)   Locality (fraction)
HEFT-LC     23.8           0.030            0.552
DLS         17.3           0.130            0.747
RL          46.7           0.049            0.492
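The table's columns can be read with the help of a small metrics sketch. It assumes that Imbalance (CV) is the coefficient of variation (population standard deviation over mean) of per-node busy time and that Locality is the fraction of tasks placed on a node already holding their input; both definitions and all function names here are assumptions, not code from the repository. The last line reproduces the 27% makespan reduction quoted below from the table values.

```python
from statistics import mean, pstdev


def imbalance_cv(busy_time):
    """Assumed imbalance metric: coefficient of variation of per-node busy time (0 = balanced)."""
    m = mean(busy_time)
    return pstdev(busy_time) / m if m > 0 else 0.0


def locality(placements):
    """Assumed locality metric: fraction of tasks run on a node that holds their input.

    placements: list of (assigned_node, set_of_nodes_holding_the_input)
    """
    if not placements:
        return 0.0
    return sum(node in holders for node, holders in placements) / len(placements)


def reduction(baseline, candidate):
    """Relative improvement of candidate over baseline (lower is better)."""
    return (baseline - candidate) / baseline


print(round(reduction(23.8, 17.3), 3))  # DLS vs HEFT-LC makespan: 0.273
```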
  • DLS reduces makespan by 27% vs HEFT-LC by exploiting data locality (75% vs 55%).
  • MAPE-K controller reduces mean makespan by 31–39% vs static ε=0.05 on heterogeneous clusters by detecting that the default is too aggressive and lowering ε to zero — see Scenario D and the non-stationary three-phase workload (Scenario F).
  • RL agent converges from 2.57× to 1.57× the HEFT-LC baseline makespan within 350 episodes (a 39% improvement). Honest result: it does not beat the strong baseline; the paper attributes this to the coarse tabular state.
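The MAPE-K behaviour described above (starting from the static ε = 0.05 and driving ε down when that default hurts) can be sketched as a simple hill-climbing loop. This is a hypothetical illustration, not the logic of src/controller.py: the step size, the upper bound, and the rule "reverse direction when mean makespan gets worse" are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class EpsilonController:
    """Hypothetical MAPE-K loop tuning the HEFT-LC tie-break tolerance eps."""
    eps: float = 0.05        # start from the static default
    step: float = 0.01       # assumed adjustment per window
    eps_max: float = 0.2     # assumed upper bound
    direction: float = -1.0  # first try lowering eps
    history: list = field(default_factory=list)  # Knowledge: (eps, mean makespan) per window

    def update(self, window_makespans):
        # Monitor: summarise the per-workflow makespans observed in the last window.
        observed = sum(window_makespans) / len(window_makespans)
        # Analyze: if the last move made things worse, reverse the search direction.
        if self.history and observed > self.history[-1][1]:
            self.direction = -self.direction
        self.history.append((self.eps, observed))
        # Plan + Execute: take one clamped step and hand the new eps to the scheduler.
        self.eps = min(self.eps_max, max(0.0, self.eps + self.direction * self.step))
        return self.eps


ctrl = EpsilonController()
for window in ([30.0, 31.0], [25.0, 26.0], [24.0, 23.0]):
    print(round(ctrl.update(window), 2))  # 0.04, 0.03, 0.02 while makespan keeps improving
```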

Structure

src/
  cluster.py          — heterogeneous cluster + data residency + failures
  dag_generator.py    — DataDAG model + WorkflowStream (Poisson arrivals)
  schedulers.py       — HEFT-LC, DLS, RLScheduler
  rl_agent.py         — tabular Q-learning agent (4-D discrete state)
  controller.py       — MAPE-K autonomic controller
  simulator.py        — event-driven streaming simulator
experiments/
  run_experiments.py  — 190 streaming runs across 5 scenarios
  generate_figures.py — 7 publication-quality figures
paper/
  paper.tex           — IEEE-format research paper (~7 pages)
data/
  results.csv         — main results table
  adaptations.csv     — controller adaptation trace
  learning_curve.csv  — per-episode RL convergence data
  shift_timeline.csv  — per-DAG timeline for the shift scenario
figures/              — fig1–fig7 (PNG + PDF)
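The tabular Q-learning agent listed for rl_agent.py can be sketched as below. The README only states that the state is 4-D and discrete; the bucketing helper, the example features, the hyperparameters, and the class name TabularQAgent are assumptions. The exploration rate is called explore to avoid confusion with the HEFT-LC tolerance ε.

```python
import bisect
import random
from collections import defaultdict


def discretise(value, edges):
    """Map a continuous feature to a bucket index given sorted bucket edges (hypothetical helper)."""
    return bisect.bisect_right(edges, value)


class TabularQAgent:
    """Tabular Q-learning over a discrete 4-D state; each action is a node index."""

    def __init__(self, n_nodes, alpha=0.1, gamma=0.9, explore=0.1, seed=0):
        self.n_nodes = n_nodes
        self.alpha, self.gamma, self.explore = alpha, gamma, explore
        self.q = defaultdict(lambda: [0.0] * n_nodes)  # state tuple -> Q-value per node
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy: random node with probability `explore`, else the best-known node.
        if self.rng.random() < self.explore:
            return self.rng.randrange(self.n_nodes)
        values = self.q[state]
        return max(range(self.n_nodes), key=values.__getitem__)

    def learn(self, state, action, reward, next_state, done):
        # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)); no bootstrap at episode end.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Hypothetical state: (task-cost bucket, input-is-local flag, load bucket, bandwidth bucket).
state = (discretise(7.0, [1, 5, 10]), 1, discretise(0.3, [0.25, 0.5, 0.75]), discretise(500, [100, 1000]))
agent = TabularQAgent(n_nodes=4)
node = agent.act(state)
agent.learn(state, node, reward=-12.5, next_state=state, done=True)  # e.g. negative finish time
```

A coarse discretisation like this keeps the Q-table small, but it also merges situations that call for different placements, which fits the README's explanation of why the agent stays behind HEFT-LC.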

Quick start

pip install networkx numpy pandas matplotlib scipy seaborn
python experiments/run_experiments.py     # writes data/*.csv (~3 min)
python experiments/generate_figures.py    # writes figures/*.pdf + .png

To compile the paper:

cd paper && pdflatex paper.tex

Reproducibility

Every result in the paper traces back to a row in data/results.csv or data/learning_curve.csv. All RNG seeds are explicit. The benchmark runs in under three minutes on a laptop CPU.

Citation

Sidali. (2026). AdaptiveFlow: A Self-Tuning Scheduler for Streaming Data
Workflows on Heterogeneous Clusters. Research report, Université Lyon 1.
