cmuparlay/PIM-Scheduling

PIM Scheduling

This repository contains an implementation of scheduling algorithms for Processing-in-Memory (PIM) systems.

If you use our code, please cite our paper:

@inproceedings{kang2026nonclairvoyant,
    author = {Kang, Hongbo and Zhao, Yiwei and Agrawal, Kunal and Wu, Yongwei and Gibbons, Phillip B.},
    title = {Non-Clairvoyant Scheduling for Processing-in-Memory},
    year = {2026},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    booktitle = {Proceedings of the 38th ACM Symposium on Parallelism in Algorithms and Architectures},
    location = {London, UK},
    series = {SPAA '26},
    doi = {10.1145/3816782.3819200},
    url = {https://doi.org/10.1145/3816782.3819200}
}

The code consists of two parts:

  • simulator/ -- a closed-form makespan simulator that runs on any Linux/Windows host. It implements the clairvoyant (Sec. 4 of the paper) and non-clairvoyant (Sec. 5) schedulers alongside four baselines, plus a parameter-sweep harness.
  • upmem/ -- a real-hardware port targeting UPMEM DPUs. Same algorithm set, driven through libdpu and a shared DPU kernel.

third-party/ holds git submodules (parlaylib, UPMEM SDK). Initialise with git submodule update --init --recursive before building.

Simulator

C++17 + pthreads. parlaylib is fetched as a submodule; nothing else is required.

Build

cmake -B simulator/build -S simulator -DCMAKE_BUILD_TYPE=Release
cmake --build simulator/build -j

Produces three binaries under simulator/build/:

  • pim_sim -- single-config run; prints per-algorithm makespan and ratios.
  • pim_sweep -- parallel regime x P x seed grid; emits CSV to stdout.
  • pim_tests -- unit + invariant tests.

Run

Two-regime demo (Strong-CPU and Weak-CPU on the same workload):

./simulator/build/pim_sim

Single custom regime via --phi:

./simulator/build/pim_sim --phi 3 --target-dist zipf --zipf-theta 1.2 \
                         --work-dist pareto --pareto-alpha 1.5

Parameter sweep (default 5 seeds x {8,16,32,64} PIMs x 2 regimes):

./simulator/build/pim_sweep > results.csv
./simulator/build/pim_sweep --Ps "8,16,32,64" --seeds 10 > results.csv

Tests:

./simulator/build/pim_tests

A larger sweep script lives at simulator/bench/run_param_sweep.sh; it prints a Markdown ratio table over 26 settings of (P, phi, skew, work distribution).

CLI options

--P, --n, --m, --seed, --obj-size, --desc, --reply, --work, --mu, --B, --phi
--target-dist {uniform,zipf}     --zipf-theta <f>     (default 0.99)
--work-dist {constant,exp,pareto} --pareto-alpha <f>  (default 1.5)
--Ps, --seeds                                          sweep-only

./simulator/build/pim_sim --help lists every flag with its default. Without --phi, both pim_sim and pim_sweep run the dual Strong-CPU + Weak-CPU demo on the requested workload.
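
To make the --target-dist zipf and --work-dist pareto options concrete, here is a minimal sketch of how such distributions can be sampled. It is not the repository's generator: the function names, the seeding scheme, and the meaning of x_min (the Pareto scale, which may or may not correspond to --work) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not from the repo): draw m task targets in [0, n)
// where object k is chosen with probability proportional to 1/(k+1)^theta.
std::vector<std::uint32_t> sample_zipf_targets(std::size_t n, std::size_t m,
                                               double theta, std::uint64_t seed) {
    assert(n > 0);
    std::vector<double> weights(n);
    for (std::size_t k = 0; k < n; ++k)
        weights[k] = 1.0 / std::pow(static_cast<double>(k + 1), theta);
    std::discrete_distribution<std::uint32_t> dist(weights.begin(), weights.end());
    std::mt19937_64 rng(seed);
    std::vector<std::uint32_t> targets(m);
    for (auto& t : targets) t = dist(rng);
    return targets;
}

// Hypothetical helper (not from the repo): draw m Pareto(alpha) work values
// with scale x_min by inverting the CDF: x = x_min / u^(1/alpha), u in (0, 1].
std::vector<double> sample_pareto_work(std::size_t m, double alpha,
                                       double x_min, std::uint64_t seed) {
    assert(alpha > 0.0 && x_min > 0.0);
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // yields [0, 1)
    std::vector<double> work(m);
    for (auto& w : work) {
        const double u = 1.0 - unif(rng);                    // shift to (0, 1]
        w = x_min / std::pow(u, 1.0 / alpha);
    }
    return work;
}
```

A larger theta concentrates more tasks on the lowest-ranked objects, and a smaller alpha makes the work tail heavier; both increase the skew a scheduler has to absorb.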

UPMEM port

Host-side C++17 against UPMEM libdpu; the DPU kernel is built with dpu-upmem-dpurte-clang. CMake locates both via pkg-config or the standard UPMEM SDK paths.

Layout

upmem/
  include/                shared host headers
    pim_hw.h              hardware parameters (B, E, E_CPU, ...)
    model.h               Object, Task, Workload, SystemParams
    workload.h            generator interface
    schedule.h            Schedule = sequence of Phases (Push/Pull/CpuExec)
    runtime.h             DPU dispatcher + wall-clock measurement
    lower_bound.h         per-instance lower bounds
    algo.h                AlgoFn interface shared by baselines/Clv/NCV
    common/kernel_args.h  layout shared with the DPU kernel
  host/                   host-side C++17 implementation
    main.cpp              CLI entry point
    workload.cpp          deterministic hash-based generator
    runtime.cpp           libdpu wrapper, MRAM packing, phase execution
    lower_bound.cpp
    algos/                one .cpp per algorithm
  dpu/                    DPU-side C
    kernel_exec.c         single-binary execution kernel
  bench/                  smoke + calibration + sweep helpers
  CMakeLists.txt          host + DPU build
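
The layout above describes a Schedule as a sequence of Phases (Push/Pull/CpuExec). One way to model that idea is sketched below. The struct fields and the summarize helper are hypothetical; the actual definitions are in upmem/include/schedule.h and may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical phase payloads (field names are assumptions, not the repo's).
struct Push    { std::uint32_t dpu; std::size_t bytes; };  // host -> DPU transfer
struct Pull    { std::uint32_t dpu; std::size_t bytes; };  // DPU -> host transfer
struct CpuExec { double work; };                           // work run on the host CPU

using Phase    = std::variant<Push, Pull, CpuExec>;
using Schedule = std::vector<Phase>;  // phases execute in order

struct Totals {
    std::size_t pushed = 0;
    std::size_t pulled = 0;
    double cpu_work = 0.0;
};

// Walk the schedule once and accumulate per-kind totals.
Totals summarize(const Schedule& schedule) {
    Totals t;
    for (const Phase& p : schedule) {
        if (const auto* push = std::get_if<Push>(&p)) {
            t.pushed += push->bytes;
        } else if (const auto* pull = std::get_if<Pull>(&p)) {
            t.pulled += pull->bytes;
        } else {
            t.cpu_work += std::get<CpuExec>(p).work;
        }
    }
    return t;
}
```

Using std::variant keeps the phase kinds closed and lets a dispatcher branch on the kind without virtual calls; this is a design choice of the sketch, not a claim about the repository.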

The host binary and the compiled DPU kernel land side by side at upmem/build/pim_upmem and upmem/build/kernel_exec.dpu, which lets the host load the kernel by relative path at runtime.
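
host/workload.cpp is listed above as a deterministic hash-based generator. The sketch below shows the general technique, assuming a SplitMix64-style mixer: each task attribute is a pure function of (seed, task index), so the workload is identical no matter in what order, or on how many threads, tasks are generated. The function names and the choice of mixer are assumptions, not the repository's code.

```cpp
#include <cassert>
#include <cstdint>

// SplitMix64 step, a widely used 64-bit mixing function.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical: uniform double in [0, 1) derived from (seed, index, stream).
// `stream` separates independent attributes (e.g. target vs. work) of a task.
inline double unit_hash(std::uint64_t seed, std::uint64_t index,
                        std::uint64_t stream) {
    const std::uint64_t h = mix64(mix64(seed ^ mix64(stream)) + index);
    return static_cast<double>(h >> 11) * (1.0 / 9007199254740992.0);  // 53 bits
}

// Hypothetical: uniform target object in [0, n) for task i.
inline std::uint64_t task_target(std::uint64_t seed, std::uint64_t i,
                                 std::uint64_t n) {
    assert(n > 0);
    auto t = static_cast<std::uint64_t>(unit_hash(seed, i, 0) *
                                        static_cast<double>(n));
    return t < n ? t : n - 1;  // guard against rounding for very large n
}
```

Because no generator state is carried between tasks, task i can be regenerated on demand, which is convenient when the host and a checker need to agree on the workload without storing it.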

Build

cmake -B upmem/build -S upmem -DCMAKE_BUILD_TYPE=Release
cmake --build upmem/build -j

To pick a non-default tasklet count (1..24, default 12):

cmake -B upmem/build -S upmem -DUPMEM_NR_TASKLETS=8
cmake --build upmem/build -j

Run

cd upmem/build
./pim_upmem --n 1024 --m 100000 --target-dist zipf --zipf-theta 0.99 \
            --work-dist pareto --pareto-alpha 1.5 --work 1000

--dpus N requests a specific DPU count; omit it to use every DPU returned by dpu_alloc(DPU_ALLOCATE_ALL, ...). The CLI prints one CSV row per algorithm with wall-clock makespan and ratios against the paper-merged and strict lower bounds. Run ./pim_upmem --help for all flags.

Tests and sweeps

./upmem/bench/run_tests.sh          # smoke test + small CSV checks
./upmem/bench/run_calibration.sh    # bench the hardware, suggest pim_hw.h values
./upmem/bench/run_param_sweep.sh    # 19-setting sweep over n/W/theta/work-dist
./upmem/bench/run_sweep_size.sh     # size sweep over m
./upmem/bench/run_revolver_test.sh  # NR_TASKLETS sweep using the calibrator

Hardware parameter calibration

upmem/include/pim_hw.h carries bandwidths B_push / B_pull, DPU compute rate E_DPU, host CPU rate E_CPU, DPU clock frequency, and per-launch overhead. Re-calibrate via upmem/bench/run_calibration.sh and paste the suggested constants into pim_hw.h whenever you change UPMEM_NR_TASKLETS, the DPU work loop, or the host machine.
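
To illustrate why these constants matter, here is a minimal sketch of a linear cost model built from them. The struct, its field names, the units (bytes per second, work units per second, seconds), and the model itself are assumptions for illustration; pim_hw.h may name and use the parameters differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the calibrated constants (names and units assumed).
struct HwParams {
    double B_push;           // host -> DPU bandwidth, bytes/s
    double B_pull;           // DPU -> host bandwidth, bytes/s
    double E_DPU;            // DPU compute rate, work units/s
    double E_CPU;            // host CPU compute rate, work units/s
    double launch_overhead;  // fixed cost per DPU launch, seconds
};

// Time to push `bytes` to the DPUs under a pure-bandwidth model.
double push_seconds(const HwParams& hw, double bytes) {
    return bytes / hw.B_push;
}

// Time to pull `bytes` back from the DPUs under a pure-bandwidth model.
double pull_seconds(const HwParams& hw, double bytes) {
    return bytes / hw.B_pull;
}

// Time for one DPU launch that executes `work` units on its busiest DPU.
double dpu_exec_seconds(const HwParams& hw, double work) {
    return hw.launch_overhead + work / hw.E_DPU;
}

// Time for the host CPU to execute `work` units itself.
double cpu_exec_seconds(const HwParams& hw, double work) {
    return work / hw.E_CPU;
}
```

Under a model like this, a stale E_DPU or B_push shifts the predicted break-even point between running a task on the host and shipping it to a DPU, which is one reason to re-run the calibration after changing the tasklet count or the host machine.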

About

[SPAA'26] Non-Clairvoyant Scheduling for Processing-in-Memory
