Tiny Data Sync schedules recurring rsync synchronizations through Slurm.
The script ingests .env configuration profiles and produces self-resubmitting batch jobs that keep pairs of directories in sync on a somewhat predictable cadence.
- Auto-discovers configuration profiles from `$HOME/.config/td-sync.d` or `$TD_CONFIG_DIR`.
- Generates per-profile Slurm job scripts that execute `rsync`, log results, send notifications, and requeue themselves with `--begin=+TD_SUBMIT_INTERVAL`.
- Creates and rotates log files under `$PWD/logs` or `$TD_LOG_DIR` with optional retention via `TD_LOG_RETENTION_DAYS`.
- Provides mock `sbatch`, `mail`, and `sendmail` utilities for isolated testing via `TD_TEST_MODE`.
- Ships a Docker test environment with `bats-core` for unit tests.
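The self-requeuing behaviour listed above can be pictured roughly as follows. This is a minimal sketch, not the script's actual generator: the function name `build_requeue_cmd` is hypothetical, the `now+` prefix is Slurm's documented relative-time form for `--begin` and may differ from what the script really emits, and the command is only printed, never submitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the sbatch command a job could use to
# resubmit itself after a delay. Nothing is submitted here.
build_requeue_cmd() {
  local job_script="$1"
  local interval="$2"
  # Slurm reads a bare number as seconds; strings such as 7days carry a unit.
  printf 'sbatch --begin=now+%s %s\n' "$interval" "$job_script"
}

build_requeue_cmd "data-sync.job" "7days"
```

A generated job script would run a command of this shape as its last step, so each run schedules the next one.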

    # Run tests inside the provided Docker container
    make test

To exercise the script locally without Docker:

    export TD_TEST_MODE=1
    ./bin/td-sync --help

Each profile is a simple .env file. Required keys:
- `TD_SRC` – source path for `rsync`.
- `TD_DEST` – destination path for `rsync`.
- `TD_SUBMIT_INTERVAL` – Slurm delay (seconds or strings like `7days`).
- `TD_SLURM_ACCOUNT` – Slurm account (defaults to the profile filename when omitted).
Optional keys:
- `TD_NOTIFY` – comma-separated email recipients.
- `TD_DRY_RUN` – `1` or `true` for `rsync --dry-run`.
- `TD_SLURM_RUNTIME` – job runtime (`HH:MM:SS`, defaults to `01:00:00`).
- `TD_RSYNC_OPTS` – additional `rsync` flags (e.g., `--exclude .venv --delete`).
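To make the key rules concrete, here is a minimal sketch of loading one profile, applying the documented defaults, and rejecting a profile that lacks a source, destination, or interval. It is not the script's real loader: `load_profile` is a hypothetical name, parsing line by line (rather than sourcing the file) is an assumption, and "profile filename" is assumed to mean the basename without the `.env` suffix.

```shell
#!/usr/bin/env bash
# Hypothetical loader: read KEY=VALUE lines, apply defaults, check required keys.
load_profile() {
  local profile="$1" line key value required
  # Clear values left over from a previously loaded profile.
  unset TD_SRC TD_DEST TD_SUBMIT_INTERVAL TD_SLURM_ACCOUNT \
        TD_NOTIFY TD_DRY_RUN TD_SLURM_RUNTIME TD_RSYNC_OPTS
  while IFS= read -r line || [ -n "$line" ]; do
    [[ "$line" == *=* ]] || continue            # skip blanks and comments
    key="${line%%=*}"
    value="${line#*=}"
    [[ "$key" =~ ^TD_[A-Z_]+$ ]] || continue    # accept only TD_* keys
    value="${value#\"}"                         # drop one pair of surrounding
    value="${value%\"}"                         # double quotes, if present
    printf -v "$key" '%s' "$value"
  done < "$profile"

  # Documented defaults.
  : "${TD_SLURM_RUNTIME:=01:00:00}"
  if [ -z "${TD_SLURM_ACCOUNT:-}" ]; then
    TD_SLURM_ACCOUNT="$(basename "$profile" .env)"
  fi

  # Keys with no documented default must be present.
  for required in TD_SRC TD_DEST TD_SUBMIT_INTERVAL; do
    if [ -z "${!required:-}" ]; then
      echo "missing required key: $required" >&2
      return 1
    fi
  done
}
```

Calling `load_profile` on a file named `data-sync.env` that sets only the source, destination, and interval would leave `TD_SLURM_ACCOUNT` as `data-sync` and `TD_SLURM_RUNTIME` as `01:00:00`.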
Setting TD_TEST_MODE to 1 or true prepends mocks/bin to PATH, replacing sbatch, mail, and sendmail with test doubles. Use this when running unit tests or developing on a system without Slurm or a mail transfer agent.
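The PATH switch can be sketched as below. The function name `enable_test_mode` and the way the mocks directory is passed in are assumptions for illustration; only the accepted values (`1` or `true`) and the prepend-to-PATH behaviour come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put a mocks directory first on PATH when test mode is on.
enable_test_mode() {
  local mocks_dir="$1"
  case "${TD_TEST_MODE:-}" in
    1|true)
      PATH="$mocks_dir:$PATH"
      export PATH
      ;;
  esac
}
```

Because the mocks directory comes first, a later `command -v sbatch` resolves to the test double instead of a real Slurm binary, and the rest of the script needs no test-specific branches.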
If TD_LOG_RETENTION_DAYS is set, Tiny Data Sync purges logs older than the configured number of days using find -mtime each time the script runs.
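A retention purge of this kind might look like the sketch below. The function name `purge_old_logs`, the `*.log` suffix, and the use of `-delete` (a GNU and BSD `find` extension) are assumptions; the source only states that `find -mtime` is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete log files older than a number of days.
purge_old_logs() {
  local log_dir="$1" days="$2"
  # Do nothing when retention is unset or not a whole number.
  case "$days" in
    ''|*[!0-9]*) return 0 ;;
  esac
  # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
  find "$log_dir" -type f -name '*.log' -mtime +"$days" -delete
}
```

A call such as `purge_old_logs "${TD_LOG_DIR:-$PWD/logs}" "${TD_LOG_RETENTION_DAYS:-}"` would then be a no-op whenever retention is not configured.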
The Dockerfile under docker/ uses Debian Bookworm, respects the host architecture via TARGETPLATFORM, installs bats-core, and defaults to executing the Bats test suite.
Below is a representative profile and command sequence for a project that mirrors data from a scratch volume to a long-term storage area every week:
- Create a profile such as `~/.config/td-sync.d/data-sync.env` containing:

      TD_SRC=/scratch/project/data/
      TD_DEST=/archive/project/data/
      TD_SUBMIT_INTERVAL=7days
      TD_SLURM_ACCOUNT=research
      TD_NOTIFY=user@example.edu
      TD_DRY_RUN=1
      TD_SLURM_RUNTIME=02:00:00
      TD_RSYNC_OPTS=--exclude .venv --delete

- Export any optional runtime variables, then execute the scheduler:

      export TD_LOG_RETENTION_DAYS=7
      td-sync/bin/td-sync

- Tiny Data Sync will submit a Slurm job that performs an `rsync --dry-run`, writes a log under `logs/`, emails a short summary to the specified recipient, and requeues itself to run again after seven days. Switching `TD_DRY_RUN` to `0` promotes the synchronization from a preview to a real transfer. Any custom `TD_RSYNC_OPTS` (such as excluding `.venv`) are appended to the `rsync` invocation.
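The effect of `TD_DRY_RUN` on the command line can be sketched as follows. This is an illustration under assumptions: `build_rsync_cmd` and the `RSYNC_CMD` array are hypothetical names, and the base flag `-a` is a guess, since the source does not say which flags the script passes by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array RSYNC_CMD with an rsync command,
# adding --dry-run when the third argument is 1 or true.
build_rsync_cmd() {
  local src="$1" dest="$2" dry_run="${3:-}"
  RSYNC_CMD=(rsync -a)            # -a is an assumed base flag
  case "$dry_run" in
    1|true) RSYNC_CMD+=(--dry-run) ;;
  esac
  RSYNC_CMD+=("$src" "$dest")
}

build_rsync_cmd /scratch/project/data/ /archive/project/data/ 1
printf '%s\n' "${RSYNC_CMD[*]}"
```

With the third argument set to `0`, the same call yields the command without `--dry-run`, which corresponds to the real transfer described above.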
When TD_RSYNC_OPTS contains multiple arguments, quote values exactly as you would on the command line. For example:

    TD_RSYNC_OPTS="--filter='- .venv' --filter='- __pycache__'"

Tiny Data Sync evaluates these options inside the generated job script, so each quoted segment is preserved and forwarded to rsync without additional escaping.
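One way such evaluation can preserve quoted segments is sketched below. The helper `parse_rsync_opts` and the `RSYNC_OPTS` array are hypothetical, and the use of `eval` is an assumption about the mechanism rather than a statement of what the generated job script does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a quoted option string into the global array
# RSYNC_OPTS the way a shell command line would split it.
# eval runs arbitrary shell code, so this is only safe for trusted profiles.
parse_rsync_opts() {
  RSYNC_OPTS=()
  eval "RSYNC_OPTS=( ${1:-} )"
}

parse_rsync_opts "--filter='- .venv' --filter='- __pycache__'"
printf '<%s>\n' "${RSYNC_OPTS[@]}"
```

The example prints two lines, `<--filter=- .venv>` and `<--filter=- __pycache__>`, showing that each quoted filter rule arrives as a single argument with its inner space intact.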