Baryon is a tool that enables bioinformatics workflows to run across different environments and operating systems without any manual conversion or adaptation. Researchers simply describe their pipeline once in a .bala configuration file, and Baryon takes care of generating ready-to-use scripts for multiple execution platforms.
The core idea is straightforward: a researcher creates a .bala file containing all the information needed for a bioinformatics analysis — parameters, input files, directories, and execution logic — in a simple, structured format. Baryon reads this file, validates it, and automatically generates execution scripts for:
- Python
- Galaxy
- R
- Bash
- Nextflow
- Streamflow
Each generated script guides the user to customize paths and parameters, then runs the appropriate Docker container. Any researcher who wants to reproduce the analysis only needs to supply their own files and parameters.
Prerequisite: The script to be executed, along with its full environment, must be packaged as a Docker image.
The .bala file is the single configuration file that drives Baryon's script generation.
- Keywords are case-insensitive (can be written in uppercase, lowercase, or mixed).
- Values after `=` are case-sensitive and must be written exactly as required.
- Lines starting with `#` are comments and are ignored.
- Each piece of information must be on a single line.
- Information is organized into named sections enclosed in square brackets.
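The format rules above can be illustrated with a small reader. This is a minimal sketch, not Baryon's actual parser: the function name `parse_bala`, the returned structure, and the sample text are all hypothetical, and folding section names to lowercase is an assumption (the rules only state that keywords are case-insensitive).

```python
from collections import defaultdict


def parse_bala(text):
    """Parse .bala-style text into {section: [{keyword: value}, ...]}.

    Illustrative only: Baryon's real parser may behave differently.
    """
    sections = defaultdict(list)
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("[") and line.endswith("]"):
            current = {}
            # Assumption: section names are folded like keywords.
            sections[line[1:-1].strip().lower()].append(current)
            continue
        if current is None or "=" not in line:
            raise ValueError(f"line {lineno}: expected 'keyword = value' inside a section")
        key, value = line.split("=", 1)
        # Keyword is folded to lowercase; the value is kept exactly as written.
        current[key.strip().lower()] = value.strip()
    return dict(sections)


# Hypothetical sample, not taken from a real Baryon workflow.
SAMPLE = """\
# comment lines are skipped
[research]
Name = demo_analysis

[directory]
NAME = workDir
mount = /scratch
flag = io
"""

print(parse_bala(SAMPLE))
```

Because several `[file]`, `[directory]`, and `[parameter]` sections may appear, each section type maps to a list rather than a single dictionary.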
The `[research]` section contains metadata about the experiment or publication. Only one such section is allowed.

| Keyword | Description |
|---|---|
| `name` | Name of the experiment; used to name the generated scripts. |
| `description` | Optional. Description propagated into generated files. |
Each `[file]` section defines an input file to pass to the application. Multiple sections are allowed.

| Keyword | Description |
|---|---|
| `name` | Unique name for the file (required). |
| `flag` | `ro` = read-only, `cp` = copy to workdir before processing, `nc` = no copy. |
| `description` | Optional description propagated to generated scripts. |
Each `[directory]` section defines a directory to pass to the application. Multiple sections are allowed.

| Keyword | Description |
|---|---|
| `name` | Unique name. `workDir` is mandatory: it is used for copied files and to write `output_log.txt`. |
| `mount` | Required. The path/name by which the directory will be mounted inside Docker. |
| `flag` | `ro` = read-only, `io` = full access, `in` = input only, `out` = output only. |
| `description` | Optional description propagated to generated scripts. |
A numbered `scratch` subdirectory is created inside `workdir` for each run to preserve logs and copied files across executions.

If an `outdir` directory is defined, a numbered scratch subdirectory aligned with `workdir`'s counter is created inside it. Do not use `outdir` for scripts that expect a specific mount path.

`workdir` and `outdir` cannot be read-only.
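The numbering behaviour described here could look roughly like the following. This is a sketch under stated assumptions: the `scratch_<n>` naming pattern and both function names are hypothetical, since the text only says the subdirectories are numbered and that `outdir` follows the same counter as `workdir`.

```python
import re
from pathlib import Path

# Hypothetical naming scheme; the real directory names may differ.
_SCRATCH = re.compile(r"scratch_(\d+)")


def next_scratch_index(workdir):
    """Return 1 + the highest scratch index already present in workdir."""
    indices = []
    for path in Path(workdir).iterdir():
        match = _SCRATCH.fullmatch(path.name)
        if path.is_dir() and match:
            indices.append(int(match.group(1)))
    return max(indices, default=0) + 1


def create_run_dirs(workdir, outdir=None):
    """Create the scratch directory for a new run, mirrored in outdir if given."""
    index = next_scratch_index(workdir)  # the counter is taken from workdir only
    created = [Path(workdir) / f"scratch_{index}"]
    if outdir is not None:
        created.append(Path(outdir) / f"scratch_{index}")
    for path in created:
        path.mkdir(parents=True, exist_ok=False)
    return created
```

Deriving the index from `workdir` alone keeps the two trees aligned even if `outdir` was emptied between runs.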
Each `[parameter]` section defines a parameter to pass to the application. Multiple sections are allowed.

| Keyword | Description |
|---|---|
| `name` | Unique name for the parameter (required). |
| `value` | Default value. |
| `values` | Comma-separated list of accepted values (e.g. `values=x,y,z`). Use double quotes for values with spaces or special characters (e.g. `"my value"`). |
| `description` | Optional description propagated to generated scripts. |
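To make the `values` syntax concrete, the sketch below splits a comma-separated list while honouring double quotes, using Python's standard `csv` module. The helper names `parse_values` and `default_for` are hypothetical, and falling back to the first entry of `values` when `value` is absent is an assumption here (this document states that behaviour only for the Streamflow target).

```python
import csv


def parse_values(raw):
    """Split a comma-separated 'values' entry, honouring double quotes."""
    reader = csv.reader([raw], skipinitialspace=True)
    # Surrounding whitespace is trimmed; quoted commas stay inside the item.
    return [item.strip() for item in next(reader)]


def default_for(parameter):
    """Pick a default: 'value' if present, else the first entry of 'values'."""
    if "value" in parameter:
        return parameter["value"]
    if "values" in parameter:
        return parse_values(parameter["values"])[0]
    return None


print(parse_values('fast,"my value","a,b"'))
```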
The `[run]` section defines how the Docker container is executed. Only one such section is allowed.

| Keyword | Description |
|---|---|
| `command` | The Docker run command and its flags. |
| `script` | The script to execute inside the container. Prefix with the interpreter if needed (e.g. `python3 /path/to/script.py`). |
| `image` | The Docker image to use. |
| `usage` | Defines the order and flags of parameters. Use `<name>` placeholders to reference section names. Constants can be placed directly. |
Example usage with placeholders:

```
usage = <param1> --param2 <param2> --parammulti <paramx> <paramy> constant_value
```

For output files in a specific location:

```
usage = <outdir>/output_file.txt
```

Define the corresponding directory section with `name=outdir` and set `mount` to the expected disk path.
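The placeholder mechanism can be sketched with a regular expression. This is an illustration rather than Baryon's implementation: the names `placeholders` and `expand_usage` are hypothetical, and exact, case-sensitive matching of placeholder names is an assumption.

```python
import re

# A placeholder is any run of non-space characters between angle brackets.
PLACEHOLDER = re.compile(r"<([^<>\s]+)>")


def placeholders(usage):
    """Return the placeholder names in order of appearance."""
    return PLACEHOLDER.findall(usage)


def expand_usage(usage, values):
    """Replace each <name> with values[name]; constants are left untouched."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no section defines name={name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, usage)


# '/results' is a hypothetical mount path chosen for the example.
print(expand_usage("<outdir>/output_file.txt", {"outdir": "/results"}))
```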
Baryon performs the following checks on the .bala file before generating scripts:
- Lines starting with `#` and blank lines are ignored.
- `name`, `mount`, and `flag` values must be a single word (no spaces).
- Sections `[research]` and `[run]` are mandatory, as is the `workDir` directory.
- Every placeholder in `usage` (`<name>`) must have a corresponding section with a matching `name=`.
- `flag` for files accepts only `CP` (copy), `RO` (read-only), or `NC` (no copy); for directories only `RO` (read-only), `IN` (input-only), `OUT` (output-only), or `IO` (input and output).
- Extra characters after a value trigger a warning and halt processing.
- Duplicate `name` values are not allowed.
- Missing `name` keywords are flagged as errors.
- All directories must have a `mount` keyword.
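Several of the checks listed above can be expressed compactly in code. The sketch below is not Baryon's validator: the `validate` function and its input structure (a dictionary mapping each section type to a list of keyword dictionaries) are hypothetical, flags are assumed to be case-insensitive because this document writes them in both cases, and uniqueness of `name` is assumed to apply across all file, directory, and parameter sections.

```python
import re

# Flag values accepted per section type, compared in lowercase (assumption).
ALLOWED_FLAGS = {
    "file": {"cp", "ro", "nc"},
    "directory": {"ro", "in", "out", "io"},
}


def validate(sections):
    """Return a list of error messages for a parsed .bala structure."""
    errors = []
    for required in ("research", "run"):
        if len(sections.get(required, [])) != 1:
            errors.append(f"exactly one [{required}] section is required")

    names = []
    for kind in ("file", "directory", "parameter"):
        for entry in sections.get(kind, []):
            name = entry.get("name")
            if not name:
                errors.append(f"[{kind}] section without a name")
                continue
            if name in names:
                errors.append(f"duplicate name: {name}")
            names.append(name)
            for key in ("name", "mount", "flag"):
                if key in entry and len(entry[key].split()) != 1:
                    errors.append(f"{name}: {key} must be a single word")
            if kind == "directory" and "mount" not in entry:
                errors.append(f"{name}: directory without a mount")
            flag = entry.get("flag")
            if flag is not None and kind in ALLOWED_FLAGS and flag.lower() not in ALLOWED_FLAGS[kind]:
                errors.append(f"{name}: invalid flag {flag!r} for a {kind}")

    directories = [d.get("name", "") for d in sections.get("directory", [])]
    if not any(d.lower() == "workdir" for d in directories):
        errors.append("the workDir directory is mandatory")

    runs = sections.get("run") or [{}]
    for placeholder in re.findall(r"<([^<>\s]+)>", runs[0].get("usage", "")):
        if placeholder not in names:
            errors.append(f"placeholder <{placeholder}> has no matching section")
    return errors
```

Collecting every message instead of stopping at the first one is a design choice of this sketch; the text above indicates that Baryon itself halts on some conditions.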
- Find the Docker image name and set it in `image` under `[run]`.
- Find the script entry point and set it in `script` under `[run]`. Prefix with the interpreter if necessary (e.g. `python3 /Algorithm/script.py`).
- Always include a `workDir` directory. A `scratch` subdirectory will be created for each run.
- If the script accepts an output folder, use `outdir` to keep numbered scratch folders aligned with `workdir`.
- Add any additional directories; use an appropriate `flag`, usually `flag=io`.
- Build the `usage` line with the appropriate placeholders and define a `[file]`, `[directory]`, or `[parameter]` section for each.
- For file sections, use `flag=nc` to avoid copying, `flag=cp` to copy to `workdir/scratch`, or `flag=ro` for read-only.
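Following these steps, a complete configuration might look like the text embedded in the snippet below. Everything in it is hypothetical and chosen only for illustration: the image name, mount paths, parameter, and the `docker run --rm` command are assumptions, not values from a real Baryon workflow. The small helper simply writes the text to disk so it can be passed to Baryon.

```python
from pathlib import Path

# Hypothetical example: image, paths, and names are placeholders.
EXAMPLE_BALA = """\
[research]
name = demo_analysis
description = Minimal illustration of the steps above

[directory]
name = workDir
mount = /scratch
flag = io

[directory]
name = outdir
mount = /results
flag = io

[file]
name = reads
flag = cp
description = Input file copied into the scratch folder

[parameter]
name = threads
value = 4

[run]
command = docker run --rm
image = example/demo-image:latest
script = python3 /Algorithm/script.py
usage = <reads> --threads <threads> --out <outdir>
"""


def write_example(path="demo_analysis.bala"):
    """Write the example configuration to disk and return its path."""
    target = Path(path)
    target.write_text(EXAMPLE_BALA, encoding="utf-8")
    return target
```

Each placeholder in the `usage` line (`<reads>`, `<threads>`, `<outdir>`) has a section with the matching `name`, which is what the validation step requires.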
Baryon supports both Docker and Singularity as container environments.
To use Singularity as a Docker-compatible backend, modify the `[run]` section by:

- setting `command = singularity exec --writable-tmpfs`
- prefixing the image name with `docker://`
This allows Baryon to execute containers through Singularity while pulling images from Docker registries.
When a .bala uses Singularity as its Docker environment, configuration files for Galaxy, Nextflow, and StreamFlow will not be generated, as these platforms require native Docker‑compatible metadata.
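The effect on the generated targets can be summarized in a few lines. This is a sketch under assumptions: how Baryon actually detects Singularity is not stated here, so checking the first word of `command` is a guess, and the function names are hypothetical.

```python
ALL_TARGETS = ("python", "r", "bash", "galaxy", "nextflow", "streamflow")
# Targets that are skipped when Singularity is used, per the text above.
DOCKER_ONLY = {"galaxy", "nextflow", "streamflow"}


def uses_singularity(run_section):
    """True when the [run] command starts with the singularity executable."""
    words = run_section.get("command", "").split()
    return bool(words) and words[0] == "singularity"


def available_targets(run_section):
    """List the targets that would be generated for this [run] section."""
    if uses_singularity(run_section):
        return [t for t in ALL_TARGETS if t not in DOCKER_ONLY]
    return list(ALL_TARGETS)


print(available_targets({"command": "singularity exec --writable-tmpfs"}))
```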
StreamFlow is a container-based Workflow Management System designed for complex, distributed data analysis pipelines (such as bioinformatics ones).
Generated files (named after the [research] name):
- `<name>-params.yml`: all parameters and file paths (customize this)
- `<name>.yml`: directory mounts (customize this)
- `<name>.cwl`: CWL workflow definition (do not modify)
Philosophy:
- All parameters are written to `<name>-params.yml` with descriptions from `description` keywords; default values are set from `value` or the first entry in `values`.
- All files appear in `<name>-params.yml` as customizable paths.
- All directories are mounted in `<name>.yml`.
- Files with `flag=cp` are copied to `workdir` before the script is called.
Prerequisites: Docker and WSL installed; the Streamflow image from UniTO downloaded (note: the original image does not include Docker — use the local variant).
Run command (Windows):

```bash
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" \
  -w /workflow \
  streamflow-unito-local streamflow run <name>.yml
```

Nextflow is a code-based Workflow Management System using a Groovy-derived DSL.
Generated files:
- `nextflow.config`
- `<name>.nf`
- Updated `<name>_command.txt` with the Docker launch command
Configure parameters, files, and directories in the first section of the .nf file.
Each run creates a random subdirectory under work/ containing log.out.
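Because the subdirectory name is random, locating the log of the latest run takes a search. A minimal sketch, assuming only what is stated above (a `log.out` file somewhere under `work/`); the function name `latest_log` is hypothetical:

```python
from pathlib import Path


def latest_log(work_dir="work"):
    """Return the most recently modified log.out under work_dir, or None."""
    logs = Path(work_dir).rglob("log.out")
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)
```

Calling `latest_log()` from the project directory returns the path of the newest log, or `None` if no run has produced one yet.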
Run command (Windows .bat example):

```bat
set MY_PATH=/c/my_project/nextflow/my_analysis
docker run -it --rm ^
  -v /var/run/docker.sock:/var/run/docker.sock ^
  -v %MY_PATH%:%MY_PATH% ^
  -w %MY_PATH% ^
  -e DOCKER_API_VERSION=1.44 ^
  nextflow/nextflow:24.10.4 nextflow run my_analysis.nf
pause
```

When input directories are present, Baryon generates two Galaxy XML variants: one using Galaxy collections, one using .tar.gz archives. Output directories are always .tar.gz due to collection limitations.
Creating a collection from a directory:
- Upload all files from the directory.
- Enable the selection checkbox and select all files.
- Choose Advanced build list from the dropdown.
- Select flat list and disable the option to strip file extensions.
- Name the collection after the source directory.
Creating a .tar.gz from a directory (Windows):

```bash
tar -czf genome.tar.gz -C genome .
```

When uploading, specify the type as tar.gz instead of auto-detect.
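Where the `tar` command is not available, the same archive can be produced with Python's standard `tarfile` module. This is an equivalent sketch, not part of Baryon; the function name `archive_directory` is hypothetical.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir, archive_path):
    """Create a gzip-compressed tar of source_dir's contents, rooted at '.'."""
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(str(source))
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname="." mirrors `tar -C genome .`: entries are stored relative
        # to the directory, without the directory name itself as a prefix.
        tar.add(str(source), arcname=".")
    return Path(archive_path)
```

For example, `archive_directory("genome", "genome.tar.gz")` corresponds to the command above. Write the archive outside the source directory so it does not include itself.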
| Workflow | Description |
|---|---|
| `htgts_Full` | Analyzes sequencing data to map genomic translocations or large-scale DNA break sites. Processes two FASTQ files and two libseq files from workdir; outputs to outdir. |
| `index_align_bulk_rna_seq` | Bulk RNA-Seq analysis. Measures average gene expression across a cell population. Expects FASTA and annotation files in `/genome/` and FASTQ reads in `/scratch/`. |
| `index_align_scrs` | Single-Cell RNA-Seq (scRNA-Seq) analysis. Tracks gene expression at single-cell level using cell barcodes. Same input structure as Bulk RNA-Seq. |
| `sample_sheetTolibInfo` | Converts experiment metadata from an Excel spreadsheet into a KEY=VALUE format readable by downstream HTGTS Bash pipeline scripts. |
| `seqkit` | A comprehensive toolkit for FASTA/FASTQ file manipulation, filtering, and statistics. Note: This workflow is specifically used here as a lightweight test to validate Singularity container integration. |
| `TopX` | Filters a gene count matrix, selecting the most relevant genes by variance (using edgeR) or by total count. Reads a CSV from `/data/` and writes results to the same folder. |
Baryon is launched from the command line. All arguments are optional: if omitted, Baryon will prompt interactively for any missing information.
```bash
python baryon.py [BALA_FILE] [--lang LANGUAGE] [--output SCRIPT_NAME] [--overwrite] [--generate_function]
```

| Argument | Short | Description |
|---|---|---|
| `BALA_FILE` | | Path or name of the .bala file to process. Extension `.bala` can be omitted. If not provided, Baryon lists all .bala files in the current directory and lets you choose. |
| `--lang` | `-l` | Target language to generate. Valid values: `nextflow`, `streamflow`, `galaxy`, `python`, `r`, `bash`, `all`. If not provided, Baryon shows a numbered list to choose from (default: `all`). |
| `--output` | `-n` | Base name for the generated script files. If not provided, the name is derived from the `name` field in the `[research]` section of the .bala file. |
| `--overwrite` | `-w` | Overwrite existing output files without asking for confirmation. If not set, Baryon will warn and prompt before deleting any existing files. |
| `--generate_function` | `-f` | Generate Python, R, and Bash scripts as self-contained functions instead of standalone scripts. Has no effect on `nextflow`, `streamflow`, or `galaxy` targets. |
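For readers who want to wrap or script the command line, the documented arguments map naturally onto Python's `argparse`. The sketch below mirrors the table; it is not taken from `baryon.py`, and the names `build_parser` and `resolve_bala_path` are hypothetical.

```python
import argparse

LANGUAGES = ("nextflow", "streamflow", "galaxy", "python", "r", "bash", "all")


def build_parser():
    """Build a parser mirroring the documented Baryon arguments."""
    parser = argparse.ArgumentParser(prog="baryon.py")
    parser.add_argument("bala_file", nargs="?", metavar="BALA_FILE",
                        help="path or name of the .bala file (extension optional)")
    parser.add_argument("--lang", "-l", choices=LANGUAGES,
                        help="target language to generate")
    parser.add_argument("--output", "-n", metavar="SCRIPT_NAME",
                        help="base name for the generated files")
    parser.add_argument("--overwrite", "-w", action="store_true",
                        help="overwrite existing output files without asking")
    parser.add_argument("--generate_function", "-f", action="store_true",
                        help="emit Python, R, and Bash output as functions")
    return parser


def resolve_bala_path(name):
    """Append .bala when the extension was omitted."""
    return name if name.lower().endswith(".bala") else name + ".bala"


args = build_parser().parse_args(["HTGTS_Full", "--lang", "all", "-f"])
print(resolve_bala_path(args.bala_file), args.lang, args.generate_function)
```

Leaving `bala_file` and `--lang` without defaults lets the caller detect missing values (`None`) and fall back to interactive prompts, as described above.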
```bash
# Fully interactive: Baryon will prompt for file and language
python baryon.py

# Specify the .bala file; choose language interactively
python baryon.py HTGTS_Full

# Generate only the R script
python baryon.py HTGTS_Full --lang r

# Generate all scripts with a custom output base name
python baryon.py HTGTS_Full --lang all --output my_analysis

# Regenerate without confirmation prompts
python baryon.py HTGTS_Full --lang python --overwrite

# Generate scripts as functions instead of standalone scripts
python baryon.py HTGTS_Full --lang python --generate_function
python baryon.py HTGTS_Full --lang all -f
```

Note: The `--overwrite` flag only checks and removes files relevant to the selected language. For example, `--lang python` will only look for an existing `<name>.py` file, while `--lang all` checks all possible output files.

Note: The `--generate_function` flag applies only to `python`, `r`, and `bash` targets. When used with `--lang all`, `nextflow`, `streamflow`, and `galaxy` scripts are generated normally as standalone workflows.
```bash
# HTGTS Full analysis
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" -w /workflow \
  streamflow-unito-local streamflow run HTGTS_Full.yml

# TopX filtering
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" -w /workflow \
  streamflow-unito-local streamflow run topX.yml

# Bulk RNA-Seq
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" -w /workflow \
  streamflow-unito-local streamflow run index_align_bulk_rna_seq.yml

# Single-Cell RNA-Seq
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" -w /workflow \
  streamflow-unito-local streamflow run index_align_scrs.yml

# Sample sheet conversion
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "${PWD}:/workflow" -w /workflow \
  streamflow-unito-local streamflow run sample_sheetToLibInfo.yml
```