expready is a CLI tool that checks whether study inputs are analysis-ready before downstream workflows.
It focuses on metadata quality, design quality, and sample-ID consistency across files.
pip install expreadyvalidate: check your inputs and create an HTML report.fix: clean common formatting issues in metadata/manifest files.
Supported formats for all inputs: .csv, .tsv, .txt.
Delimiter handling is flexible: expready auto-detects common delimiters (comma, tab, semicolon, pipe, or whitespace-separated columns).
All input files must include a header row (column names in the first row). Headerless files are not supported.
metadata file (--metadata):
- One row per sample.
- Required columns:
- Metadata sample-ID column (default
sample_id, or the column passed via--sample-id/--metadata-id) - Condition column (default
condition; if defaultconditionis not found,treatmentis tried; or use--condition)
- Metadata sample-ID column (default
- If you pass
--batch,--pair, or--covars, those columns must exist in metadata. - Metadata sample-ID values should be unique and non-empty.
matrix file (--matrix):
- Feature-by-sample table (rows = features, sample IDs in columns).
- Sample column names should match metadata sample-ID values exactly when both files are used.
- If metadata is not provided,
expreadyinfers metadata from matrix sample columns. - Caveat: matrix-only runs are supported, but inferred metadata can weaken design checks; use a real metadata file for more reliable design validation.
- Common annotation headers such as
gene_id,feature_id, or#OTU IDare supported as non-sample columns.
manifest file (--manifest):
- Sample inventory table (for example, sample ID + file path columns).
- Must contain the sample-ID column specified by
--sample-id/--manifest-id(defaultsample_id). - Sample IDs in this column should match metadata sample-ID values exactly.
Metadata (metadata.csv)
sample_id,condition,batch
S1,Control,B1
S2,Control,B1
S3,Treated,B2
S4,Treated,B2Matrix (matrix.tsv)
gene_id S1 S2 S3 S4
GeneA 10 12 4 6
GeneB 0 1 8 9Manifest (manifest.tsv)
sample_id file_path
S1 /data/S1.fastq.gz
S2 /data/S2.fastq.gz
S3 /data/S3.fastq.gz
S4 /data/S4.fastq.gzUse these as starting points for common experiment types.
Bulk RNA-seq (metadata + count matrix):
expready validate --metadata metadata.csv --matrix counts.tsv --condition condition --output reports/rnaseqMicrobiome / 16S (feature table + metadata):
expready validate --metadata metadata.tsv --matrix feature_table.tsv --condition group --output reports/microbiomeMetabolomics (intensity matrix with batch effect checks):
expready validate --metadata metadata.csv --matrix intensities.csv --condition condition --batch batch --output reports/metabolomicsPaired or blocked design (for paired samples/subjects):
expready validate --metadata metadata.csv --matrix counts.tsv --condition condition --pair pair_id --output reports/pairedManifest consistency check (sample inventory + paths):
expready validate --metadata metadata.csv --manifest manifest.tsv --output reports/manifest_checkPass example (metadata + matrix):
expready validate --metadata examples/metadata_valid.csv --matrix examples/matrix_valid.tsv --output reports/test_pass --report pass_reportExpected:
- Console shows
Status: PASS - Writes
reports/test_pass/pass_report.html
Fail example (matrix only):
expready validate --matrix examples/matrix_valid.tsv --output reports/test_fail --report fail_reportExpected:
- Console may show
Status: FAIL(this is expected for this demo path) - Writes
reports/test_fail/fail_report.html - Writes
reports/test_fail/metadata.inferred.csv
validate: requires at least one of--metadataor--matrixfix: requires at least one of--metadata,--matrix, or--manifest- Supported tabular file formats:
.csv,.tsv,.txt(delimiter is auto-detected)
--metadata FILE: metadata table path.--sample-id COLUMN: shared sample-ID column name for metadata and manifest (default:sample_id).--metadata-id COLUMN: metadata sample-ID column name (overrides--sample-idfor metadata).--condition COLUMN: main grouping column (default:condition, matched case-insensitively; if defaultconditionis not found,treatmentis tried).--batch COLUMN: optional batch column.--pair COLUMN: optional pair/block column.--covars COLS...: optional covariate columns (space-separated names).--contrast A_vs_B: optional contrast inGroupA_vs_GroupBformat.
--matrix FILE: feature-by-sample matrix path.- Sample IDs are read from matrix sample columns.
- If
--metadatais omitted, metadata is inferred from matrix sample columns.
--manifest FILE: manifest table path.--manifest-id COLUMN: manifest sample-ID column name (overrides--sample-idfor manifest).--manifest-path COLUMN: manifest column containing file paths.--check-paths: check whether manifest paths exist on disk (uses--manifest-pathor common names likefile_path).- Used for cross-file sample-ID consistency checks against metadata.
Other command options:
--report NAME: output report filename forvalidate(.htmlis added if omitted).--format FMT: fixed-table output format forfix(tsvorcsv, default:tsv).
Writes:
report.html(or your--reportfilename)metadata.inferred.csv(only when--metadatais omitted)
Behavior:
- Provide at least one of
--metadataor--matrix. - Input-contract errors fail fast and do not write a report (for example: missing required columns requested by CLI options, inconsistent delimiters, or invalid manifest sample/path column settings).
- If you provide only
--matrix, expready builds metadata from matrix sample columns and saves it asmetadata.inferred.csv. - Matrix-only validation is useful for quick checks, but inferred metadata depends on sample-name patterns and may reduce design-check accuracy.
- Sample-ID and condition column-name matching is case-insensitive, and treats
_,-, and spaces as equivalent for matching. - If you provide
--manifest, expready compares metadata sample-ID values (from--sample-idor--metadata-id) to the manifest column set by--sample-idor--manifest-id. --sample-idsets a shared default;--metadata-idand--manifest-idoverride it per file.- If the manifest sample column is missing, validation exits with an input error and does not generate a report.
validateexpects sample IDs to match exactly across files.
Examples:
# matrix-only validation (metadata inferred automatically)
expready validate --matrix counts.tsv --output reports/validate_matrix_only
# metadata + matrix validation
expready validate --metadata metadata.csv --matrix counts.tsv --output reports/validate_meta_matrix
# metadata + manifest validation (manifest column is named "rownames")
expready validate --metadata metadata.csv --manifest manifest.tsv --manifest-id rownames --output reports/validate_meta_manifestWrites:
metadata.fixed.<fmt>(if--metadatais provided, or inferred from--matrix)manifest.fixed.<fmt>(if manifest is provided)fix.log
Behavior:
--metadatais optional.- With
--matrixand no--metadata, metadata is inferred and saved tometadata.fixed.<fmt>. - With only
--manifest, nometadata.fixed.<fmt>is written. --formatcontrols fixed table format and extension (tsv->.tsv,csv->.csv).fixdoes not map different sample-ID schemes. It only does safe cleanup (trim spaces, standardize empty-like values, remove fully empty rows).
Examples:
# metadata fixed + manifest fixed + fix.log
expready fix --metadata metadata.csv --manifest manifest.tsv --output reports/fix_meta_manifest
# metadata inferred from matrix, then fixed + fix.log
expready fix --matrix counts.tsv --output reports/fix_from_matrix
# only manifest fixed + fix.log
expready fix --manifest manifest.tsv --output reports/fix_manifest_only
# optional: write fixed tables as CSV instead of the default TSV
expready fix --metadata metadata.csv --manifest manifest.tsv --output reports/fix_csv --format csvWhat each output file means:
report.html: main validation report with status, issue list, and suggested fixes.metadata.inferred.csv: metadata generated from matrix sample columns (only when metadata input is omitted).metadata.fixed.<fmt>: cleaned metadata written byfix(fmtistsvby default, orcsvif selected).manifest.fixed.<fmt>: cleaned manifest written byfix(fmtistsvby default, orcsvif selected).fix.log: summary of whatfixchanged (empty-like values standardized, fully empty rows removed, headers normalized/skipped).
How to read validation status:
PASS: no blocking issues were found.FAIL: at least one blocking issue was found.
How to prioritize issues in report.html:
Extreme: blocking issue; fix these first.Moderate: non-blocking but important quality risk.None: informational check passed/no action required.
Issue sections:
Metadata: schema and sample-ID quality checks.Design: group structure and model-readiness checks.Cross-file: sample-ID consistency across metadata, matrix, and manifest.
Common report language and what it means:
Blocking issue: an issue severe enough to set overall status toFAIL.Some metadata sample IDs are missing in the matrix: sample IDs exist in metadata but are not found in matrix sample columns.Some matrix sample IDs are not listed in metadata: sample IDs exist in matrix columns but not in metadata.Some metadata sample IDs are missing in the manifest: sample IDs exist in metadata but are not found in the manifest sample-ID column.Manifest sample-ID column was not found: the column passed via--manifest-iddoes not exist in manifest.Manifest path column was not found: the column passed via--manifest-pathdoes not exist in manifest.Header names contain spaces: non-blocking warning; runfixto normalize headers with underscores.Input file appears to have inconsistent delimiters: rows do not have a consistent column structure (often caused by mixed tabs/spaces/commas). In CLIvalidate, this is treated as an input error and the run exits before report generation.Duplicate sample IDs: the same metadata sample-ID value appears in more than one metadata row.Required metadata fields are empty: required columns (like metadata sample ID or condition) contain missing values.Some condition groups have too few replicates: at least one condition group has fewer than 2 samples.Condition and batch are fully linked: condition and batch are one-to-one, so their effects cannot be separated.A category value appears only once: a value in condition, batch, or covariates appears for only one sample.Model setup is too complex for the sample count: estimated model terms are too many for available samples.
expready --help
expready validate --help
expready fix --help