Add validation scripts for pipeline integrity checks #12

Open
eddUG wants to merge 3 commits into main from pipeline_validation
Conversation

@eddUG eddUG commented Jul 16, 2024

Owner

Hi @ahernank, here's a set of validation scripts that check that each step of the pipeline functions as intended. The approach is modular: each pipeline task gets its own validation script, written in whichever scripting language best suits that module. Specifically, the validation covers input file accessibility, integrity, and format, as well as output file format and association with the corresponding inputs. The goal is to improve the reliability and accuracy of our data processing pipeline.

Details of Changes:

  • Input file accessibility: The scripts include functions to verify that all required input files are present and accessible.
  • Input file integrity and format: The scripts perform checks to ensure that input files are complete, uncorrupted, and in the correct format.
  • Output file format and association: The scripts validate that the output files are correctly formatted and properly associated with their respective input files.
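
To illustrate the kind of input checks described above, here is a minimal Python sketch, not the code in this PR. The function name `check_input_file` and the detail strings are hypothetical, and the integrity test only covers gzip-compressed inputs by reading them through to the end; the actual scripts may check more (or different) things.

```python
import gzip
import os
import zlib
from pathlib import Path


def check_input_file(path):
    """Return (ok, detail) for one input file (hypothetical helper)."""
    p = Path(path)
    # Accessibility: the file must exist and be readable.
    if not p.is_file():
        return False, f"Missing file: {p.name}"
    if not os.access(p, os.R_OK):
        return False, f"Unreadable file: {p.name}"
    # Integrity: an empty file is treated as incomplete.
    if p.stat().st_size == 0:
        return False, f"Empty file: {p.name}"
    # Integrity for gzip inputs: decompress the whole stream so that
    # truncation or corruption raises an error.
    if p.suffix == ".gz":
        try:
            with gzip.open(p, "rb") as handle:
                while handle.read(1 << 20):
                    pass
        except (OSError, EOFError, zlib.error) as exc:
            return False, f"Corrupt gzip file: {p.name} ({exc})"
    return True, ""
```

Returning a `(ok, detail)` pair rather than raising keeps each check easy to call on its own and easy to collect into a summary.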

Modes of Operation:

  • Interactive mode: Users can run each module validation individually. This mode is useful for debugging and verifying specific parts of the pipeline.
  • Non-Interactive mode: Users can run all module validations in a batch using the all_in_one_validation script. This mode is efficient for full pipeline checks.
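
As a rough sketch of how the two modes relate (an assumption about the design, not the actual all_in_one_validation script), each module validation can be modelled as a callable returning `(ok, detail)`. Interactive use calls one of them directly; batch use loops over all of them and keeps going after a failure. The name `run_validations` is hypothetical.

```python
def run_validations(checks):
    """Run every (name, check) pair and collect (name, ok, detail) tuples.

    `checks` is a list of (name, callable) pairs; each callable takes no
    arguments and returns (ok, detail). A crash in one check is recorded
    as a failure so the remaining checks still run.
    """
    results = []
    for name, check in checks:
        try:
            ok, detail = check()
        except Exception as exc:
            ok, detail = False, f"Unexpected error: {exc}"
        results.append((name, ok, detail))
    return results
```

Running a single module in isolation then amounts to calling its check function by itself, which matches the debugging use case of the interactive mode.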

We aim to catch and address potential data processing issues early in the workflow to improve the overall robustness of our pipeline.

Resolves #11

@eddUG eddUG assigned eddUG and ahernank and unassigned eddUG Jul 16, 2024
@eddUG eddUG requested a review from ahernank July 16, 2024 20:54
eddUG commented Jul 16, 2024

Owner Author

Validation report:
In addition to performing the checks, the all_in_one_validation script generates a plain-text report (validation_report.txt). The report gives the status of each module validation and, where a check fails, a hint about the issue. See the sample validation report below:

Validation Checklist Report
===========================
Validate Input Manifest: Success
------------------------------
Validate SelectVariants Output: Success
------------------------------
Validate Bgzip and Tabix: Success
------------------------------
Validate WhatsHap Phase: Success
------------------------------
Validate WhatsHap Stats: Success
------------------------------
Validate MergeVcfs: Failed
  Details: Missing file: merged.vcf.gz
------------------------------
Validate BgzipAndTabixII: Success
------------------------------
Validate SHAPEIT: Success
------------------------------
Validate TabixII: Success
------------------------------
Validate LigateRegions: Success
------------------------------
Validate cohortVcfToZarr: Success
------------------------------
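
For reference, a report in the layout shown above could be produced along these lines. This is a hedged sketch only: `format_report` is a hypothetical name, the input is assumed to be a list of `(name, ok, detail)` tuples, and the separator widths are read off the sample rather than taken from the script.

```python
def format_report(results):
    """Render (name, ok, detail) tuples in the sample report layout."""
    lines = ["Validation Checklist Report", "=" * 27]
    for name, ok, detail in results:
        status = "Success" if ok else "Failed"
        lines.append(f"Validate {name}: {status}")
        # A hint is only printed for failed checks that supplied one.
        if not ok and detail:
            lines.append(f"  Details: {detail}")
        lines.append("-" * 30)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        ("Input Manifest", True, ""),
        ("MergeVcfs", False, "Missing file: merged.vcf.gz"),
    ]
    # Write the report to the file name mentioned in this comment.
    with open("validation_report.txt", "w") as handle:
        handle.write(format_report(sample))
```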

Development

Successfully merging this pull request may close these issues.

Technical validation - Ensuring Pipeline Integrity

2 participants