Add validation scripts for pipeline integrity checks #12
Open
eddUG wants to merge 3 commits into
Hi @ahernank, here's a set of validation scripts developed to ensure that each step of the pipeline functions as intended. We take a modular approach, working through the pipeline task by task and writing each module's validation in whichever scripting language best suits it. Specifically, the validation process includes checks for input file accessibility, integrity, and format, as well as output file format and association. This enhancement aims to improve the reliability and accuracy of our data processing pipeline.
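To illustrate the kinds of check described above, here is a minimal Python sketch. It is not the code in this PR: the function names, the MD5 comparison, the tab-separated format rule, and the `<input stem><suffix>` output naming convention are all assumptions chosen for the example.

```python
import gzip
import hashlib
import os
import zlib
from pathlib import Path


def check_accessible(path):
    """Accessibility: the path is an existing regular file we can read."""
    p = Path(path)
    return p.is_file() and os.access(p, os.R_OK)


def check_integrity(path, expected_md5=None):
    """Integrity: non-empty, gzip files decompress to the end, and the MD5
    digest matches when an expected value is supplied (assumed convention)."""
    p = Path(path)
    if p.stat().st_size == 0:
        return False
    if p.suffix == ".gz":
        try:
            with gzip.open(p, "rb") as fh:
                while fh.read(1 << 20):  # stream in 1 MiB chunks
                    pass
        except (OSError, EOFError, zlib.error):
            return False  # truncated or corrupt archive
    if expected_md5 is not None:
        return hashlib.md5(p.read_bytes()).hexdigest() == expected_md5
    return True


def check_tsv_format(path, required_columns):
    """Format (assumed rule): tab-separated, the header holds the required
    columns, and every non-blank row has as many fields as the header."""
    with open(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        if not set(required_columns).issubset(header):
            return False
        for line in fh:
            if line.strip() and len(line.rstrip("\n").split("\t")) != len(header):
                return False
    return True


def check_output_association(input_path, output_path, suffix):
    """Association (hypothetical naming convention): the output is named
    <input stem><suffix>, exists, and is not older than its input."""
    inp, out = Path(input_path), Path(output_path)
    return (
        out.name == inp.stem + suffix
        and out.is_file()
        and out.stat().st_mtime >= inp.stat().st_mtime
    )
```

The accessibility check is meant to run first: `check_integrity` and `check_output_association` call `stat()` on the input and would raise `FileNotFoundError` for a missing file rather than returning `False`.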
Details of Changes:
- Input file accessibility: The scripts include functions to verify that all required input files are present and accessible.
- Input file integrity and format: The scripts perform checks to ensure that input files are complete, uncorrupted, and in the correct format.
- Output file format and association: The scripts validate that the output files are correctly formatted and properly associated with their respective input files.

Modes of Operation:
- Interactive mode: Users can run each module's validation individually. This mode is useful for debugging and verifying specific parts of the pipeline.
- Non-interactive mode: Users can run all module validations in a batch using the `all_in_one_validation` script. This mode is efficient for full pipeline checks.

We aim to catch and address potential data-processing issues in the workflow to improve the overall robustness of our pipeline.
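One way the two modes could share a single entry point is sketched below in Python. This is hypothetical and not the interface of `all_in_one_validation`: `MODULES`, the placeholder module names, and the `--module` flag are illustrative assumptions. Passing `--module` runs one validator (the debugging case); omitting it runs every registered validator in a batch.

```python
import argparse
import sys

# Hypothetical registry: module name -> validator returning a list of error
# messages (an empty list means the module passed). Real validators would
# call checks such as input accessibility and output association.
MODULES = {
    "module_a": lambda: [],
    "module_b": lambda: [],
}


def run_modules(registry, names):
    """Run the selected validators and collect {name: [errors]}."""
    return {name: registry[name]() for name in names}


def main(argv=None, registry=None):
    registry = MODULES if registry is None else registry
    parser = argparse.ArgumentParser(description="Pipeline validation runner (sketch)")
    parser.add_argument(
        "--module",
        choices=sorted(registry),
        help="validate one module (for debugging); omit to validate all",
    )
    args = parser.parse_args(argv)
    names = [args.module] if args.module else list(registry)
    failed = False
    for name, errors in run_modules(registry, names).items():
        print(f"[{'FAIL' if errors else 'PASS'}] {name}")
        for err in errors:
            print(f"    - {err}")
        failed = failed or bool(errors)
    return 1 if failed else 0  # non-zero exit status signals a failed check


if __name__ == "__main__":
    sys.exit(main())
```

For example, a hypothetical `python run_validation.py --module module_a` would check one module, while `python run_validation.py` would check them all. Returning a non-zero exit status lets a batch job or CI step fail automatically when any module reports errors.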
Resolves #11