Stage modularity, round 2 #418

@mwhamgenomics

At the moment, pipelines are split into stages but handling of intermediate files is messy and it's hard to modify pipelines. It would be better if:

  • stages take generic parameters for input/output files, not predetermined file paths
  • file paths are defined by the Pipeline object, or whatever the stages are being used by
  • no wildcards - apart from date-stamped files in BCBio, we should know what every file will be called
    • this might make output_files.yaml redundant
  • checks for whether a stage should run use the presence of the stage in the reporting app in addition to, not instead of, the presence of its files
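
As a rough illustration of the points above, here is a minimal sketch of a stage that only knows about generic input/output mappings, with all file paths decided by whatever builds the pipeline. Everything in it is hypothetical: the `Stage` fields, `build_pipeline`, the file names and the `hypothetical_aligner`/`hypothetical_caller` commands are placeholders, not the current API. The `should_run` rule (skip only when the stage is reported *and* its outputs exist) is one assumed reading of the last bullet.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Set

Files = Dict[str, Path]


@dataclass
class Stage:
    """Hypothetical lightweight stage: no Dataset access, no hard-coded paths."""
    name: str
    inputs: Files
    outputs: Files
    build_command: Callable[[Files, Files], str]

    def command(self) -> str:
        # The bash command is derived purely from the files the stage was given.
        return self.build_command(self.inputs, self.outputs)

    def should_run(self, reported_stages: Set[str]) -> bool:
        # Assumed rule: skip only if the reporting app lists the stage
        # AND every expected output file is present on disk.
        reported = self.name in reported_stages
        outputs_present = all(p.exists() for p in self.outputs.values())
        return not (reported and outputs_present)


def build_pipeline(job_dir: Path, sample_id: str) -> List[Stage]:
    """The pipeline, not the stages, names every file explicitly (no wildcards)."""
    fastq = job_dir / f"{sample_id}.fastq.gz"
    bam = job_dir / f"{sample_id}.bam"
    vcf = job_dir / f"{sample_id}.vcf.gz"
    return [
        Stage("align", {"fastq": fastq}, {"bam": bam},
              lambda i, o: f"hypothetical_aligner --in {i['fastq']} --out {o['bam']}"),
        Stage("call_variants", {"bam": bam}, {"vcf": vcf},
              lambda i, o: f"hypothetical_caller --in {i['bam']} --out {o['vcf']}"),
    ]
```

Because one stage's output path is the same object as the next stage's input path, rewiring a pipeline would only mean changing `build_pipeline`, not the stages themselves.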

We should also be able to mock a dataset, patch executor, run a pipeline and assert what all the bash commands were.
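
A test along those lines might look roughly like the sketch below, using `unittest.mock` from the standard library. The `executor` namespace, `run_pipeline`, the `sample_id` attribute and the `step_one`/`step_two` commands are all stand-ins invented for the example; the real executor module and pipeline entry point would be patched in their place.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch


def _real_execute(cmd):
    # Stand-in for the real executor, which would submit a job.
    raise RuntimeError("would run: " + cmd)


# Hypothetical stand-in for the executor module used by the pipeline.
executor = SimpleNamespace(execute=_real_execute)


def run_pipeline(dataset):
    """Hypothetical two-step pipeline that sends bash commands to the executor."""
    sample = dataset.sample_id
    executor.execute(f"step_one --in {sample}.fastq.gz --out {sample}.bam")
    executor.execute(f"step_two --in {sample}.bam --out {sample}.vcf.gz")


def recorded_commands(dataset):
    """Run the pipeline with the executor patched and return every command issued."""
    with patch.object(executor, "execute") as mocked_execute:
        run_pipeline(dataset)
    # Each call's first positional argument is the bash command string.
    return [call.args[0] for call in mocked_execute.call_args_list]


# Mock a dataset: only the attributes the pipeline reads need to be set.
mock_dataset = MagicMock()
mock_dataset.sample_id = "sample_1"

commands = recorded_commands(mock_dataset)
assert commands == [
    "step_one --in sample_1.fastq.gz --out sample_1.bam",
    "step_two --in sample_1.bam --out sample_1.vcf.gz",
]
```

With no wildcards in the file names, the full list of expected commands can be written out literally in the assertion.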

We should also make Stage objects as lightweight as possible, ideally removing their access to Dataset - see #395 for reasons why.

This might be a good opportunity to look at sciluigi.
