28 input file parser and classes #30

Open
marrip wants to merge 5 commits into develop from 28-input-file-parser-and-classes

@marrip marrip commented May 27, 2026

I have started implementing three classes for the different types of input files, plus a parent class that identifies whether the data is dragen or localapp. I also added a parser function, but for now it just reads the files as-is and does not do any real parsing yet. I can continue with the parsing while you have a look at this first draft of the implementation.

@marrip marrip requested review from danielvo and tinavisnovska May 27, 2026 12:17
@marrip marrip self-assigned this May 27, 2026
Comment thread src/tsoppy/general/input_classes.py Outdated
@@ -0,0 +1,153 @@
import os
from pathlib import Path
from typing import Dict, Optional

Is this use of `typing.Dict` deprecated now?

Contributor Author
Good catch, replaced with `Mapping`.
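
For context, `typing.Dict` has been a deprecated alias since Python 3.9 (PEP 585), and a read-only argument is commonly annotated with `collections.abc.Mapping` instead. A minimal sketch of such an annotation follows; the helper `workflow_names` and the placeholder values are hypothetical and not taken from the PR:

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Optional


def workflow_names(subpath_formats: Optional[Mapping[str, str]] = None) -> list[str]:
    """Hypothetical helper: list the workflow names of an optional mapping.

    Mapping only promises read access, so callers may pass a dict or any
    other read-only mapping type.
    """
    return sorted(subpath_formats) if subpath_formats else []


# A plain dict and a read-only proxy are both valid Mapping arguments.
print(workflow_names({"localapp": "b", "dragen": "a"}))
print(workflow_names(MappingProxyType({"dragen": "a"})))
```

Annotating with `Mapping` rather than `dict` signals that the function does not intend to modify the argument.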

@marrip marrip force-pushed the 28-input-file-parser-and-classes branch from 33b4597 to de6026e Compare May 27, 2026 13:21
self.sample = sample
self.root = Path(root_path)
if subpath_formats:
    self.subpath_formats.update(subpath_formats)

@danielvo danielvo May 27, 2026

Trying my best to catch up on classes! If I understand correctly, when `subpath_formats` is passed as an argument while instantiating a `BaseInput` object, the `subpath_formats` class attribute should be set from it. May I ask where the `update()` function comes from? I assume this code is meant to set the attribute value from the corresponding argument, but I cannot see how that is done.
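
To put the `update()` question in code: `update()` is a built-in method of Python's `dict` that merges the keys of another mapping into the dict in place. The sketch below assumes `subpath_formats` is a plain dict defined at class level, which the diff excerpt does not show; the class names `SharedInput` and `CopiedInput` and the placeholder values are hypothetical. It also shows a side effect that may be worth checking in the PR: calling `update()` on a class-level dict changes it for every instance.

```python
class SharedInput:
    # Class-level dict: a single object shared by all instances.
    subpath_formats = {"dragen": "default_a", "localapp": "default_b"}

    def __init__(self, subpath_formats=None):
        if subpath_formats:
            # dict.update() merges the given keys in place. Because
            # self.subpath_formats resolves to the class attribute here,
            # the shared dict itself is modified.
            self.subpath_formats.update(subpath_formats)


class CopiedInput:
    default_subpath_formats = {"dragen": "default_a", "localapp": "default_b"}

    def __init__(self, subpath_formats=None):
        # Copy the defaults first so each instance owns its own dict.
        self.subpath_formats = dict(self.default_subpath_formats)
        if subpath_formats:
            self.subpath_formats.update(subpath_formats)


first = SharedInput({"dragen": "custom"})
second = SharedInput()
print(second.subpath_formats["dragen"])  # prints "custom": the override leaked

third = CopiedInput({"dragen": "custom"})
fourth = CopiedInput()
print(fourth.subpath_formats["dragen"])  # prints "default_a": defaults untouched
```

If the actual class already copies its defaults before updating, the first pattern does not apply and only the merge behaviour of `update()` is relevant.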

Subclasses should define `default_subpath_formats` mapping workflow names to
subpath format strings that accept `(sample, sample)` for formatting.

Attributes:

If I can nitpick on the form side, I would suggest some slight attribute naming changes:

sample -> sample_id
type -> workflow_version

(these would make the attribute names more compatible with the Illumina input/output nomenclature)

raise FileNotFoundError(
    f"No workflow file found for sample {self.sample}. Searched: {self.paths}"
)
self.type = found[0]

@danielvo danielvo May 27, 2026

I like that the pipeline determines by itself what generated its input. I just wonder whether doing this repeatedly, for each input file, is the way to go: some input files have the same path in both pipelines, and it seems unnecessary to make the same determination many times. Should we instead aim for one piece of code that always detects the workflow version from a single specific indicator in the root directory (e.g., the workflow version specified in the metrics output file)? Based on that, the correct path for a particular input file could be picked from the `default_subpath_formats` dictionary. At that point, one should still double-check that the expected file is indeed there: if a sample analysis fails, for example because there are too few reads for the sample, some output files might be missing even when the workflow version has been determined correctly.
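
This suggestion could be sketched roughly as follows. Everything specific here is an assumption for illustration: the marker file names, the subpath format strings, and the function names `detect_workflow` and `resolve_input` are hypothetical, and a real implementation might read the workflow version from the metrics output file rather than look for marker files.

```python
from pathlib import Path

# Hypothetical marker files, one per workflow; not the real output layout.
WORKFLOW_MARKERS = {
    "dragen": "dragen_marker.txt",
    "localapp": "localapp_marker.txt",
}

# Hypothetical subpath formats, each formatted with (sample, sample).
DEFAULT_SUBPATH_FORMATS = {
    "dragen": "Results/{}/{}.metrics.tsv",
    "localapp": "Logs/{}/{}.metrics.tsv",
}


def detect_workflow(root: Path) -> str:
    """Determine the workflow once per root directory from a single indicator."""
    found = [
        name
        for name, marker in WORKFLOW_MARKERS.items()
        if (root / marker).is_file()
    ]
    if len(found) != 1:
        raise ValueError(f"Expected exactly one workflow marker in {root}, found: {found}")
    return found[0]


def resolve_input(root: Path, workflow: str, sample: str) -> Path:
    """Pick the path for the detected workflow and verify the file exists."""
    path = root / DEFAULT_SUBPATH_FORMATS[workflow].format(sample, sample)
    if not path.is_file():
        # A failed sample analysis may leave files missing even though
        # the workflow itself was detected correctly.
        raise FileNotFoundError(
            f"Expected {workflow} input for sample {sample} at {path}"
        )
    return path
```

With this split, `detect_workflow` runs once per root directory, and every input class reuses its result through `resolve_input`, which keeps the per-file existence check.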
