28 input file parser and classes by marrip · Pull Request #30 · InPreD/tsoppy

marrip · 2026-05-27T12:17:18Z

I have started implementing three classes for the different types of input files, plus a parent class that identifies whether the data comes from dragen or localapp. I also added a parser function, but for now it just reads the different files as is and doesn't do any actual parsing. I can continue with the parsing while you have a look at this first draft of the implementation.

danielvo · 2026-05-27T12:47:16Z

@@ -0,0 +1,153 @@
+import os
+from pathlib import Path
+from typing import Dict, Optional


Is this use now deprecated?

good catch, replaced with Mapping
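
For context, a minimal sketch of the kind of annotation change discussed here, assuming Python 3.9+. `typing.Dict` has been a deprecated alias since Python 3.9; the built-in `dict[...]` or the read-only `collections.abc.Mapping` can be used in annotations instead. The function name `resolve_formats` is hypothetical and is not taken from the PR.

```python
from collections.abc import Mapping
from typing import Optional


def resolve_formats(
    defaults: Mapping[str, str],
    overrides: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Hypothetical helper: Mapping marks both arguments as read-only inputs."""
    # Copy into a fresh dict so that neither argument is modified.
    resolved = dict(defaults)
    if overrides:
        resolved.update(overrides)
    return resolved
```

Annotating parameters with `Mapping` also lets callers pass any mapping type, not only a plain `dict`.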

danielvo · 2026-05-27T13:36:00Z

+        self.sample = sample
+        self.root = Path(root_path)
+        if subpath_formats:
+            self.subpath_formats.update(subpath_formats)


Trying my best to catch up on classes! If I understand correctly, if "subpath_formats" was provided as an argument when instantiating a BaseInput object, the subpath_formats class attribute should be set. May I ask where the "update()" function comes from? I assume this code is meant to set the attribute value based on the corresponding argument, but I cannot see how it is done.
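
As background to this question: `update()` is a built-in method of Python's `dict` that merges another mapping into the dictionary in place. The sketch below illustrates one thing worth checking, under the assumption (not confirmed by the diff excerpt) that `subpath_formats` is defined on the class: in that case `self.subpath_formats.update(...)` changes the dictionary shared by all instances. Both class names and the format strings are hypothetical.

```python
class SharedFormats:
    # Class attribute: one dict object shared by every instance.
    subpath_formats = {"dragen": "default/{}/{}.vcf"}

    def __init__(self, subpath_formats=None):
        if subpath_formats:
            # dict.update() mutates the shared class-level dict in place.
            self.subpath_formats.update(subpath_formats)


class CopiedFormats:
    default_subpath_formats = {"dragen": "default/{}/{}.vcf"}

    def __init__(self, subpath_formats=None):
        # Copy the defaults first so each instance owns its own dict.
        self.subpath_formats = dict(self.default_subpath_formats)
        if subpath_formats:
            self.subpath_formats.update(subpath_formats)
```

With the first variant, an override passed to one object is also visible in objects created later; the second variant keeps overrides local to the instance.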

danielvo · 2026-05-27T13:51:59Z

+    Subclasses should define `default_subpath_formats` mapping workflow names to
+    subpath format strings that accept `(sample, sample)` for formatting.
+
+    Attributes:


If I can nitpick on the form side, I would suggest some slight attribute naming changes:

sample -> sample_id
type -> workflow_version

(these would make the attribute names more compatible with the Illumina input/output nomenclature)

danielvo · 2026-05-27T14:33:48Z

+            raise FileNotFoundError(
+                f"No workflow file found for sample {self.sample}. Searched: {self.paths}"
+            )
+        self.type = found[0]


I like that the pipeline determines by itself what generated its input. I just wonder whether doing it repeatedly, for each input file, is the way to go: some input files have the same path in both pipelines, and repeating the determination seems unnecessary. Should we instead aim for one piece of code that always detects the workflow version from a single specific indicator in the root directory (e.g., the workflow version specified in the metrics output file), and then picks the correct path for a particular input file from the default_subpath_formats dictionary based on that? (At that point, one should double-check that the expected file is indeed there: if a sample analysis fails, for example because the sample has too few reads, some output files might be missing even when the workflow version was determined correctly.)
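
A rough sketch of what this suggestion could look like, to make the idea concrete. Everything here is an assumption for illustration: the marker-file detector stands in for whatever indicator is chosen (e.g. the metrics output file), and the class name, directory layouts and file kinds are made up rather than taken from the PR.

```python
from pathlib import Path

# Hypothetical layouts: file kind -> workflow name -> subpath format.
# Each format string accepts (sample, sample), as in the docstring above.
SUBPATH_FORMATS = {
    "vcf": {
        "dragen": "dragen_out/{}/{}.vcf",
        "localapp": "localapp_out/{}/{}.vcf",
    },
}


def detect_workflow(root: Path) -> str:
    """Hypothetical detector: looks for a made-up marker file in the root."""
    for workflow in ("dragen", "localapp"):
        if (root / f"{workflow}.marker").is_file():
            return workflow
    raise FileNotFoundError(f"No workflow indicator found in {root}")


class SampleInputs:
    def __init__(self, sample_id, root_path, detect=detect_workflow,
                 subpath_formats=SUBPATH_FORMATS):
        self.sample_id = sample_id
        self.root = Path(root_path)
        # The workflow is determined once per sample, not once per file.
        self.workflow = detect(self.root)
        self.subpath_formats = subpath_formats

    def path_for(self, file_kind: str) -> Path:
        fmt = self.subpath_formats[file_kind][self.workflow]
        path = self.root / fmt.format(self.sample_id, self.sample_id)
        # A failed analysis may leave files missing even if detection worked.
        if not path.is_file():
            raise FileNotFoundError(
                f"Expected {file_kind} file for sample {self.sample_id} "
                f"not found: {path}"
            )
        return path
```

Passing the detector in as an argument keeps the indicator choice in one place, so swapping the marker file for a metrics-file lookup would not touch the per-file classes.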

marrip added 4 commits May 27, 2026 13:42

feat: add classes for vcf, tmb trace and nirvana json

134267d

chore: add deps

59643a7

chore: sort imports

5a43584

chore: lint

48cee5f

marrip requested review from danielvo and tinavisnovska May 27, 2026 12:17

marrip self-assigned this May 27, 2026

danielvo reviewed May 27, 2026


chore: replace Dict with Mapping

de6026e

marrip force-pushed the 28-input-file-parser-and-classes branch from 33b4597 to de6026e Compare May 27, 2026 13:21

danielvo reviewed May 27, 2026


marrip wants to merge 5 commits into develop from 28-input-file-parser-and-classes
