Skip to content

feat(prepro): make tests more readable#6790

Open
anna-parker wants to merge 10 commits into
display-name-fallbackfrom
prepro_tests2
Open

feat(prepro): make tests more readable#6790
anna-parker wants to merge 10 commits into
display-name-fallbackfrom
prepro_tests2

Conversation

@anna-parker

@anna-parker anna-parker commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

While reviewing #6774 I had a lot of trouble reading the tests, I think having clear test cases with descriptive names of what the test is checking makes this a lot easier to parse (or grok as @corneliusroemer would say). I got claude to reformat these tests like the other prepro test Case objects.

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable

@claude claude Bot added the preprocessing Issues related to the preprocessing component label Jun 29, 2026
input_fields,
args,
)
_CONCATENATE_CASES = [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_CONCATENATE_CASES = [
concatenate_case_definitions = [

Maybe this? To be more consistent with test_case_definitions etc. in the same file

assert res.datum == "DENV-1/Switzerland/mySample/2025"
assert res_insdc.datum == "DENV-1/Switzerland/mySample/2025"
assert res_prefix.datum == "hYF/Switzerland/mySample/2025"
_DISPLAY_NAME_CASES = [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_DISPLAY_NAME_CASES = [
display_name_case_definitions = [

"submissionId",
"sampleCollectionDate",
]
_DISPLAY_NAME_REGEX = r"^[^\/][^/]*/[^/]+/(?P<identifier>[^/]+)/\d{4}(?:-\d{2}){0,2}$"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but idk why claude likes this _UPPERCASE_WITH_LEADING_UNDERSCORE so much. I'm not the biggest fan and it's also inconsistent with other variable definitions in the same file e.g.,

Could we refactor these?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbf capitalising constants is actually pretty common and considered good practice I think, the leading underscore is weird though, I will remove it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair, I don't mind the capitalization either really it's mostly the leading underscore

Comment thread preprocessing/nextclade/tests/test_metadata_processing_functions.py

@maverbiest maverbiest left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, these tests had really turned into a mess.

I left some small comments but overall I like this approach much more!

if identifier is None:
identifier = parse_identifier_string(submission_id, insdc_ingested, pattern)
if insdc_ingested:
# For INSDC ingested sequences the submissionId is the INSDC accession

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maverbiest I realized now that the logic of the PR should maybe be different? I thought we did not want to fallback to the INSDC accession for ingested sequences

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmm I don't remember exactly what we decided on this. I'll check on Slack to see what people's thoughts are

None if has_display_name_separators(str(collector_id)) else str(collector_id)
)
else:
identifier = parse_identifier_string(str(collector_id), str(regex_pattern))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why the unit test is failing, I think: if regex_pattern is None, str(None) -> "None"

We need to narrow to str | None from ArgValue and then check for None before passing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(might be wrong though, need to take a closer look)

@maverbiest maverbiest left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're now anyway doing a big refactor of the prepro code in #6537, could we maybe just keep the changes to tests in this PR?

We can then see how we want to update to logic after rebasing the display-name-fallback branch

@anna-parker

Copy link
Copy Markdown
Contributor Author

@maverbiest, yes! you can cherry-pick the test changes to your PR if you would like :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preprocessing Issues related to the preprocessing component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants