Skip to content

feat(prepro): refactor processing functions to use RawProcessingResult for less verbose error/warning handling#6537

Open
Copilot wants to merge 18 commits into
mainfrom
copilot/refactor-error-warnings-verbosity
Open

feat(prepro): refactor processing functions to use RawProcessingResult for less verbose error/warning handling#6537
Copilot wants to merge 18 commits into
mainfrom
copilot/refactor-error-warnings-verbosity

Conversation

Copilot AI commented May 29, 2026

Copy link
Copy Markdown
Contributor

resolves #3857

processing_functions.py had ~50 instances of verbose ProcessingAnnotation.from_fields(input_fields, [output_field], AnnotationSourceType.METADATA, message=...) boilerplate. Processing functions don't need to know about annotation structure—only the message matters. The caller (call_function) already has all the context to build the full annotation.

Changes

  • datatypes.py: Added RawProcessingResult(datum, warnings: list[str], errors: list[str]) and helpers processing_error(msg) / processing_warning(msg, datum)
  • processing_functions.py:
    • call_function now accepts both ProcessingResult and RawProcessingResult, converting the latter by attaching field/type metadata
    • All 13 processing functions converted to return RawProcessingResult
    • Helper functions (check_latin_characters, missing_taxonomy_service_error, taxonomy_network_error) simplified accordingly
    • Removed dead invalid_value_annotation function
    • 2208 → 1719 lines (−22%)
  • Tests: Updated assertions from .errors[0].message.errors[0] (now plain strings)

Before / After

# Before (6+ lines per error)
return ProcessingResult(
    datum=None, warnings=[],
    errors=[ProcessingAnnotation.from_fields(
        input_fields, [output_field], AnnotationSourceType.METADATA,
        message="Something went wrong.",
    )],
)

# After (1 line)
return processing_error("Something went wrong.")

Screenshot

N/A — backend refactoring only, no UI changes.

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)
    • All 58 preprocessing unit tests pass (excluding test_nextclade_preprocessing.py which requires external nextclade3 binary).

🚀 Preview: https://copilot-refactor-error-wa.loculus.org

Copilot AI linked an issue May 29, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Refactor error/warnings to be less verbose in code Refactor processing functions to use RawProcessingResult for less verbose error/warning handling May 29, 2026
Copilot AI requested a review from corneliusroemer May 29, 2026 10:33
Copilot AI added 2 commits June 30, 2026 09:27
…bose error/warning handling

- Add RawProcessingResult dataclass with plain string warnings/errors
- Add processing_error() and processing_warning() helper functions
- Modify call_function to handle both ProcessingResult and RawProcessingResult
- Convert all processing functions to return RawProcessingResult
- Simplify check_latin_characters and taxonomy helper functions
- Remove unused invalid_value_annotation function
- Update tests to match new return types (errors/warnings are now strings)

This reduces the file from 2208 to 1719 lines by eliminating repetitive
ProcessingAnnotation.from_fields(...) boilerplate in every error/warning.
@anna-parker anna-parker force-pushed the copilot/refactor-error-warnings-verbosity branch from b0a599d to 030c4fc Compare June 30, 2026 07:31
@anna-parker anna-parker marked this pull request as ready for review June 30, 2026 07:31
@anna-parker anna-parker requested a review from maverbiest June 30, 2026 07:33
@anna-parker anna-parker mentioned this pull request Jun 30, 2026
3 tasks
@anna-parker anna-parker changed the title Refactor processing functions to use RawProcessingResult for less verbose error/warning handling feat(prepro): refactor processing functions to use RawProcessingResult for less verbose error/warning handling Jun 30, 2026
@anna-parker

Copy link
Copy Markdown
Contributor

@claude review

@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Claude finished @anna-parker's task in 2m 29s —— View job


Code Review

  • Read changed files
  • Analyze for bugs, correctness, and design issues
  • Post review feedback

This is a clean, well-motivated refactoring. The core idea—separating message construction from annotation metadata attachment—is sound, and the 22% line reduction is genuine rather than cosmetic. A few issues worth addressing:

Issues

1. processing_warning is dead code [datatypes.py:198]

processing_warning is defined in datatypes.py but never imported or used anywhere in the codebase. Either import and use it in the appropriate single-warning sites (e.g. check_authors, build_display_name) or remove it.

Fix this →

2. Unused parameters in check_latin_characters, missing_taxonomy_service_error, and taxonomy_network_error

After the refactoring, input_fields and output_field are accepted by these functions but never used—annotation construction was moved to call_function. The parameters are now dead weight and mislead callers into thinking they affect behaviour.

# check_latin_characters - input_fields and output_field unused
def check_latin_characters(
    authors: str, input_fields: list[str], output_field: str  # ← unused
) -> tuple[list[str], list[str]]:

# missing_taxonomy_service_error - both params unused
def missing_taxonomy_service_error(input_fields: list[str], output_field: str) -> RawProcessingResult:

# taxonomy_network_error - last two params unused
def taxonomy_network_error(subject, action, e, input_fields: list[str], output_field: str) -> RawProcessingResult:

Fix this →

3. Tests bypass call_function — the conversion path is untested

After the refactoring, test assertions changed from .errors[0].message to .errors[0] (plain string). This is correct for direct calls to processing functions (which now return RawProcessingResult), but it means the test suite no longer exercises the conversion path in call_function that wraps each string into a ProcessingAnnotation. If there's a bug in call_function's conversion loop (e.g. wrong AnnotationSourceType, wrong field mapping), no test would catch it.

Consider adding at least one integration-level test that calls ProcessingFunctions.call_function(...) end-to-end and asserts the resulting ProcessingAnnotation.message and field linkage.

Minor observations

parse_date initialises warnings but not errors (line ~432) — intentional since errors are now constructed inline, but slightly inconsistent within the function.

is_variant reads is_above_threshold_result.warnings and .errors directly (lines 1082-1086) — correct since both are now list[str]. Just worth confirming this doesn't silently drop the type-narrowing in a future refactor if someone adds a non-string warning to that inner call.

What's good

  • The separation of concerns is correct: processing functions no longer need to know about AnnotationSource/AnnotationSourceType—only the message matters to them.
  • call_function's isinstance(result, RawProcessingResult) check is placed before the isinstance(result, ProcessingResult) check, which is necessary since they're independent types.
  • build_display_name correctly combines its own list[str] warnings with those from concat_result (also list[str]) before returning.
  • The RawProcessingResult dataclass docstring clearly communicates the ownership contract.

anna-parker and others added 3 commits June 30, 2026 09:44
Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
@anna-parker anna-parker force-pushed the copilot/refactor-error-warnings-verbosity branch from aadee66 to b10a363 Compare June 30, 2026 08:42
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Comment thread preprocessing/nextclade/src/loculus_preprocessing/datatypes.py Outdated
anna-parker and others added 2 commits June 30, 2026 11:59
Co-authored-by: maverbiest <59956362+maverbiest@users.noreply.github.com>
@anna-parker anna-parker requested a review from maverbiest June 30, 2026 10:06
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Comment thread preprocessing/nextclade/tests/test_host_name_validation.py Outdated

@maverbiest maverbiest left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice improvement in readability! I leftsome minor comments here and there but should then be good to merge I think

maverbiest

This comment was marked as duplicate.

anna-parker and others added 5 commits July 2, 2026 08:41
…ude proper information (#6801)

…lude proper information

Follow up to #6537 -
ensure that all internal errors thrown by prepro (these are related to
invalid config, unexpected nextclade format etc.) are properly annotated
with the prefix "Internal Error:" and the suffix "Please contact the
administrator."

### Screenshot

### PR Checklist
- [x] All necessary documentation has been adapted.
- [x] The implemented feature is covered by appropriate, automated
tests.
- [ ] Any manual testing that has been done is documented (i.e. what
exactly was tested?)

🚀 Preview: Add `preview` label to enable

---------

Co-authored-by: maverbiest <59956362+maverbiest@users.noreply.github.com>
@anna-parker anna-parker requested a review from maverbiest July 2, 2026 07:08
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Triggers a deployment to argocd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor error/warnings to be less verbose in code Clearing date filters from the 'active filters' doesn't reset the filter input fields.

4 participants