feat(prepro): refactor processing functions to use RawProcessingResult for less verbose error/warning handling by Copilot · Pull Request #6537 · loculus-project/loculus

Copilot · 2026-05-29T10:18:23Z

resolves #3857

processing_functions.py had ~50 instances of verbose ProcessingAnnotation.from_fields(input_fields, [output_field], AnnotationSourceType.METADATA, message=...) boilerplate. Processing functions don't need to know about annotation structure—only the message matters. The caller (call_function) already has all the context to build the full annotation.

Changes

datatypes.py: Added RawProcessingResult(datum, warnings: list[str], errors: list[str]) and helpers processing_error(msg) / processing_warning(msg, datum)
processing_functions.py:
- call_function now accepts both ProcessingResult and RawProcessingResult, converting the latter by attaching field/type metadata
- All 13 processing functions converted to return RawProcessingResult
- Helper functions (check_latin_characters, missing_taxonomy_service_error, taxonomy_network_error) simplified accordingly
- Removed dead invalid_value_annotation function
- 2208 → 1719 lines (−22%)
Tests: Updated assertions from .errors[0].message → .errors[0] (now plain strings)

Before / After

# Before (6+ lines per error)
return ProcessingResult(
    datum=None, warnings=[],
    errors=[ProcessingAnnotation.from_fields(
        input_fields, [output_field], AnnotationSourceType.METADATA,
        message="Something went wrong.",
    )],
)

# After (1 line)
return processing_error("Something went wrong.")

Screenshot

N/A — backend refactoring only, no UI changes.

PR Checklist

All necessary documentation has been adapted.
The implemented feature is covered by appropriate, automated tests.
Any manual testing that has been done is documented (i.e. what exactly was tested?)
- All 58 preprocessing unit tests pass (excluding test_nextclade_preprocessing.py which requires external nextclade3 binary).

🚀 Preview: https://copilot-refactor-error-wa.loculus.org

…bose error/warning handling - Add RawProcessingResult dataclass with plain string warnings/errors - Add processing_error() and processing_warning() helper functions - Modify call_function to handle both ProcessingResult and RawProcessingResult - Convert all processing functions to return RawProcessingResult - Simplify check_latin_characters and taxonomy helper functions - Remove unused invalid_value_annotation function - Update tests to match new return types (errors/warnings are now strings) This reduces the file from 2208 to 1719 lines by eliminating repetitive ProcessingAnnotation.from_fields(...) boilerplate in every error/warning.

anna-parker · 2026-06-30T07:42:16Z

@claude review

claude · 2026-06-30T07:42:34Z

Claude finished @anna-parker's task in 2m 29s —— View job

Code Review

Read changed files
Analyze for bugs, correctness, and design issues
Post review feedback

This is a clean, well-motivated refactoring. The core idea—separating message construction from annotation metadata attachment—is sound, and the 22% line reduction is genuine rather than cosmetic. A few issues worth addressing:

Issues

1. processing_warning is dead code [datatypes.py:198]

processing_warning is defined in datatypes.py but never imported or used anywhere in the codebase. Either import and use it in the appropriate single-warning sites (e.g. check_authors, build_display_name) or remove it.

Fix this →

2. Unused parameters in check_latin_characters, missing_taxonomy_service_error, and taxonomy_network_error

After the refactoring, input_fields and output_field are accepted by these functions but never used—annotation construction was moved to call_function. The parameters are now dead weight and mislead callers into thinking they affect behaviour.

# check_latin_characters - input_fields and output_field unused
def check_latin_characters(
    authors: str, input_fields: list[str], output_field: str  # ← unused
) -> tuple[list[str], list[str]]:

# missing_taxonomy_service_error - both params unused
def missing_taxonomy_service_error(input_fields: list[str], output_field: str) -> RawProcessingResult:

# taxonomy_network_error - last two params unused
def taxonomy_network_error(subject, action, e, input_fields: list[str], output_field: str) -> RawProcessingResult:

Fix this →

3. Tests bypass call_function — the conversion path is untested

After the refactoring, test assertions changed from .errors[0].message to .errors[0] (plain string). This is correct for direct calls to processing functions (which now return RawProcessingResult), but it means the test suite no longer exercises the conversion path in call_function that wraps each string into a ProcessingAnnotation. If there's a bug in call_function's conversion loop (e.g. wrong AnnotationSourceType, wrong field mapping), no test would catch it.

Consider adding at least one integration-level test that calls ProcessingFunctions.call_function(...) end-to-end and asserts the resulting ProcessingAnnotation.message and field linkage.

Minor observations

parse_date initialises warnings but not errors (line ~432) — intentional since errors are now constructed inline, but slightly inconsistent within the function.

is_variant reads is_above_threshold_result.warnings and .errors directly (lines 1082-1086) — correct since both are now list[str]. Just worth confirming this doesn't silently drop the type-narrowing in a future refactor if someone adds a non-string warning to that inner call.

What's good

The separation of concerns is correct: processing functions no longer need to know about AnnotationSource/AnnotationSourceType—only the message matters to them.
call_function's isinstance(result, RawProcessingResult) check is placed before the isinstance(result, ProcessingResult) check, which is necessary since they're independent types.
build_display_name correctly combines its own list[str] warnings with those from concat_result (also list[str]) before returning.
The RawProcessingResult dataclass docstring clearly communicates the ownership contract.

Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>

Co-authored-by: maverbiest <59956362+maverbiest@users.noreply.github.com>

maverbiest

This is a very nice improvement in readability! I leftsome minor comments here and there but should then be good to merge I think

…ude proper information (#6801) …lude proper information Follow up to #6537 - ensure that all internal errors thrown by prepro (these are related to invalid config, unexpected nextclade format etc.) are properly annotated with the prefix "Internal Error:" and the suffix "Please contact the administrator." ### Screenshot ### PR Checklist - [x] All necessary documentation has been adapted. - [x] The implemented feature is covered by appropriate, automated tests. - [ ] Any manual testing that has been done is documented (i.e. what exactly was tested?) 🚀 Preview: Add `preview` label to enable --------- Co-authored-by: maverbiest <59956362+maverbiest@users.noreply.github.com>

Copilot AI assigned Copilot and corneliusroemer May 29, 2026

Copilot started work on behalf of corneliusroemer May 29, 2026 10:18 View session

Copilot AI linked an issue May 29, 2026 that may be closed by this pull request

Refactor error/warnings to be less verbose in code #6536

Open

Copilot AI changed the title ~~[WIP] Refactor error/warnings to be less verbose in code~~ Refactor processing functions to use RawProcessingResult for less verbose error/warning handling May 29, 2026

Copilot finished work on behalf of corneliusroemer May 29, 2026 10:33

Copilot AI requested a review from corneliusroemer May 29, 2026 10:33

corneliusroemer mentioned this pull request May 29, 2026

Refactor error/warnings to be less verbose in code #6536

Open

Copilot AI added 2 commits June 30, 2026 09:27

Initial plan

8d7830c

anna-parker force-pushed the copilot/refactor-error-warnings-verbosity branch from b0a599d to 030c4fc Compare June 30, 2026 07:31

anna-parker marked this pull request as ready for review June 30, 2026 07:31

anna-parker requested a review from maverbiest June 30, 2026 07:33

format

bcbf406

anna-parker mentioned this pull request Jun 30, 2026

Prepro error functions #6800

Closed

3 tasks

merge conflict

9c7eb5d

anna-parker changed the title ~~Refactor processing functions to use RawProcessingResult for less verbose error/warning handling~~ feat(prepro): refactor processing functions to use RawProcessingResult for less verbose error/warning handling Jun 30, 2026

anna-parker and others added 3 commits June 30, 2026 09:44

fixup

10eb6e4

clean up more

8ecdff0

feat: add test for conversion

0d0023f

Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>

anna-parker mentioned this pull request Jun 30, 2026

feat(prepro): ensure all internal errors are properly logged and include proper information #6801

Merged

3 tasks

fixup

b10a363

anna-parker force-pushed the copilot/refactor-error-warnings-verbosity branch from aadee66 to b10a363 Compare June 30, 2026 08:42

maverbiest mentioned this pull request Jun 30, 2026

feat(prepro): make tests more readable #6790

Open

3 tasks