Skip to content

feat(prepro): ensure all internal errors are properly logged and include proper information#6801

Merged
anna-parker merged 6 commits into
copilot/refactor-error-warnings-verbosityfrom
internal_errors
Jul 2, 2026
Merged

feat(prepro): ensure all internal errors are properly logged and include proper information#6801
anna-parker merged 6 commits into
copilot/refactor-error-warnings-verbosityfrom
internal_errors

Conversation

@anna-parker

@anna-parker anna-parker commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

…lude proper information

Follow up to #6537 - ensure that all internal errors thrown by prepro (these are related to invalid config, unexpected nextclade format etc.) are properly annotated with the prefix "Internal Error:" and the suffix "Please contact the administrator."

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable

@claude claude Bot added the preprocessing Issues related to the preprocessing component label Jun 30, 2026
@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

This PR may be related to: #2040 (Fail silo-preprocessing container when command fails so we notice errors)

@anna-parker anna-parker marked this pull request as ready for review June 30, 2026 08:21
@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Claude finished @anna-parker's task in 2m 41s —— View job


Code Review

  • Gather context
  • Read changed files
  • Analyze code quality, security, performance, and testing
  • Post inline comments
  • Summarize findings

Overall this is a clean, well-scoped refactor. The raw_internal_error helper is a good abstraction — it enforces a consistent message format (Internal Error. … Please contact the administrator.) and ensures every internal error is logged. The regex_error return-type fix (was str, now RawProcessingResult) removes a double-wrap at all call sites. Error messages now include the exception value, which makes debugging much easier.

Two things to look at:

  1. dateRangeString error path is inconsistent (inline comment on line 732): uses errors.extend(raw_internal_error(...).errors) and continues, while all other internal error paths in concatenate do an early return. This was pre-existing behaviour, but the new code makes the asymmetry more visible. Worth a conscious decision (abort early, or keep the fallback-and-continue with a clarifying comment).

  2. raw_internal_error called just for its .errors (inline comment on line 778): in the ValueError catch, raw_internal_error(...) is called but only .errors is used, which is an awkward pattern for a function designed to be returned directly. Logging explicitly and building the string inline would be clearer.

Test updates look correct — the new assertions match the updated error message format.

@anna-parker

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Claude finished @anna-parker's task in 1m 51s —— View job


Code Review

  • Gather context
  • Read changed files
  • Review processing_functions.py
  • Review test files
  • Post final review

Overall

Good cleanup. The raw_internal_error() helper reduces repetition, ensures consistent logging, and the richer error messages (with actual field names, accession versions, exception details) are a real improvement. The regex_error return-type fix (was str, now RawProcessingResult) was a latent type bug — good catch.


Findings

1. Awkward raw_internal_error().errors pattern — minor smell

In concatenate, two places call raw_internal_error(...) just to extract .errors:

# ~line 728
errors.extend(
    raw_internal_error(
        f"dateRangeString field '{order[i]}' has an unexpected format: ..."
    ).errors
)

# ~line 773
errors=errors + raw_internal_error(
    f"Concatenation failed for '{output_field}' with error: {e} ..."
).errors,

This works correctly (the logger.error() side-effect fires as intended), but it's a code smell: raw_internal_error is named and documented as returning a RawProcessingResult, not as a logging utility with extractable errors. A reader has to trace into the helper to understand why .errors is being accessed.

A cleaner alternative would be to extract a tiny private helper:

def _internal_error_message(message: str) -> str:
    full = f"Internal Error. {message} Please contact the administrator."
    logger.error(full)
    return full

def raw_internal_error(message: str) -> RawProcessingResult:
    return processing_error(_internal_error_message(message))

Then the extend call becomes:

errors.append(_internal_error_message(f"dateRangeString field ..."))

Not a blocker, but worth considering. Fix this →


2. dateRangeString error doesn't early-return — intentional but worth a comment

At ~line 726, when the date string has too many / separators, the code appends an error and continues, using fallback_value for that slot. All other config/format errors in concatenate early-return. If the current behaviour (continue with fallback rather than abort the whole concatenation) is deliberate, a brief inline comment would prevent someone from "fixing" it to early-return in the future.


3. ValueError catch now preserves previously-accumulated errors — improvement

The old code lost any errors accumulated before the ValueError. The new code correctly merges them:

errors=errors + raw_internal_error(...).errors,

Good.


4. _call_preprocessing_function still uses logger.exception() — correct

The general function-dispatch handler at ~line 335 still calls logger.exception(message) directly (which includes a stack trace) rather than routing through raw_internal_error. This is the right call — stack traces are valuable for unexpected exceptions from user-configured functions.


5. Test assertion is now tighter — good

# before
assert "Was unable to process phenotype values" in invalid.errors[0].message
# after
assert "Internal Error. Phenotype value processing failed: invalid syntax" in invalid.errors[0].message

The tighter assertion catches regressions in the message format. The other test file change is just adding a type annotation.


Summary

No blocking issues. The main suggestion is the raw_internal_error().errors pattern (finding 1), and optionally a comment on the dateRangeString non-early-return (finding 2). Everything else looks correct.

Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
@anna-parker anna-parker changed the title feat(prepro): ensure all internal errors are properly logged and inc… feat(prepro): ensure all internal errors are properly logged and include proper information Jun 30, 2026
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Comment thread preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py Outdated
Co-authored-by: maverbiest <59956362+maverbiest@users.noreply.github.com>
@anna-parker anna-parker merged commit 5f649af into copilot/refactor-error-warnings-verbosity Jul 2, 2026
35 of 36 checks passed
@anna-parker anna-parker deleted the internal_errors branch July 2, 2026 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preprocessing Issues related to the preprocessing component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants