Skip to content

fix(repository): replace peek_result panic with InfrastructureError#326

Open
SAY-5 wants to merge 1 commit into
aixigo:masterfrom
SAY-5:fix/issue-307-task-result-missing-no-panic
Open

fix(repository): replace peek_result panic with InfrastructureError#326
SAY-5 wants to merge 1 commit into
aixigo:masterfrom
SAY-5:fix/issue-307-task-result-missing-no-panic

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented Apr 30, 2026

Summary

Closes #307.

peek_result (api/src/apps/repository.rs) panics via unreachable!() when the database returns a row whose status = 'done' but whose result_success and result_error columns (and the merged-with task's columns) are all NULL. The incident in #307 shows this state does occur in production, the operator's database had ten such rows:

SELECT id, created_at FROM app_task a
WHERE result_success is null AND result_error is null AND a.status = 'done';
-- 10 rows

The unreachable!() therefore reduced PREvant to a panicking task fetcher whenever a corrupt row was peeked.

Fix

Replace the unreachable arm with Err(AppsError::InfrastructureError { error: ... }) so the caller can return a 500, the operator can inspect/repair the row, and the rest of the system keeps running:

_ => Err(AppsError::InfrastructureError {
    error: format!(
        "Task {status_id} is marked done but has neither a success nor an error result stored"
    ),
}),

The error variant is already exported and is the closest semantic match for "the task ran but its outcome wasn't persisted".

Test plan

  • cargo build -p prevant, passes
  • Maintainer verification on the live staging environment (or a manually-corrupted app_task row) that the API now returns 500 with the new error message instead of crashing the worker.

peek_result panicked via unreachable!() when the database returned a
row whose status is 'done' but whose result_success and result_error
columns (and the merged-with task's columns) are all NULL. The
incident in aixigo#307 shows this state does occur in production — the
operator's database had ten such rows after a deployment failure —
and the unreachable!() reduced PREvant to a panicking task fetcher.

Surface the corrupt row as an AppsError::InfrastructureError that
identifies the affected status_id so the caller can return a 500 and
recover, leaving the row in the database for an operator to inspect
or repair instead of taking the whole API down.

Closes aixigo#307.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
@schrieveslaach
Copy link
Copy Markdown
Contributor

schrieveslaach commented May 6, 2026

Thanks for the PR. However, this change might fix only a symptom when peeking into the database. The function store_result, should always create an entry that never leads to the situation.

Your change made me think if there is an issue in the following places (which I already expressed in the issue description):

Please, notice the .ok() calls of Result::ok and I'm suspecting that in some cases values for the database cannot be converted into JSON for some reason and due to .ok() both values are null.

Do you mind to investigate in it more? You could use the test clean_up_back_up_after_deletion and add more cases to it, so that you maybe can craft an error or success result that triggers a row with empty entries.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 6, 2026

Agreed that this only catches the symptom, the rows shouldn't exist in the first place. I can extend the test clean_up_back_up_after_deletion with cases that exercise the .ok() paths in store_result (lines 384/402/406) and try to provoke a row with both result columns null. Before I dig in: would you prefer the panic-guard kept as a defensive backstop alongside the root-cause fix, or replaced by it once store_result is hardened?

@schrieveslaach
Copy link
Copy Markdown
Contributor

Reproduce it first and then let's see. BTW, I pushed bdee094 Yesterday which fixes the race condition and added more test cases. See if you can reproduce the issue and then let's discuss about a solution.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 7, 2026

Sounds good, I'll pull bdee094, run the new test cases, and try to reproduce the empty-row scenario before proposing a solution. Will report back with findings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panics for Task Results

2 participants