
test(wallet): make TransferPreapproval failure test robust to validator timeout #5988

Draft: samsondav wants to merge 1 commit into canton-network:main from Avro-Digital:sam-avro/flaky-transfer-preapproval-timeout

@samsondav

Contributor

Make the "Failure to complete TransferPreapproval creation should be handled correctly" wallet test robust to a transient validator HTTP timeout, which makes the log assertion flaky under CI load.

Flake

Observed on the wall-clock-time (0) job: https://github.com/canton-network/splice/actions/runs/27450544466 (job: https://github.com/canton-network/splice/actions/runs/27450544466/job/81436663390)

forEvery failed, because:
  at index 0, Incorrect log level WARN. Expected: ERROR.
  Message: Request to .../api/validator/v0/wallet/transfer-preapproval (POST) resulted in a timeout after 38 seconds.
  Remaining log entries:
  ## ERROR ... HttpCommandException: HTTP 503 Service Unavailable POST at '/api/validator/v0/wallet/transfer-preapproval' ... The server is taking too long to respond

The test pauses validator automation so that createTransferPreapproval cannot complete. The intended outcome is an HTTP 429 once the server-side RetryFor.ClientCalls loop in HttpWalletHandler gives up (ABORTED -> TooManyRequests). Under load, however, the validator's HTTP request timeout fired first: HttpErrorHandler.timeoutHandler logged a WARN "resulted in a timeout after 38 seconds" and returned a 503 instead. The strict, ordered assertion in assertThrowsAndLogsCommandFailures then checked the unexpected WARN entry at index 0 against errorMessage, which requires ERROR level, and failed.
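To make the race concrete, here is a minimal Scala model of it. It assumes a hypothetical `retryGiveUp` duration for the server-side retry loop (its real value is not stated in this PR) and uses the 38-second request timeout from the CI log. The types and functions are illustrative only, not Splice or Canton APIs. The model shows which log entries the client-side assertion sees on each path and why an ordered check that expects a single ERROR 429 rejects the timeout path.

```scala
import scala.concurrent.duration._

// Illustrative model only: these types are hypothetical, not Splice/Canton APIs.
sealed trait LogLevel
case object WarnLevel extends LogLevel
case object ErrorLevel extends LogLevel
final case class LogEntry(level: LogLevel, message: String)

// Whichever deadline fires first decides the HTTP status and the captured log entries.
// `retryGiveUp` is a hypothetical stand-in for how long the server-side retry loop runs.
def capturedLogs(retryGiveUp: FiniteDuration, httpTimeout: FiniteDuration): List[LogEntry] =
  if (retryGiveUp < httpTimeout)
    // Retry loop gives up first: ABORTED maps to 429, and the client logs one ERROR.
    List(LogEntry(ErrorLevel, "HTTP 429 Too Many Requests"))
  else
    // Request timeout fires first: the server logs a WARN, then the client logs an ERROR for the 503.
    List(
      LogEntry(WarnLevel, s"resulted in a timeout after ${httpTimeout.toSeconds} seconds"),
      LogEntry(ErrorLevel, "HTTP 503 Service Unavailable")
    )

// Hypothetical stand-in for the original strict, ordered check:
// exactly one captured entry, and it must be an ERROR for the 429.
def strictOrderedCheck(logs: List[LogEntry]): Boolean =
  logs match {
    case List(LogEntry(ErrorLevel, msg)) => msg.contains("429")
    case _                               => false
  }
```

With a `retryGiveUp` shorter than 38 seconds the strict check passes; once the retry loop runs past the request timeout, the leading WARN makes it fail, which corresponds to the CI failure shown above.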

Fix

Assert the failure with assertThrowsAndLogsUnorderedOptional[CommandFailure], expecting a required ERROR entry that matches either "429 Too Many Requests" or "503 Service Unavailable", plus an optional WARN entry that matches the server-side "resulted in a timeout". This still requires that a genuine failure is surfaced and logged as ERROR, while tolerating the transient timeout WARN and 503 on a slow host. It mirrors the existing pattern in ScanIntegrationTest and SplitwellUpgradeIntegrationTest.
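The matching rule the fix relies on can be modelled as below. This is a minimal Scala sketch under stated assumptions, not the signature or implementation of `assertThrowsAndLogsUnorderedOptional`. It assumes that every captured entry must be claimed by exactly one expectation in any order, that each expectation claims at most one entry, and that only the non-optional expectations must be satisfied. `Expectation` and `matchesUnorderedOptional` are hypothetical names.

```scala
// Hypothetical model of "required plus optional log entries, in any order".
// Not the real test-utility API; the names are illustrative.
sealed trait LogLevel
case object WarnLevel extends LogLevel
case object ErrorLevel extends LogLevel
final case class LogEntry(level: LogLevel, message: String)

final case class Expectation(predicate: LogEntry => Boolean, optional: Boolean)

// Backtracking search: assign each entry to one unused expectation that accepts it.
def matchesUnorderedOptional(logs: List[LogEntry], expectations: List[Expectation]): Boolean =
  logs match {
    case Nil =>
      // Every entry was claimed; any expectation left over must be optional.
      expectations.forall(_.optional)
    case entry :: rest =>
      expectations.indices.exists { i =>
        expectations(i).predicate(entry) &&
          matchesUnorderedOptional(rest, expectations.patch(i, Nil, 1))
      }
  }

// The expectations described in the Fix section.
val expectations: List[Expectation] = List(
  // Required: the failure is surfaced and logged as ERROR, either as a 429 or a 503.
  Expectation(
    e => e.level == ErrorLevel &&
      (e.message.contains("429 Too Many Requests") || e.message.contains("503 Service Unavailable")),
    optional = false
  ),
  // Optional: the server-side request-timeout WARN.
  Expectation(e => e.level == WarnLevel && e.message.contains("resulted in a timeout"), optional = true)
)
```

With these expectations, both the 429 path and the timeout path (the WARN followed by the 503 ERROR, in either order) are accepted, while a lone timeout WARN with no ERROR is rejected. Backtracking keeps the model correct when one entry could satisfy more than one expectation; for the handful of entries a test captures, the cost is negligible.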

The flake was observed in the CI run for PR #5937. The fix was derived by inspection; the integration test was not run locally.

…or timeout

The induced-failure path (validator automation paused) is expected to surface
as an HTTP 429 once the server-side retry loop gives up. Under CI load the
validator's HTTP request timeout can fire first: the server logs a WARN
"resulted in a timeout" and returns a 503, so the strict ordered log
assertion saw a WARN entry where it expected the ERROR 429 and failed.

Use assertThrowsAndLogsUnorderedOptional so the test accepts either failure
(429 or the timeout 503, both logged as ERROR) and tolerates the optional
server-side timeout WARN, without weakening the requirement that a genuine
failure is surfaced and logged.

Signed-off-by: Sam Davies <sam@avrofi.com>
@moritzkiefer-da

Contributor

I believe this is already fixed in #5940. Have you still encountered this issue after rebasing to include that change?

@samsondav

Contributor Author

Thanks Moritz, you're right. #5937 branched about 45 minutes before #5940 merged and wasn't rebased, so that run didn't include the retry-delay fix. I'll rebase onto main to pick up #5940 and confirm; assuming the flake is gone, I'll close this in favor of #5940.

