Make microservice content resolution resilient to transient 503s #4599

@mhijaziB

Description

Reason for the Feature

When resolving content from a C# microservice, transient content fetch failures can currently bubble up as unhandled HttpRequestExceptions.

A BPC customer reported intermittent failures while processing content in a foreach loop. Their code iterates through manifest entries and calls ClientContentInfo.Resolve() on each one. Some entries resolve successfully, then a later call fails unpredictably with:

System.Net.Http.HttpRequestException: Response status code does not indicate success: 503 (Service Unavailable)

The current microservice content resolution path eventually calls:

DefaultContentResolver.RequestContent(string uri)
in
microservice/microservice/dbmicroservice/IContentResolver.cs

That method currently calls HttpClient.GetStringAsync(uri) directly. GetStringAsync throws an HttpRequestException for any non-success status code, and there is no retry/backoff handling around it for transient HTTP failures.

Suggested requirements

  • Add retry/backoff handling to DefaultContentResolver.RequestContent(string uri).
  • Retry only transient HTTP responses, such as:
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout
  • Respect Retry-After headers when available.
  • Use exponential backoff with jitter to avoid retry bursts.
  • Preserve fail-fast behavior for non-transient failures such as 400, 401, 403, and 404.
  • Improve final error reporting so failures include:
    • content URI
    • HTTP status code
    • attempt count
    • ideally the content id, when available from the cache/content layer
  • Add tests covering:
    • 503 followed by success
    • repeated 503 eventually failing with a useful error
    • non-transient failures not being retried
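
To make the requirements above concrete, here is a rough sketch of what the retry path could look like. It is not taken from the existing codebase: ContentRetrySketch, IsTransient, ComputeDelay, RequestContentWithRetry and the default values (4 attempts, 200 ms base delay, 10 s cap) are all hypothetical, and the content id is left out because it is not available at this layer.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; names and defaults are assumptions.
public static class ContentRetrySketch
{
    // Status codes treated as transient, matching the list in this issue.
    public static bool IsTransient(HttpStatusCode status)
    {
        int code = (int)status;
        return code == 429 || code == 500 || code == 502 || code == 503 || code == 504;
    }

    // A Retry-After header, when present, takes precedence.
    // Otherwise: exponential backoff with full jitter, capped at maxDelayMs.
    public static TimeSpan ComputeDelay(int attempt, HttpResponseMessage response, Random rng,
        double baseDelayMs = 200, double maxDelayMs = 10_000)
    {
        var retryAfter = response?.Headers.RetryAfter;
        if (retryAfter?.Delta is TimeSpan delta) return delta;
        if (retryAfter?.Date is DateTimeOffset date)
        {
            var wait = date - DateTimeOffset.UtcNow;
            return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
        }

        double cap = Math.Min(maxDelayMs, baseDelayMs * Math.Pow(2, attempt - 1));
        return TimeSpan.FromMilliseconds(rng.NextDouble() * cap);
    }

    public static async Task<string> RequestContentWithRetry(HttpClient client, string uri,
        int maxAttempts = 4, CancellationToken ct = default)
    {
        var rng = new Random();
        for (int attempt = 1; ; attempt++)
        {
            using var response = await client.GetAsync(uri, ct);
            if (response.IsSuccessStatusCode)
                return await response.Content.ReadAsStringAsync();

            // Fail fast on non-transient statuses (400, 401, 403, 404, ...)
            // and when the attempt budget is exhausted.
            bool canRetry = IsTransient(response.StatusCode) && attempt < maxAttempts;
            if (!canRetry)
                throw new HttpRequestException(
                    $"Content request failed. uri=[{uri}] status=[{(int)response.StatusCode}] attempts=[{attempt}]");

            await Task.Delay(ComputeDelay(attempt, response, rng), ct);
        }
    }
}
```

This sketch only retries on HTTP status codes. Connection-level failures thrown by GetAsync itself are not retried, and a very large Retry-After value is honored as-is, so a real implementation would probably want to cap it.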

The ask is not to hide real failures, but to make the content resolution path resilient to short-lived service unavailability and to provide better diagnostics when retries are exhausted.
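
For the test cases listed above, one possible approach is a scripted HttpMessageHandler that returns a fixed sequence of status codes. SequenceHandler is a hypothetical test double, and this assumes the resolver's HttpClient can be constructed with a custom handler, which I have not verified against the current code.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test double: replays a scripted list of status codes, one per request.
public sealed class SequenceHandler : HttpMessageHandler
{
    private readonly Queue<HttpStatusCode> _statuses;

    // Number of requests the handler has seen; useful for asserting retry counts.
    public int CallCount { get; private set; }

    public SequenceHandler(params HttpStatusCode[] statuses)
    {
        _statuses = new Queue<HttpStatusCode>(statuses);
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        CallCount++;

        // Once the script is exhausted, keep answering 200 OK.
        var status = _statuses.Count > 0 ? _statuses.Dequeue() : HttpStatusCode.OK;
        var response = new HttpResponseMessage(status)
        {
            RequestMessage = request,
            Content = new StringContent(status == HttpStatusCode.OK ? "{\"ok\":true}" : "")
        };
        return Task.FromResult(response);
    }
}
```

A "503 followed by success" test would build new HttpClient(new SequenceHandler(HttpStatusCode.ServiceUnavailable, HttpStatusCode.OK)) and assert that CallCount is 2. A "non-transient" test would script a single 404 and assert that CallCount stays at 1.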

Alternatives or Workarounds

Clients can work around this today by adding their own retry logic around each ClientContentInfo.Resolve() call.
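
A minimal sketch of such a client-side workaround, assuming the failure surfaces as an HttpRequestException as in the report. ResolveRetry and WithRetry are hypothetical names, and the attempt count and delays are arbitrary.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical client-side helper; not part of the SDK.
public static class ResolveRetry
{
    // Retries the action on HttpRequestException, doubling the delay each time.
    // The last attempt is not caught, so the final exception propagates to the caller.
    public static async Task<T> WithRetry<T>(Func<Task<T>> action,
        int maxAttempts = 3, int baseDelayMs = 250)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}

// Usage inside the customer's loop (sketch):
//   var contentInfo = await ResolveRetry.WithRetry(
//       async () => await clientContentInfo.Resolve());
```

Note that this retries every HttpRequestException, including non-transient ones such as 404, which is one reason handling it inside RequestContent would be preferable.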

Additional Context

Observed client error:

HandleContentPublish: unhandled exception:
System.Net.Http.HttpRequestException: Response status code does not indicate success: 503 (Service Unavailable).
   at Mythical.Platform.Service.ContentEventSubscriber.CreateOrUpdateAllItemTypes()

Client's code:

var manifest = await _contentApi.GetManifest("t:items");

foreach (var clientContentInfo in manifest.entries)
{
    var contentInfo = await clientContentInfo.Resolve();

    // process content
}

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Development: No branches or pull requests