Enhancement: Add Failure Details to Push Publishing Events #34356

@syedATdot

Description

Problem Statement

Currently, the push publishing system emits failure events that do not contain information about WHY the failure occurred. This makes it difficult for event subscribers, monitoring systems, and administrators to:

  • Distinguish between transient network issues (which may auto-resolve) and authentication/authorization failures (which require manual intervention)
  • Implement appropriate retry logic based on failure type
  • Generate meaningful alerts and notifications
  • Troubleshoot publishing issues efficiently

Current Behavior

The system currently emits 3 event types:

  1. AllPushPublishEndpointsSuccessEvent - All endpoints succeeded
  2. AllPushPublishEndpointsFailureEvent - ALL endpoints failed
  3. SinglePushPublishEndpointFailureEvent - Some endpoints failed

The failure events only contain a list of assets, with NO information about:

  • WHY the endpoint(s) failed
  • Which specific endpoints failed (in the case of SinglePushPublishEndpointFailureEvent)
  • Whether the failure is retryable or requires manual intervention
  • What action should be taken

Distinct Failure Scenarios Not Captured

The code in PushPublisher.java and PublisherQueueJob.java currently handles multiple distinct scenarios that are logged internally but NOT included in events:

  1. Authentication Failure (HTTP 401) - Invalid or expired token
  2. Authorization Failure (HTTP 403) - License issues on receiving endpoint
  3. Network Connectivity Errors - Connection timeout, DNS failure, endpoint unreachable
  4. Server Errors (HTTP 500, 503) - Receiver-side issues
  5. Bundle Send Failures - Other HTTP status codes

While these are tracked internally with specific PublishAuditStatus codes (e.g., INVALID_TOKEN, LICENSE_REQUIRED), this information is not propagated to the event system.
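The scenarios above could be mapped to a small set of categories before an event is emitted. The following is a minimal sketch, not the existing dotCMS code: `FailureCategory`, `FailureClassifier`, and the retryable flags are hypothetical names and assumptions made for illustration (in particular, treating 5xx responses as retryable and other status codes as non-retryable is a guess, not documented behavior).

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical categories mirroring the five scenarios listed in this issue.
// The retryable flags are assumptions for illustration only.
enum FailureCategory {
    AUTHENTICATION(false), // HTTP 401: invalid or expired token
    AUTHORIZATION(false),  // HTTP 403: license issue on the receiving endpoint
    NETWORK(true),         // timeout, DNS failure, endpoint unreachable
    SERVER_ERROR(true),    // HTTP 5xx: receiver-side issue
    BUNDLE_SEND(false);    // any other non-success HTTP status

    private final boolean retryable;

    FailureCategory(boolean retryable) {
        this.retryable = retryable;
    }

    boolean isRetryable() {
        return retryable;
    }
}

final class FailureClassifier {
    private FailureClassifier() {}

    // Intended for non-success responses only; callers check for success first.
    static FailureCategory fromHttpStatus(int status) {
        if (status == 401) {
            return FailureCategory.AUTHENTICATION;
        }
        if (status == 403) {
            return FailureCategory.AUTHORIZATION;
        }
        if (status >= 500 && status <= 599) {
            return FailureCategory.SERVER_ERROR;
        }
        return FailureCategory.BUNDLE_SEND;
    }

    // Classifies exceptions thrown while contacting the endpoint.
    static FailureCategory fromException(Exception e) {
        if (e instanceof ConnectException
                || e instanceof SocketTimeoutException
                || e instanceof UnknownHostException) {
            return FailureCategory.NETWORK;
        }
        return FailureCategory.BUNDLE_SEND;
    }
}
```

With this shape, `FailureClassifier.fromHttpStatus(401).isRetryable()` is `false`, while an `UnknownHostException` maps to `NETWORK`, which is marked retryable.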

Proposed Enhancement

Enhance the existing failure events to include detailed failure information for each endpoint. This maintains backward compatibility while providing actionable context.
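One possible shape for this, sketched under assumptions: only the name `EndpointFailureDetail` comes from this issue. Its fields, the `PushFailureEventSketch` class, and the use of `String` identifiers for assets and endpoints are hypothetical stand-ins, since the real event classes and asset types are not shown here. The single-argument constructor is kept so that code written against the asset-only event would still compile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-endpoint failure detail; the field set is an assumption.
final class EndpointFailureDetail {
    private final String endpointId;
    private final String auditStatus; // e.g. "INVALID_TOKEN", "LICENSE_REQUIRED"
    private final Integer httpStatus; // null when no HTTP response was received
    private final boolean retryable;
    private final String message;

    EndpointFailureDetail(String endpointId, String auditStatus,
                          Integer httpStatus, boolean retryable, String message) {
        this.endpointId = endpointId;
        this.auditStatus = auditStatus;
        this.httpStatus = httpStatus;
        this.retryable = retryable;
        this.message = message;
    }

    String getEndpointId() { return endpointId; }
    String getAuditStatus() { return auditStatus; }
    Integer getHttpStatus() { return httpStatus; }
    boolean isRetryable() { return retryable; }
    String getMessage() { return message; }
}

// Stand-in for a failure event; not the real dotCMS event class.
class PushFailureEventSketch {
    private final List<String> assetIds;
    private final Map<String, EndpointFailureDetail> failureDetails;

    // Existing-style constructor: assets only, no failure details.
    PushFailureEventSketch(List<String> assetIds) {
        this(assetIds, Collections.<String, EndpointFailureDetail>emptyMap());
    }

    // New constructor: details keyed by endpoint identifier.
    PushFailureEventSketch(List<String> assetIds,
                           Map<String, EndpointFailureDetail> failureDetails) {
        this.assetIds = Collections.unmodifiableList(new ArrayList<>(assetIds));
        this.failureDetails =
                Collections.unmodifiableMap(new LinkedHashMap<>(failureDetails));
    }

    List<String> getAssetIds() { return assetIds; }

    // Never null; empty when the publisher supplied no details.
    Map<String, EndpointFailureDetail> getFailureDetails() { return failureDetails; }
}
```

Returning an empty map rather than `null` lets new subscribers iterate over the details without a null check, while existing subscribers that only read the asset list are unaffected.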

Acceptance Criteria

[ ] All failure events include an EndpointFailureDetail map with complete information
[ ] Existing event subscribers continue to work without modifications
[ ] New subscribers can distinguish between authentication, network, and server failures
[ ] Failure category correctly indicates whether the failure is retryable
[ ] HTTP status codes are captured for all REST API failures
[ ] Unit tests cover all failure scenarios and verify event contents
[ ] Documentation includes examples of handling different failure categories
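As an illustration of the kind of example the documentation criterion asks for, a subscriber might split failed endpoints into those worth retrying and those needing manual attention. This is a hedged sketch: `FailureTriage`, its nested `Detail` type, and the `Action` values are hypothetical, and it assumes each per-endpoint detail exposes a retryable flag.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical subscriber-side helper; not part of dotCMS.
final class FailureTriage {
    private FailureTriage() {}

    enum Action { RETRY_LATER, NOTIFY_ADMIN }

    // Minimal stand-in for a per-endpoint failure detail.
    static final class Detail {
        final Integer httpStatus; // null when no HTTP response was received
        final boolean retryable;

        Detail(Integer httpStatus, boolean retryable) {
            this.httpStatus = httpStatus;
            this.retryable = retryable;
        }
    }

    // Groups endpoint identifiers by the action a subscriber would take.
    static Map<Action, List<String>> triage(Map<String, Detail> failuresByEndpoint) {
        Map<Action, List<String>> result = new EnumMap<>(Action.class);
        for (Action action : Action.values()) {
            result.put(action, new ArrayList<String>());
        }
        for (Map.Entry<String, Detail> entry : failuresByEndpoint.entrySet()) {
            Action action = entry.getValue().retryable
                    ? Action.RETRY_LATER
                    : Action.NOTIFY_ADMIN;
            result.get(action).add(entry.getKey());
        }
        return result;
    }
}
```

A subscriber could then schedule a retry for the `RETRY_LATER` endpoints and raise an alert for the `NOTIFY_ADMIN` ones, for example after a 401 caused by an expired token.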

Priority

Medium

Additional Context

https://helpdesk.dotcms.com/a/tickets/34364

Metadata

Projects

Status

Done
