Description
Problem Statement
Currently, the push publishing system emits failure events that do not contain information about WHY the failure occurred. This makes it difficult for event subscribers, monitoring systems, and administrators to:
- Distinguish between transient network issues (which may auto-resolve) and authentication/authorization failures (which require manual intervention)
- Implement appropriate retry logic based on failure type
- Generate meaningful alerts and notifications
- Troubleshoot publishing issues efficiently
Current Behavior
The system currently emits three event types:
- AllPushPublishEndpointsSuccessEvent - All endpoints succeeded
- AllPushPublishEndpointsFailureEvent - ALL endpoints failed
- SinglePushPublishEndpointFailureEvent - Some endpoints failed
The failure events only contain a list of assets, with NO information about:
- WHY the endpoint(s) failed
- Which specific endpoints failed (in the case of SinglePushPublishEndpointFailureEvent)
- Whether the failure is retryable or requires manual intervention
- What action should be taken
Distinct Failure Scenarios Not Captured
The code in PushPublisher.java and PublisherQueueJob.java currently handles multiple distinct scenarios that are logged internally but NOT included in events:
- Authentication Failure (HTTP 401) - Invalid or expired token
- Authorization Failure (HTTP 403) - License issues on receiving endpoint
- Network Connectivity Errors - Connection timeout, DNS failure, endpoint unreachable
- Server Errors (HTTP 500, 503) - Receiver-side issues
- Bundle Send Failures - Other HTTP status codes
While these are tracked internally with specific PublishAuditStatus codes (e.g., INVALID_TOKEN, LICENSE_REQUIRED), this information is not propagated to the event system.
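The following Java sketch shows one way to map the scenarios above to a failure category that carries a retryable flag. FailureCategory, classify and fromException are hypothetical names, not existing dotCMS APIs. The retryable flags are also assumptions: network and server errors are treated as transient, while 401, 403 and other status codes are treated as needing manual attention. Any real mapping would need to be checked against the existing PublishAuditStatus handling.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/**
 * Hypothetical failure categories for a push publishing endpoint.
 * The retryable flags are assumptions for illustration only.
 */
enum FailureCategory {
    AUTHENTICATION(false), // HTTP 401: invalid or expired token
    AUTHORIZATION(false),  // HTTP 403: license issue on the receiving endpoint
    NETWORK(true),         // timeout, DNS failure, endpoint unreachable
    SERVER_ERROR(true),    // HTTP 500 or 503: receiver-side issue
    BUNDLE_SEND(false);    // any other failing HTTP status code

    private final boolean retryable;

    FailureCategory(boolean retryable) {
        this.retryable = retryable;
    }

    boolean isRetryable() {
        return retryable;
    }

    /**
     * Classifies a failed send by HTTP status. A null status means no HTTP
     * response was received, which is treated as a network failure (assumption).
     */
    static FailureCategory classify(Integer httpStatus) {
        if (httpStatus == null) {
            return NETWORK;
        }
        switch (httpStatus) {
            case 401:
                return AUTHENTICATION;
            case 403:
                return AUTHORIZATION;
            case 500:
            case 503:
                return SERVER_ERROR;
            default:
                return BUNDLE_SEND;
        }
    }

    /** Classifies a transport-level exception thrown before any response arrived. */
    static FailureCategory fromException(Throwable error) {
        if (error instanceof SocketTimeoutException    // connection or read timeout
                || error instanceof UnknownHostException // DNS failure
                || error instanceof ConnectException) {  // endpoint unreachable
            return NETWORK;
        }
        // Assumption: unrecognized exceptions are reported as bundle send failures.
        return BUNDLE_SEND;
    }
}
```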
Proposed Enhancement
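Enhance the existing failure events to include detailed failure information for each endpoint. Because the new information is added to the existing events rather than replacing them, the change stays backward compatible: existing subscribers are unaffected, and new subscribers gain actionable context.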
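Below is a minimal sketch of a per-endpoint detail and of how a failure event could carry it without breaking existing subscribers. It assumes the details are held in a map keyed by endpoint id. EndpointFailureDetail's fields, PushPublishFailureEventSketch and its accessors are hypothetical. Assets are simplified to strings, and the real events' constructors and asset types are not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-endpoint failure detail; field names are illustrative. */
final class EndpointFailureDetail {
    private final String endpointId;
    private final String category;    // e.g. "AUTHENTICATION", "NETWORK"
    private final Integer httpStatus; // null when no HTTP response was received
    private final String message;     // human-readable reason for the failure
    private final boolean retryable;

    EndpointFailureDetail(String endpointId, String category, Integer httpStatus,
                          String message, boolean retryable) {
        this.endpointId = endpointId;
        this.category = category;
        this.httpStatus = httpStatus;
        this.message = message;
        this.retryable = retryable;
    }

    String getEndpointId() { return endpointId; }
    String getCategory() { return category; }
    Integer getHttpStatus() { return httpStatus; }
    String getMessage() { return message; }
    boolean isRetryable() { return retryable; }
}

/**
 * Hypothetical failure event: the asset list is kept as before and a map of
 * endpoint id -> failure detail is added alongside it.
 */
final class PushPublishFailureEventSketch {
    private final List<String> assets; // simplified: real events carry asset objects
    private final Map<String, EndpointFailureDetail> failureDetails;

    /** Legacy-style constructor: existing call sites compile unchanged, with no details. */
    PushPublishFailureEventSketch(List<String> assets) {
        this(assets, Collections.emptyMap());
    }

    PushPublishFailureEventSketch(List<String> assets,
                                  Map<String, EndpointFailureDetail> failureDetails) {
        this.assets = Collections.unmodifiableList(new ArrayList<>(assets));
        this.failureDetails = Collections.unmodifiableMap(new LinkedHashMap<>(failureDetails));
    }

    /** Existing accessor shape: subscribers that only read assets are unaffected. */
    List<String> getAssets() { return assets; }

    /** New accessor: failure details keyed by endpoint id. */
    Map<String, EndpointFailureDetail> getFailureDetails() { return failureDetails; }
}
```

The old constructor and the existing asset accessor are kept, which makes the change additive. Code that never calls getFailureDetails() sees the same event as before.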
Acceptance Criteria
[ ] All failure events include an EndpointFailureDetail map with the failure details for each failed endpoint
[ ] Existing event subscribers continue to work without modifications
[ ] New subscribers can distinguish between authentication, network, and server failures
[ ] Failure category correctly indicates whether the failure is retryable
[ ] HTTP status codes are captured for all REST API failures
[ ] Unit tests cover all failure scenarios and verify event contents
[ ] Documentation includes examples of handling different failure categories
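For the documentation criterion, a subscriber might branch on the category of each failed endpoint roughly as follows. Category, Action, FailureHandlingExample and the chosen actions are all hypothetical. Treating NETWORK and SERVER_ERROR as retryable is an assumption, not established behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical categories mirroring the failure scenarios listed in this issue. */
enum Category { AUTHENTICATION, AUTHORIZATION, NETWORK, SERVER_ERROR, BUNDLE_SEND }

/** Hypothetical actions a subscriber might take for a failed endpoint. */
enum Action { RETRY_LATER, ALERT_ADMIN, FIX_CREDENTIALS, CHECK_LICENSE }

final class FailureHandlingExample {

    /** Decides what to do for one endpoint based on its failure category. */
    static Action actionFor(Category category) {
        switch (category) {
            case AUTHENTICATION:
                return Action.FIX_CREDENTIALS; // 401: token invalid or expired
            case AUTHORIZATION:
                return Action.CHECK_LICENSE;   // 403: license issue on the receiver
            case NETWORK:
            case SERVER_ERROR:
                return Action.RETRY_LATER;     // assumed transient
            default:
                return Action.ALERT_ADMIN;     // other bundle send failures
        }
    }

    /** Builds an endpoint -> action plan from a map of endpoint -> category. */
    static Map<String, Action> plan(Map<String, Category> failures) {
        Map<String, Action> result = new LinkedHashMap<>();
        for (Map.Entry<String, Category> entry : failures.entrySet()) {
            result.put(entry.getKey(), actionFor(entry.getValue()));
        }
        return result;
    }
}
```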
Priority
Medium
Additional Context
https://helpdesk.dotcms.com/a/tickets/34364