feat: retry rejected-for-resources records with higher limits #117

@rorybyrne

Description

Hooks can reject records that are too large for the current memory limit (e.g. a 38 MB CIF file that needs >2 GB to process). Currently these records are permanently rejected.

We should support a structured rejection code that signals "this record is valid but needs more resources":

{"id": "10PX", "reason": "Structure too large (37.8 MB CIF)", "code": "resource_limit", "hint": {"memory": "4g"}}
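A minimal sketch of how a consumer might read such a line, assuming `code` and `hint` are optional so that existing rejections without them still parse. The `Rejection` dataclass and the `parse_rejection` and `is_retryable` helpers are hypothetical names for illustration, not part of the current codebase.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rejection:
    """One line of rejections.jsonl (hypothetical in-memory shape)."""
    id: str
    reason: str
    code: Optional[str] = None                 # absent on legacy rejections
    hint: dict = field(default_factory=dict)   # e.g. {"memory": "4g"}


def parse_rejection(line: str) -> Rejection:
    """Parse one JSON line, tolerating records without code/hint."""
    raw = json.loads(line)
    return Rejection(
        id=raw["id"],
        reason=raw["reason"],
        code=raw.get("code"),
        hint=raw.get("hint") or {},
    )


def is_retryable(rejection: Rejection) -> bool:
    """Only the structured resource_limit code signals 'valid but needs more resources'."""
    return rejection.code == "resource_limit"
```

Rejections written by hooks that still call plain `Reject("...")` would parse with `code=None` and stay permanently rejected, which keeps the change backward compatible.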

The ingest pipeline could then:

  1. Collect all resource_limit rejections from a batch
  2. Re-run them in a separate container with higher limits (single-record batches for isolation)
  3. Start each retry in a fresh container, so no memory accumulated from prior records carries over
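The steps above could be sketched roughly as follows. This only shows the planning part (which records to retry and with what limit), not how containers are launched. `RetryJob`, `plan_retries`, and the `default_memory` fallback are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryJob:
    """One retry run; a single record per job gives isolation (hypothetical type)."""
    record_ids: tuple
    memory: str


def plan_retries(rejections, default_memory="4g"):
    """Turn a batch's rejection dicts into single-record retry jobs.

    Only rejections with code == "resource_limit" are retried; everything
    else stays permanently rejected. The memory limit comes from the hook's
    hint when present, otherwise from default_memory (an assumed fallback).
    """
    jobs = []
    for rej in rejections:
        if rej.get("code") != "resource_limit":
            continue
        memory = (rej.get("hint") or {}).get("memory", default_memory)
        jobs.append(RetryJob(record_ids=(rej["id"],), memory=memory))
    return jobs
```

Each `RetryJob` would then map to one fresh container started with the given memory limit.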

This also addresses the minor memory-leak pattern where baseline memory drifts upward over a batch: large structures get a fresh container with full headroom.

Implementation notes

  • Add code and hint fields to the rejections.jsonl contract
  • PublishBatch handler collects resource_limit rejections and emits a retry event
  • New RetryLargeRecords handler processes them one-by-one with increased limits
  • Hook authors opt in by raising Reject("...", code="resource_limit", hint={"memory": "4g"}) instead of just Reject("...")
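On the hook side, the opt-in might look like the sketch below. The `Reject` class here is a stand-in written for illustration (the real one may differ), and `to_rejection_line` is a hypothetical helper showing how the new fields could be written to rejections.jsonl only when a hook supplies them.

```python
import json
from typing import Optional


class Reject(Exception):
    """Stand-in for the hook rejection exception, extended with code and hint."""

    def __init__(self, reason: str, *, code: Optional[str] = None,
                 hint: Optional[dict] = None):
        super().__init__(reason)
        self.reason = reason
        self.code = code
        self.hint = hint


def to_rejection_line(record_id: str, exc: Reject) -> str:
    """Serialize a rejection as one JSON line, omitting unset optional fields."""
    record = {"id": record_id, "reason": exc.reason}
    if exc.code is not None:
        record["code"] = exc.code
    if exc.hint:
        record["hint"] = exc.hint
    return json.dumps(record)
```

A hook that raises plain `Reject("...")` would produce the same line as today, so only hooks that pass `code="resource_limit"` take part in the retry path.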

Metadata

Labels: feature (New functionality)
