Summary
Guardian’s current canonicalization flow runs as an in-process background worker inside the server. That works for a single instance, but it becomes a bottleneck and a correctness risk when scaling Guardian horizontally across multiple machines.
We should separate canonicalization from the main request-serving path so multiple Guardian API instances can accept traffic, while a dedicated canonicalization worker is responsible for finalizing candidate deltas.
Problem
Today, canonicalization is driven by per-instance polling and account metadata flags. This creates a few scaling and coordination problems:
- API instances and canonicalization logic are tightly coupled.
- Multi-instance deployments risk duplicate processing and race conditions.
- Candidate creation and canonicalization scheduling are not modeled as a durable shared job flow.
- Finalization updates multiple pieces of state without sufficiently explicit coordination between those updates.
Goal
Introduce a high-level architecture where:
- Guardian API instances only validate and persist candidate deltas.
- Candidate creation also creates a durable canonicalization task in shared storage.
- A separate canonicalization worker/service consumes those tasks and performs finalization.
- Canonicalization remains sequential per account and safe across restarts.
- The system can support multiple Guardian API instances without relying on in-process background jobs.
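As a rough illustration of the second goal, the sketch below persists a candidate delta and its canonicalization task in a single transaction, so neither can exist without the other. It is not Guardian's actual schema: the table names, column names, and the `submit_candidate` helper are hypothetical, and SQLite stands in for Postgres only so the example runs on its own.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with hypothetical candidate and task tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE candidate_deltas (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            account_id TEXT NOT NULL,
            payload    TEXT NOT NULL
        );
        CREATE TABLE canonicalization_tasks (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            candidate_id INTEGER NOT NULL UNIQUE REFERENCES candidate_deltas(id),
            account_id   TEXT NOT NULL,
            status       TEXT NOT NULL DEFAULT 'pending'
        );
        """
    )
    return conn


def submit_candidate(conn: sqlite3.Connection, account_id: str, payload: str) -> int:
    """Persist a validated candidate and enqueue its task atomically."""
    # `with conn` commits if the block succeeds and rolls back on any exception,
    # so a candidate row is never left behind without a matching task row.
    with conn:
        cur = conn.execute(
            "INSERT INTO candidate_deltas (account_id, payload) VALUES (?, ?)",
            (account_id, payload),
        )
        candidate_id = cur.lastrowid
        conn.execute(
            "INSERT INTO canonicalization_tasks (candidate_id, account_id) VALUES (?, ?)",
            (candidate_id, account_id),
        )
    return candidate_id
```

With this shape, an API instance does no canonicalization work itself; it only writes the two rows, and the worker discovers the task from shared storage.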
Expected outcome
After this change, we should be able to run:
- many Guardian API nodes for request handling
- one dedicated canonicalization worker initially
- a path to safely support multiple workers later through proper claiming/locking semantics
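One possible shape for the claiming semantics mentioned above is a conditional update: a worker claims a task only if it is still pending, and only if no earlier task for the same account is unfinished. The sketch below is an assumption, not a committed design. The schema, the `claimed_by` column, and `claim_next_task` are hypothetical, and SQLite is used only to keep the example runnable. On Postgres, the same idea is commonly expressed with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
from typing import Optional, Tuple


def open_task_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE canonicalization_tasks (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            account_id TEXT NOT NULL,
            status     TEXT NOT NULL DEFAULT 'pending',
            claimed_by TEXT
        );
        """
    )
    return conn


def claim_next_task(conn: sqlite3.Connection, worker_id: str) -> Optional[Tuple[int, str]]:
    """Claim the oldest eligible task; return (task_id, account_id) or None."""
    with conn:
        # A task is eligible only if no earlier task for the same account is
        # still pending or claimed, which keeps per-account order sequential.
        row = conn.execute(
            """
            SELECT t.id, t.account_id
            FROM canonicalization_tasks AS t
            WHERE t.status = 'pending'
              AND NOT EXISTS (
                  SELECT 1 FROM canonicalization_tasks AS o
                  WHERE o.account_id = t.account_id
                    AND o.id < t.id
                    AND o.status IN ('pending', 'claimed')
              )
            ORDER BY t.id
            LIMIT 1
            """
        ).fetchone()
        if row is None:
            return None
        # Compare-and-set: the update only succeeds if the task is still pending,
        # so a task that another worker claimed in the meantime is not claimed twice.
        cur = conn.execute(
            "UPDATE canonicalization_tasks SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        if cur.rowcount != 1:
            return None
    return (row[0], row[1])
```

With a single worker the compare-and-set never loses a race, but keeping it in place from the start means adding a second worker later does not change the claiming contract.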
Acceptance criteria
- Canonicalization execution is no longer tied to every Guardian API process.
- Candidate persistence and canonicalization task creation are coordinated durably.
- A worker can recover unfinished canonicalization tasks after restart.
- Finalization updates are applied consistently and do not leave partially completed state.
- Per-account ordering is preserved during canonicalization.
- Failed canonicalizations end in an explicit terminal or retryable state.
- The design is validated for multi-instance deployment with Postgres/shared storage.
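The restart-recovery and failure-state criteria could be met with a small task state machine, sketched below under stated assumptions. The status values (`pending`, `claimed`, `failed`), the `attempts` and `last_error` columns, the `MAX_ATTEMPTS` budget, and both helper functions are hypothetical. The recovery step also assumes a single worker, so any task still marked `claimed` at startup must belong to a crashed run; multiple workers would need leases or timeouts instead.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry budget before a task becomes terminal


def open_task_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE canonicalization_tasks (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            account_id TEXT NOT NULL,
            status     TEXT NOT NULL DEFAULT 'pending',
            attempts   INTEGER NOT NULL DEFAULT 0,
            last_error TEXT
        );
        """
    )
    return conn


def record_failure(conn: sqlite3.Connection, task_id: int, error: str) -> str:
    """Move a claimed task to 'pending' (retryable) or 'failed' (terminal)."""
    with conn:
        # The CASE expression sees the pre-update value of `attempts`,
        # hence the explicit `attempts + 1` in the comparison.
        conn.execute(
            """
            UPDATE canonicalization_tasks
            SET attempts   = attempts + 1,
                last_error = ?,
                status     = CASE WHEN attempts + 1 >= ? THEN 'failed' ELSE 'pending' END
            WHERE id = ? AND status = 'claimed'
            """,
            (error, MAX_ATTEMPTS, task_id),
        )
        row = conn.execute(
            "SELECT status FROM canonicalization_tasks WHERE id = ?", (task_id,)
        ).fetchone()
    return row[0]


def recover_stale_claims(conn: sqlite3.Connection) -> int:
    """On worker startup, requeue tasks left 'claimed' by a previous run."""
    with conn:
        cur = conn.execute(
            "UPDATE canonicalization_tasks SET status = 'pending' WHERE status = 'claimed'"
        )
    return cur.rowcount
```

Every failure path ends in either `pending` or `failed`, so no task is left in an ambiguous state, and the stored `last_error` gives operators something to inspect for terminal failures.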