diff --git a/.claude/skills/gh-pull-request/SKILL.md b/.claude/skills/gh-pull-request/SKILL.md
index 657cd50c42e3..f26e76ad202d 100644
--- a/.claude/skills/gh-pull-request/SKILL.md
+++ b/.claude/skills/gh-pull-request/SKILL.md
@@ -150,3 +150,35 @@ EOF
 - Add `copilot` as a reviewer: `gh pr edit <number> --add-reviewer copilot`
 - Do NOT add AI assistant as co-author. Code responsibility is on the committer's hands.
 - Return the PR URL when done.
+
+## After the PR is merged
+
+Once the PR is merged, sync the default branch and clean up the feature branch:
+
+```bash
+# 1. Prune stale remote refs. GitHub auto-deletes the PR's branch on merge, so
+#    the remote feature branch is usually already gone; --prune removes the
+#    dangling local tracking ref.
+git fetch origin --prune
+
+# 2. Switch back to the default branch and fast-forward it to include the merge.
+git checkout master
+git pull --ff-only origin master
+
+# 3. Confirm the change actually landed in master before deleting anything —
+#    `git log --oneline -1` should show the merge/squash commit with the PR
+#    number, or grep for a symbol the PR introduced.
+git log --oneline -1
+
+# 4. Delete the local feature branch. SkyWalking SQUASH-merges PRs, so the
+#    feature branch's commit is NOT an ancestor of master (master gets a new
+#    squash commit instead). `git branch -d` therefore reports "not fully
+#    merged" — that is expected, not an error. After confirming the content is
+#    in master (step 3), force-delete:
+git branch -d <branch> 2>/dev/null || git branch -D <branch>
+```
+
+Notes:
+- `git branch -d` failing with "not fully merged" on a squash-merged PR is normal — the squash commit has a different SHA than the feature commit. Verify via step 3, then `-D`.
+- If the remote branch was not auto-deleted (some repo settings), remove it explicitly: `git push origin --delete <branch>`.
+- Do NOT skip step 3. Force-deleting a local branch whose work didn't actually merge loses it.
diff --git a/CLAUDE.md b/CLAUDE.md
index 7503cdf517e8..99a004e7964e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -328,7 +328,7 @@ Actions owned by `actions/*` (GitHub), `github/*`, and `apache/*` are always all
 13. **`moduleManager.find(X.NAME)` requires `X.NAME` in `requiredModules()`**: every call to `moduleManager.find(SomeModule.NAME)` (direct or through a helper) must have `SomeModule.NAME` in the provider's `requiredModules()` array. Missing declarations cause runtime exceptions the first time the code path fires — not at module boot. Wrapping the call in `try { ... } catch (Throwable)` is NOT a substitute; declare the module and keep the try/catch only for defensive handling of transient provider outages. When auditing a branch, grep for `moduleManager.find(` across the touched module and verify each target name appears in `requiredModules()`. Example modules that frequently catch teams out: `AlarmModule` (used by alarm-kernel reset), `LogAnalyzerModule` (used by LAL factory lookup).
 14. **Don't look up `ClusterModule` services directly**: the `ClusterModule` (ZooKeeper / K8s / Nacos coordination) exposes `ClusterRegister` / `ClusterNodesQuery` / `ClusterCoordinator`. Most receiver / analyzer modules don't declare `ClusterModule` in `requiredModules()`, so calling `moduleManager.find(ClusterModule.NAME)` will throw at runtime. Instead, go through `CoreModule`'s `RemoteClientManager` service — it's already populated by the cluster module and exposes the peer list every OAP needs. If a module genuinely needs cluster-coordinator primitives, declare `ClusterModule.NAME` in `requiredModules()` explicitly.
 15. **No `ThreadLocal` side-channels to hijack downstream behaviour**: routing a caller's intent through a `ThreadLocal` that downstream code reads (e.g., `if (PeerMode.isActive()) skipSomething()`) is almost always the wrong answer — it creates invisible coupling between far-apart code paths, leaks across async hand-offs (executors, gRPC threads, Armeria event loops), and makes the behaviour impossible to understand from a method signature. The correct fix is almost always to **extend the interface** — add a parameter, a new method, a new mode enum that appears in the signature. Rare exceptions: propagating OpenTelemetry context where the whole industry has standardised on `ThreadLocal`, or security principals enforced by a framework. In all other cases, prefer an explicit API extension, even if it costs more lines.
-16. **BanyanDB schema-visibility: fence on `mod_revision`, do NOT poll metadata**: every BanyanDB Create / Update / Delete returns an etcd `mod_revision` (0 on a delete that didn't record a tombstone). After firing DDL, fence on `BanyanDBClient.getSchemaWatcher().awaitRevisionApplied(maxRev, timeout)` before unparking dispatch / firing data writes — this blocks until every data node has caught up, which the registry's read-after-write does not guarantee. For deletes that returned `mod_revision == 0`, fall back to `awaitSchemaDeleted(SchemaKey, timeout)`. The previous "poll `findMeasure` until you can read your own write" idiom existed before the `SchemaBarrierService` proto landed and has been replaced — do not reintroduce it. JDBC and ES are synchronous-DDL on the coordinator so they don't need a fence.
+16. **BanyanDB schema-visibility: fence on `mod_revision`, do NOT poll metadata**: schema no longer lives in etcd — etcd was removed. It is now stored as property documents in BanyanDB's own `_schema` store and propagated to each data node's in-memory cache **asynchronously** (a `SchemaUpdateService.WatchSchemas` gRPC stream + a 30s reconcile sync, see `banyand/metadata/schema/property/client.go`). DDL `Create` / `Update` / `Delete` broadcast only to the schema-server (`ROLE_META`) nodes and **return immediately with a `mod_revision`** — now a client-stamped `time.Now().UnixNano()` timestamp, NOT an etcd index (which is why it still changes on every content-changing DDL; it is `0` on a delete that didn't record a tombstone) — **without waiting for data nodes**. A data write whose schema hasn't propagated yet is dropped at the data-node executor (`cannot find measure definition`, logged + skipped, `banyand/measure/write_standalone.go`), so the fence is still required — arguably more than under the old etcd-watch model. After firing DDL, fence on `BanyanDBClient.getSchemaWatcher().awaitRevisionApplied(maxRev, timeout)` before unparking dispatch / firing data writes — this blocks (liaison-side, all alive nodes, bounded by the timeout) until every data node's `notifiedModRevision` watermark reaches `maxRev`, which the registry's read-after-write does not guarantee. For deletes that returned `mod_revision == 0`, fall back to `awaitSchemaDeleted(SchemaKey, timeout)`. The previous "poll `findMeasure` until you can read your own write" idiom predates the `SchemaBarrierService` proto and has been replaced — do not reintroduce it. The `SchemaWatcher` class is in OAP's **in-tree** `library-banyandb-client` (`org.apache.skywalking.library.banyandb.v1.client.SchemaWatcher`); the standalone `skywalking-banyandb-java-client` repo does NOT have these symbols, so don't conclude the API is gone from grepping it. JDBC and ES are synchronous-DDL on the coordinator so they don't need a fence.
 
 ## Analysis and Design Principles
 
diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index c11a51d92582..0de690baf9c7 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -242,6 +242,12 @@
   admin-host only" entry above for the public REST retirement.
 
 #### OAP Server
+* Batch the BanyanDB schema fence per runtime-rule apply. A runtime-rule file changes dozens of rules at once, but the post-DDL fence (`SchemaWatcher.awaitRevisionApplied`) ran once per metric/downsampling, so a large file did `K×M` sequential ≤2s fences — on a laggy cluster that overran the apply's REST budget. The main-node apply path now uses `StorageManipulationOpt.withSchemaChangeDeferredFence()`: the installer records each resource's `mod_revision` without fencing and registers a single flush that the apply runs once on the file's max revision, collapsing the whole file to one barrier. The flush is one-shot — a reconciler tick reuses one opt across every rule file, so after a file flushes, the closure and accumulated revision reset and each file fences on its own DDL only. Drops still fence inline on the dropped resource's own delete revision — or, when that delete recorded no tombstone (`mod_revision == 0`), on a key-based deletion barrier (`AwaitSchemaDeleted`) — never on the shared opt's cumulative revision, so a tombstone-less delete in a multi-file tick is still confirmed removed. On the operator REST apply the single create/update fence runs on a configurable, generous budget (default 180s) in the background **before** the rule row is persisted and dispatch resumes — it gates the persist + local commit + peer resume so the durable commit point is only reached once the schema is confirmed cluster-wide, and writes never resume against an un-propagated schema (see the apply-status entry below); the reconciler tick keeps the short inline 2s fence (a background reconcile must not wait minutes per file). Peer / `withoutSchemaChange` applies are unaffected (no fence).
+* Add a runtime-rule apply-status query. The cluster main now tracks each structural apply through a phase machine (`SchemaApplyCoordinator`: pending → DDL → fencing → rolling-out → applied, with `degraded` for a committed-but-unconfirmed apply — the cluster schema fence did not confirm within the timeout, in which case the lagging data-node ids are surfaced as `fenceLaggards` and dispatch is resumed anyway, or the local commit-tail threw — and `failed` carrying the specific reason). The schema fence runs on a configurable, generous budget (`receiver-runtime-rule.deferredFenceTimeoutSeconds`, default 180s) and **gates everything durable or visible**: because an un-propagated write is silently dropped at the data node, the order after a successful DDL is suspend → DDL → **fence → persist → commit → resume**. The rule row (the durable commit point) is written only AFTER the fence confirms, so "durable" implies "schema propagated cluster-wide" — a main crash before persist leaves no row (peers/crash-recovery stay safely on the old content; the orphaned measure is inert), and any durable row is guaranteed fence-confirmed, so convergence never resumes dispatch against an unpropagated schema. The fence + persist + resume run in the background so they never block the HTTP response — `POST /addOrUpdate` returns its `applyId` immediately at `fencing` (accepted, not yet durable; dispatch for that rule still paused — a clean gap, not dropped writes), and the operator polls `GET /runtime/rule/status` to watch `fencing → rolling-out → applied` (or `degraded`/`failed`); on a genuine laggard, dispatch resumes after the budget so one stuck node can't park the metric forever. A `GetApplyStatus` admin-internal gRPC served by the main backs the query — by `applyId`, or by `catalog`+`name` (+ optional `contentHash`, the durable identity) once the handle is gone after a page refresh. When the live status is gone (apply-id evicted, main restarted, or the main is unreachable), the query degrades to the durable rule row: a matching `ACTIVE` row reports `applied` derived from the content hash (a durable row is, by the fence-then-persist order, already propagation-confirmed). Non-main nodes route the read to the deterministic main; status is in-memory by design, with the content hash reconstructing truth after a restart.
+* Push runtime-rule convergence to peers on commit. After a successful structural apply — and on the `commit_deferred` path, where the DB row is durable but this node's commit-tail threw — the main broadcasts a `NotifyApplied` admin-internal RPC so peers reconcile against the just-persisted DB row immediately, instead of waiting up to one refresh tick (~30s) to notice it. The fan-out runs off the REST response thread (fire-and-forget on a daemon executor) so an unreachable peer's per-call deadline never adds to the operator's apply latency. On the peer side the notify-triggered reconcile is coalesced: a burst of notifies (a multi-rule file, or several applies) collapses to a single queued full reconcile rather than one redundant `dao.getAll()` scan per notify. The notify is best-effort and idempotent (the peer runs its normal per-file-locked reconcile; a lost notify is harmless — the peer still self-converges on its next tick), so it tightens the cluster-convergence window without adding a hard dependency on the main being reachable.
+* Fix BanyanDB peer nodes permanently flooding `<metric> is not registered`, and a follow-on case where a peer kept translating writes with a stale schema shape after a runtime-rule reshape, when a node held a live persist worker but its local `MetadataRegistry` schema cache was missing or stale for that model — a `withoutSchemaChange` peer apply or a runtime-rule bundled fall-over rebuilt the dispatch worker but skipped the local-cache populate, and the registry was insert-only (never evicting) while the 30s reconcile only covers runtime-rule rows, so nothing re-derived it. The peer / local-cache-only install path now (re)derives and overwrites the local schema entry from the declared model with zero server RPC — honoring the `inspectBackend=false` contract so the cache can never lag the worker, including across a reshape — and a model removal now evicts its cache entry so a dropped or reshaped model leaves no stale translation behind; the persist DAOs keep an RPC-free re-derivation as a read-side backstop, and the no-init defer poll loop retries a transient backend probe error instead of escaping and crash-looping the pod.
+* Fix a v2 MAL `CounterWindow` key collision: `rate()` / `increase()` / `irate()` keyed each counter's sliding window on the rule's output metric name (the same for every input metric of a rule) instead of the counter's own name, so two or more counters that reduce to the same label set after `.sum(...)` shared one window and computed rates against each other's values — fabricating non-zero rates from unchanged counters (e.g. the BanyanDB liaison gRPC error rate read a steady non-zero off three frozen error counters). The window is now keyed by the counter's own metric name.
+* Fix the v2 MAL Elvis operator `?:` to honor Groovy-falsy semantics. It compiled to `Optional.ofNullable(primary).orElse(fallback)`, applying the fallback only when the primary is `null`, so an empty-string primary kept `""` instead — e.g. a BanyanDB liaison `ServiceInstance` stored `node_type=""` rather than `n/a`, because `.sum([...,'node_type'])` fills an absent group-by label with `""`. The fallback now applies for falsy primaries such as null, false, numeric zero, and empty strings/containers.
 * SWIP-15: rebuild BanyanDB self-observability around the cluster / container / group model (requires BanyanDB 0.11+). A BanyanDB cluster is modeled as one `Service`, each container as a `ServiceInstance` (role/tier as attributes), and each storage group as an `Endpoint`. The `otel-rules/banyandb/` rules are category-separated by role (`node_*` / `liaison_*` / `data_*` / `lifecycle_*`) and by data type (`measure_*` / `stream_*` / `trace_*` / `property_*`), mirroring the upstream FODC-proxy Grafana boards, and include queue batch/message granularity (apache/skywalking-banyandb#1169). Adds a `SERVICE_INSTANCE_RELATION` MAL scope and `serviceInstanceRelation(...)` builder powering a new intra-cluster pod-to-pod deployment topology (`banyandb-instance-relation.yaml`). The stale single-node `host_name` model is removed.
 * Runtime MAL/LAL hot-update rules can declare `layerDefinitions:` to introduce new
   layers. Ordinals are operator-pinned in the `100_000+` tier; the layer is
diff --git a/docs/en/concepts-and-designs/runtime-rule-hot-update.md b/docs/en/concepts-and-designs/runtime-rule-hot-update.md
index 4abf49f7dd1f..48ad38202433 100644
--- a/docs/en/concepts-and-designs/runtime-rule-hot-update.md
+++ b/docs/en/concepts-and-designs/runtime-rule-hot-update.md
@@ -220,29 +220,51 @@ Two paths, picked from the diff between the new content and the current entry:
   the same fast path. No cluster pause, no backend schema change, no alarm reset.
 - **Structural path** — anything that moves metric identity (metric set added or
   removed, scope or downsampling function changed, LAL `(layer, ruleName)` set
-  changed). The main runs:
-  1. **Pause the cluster** — broadcast a pause to every peer over the cluster bus.
+  changed). The main runs, in order:
+  1. **Pause the cluster** — self-suspend first (so a concurrent peer apply is
+     detected, not merged), then broadcast a pause to every peer over the cluster bus.
      Peers stop dispatching samples for the affected metrics and drain in-flight
      batches. Unreachable peers are logged and skipped; they self-recover via the
      periodic scan.
-  2. **Update backend storage on this node**, including the schema-visibility fence on
-     BanyanDB (see below).
-  3. **Persist the entry** — this is the cluster-wide commit point.
-  4. **Resume the cluster** — broadcast a resume so peers re-open dispatch. Peers
-     that missed the resume self-heal within 60 s.
-  5. **Reset alarm windows** for any metric whose identity changed, so accumulated
-     state doesn't carry across the change.
-
-If any step before persist fails, the entry is **not** advanced, the local node
-rolls back to the previous rule state, peers self-heal back to the old content within
-60 s, and the operator gets `HTTP 500` with `applyStatus` indicating the failure.
-
-If persist itself fails, the same rollback happens — the durable state never moved,
-so neither does the cluster.
-
-If persist succeeds but the local finishing step fails (a rare path), the operator
-gets `HTTP 500 commit_deferred`: storage holds the new content (peers will converge
-on it), but this node hasn't fully applied it yet and will retry on its next scan.
+  2. **Fire the backend DDL on this node** — create / update the BanyanDB measures
+     for the metrics this rule produces. This returns the schema revision but does
+     **not** yet make the rule the cluster's truth.
+
+  At this point the HTTP call returns `applyId` with phase `FENCING` — **accepted, not
+  yet durable** — and the remaining steps run in the background so a slow data node
+  never blocks the operator. The operator polls `GET /runtime/rule/status` (see the
+  admin-API doc) to watch the rest:
+  3. **Schema-visibility fence** — wait (up to a configurable budget, default 3 min)
+     for every BanyanDB data node to apply the new measure schema (see below).
+  4. **Persist the entry** — this is the cluster-wide commit point, and it happens
+     **after** the fence, so a durable entry always implies "schema propagated
+     cluster-wide".
+  5. **Finalize + resume** — finalize the local commit (swap the bundle, reset alarm
+     windows for any metric whose identity changed, unpark local dispatch) and
+     resume / notify peers so they converge to the new entry. Peers that missed the
+     notify self-heal within 60 s.
+
+Because persist (step 4) is gated behind the fence (step 3), the ordering is
+**pause → DDL → fence → persist → commit → resume**, and crash recovery is safe at
+every point: a crash before persist leaves no entry — peers and the recovered main
+stay on the old content (the orphaned measure from the DDL is inert) — and any entry
+that *is* durable was fence-confirmed before it was written, so a peer converging to
+it via the periodic scan never resumes dispatch against an unpropagated schema.
+
+Outcomes (all observed by polling `/runtime/rule/status`, not by blocking the HTTP
+call, which already returned at `FENCING`):
+
+- **Pre-DDL error** (compile / verify) — phase `FAILED`; the entry was never advanced
+  and the cluster keeps serving the prior rule. The HTTP call returns the error
+  synchronously (it happens before the `FENCING` return).
+- **Persist fails** — phase `FAILED`; the local node rolls back to the prior rule and
+  resumes peers. The durable state never moved, so neither does the cluster.
+- **Fence does not confirm within the budget** — phase `DEGRADED` with the lagging
+  node ids; dispatch resumes anyway (a stuck node must not park the metric forever),
+  and the schema converges through BanyanDB's own watcher.
+- **Local finalize fails after persist** — phase `DEGRADED`: storage holds the new
+  content and peers converge to it, but this node will retry the local finish on its
+  next scan.
 
 ### Lifecycle
 
@@ -367,36 +389,43 @@ gate — without it the system still converges.
 
 ## Schema-visibility fence (BanyanDB)
 
-BanyanDB's distributed mode propagates registry writes from the meta-server to
-every data node asynchronously. A naive flow — register the schema, immediately
-resume dispatch — has a race: the registry holds the new measure but a data node
-may not yet have caught up, so the first sample after the apply lands on an
-unprepared node.
-
-For runtime hot-updates this would mean the operator's `200 OK` could come back
-before the cluster's data boundary actually moved. The runtime-rule install path
-narrows the gap on a best-effort basis: every BanyanDB schema write returns an
-etcd `mod_revision`, and the installer waits — synchronously, before resuming
-dispatch, up to a bounded timeout (default 2s) — for every BanyanDB data node
-to catch up to the highest revision the apply produced.
-
-The visible contract for operators is:
-
-- Between operator request and `200 OK`, all sample dispatch for the affected
-  metric is paused on every node. In-flight samples are dropped (this is by
-  design: a structural change means the schema is moving and in-flight data has
-  no valid landing).
-- When all data nodes confirm within the bounded window, the `200 OK` marks the
-  moment the cluster's data boundary moves: samples written at or after the `200`
-  use the new shape; samples written before use the old shape.
-- When one or more nodes haven't applied within the window, OAP logs a warning
-  naming the laggards and resumes dispatch anyway. The schema is already
-  authoritative in etcd, so late nodes apply it asynchronously through their
-  watcher — until they do, samples landing on those specific nodes for that
-  metric may be rejected by the local data node briefly. This trades strict
-  cluster-wide cutover for not wedging an apply behind a single slow node;
-  operators who need strict behavior should fix the slow node, not loosen the
-  timeout.
+BanyanDB's distributed mode propagates schema writes from the meta-server to every
+data node asynchronously. A naive flow — register the schema, immediately resume
+dispatch — has a race: the registry holds the new measure but a data node may not
+yet have caught up, so a sample written before that node applies the schema is
+**silently dropped** at the data node (not retried). The fence is therefore a
+write-safety barrier, not just an observability check: it must gate dispatch resume.
+
+Every BanyanDB schema write returns a `mod_revision`. After firing the DDL the main
+waits — on a configurable budget (`deferredFenceTimeoutSeconds`, default 3 min) — for
+every data node to catch up to the highest revision the apply produced, and only then
+persists the rule entry, finalizes the local commit, and resumes dispatch. The wait
+runs in the **background**: the HTTP call has already returned `applyId` at phase
+`FENCING`, so a slow data node never blocks the operator, yet nothing durable or
+visible happens until the schema is confirmed.
+
+The contract for operators is:
+
+- The HTTP call returns at `FENCING` (accepted, not yet durable). Dispatch for the
+  affected metric stays paused on every node from the pause broadcast through the
+  fence; this is a clean collection gap, not dropped writes (no node is writing the
+  new shape yet). In-flight samples drained at pause are dropped by design — a
+  structural change moves the schema and in-flight data has no valid landing.
+- When all data nodes confirm within the budget, the entry is persisted and dispatch
+  resumes — the cluster's data boundary moves at that moment. The status advances
+  `FENCING → ROLLING_OUT → APPLIED`.
+- When one or more nodes haven't applied within the budget, OAP logs a warning naming
+  the laggards, persists + resumes **anyway**, and reports `DEGRADED` with the laggard
+  ids. The schema is already authoritative, so late nodes apply it through their own
+  watcher; until they do, samples landing on those specific nodes may be dropped
+  briefly. This trades strict cluster-wide cutover for not wedging an apply behind a
+  single slow node; operators who need strict behavior fix the slow node.
+
+Because persist is gated behind the fence, the fence's guarantee survives a main
+crash: a durable entry is always one whose schema was confirmed propagated, so a peer
+converging to it (via the periodic scan, which performs no backend RPC of its own)
+never resumes against an unpropagated schema. A crash before persist simply leaves no
+entry, and the cluster stays on the prior, already-fenced content.
 
 Elasticsearch and JDBC don't have multi-node schema fan-out; their storage change is
 visible when the call returns, so the fence is a no-op for those backends.
@@ -413,23 +442,29 @@ recovery path is the same path operators already use.
   storage_change_requires_explicit_approval`. No pause broadcast, no persist, no
   side effects. Re-push with `?allowStorageChange=true` if the change is
   intentional.
-- **Backend storage verification failed mid-apply** — `HTTP 500 ddl_verify_failed`.
-  Newly added metrics are rolled back so the backend doesn't accumulate orphans; the prior
+- **Backend storage verification failed mid-apply** — `HTTP 400 ddl_verify_failed`
+  (this happens during the synchronous DDL step, before the `FENCING` return). Newly
+  added metrics are rolled back so the backend doesn't accumulate orphans; the prior
   rule keeps serving every metric that wasn't being added or reshaped.
-  `lastApplyError` on `/runtime/rule/list` carries the failure message.
-- **Persist failed** — `HTTP 500 persist_failed`. Local state is rolled back to
-  the pre-apply rule; peers self-heal within 60 s. The cluster never advanced
-  past the failure.
-- **Persist succeeded but the local finishing step failed** — `HTTP 500 commit_deferred`.
-  Storage is authoritative (peers will converge), but this node will retry on
-  its next periodic scan.
 - **Cluster routing fail-safe** — `HTTP 421 cluster_view_split` when a forwarded
   request reaches a node that also doesn't believe it's the main. Wait for the
   peer-list to settle (seconds) and retry.
 
-`GET /runtime/rule/list` is the canonical operator view of cluster state: persisted
-status, per-node `localState`, and `lastApplyError` for any rule whose most recent
-apply failed. There is no separate alert channel — `/list` plus the OAP log are
+The errors above are returned synchronously, before the call returns `applyId` at
+`FENCING`. The outcomes of the background tail (fence → persist → commit → resume) are
+observed by polling `GET /runtime/rule/status?applyId=…`, not by an HTTP code:
+
+- **Fence didn't confirm within the budget** — phase `DEGRADED` with the lagging node
+  ids; dispatch resumed anyway, schema converges via the data nodes' watcher.
+- **Persist failed** — phase `FAILED`; local state rolled back to the pre-apply rule,
+  peers self-heal within 60 s. The cluster never advanced (no durable entry).
+- **Local finishing step failed after persist** — phase `DEGRADED`; storage is
+  authoritative (peers converge), this node retries on its next periodic scan.
+
+`GET /runtime/rule/list` is the canonical operator view of cluster state (persisted
+status, per-node `localState`, `lastApplyError`); `GET /runtime/rule/status` reports a
+specific apply's live phase / laggards (and degrades to the durable entry when the
+apply-id is gone). There is no separate alert channel — those two plus the OAP log are
 the entire diagnostic surface.
 
 ## Dynamic layers
diff --git a/docs/en/setup/backend/admin-api/runtime-rule.md b/docs/en/setup/backend/admin-api/runtime-rule.md
index 4f056fb543f1..b0a8850ded8e 100644
--- a/docs/en/setup/backend/admin-api/runtime-rule.md
+++ b/docs/en/setup/backend/admin-api/runtime-rule.md
@@ -111,6 +111,7 @@ server returns `400 compile_failed`.
 | GET    | `/runtime/rule/bundled?catalog=&withContent=false`   | Returns bundled rules for one catalog as JSON. `withContent` defaults to true; `false` omits each YAML body. Each item includes whether an operator override exists. |
 | GET    | `/runtime/rule/list[?catalog=]`                      | Returns a single JSON envelope `{generatedAt, loaderStats, rules}` merged from stored rules and this node's local state. Each row carries `loaderKind`, `loaderName`, `bundled`, and `bundledContentHash` so a UI can render override badges without a second roundtrip. Optional `catalog=` narrows the output; unknown values return `400 invalid_catalog`. |
 | GET    | `/runtime/rule/dump[/<catalog>]`                     | Downloads a tar.gz of stored runtime rules plus `manifest.yaml`. The server has no bulk import endpoint; the CLI restore command replays individual `addOrUpdate` and `inactivate` calls. |
+| GET    | `/runtime/rule/status?catalog=&name=[&applyId=][&contentHash=]` | Reports the progress of a structural `/addOrUpdate` apply, served by the cluster main. Query by the `applyId` returned in the `structural_applied` response, or by `catalog`+`name` (+ optional `contentHash`) once the handle is gone (page refresh / main restart). Always `200` with a JSON status `{found, phase, applyId, contentHash, failureReason?, fenceLaggards?, startedAtMs, updatedAtMs, servedBy}`; `phase` is one of `PENDING`, `DDL`, `FENCING`, `ROLLING_OUT`, `APPLIED`, `DEGRADED`, `FAILED`, `UNKNOWN`. The `/addOrUpdate` response returns at `FENCING` with the `applyId` — **accepted, not yet durable**: the DDL has fired but the rule row is persisted only after the fence confirms. Dispatch for that rule stays paused while the background fence waits for cluster-wide schema propagation (an un-propagated write is silently dropped); once it confirms, the row is persisted (the durable commit point) and the apply advances to `ROLLING_OUT` (finalize commit, resume + notify peers) and `APPLIED`. A persist failure → `FAILED` (rolled back, nothing durable); a laggard → `DEGRADED` with `fenceLaggards` (data-node ids that did not confirm within `deferredFenceTimeoutSeconds`), resuming dispatch anyway. Because the row is persisted only after the fence, any durable row is propagation-confirmed — so when the live status is gone (or the main is unreachable) the query safely degrades to the durable rule row: a matching `ACTIVE` row reports `phase=APPLIED` with `derivedFrom=durable-dao`. |
 
 ### `/delete` storage semantics — per backend
 
diff --git a/oap-server/analyzer/meter-analyzer-scripts-test/src/test/java/org/apache/skywalking/oap/server/dsl/tester/mal/MALExpressionExecutionTest.java b/oap-server/analyzer/meter-analyzer-scripts-test/src/test/java/org/apache/skywalking/oap/server/dsl/tester/mal/MALExpressionExecutionTest.java
index 1d0f22a1ec18..296be78fb5b3 100644
--- a/oap-server/analyzer/meter-analyzer-scripts-test/src/test/java/org/apache/skywalking/oap/server/dsl/tester/mal/MALExpressionExecutionTest.java
+++ b/oap-server/analyzer/meter-analyzer-scripts-test/src/test/java/org/apache/skywalking/oap/server/dsl/tester/mal/MALExpressionExecutionTest.java
@@ -198,33 +198,27 @@ private void executeWithInput(
             final Map<String, Object> inputSection,
             final Map<String, Object> expectedSection) {
         final String metricName = rule.getName();
-        // Unique per file+rule to isolate CounterWindow entries across files
-        final String cwMetricName = rule.getSourceFile().getName() + "/" + metricName;
         final String expression = rule.getFullExpression();
         final boolean hasIncrease = expression.contains(".increase(")
             || expression.contains(".rate(");
 
-        // v2 prime + v2 real (also consecutive, same delta)
+        // rate()/increase() resolve their lower bound from the process-wide
+        // CounterWindow.INSTANCE, keyed by each counter's own (name, labels) — not
+        // the rule-level metric name. Input counter names recur across rules and
+        // files, so without a reset one rule's prime/real pair would rate against
+        // another rule's leftover window samples. Clear it so each rule is isolated
+        // to its own prime (t0) + real (t0+2s) pair.
+        org.apache.skywalking.oap.meter.analyzer.v2.dsl.counter.CounterWindow.INSTANCE.reset();
+
+        // v2 prime + v2 real (consecutive scrapes 2 s apart, same delta)
         final Map<String, org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily> v2Data;
         if (hasIncrease) {
             try {
-                final Map<String, org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily> primeData =
-                    buildV2MockDataFromInput(inputSection, 0.5);
-                for (final org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily s : primeData.values()) {
-                    if (s != org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily.EMPTY) {
-                        s.context.setMetricName(cwMetricName);
-                    }
-                }
-                v2MalExpr.run(primeData);
+                v2MalExpr.run(buildV2MockDataFromInput(inputSection, 0.5));
             } catch (Exception ignored) {
             }
         }
         v2Data = buildV2MockDataFromInput(inputSection, 1.0);
-        for (final org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily s : v2Data.values()) {
-            if (s != org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily.EMPTY) {
-                s.context.setMetricName(cwMetricName);
-            }
-        }
 
         // V2 run
         org.apache.skywalking.oap.meter.analyzer.v2.dsl.SampleFamily v2Sf;
diff --git a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/MALClosureCodegen.java b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/MALClosureCodegen.java
index 8b46f68cf91d..e12a6b7fdcf3 100644
--- a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/MALClosureCodegen.java
+++ b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/MALClosureCodegen.java
@@ -300,9 +300,24 @@ void generateClosureExpr(final StringBuilder sb,
         } else if (expr instanceof MALExpressionModel.ClosureElvisExpr) {
             final MALExpressionModel.ClosureElvisExpr elvis =
                 (MALExpressionModel.ClosureElvisExpr) expr;
-            sb.append("java.util.Optional.ofNullable(");
+            // Groovy `?:` applies the fallback when the primary is falsy (null,
+            // empty string/container, numeric zero, false), not only when null.
+            // Keep the primary single-evaluated so expressions such as tags.remove(...)
+            // do not observe different values between the truth check and result.
+            //
+            // KNOWN LIMITATION (eager fallback): elvis(primary, fallback) is a plain method call,
+            // so Java evaluates BOTH arguments before elvis() runs — the fallback is always
+            // computed, even when the primary is truthy. This matches Groovy's falsy-SELECTION but
+            // not its lazy EVALUATION: a fallback that mutates state or is expensive still runs.
+            // True laziness would need the fallback wrapped in a Supplier (or an inlined ternary
+            // over a generated temp), which Javassist cannot emit (no lambdas; expression context
+            // has no statement slot for a temp). Accepted because real MAL fallbacks are pure,
+            // cheap reads (`tags['x'] ?: tags['y']`, `metricA ?: metricB`) — side-effecting/expensive
+            // fallbacks do not occur in practice. Revisit via a Supplier-companion codegen pass if
+            // that ever changes.
+            sb.append(MALCodegenHelper.RUNTIME_HELPER_FQCN).append(".elvis(");
             generateClosureExpr(sb, elvis.getPrimary(), paramName, beanMode);
-            sb.append(").orElse(");
+            sb.append(", ");
             generateClosureExpr(sb, elvis.getFallback(), paramName, beanMode);
             sb.append(")");
         } else if (expr instanceof MALExpressionModel.ClosureRegexMatchExpr) {
diff --git a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/rt/MalRuntimeHelper.java b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/rt/MalRuntimeHelper.java
index 669be6ef7cec..7275e11db459 100644
--- a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/rt/MalRuntimeHelper.java
+++ b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/compiler/rt/MalRuntimeHelper.java
@@ -17,6 +17,9 @@
 
 package org.apache.skywalking.oap.meter.analyzer.v2.compiler.rt;
 
+import java.lang.reflect.Array;
+import java.util.Collection;
+import java.util.Map;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 import org.apache.skywalking.oap.meter.analyzer.v2.dsl.Sample;
@@ -55,12 +58,9 @@ public static String[][] regexMatch(final String input, final String regex) {
         return new String[][] {row};
     }
 
-    /**
-     * Reverse division: computes {@code numerator / v} for each sample value {@code v}.
-     * Used by generated code for {@code Number / SampleFamily} expressions.
-     */
     /**
      * Groovy truth check: {@code null → false}, empty string → {@code false},
+     * numeric zero → {@code false}, empty collection/map/array → {@code false},
      * {@code Boolean.FALSE → false}, everything else → {@code true}.
      * Used by generated filter code for standalone expressions in boolean context
      * (e.g., {@code tags.ApiId || tags.ApiName}).
@@ -75,9 +75,29 @@ public static boolean isTruthy(final Object value) {
         if (value instanceof CharSequence) {
             return ((CharSequence) value).length() > 0;
         }
+        if (value instanceof Number) {
+            return ((Number) value).doubleValue() != 0.0D;
+        }
+        if (value instanceof Collection) {
+            return !((Collection<?>) value).isEmpty();
+        }
+        if (value instanceof Map) {
+            return !((Map<?, ?>) value).isEmpty();
+        }
+        if (value.getClass().isArray()) {
+            return Array.getLength(value) > 0;
+        }
         return true;
     }
 
+    public static <T> T elvis(final T primary, final T fallback) {
+        return isTruthy(primary) ? primary : fallback;
+    }
+
+    /**
+     * Reverse division: computes {@code numerator / v} for each sample value {@code v}.
+     * Used by generated code for {@code Number / SampleFamily} expressions.
+     */
     public static SampleFamily divReverse(final double numerator,
                                           final SampleFamily sf) {
         if (sf == SampleFamily.EMPTY) {
diff --git a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
index e3d1aec7b3a7..f07ccc8bf38c 100644
--- a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
+++ b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
@@ -431,7 +431,7 @@ public SampleFamily increase(String range) {
             Arrays.stream(samples)
                   .map(sample -> sample.increase(
                       range,
-                      context.metricName,
+                      sample.getName(),
                       (lowerBoundValue, unused) -> sample.value - lowerBoundValue
                   ))
                   .toArray(Sample[]::new)
@@ -448,7 +448,7 @@ public SampleFamily rate(String range) {
             Arrays.stream(samples)
                   .map(sample -> sample.increase(
                       range,
-                      context.metricName,
+                      sample.getName(),
                       (lowerBoundValue, lowerBoundTime) -> {
                           final long timeDiff = (sample.timestamp - lowerBoundTime) / 1000;
                           return timeDiff < 1L ? 0.0 : (sample.value - lowerBoundValue) / timeDiff;
@@ -466,7 +466,7 @@ public SampleFamily irate() {
             this.context,
             Arrays.stream(samples)
                   .map(sample -> sample.increase(
-                      context.metricName,
+                      sample.getName(),
                       (lowerBoundValue, lowerBoundTime) -> {
                           final long timeDiff = (sample.timestamp - lowerBoundTime) / 1000;
                           return timeDiff < 1L ? 0.0 : (sample.value - lowerBoundValue) / timeDiff;
diff --git a/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/BanyanDBErrorRateReproTest.java b/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/BanyanDBErrorRateReproTest.java
new file mode 100644
index 000000000000..361a35b9fff0
--- /dev/null
+++ b/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/BanyanDBErrorRateReproTest.java
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.meter.analyzer.v2.dsl;
+
+import com.google.common.collect.ImmutableMap;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.skywalking.oap.meter.analyzer.v2.dsl.counter.CounterWindow;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+/**
+ * Reproduces the BanyanDB liaison_grpc_error_rate fabrication using the EXACT rule expression and the
+ * real (frozen) counter values scraped from the live demo FODC proxy. Counters never change across the
+ * simulated scrapes, so every rate term — and the summed result — MUST be 0. Any non-zero output proves
+ * the CounterWindow key collision: the three distinct error counters reduce to identical labels after
+ * .sum([...]) and, because the rate keys on the (shared, rule-level) context.metricName instead of each
+ * counter's own name, they share one CounterWindow slot and rate against each other's values.
+ */
+public class BanyanDBErrorRateReproTest {
+
+    private static final String GROUP_BY = "['cluster','pod_name','container_name','node_role','node_type']";
+
+    // Verbatim from otel-rules/banyandb/banyandb-instance.yaml : liaison_grpc_error_rate (value part).
+    private static final String EXPR =
+        "(banyandb_liaison_grpc_total_err.sum(" + GROUP_BY + ").rate('PT1M')"
+        + " + banyandb_liaison_grpc_total_registry_err.sum(" + GROUP_BY + ").rate('PT1M')"
+        + " + banyandb_liaison_grpc_total_stream_msg_received_err.sum(" + GROUP_BY + ").rate('PT1M')) * 60";
+
+    @BeforeEach
+    void resetWindow() {
+        CounterWindow.INSTANCE.reset();
+    }
+
+    private static Sample s(final String name, final double value, final long ts, final String... kv) {
+        final ImmutableMap.Builder<String, String> b = ImmutableMap.builder();
+        for (int i = 0; i < kv.length; i += 2) {
+            b.put(kv[i], kv[i + 1]);
+        }
+        return Sample.builder().name(name).labels(b.build()).value(value).timestamp(ts).build();
+    }
+
+    // The three liaison-1 families, with the real frozen values (total_err=5, registry_err=166, stream=5).
+    // node_type is intentionally ABSENT on liaison samples, exactly as the FODC proxy exposes them.
+    private Map<String, SampleFamily> scrape(final long ts) {
+        final String[] common = {
+            "cluster", "showcase-banyandb",
+            "pod_name", "demo-banyandb-liaison-1",
+            "container_name", "liaison",
+            "node_role", "ROLE_LIAISON",
+        };
+        final List<Sample> totalErr = new ArrayList<>();
+        totalErr.add(s("banyandb_liaison_grpc_total_err", 1, ts, with(common, "service", "measure", "method", "query", "group", "sw_metadata")));
+        totalErr.add(s("banyandb_liaison_grpc_total_err", 2, ts, with(common, "service", "measure", "method", "query", "group", "sw_metricsMinute")));
+        totalErr.add(s("banyandb_liaison_grpc_total_err", 1, ts, with(common, "service", "measure", "method", "query", "group", "sw_metricsHour")));
+        totalErr.add(s("banyandb_liaison_grpc_total_err", 1, ts, with(common, "service", "measure", "method", "query", "group", "sw_metricsDay")));
+
+        final List<Sample> registryErr = new ArrayList<>();
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 47, ts, with(common, "service", "measure", "method", "get", "group", "sw_metricsHour")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 47, ts, with(common, "service", "measure", "method", "get", "group", "sw_metricsMinute")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 47, ts, with(common, "service", "measure", "method", "get", "group", "sw_metricsDay")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 7, ts, with(common, "service", "indexRule", "method", "create", "group", "sw_metricsDay")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 7, ts, with(common, "service", "indexRule", "method", "create", "group", "sw_metricsHour")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 7, ts, with(common, "service", "indexRule", "method", "create", "group", "sw_metricsMinute")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 2, ts, with(common, "service", "trace", "method", "get", "group", "sw_trace")));
+        registryErr.add(s("banyandb_liaison_grpc_total_registry_err", 2, ts, with(common, "service", "trace", "method", "get", "group", "sw_zipkinTrace")));
+
+        final List<Sample> streamErr = new ArrayList<>();
+        streamErr.add(s("banyandb_liaison_grpc_total_stream_msg_received_err", 1, ts, with(common, "service", "measure", "method", "write", "group", "sw_metadata")));
+        streamErr.add(s("banyandb_liaison_grpc_total_stream_msg_received_err", 2, ts, with(common, "service", "trace", "method", "write", "group", "sw_trace")));
+        streamErr.add(s("banyandb_liaison_grpc_total_stream_msg_received_err", 2, ts, with(common, "service", "stream", "method", "write", "group", "sw_recordsLog")));
+
+        final Map<String, SampleFamily> map = new HashMap<>();
+        map.put("banyandb_liaison_grpc_total_err", SampleFamilyBuilder.newBuilder(totalErr.toArray(new Sample[0])).build());
+        map.put("banyandb_liaison_grpc_total_registry_err", SampleFamilyBuilder.newBuilder(registryErr.toArray(new Sample[0])).build());
+        map.put("banyandb_liaison_grpc_total_stream_msg_received_err", SampleFamilyBuilder.newBuilder(streamErr.toArray(new Sample[0])).build());
+        return map;
+    }
+
+    private static String[] with(final String[] common, final String... extra) {
+        final String[] out = new String[common.length + extra.length];
+        System.arraycopy(common, 0, out, 0, common.length);
+        System.arraycopy(extra, 0, out, common.length, extra.length);
+        return out;
+    }
+
+    @Test
+    void unchangedCounters_errorRate_mustBeZero() {
+        final Expression expr = DSL.parse("meter_banyandb_instance_liaison_grpc_error_rate", EXPR);
+        long ts = 1_700_000_000_000L;
+        final long step = 10_000L; // 10s scrape, matching the showcase collector
+        for (int scrape = 0; scrape < 6; scrape++, ts += step) {
+            final Result result = expr.run(scrape(ts));
+            double maxAbs = 0.0;
+            if (result.isSuccess() && result.getData() != SampleFamily.EMPTY) {
+                for (final Sample out : result.getData().samples) {
+                    maxAbs = Math.max(maxAbs, Math.abs(out.getValue()));
+                }
+            }
+            // Counters never changed -> error rate MUST be 0 on every scrape.
+            assertEquals(0.0, maxAbs, 1e-9,
+                "Unchanged counters must yield 0 error rate, but scrape " + scrape + " produced " + maxAbs);
+        }
+    }
+}
diff --git a/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/MALElvisFalsyTest.java b/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/MALElvisFalsyTest.java
new file mode 100644
index 000000000000..08f334551a19
--- /dev/null
+++ b/oap-server/analyzer/meter-analyzer/src/test/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/MALElvisFalsyTest.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.meter.analyzer.v2.dsl;
+
+import com.google.common.collect.ImmutableMap;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import org.apache.skywalking.oap.meter.analyzer.v2.compiler.rt.MalRuntimeHelper;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Groovy's Elvis `?:` applies the fallback when the primary is falsy — including the empty string.
+ * The v2 codegen previously emitted Optional.ofNullable(P).orElse(F), which only fires on null, so an
+ * empty-string primary (e.g. a label that .sum() filled with "" for an absent key) leaked "" instead
+ * of the fallback. This is the exact mechanism behind BanyanDB liaison instances storing node_type=""
+ * instead of "n/a".
+ */
+public class MALElvisFalsyTest {
+
+    private static String tagAfterElvis(final String nodeTypeValue) {
+        final ImmutableMap<String, String> labels = nodeTypeValue == null
+            ? ImmutableMap.of("svc", "s")
+            : ImmutableMap.of("svc", "s", "node_type", nodeTypeValue);
+        final SampleFamily sf = SampleFamilyBuilder.newBuilder(
+            Sample.builder().name("metric").labels(labels).value(1.0).timestamp(1L).build()).build();
+        final Expression expr = DSL.parse("test_elvis",
+            "metric.tag({tags -> tags['nt'] = tags.node_type ?: 'n/a'})");
+        final Result r = expr.run(Map.of("metric", sf));
+        return r.getData().samples[0].getLabels().get("nt");
+    }
+
+    private static String tagAfterSideEffectingElvis(final String nodeTypeValue) {
+        final SampleFamily sf = SampleFamilyBuilder.newBuilder(
+            Sample.builder()
+                  .name("metric")
+                  .labels(ImmutableMap.of("svc", "s", "node_type", nodeTypeValue))
+                  .value(1.0)
+                  .timestamp(1L)
+                  .build()).build();
+        final Expression expr = DSL.parse("test_elvis_remove",
+            "metric.tag({tags -> tags['nt'] = tags.remove('node_type') ?: 'n/a'})");
+        final Result r = expr.run(Map.of("metric", sf));
+        return r.getData().samples[0].getLabels().get("nt");
+    }
+
+    @Test
+    void emptyStringPrimary_usesFallback() {
+        assertEquals("n/a", tagAfterElvis(""), "empty-string primary must fall through to 'n/a' (Groovy-falsy)");
+    }
+
+    @Test
+    void absentPrimary_usesFallback() {
+        assertEquals("n/a", tagAfterElvis(null), "absent (null) primary must fall through to 'n/a'");
+    }
+
+    @Test
+    void nonEmptyPrimary_keptAsIs() {
+        assertEquals("hot", tagAfterElvis("hot"), "non-empty primary must be kept");
+    }
+
+    @Test
+    void sideEffectingPrimary_evaluatedOnce() {
+        assertEquals("hot", tagAfterSideEffectingElvis("hot"),
+            "Elvis must not evaluate the primary twice; tags.remove(...) returns a value only once");
+    }
+
+    @Test
+    void runtimeTruthiness_matchesGroovyFalsyValues() {
+        assertFalse(MalRuntimeHelper.isTruthy(0));
+        assertFalse(MalRuntimeHelper.isTruthy(0.0D));
+        assertFalse(MalRuntimeHelper.isTruthy(Collections.emptyList()));
+        assertFalse(MalRuntimeHelper.isTruthy(Collections.emptyMap()));
+        assertFalse(MalRuntimeHelper.isTruthy(new String[0]));
+        assertTrue(MalRuntimeHelper.isTruthy(-1));
+        assertTrue(MalRuntimeHelper.isTruthy(List.of("value")));
+    }
+}
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplier.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplier.java
index 25d1c349a7e8..17cf7a5560ef 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplier.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplier.java
@@ -39,6 +39,7 @@
 import org.apache.skywalking.oap.server.core.classloader.Catalog;
 import org.apache.skywalking.oap.server.core.classloader.DSLClassLoaderManager;
 import org.apache.skywalking.oap.server.core.classloader.RuleClassLoader;
+import org.apache.skywalking.oap.server.core.storage.StorageException;
 import org.apache.skywalking.oap.server.core.storage.model.StorageManipulationOpt;
 import org.apache.skywalking.oap.server.library.module.ModuleManager;
 import org.apache.skywalking.oap.server.receiver.runtimerule.layer.AppliedClaims;
@@ -168,6 +169,24 @@ public Applied apply(final String yamlContent, final String sourceName,
             layerRegistry.rollback(appliedClaims);
             throw new ApplyException("MAL compile failed for " + sourceName, t, Collections.emptySet());
         }
+        // All DDL for this file's metrics is now fired. If the opt deferred its schema fence
+        // (batched apply via withSchemaChangeDeferredFence), run the single barrier here so the
+        // whole file waits ONCE instead of one fence per metric/downsampling. A fence timeout is
+        // a non-fatal WARN inside the closure; only a barrier transport error throws, which
+        // aborts this apply exactly as an inline per-resource fence would have.
+        //
+        // EXCEPTION: when fenceRunByCaller is set (the runtime-rule REST apply), the orchestrator
+        // runs the fence itself AFTER the durable commit + peer resume, on a background thread, so
+        // a long (3-min) cluster-propagation wait neither blocks the apply nor holds peers
+        // suspended. We only fire the DDL here and leave the closure for the caller to run.
+        if (!storageOpt.isFenceRunByCaller()) {
+            try {
+                storageOpt.runDeferredFence();
+            } catch (final StorageException e) {
+                layerRegistry.rollback(appliedClaims);
+                throw new ApplyException("schema fence failed for " + sourceName, e, metricNames);
+            }
+        }
         return new Applied(rule, convert, metricNames, ruleLoader, appliedClaims);
     }
 
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterClient.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterClient.java
index 3652f17feac4..f2bc2438926b 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterClient.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterClient.java
@@ -29,6 +29,10 @@
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ForwardResponse;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ResumeAck;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ResumeRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusResponse;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedAck;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedRequest;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.RuntimeRuleClusterServiceGrpc;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.SuspendAck;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.SuspendRequest;
@@ -113,6 +117,84 @@ private SuspendAck suspendOne(final AdminClusterChannelManager.Peer peer, final
         }
     }
 
+    /**
+     * Route a read-only apply-status query to the cluster main (the only node that runs applies
+     * and holds the status). Returns {@code null} if no main is resolvable or the call fails —
+     * the REST caller then degrades to a content-hash comparison against the durable DAO row.
+     */
+    public ApplyStatusResponse getApplyStatus(final ApplyStatusRequest request) {
+        final AdminClusterChannelManager.Peer main = MainRouter.mainPeer(peerChannelManager);
+        if (main == null) {
+            log.warn("runtime-rule GetApplyStatus skipped: no cluster main resolvable");
+            return null;
+        }
+        final ManagedChannel channel = main.getChannel();
+        if (channel == null) {
+            log.warn("runtime-rule GetApplyStatus skipped: main {} channel not yet established",
+                main.getAddress());
+            return null;
+        }
+        final RuntimeRuleClusterServiceGrpc.RuntimeRuleClusterServiceBlockingStub stub =
+            RuntimeRuleClusterServiceGrpc.newBlockingStub(channel)
+                                         .withDeadlineAfter(perCallDeadlineMs, TimeUnit.MILLISECONDS);
+        try {
+            return stub.getApplyStatus(request);
+        } catch (final Throwable t) {
+            log.warn("runtime-rule GetApplyStatus to main {} failed: {}",
+                main.getAddress(), t.getMessage());
+            return null;
+        }
+    }
+
+    /**
+     * Fan out NotifyApplied to every non-self peer after a successful commit so peers converge
+     * NOW (run a reconcile against the just-persisted DB row) rather than on their next ~30s tick.
+     * Best-effort, same sequential-with-deadline transport as the others; unreachable peers
+     * self-converge on their own tick.
+     */
+    public List<NotifyAppliedAck> broadcastNotifyApplied(final String catalog, final String name,
+                                                         final String contentHash) {
+        final List<AdminClusterChannelManager.Peer> peers = peerChannelManager.getPeers();
+        final List<NotifyAppliedAck> acks = new ArrayList<>(peers.size());
+        for (final AdminClusterChannelManager.Peer peer : peers) {
+            if (peer.isSelf()) {
+                continue;
+            }
+            final NotifyAppliedAck ack = notifyAppliedOne(peer, catalog, name, contentHash);
+            if (ack != null) {
+                acks.add(ack);
+            }
+        }
+        return acks;
+    }
+
+    private NotifyAppliedAck notifyAppliedOne(final AdminClusterChannelManager.Peer peer,
+                                              final String catalog, final String name,
+                                              final String contentHash) {
+        final ManagedChannel channel = peer.getChannel();
+        if (channel == null) {
+            log.warn("runtime-rule NotifyApplied skipped for peer {}: channel not yet established",
+                peer.getAddress());
+            return null;
+        }
+        final RuntimeRuleClusterServiceGrpc.RuntimeRuleClusterServiceBlockingStub stub =
+            RuntimeRuleClusterServiceGrpc.newBlockingStub(channel)
+                                         .withDeadlineAfter(perCallDeadlineMs, TimeUnit.MILLISECONDS);
+        try {
+            return stub.notifyApplied(NotifyAppliedRequest.newBuilder()
+                .setCatalog(catalog)
+                .setName(name)
+                .setContentHash(contentHash == null ? "" : contentHash)
+                .setSenderNodeId(selfNodeId)
+                .setIssuedAtMs(System.currentTimeMillis())
+                .build());
+        } catch (final Throwable t) {
+            log.warn("runtime-rule NotifyApplied to peer {} failed for {}/{}: {}",
+                peer.getAddress(), catalog, name, t.getMessage());
+            return null;
+        }
+    }
+
     /**
      * Fan out Resume to every non-self peer. Same transport, same sequential-with-deadline
      * policy as {@link #broadcastSuspend}. Called by the REST handler's failure branches so
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImpl.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImpl.java
index 7b01b6f5d8ea..0924bd7ad41f 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImpl.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImpl.java
@@ -23,10 +23,18 @@
 import io.grpc.stub.StreamObserver;
 import java.nio.charset.StandardCharsets;
 import java.util.Objects;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicBoolean;
 import lombok.Setter;
 import lombok.extern.slf4j.Slf4j;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusPhase;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusResponse;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ForwardRequest;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ForwardResponse;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedAck;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedRequest;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ResumeAck;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ResumeRequest;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ResumeState;
@@ -41,6 +49,9 @@
 import org.apache.skywalking.oap.server.receiver.runtimerule.rest.RuntimeRuleService;
 import org.apache.skywalking.oap.server.receiver.runtimerule.state.AppliedRuleScript;
 import org.apache.skywalking.oap.server.receiver.runtimerule.state.DSLRuntimeState;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.ApplyPhase;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.ApplyStatus;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.SchemaApplyCoordinator;
 
 /**
  * Server-side handler for the three cluster-internal runtime-rule RPCs — see
@@ -97,11 +108,149 @@ public class RuntimeRuleClusterServiceImpl
     @Setter
     private volatile RuntimeRuleService runtimeRuleService;
 
+    /** Off-RPC-thread runner for notify-triggered reconciles so {@link #notifyApplied} acks
+     *  immediately. Single daemon thread — reconciles are per-file-locked + idempotent, so
+     *  serializing them is fine; daemon so it never blocks JVM shutdown. */
+    private final ExecutorService reconcileNudgeExecutor = Executors.newSingleThreadExecutor(r -> {
+        final Thread t = new Thread(r, "runtime-rule-notify-reconcile");
+        t.setDaemon(true);
+        return t;
+    });
+
+    /** Coalesces a burst of NotifyApplied into a single queued reconcile. {@code dslManager.tick()}
+     *  is a full reconcile over every rule file, so when a multi-file apply (or several applies)
+     *  fires many notifies, the first queued tick already converges all of them — the rest would be
+     *  redundant full {@code dao.getAll()} scans. Set on schedule, cleared at the START of the task
+     *  so a notify arriving while a tick runs still queues exactly one follow-up (no lost update). */
+    private final AtomicBoolean tickPending = new AtomicBoolean(false);
+
     public RuntimeRuleClusterServiceImpl(final DSLManager dslManager, final String selfNodeId) {
         this.dslManager = dslManager;
         this.selfNodeId = selfNodeId;
     }
 
+    /** Stop the off-thread reconcile-nudge executor. The framework's {@code ModuleProvider} has no
+     *  stop lifecycle hook, so in production this is not auto-invoked — the executor is a daemon
+     *  thread that never blocks JVM exit. Provided for clean test teardown and for a future module
+     *  shutdown hook; mirrors {@code RuntimeRuleService.shutdown()}. */
+    public void shutdown() {
+        reconcileNudgeExecutor.shutdownNow();
+    }
+
+    /**
+     * Push-notify from the main after a successful commit: converge NOW rather than on the next
+     * ~30s tick. Runs a full reconcile off the gRPC thread (idempotent, per-file-locked — unchanged
+     * files short-circuit on hash). Best-effort: the peer self-converges on its own tick if this is
+     * lost, so a self-broadcast or a schedule failure is non-fatal.
+     */
+    @Override
+    public void notifyApplied(final NotifyAppliedRequest request,
+                              final StreamObserver<NotifyAppliedAck> responseObserver) {
+        if (Objects.equals(selfNodeId, request.getSenderNodeId())) {
+            responseObserver.onNext(NotifyAppliedAck.newBuilder()
+                .setNodeId(selfNodeId).setAccepted(false)
+                .setDetail("self-broadcast suppressed").build());
+            responseObserver.onCompleted();
+            return;
+        }
+        boolean accepted = true;
+        // Not final: assigned in both the try and catch arms of the schedule attempt below.
+        String detail;
+        if (tickPending.compareAndSet(false, true)) {
+            try {
+                reconcileNudgeExecutor.submit(() -> {
+                    // Clear before running so a notify that arrives during this tick queues exactly
+                    // one follow-up rather than being dropped.
+                    tickPending.set(false);
+                    try {
+                        dslManager.tick();
+                    } catch (final Throwable t) {
+                        log.warn("runtime-rule NotifyApplied reconcile for {}/{} failed; peer will "
+                            + "self-converge on its next tick: {}",
+                            request.getCatalog(), request.getName(), t.getMessage());
+                    }
+                });
+                detail = "reconcile scheduled";
+            } catch (final Throwable t) {
+                // Submit rejected (executor shut down). Release the flag so the next notify retries.
+                tickPending.set(false);
+                accepted = false;
+                detail = "schedule failed; self-converge on next tick";
+                log.warn("runtime-rule NotifyApplied could not schedule reconcile for {}/{}: {}",
+                    request.getCatalog(), request.getName(), t.getMessage());
+            }
+        } else {
+            // A reconcile is already queued; this notify is covered by it. tick() is a full
+            // cluster-wide reconcile, so no per-file work is lost by coalescing.
+            detail = "coalesced into pending reconcile";
+        }
+        responseObserver.onNext(NotifyAppliedAck.newBuilder()
+            .setNodeId(selfNodeId)
+            .setAccepted(accepted)
+            .setDetail(detail)
+            .build());
+        responseObserver.onCompleted();
+    }
+
+    /**
+     * Read-only apply-status query, served by the main (only the main runs applies and holds the
+     * status). Resolves by apply_id when present, else by (catalog, name) gated on content_hash.
+     * Returns {@code found=false} / {@code APPLY_PHASE_UNKNOWN} when nothing matches — the caller
+     * falls back to comparing the durable content hash against the DAO row.
+     */
+    @Override
+    public void getApplyStatus(final ApplyStatusRequest request,
+                               final StreamObserver<ApplyStatusResponse> responseObserver) {
+        final ApplyStatus status;
+        if (!request.getApplyId().isEmpty()) {
+            status = SchemaApplyCoordinator.INSTANCE.get(request.getApplyId());
+        } else {
+            final String hash = request.getContentHash().isEmpty() ? null : request.getContentHash();
+            status = SchemaApplyCoordinator.INSTANCE.getLatestByFile(
+                request.getCatalog(), request.getName(), hash);
+        }
+        final ApplyStatusResponse.Builder resp = ApplyStatusResponse.newBuilder().setNodeId(selfNodeId);
+        if (status == null) {
+            resp.setFound(false).setPhase(ApplyStatusPhase.APPLY_PHASE_UNKNOWN);
+        } else {
+            resp.setFound(true)
+                .setApplyId(status.getApplyId())
+                .setCatalog(status.getCatalog())
+                .setName(status.getName())
+                .setContentHash(status.getContentHash() == null ? "" : status.getContentHash())
+                .setPhase(toProtoPhase(status.getPhase()))
+                .setFailureReason(status.getFailureReason() == null ? "" : status.getFailureReason())
+                .setStartedAtMs(status.getStartedAtMs())
+                .setUpdatedAtMs(status.getUpdatedAtMs());
+            if (status.getFenceLaggards() != null && !status.getFenceLaggards().isEmpty()) {
+                resp.addAllFenceLaggards(status.getFenceLaggards());
+            }
+        }
+        responseObserver.onNext(resp.build());
+        responseObserver.onCompleted();
+    }
+
+    private static ApplyStatusPhase toProtoPhase(final ApplyPhase phase) {
+        switch (phase) {
+            case PENDING:
+                return ApplyStatusPhase.APPLY_PHASE_PENDING;
+            case DDL:
+                return ApplyStatusPhase.APPLY_PHASE_DDL;
+            case FENCING:
+                return ApplyStatusPhase.APPLY_PHASE_FENCING;
+            case ROLLING_OUT:
+                return ApplyStatusPhase.APPLY_PHASE_ROLLING_OUT;
+            case APPLIED:
+                return ApplyStatusPhase.APPLY_PHASE_APPLIED;
+            case DEGRADED:
+                return ApplyStatusPhase.APPLY_PHASE_DEGRADED;
+            case FAILED:
+                return ApplyStatusPhase.APPLY_PHASE_FAILED;
+            default:
+                return ApplyStatusPhase.APPLY_PHASE_UNKNOWN;
+        }
+    }
+
     @Override
     public void suspend(final SuspendRequest request,
                         final StreamObserver<SuspendAck> responseObserver) {
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleConfig.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleConfig.java
index 58eec1fc78f9..4c4394105e1d 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleConfig.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleConfig.java
@@ -40,4 +40,16 @@ public class RuntimeRuleModuleConfig extends ModuleConfig {
      * refresh + storage replica lag + RPC jitter. Default 60 s is conservative.
      */
     private long selfHealThresholdSeconds = 60;
+    /**
+     * Timeout (seconds) for the runtime-rule deferred/batched BanyanDB schema fence — the wait
+     * for every data node to apply a newly created/updated measure's schema revision. Applies
+     * ONLY to the operator REST apply path, where the fence runs in the background after the
+     * durable commit + peer resume (so it never blocks the apply or holds peers suspended), and
+     * the operator polls {@code GET /runtime/rule/status} to watch {@code FENCING → APPLIED} or
+     * {@code DEGRADED} (with the lagging node ids). The inline / static / delete fences keep their
+     * short 2 s constant. Default 180 s is the multi-node cluster-propagation budget for a large
+     * reshape; on timeout the apply is reported {@code DEGRADED} (durable + forward-progressing,
+     * never reverted) with the laggard node list.
+     */
+    private long deferredFenceTimeoutSeconds = 180;
 }
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleProvider.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleProvider.java
index ec7197d5de63..6c50beab2a97 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleProvider.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/module/RuntimeRuleModuleProvider.java
@@ -41,6 +41,7 @@
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.RuntimeRuleClusterClient;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.RuntimeRuleClusterServiceImpl;
 import org.apache.skywalking.oap.server.receiver.runtimerule.reconcile.DSLManager;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.SchemaApplyCoordinator;
 import org.apache.skywalking.oap.server.receiver.runtimerule.rest.RuntimeRuleRestHandler;
 import org.apache.skywalking.oap.server.telemetry.TelemetryModule;
 import org.apache.skywalking.oap.server.telemetry.api.TelemetryRelatedContext;
@@ -218,6 +219,11 @@ public class RuntimeRuleModuleProvider extends ModuleProvider {
      * unchanged bundles.
      */
     private static final long SCHEDULER_INITIAL_DELAY_SECONDS = 2L;
+    /** Retain a tracked apply-status this long after its last update so a post-apply UI poll (and
+     *  a post-refresh content query) still resolves, then reap it to bound memory. */
+    private static final long APPLY_STATUS_TTL_MS = 3_600_000L;
+    /** How often the apply-status eviction sweep runs on the reconciler executor. */
+    private static final long APPLY_STATUS_EVICT_INTERVAL_SECONDS = 300L;
 
     /**
      * Env var carrying this OAP's unique per-node identity — the Kubernetes pod UID, injected
@@ -271,7 +277,8 @@ public void start() throws ServiceNotProvidedException, ModuleStartException {
         // RuleEngineRegistry from the per-DSL state maps it owns.
         dslManager = new DSLManager(
             getManager(),
-            moduleConfig.getSelfHealThresholdSeconds() * 1000L
+            moduleConfig.getSelfHealThresholdSeconds() * 1000L,
+            moduleConfig.getDeferredFenceTimeoutSeconds() * 1000L
         );
 
         // Cluster-facing Suspend client: fans out to every non-self peer on the OAP cluster bus
@@ -406,6 +413,16 @@ public void notifyAfterCompleted() throws ModuleStartException {
         );
         log.info("Runtime rule dslManager scheduled: first tick in {} s, then every {} s.",
             SCHEDULER_INITIAL_DELAY_SECONDS, intervalSeconds);
+
+        // Bound the apply-status coordinator's memory: reap tracked applies past the retention
+        // window (terminal ones linger long enough for a post-apply UI poll; a stale PENDING from
+        // a missed branch is reaped too — a later query then returns UNKNOWN and the caller falls
+        // back to the durable content hash). Reuses the same single-thread executor; the sweep is
+        // O(tracked) and cheap.
+        reconcilerExecutor.scheduleWithFixedDelay(
+            () -> SchemaApplyCoordinator.INSTANCE.evictExpired(APPLY_STATUS_TTL_MS),
+            APPLY_STATUS_EVICT_INTERVAL_SECONDS, APPLY_STATUS_EVICT_INTERVAL_SECONDS, TimeUnit.SECONDS
+        );
     }
 
     /**
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLManager.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLManager.java
index 102a05ebbe85..c9d2ece6fb6f 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLManager.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLManager.java
@@ -150,6 +150,13 @@ public final class DSLManager {
     @Getter
     private final long selfHealThresholdMs;
 
+    /** Timeout for the runtime-rule deferred/batched BanyanDB schema fence on the operator REST
+     *  apply path (default 3 min). Read via {@link #getDeferredFenceTimeoutMs()} by the REST
+     *  orchestrator, which builds the deferred-fence opt with it
+     *  ({@code RuntimeRuleModuleConfig.deferredFenceTimeoutSeconds}). */
+    @Getter
+    private final long deferredFenceTimeoutMs;
+
     /** Lock-observability wrapper. Owned by the DSLManager; the REST handler borrows via
      *  {@link #getLockMetrics()} so every lock acquire path reports to the same histograms. */
     @Getter
@@ -183,12 +190,14 @@ public final class DSLManager {
     private final RuleEngineRegistry engineRegistry;
 
     public DSLManager(final ModuleManager moduleManager,
-                      final long selfHealThresholdMs) {
+                      final long selfHealThresholdMs,
+                      final long deferredFenceTimeoutMs) {
         this.moduleManager = Objects.requireNonNull(moduleManager, "moduleManager");
         this.engineRegistry = new RuleEngineRegistry();
         this.engineRegistry.register(new MalRuleEngine(this.rules, this.moduleManager));
         this.engineRegistry.register(new LalRuleEngine(this.rules, this.moduleManager));
         this.selfHealThresholdMs = selfHealThresholdMs;
+        this.deferredFenceTimeoutMs = deferredFenceTimeoutMs;
         this.lockMetrics =
             new LockMetrics(moduleManager);
         this.suspendCoord = new SuspendResumeCoordinator(
@@ -361,7 +370,7 @@ public DSLRuntimeState applyNowForRuleFile(final RuntimeRuleManagementDAO.Runtim
      */
     public DSLRuntimeState applyNowForRuleFile(final RuntimeRuleManagementDAO.RuntimeRuleFile ruleFile,
                                             final boolean deferCommit) {
-        return applyNowForRuleFile(ruleFile, deferCommit, StorageManipulationOpt.withSchemaChange());
+        return applyNowForRuleFile(ruleFile, deferCommit, StorageManipulationOpt.withSchemaChangeDeferredFence());
     }
 
     /**
@@ -759,10 +768,10 @@ private StorageManipulationOpt tickStorageOpt(final boolean atBoot) {
             selfMain = MainRouter.isSelfMain(apm);
         } catch (final Throwable t) {
             // Cluster module not wired (embedded / single-process) — always main.
-            return StorageManipulationOpt.withSchemaChange();
+            return StorageManipulationOpt.withSchemaChangeDeferredFence();
         }
         if (selfMain) {
-            return StorageManipulationOpt.withSchemaChange();
+            return StorageManipulationOpt.withSchemaChangeDeferredFence();
         }
         return atBoot
             ? StorageManipulationOpt.verifySchemaOnly()
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLRuntimeDelete.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLRuntimeDelete.java
index ce6af280a0ce..0579e9771576 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLRuntimeDelete.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/reconcile/DSLRuntimeDelete.java
@@ -230,7 +230,7 @@ private <C extends ApplyContext> Result runRevert(final RuleEngine<C> engine,
             new RuntimeRuleManagementDAO.RuntimeRuleFile(
                 catalog, name, bundledContent, /* status */ null, /* updateTime */ 0L);
         final ApplyInputs withSchema = new ApplyInputs(
-            moduleManager, StorageManipulationOpt.withSchemaChange(),
+            moduleManager, StorageManipulationOpt.withSchemaChangeDeferredFence(),
             alarmResetter, rules);
         final DSLRuntimeApply.Outcome outcome = dslRuntimeApply.apply(
             bundledFile, Classification.STRUCTURAL,
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandler.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandler.java
index 77084f614fcb..bdb4dd0c5b74 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandler.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandler.java
@@ -113,6 +113,20 @@ public HttpResponse list(@Param("catalog") @Default("") final String catalog) {
         return service.list(catalog);
     }
 
+    /**
+     * Apply-status query for the UI / operator. Served by the cluster main (the service routes
+     * there when self isn't main). Query by {@code applyId} (the live handle from a just-submitted
+     * apply) or, once it's gone, by {@code catalog} + {@code name} (+ optional {@code contentHash}
+     * — the durable identity). Always 200 with a JSON status; {@code found=false} when nothing matches.
+     */
+    @Get("/runtime/rule/status")
+    public HttpResponse applyStatus(@Param("catalog") @Default("") final String catalog,
+                                    @Param("name") @Default("") final String name,
+                                    @Param("contentHash") @Default("") final String contentHash,
+                                    @Param("applyId") @Default("") final String applyId) {
+        return service.queryApplyStatus(catalog, name, contentHash, applyId);
+    }
+
     @Get("/runtime/rule")
     public HttpResponse get(@Param("catalog") final String catalog,
                             @Param("name") final String name,
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleService.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleService.java
index 01d645a4e8a3..fb5ad4fb9af9 100644
--- a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleService.java
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleService.java
@@ -42,6 +42,9 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.time.Duration;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
 import java.util.concurrent.locks.ReentrantLock;
 import java.util.regex.Pattern;
 import java.util.zip.GZIPOutputStream;
@@ -69,6 +72,8 @@
 import org.apache.skywalking.oap.server.receiver.runtimerule.layer.LayerConflictException;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.MainRouter;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.RuntimeRuleClusterClient;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusResponse;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ForwardResponse;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.SuspendAck;
 import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.SuspendState;
@@ -78,6 +83,9 @@
 import org.apache.skywalking.oap.server.receiver.runtimerule.reconcile.SuspendResult;
 import org.apache.skywalking.oap.server.receiver.runtimerule.state.AppliedRuleScript;
 import org.apache.skywalking.oap.server.receiver.runtimerule.state.DSLRuntimeState;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.ApplyPhase;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.ApplyStatus;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.SchemaApplyCoordinator;
 import org.apache.skywalking.oap.server.receiver.runtimerule.util.ContentHash;
 
 /**
@@ -178,6 +186,34 @@ private Set<String> validCatalogs() {
      */
     private volatile AdminClusterChannelManager peerChannelManager;
 
+    /**
+     * Off-REST-thread runner for the best-effort peer {@code NotifyApplied} fan-out. The fan-out is
+     * sequential with a per-peer deadline, so an unreachable peer would otherwise add
+     * {@code peerCount × deadline} to the operator's apply latency — the notify is a convergence
+     * optimization (peers self-converge on their next tick if it's lost), so it must never hold the
+     * HTTP response. Single daemon thread: applies are serialized per file upstream and the fan-out
+     * is idempotent, so ordering across applies doesn't matter; daemon so it never blocks JVM
+     * shutdown. Shut down by {@link #shutdown()}.
+     */
+    private final ExecutorService notifyExecutor = Executors.newSingleThreadExecutor(r -> {
+        final Thread t = new Thread(r, "runtime-rule-notify-broadcast");
+        t.setDaemon(true);
+        return t;
+    });
+
+    /**
+     * Runs the runtime-rule deferred schema fence in the background after a structural apply's
+     * durable commit (see {@link #scheduleBackgroundFence}). A small fixed daemon pool so several
+     * concurrent applies (different files) can each confirm cluster-wide propagation without one
+     * file's up-to-3-min fence blocking another's status; daemon so it never blocks JVM shutdown.
+     * Shut down by {@link #shutdown()}.
+     */
+    private final ExecutorService fenceExecutor = Executors.newFixedThreadPool(4, r -> {
+        final Thread t = new Thread(r, "runtime-rule-bg-fence");
+        t.setDaemon(true);
+        return t;
+    });
+
     public RuntimeRuleService(final ModuleManager moduleManager,
                                   final DSLManager dslManager,
                                   final RuntimeRuleClusterClient clusterClient,
@@ -269,6 +305,169 @@ private HttpResponse forwardToMain(final String mainAddr,
         }
     }
 
+    /**
+     * Read-only apply-status query for the UI / operator. Served by the main: when self is main
+     * (or single-process / no cluster client), reads the local {@link SchemaApplyCoordinator};
+     * otherwise routes to the main via {@code GetApplyStatus}. Resolve by {@code applyId} (the
+     * live handle) or by {@code catalog}/{@code name} (+ optional {@code contentHash}) once the
+     * apply-id is gone (page refresh / main restart). Always 200 with a JSON status; {@code
+     * found=false} / phase {@code UNKNOWN} when nothing matches (caller compares the durable
+     * content hash itself).
+     */
+    public HttpResponse queryApplyStatus(final String catalog, final String name,
+                                         final String contentHash, final String applyId) {
+        final AdminClusterChannelManager apm = resolvePeerChannelManager();
+        if (apm == null || MainRouter.isSelfMain(apm) || clusterClient == null) {
+            final ApplyStatus local = applyId.isEmpty()
+                ? SchemaApplyCoordinator.INSTANCE.getLatestByFile(
+                    catalog, name, contentHash.isEmpty() ? null : contentHash)
+                : SchemaApplyCoordinator.INSTANCE.get(applyId);
+            if (local != null) {
+                return applyStatusJson(local);
+            }
+            // Live status gone (apply-id evicted / never began on this main). Fall back to the
+            // durable rule row for a content-derived answer; only UNKNOWN if even that is absent.
+            final HttpResponse fromDao = applyStatusFromDao(catalog, name, contentHash);
+            return fromDao != null ? fromDao : applyStatusJson(null);
+        }
+        final ApplyStatusResponse remote = clusterClient.getApplyStatus(ApplyStatusRequest.newBuilder()
+            .setApplyId(applyId).setCatalog(catalog).setName(name).setContentHash(contentHash).build());
+        if (remote == null) {
+            // Main unreachable, or a mixed-version main answered UNIMPLEMENTED. Degrade to the
+            // durable rule row (shared storage) before giving up with a transport error.
+            final HttpResponse fromDao = applyStatusFromDao(catalog, name, contentHash);
+            if (fromDao != null) {
+                return fromDao;
+            }
+            return HttpResponse.of(HttpStatus.BAD_GATEWAY, MediaType.JSON_UTF_8,
+                jsonBody("status_unavailable", catalog, name,
+                    "could not reach the cluster main for apply status; retry shortly"));
+        }
+        if (!remote.getFound()) {
+            // The main reached us but no longer holds a live status (evicted / restarted). The
+            // durable row is the same truth the main would derive — read it locally.
+            final HttpResponse fromDao = applyStatusFromDao(catalog, name, contentHash);
+            if (fromDao != null) {
+                return fromDao;
+            }
+        }
+        return applyStatusJsonFromProto(remote);
+    }
+
+    /**
+     * Content-derived apply status from the durable rule row, used when the live coordinator status
+     * is gone (apply-id evicted, main restarted, or the main is unreachable). The persist-is-commit
+     * invariant means an {@code ACTIVE} row IS the durable record that the apply of its content
+     * committed, so a matching {@code ACTIVE} row reports {@link ApplyPhase#APPLIED} (flagged
+     * {@code derivedFrom=durable-dao}). A hash mismatch means the queried content is not the current
+     * applied content; an {@code INACTIVE} row means it was paused. Returns {@code null} when the DAO
+     * is unresolvable or the read fails, so the caller keeps its own unavailable/unknown response.
+     */
+    private HttpResponse applyStatusFromDao(final String catalog, final String name, final String contentHash) {
+        final RuntimeRuleManagementDAO.RuntimeRuleFile row;
+        try {
+            row = currentRuleFile(catalog, name);
+        } catch (final IOException e) {
+            log.warn("apply-status DAO fallback read failed for {}/{}; reporting unknown", catalog, name, e);
+            return null;
+        }
+        if (row == null) {
+            return null;
+        }
+        final String rowHash = ContentHash.sha256Hex(row.getContent());
+        final boolean active = !RuntimeRule.STATUS_INACTIVE.equals(row.getStatus());
+        final boolean hashMatches = contentHash.isEmpty() || contentHash.equals(rowHash);
+        final JsonObject o = new JsonObject();
+        o.addProperty("catalog", catalog);
+        o.addProperty("name", name);
+        o.addProperty("contentHash", rowHash);
+        o.addProperty("derivedFrom", "durable-dao");
+        if (active && hashMatches) {
+            o.addProperty("found", true);
+            o.addProperty("phase", ApplyPhase.APPLIED.name());
+            o.addProperty("note", "live apply status unavailable; derived from the durable rule row "
+                + "(persist-is-commit: an ACTIVE row means this content's apply committed)");
+        } else {
+            o.addProperty("found", false);
+            o.addProperty("phase", ApplyPhase.UNKNOWN.name());
+            o.addProperty("note", active
+                ? "the queried content is not the currently applied content (hash mismatch)"
+                : "the rule row is INACTIVE (paused)");
+        }
+        return HttpResponse.of(HttpStatus.OK, MediaType.JSON_UTF_8, GSON.toJson(o));
+    }
+
+    private HttpResponse applyStatusJson(final ApplyStatus s) {
+        final JsonObject o = new JsonObject();
+        if (s == null) {
+            o.addProperty("found", false);
+            o.addProperty("phase", ApplyPhase.UNKNOWN.name());
+        } else {
+            o.addProperty("found", true);
+            o.addProperty("applyId", s.getApplyId());
+            o.addProperty("catalog", s.getCatalog());
+            o.addProperty("name", s.getName());
+            o.addProperty("contentHash", s.getContentHash());
+            o.addProperty("phase", s.getPhase().name());
+            if (s.getFailureReason() != null) {
+                o.addProperty("failureReason", s.getFailureReason());
+            }
+            o.addProperty("startedAtMs", s.getStartedAtMs());
+            o.addProperty("updatedAtMs", s.getUpdatedAtMs());
+            if (!s.getFenceLaggards().isEmpty()) {
+                final JsonArray laggards = new JsonArray();
+                s.getFenceLaggards().forEach(laggards::add);
+                o.add("fenceLaggards", laggards);
+            }
+        }
+        // Schema parity with the routed-from-main path (applyStatusJsonFromProto), which always
+        // carries servedBy: this node answered locally (self is main / single process).
+        o.addProperty("servedBy", "self");
+        return HttpResponse.of(HttpStatus.OK, MediaType.JSON_UTF_8, GSON.toJson(o));
+    }
+
+    private HttpResponse applyStatusJsonFromProto(final ApplyStatusResponse r) {
+        final JsonObject o = new JsonObject();
+        o.addProperty("found", r.getFound());
+        if (r.getFound()) {
+            o.addProperty("applyId", r.getApplyId());
+            o.addProperty("catalog", r.getCatalog());
+            o.addProperty("name", r.getName());
+            o.addProperty("contentHash", r.getContentHash());
+            o.addProperty("phase", stripApplyPhasePrefix(r.getPhase().name()));
+            if (!r.getFailureReason().isEmpty()) {
+                o.addProperty("failureReason", r.getFailureReason());
+            }
+            o.addProperty("startedAtMs", r.getStartedAtMs());
+            o.addProperty("updatedAtMs", r.getUpdatedAtMs());
+            if (r.getFenceLaggardsCount() > 0) {
+                final JsonArray laggards = new JsonArray();
+                r.getFenceLaggardsList().forEach(laggards::add);
+                o.add("fenceLaggards", laggards);
+            }
+        } else {
+            o.addProperty("phase", ApplyPhase.UNKNOWN.name());
+        }
+        o.addProperty("servedBy", r.getNodeId());
+        return HttpResponse.of(HttpStatus.OK, MediaType.JSON_UTF_8, GSON.toJson(o));
+    }
+
+    /** Strip the proto {@code APPLY_PHASE_} prefix so the routed-from-main JSON phase matches the
+     *  local-path {@link ApplyPhase} names (e.g. {@code APPLY_PHASE_APPLIED} → {@code APPLIED}).
+     *  Any value that isn't a live {@link ApplyPhase} — the proto default {@code UNSPECIFIED}, a
+     *  reserved/removed phase, or a value from a newer-version peer — maps to {@code UNKNOWN} so the
+     *  JSON phase is always a name the client's enum recognizes. */
+    private static String stripApplyPhasePrefix(final String protoName) {
+        final String prefix = "APPLY_PHASE_";
+        final String stripped = protoName.startsWith(prefix) ? protoName.substring(prefix.length()) : protoName;
+        for (final ApplyPhase p : ApplyPhase.values()) {
+            if (p.name().equals(stripped)) {
+                return stripped;
+            }
+        }
+        return ApplyPhase.UNKNOWN.name();
+    }
+
     private AdminClusterChannelManager resolvePeerChannelManager() {
         AdminClusterChannelManager local = peerChannelManager;
         if (local != null) {
@@ -1002,12 +1201,29 @@ private HttpResponse applyStructural(final String catalog, final String name,
         // removedMetrics, swap appliedMal/appliedContent, retire old loader, alarm reset,
         // advance snapshot) is stashed in the dslManager — we drain it below once persist
         // resolves.
+        // Track this apply in the coordinator so a progress query (and peers, later) can observe
+        // its outcome. From here every exit path marks a terminal phase (APPLIED / FAILED /
+        // DEGRADED); a missed branch leaves only a stale PENDING the background watch reaps.
+        final String applyId = SchemaApplyCoordinator.INSTANCE.begin(
+            catalog, name, ContentHash.sha256Hex(content));
         final long updateTime = System.currentTimeMillis();
         final RuntimeRuleManagementDAO.RuntimeRuleFile ruleFile = new RuntimeRuleManagementDAO.RuntimeRuleFile(
             catalog, name, content, RuntimeRule.STATUS_ACTIVE, updateTime);
         final DSLRuntimeState postApply;
+        // Build the deferred-fence opt ourselves so WE own the post-DDL schema fence. It must run
+        // AFTER persist but BEFORE dispatch resumes: an un-propagated write is silently dropped at
+        // the data node (CLAUDE.md tip #16), so the local commit (which swaps the bundle + unparks
+        // dispatch) and the peer resume/notify must wait for the fence. fenceRunByCaller tells the
+        // installer to register the fence closure but NOT run it inline — we run it (in the
+        // background) right before resuming, so the HTTP response is not held for the wait.
+        final StorageManipulationOpt fenceOpt = StorageManipulationOpt.withSchemaChangeDeferredFence(
+            Duration.ofMillis(dslManager.getDeferredFenceTimeoutMs()));
+        fenceOpt.setFenceRunByCaller(true);
+        // The apply call compiles, verifies, and fires the schema-change DDL (the fence is deferred
+        // to us); mark DDL before it so an in-flight query sees progress past PENDING.
+        SchemaApplyCoordinator.INSTANCE.transition(applyId, ApplyPhase.DDL);
         try {
-            postApply = dslManager.applyNowForRuleFile(ruleFile, true);
+            postApply = dslManager.applyNowForRuleFile(ruleFile, true, fenceOpt);
         } catch (final LayerConflictException lce) {
             // Runtime-DSL layer-declaration conflict — operator-actionable, not a server
             // error. Resume local + peers immediately so the cluster keeps serving the
@@ -1016,6 +1232,7 @@ private HttpResponse applyStructural(final String catalog, final String name,
                 catalog, name, lce.getMessage());
             dslManager.getSuspendCoord().localResume(catalog, name);
             broadcastResume(catalog, name, "layer_conflict");
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "layer conflict: " + lce.getMessage());
             return badRequest(lce.applyStatus(), catalog, name, lce.getMessage());
         } catch (final Throwable t) {
             // Some exceptions wrap LayerConflictException as cause; unwrap so the operator
@@ -1027,6 +1244,7 @@ private HttpResponse applyStructural(final String catalog, final String name,
                     + "(wrapped): {}", catalog, name, wrapped.getMessage());
                 dslManager.getSuspendCoord().localResume(catalog, name);
                 broadcastResume(catalog, name, "layer_conflict");
+                SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "layer conflict: " + wrapped.getMessage());
                 return badRequest(wrapped.applyStatus(), catalog, name, wrapped.getMessage());
             }
             log.error("runtime-rule STRUCTURAL apply threw for {}/{}", catalog, name, t);
@@ -1034,9 +1252,11 @@ private HttpResponse applyStructural(final String catalog, final String name,
             // Peers went SUSPENDED on our earlier broadcast; let them know the apply
             // aborted so they flip back to RUNNING within an RPC round-trip.
             broadcastResume(catalog, name, "apply_threw");
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "apply threw: " + t.getMessage());
             return serverError("apply_failed", catalog, name, t.getMessage());
         }
         if (postApply != null && postApply.getLastApplyError() != null) {
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId, postApply.getLastApplyError());
             // Apply failed (DDL verify mismatch, compile surprise, applier exception). Row
             // is NOT yet persisted. applyOneRuleFile already rolled back its own partial
             // registration on the exception path; the pendingCommits stash is only
@@ -1063,10 +1283,69 @@ private HttpResponse applyStructural(final String catalog, final String name,
             return serverError("apply_failed", catalog, name, err);
         }
 
-        // Apply succeeded + verified. Commit the row — the design's commit point. Retry a
-        // couple of times on transient failures before giving up; the per-backend
-        // RuntimeRuleManagementDAO.save can throw on a brief storage outage. A narrow retry
-        // here avoids turning a blip into a cluster-divergence event.
+        // Apply succeeded + verified — DDL fired, the schema fence is deferred to us, and the
+        // pending commit is stashed (deferCommit). The write-safe tail runs in the BACKGROUND so the
+        // (up to 3-min) fence wait doesn't block this response: fence → persist → commit → resume.
+        // Persist (the durable commit point) happens AFTER the fence, so "durable" implies "schema
+        // propagated cluster-wide": a main crash before persist leaves NO row — peers/crash-recovery
+        // safely stay on the old content (the orphaned measure from DDL is inert) — and any durable
+        // row is guaranteed fence-confirmed, so convergence never resumes dispatch against an
+        // unpropagated schema. The response returns now at FENCING with the applyId; the operator
+        // polls GET /runtime/rule/status for FENCING → ROLLING_OUT → APPLIED (or DEGRADED + laggards,
+        // or FAILED on persist failure).
+        SchemaApplyCoordinator.INSTANCE.markFencing(applyId);
+        boolean scheduled = true;
+        try {
+            fenceExecutor.submit(
+                () -> fenceThenPersistThenResume(applyId, fenceOpt, catalog, name, content, updateTime));
+        } catch (final Throwable t) {
+            // Executor rejected (shutting down). Run inline (blocking) so the suspend bracket does
+            // not leak and the apply still completes — write-safety wins over a non-blocking response.
+            log.warn("runtime-rule could not schedule the background fence for {}/{}; running it "
+                + "inline so dispatch is not left suspended", catalog, name, t);
+            scheduled = false;
+        }
+        if (!scheduled) {
+            fenceThenPersistThenResume(applyId, fenceOpt, catalog, name, content, updateTime);
+        }
+        return okWithApplyId(HttpStatus.OK, "structural_applied", catalog, name, applyId,
+            "structural apply accepted; fencing schema propagation, then persisting + resuming in "
+                + "the background — poll /runtime/rule/status?applyId=" + applyId + describeDelta(delta));
+    }
+
+    /**
+     * Write-safe post-apply tail, run on the fence executor (or inline if it's gone):
+     * <strong>fence → persist → commit → resume</strong>. The fence waits for the new measure schema
+     * to reach every data node BEFORE anything durable or visible happens (an un-propagated write is
+     * silently dropped — CLAUDE.md tip #16); only then is the rule row persisted (the durable commit
+     * point), so a crash before persist leaves no row and peers stay safely on the old content, and
+     * any durable row is guaranteed fence-confirmed. After persist it finalizes the local commit
+     * (drop removed metrics, swap the bundle, unpark dispatch) and resumes/notifies peers.
+     *
+     * <p>Terminal status: {@link ApplyPhase#FAILED} if persist fails (rolled back — nothing
+     * committed); {@link ApplyPhase#DEGRADED} if the fence didn't confirm within the budget (laggards
+     * listed; dispatch resumed anyway so a stuck node can't park the metric forever) or the local
+     * commit-tail threw (DB durable, peers converge from it); {@link ApplyPhase#APPLIED} otherwise.
+     */
+    private void fenceThenPersistThenResume(final String applyId, final StorageManipulationOpt fenceOpt,
+                                            final String catalog, final String name,
+                                            final String content, final long updateTime) {
+        // 1. Fence the new measure schema's propagation BEFORE persisting or resuming. A fence
+        //    error/timeout is non-fatal (best-effort) — we proceed but mark DEGRADED below.
+        StorageManipulationOpt.FenceOutcome fenceOutcome = null;
+        Throwable fenceError = null;
+        try {
+            fenceOpt.runDeferredFence();
+            fenceOutcome = fenceOpt.getFenceOutcome();
+        } catch (final Throwable t) {
+            fenceError = t;
+            log.warn("runtime-rule schema fence for {}/{} errored; proceeding anyway "
+                + "(BanyanDB still propagates the schema asynchronously)", catalog, name, t);
+        }
+
+        // 2. Persist the rule row — THE durable commit point, AFTER the fence so durable ⟹ fenced.
+        //    Narrow retry over a transient storage blip. On failure nothing is committed: roll back
+        //    the local pending commit and resume peers to the old content (the DB never advanced).
         HttpResponse persistError = persistRuleSync(catalog, name, content, updateTime);
         if (persistError != null) {
             try {
@@ -1077,71 +1356,116 @@ private HttpResponse applyStructural(final String catalog, final String name,
             persistError = persistRuleSync(catalog, name, content, updateTime);
         }
         if (persistError != null) {
-            // Persist still failing. The local node has registered added + shape-break
-            // metrics in MeterSystem (DDL fired, isExists verified) while the DB and peers
-            // remain on the old content. Discard drains the pending commit by removing only
-            // the added + shape-break metrics — it does NOT drop removedMetrics (the commit
-            // was stashed before that step, so those are still alive) and does NOT swap
-            // appliedMal/appliedContent (still on the pre-apply bundle). Net outcome:
-            // local node converges back to the pre-apply bundle exactly, no divergence from
-            // what the DB still says is current.
             try {
                 dslManager.getCommitCoord().discardCommit(catalog, name);
             } catch (final Throwable rt) {
-                log.error("runtime-rule CRITICAL: persist-failure discard itself failed for "
-                    + "{}/{}; state is inconsistent and requires operator intervention",
-                    catalog, name, rt);
+                log.error("runtime-rule CRITICAL: persist-failure discard itself failed for {}/{}; "
+                    + "state is inconsistent and requires operator intervention", catalog, name, rt);
             }
-            // Peers are still SUSPENDED on our earlier broadcast. The DB didn't advance,
-            // so self-heal would eventually flip them back, but broadcasting Resume now
-            // cuts the dispatch gap from 60 s to a single RPC round-trip.
             broadcastResume(catalog, name, "persist_failed");
-            log.error("runtime-rule CRITICAL: STRUCTURAL persist FAILED after successful apply "
-                + "for {}/{} — discarded pending commit; local node re-aligned with old "
-                + "content. Operator action: re-push via /addOrUpdate once storage is healthy.",
-                catalog, name);
-            return persistError;
+            log.error("runtime-rule CRITICAL: STRUCTURAL persist FAILED after apply + fence for {}/{} "
+                + "— discarded the pending commit; local node re-aligned with old content. Operator "
+                + "action: re-push via /addOrUpdate once storage is healthy.", catalog, name);
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "persist failed after apply + fence");
+            return;
         }
 
-        // Persist succeeded — drain the pending commit now that the DB reflects the new
-        // content. commitCoord.finalizeCommit drops removedMetrics, swaps the applied
-        // pointers, retires the old loader, fires alarm reset, and advances the snapshot.
-        //
-        // Commit-tail failure handling: the DB row is durable (persist already succeeded),
-        // so peers converge from the DB — but on THIS node the local drop+recreate may
-        // not have fully landed. Return 500 commit_deferred so the operator sees a clear
-        // "DB row flipped, local commit threw" signal and can retry. Returning 200 would
-        // tell the operator "done" while the backend schema on this node may still be
-        // stale — that's the failure mode the review flagged.
+        // 3. Durable — finalize the local commit (swap bundle, unpark dispatch) + resume/notify peers.
+        SchemaApplyCoordinator.INSTANCE.transition(applyId, ApplyPhase.ROLLING_OUT);
         Throwable commitFailure = null;
         boolean drained = false;
         try {
             drained = dslManager.getCommitCoord().finalizeCommit(catalog, name);
         } catch (final Throwable t) {
             commitFailure = t;
-            log.error("runtime-rule CRITICAL: finalize commit FAILED for {}/{} after persist "
-                + "succeeded — DB is authoritative, peers will converge. Operator action: "
-                + "inspect log for the underlying cause.", catalog, name, t);
+            log.error("runtime-rule CRITICAL: finalize commit FAILED for {}/{} after persist — DB is "
+                + "authoritative, peers converge from it; this node retries on the next tick.",
+                catalog, name, t);
         }
+        if (commitFailure != null || drained) {
+            // The durable row advanced (commit drained, OR the local commit-tail threw but the row IS
+            // persisted) — peers must reconcile against it NOW rather than wait one ~30s tick.
+            broadcastNotifyApplied(catalog, name, ContentHash.sha256Hex(content));
+        } else {
+            // Nothing changed (force re-apply on byte-identical content) — just un-suspend peers.
+            broadcastResume(catalog, name, "structural_resume");
+        }
+
+        // 4. Terminal status.
         if (commitFailure != null) {
-            return serverError("commit_deferred", catalog, name,
-                "DB row persisted, but local commit-tail threw — backend shape on this "
-                    + "node may not have fully landed. Peers converge from DB; this node "
-                    + "will retry on the next dslManager tick. Cause: "
+            SchemaApplyCoordinator.INSTANCE.markDegraded(applyId,
+                "commit-tail deferred: DB persisted, local backend may be stale until the next tick: "
                     + commitFailure.getMessage());
+        } else if (fenceError != null) {
+            SchemaApplyCoordinator.INSTANCE.markDegraded(applyId,
+                "schema fence error (committed + durable; cluster converges): " + fenceError.getMessage());
+        } else if (fenceOutcome != null && !fenceOutcome.isApplied()) {
+            SchemaApplyCoordinator.INSTANCE.markDegraded(applyId,
+                "schema fence did not confirm cluster-wide propagation within "
+                    + dslManager.getDeferredFenceTimeoutMs() + " ms",
+                fenceOutcome.getLaggardNodeIds());
+        } else {
+            SchemaApplyCoordinator.INSTANCE.markApplied(applyId);
         }
+    }
 
-        // No commit was drained — typical for {@code force=true} re-applies on byte-
-        // identical content (engine returned NO_CHANGE so nothing was stashed). Peers are
-        // still PEER-suspended from our earlier broadcast and would only converge via the
-        // 60 s self-heal window without an explicit Resume. Send the Resume now so peers
-        // recover within an RPC round-trip.
-        if (!drained) {
-            broadcastResume(catalog, name, "force_no_change");
+    /**
+     * Background tail for {@code /delete?mode=revertToBundled}, run on the fence executor (or
+     * inline if it's gone). Reinstalls the bundled rule via the apply pipeline — which runs the
+     * deferred schema fence — then removes the runtime row, and maps the orchestrator outcome to
+     * the apply's terminal phase so an operator polling {@code GET /runtime/rule/status?applyId}
+     * sees {@code APPLIED}, or {@code FAILED} with the reason. Unlike a structural /addOrUpdate
+     * there is no Suspend bracket to release here: revert never broadcast a pause (peers converge
+     * on the persisted row via the periodic scan), so a failure needs no Resume.
+     */
+    private void revertToBundledTracked(final String applyId, final String catalog,
+                                         final String name, final String priorContent,
+                                         final RuntimeRuleManagementDAO dao) {
+        final DSLRuntimeDelete.Result revert;
+        try {
+            revert = dslManager.getDslRuntimeDelete().revertToBundled(catalog, name, priorContent);
+        } catch (final Throwable t) {
+            log.error("runtime-rule /delete: revertToBundled threw for {}/{}", catalog, name, t);
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "revert threw: " + t.getMessage());
+            return;
         }
-
-        return ok(HttpStatus.OK, "structural_applied", catalog, name,
-            "structural apply succeeded" + describeDelta(delta));
+        switch (revert.status) {
+            case REFUSED_CONFLICT:
+                log.warn("runtime-rule /delete refused for {}/{}: {}", catalog, name, revert.error);
+                SchemaApplyCoordinator.INSTANCE.markFailed(applyId, "revert refused: " + revert.error);
+                return;
+            case PRECONDITION_FAILED:
+                log.error("runtime-rule /delete: revertToBundled precondition failed for {}/{}: {}",
+                    catalog, name, revert.error);
+                SchemaApplyCoordinator.INSTANCE.markFailed(applyId,
+                    "revert precondition failed: " + revert.error);
+                return;
+            case BUNDLED_APPLY_FAILED:
+                log.error("runtime-rule /delete: bundled apply failed for {}/{}: {}",
+                    catalog, name, revert.error);
+                SchemaApplyCoordinator.INSTANCE.markFailed(applyId,
+                    "bundled apply failed (typically a storage-backend DDL/verify issue); the "
+                        + "orchestrator unwound the runtime install so local state matches the "
+                        + "persisted INACTIVE row — retry once storage recovers. Cause: " + revert.error);
+                return;
+            case REVERTED:
+            default:
+                break;
+        }
+        // Bundled is durable cluster-wide (the apply's fence confirmed) — finalize by removing
+        // the runtime row. A delete failure here leaves bundled applied but the INACTIVE row
+        // lingering; the reconcile retries it, so report FAILED with that context.
+        SchemaApplyCoordinator.INSTANCE.transition(applyId, ApplyPhase.ROLLING_OUT);
+        try {
+            dao.delete(catalog, name);
+        } catch (final IOException e) {
+            log.error("failed to delete runtime rule {}/{} after revert", catalog, name, e);
+            SchemaApplyCoordinator.INSTANCE.markFailed(applyId,
+                "bundled was reinstalled but removing the runtime row failed; it will be retried "
+                    + "on the next reconcile: " + e.getMessage());
+            return;
+        }
+        SchemaApplyCoordinator.INSTANCE.markApplied(applyId);
     }
 
     /**
@@ -1306,6 +1630,42 @@ private List<SuspendAck> broadcastSuspend(final String catalog, final String nam
         }
     }
 
+    /**
+     * Fire NotifyApplied to every non-self peer after a successful commit so peers converge NOW
+     * instead of waiting up to one ~30s refresh tick. Best-effort and fire-and-forget: a failure
+     * is non-fatal because peers self-converge on their own tick regardless.
+     */
+    private void broadcastNotifyApplied(final String catalog, final String name, final String contentHash) {
+        if (clusterClient == null) {
+            return;
+        }
+        // Fire-and-forget: never block the REST response on a sequential, per-peer-deadline fan-out.
+        // A lost notify is harmless — the peer self-converges on its next tick.
+        try {
+            notifyExecutor.submit(() -> {
+                try {
+                    clusterClient.broadcastNotifyApplied(catalog, name, contentHash);
+                } catch (final Throwable t) {
+                    log.warn("runtime-rule NotifyApplied broadcast failed for {}/{}; peers will "
+                        + "converge on their next tick", catalog, name, t);
+                }
+            });
+        } catch (final Throwable t) {
+            // Executor rejected (shutting down). Drop it — peers self-converge on their next tick.
+            log.warn("runtime-rule NotifyApplied could not be scheduled for {}/{}; peers converge "
+                + "on their next tick", catalog, name, t);
+        }
+    }
+
+    /** Stop the best-effort background executors (notify fan-out + schema fence). The framework's
+     *  {@code ModuleProvider} has no stop lifecycle hook, so this is not auto-invoked in production —
+     *  both executors are daemon threads that never block JVM exit. Provided for clean test teardown
+     *  and for a future module shutdown hook; mirrors {@code RuntimeRuleClusterServiceImpl.shutdown()}. */
+    public void shutdown() {
+        notifyExecutor.shutdownNow();
+        fenceExecutor.shutdownNow();
+    }
+
     /**
      * Inspect Suspend acks for the split-brain guard: if any peer responded with REJECTED
      * (origin conflict — it believes it is the main and is mid-apply), surface that to the
@@ -1649,55 +2009,39 @@ private HttpResponse doDeleteLocked(final String catalog, final String name,
         }
 
         if (mode == DeleteMode.REVERT_TO_BUNDLED) {
-            // Bundled-revert path is the schema-change path: bundled may have a different
-            // shape than runtime. The orchestrator runs the unified pipeline:
-            //   (1) installRuntime to put prior runtime claims back locally,
-            //   (2) apply(bundled, STRUCTURAL, BUNDLED, withSchemaChange) — engine.commit
-            //       drops runtime-only metrics through the standard delta path,
-            //   (3) reset rules-map state to boot-seeded so gone-keys reconcile leaves
-            //       it alone after dao.delete.
-            // dao.delete only runs after revertToBundled returns REVERTED — a precondition
-            // or compile failure aborts the row deletion so the operator can retry.
-            final DSLRuntimeDelete.Result revert;
+            // Bundled-revert is the schema-change path: bundled may have a different shape
+            // than runtime, so — like a structural /addOrUpdate — it is tracked and runs
+            // ASYNCHRONOUSLY. The unified pipeline (installRuntime → apply(bundled, STRUCTURAL,
+            // withSchemaChange, deferred fence) → reset state → dao.delete) plus its schema
+            // fence (up to deferredFenceTimeoutMs) run on the background executor, and the
+            // response returns immediately with an applyId the operator polls via
+            // GET /runtime/rule/status. The precondition rejections above (inactivate-first,
+            // no_bundled_twin, requires_revert_to_bundled) are still reported synchronously;
+            // the revert pipeline's own outcomes (REFUSED_CONFLICT / PRECONDITION_FAILED /
+            // BUNDLED_APPLY_FAILED / row-delete failure) surface as the apply's terminal phase
+            // (FAILED, with the reason) on /status rather than an HTTP error.
+            final String bundled =
+                StaticRuleRegistry.active().find(catalog, name).orElse(prior.getContent());
+            final String applyId = SchemaApplyCoordinator.INSTANCE.begin(
+                catalog, name, ContentHash.sha256Hex(bundled));
+            SchemaApplyCoordinator.INSTANCE.markFencing(applyId);
+            final String priorContent = prior.getContent();
+            boolean scheduled = true;
             try {
-                revert = dslManager.getDslRuntimeDelete()
-                    .revertToBundled(catalog, name, prior.getContent());
+                fenceExecutor.submit(() -> revertToBundledTracked(applyId, catalog, name, priorContent, dao));
             } catch (final Throwable t) {
-                log.error("runtime-rule /delete: revertToBundled threw for {}/{}", catalog, name, t);
-                return serverError("revert_to_bundled_failed", catalog, name, t.getMessage());
+                // Executor rejected (shutting down) — run inline so the apply still completes.
+                log.warn("runtime-rule could not schedule the background revert for {}/{}; "
+                    + "running it inline", catalog, name, t);
+                scheduled = false;
             }
-            switch (revert.status) {
-                case REFUSED_CONFLICT:
-                    log.warn("runtime-rule /delete refused for {}/{}: {}", catalog, name, revert.error);
-                    return HttpResponse.of(HttpStatus.CONFLICT, MediaType.JSON_UTF_8,
-                        jsonBody("delete_refused", catalog, name, revert.error));
-                case PRECONDITION_FAILED:
-                    log.error("runtime-rule /delete: revertToBundled precondition failed for {}/{}: {}",
-                        catalog, name, revert.error);
-                    return serverError("revert_to_bundled_precondition_failed", catalog, name, revert.error);
-                case BUNDLED_APPLY_FAILED:
-                    log.error("runtime-rule /delete: bundled apply failed for {}/{}: {}",
-                        catalog, name, revert.error);
-                    return serverError("revert_to_bundled_failed", catalog, name,
-                        "bundled apply failed (typically a storage-backend DDL/verify "
-                            + "issue — BanyanDB unreachable, shape rejection, or schema-"
-                            + "barrier timeout). The orchestrator unwound the step-1 "
-                            + "runtime install so local state matches the persisted "
-                            + "INACTIVE row. Retry once storage recovers. Cause: "
-                            + revert.error);
-                case REVERTED:
-                default:
-                    break;
-            }
-            try {
-                dao.delete(catalog, name);
-            } catch (final IOException e) {
-                log.error("failed to delete runtime rule {}/{}", catalog, name, e);
-                return serverError("delete_failed", catalog, name, e.getMessage());
+            if (!scheduled) {
+                revertToBundledTracked(applyId, catalog, name, priorContent, dao);
             }
-            return ok(HttpStatus.OK, "reverted_to_bundled", catalog, name,
-                "runtime row removed; bundled rule installed via apply pipeline (schema "
-                    + "change handled by the standard delta path); peers converge on next tick");
+            return okWithApplyId(HttpStatus.OK, "reverted_to_bundled", catalog, name, applyId,
+                "revert-to-bundled accepted; reinstalling the bundled rule across the cluster "
+                    + "and removing the runtime row in the background — poll "
+                    + "/runtime/rule/status?applyId=" + applyId);
         }
 
         // No-bundled-twin DEFAULT path. /inactivate already tore down local handlers under
@@ -1783,6 +2127,21 @@ private static HttpResponse ok(final HttpStatus status, final String applyStatus
         return HttpResponse.of(status, MediaType.JSON_UTF_8, jsonBody(applyStatus, catalog, name, message));
     }
 
+    /** {@link #ok} variant that also carries the apply's {@code applyId} so the caller can poll
+     *  {@code GET /runtime/rule/status?applyId=…} directly. Used by the structural-apply success
+     *  envelope. */
+    private static HttpResponse okWithApplyId(final HttpStatus status, final String applyStatus,
+                                              final String catalog, final String name,
+                                              final String applyId, final String message) {
+        final JsonObject body = new JsonObject();
+        body.addProperty("applyStatus", applyStatus);
+        body.addProperty("catalog", catalog == null ? "" : catalog);
+        body.addProperty("name", name == null ? "" : name);
+        body.addProperty("applyId", applyId == null ? "" : applyId);
+        body.addProperty("message", message == null ? "" : message);
+        return HttpResponse.of(status, MediaType.JSON_UTF_8, GSON.toJson(body));
+    }
+
     private static HttpResponse badRequest(final String applyStatus, final String catalog,
                                            final String name, final String message) {
         return HttpResponse.of(HttpStatus.BAD_REQUEST, MediaType.JSON_UTF_8,
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyPhase.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyPhase.java
new file mode 100644
index 000000000000..2c0c39c6c151
--- /dev/null
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyPhase.java
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.receiver.runtimerule.status;
+
+/**
+ * Lifecycle phase of one runtime-rule apply, as tracked by {@link SchemaApplyCoordinator} on the
+ * cluster main. A progress query (added in a later phase) reports this so an operator can see how
+ * far an apply got — and, when it stops short, whether it is converging ({@link #DEGRADED}) or
+ * failed outright ({@link #FAILED}).
+ *
+ * <p>Normal progression:
+ * {@link #PENDING} → {@link #DDL} → {@link #FENCING} → {@link #ROLLING_OUT} → {@link #APPLIED}.
+ * These are the phases the main observes from its apply orchestration: {@code PENDING} once the
+ * apply is accepted, {@code DDL} while the compile/verify/schema-change call runs (a single opaque
+ * step from the orchestrator's vantage — sub-steps such as validation are not separately
+ * observable, so they are not modelled). Once DDL fires the HTTP response returns with the
+ * {@code applyId} at {@code FENCING} — the apply is accepted but NOT yet durable — while the main
+ * waits (in the background, on a generous timeout) for every BanyanDB data node to apply the new
+ * schema revision. The rule row is persisted (the durable commit point) only AFTER the fence
+ * confirms, so "durable" implies "schema propagated" and crash recovery never resumes against an
+ * unpropagated schema. Dispatch stays suspended through {@code FENCING} because an un-propagated
+ * write is silently dropped. Once the fence confirms and the row is persisted, the apply reaches
+ * {@code ROLLING_OUT} — finalize the local commit, unpark dispatch, resume/notify peers — and then
+ * {@code APPLIED}. The operator polls to watch {@code FENCING → ROLLING_OUT → APPLIED} (or
+ * {@code DEGRADED}/{@code FAILED}). Two off-ramps:
+ * <ul>
+ *   <li>{@link #FAILED} — a pre-commit error (compile / verify / DDL RPC / persist). Nothing was
+ *       committed (a crash before persist likewise leaves no durable row); the change was rolled
+ *       back and peers stay on the prior content.</li>
+ *   <li>{@link #DEGRADED} — committed and durable, but the fence did not confirm cluster-wide
+ *       propagation within the timeout (one or more data nodes lagging — exposed as
+ *       {@code fenceLaggards}; dispatch is resumed anyway so a stuck node doesn't park the metric
+ *       forever) or the local commit-tail threw. Forward-progress: peers converge from the durable
+ *       row and BanyanDB keeps converging; this is NOT a revert.</li>
+ * </ul>
+ * {@link #UNKNOWN} is returned for an apply-id the main no longer holds (evicted / main restarted);
+ * callers fall back to a content-hash comparison.
+ */
+public enum ApplyPhase {
+    PENDING,
+    DDL,
+    FENCING,
+    ROLLING_OUT,
+    APPLIED,
+    DEGRADED,
+    FAILED,
+    UNKNOWN;
+
+    /**
+     * A terminal phase no longer advances on its own: the apply succeeded ({@link #APPLIED}),
+     * failed and rolled back ({@link #FAILED}), or committed-but-unconfirmed ({@link #DEGRADED},
+     * which a background re-check may still flip to {@link #APPLIED}).
+     */
+    public boolean isTerminal() {
+        return this == APPLIED || this == DEGRADED || this == FAILED;
+    }
+}
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyStatus.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyStatus.java
new file mode 100644
index 000000000000..57dbb35ea175
--- /dev/null
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/ApplyStatus.java
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.receiver.runtimerule.status;
+
+import java.util.Collections;
+import java.util.List;
+import lombok.Getter;
+
+/**
+ * Immutable snapshot of one runtime-rule apply's progress on the cluster main. Held by
+ * {@link SchemaApplyCoordinator}; transitions produce a new snapshot (copy-on-write) so concurrent
+ * readers (a progress query) never see a torn value.
+ *
+ * <p>{@code contentHash} is the durable identity of the applied file — a status query can be
+ * answered by it even after the ephemeral {@code applyId} is gone. {@code failureReason} is set
+ * only for {@link ApplyPhase#FAILED} / {@link ApplyPhase#DEGRADED}.
+ *
+ * <p>This snapshot carries the apply-level (main-orchestrated) phase. Per-node breakdown
+ * (storage-plane laggards, per-OAP-node applied state) is layered on in a later phase via the
+ * status query DTO; this type stays a small, immutable core.
+ */
+@Getter
+public final class ApplyStatus {
+    private final String applyId;
+    private final String catalog;
+    private final String name;
+    private final String contentHash;
+    private final ApplyPhase phase;
+    /** Non-null only for {@link ApplyPhase#FAILED} (pre-commit error) and {@link ApplyPhase#DEGRADED}
+     *  (committed but a node lagging at fence timeout, or local commit-tail threw). Null otherwise. */
+    private final String failureReason;
+    private final long startedAtMs;
+    private final long updatedAtMs;
+    /** Data-node ids that had not confirmed the new schema revision when the fence timed out.
+     *  Non-empty only for {@link ApplyPhase#DEGRADED} caused by fence non-confirmation; always a
+     *  non-null (possibly empty) immutable list. */
+    private final List<String> fenceLaggards;
+
+    public ApplyStatus(final String applyId, final String catalog, final String name,
+                       final String contentHash, final ApplyPhase phase, final String failureReason,
+                       final long startedAtMs, final long updatedAtMs, final List<String> fenceLaggards) {
+        this.applyId = applyId;
+        this.catalog = catalog;
+        this.name = name;
+        this.contentHash = contentHash;
+        this.phase = phase;
+        this.failureReason = failureReason;
+        this.startedAtMs = startedAtMs;
+        this.updatedAtMs = updatedAtMs;
+        this.fenceLaggards = fenceLaggards == null
+            ? Collections.emptyList()
+            : Collections.unmodifiableList(fenceLaggards);
+    }
+
+    /** A copy advanced to {@code newPhase}, clearing any prior failure reason + laggards (forward
+     *  progress). */
+    public ApplyStatus withPhase(final ApplyPhase newPhase, final long nowMs) {
+        return new ApplyStatus(applyId, catalog, name, contentHash, newPhase, null, startedAtMs, nowMs,
+            Collections.emptyList());
+    }
+
+    /** A copy moved to a non-success terminal ({@link ApplyPhase#FAILED} / {@link ApplyPhase#DEGRADED})
+     *  carrying {@code reason} and no laggards. */
+    public ApplyStatus withFailure(final ApplyPhase terminalPhase, final String reason, final long nowMs) {
+        return withFailure(terminalPhase, reason, Collections.emptyList(), nowMs);
+    }
+
+    /** A copy moved to a non-success terminal carrying {@code reason} and the {@code laggards} that
+     *  caused a fence-non-confirmation {@link ApplyPhase#DEGRADED}. */
+    public ApplyStatus withFailure(final ApplyPhase terminalPhase, final String reason,
+                                   final List<String> laggards, final long nowMs) {
+        return new ApplyStatus(applyId, catalog, name, contentHash, terminalPhase, reason, startedAtMs, nowMs,
+            laggards);
+    }
+}
diff --git a/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinator.java b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinator.java
new file mode 100644
index 000000000000..c1fffe185f60
--- /dev/null
+++ b/oap-server/server-admin/runtime-rule/src/main/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinator.java
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.receiver.runtimerule.status;
+
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.function.LongSupplier;
+import java.util.function.UnaryOperator;
+import lombok.extern.slf4j.Slf4j;
+
+/**
+ * Single in-memory owner of runtime-rule apply progress on the cluster main. Each apply opens a
+ * status entry keyed by a generated {@code applyId} and advances it through {@link ApplyPhase}
+ * as the apply runs (validate → DDL → fence → roll out → applied), or to a terminal
+ * {@link ApplyPhase#FAILED} / {@link ApplyPhase#DEGRADED}.
+ *
+ * <p>Two indexes back the two ways a caller asks "is this live yet?":
+ * <ul>
+ *   <li>by {@code applyId} — the live handle a just-submitted apply polls;</li>
+ *   <li>by {@code (catalog, name)} → latest applyId — the content-based path used when the
+ *       apply-id is gone (page refresh / main restart), resolved against the durable content hash.</li>
+ * </ul>
+ *
+ * <p>Concurrency: a {@link ConcurrentHashMap} of immutable {@link ApplyStatus} snapshots. Writes
+ * for a given apply are effectively single-threaded (the apply orchestration serializes per file),
+ * reads (progress queries) are concurrent and lock-free against the snapshot. This phase is the
+ * building block; the apply-lifecycle wiring, per-node breakdown, the {@code GetApplyStatus} query
+ * surface, and a background convergence watch with TTL eviction layer on later. State is in-memory
+ * by design — the durable content hash reconstructs truth after a main restart.
+ */
+@Slf4j
+public class SchemaApplyCoordinator {
+
+    /**
+     * Process-wide instance the production apply / cluster-RPC / reconcile paths share, mirroring
+     * the {@code MetadataRegistry.INSTANCE} / {@code DSLClassLoaderManager.INSTANCE} pattern so the
+     * coordinator need not be threaded through every constructor. Uses the wall-clock; tests
+     * construct their own instance with an injected clock instead.
+     */
+    public static final SchemaApplyCoordinator INSTANCE = new SchemaApplyCoordinator();
+
+    private final Map<String, ApplyStatus> byApplyId = new ConcurrentHashMap<>();
+    private final Map<String, String> latestApplyIdByFile = new ConcurrentHashMap<>();
+    private final LongSupplier clock;
+
+    public SchemaApplyCoordinator() {
+        this(System::currentTimeMillis);
+    }
+
+    /** Clock-injectable for deterministic tests (and the timed background watch added later). */
+    public SchemaApplyCoordinator(final LongSupplier clock) {
+        this.clock = clock;
+    }
+
+    /**
+     * Open a new apply in {@link ApplyPhase#PENDING} and return its {@code applyId}. Records it as
+     * the latest apply for {@code (catalog, name)} so a content-based query resolves to it.
+     */
+    public String begin(final String catalog, final String name, final String contentHash) {
+        final String applyId = UUID.randomUUID().toString();
+        final long now = clock.getAsLong();
+        byApplyId.put(applyId, new ApplyStatus(
+            applyId, catalog, name, contentHash, ApplyPhase.PENDING, null, now, now, null));
+        latestApplyIdByFile.put(fileKey(catalog, name), applyId);
+        if (log.isDebugEnabled()) {
+            log.debug("apply [{}] begin: {}/{} hash={}", applyId, catalog, name, contentHash);
+        }
+        return applyId;
+    }
+
+    /** Advance an apply to {@code phase} (forward progress; clears any prior failure reason). No-op
+     *  for an unknown apply-id. */
+    public void transition(final String applyId, final ApplyPhase phase) {
+        update(applyId, s -> s.withPhase(phase, clock.getAsLong()));
+    }
+
+    /** Terminal success. */
+    public void markApplied(final String applyId) {
+        transition(applyId, ApplyPhase.APPLIED);
+    }
+
+    /** Move an apply to {@link ApplyPhase#FENCING}: the background wait for cluster-wide schema
+     *  propagation is in flight (after the durable commit + peer resume). No-op for an unknown id. */
+    public void markFencing(final String applyId) {
+        transition(applyId, ApplyPhase.FENCING);
+    }
+
+    /** Terminal: committed and durable, but cluster-wide propagation unconfirmed within budget
+     *  (a node is lagging). Not a revert — a background re-check may flip it to APPLIED later. */
+    public void markDegraded(final String applyId, final String reason) {
+        update(applyId, s -> s.withFailure(ApplyPhase.DEGRADED, reason, clock.getAsLong()));
+    }
+
+    /** {@link #markDegraded(String, String)} carrying the data-node ids that had not confirmed the
+     *  schema revision at fence timeout, surfaced to the operator on the status. */
+    public void markDegraded(final String applyId, final String reason, final List<String> laggards) {
+        update(applyId, s -> s.withFailure(ApplyPhase.DEGRADED, reason, laggards, clock.getAsLong()));
+    }
+
+    /** Terminal: a pre-commit error (compile / verify / DDL RPC / persist); the change was rolled
+     *  back. */
+    public void markFailed(final String applyId, final String reason) {
+        update(applyId, s -> s.withFailure(ApplyPhase.FAILED, reason, clock.getAsLong()));
+    }
+
+    private void update(final String applyId, final UnaryOperator<ApplyStatus> op) {
+        byApplyId.computeIfPresent(applyId, (k, s) -> op.apply(s));
+    }
+
+    /** The live status for an apply-id, or {@code null} when the main no longer holds it (caller
+     *  treats null as {@link ApplyPhase#UNKNOWN} and falls back to the content-based path). */
+    public ApplyStatus get(final String applyId) {
+        return byApplyId.get(applyId);
+    }
+
+    /**
+     * The latest apply status for a file, for the content-based query when the apply-id is unknown.
+     * Returns {@code null} if no apply for that file is tracked, or if {@code expectedContentHash}
+     * is non-null and does not match the latest apply's hash (the latest tracked apply is for a
+     * different content than the caller asked about).
+     */
+    public ApplyStatus getLatestByFile(final String catalog, final String name,
+                                       final String expectedContentHash) {
+        final String applyId = latestApplyIdByFile.get(fileKey(catalog, name));
+        if (applyId == null) {
+            return null;
+        }
+        final ApplyStatus status = byApplyId.get(applyId);
+        if (status == null) {
+            return null;
+        }
+        if (expectedContentHash != null && !expectedContentHash.equals(status.getContentHash())) {
+            return null;
+        }
+        return status;
+    }
+
+    /** Number of tracked applies — for tests and the TTL-eviction watch. */
+    public int trackedCount() {
+        return byApplyId.size();
+    }
+
+    /**
+     * Evict tracked applies whose last update is older than {@code ttlMs}, bounding memory. Both
+     * terminal entries (kept around so a post-refresh query within the window still resolves) and
+     * stale non-terminal ones (a missed apply branch left in PENDING) are reaped once past the
+     * TTL — a later query then returns {@code null}/UNKNOWN and the caller falls back to comparing
+     * the durable content hash against the DAO row. The {@code (catalog, name) → latest} index
+     * entry is cleared only when it still points at an evicted apply, so a newer apply for the
+     * same file keeps its mapping. Returns the number evicted.
+     */
+    public int evictExpired(final long ttlMs) {
+        final long cutoff = clock.getAsLong() - ttlMs;
+        final Set<String> evicted = new HashSet<>();
+        byApplyId.entrySet().removeIf(e -> {
+            if (e.getValue().getUpdatedAtMs() < cutoff) {
+                evicted.add(e.getKey());
+                return true;
+            }
+            return false;
+        });
+        if (!evicted.isEmpty()) {
+            latestApplyIdByFile.values().removeIf(evicted::contains);
+            if (log.isDebugEnabled()) {
+                log.debug("apply-status: evicted {} entr{} older than {} ms",
+                    evicted.size(), evicted.size() == 1 ? "y" : "ies", ttlMs);
+            }
+        }
+        return evicted.size();
+    }
+
+    private static String fileKey(final String catalog, final String name) {
+        return catalog + "/" + name;
+    }
+}
diff --git a/oap-server/server-admin/runtime-rule/src/main/proto/runtime-rule-cluster.proto b/oap-server/server-admin/runtime-rule/src/main/proto/runtime-rule-cluster.proto
index 898204985bad..54acb2032ca8 100644
--- a/oap-server/server-admin/runtime-rule/src/main/proto/runtime-rule-cluster.proto
+++ b/oap-server/server-admin/runtime-rule/src/main/proto/runtime-rule-cluster.proto
@@ -51,6 +51,16 @@ service RuntimeRuleClusterService {
     // body in the ack. Eliminates the "client must resubmit" round-trip from the old
     // 421 Misdirected Request behaviour.
     rpc Forward(ForwardRequest) returns (ForwardResponse);
+    // Query the apply-status the main tracks for a runtime-rule file. Read-only; served by the
+    // main (non-main nodes route here, since only the main runs applies and holds the status).
+    // Lets the UI/operator poll an apply's progress, and — after the ephemeral apply_id is gone
+    // (page refresh / main restart) — ask by (catalog, name[, content_hash]), the durable identity.
+    rpc GetApplyStatus(ApplyStatusRequest) returns (ApplyStatusResponse);
+    // Fired by the main right after it commits a structural apply, so peers converge NOW
+    // (run a reconcile against the just-persisted DB row) instead of waiting up to one refresh
+    // tick (~30s). Best-effort: the ack only confirms the peer accepted the nudge; the peer
+    // still self-converges on its next tick if the notify is lost.
+    rpc NotifyApplied(NotifyAppliedRequest) returns (NotifyAppliedAck);
 }
 
 message SuspendRequest {
@@ -166,3 +176,80 @@ message ForwardResponse {
     // operator can correlate which node refused.
     string node_id = 3;
 }
+
+// Identifies the apply whose status the caller wants. Either apply_id (the live handle a
+// just-submitted apply was given) OR catalog+name (+ optional content_hash) — the content-based
+// path used once the apply_id is gone. content_hash, when set, requires the latest tracked apply
+// for that file to match (else found=false), so a stale query doesn't misreport a newer apply.
+message ApplyStatusRequest {
+    string apply_id = 1;
+    string catalog = 2;
+    string name = 3;
+    string content_hash = 4;
+}
+
+// Lifecycle phase of an apply on the main. Mirrors the Java ApplyPhase; prefixed to avoid
+// collision with the other enums in this package.
+enum ApplyStatusPhase {
+    APPLY_PHASE_UNSPECIFIED = 0;
+    APPLY_PHASE_PENDING = 1;
+    // 2 (VALIDATING) was removed: it is a sub-step inside the single compile/verify/schema-change
+    // call and is not separately observable by the main's orchestration. Reserved to keep the wire
+    // number stable.
+    reserved 2;
+    reserved "APPLY_PHASE_VALIDATING";
+    APPLY_PHASE_DDL = 3;
+    // After persist, the main waits (in the background, on a generous timeout) for every data node
+    // to apply the new schema revision BEFORE resuming dispatch — an un-propagated write is dropped.
+    // The HTTP response returns here with the apply_id; dispatch stays suspended through FENCING.
+    APPLY_PHASE_FENCING = 4;
+    // Fence confirmed — finalize the local commit, unpark dispatch, resume/notify peers.
+    APPLY_PHASE_ROLLING_OUT = 5;
+    APPLY_PHASE_APPLIED = 6;
+    APPLY_PHASE_DEGRADED = 7;
+    APPLY_PHASE_FAILED = 8;
+    // The main no longer holds an apply matching the request (evicted / main restarted, or the
+    // content_hash did not match the latest tracked apply). Caller falls back to a content-hash
+    // comparison against the durable DAO row.
+    APPLY_PHASE_UNKNOWN = 9;
+}
+
+message ApplyStatusResponse {
+    // False when no matching apply is tracked (phase is then APPLY_PHASE_UNKNOWN and the rest
+    // are empty/zero). True otherwise, with the fields below populated.
+    bool found = 1;
+    string apply_id = 2;
+    string catalog = 3;
+    string name = 4;
+    string content_hash = 5;
+    ApplyStatusPhase phase = 6;
+    // Non-empty only for APPLY_PHASE_FAILED / APPLY_PHASE_DEGRADED.
+    string failure_reason = 7;
+    int64 started_at_ms = 8;
+    int64 updated_at_ms = 9;
+    // Instance id of the node that produced this status (the main).
+    string node_id = 10;
+    // Data-node ids that had not confirmed the schema revision when the fence timed out.
+    // Non-empty only on APPLY_PHASE_DEGRADED (fence non-confirmation).
+    repeated string fence_laggards = 11;
+}
+
+// Sent by the main after a successful structural commit so peers reconcile immediately.
+message NotifyAppliedRequest {
+    string catalog = 1;
+    string name = 2;
+    // The committed content hash — informational/correlation; the peer reconciles from the DB row.
+    string content_hash = 3;
+    // Main's instance id; receivers compare against their own to suppress self-broadcast loops.
+    string sender_node_id = 4;
+    // Epoch millis on the main; diagnostic only.
+    int64 issued_at_ms = 5;
+}
+
+message NotifyAppliedAck {
+    string node_id = 1;
+    // True when the peer scheduled an immediate reconcile; false on self-broadcast suppression
+    // or when the receiver isn't ready (it still self-converges on its next tick).
+    bool accepted = 2;
+    string detail = 3;
+}
diff --git a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplierTest.java b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplierTest.java
index dc4222b49185..66f147557c71 100644
--- a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplierTest.java
+++ b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/apply/MalFileApplierTest.java
@@ -24,6 +24,7 @@
 import javassist.ClassPool;
 import org.apache.skywalking.oap.server.core.analysis.Layer;
 import org.apache.skywalking.oap.server.core.analysis.meter.MeterSystem;
+import org.apache.skywalking.oap.server.core.storage.StorageException;
 import org.apache.skywalking.oap.server.core.storage.model.StorageManipulationOpt;
 import org.apache.skywalking.oap.server.receiver.runtimerule.layer.LayerConflictException;
 import org.apache.skywalking.oap.server.receiver.runtimerule.layer.RuntimeLayerRegistry;
@@ -134,6 +135,37 @@ void successfulApplyRegistersDerivedMetricNames() throws Exception {
                 any(StorageManipulationOpt.class));
     }
 
+    @Test
+    void deferredFenceFailureRollsBackAndCarriesPartialMetricsForCallerRollback() throws Exception {
+        // The DDL for every metric fires first; the batched schema fence runs once at the end. A
+        // barrier transport error there must abort the apply as an ApplyException (so the REST
+        // caller rolls back), carrying the metric names registered before the fence so the caller
+        // knows what to unwind. Inline path (fenceRunByCaller=false) so the applier runs the fence.
+        final String yaml =
+            "metricPrefix: meter_vm\n"
+                + "expSuffix: service(['host'], Layer.OS_LINUX)\n"
+                + "metricsRules:\n"
+                + "  - name: cpu_total_percentage\n"
+                + "    exp: node_cpu_seconds_total.sum(['host']).rate('PT1M')\n"
+                + "  - name: mem_total_used_percentage\n"
+                + "    exp: node_memory_MemTotal_bytes.sum(['host'])\n";
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        opt.setDeferredFence(() -> {
+            throw new StorageException("barrier transport down");
+        });
+
+        final MalFileApplier.ApplyException ex = assertThrows(
+            MalFileApplier.ApplyException.class,
+            () -> applier.apply(yaml, "otel-rules/vm-fence", "hashF", opt));
+
+        assertTrue(ex.getMessage().contains("schema fence failed"),
+            "fence failure must surface as an apply error, not be swallowed");
+        assertEquals(
+            setOf("meter_vm_cpu_total_percentage", "meter_vm_mem_total_used_percentage"),
+            ex.getPartiallyRegistered(),
+            "metrics registered before the fence must be carried for the caller's rollback");
+    }
+
     @Test
     void removeCallsMeterSystemPerName() {
         // The inverse side of the contract: on unregister every metric name the prior apply
diff --git a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImplTest.java b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImplTest.java
new file mode 100644
index 000000000000..3d1746f38d1f
--- /dev/null
+++ b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/cluster/RuntimeRuleClusterServiceImplTest.java
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.receiver.runtimerule.cluster;
+
+import io.grpc.stub.StreamObserver;
+import java.util.Arrays;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusPhase;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.ApplyStatusResponse;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedAck;
+import org.apache.skywalking.oap.server.receiver.runtimerule.cluster.v1.NotifyAppliedRequest;
+import org.apache.skywalking.oap.server.receiver.runtimerule.reconcile.DSLManager;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.ApplyPhase;
+import org.apache.skywalking.oap.server.receiver.runtimerule.status.SchemaApplyCoordinator;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.mockito.Mockito.doAnswer;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Unit coverage for the admin-internal gRPC handlers the cluster main serves:
+ * {@code getApplyStatus} (the apply-status query, including the {@link ApplyPhase} → proto mapping
+ * and the laggard list) and {@code notifyApplied} (self-suppression + the off-thread reconcile
+ * nudge). Status flows through the {@link SchemaApplyCoordinator#INSTANCE} singleton, so each test
+ * opens its own apply (a fresh UUID applyId) to stay isolated from the others.
+ */
+class RuntimeRuleClusterServiceImplTest {
+
+    private static final String SELF = "self-node_17129";
+
+    /** Minimal StreamObserver that captures the single onNext value the handlers emit. */
+    private static final class Capturing<T> implements StreamObserver<T> {
+        private T value;
+
+        @Override
+        public void onNext(final T v) {
+            this.value = v;
+        }
+
+        @Override
+        public void onError(final Throwable t) {
+        }
+
+        @Override
+        public void onCompleted() {
+        }
+    }
+
+    private ApplyStatusResponse query(final RuntimeRuleClusterServiceImpl impl, final String applyId) {
+        final Capturing<ApplyStatusResponse> obs = new Capturing<>();
+        impl.getApplyStatus(ApplyStatusRequest.newBuilder().setApplyId(applyId).build(), obs);
+        return obs.value;
+    }
+
+    @Test
+    void getApplyStatusMapsEachPhaseToProtoAlongTheHappyPath() {
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(mock(DSLManager.class), SELF);
+        final String id = SchemaApplyCoordinator.INSTANCE.begin("otel-rules", "vm-phase", "h1");
+
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_PENDING, query(impl, id).getPhase());
+        SchemaApplyCoordinator.INSTANCE.transition(id, ApplyPhase.DDL);
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_DDL, query(impl, id).getPhase());
+        SchemaApplyCoordinator.INSTANCE.markFencing(id);
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_FENCING, query(impl, id).getPhase());
+        SchemaApplyCoordinator.INSTANCE.transition(id, ApplyPhase.ROLLING_OUT);
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_ROLLING_OUT, query(impl, id).getPhase());
+        SchemaApplyCoordinator.INSTANCE.markApplied(id);
+
+        final ApplyStatusResponse applied = query(impl, id);
+        assertTrue(applied.getFound());
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_APPLIED, applied.getPhase());
+        assertEquals("otel-rules", applied.getCatalog());
+        assertEquals("vm-phase", applied.getName());
+        assertEquals(SELF, applied.getNodeId());
+        assertTrue(applied.getFenceLaggardsList().isEmpty());
+    }
+
+    @Test
+    void getApplyStatusSurfacesDegradedWithLaggardNodeIds() {
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(mock(DSLManager.class), SELF);
+        final String id = SchemaApplyCoordinator.INSTANCE.begin("otel-rules", "vm-degraded", "h2");
+        SchemaApplyCoordinator.INSTANCE.markDegraded(id, "fence did not confirm",
+            Arrays.asList("data-1_17912", "data-2_17912"));
+
+        final ApplyStatusResponse resp = query(impl, id);
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_DEGRADED, resp.getPhase());
+        assertEquals("fence did not confirm", resp.getFailureReason());
+        assertEquals(Arrays.asList("data-1_17912", "data-2_17912"), resp.getFenceLaggardsList());
+    }
+
+    @Test
+    void getApplyStatusMapsFailed() {
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(mock(DSLManager.class), SELF);
+        final String id = SchemaApplyCoordinator.INSTANCE.begin("otel-rules", "vm-failed", "h3");
+        SchemaApplyCoordinator.INSTANCE.markFailed(id, "ddl_verify_failed");
+
+        final ApplyStatusResponse resp = query(impl, id);
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_FAILED, resp.getPhase());
+        assertEquals("ddl_verify_failed", resp.getFailureReason());
+        assertTrue(resp.getFenceLaggardsList().isEmpty());
+    }
+
+    @Test
+    void getApplyStatusReturnsUnknownWhenNothingTracked() {
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(mock(DSLManager.class), SELF);
+
+        final ApplyStatusResponse resp = query(impl, "no-such-apply-id");
+        assertFalse(resp.getFound());
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_UNKNOWN, resp.getPhase());
+        // node_id is still stamped so the caller knows which node answered.
+        assertEquals(SELF, resp.getNodeId());
+    }
+
+    @Test
+    void getApplyStatusResolvesByCatalogNameWhenApplyIdAbsent() {
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(mock(DSLManager.class), SELF);
+        final String id = SchemaApplyCoordinator.INSTANCE.begin("otel-rules", "vm-byname", "hbyname");
+        SchemaApplyCoordinator.INSTANCE.markApplied(id);
+
+        final Capturing<ApplyStatusResponse> obs = new Capturing<>();
+        impl.getApplyStatus(ApplyStatusRequest.newBuilder()
+            .setCatalog("otel-rules").setName("vm-byname").setContentHash("hbyname").build(), obs);
+
+        assertTrue(obs.value.getFound());
+        assertEquals(ApplyStatusPhase.APPLY_PHASE_APPLIED, obs.value.getPhase());
+        assertEquals(id, obs.value.getApplyId());
+    }
+
+    @Test
+    void notifyAppliedSuppressesSelfBroadcastAndDoesNotReconcile() {
+        final DSLManager dslManager = mock(DSLManager.class);
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(dslManager, SELF);
+        final Capturing<NotifyAppliedAck> obs = new Capturing<>();
+
+        impl.notifyApplied(NotifyAppliedRequest.newBuilder()
+            .setSenderNodeId(SELF).setCatalog("otel-rules").setName("vm").build(), obs);
+
+        assertFalse(obs.value.getAccepted(), "a node's own broadcast must be suppressed");
+        verify(dslManager, never()).tick();
+    }
+
+    @Test
+    void notifyAppliedFromPeerSchedulesAReconcileTick() throws Exception {
+        final DSLManager dslManager = mock(DSLManager.class);
+        final CountDownLatch ticked = new CountDownLatch(1);
+        doAnswer(inv -> {
+            ticked.countDown();
+            return null;
+        }).when(dslManager).tick();
+        final RuntimeRuleClusterServiceImpl impl = new RuntimeRuleClusterServiceImpl(dslManager, SELF);
+        final Capturing<NotifyAppliedAck> obs = new Capturing<>();
+
+        impl.notifyApplied(NotifyAppliedRequest.newBuilder()
+            .setSenderNodeId("other-node_17129").setCatalog("otel-rules").setName("vm").build(), obs);
+
+        assertTrue(obs.value.getAccepted(), "a peer notify must be accepted");
+        assertTrue(ticked.await(3, TimeUnit.SECONDS),
+            "the reconcile tick must run off the gRPC thread");
+    }
+}
diff --git a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/GuardrailIntegrationTest.java b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/GuardrailIntegrationTest.java
index 7eccc122c234..d7a7af0b76ec 100644
--- a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/GuardrailIntegrationTest.java
+++ b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/GuardrailIntegrationTest.java
@@ -146,8 +146,9 @@ void setUp() {
             .thenReturn(mock(StructuralCommitCoordinator.class));
         when(dslManager.getDslRuntimeDelete())
             .thenReturn(mock(DSLRuntimeDelete.class));
-        // Stub both overloads — the REST handler calls the single-arg form on the
-        // FILTER_ONLY path and the two-arg form (deferCommit=true) on STRUCTURAL.
+        // Stub all overloads — the REST handler calls the single-arg form on the FILTER_ONLY
+        // path, and the three-arg form (deferCommit=true + the orchestrator-owned fence opt) on
+        // STRUCTURAL; the two-arg form is kept stubbed for any legacy caller.
         when(dslManager.applyNowForRuleFile(any())).thenAnswer(inv -> {
             final Object arg = inv.getArgument(0);
             if (arg instanceof RuntimeRuleManagementDAO.RuntimeRuleFile) {
@@ -166,6 +167,15 @@ void setUp() {
             }
             return null;
         });
+        when(dslManager.applyNowForRuleFile(any(), Mockito.anyBoolean(), any())).thenAnswer(inv -> {
+            final Object arg = inv.getArgument(0);
+            if (arg instanceof RuntimeRuleManagementDAO.RuntimeRuleFile) {
+                final RuntimeRuleManagementDAO.RuntimeRuleFile file =
+                    (RuntimeRuleManagementDAO.RuntimeRuleFile) arg;
+                return DSLRuntimeState.running(file.getCatalog(), file.getName(), "h", 0L);
+            }
+            return null;
+        });
 
         handler = new RuntimeRuleRestHandler(moduleManager, dslManager, clusterClient, 30_000L);
     }
@@ -199,7 +209,7 @@ void malScopeChangeAcceptedWithAllowStorageChangeTrue() throws Exception {
         assertHttp(resp, HttpStatus.OK);
         // STRUCTURAL path uses the two-arg overload (deferCommit=true) so row-persist
         // failure can cleanly roll back.
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     @Test
@@ -212,7 +222,7 @@ void malScopeChangeAcceptedThroughFixRoute() throws Exception {
             HttpData.ofUtf8(INSTANCE_YAML));
 
         assertHttp(resp, HttpStatus.OK);
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     @Test
@@ -249,7 +259,7 @@ void malAddedMetricNeverTripsGuardrail() throws Exception {
         assertHttp(resp, HttpStatus.OK);
         // Non-empty addedMetrics makes this STRUCTURAL (NEW classification on first apply
         // or STRUCTURAL on update) — goes through the deferred-commit path.
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     // ---- LAL scenarios --------------------------------------------------------------------
@@ -300,7 +310,7 @@ void lalBodyOnlyEditAccepted() throws Exception {
         assertHttp(resp, HttpStatus.OK);
         // LAL always routes through the STRUCTURAL path (classifyLal reports STRUCTURAL on
         // every content change), so the two-arg overload fires.
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     @Test
@@ -315,7 +325,7 @@ void lalRuleAddedAcceptedWithAllowStorageChangeTrue() throws Exception {
             HttpData.ofUtf8(twoRules));
 
         assertHttp(resp, HttpStatus.OK);
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     // ---- helpers --------------------------------------------------------------------------
diff --git a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandlerTest.java b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandlerTest.java
index ac1fd1880b1c..fcb72b3c97d8 100644
--- a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandlerTest.java
+++ b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/rest/RuntimeRuleRestHandlerTest.java
@@ -58,6 +58,7 @@
 import org.mockito.Mockito;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
 import static org.mockito.ArgumentMatchers.any;
 import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.never;
@@ -183,10 +184,10 @@ void addOrUpdateBypassesNoChangeOnInactiveRow() throws Exception {
 
         // Reactivation pushes through the STRUCTURAL/NEW path — expect 200 with a status
         // other than no_change. We don't assert on the exact applyStatus string here (that
-        // depends on classifier output); the key assertion is that the two-arg deferred-
-        // commit form of applyNowForRuleFile ran (STRUCTURAL path signature).
+        // depends on classifier output); the key assertion is that the three-arg deferred-
+        // commit + fence-opt form of applyNowForRuleFile ran (STRUCTURAL path signature).
         assertHttpStatus(resp, HttpStatus.OK);
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
     }
 
     @Test
@@ -205,8 +206,39 @@ void fixBypassesNoChangeEvenOnByteIdenticalActiveRow() throws Exception {
         assertHttpStatus(resp, HttpStatus.OK);
         // /addOrUpdate?force=true with byte-identical content → classifier returns
         // NO_CHANGE, handler falls through to applyStructural (not applyFilterOnly since
-        // NO_CHANGE != FILTER_ONLY) which uses the two-arg deferred-commit form.
-        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true));
+        // NO_CHANGE != FILTER_ONLY) which uses the three-arg deferred-commit + fence-opt form.
+        verify(dslManager).applyNowForRuleFile(any(), Mockito.eq(true), any());
+    }
+
+    @Test
+    void applyStatusDegradesToDurableRowWhenLiveStatusGone() throws Exception {
+        // No live coordinator status for this (catalog,name) and no apply-id supplied — the query
+        // must fall back to the durable rule row. An ACTIVE row reports APPLIED derived from the
+        // content hash (persist-is-commit), so a page refresh after the applyId is gone still
+        // resolves instead of returning an opaque UNKNOWN.
+        final String yaml = minimalMalYaml();
+        whenDaoHasRow("otel-rules", "vm-statusfallback", yaml, RuntimeRule.STATUS_ACTIVE);
+
+        final HttpResponse resp = handler.applyStatus("otel-rules", "vm-statusfallback", "", "");
+
+        assertHttpStatus(resp, HttpStatus.OK);
+        final String body = resp.aggregate().toCompletableFuture().join().contentUtf8();
+        assertTrue(body.contains("\"found\":true"), body);
+        assertTrue(body.contains("\"phase\":\"APPLIED\""), body);
+        assertTrue(body.contains("\"derivedFrom\":\"durable-dao\""), body);
+    }
+
+    @Test
+    void applyStatusUnknownWhenNoLiveStatusAndNoDurableRow() throws Exception {
+        // Neither a live status nor a durable row for the queried apply → found=false / UNKNOWN.
+        when(dao.getAll()).thenReturn(Arrays.asList());
+
+        final HttpResponse resp = handler.applyStatus("otel-rules", "vm-none", "", "no-such-apply-id");
+
+        assertHttpStatus(resp, HttpStatus.OK);
+        final String body = resp.aggregate().toCompletableFuture().join().contentUtf8();
+        assertTrue(body.contains("\"found\":false"), body);
+        assertTrue(body.contains("UNKNOWN"), body);
     }
 
     @Test
@@ -534,6 +566,8 @@ private void whenReconcilerApplySucceeds(final String catalog, final String name
         // NEW paths use the two-arg form (deferCommit=true).
         when(dslManager.applyNowForRuleFile(any())).thenReturn(state);
         when(dslManager.applyNowForRuleFile(any(), Mockito.anyBoolean())).thenReturn(state);
+        // STRUCTURAL apply now passes the orchestrator-owned fence opt → the three-arg overload.
+        when(dslManager.applyNowForRuleFile(any(), Mockito.anyBoolean(), any())).thenReturn(state);
         // Apply path needs SUSPENDED on localSuspend so the workflow proceeds. The other
         // subsystem getters were stubbed in setUp() with default mocks; here we just
         // override localSuspend to return SUSPENDED instead of the default null.
diff --git a/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinatorTest.java b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinatorTest.java
new file mode 100644
index 000000000000..38a16e842401
--- /dev/null
+++ b/oap-server/server-admin/runtime-rule/src/test/java/org/apache/skywalking/oap/server/receiver/runtimerule/status/SchemaApplyCoordinatorTest.java
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.receiver.runtimerule.status;
+
+import java.util.concurrent.atomic.AtomicLong;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertNull;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+
+class SchemaApplyCoordinatorTest {
+
+    private final AtomicLong clock = new AtomicLong(1_000L);
+
+    private SchemaApplyCoordinator newCoordinator() {
+        return new SchemaApplyCoordinator(clock::get);
+    }
+
+    @Test
+    void beginOpensPendingStatusAndIndexesByApplyIdAndFile() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        final String applyId = coord.begin("otel-rules", "vm", "hash-1");
+
+        final ApplyStatus byId = coord.get(applyId);
+        assertNotNull(byId, "a just-begun apply must be retrievable by apply-id");
+        assertEquals(ApplyPhase.PENDING, byId.getPhase());
+        assertEquals("otel-rules", byId.getCatalog());
+        assertEquals("vm", byId.getName());
+        assertEquals("hash-1", byId.getContentHash());
+        assertEquals(1_000L, byId.getStartedAtMs());
+        assertNull(byId.getFailureReason(), "a fresh apply has no failure reason");
+
+        final ApplyStatus byFile = coord.getLatestByFile("otel-rules", "vm", null);
+        assertEquals(applyId, byFile.getApplyId(), "content-based lookup must resolve to the latest apply");
+    }
+
+    @Test
+    void transitionsAdvancePhaseAndStampUpdatedAt() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        final String applyId = coord.begin("otel-rules", "vm", "h");
+        clock.set(2_000L);
+        coord.transition(applyId, ApplyPhase.DDL);
+        clock.set(3_000L);
+        coord.transition(applyId, ApplyPhase.ROLLING_OUT);
+
+        final ApplyStatus s = coord.get(applyId);
+        assertEquals(ApplyPhase.ROLLING_OUT, s.getPhase());
+        assertEquals(1_000L, s.getStartedAtMs(), "startedAt is fixed at begin");
+        assertEquals(3_000L, s.getUpdatedAtMs(), "updatedAt advances with each transition");
+    }
+
+    @Test
+    void markAppliedIsTerminalSuccess() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        final String applyId = coord.begin("otel-rules", "vm", "h");
+        coord.markApplied(applyId);
+        final ApplyStatus s = coord.get(applyId);
+        assertEquals(ApplyPhase.APPLIED, s.getPhase());
+        assertTrue(s.getPhase().isTerminal());
+        assertNull(s.getFailureReason());
+    }
+
+    @Test
+    void markFailedAndDegradedCarryReasonAndAreTerminal() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        final String failed = coord.begin("otel-rules", "a", "h");
+        coord.markFailed(failed, "previous schema not found");
+        assertEquals(ApplyPhase.FAILED, coord.get(failed).getPhase());
+        assertEquals("previous schema not found", coord.get(failed).getFailureReason());
+        assertTrue(coord.get(failed).getPhase().isTerminal());
+
+        final String degraded = coord.begin("otel-rules", "b", "h");
+        coord.markDegraded(degraded, "data node lagging: node-3");
+        assertEquals(ApplyPhase.DEGRADED, coord.get(degraded).getPhase());
+        assertEquals("data node lagging: node-3", coord.get(degraded).getFailureReason());
+        assertTrue(coord.get(degraded).getPhase().isTerminal());
+    }
+
+    @Test
+    void forwardTransitionClearsAStaleFailureReason() {
+        // A DEGRADED apply whose background re-check later confirms convergence flips to APPLIED,
+        // and the stale "lagging" reason must not linger.
+        final SchemaApplyCoordinator coord = newCoordinator();
+        final String applyId = coord.begin("otel-rules", "vm", "h");
+        coord.markDegraded(applyId, "node-3 lagging");
+        coord.markApplied(applyId);
+        assertEquals(ApplyPhase.APPLIED, coord.get(applyId).getPhase());
+        assertNull(coord.get(applyId).getFailureReason(),
+            "advancing past DEGRADED must clear the failure reason");
+    }
+
+    @Test
+    void unknownApplyIdReturnsNullAndTransitionsAreNoOps() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        assertNull(coord.get("does-not-exist"));
+        assertDoesNotThrow(() -> coord.transition("does-not-exist", ApplyPhase.DDL));
+        assertDoesNotThrow(() -> coord.markFailed("does-not-exist", "x"));
+    }
+
+    @Test
+    void getLatestByFileHonorsExpectedContentHash() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        coord.begin("otel-rules", "vm", "hash-A");
+
+        // Matching hash resolves; a different hash (caller asks about content the latest apply is
+        // NOT for) returns null so the caller doesn't misreport an unrelated apply.
+        assertNotNull(coord.getLatestByFile("otel-rules", "vm", "hash-A"));
+        assertNull(coord.getLatestByFile("otel-rules", "vm", "hash-B"));
+        assertNull(coord.getLatestByFile("otel-rules", "absent", null));
+    }
+
+    @Test
+    void evictExpiredReapsOldEntriesAndClearsTheirFileIndex() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        clock.set(1_000L);
+        final String old = coord.begin("otel-rules", "old", "h");
+        coord.markApplied(old);
+        clock.set(10_000L);
+        final String fresh = coord.begin("otel-rules", "fresh", "h2");
+
+        // cutoff = now(10_000) - ttl(5_000) = 5_000; 'old' (updatedAt 1_000) is evicted, 'fresh' kept.
+        final int evicted = coord.evictExpired(5_000L);
+
+        assertEquals(1, evicted);
+        assertNull(coord.get(old), "expired apply must be evicted");
+        assertNull(coord.getLatestByFile("otel-rules", "old", null),
+            "the file index entry pointing at an evicted apply must be cleared");
+        assertNotNull(coord.get(fresh), "a fresh apply must survive eviction");
+        assertEquals(1, coord.trackedCount());
+    }
+
+    @Test
+    void evictExpiredKeepsFileIndexWhenANewerApplyReplacedTheEvictedOne() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        clock.set(1_000L);
+        final String first = coord.begin("otel-rules", "vm", "h1");
+        coord.markApplied(first);
+        clock.set(10_000L);
+        final String second = coord.begin("otel-rules", "vm", "h2");
+
+        coord.evictExpired(5_000L);
+
+        assertNull(coord.get(first), "the older apply for the file is evicted");
+        final ApplyStatus latest = coord.getLatestByFile("otel-rules", "vm", null);
+        assertNotNull(latest, "the file index must still resolve via the newer apply");
+        assertEquals(second, latest.getApplyId());
+    }
+
+    @Test
+    void latestByFileFollowsTheNewestApply() {
+        final SchemaApplyCoordinator coord = newCoordinator();
+        coord.begin("otel-rules", "vm", "hash-old");
+        final String newer = coord.begin("otel-rules", "vm", "hash-new");
+        final ApplyStatus latest = coord.getLatestByFile("otel-rules", "vm", null);
+        assertEquals(newer, latest.getApplyId());
+        assertEquals("hash-new", latest.getContentHash());
+        assertEquals(2, coord.trackedCount(), "both applies remain tracked until eviction");
+    }
+}
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstaller.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstaller.java
index d65d7fb75bc1..cbb6602b9928 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstaller.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstaller.java
@@ -55,11 +55,20 @@ public void whenCreating(Model model, StorageManipulationOpt opt) throws Storage
         // which made the contract a half-truth. Gate ahead of isExists so a peer apply
         // is genuinely zero-RPC.
         if (!flags.isInspectBackend()) {
+            // Local-cache-only (peer reconciler) tick: zero server RPCs, but the local schema
+            // cache MUST still be (re)derived from the declared model — the inspectBackend flag
+            // contract requires exactly this. Without it, a peer holds a live dispatch worker
+            // whose cache entry is either missing (first apply) or STALE (a reshape re-fires
+            // whenCreating with a new shape after StorageModels.remove+add). The read-side
+            // self-heal only fills a MISSING entry, never refreshes a stale one, so the peer
+            // would keep translating writes with the old shape. RPC-free; no-op for backends
+            // without a local schema cache (ES, JDBC).
+            populateLocalCacheOnly(model, opt);
             opt.recordOutcome("table", model.getName(),
                 StorageManipulationOpt.Outcome.SKIPPED_NOT_ALLOWED,
                 "local-cache-only mode; main-node is expected to have installed this resource");
             log.debug(
-                "install: model [{}] not installed; local-cache-only mode — skipping (no isExists probe)",
+                "install: model [{}] not installed; local-cache-only mode — local schema cache refreshed, no isExists probe",
                 model.getName()
             );
             return;
@@ -99,20 +108,39 @@ public void whenCreating(Model model, StorageManipulationOpt opt) throws Storage
         // resource that only this very apply would ever create.
         if (deferDDLToInitNode(opt)) {
             while (true) {
-                InstallInfo info = isExists(model, opt);
-                if (!info.isAllExist()) {
-                    try {
+                boolean allExist;
+                try {
+                    InstallInfo info = isExists(model, opt);
+                    allExist = info.isAllExist();
+                    if (!allExist) {
                         log.info(
                             "install info: {}.table for model: [{}] not all required resources exist. OAP is running in 'no-init' mode, waiting create or update... retry 3s later.",
                             info.buildInstallInfoMsg(), model.getName()
                         );
-                        Thread.sleep(3000L);
-                    } catch (InterruptedException e) {
-                        log.error(e.getMessage());
                     }
-                } else {
+                } catch (final StorageException e) {
+                    if (!isRetryableNoInitProbeFailure(e)) {
+                        throw e;
+                    }
+                    // A transient backend error during the probe (e.g. a BanyanDB cluster data node
+                    // still Init-ing, "client connection is closing") is NOT a reason to abort boot:
+                    // the init OAP will create the resource and the next probe succeeds. Treat it like
+                    // "not present yet" and retry in-loop, rather than letting it escape and crash-loop
+                    // the pod — which would only re-enter this same loop after a full restart.
+                    allExist = false;
+                    log.warn("install info: existence probe for model: [{}] threw a transient backend "
+                        + "error. OAP is running in 'no-init' mode, retry 3s later.", model.getName(), e);
+                }
+                if (allExist) {
                     break;
                 }
+                try {
+                    Thread.sleep(3000L);
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    throw new StorageException(
+                        "interrupted while waiting for no-init backend resources for model " + model.getName(), e);
+                }
             }
             return;
         }
@@ -143,12 +171,20 @@ public void whenCreating(Model model, StorageManipulationOpt opt) throws Storage
     @Override
     public void whenRemoving(Model model, StorageManipulationOpt opt) throws StorageException {
         if (!opt.getFlags().isDropOnRemoval()) {
+            // Peer (or boot path that never drops): the backend drop is the main node's job,
+            // but this node must still evict its own local schema-cache entry so a removed
+            // model leaves no stale translation behind in an otherwise insert-only cache.
+            // RPC-free; no-op for backends without a local cache.
+            evictLocalCache(model);
             opt.recordOutcome("table", model.getName(),
                 StorageManipulationOpt.Outcome.SKIPPED_NOT_ALLOWED,
                 "dropOnRemoval flag is off; server drop is main-node responsibility (or boot path that never drops)");
             return;
         }
         dropTable(model, opt);
+        // Evict only after a successful drop — a thrown dropTable leaves the model in the
+        // registry for retry (see StorageModels.remove), so its cache entry must stay too.
+        evictLocalCache(model);
         opt.recordOutcome("table", model.getName(),
             StorageManipulationOpt.Outcome.DROPPED, null);
     }
@@ -170,6 +206,16 @@ protected static boolean deferDDLToInitNode(final StorageManipulationOpt opt) {
         return RunningMode.isNoInitMode() && opt.getFlags().isDeferDDLToInitNode();
     }
 
+    /**
+     * Whether a {@link StorageException} from the no-init defer-loop existence probe is
+     * known to be transient and should be retried in-loop. The base implementation is
+     * conservative so permanent model/config errors do not become an infinite boot wait;
+     * storage backends opt in only for transport-level probe failures they can classify.
+     */
+    protected boolean isRetryableNoInitProbeFailure(final StorageException e) {
+        return false;
+    }
+
     public void start() {
     }
 
@@ -237,6 +283,30 @@ public void dropTable(Model model, StorageManipulationOpt opt) throws StorageExc
         dropTable(model);
     }
 
+    /**
+     * Refresh THIS node's local schema cache for {@code model} from the declared model, with
+     * no server RPC. Called on the local-cache-only path
+     * ({@link StorageManipulationOpt.Flags#isInspectBackend() inspectBackend == false}, i.e.
+     * {@link StorageManipulationOpt#withoutSchemaChange()}), where the cluster main owns
+     * backend DDL and this node only needs an up-to-date entry to translate its own
+     * reads/writes. Backends with a local schema cache (BanyanDB) override to (re)derive and
+     * <strong>overwrite</strong> the entry — overwrite, not fill-if-absent, so a reshape that
+     * re-fires {@link #whenCreating} replaces a now-stale entry instead of leaving the old
+     * shape in place. Default no-op: backends without a local cache (ES, JDBC) have nothing to
+     * refresh.
+     */
+    protected void populateLocalCacheOnly(Model model, StorageManipulationOpt opt) throws StorageException {
+    }
+
+    /**
+     * Drop THIS node's local schema-cache entry for a removed {@code model}, with no server
+     * RPC. Called from {@link #whenRemoving} on every node so a removed model never leaves a
+     * stale translation in an otherwise insert-only cache. Default no-op: backends without a
+     * local schema cache (ES, JDBC) have nothing to evict; BanyanDB overrides.
+     */
+    protected void evictLocalCache(Model model) {
+    }
+
     @Getter
     @Setter
     public abstract static class InstallInfo {
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOpt.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOpt.java
index 9b6d8cb04a3e..01f526e53a65 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOpt.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOpt.java
@@ -18,12 +18,15 @@
 
 package org.apache.skywalking.oap.server.core.storage.model;
 
+import java.time.Duration;
 import java.util.Collections;
 import java.util.List;
 import java.util.concurrent.CopyOnWriteArrayList;
 import java.util.concurrent.atomic.AtomicLong;
 import lombok.Builder;
 import lombok.Getter;
+import lombok.Setter;
+import org.apache.skywalking.oap.server.core.storage.StorageException;
 
 /**
  * Per-call policy + outcome for a storage model manipulation — threaded through the
@@ -307,6 +310,37 @@ public static StorageManipulationOpt withoutSchemaChange() {
         return new StorageManipulationOpt(Mode.WITHOUT_SCHEMA_CHANGE);
     }
 
+    /**
+     * {@link Mode#WITH_SCHEMA_CHANGE} but with the post-install schema fence DEFERRED. The
+     * installer records each resource's {@code mod_revision} without fencing, then registers a
+     * single flush via {@link #setDeferredFence(DeferredFence)}; the caller runs that flush ONCE
+     * with {@link #runDeferredFence()} after the whole apply (e.g. a multi-rule file) so the
+     * bundle waits on one barrier instead of one fence per metric/downsampling. All flags are
+     * identical to {@link #withSchemaChange()} — only the create/update fence is batched; drops
+     * still fence inline.
+     */
+    public static StorageManipulationOpt withSchemaChangeDeferredFence() {
+        final StorageManipulationOpt opt = new StorageManipulationOpt(Mode.WITH_SCHEMA_CHANGE);
+        opt.deferFence = true;
+        return opt;
+    }
+
+    /**
+     * {@link #withSchemaChangeDeferredFence()} with an explicit batched-fence timeout. Used by the
+     * runtime-rule operator apply, which fences on a generous cluster-propagation budget (default
+     * 3 min, configurable) instead of the installer's short inline default — the apply is async +
+     * progress-queryable, so a long single wait is affordable. The inline/static/delete fence paths
+     * (which never set a timeout here) keep the installer's short constant.
+     *
+     * @param timeout the batched-fence wait; passed to the backend via {@link #getFenceTimeoutMs()}.
+     * @return a deferred-fence opt carrying {@code timeout}.
+     */
+    public static StorageManipulationOpt withSchemaChangeDeferredFence(final Duration timeout) {
+        final StorageManipulationOpt opt = withSchemaChangeDeferredFence();
+        opt.fenceTimeoutMs = timeout == null ? 0L : timeout.toMillis();
+        return opt;
+    }
+
     /**
      * True for {@link Mode#WITH_SCHEMA_CHANGE}. The on-demand operator workflow — drops,
      * updates, and reshapes are permitted because the caller explicitly asked for them.
@@ -380,6 +414,115 @@ public long getMaxModRevision() {
         return maxModRevision.get();
     }
 
+    /**
+     * A storage-backend schema fence whose execution is deferred to the end of a batched
+     * apply. The backend installer (e.g. BanyanDB) registers one on a
+     * {@link #withSchemaChangeDeferredFence()} opt instead of fencing per resource; the apply
+     * orchestration runs it once via {@link #runDeferredFence()}. Implemented as a closure in
+     * the storage plugin so core stays backend-agnostic (same pattern as the local-cache
+     * populator). A timeout inside the fence is a non-fatal WARN; only a barrier transport
+     * error surfaces as {@link StorageException}.
+     */
+    @FunctionalInterface
+    public interface DeferredFence {
+        void await() throws StorageException;
+    }
+
+    /**
+     * True only for {@link #withSchemaChangeDeferredFence()}. The installer reads this to skip
+     * the per-resource create/update fence and register a single {@link DeferredFence} instead.
+     */
+    @Getter
+    private boolean deferFence = false;
+
+    private volatile DeferredFence deferredFence;
+
+    /**
+     * Batched-fence wait in millis, or {@code 0} (the default) meaning "use the backend installer's
+     * own short constant". Set only by {@link #withSchemaChangeDeferredFence(Duration)} on the
+     * runtime-rule operator path; the backend reads it when running the deferred fence so the
+     * inline/static/delete paths (which leave it {@code 0}) keep the short timeout.
+     */
+    @Getter
+    private long fenceTimeoutMs = 0L;
+
+    /**
+     * True when the CALLER (the apply orchestrator) runs {@link #runDeferredFence()} itself —
+     * typically on a background thread after the durable commit — rather than the backend installer
+     * running it inline at the end of the apply. The runtime-rule REST apply sets this so the long
+     * (3-min) fence does not block the apply / hold peers suspended; the reconciler tick leaves it
+     * {@code false} so the installer keeps fencing inline with the short timeout.
+     */
+    @Getter
+    @Setter
+    private boolean fenceRunByCaller = false;
+
+    /**
+     * Outcome of a deferred fence, recorded by the backend so the orchestrator can mark
+     * {@code APPLIED} vs {@code DEGRADED}-with-laggards after {@link #runDeferredFence()} returns.
+     */
+    public static final class FenceOutcome {
+        @Getter
+        private final boolean applied;
+        @Getter
+        private final List<String> laggardNodeIds;
+
+        public FenceOutcome(final boolean applied, final List<String> laggardNodeIds) {
+            this.applied = applied;
+            this.laggardNodeIds = laggardNodeIds == null
+                ? Collections.emptyList()
+                : Collections.unmodifiableList(laggardNodeIds);
+        }
+    }
+
+    /** Recorded by the backend during {@link #runDeferredFence()}; read by the orchestrator after.
+     *  Null when no deferred fence ran (no DDL) or the backend records no outcome. */
+    @Getter
+    @Setter
+    private volatile FenceOutcome fenceOutcome;
+
+    /**
+     * Register the single fence to run after the batched apply completes. Idempotent — the
+     * installer may call it once per resource; the latest (equivalent) closure wins. No-op
+     * carrier for backends without a revision concept (they never call it).
+     */
+    public void setDeferredFence(final DeferredFence fence) {
+        this.deferredFence = fence;
+    }
+
+    /**
+     * Run the registered {@link DeferredFence} once, if any. Called by the apply orchestration
+     * after all DDL for the batch is fired. No-op when nothing was registered (peer/no-change
+     * applies, or non-BanyanDB backends).
+     *
+     * <p><strong>One-shot.</strong> A single reconciler tick reuses ONE opt across every rule
+     * file ({@code RuleSync#runOnce}), calling this once per file. The closure + accumulated
+     * {@link #maxModRevision} are <strong>always</strong> reset (even when this file performed no
+     * DDL and registered no closure), so the next file neither re-runs this file's stale fence nor
+     * inherits this file's revision — each file fences on its own DDL only. (Drop revisions that a
+     * later commit-tail records on a shared opt are inline-fenced at drop time and benign here: the
+     * next file's own create revision is monotonically higher, so it dominates the fence.) The reset
+     * is in a {@code finally} so a fence transport failure still isolates the next file; the closure
+     * reads {@link #getMaxModRevision()} during {@code await()}, so it is reset only after.
+     *
+     * <p>{@link #fenceOutcome} is cleared <em>before</em> the fence runs (so a shared tick opt
+     * starts each file clean) and the backend sets it <em>during</em> the run; it is intentionally
+     * NOT cleared afterward so the caller can read it (the runtime-rule orchestrator reads it to
+     * decide {@code APPLIED} vs {@code DEGRADED}).
+     */
+    public void runDeferredFence() throws StorageException {
+        final DeferredFence fence = this.deferredFence;
+        this.deferredFence = null;
+        this.fenceOutcome = null;
+        try {
+            if (fence != null) {
+                fence.await();
+            }
+        } finally {
+            maxModRevision.set(0L);
+        }
+    }
+
     /**
      * Append a per-resource outcome. Called by the installer as it examines each
      * underlying storage resource.
diff --git a/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstallerNoInitTest.java b/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstallerNoInitTest.java
index d9cda58cd7ae..8e91b28f0470 100644
--- a/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstallerNoInitTest.java
+++ b/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/ModelInstallerNoInitTest.java
@@ -26,6 +26,7 @@
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTimeoutPreemptively;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 import static org.mockito.Mockito.mock;
@@ -102,29 +103,177 @@ void withSchemaChangeSkipsCreateWhenResourceAlreadyExists() throws StorageExcept
             "withSchemaChange must not re-create a resource that already exists");
     }
 
+    @Test
+    void noInitDeferLoopRetriesTransientProbeErrorInsteadOfCrashing() {
+        RunningMode.setMode("no-init");
+        // The first existence probe throws a transient StorageException (mimicking a BanyanDB
+        // cluster data node still Init-ing); the next probe reports the resource present.
+        final RecordingInstaller installer = new RecordingInstaller(true /* present after transient */,
+            1 /* one transient probe failure */, true /* retryable probe failure */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("static_metric_transient");
+
+        // Must NOT propagate the transient (which would escape whenCreating and crash-loop the pod);
+        // must retry in-loop, then return on the defer path without creating. 10s covers the 3s sleep.
+        assertTimeoutPreemptively(Duration.ofSeconds(10), () ->
+            installer.whenCreating(model, StorageManipulationOpt.schemaCreateIfAbsent()));
+        assertEquals(0, installer.createTableCalls,
+            "a transient probe error must be retried, then defer to the init node without creating");
+        assertTrue(installer.probeCalls >= 2,
+            "the loop must probe again after the transient instead of escaping on the first throw");
+    }
+
+    @Test
+    void noInitDeferLoopPropagatesNonRetryableProbeError() {
+        RunningMode.setMode("no-init");
+        final RecordingInstaller installer = new RecordingInstaller(true /* unused */,
+            1 /* one probe failure */, false /* permanent/non-retryable */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("static_metric_bad_model");
+
+        assertThrows(StorageException.class,
+            () -> installer.whenCreating(model, StorageManipulationOpt.schemaCreateIfAbsent()),
+            "permanent model/config probe failures must not be converted into an infinite no-init wait");
+        assertEquals(1, installer.probeCalls,
+            "a non-retryable failure must escape without sleeping and probing again");
+        assertEquals(0, installer.createTableCalls);
+    }
+
+    @Test
+    void noInitDeferLoopPropagatesInterruptedSleep() {
+        RunningMode.setMode("no-init");
+        final RecordingInstaller installer = new RecordingInstaller(false /* resource absent */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("static_metric_wait_interrupted");
+
+        Thread.currentThread().interrupt();
+        try {
+            assertThrows(StorageException.class,
+                () -> installer.whenCreating(model, StorageManipulationOpt.schemaCreateIfAbsent()),
+                "an interrupted no-init wait must fail fast so shutdown can proceed");
+            assertTrue(Thread.currentThread().isInterrupted(),
+                "the interrupt flag must be restored for upstream shutdown handling");
+        } finally {
+            Thread.interrupted();
+        }
+    }
+
+    @Test
+    void withoutSchemaChangePopulatesLocalCacheAndIssuesNoBackendRpc() throws StorageException {
+        // Peer reconciler tick (inspectBackend=false): the installer must refresh the local
+        // schema cache (so a reshape re-add overwrites a now-stale entry) WITHOUT any backend
+        // existence probe or create. This is the C-1 fix for the worker-without-cache /
+        // stale-cache desync.
+        final RecordingInstaller installer = new RecordingInstaller(false /* unused */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("runtime_metric_peer");
+
+        installer.whenCreating(model, StorageManipulationOpt.withoutSchemaChange());
+
+        assertEquals(1, installer.populateLocalCacheCalls,
+            "a withoutSchemaChange (peer) apply must refresh the local schema cache");
+        assertEquals(0, installer.probeCalls,
+            "a withoutSchemaChange (peer) apply must issue zero backend existence probes");
+        assertEquals(0, installer.createTableCalls,
+            "a withoutSchemaChange (peer) apply must never create backend resources");
+    }
+
+    @Test
+    void whenRemovingPeerEvictsLocalCacheWithoutDroppingBackend() throws StorageException {
+        // Peer remove (dropOnRemoval=false): the backend drop is the main's job, but the peer
+        // must still evict its own cache entry so a removed model leaves no stale translation.
+        final RecordingInstaller installer = new RecordingInstaller(true /* unused */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("runtime_metric_remove_peer");
+
+        installer.whenRemoving(model, StorageManipulationOpt.withoutSchemaChange());
+
+        assertEquals(1, installer.evictLocalCacheCalls,
+            "a peer remove must evict the local schema cache entry");
+        assertEquals(0, installer.dropTableCalls,
+            "a peer remove (dropOnRemoval off) must not drop backend resources");
+    }
+
+    @Test
+    void whenRemovingMainDropsBackendThenEvictsLocalCache() throws StorageException {
+        // Main remove (withSchemaChange, dropOnRemoval=true): drop the backend AND evict the
+        // local cache so the insert-only registry does not keep a tombstoned model's entry.
+        final RecordingInstaller installer = new RecordingInstaller(true /* unused */);
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("runtime_metric_remove_main");
+
+        installer.whenRemoving(model, StorageManipulationOpt.withSchemaChange());
+
+        assertEquals(1, installer.dropTableCalls,
+            "a main remove must drop the backend resource");
+        assertEquals(1, installer.evictLocalCacheCalls,
+            "a main remove must evict the local schema cache entry after the drop");
+    }
+
     /** Minimal concrete {@link ModelInstaller} that records createTable calls and reports a
      *  fixed existence result, so the base whenCreating branching can be exercised without a
-     *  real storage backend. */
+     *  real storage backend. Optionally throws a transient {@link StorageException} on the first
+     *  {@code transientProbeFailures} existence probes to exercise the no-init defer-loop retry. */
     private static final class RecordingInstaller extends ModelInstaller {
         private final boolean resourcePresent;
+        private final int transientProbeFailures;
+        private final boolean retryableProbeFailure;
+        private int probeCalls;
         private int createTableCalls;
+        private int populateLocalCacheCalls;
+        private int evictLocalCacheCalls;
+        private int dropTableCalls;
 
         private RecordingInstaller(final boolean resourcePresent) {
+            this(resourcePresent, 0, false);
+        }
+
+        private RecordingInstaller(final boolean resourcePresent, final int transientProbeFailures) {
+            this(resourcePresent, transientProbeFailures, true);
+        }
+
+        private RecordingInstaller(final boolean resourcePresent, final int transientProbeFailures,
+                                   final boolean retryableProbeFailure) {
             super(null, null);
             this.resourcePresent = resourcePresent;
+            this.transientProbeFailures = transientProbeFailures;
+            this.retryableProbeFailure = retryableProbeFailure;
         }
 
         @Override
-        public InstallInfo isExists(final Model model, final StorageManipulationOpt opt) {
+        public InstallInfo isExists(final Model model, final StorageManipulationOpt opt) throws StorageException {
+            if (probeCalls++ < transientProbeFailures) {
+                throw new StorageException("transient backend error");
+            }
             final TestInstallInfo info = new TestInstallInfo(model);
             info.setAllExist(resourcePresent);
             return info;
         }
 
+        @Override
+        protected boolean isRetryableNoInitProbeFailure(final StorageException e) {
+            return retryableProbeFailure;
+        }
+
         @Override
         public void createTable(final Model model) {
             createTableCalls++;
         }
+
+        @Override
+        public void dropTable(final Model model) {
+            dropTableCalls++;
+        }
+
+        @Override
+        protected void populateLocalCacheOnly(final Model model, final StorageManipulationOpt opt) {
+            populateLocalCacheCalls++;
+        }
+
+        @Override
+        protected void evictLocalCache(final Model model) {
+            evictLocalCacheCalls++;
+        }
     }
 
     private static final class TestInstallInfo extends ModelInstaller.InstallInfo {
diff --git a/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOptTest.java b/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOptTest.java
new file mode 100644
index 000000000000..a6fd54be1474
--- /dev/null
+++ b/oap-server/server-core/src/test/java/org/apache/skywalking/oap/server/core/storage/model/StorageManipulationOptTest.java
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.core.storage.model;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.skywalking.oap.server.core.storage.StorageException;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Unit coverage for the batched-fence machinery on {@link StorageManipulationOpt}: a
+ * {@link StorageManipulationOpt#withSchemaChangeDeferredFence()} opt carries the same flags as
+ * {@link StorageManipulationOpt#withSchemaChange()} but lets the installer register a single
+ * {@link StorageManipulationOpt.DeferredFence} that the apply orchestration runs ONCE instead of
+ * one fence per metric/downsampling.
+ */
+class StorageManipulationOptTest {
+
+    @Test
+    void deferredFenceOptHasSameFlagsAsWithSchemaChange() {
+        final StorageManipulationOpt deferred = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        final StorageManipulationOpt plain = StorageManipulationOpt.withSchemaChange();
+        // Behaviour must be identical except for the batching toggle — same mode/flags so every
+        // create/update/drop privilege the installer checks is unchanged.
+        assertEquals(plain.getMode(), deferred.getMode(),
+            "deferred-fence opt must keep the WITH_SCHEMA_CHANGE mode");
+        assertTrue(deferred.isWithSchemaChange(),
+            "deferred-fence opt must still report withSchemaChange semantics");
+        assertTrue(deferred.isDeferFence(),
+            "deferred-fence opt must flag deferFence");
+        assertFalse(plain.isDeferFence(),
+            "the plain withSchemaChange opt must NOT defer the fence");
+        assertFalse(StorageManipulationOpt.withoutSchemaChange().isDeferFence());
+    }
+
+    @Test
+    void runDeferredFenceInvokesRegisteredFenceOnce() throws StorageException {
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        final AtomicInteger calls = new AtomicInteger();
+        opt.setDeferredFence(calls::incrementAndGet);
+
+        opt.runDeferredFence();
+
+        assertEquals(1, calls.get(), "the registered fence must run exactly once on flush");
+    }
+
+    @Test
+    void runDeferredFenceIsNoOpWhenNothingRegistered() {
+        // Peer / no-change applies (and non-BanyanDB backends) never register a fence; flushing
+        // must be a safe no-op, not an NPE.
+        assertDoesNotThrow(() -> StorageManipulationOpt.withSchemaChangeDeferredFence().runDeferredFence());
+        assertDoesNotThrow(() -> StorageManipulationOpt.withoutSchemaChange().runDeferredFence());
+    }
+
+    @Test
+    void runDeferredFencePropagatesStorageException() {
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        opt.setDeferredFence(() -> {
+            throw new StorageException("barrier transport error");
+        });
+        // A barrier transport error must surface so the apply aborts exactly like an inline fence.
+        assertThrows(StorageException.class, opt::runDeferredFence);
+    }
+
+    @Test
+    void runDeferredFenceIsOneShotAcrossFiles() throws StorageException {
+        // A reconciler tick reuses ONE opt across every rule file. After a file flushes its
+        // fence, a later file that performed no DDL (registers no new closure) must NOT re-run
+        // the earlier file's stale fence.
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        final AtomicInteger calls = new AtomicInteger();
+        opt.setDeferredFence(calls::incrementAndGet);
+
+        opt.runDeferredFence();
+        opt.runDeferredFence();
+
+        assertEquals(1, calls.get(), "the fence must run once and be cleared, not re-run for the next file");
+    }
+
+    @Test
+    void runDeferredFenceResetsRevisionAfterAwait() throws StorageException {
+        // The closure must observe this file's accumulated revision DURING await, then the opt
+        // resets so the next file fences on its own DDL only — not the cumulative max.
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        opt.recordModRevision(42L);
+        final AtomicInteger seenDuringAwait = new AtomicInteger();
+        opt.setDeferredFence(() -> seenDuringAwait.set((int) opt.getMaxModRevision()));
+
+        opt.runDeferredFence();
+
+        assertEquals(42L, seenDuringAwait.get(), "the fence must see the recorded revision during await");
+        assertEquals(StorageManipulationOpt.DEFAULT_MOD_REVISION, opt.getMaxModRevision(),
+            "the revision must reset after the fence so a later file is not over-fenced");
+    }
+
+    @Test
+    void runDeferredFenceResetsRevisionEvenWithNoClosureRegistered() {
+        // A no-DDL file on a shared tick opt registers no closure, but a prior file (or its
+        // commit-tail drops) may have left a revision on the opt. runDeferredFence must still clear
+        // it so the next file is not over-fenced on a stale revision.
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        opt.recordModRevision(77L);
+
+        assertDoesNotThrow(opt::runDeferredFence);
+
+        assertEquals(StorageManipulationOpt.DEFAULT_MOD_REVISION, opt.getMaxModRevision(),
+            "a no-closure flush must still reset the accumulated revision (shared-tick isolation)");
+    }
+
+    @Test
+    void runDeferredFenceResetsRevisionEvenWhenFenceThrows() {
+        // A barrier transport failure on one file must not leave a stale revision that the next
+        // file would inherit; the reset runs in finally.
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        opt.recordModRevision(99L);
+        opt.setDeferredFence(() -> {
+            throw new StorageException("barrier transport error");
+        });
+
+        assertThrows(StorageException.class, opt::runDeferredFence);
+        assertEquals(StorageManipulationOpt.DEFAULT_MOD_REVISION, opt.getMaxModRevision(),
+            "the revision must reset even when the fence throws");
+    }
+
+    @Test
+    void laterSetDeferredFenceWins() throws StorageException {
+        // The installer may register the closure once per resource; the latest (equivalent) one
+        // wins and still runs a single time.
+        final StorageManipulationOpt opt = StorageManipulationOpt.withSchemaChangeDeferredFence();
+        final AtomicInteger first = new AtomicInteger();
+        final AtomicInteger second = new AtomicInteger();
+        opt.setDeferredFence(first::incrementAndGet);
+        opt.setDeferredFence(second::incrementAndGet);
+
+        opt.runDeferredFence();
+
+        assertEquals(0, first.get(), "an overwritten fence must not run");
+        assertEquals(1, second.get(), "the latest registered fence runs once");
+    }
+}
diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml
index ec2d3d7bae4f..e02938cc8e60 100644
--- a/oap-server/server-starter/src/main/resources/application.yml
+++ b/oap-server/server-starter/src/main/resources/application.yml
@@ -718,6 +718,13 @@ receiver-runtime-rule:
     refreshRulesPeriod: ${SW_RECEIVER_RUNTIME_RULE_REFRESH_RULES_PERIOD:30}
     # SUSPENDED state self-heal threshold (seconds).
     selfHealThresholdSeconds: ${SW_RECEIVER_RUNTIME_RULE_SELF_HEAL_THRESHOLD_SECONDS:60}
+    # Timeout (seconds) for the runtime-rule deferred BanyanDB schema fence on the operator REST
+    # apply: the background wait for every data node to apply a new measure's schema revision. The
+    # apply returns immediately at FENCING (accepted, not yet durable); the rule row is persisted
+    # only after the fence confirms, then dispatch resumes. Poll GET /runtime/rule/status for
+    # fencing -> rolling-out -> applied (or degraded/failed). Inline/static/delete fences keep the
+    # short 2s constant.
+    deferredFenceTimeoutSeconds: ${SW_RECEIVER_RUNTIME_RULE_DEFERRED_FENCE_TIMEOUT_SECONDS:180}
 
 # DSL Debug API — sampling debugger for MAL / LAL / OAL rules. Read-only OAL
 # listing under /runtime/oal/*; session control plane under /dsl-debugging/*.
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml
index def21fc3086d..add9cdc45522 100644
--- a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml
+++ b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml
@@ -52,7 +52,7 @@ metricsRules:
     exp: banyandb_system_up_time
   # CPU usage (cores). process_* rides on every container including lifecycle.
   - name: cpu_usage
-    exp: process_cpu_seconds_total.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: process_cpu_seconds_total.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   # resident memory (bytes). Raw gauge, present on all containers.
   - name: rss_memory
     exp: process_resident_memory_bytes
@@ -75,45 +75,45 @@ metricsRules:
     exp: banyandb_system_disk.tagEqual('kind','used_percent').avg(['cluster','pod_name','container_name','node_role','node_type','path']) * 100
   # network throughput (bytes/s) by interface name.
   - name: network_recv
-    exp: banyandb_system_net_state.tagEqual('kind','bytes_recv').sum(['cluster','pod_name','container_name','node_role','node_type','name']).rate('PT15S')
+    exp: banyandb_system_net_state.tagEqual('kind','bytes_recv').sum(['cluster','pod_name','container_name','node_role','node_type','name']).rate('PT1M')
   - name: network_sent
-    exp: banyandb_system_net_state.tagEqual('kind','bytes_sent').sum(['cluster','pod_name','container_name','node_role','node_type','name']).rate('PT15S')
+    exp: banyandb_system_net_state.tagEqual('kind','bytes_sent').sum(['cluster','pod_name','container_name','node_role','node_type','name']).rate('PT1M')
   # Go runtime.
   - name: goroutines
     exp: go_goroutines
   # average GC pause (s) = rate(Σpause) / rate(Σcount). go_gc_duration_seconds is a summary (no buckets),
   # so this ratio of _sum/_count is the only valid average — do not apply histogram_percentile to it.
   - name: gc_pause_avg
-    exp: go_gc_duration_seconds_sum.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(go_gc_duration_seconds_count.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S'))
+    exp: go_gc_duration_seconds_sum.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(go_gc_duration_seconds_count.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M'))
   - name: heap_inuse
     exp: go_memstats_heap_inuse_bytes
   - name: heap_next_gc
     exp: go_memstats_next_gc_bytes
   - name: alloc_rate
-    exp: go_memstats_alloc_bytes_total.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: go_memstats_alloc_bytes_total.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
 
   # ---- Liaison only (front door; the UI gates these on container_name == liaison) ----
   # query rate (req/s) by data-model service (measure/stream/trace/property). method literal is "query".
   - name: liaison_query_rate
-    exp: banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['cluster','pod_name','container_name','node_role','node_type','service']).rate('PT15S')
+    exp: banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['cluster','pod_name','container_name','node_role','node_type','service']).rate('PT1M')
   # gRPC errors/min. Three liaison-side error families (mirrors the Grafana "gRPC Error Rate" panel,
   # which sums total_err + registry_err + stream_msg_received_err). All lazily registered -> empty on a
   # healthy cluster; each pre-aggregated to the same label set before '+'.
   - name: liaison_grpc_error_rate
-    exp: (banyandb_liaison_grpc_total_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_liaison_grpc_total_registry_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')) * 60
+    exp: (banyandb_liaison_grpc_total_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_liaison_grpc_total_registry_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')) * 60
   # registry operation rate (req/s): schema registry ops on the liaison front door. total_started is
   # query-only on the wire, so the former tagNotEqual('method','query') term was empty and is dropped;
   # registry_started carries the non-query op count.
   - name: liaison_registry_op_rate
-    exp: banyandb_liaison_grpc_total_registry_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_liaison_grpc_total_registry_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   # write rate (writes/s) seen at the liaison front door. group label dropped (instance-level total).
   - name: liaison_write_rate
-    exp: banyandb_measure_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_stream_tst_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_trace_tst_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_measure_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_stream_tst_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_trace_tst_total_written.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   # tier-2 publish pipeline (liaison -> data): throughput by operation, bytes/s, and p99 send latency.
   - name: liaison_publish_throughput
-    exp: banyandb_queue_pub_total_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT15S')
+    exp: banyandb_queue_pub_total_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT1M')
   - name: liaison_publish_bytes
-    exp: banyandb_queue_pub_sent_bytes.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_queue_pub_sent_bytes.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   - name: liaison_publish_latency_p99
     exp: banyandb_queue_pub_total_latency.sum(['cluster','pod_name','container_name','node_role','node_type','operation','le']).histogram().histogram_percentile([99])
   # tier-2 publish, batch granularity (BanyanDB #1169): batches published/s by operation and the batch
@@ -121,7 +121,7 @@ metricsRules:
   # BUILD-GATED: _batch_finished/_batch_latency are absent on current builds -> emit nothing until the
   # shipped BanyanDB build registers them.
   - name: liaison_publish_batch_throughput
-    exp: banyandb_queue_pub_total_batch_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT15S')
+    exp: banyandb_queue_pub_total_batch_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT1M')
   - name: liaison_publish_batch_latency_p99
     exp: banyandb_queue_pub_total_batch_latency.sum(['cluster','pod_name','container_name','node_role','node_type','operation','le']).histogram().histogram_percentile([99])
   # liaison write-queue (wqueue) depth: pending records buffered at the front door before publish.
@@ -144,34 +144,34 @@ metricsRules:
     exp: banyandb_measure_pending_data_count.sum(['cluster','pod_name','container_name','node_role','node_type']) + banyandb_stream_tst_pending_data_count.sum(['cluster','pod_name','container_name','node_role','node_type']) + banyandb_trace_tst_pending_data_count.sum(['cluster','pod_name','container_name','node_role','node_type'])
   # merge-loop iterations/s.
   - name: data_merge_file_rate
-    exp: banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   # avg parts merged per merge loop on the file path (matches Grafana = rate(merged_parts{type=file}) /
   # rate(merge_loop_started)). type='file' is data-only on the wire (liaison emits only type='mem').
   - name: data_merge_file_partitions
-    exp: banyandb_measure_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')) + banyandb_stream_tst_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')) + banyandb_trace_tst_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S'))
+    exp: banyandb_measure_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')) + banyandb_stream_tst_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')) + banyandb_trace_tst_total_merged_parts.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M'))
   # avg file-merge latency (ms) per merge loop.
   - name: data_merge_file_latency
-    exp: (banyandb_measure_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')) + banyandb_stream_tst_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')) + banyandb_trace_tst_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S').safeDiv(banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S'))) * 1000
+    exp: (banyandb_measure_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_measure_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')) + banyandb_stream_tst_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_stream_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')) + banyandb_trace_tst_total_merge_latency.tagEqual('type','file').sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M').safeDiv(banyandb_trace_tst_total_merge_loop_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M'))) * 1000
   # inverted-index (series) write rate / term-search rate / total docs. *_inverted_index_total_* are
   # # TYPE=gauge but cumulative, so rate() yields a per-window delta. Stream's series index is the
   # storage scope (stream_storage_*); the tst scope is reported separately below. Trace's series index
   # (trace_storage_*) is included so trace series writes/docs are not silently dropped.
   - name: data_series_write_rate
-    exp: banyandb_measure_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_stream_storage_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_trace_storage_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_measure_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_stream_storage_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_trace_storage_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   - name: data_series_term_search_rate
-    exp: banyandb_measure_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_stream_storage_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S') + banyandb_trace_storage_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_measure_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_stream_storage_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M') + banyandb_trace_storage_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   - name: data_total_series
     exp: banyandb_measure_inverted_index_total_doc_count.sum(['cluster','pod_name','container_name','node_role','node_type']) + banyandb_stream_storage_inverted_index_total_doc_count.sum(['cluster','pod_name','container_name','node_role','node_type']) + banyandb_trace_storage_inverted_index_total_doc_count.sum(['cluster','pod_name','container_name','node_role','node_type'])
   # stream time-series-table (tst) index, distinct from the stream series (storage) index above.
   - name: data_stream_tst_write_rate
-    exp: banyandb_stream_tst_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_stream_tst_inverted_index_total_updates.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   - name: data_stream_tst_term_search_rate
-    exp: banyandb_stream_tst_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT15S')
+    exp: banyandb_stream_tst_inverted_index_total_term_searchers_started.sum(['cluster','pod_name','container_name','node_role','node_type']).rate('PT1M')
   - name: data_stream_tst_total_docs
     exp: banyandb_stream_tst_inverted_index_total_doc_count.sum(['cluster','pod_name','container_name','node_role','node_type'])
   # subscribe-side queue (data receives from liaison): throughput by operation + p99 latency.
   - name: data_queue_sub_throughput
-    exp: banyandb_queue_sub_total_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT15S')
+    exp: banyandb_queue_sub_total_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT1M')
   - name: data_queue_sub_latency_p99
     exp: banyandb_queue_sub_total_latency.sum(['cluster','pod_name','container_name','node_role','node_type','operation','le']).histogram().histogram_percentile([99])
   # subscribe-side per-message throughput (BanyanDB #1169). A data node ingests writes via the
@@ -181,7 +181,7 @@ metricsRules:
   # intentionally not modeled here. Batch-level granularity lives on the liaison's publish side
   # (liaison_publish_batch_throughput / liaison_publish_batch_latency_p99 above).
   - name: data_queue_sub_message_throughput
-    exp: banyandb_queue_sub_total_message_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT15S')
+    exp: banyandb_queue_sub_total_message_finished.sum(['cluster','pod_name','container_name','node_role','node_type','operation']).rate('PT1M')
   # retention disk-usage % per data-model scope (0-100 gauge). Kept per scope rather than summed (a sum
   # of three percentages is meaningless). Not in the upstream Grafana boards; a SkyWalking addition.
   - name: data_retention_measure_disk_usage_percent
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml
index 97c6cac8f6c8..45d505d8f43c 100644
--- a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml
+++ b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml
@@ -29,11 +29,11 @@ metricsRules:
   # cluster writes/s across the three data-model scopes (measure, stream, trace). Each scope's
   # write counter is collapsed to one per-cluster series before `+`.
   - name: cluster_write_rate
-    exp: (banyandb_measure_total_written.sum(['cluster']).rate('PT15S') + banyandb_stream_tst_total_written.sum(['cluster']).rate('PT15S') + banyandb_trace_tst_total_written.sum(['cluster']).rate('PT15S'))
+    exp: (banyandb_measure_total_written.sum(['cluster']).rate('PT1M') + banyandb_stream_tst_total_written.sum(['cluster']).rate('PT1M') + banyandb_trace_tst_total_written.sum(['cluster']).rate('PT1M'))
   # cluster queries/s. `service` on this family is BanyanDB's data-model facet
   # (measure/stream/trace/property), not a SkyWalking service; method literal is "query".
   - name: cluster_query_rate
-    exp: banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['cluster']).rate('PT15S')
+    exp: banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['cluster']).rate('PT1M')
   # cluster errors/min. The seven liaison-side error families mirror the upstream Grafana
   # "Error Rate" stat (grafana-fodc-workload.json). Each is pre-aggregated to ['cluster']
   # BEFORE `+` because their wire label sets differ (stream_msg_received_err carries
@@ -42,7 +42,7 @@ metricsRules:
   # registered and emit no series; MAL treats an empty operand as the additive identity, so the
   # sum emits from whatever has fired and renders absent-as-0 when nothing has.
   - name: cluster_error_rate
-    exp: (banyandb_liaison_grpc_total_err.sum(['cluster']).rate('PT15S') + banyandb_liaison_grpc_total_registry_err.sum(['cluster']).rate('PT15S') + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['cluster']).rate('PT15S') + banyandb_queue_pub_total_err.sum(['cluster']).rate('PT15S') + banyandb_measure_total_sync_loop_err.sum(['cluster']).rate('PT15S') + banyandb_stream_tst_total_sync_loop_err.sum(['cluster']).rate('PT15S') + banyandb_trace_tst_total_sync_loop_err.sum(['cluster']).rate('PT15S')) * 60
+    exp: (banyandb_liaison_grpc_total_err.sum(['cluster']).rate('PT1M') + banyandb_liaison_grpc_total_registry_err.sum(['cluster']).rate('PT1M') + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['cluster']).rate('PT1M') + banyandb_queue_pub_total_err.sum(['cluster']).rate('PT1M') + banyandb_measure_total_sync_loop_err.sum(['cluster']).rate('PT1M') + banyandb_stream_tst_total_sync_loop_err.sum(['cluster']).rate('PT1M') + banyandb_trace_tst_total_sync_loop_err.sum(['cluster']).rate('PT1M')) * 60
   # live container count by role. count(['cluster','container_name','pod_name']) groups by all
   # three then re-groups excluding the last key (pod_name), yielding one sample per
   # (cluster, container_name) whose value = distinct pod_name count -> data=N, liaison=M.
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBIndexInstaller.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBIndexInstaller.java
index cabfb75276cd..4318806cab14 100644
--- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBIndexInstaller.java
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBIndexInstaller.java
@@ -19,9 +19,11 @@
 package org.apache.skywalking.oap.server.storage.plugin.banyandb;
 
 import io.grpc.Status;
+import java.time.Duration;
 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Locale;
 import java.util.Map;
 import java.util.Set;
 import java.util.stream.Collectors;
@@ -40,7 +42,6 @@
 import org.apache.skywalking.banyandb.database.v1.BanyandbDatabase.IndexRuleBinding;
 import org.apache.skywalking.banyandb.schema.v1.BanyandbSchema.SchemaKey;
 import org.apache.skywalking.banyandb.database.v1.BanyandbDatabase.TopNAggregation;
-import java.time.Duration;
 import org.apache.skywalking.library.banyandb.v1.client.BanyanDBClient;
 import org.apache.skywalking.library.banyandb.v1.client.SchemaWatcher;
 import org.apache.skywalking.library.banyandb.v1.client.grpc.exception.BanyanDBException;
@@ -103,6 +104,51 @@ public class BanyanDBIndexInstaller extends ModelInstaller {
     public BanyanDBIndexInstaller(Client client, ModuleManager moduleManager, BanyanDBStorageConfig config) {
         super(client, moduleManager);
         this.config = config;
+        // Let read/persist paths self-heal a missing local schema entry (MetadataRegistry.repopulateLocally):
+        // re-derive the model's Schema locally with zero server RPC via the same primitive the peer
+        // boot path uses. This closes the "<metric> is not registered" flood that arises when a
+        // withoutSchemaChange peer apply or a runtime-rule bundled fall-over rebuilds the dispatch
+        // worker but skips the populate. DownSamplingConfigService is resolved lazily per call — a
+        // self-heal only fires post-boot, when CoreModule is long up.
+        MetadataRegistry.INSTANCE.registerLocalSchemaPopulator(model -> {
+            final DownSamplingConfigService downSamplingConfigService = moduleManager.find(CoreModule.NAME)
+                                                                                     .provider()
+                                                                                     .getService(DownSamplingConfigService.class);
+            registerLocallyByKind(model, downSamplingConfigService);
+        });
+    }
+
+    @Override
+    protected boolean isRetryableNoInitProbeFailure(final StorageException e) {
+        Throwable cause = e.getCause();
+        while (cause != null) {
+            if (cause instanceof BanyanDBException) {
+                return isTransientBanyanDBProbeFailure((BanyanDBException) cause);
+            }
+            cause = cause.getCause();
+        }
+        return false;
+    }
+
+    private static boolean isTransientBanyanDBProbeFailure(final BanyanDBException e) {
+        final Status.Code code = e.getStatus();
+        if (Status.Code.UNAVAILABLE.equals(code)
+            || Status.Code.DEADLINE_EXCEEDED.equals(code)
+            || Status.Code.CANCELLED.equals(code)
+            || Status.Code.RESOURCE_EXHAUSTED.equals(code)
+            || Status.Code.ABORTED.equals(code)) {
+            return true;
+        }
+        if (!Status.Code.UNKNOWN.equals(code)) {
+            return false;
+        }
+        final String message = String.valueOf(e.getMessage()).toLowerCase(Locale.ROOT);
+        return message.contains("client connection is closing")
+            || message.contains("connection is closing")
+            || message.contains("transport is closing")
+            || message.contains("connection refused")
+            || message.contains("connection reset")
+            || message.contains("broken pipe");
     }
 
     @Override
@@ -242,10 +288,65 @@ public InstallInfo isExists(Model model, StorageManipulationOpt opt) throws Stor
      */
     private void fenceOnRevision(final BanyanDBClient client, final StorageManipulationOpt opt,
                                  final String context) throws BanyanDBException {
+        if (opt.isDeferFence()) {
+            // Batched apply: do NOT fence per resource. Register a single flush that the apply
+            // orchestration runs once after all DDL is fired (StorageManipulationOpt#runDeferredFence),
+            // so a multi-rule file waits on ONE barrier on the cumulative max revision instead of
+            // one fence per metric/downsampling. The closure reads opt.getMaxModRevision() at flush
+            // time, after every resource has recorded its revision.
+            opt.setDeferredFence(() -> {
+                try {
+                    doDeferredFence(client, opt, "batched apply");
+                } catch (final BanyanDBException e) {
+                    throw new StorageException("batched schema fence failed", e);
+                }
+            });
+            return;
+        }
+        doFenceOnRevision(client, opt, context);
+    }
+
+    private void doFenceOnRevision(final BanyanDBClient client, final StorageManipulationOpt opt,
+                                   final String context) throws BanyanDBException {
+        doFenceOnRevisionValue(client, opt.getMaxModRevision(), context);
+    }
+
+    /**
+     * The deferred (batched) fence the runtime-rule apply runs once after all DDL. Unlike the inline
+     * {@link #doFenceOnRevisionValue}, this (1) honors the opt's configured timeout
+     * ({@link StorageManipulationOpt#getFenceTimeoutMs()}, the runtime-rule 3-min budget) instead of
+     * the short inline {@link #FENCE_TIMEOUT}, and (2) records the outcome (applied + laggard node
+     * ids) on the opt so the orchestrator can mark {@code APPLIED} vs {@code DEGRADED} and gate the
+     * dispatch resume on this fence. A laggard timeout is still a non-fatal WARN.
+     */
+    private void doDeferredFence(final BanyanDBClient client, final StorageManipulationOpt opt,
+                                 final String context) throws BanyanDBException {
         final long rev = opt.getMaxModRevision();
         if (rev <= 0L) {
             return;
         }
+        final Duration timeout = opt.getFenceTimeoutMs() > 0L
+            ? Duration.ofMillis(opt.getFenceTimeoutMs())
+            : FENCE_TIMEOUT;
+        final SchemaWatcher.Result result = client.getSchemaWatcher().awaitRevisionApplied(rev, timeout);
+        if (!result.isApplied()) {
+            log.warn("BanyanDB schema-watch fence did NOT confirm revision {} within {} ms for {}; "
+                + "proceeding anyway. Laggards: {}", rev, timeout.toMillis(), context, result.getLaggards());
+            final List<String> laggardIds = result.getLaggards().stream()
+                .map(l -> l.getNode())
+                .collect(Collectors.toList());
+            opt.setFenceOutcome(new StorageManipulationOpt.FenceOutcome(false, laggardIds));
+        } else {
+            log.debug("BanyanDB schema-watch fence confirmed revision {} for {}", rev, context);
+            opt.setFenceOutcome(new StorageManipulationOpt.FenceOutcome(true, List.of()));
+        }
+    }
+
+    private void doFenceOnRevisionValue(final BanyanDBClient client, final long rev,
+                                        final String context) throws BanyanDBException {
+        if (rev <= 0L) {
+            return;
+        }
         final SchemaWatcher.Result result = client.getSchemaWatcher().awaitRevisionApplied(rev, FENCE_TIMEOUT);
         if (!result.isApplied()) {
             log.warn("BanyanDB schema-watch fence did NOT confirm revision {} within {} ms for {}; "
@@ -412,6 +513,13 @@ public void dropTable(Model model, StorageManipulationOpt opt) throws StorageExc
             final String group = metadata.getGroup();
             final String name = metadata.name();
             log.info("drop BanyanDB schema kind={} {}:{}", metadata.getKind(), group, name);
+            // Tombstone revision of THIS drop's primary resource only — used to decide the
+            // deletion fence. It must be the primary's own revision, NOT opt.getMaxModRevision():
+            // a single opt is reused across many files in a tick, so the cumulative max can carry
+            // an unrelated earlier create/binding revision and make a tombstone-less delete
+            // (primary revision 0) skip the AwaitSchemaDeleted fallback. 0 for trace/property,
+            // whose delete RPCs have no revision-returning variant — those always key-fence.
+            long primaryDeleteRev = StorageManipulationOpt.DEFAULT_MOD_REVISION;
             switch (metadata.getKind()) {
                 case MEASURE:
                     // Drop the TopN aggregations first (if any), then index rule bindings, index rules, then the measure.
@@ -423,11 +531,13 @@ public void dropTable(Model model, StorageManipulationOpt opt) throws StorageExc
                         }
                     }
                     dropIndexRuleBindingsBestEffort(client, group, name, opt);
-                    opt.recordModRevision(client.deleteMeasureWithRevision(group, name));
+                    primaryDeleteRev = client.deleteMeasureWithRevision(group, name);
+                    opt.recordModRevision(primaryDeleteRev);
                     break;
                 case STREAM:
                     dropIndexRuleBindingsBestEffort(client, group, name, opt);
-                    opt.recordModRevision(client.deleteStreamWithRevision(group, name));
+                    primaryDeleteRev = client.deleteStreamWithRevision(group, name);
+                    opt.recordModRevision(primaryDeleteRev);
                     break;
                 case TRACE:
                     dropIndexRuleBindingsBestEffort(client, group, name, opt);
@@ -441,9 +551,9 @@ public void dropTable(Model model, StorageManipulationOpt opt) throws StorageExc
                         "dropTable unsupported kind=" + metadata.getKind() + " for model " + model.getName());
             }
             // Fence: prefer the revision-based wait when the server recorded a tombstone
-            // mod_revision; otherwise fall back to AwaitSchemaDeleted keyed on the
-            // primary resource so callers get a hard "removed everywhere" signal.
-            fenceOnRevisionOrDeletion(client, opt, metadata, "dropTable:" + model.getName());
+            // mod_revision for THIS resource; otherwise fall back to AwaitSchemaDeleted keyed on
+            // the primary resource so callers get a hard "removed everywhere" signal.
+            fenceOnRevisionOrDeletion(client, metadata, primaryDeleteRev, "dropTable:" + model.getName());
         } catch (BanyanDBException ex) {
             if (Status.Code.NOT_FOUND.equals(ex.getStatus())) {
                 log.info("BanyanDB schema {} already absent on drop (idempotent)", model.getName());
@@ -454,22 +564,32 @@ public void dropTable(Model model, StorageManipulationOpt opt) throws StorageExc
     }
 
     /**
-     * Prefer {@code AwaitRevisionApplied(maxRev)} when the registry returned a
-     * non-zero tombstone revision; otherwise fall back to
+     * Prefer {@code AwaitRevisionApplied(primaryDeleteRev)} when the registry returned a
+     * non-zero tombstone revision for the primary resource; otherwise fall back to
      * {@code AwaitSchemaDeleted(key)} keyed on the primary resource. The fallback
      * exists because {@code mod_revision == 0} on a delete response means the server
      * did not record a tombstone — the revision-based fence cannot observe a
      * deletion that didn't get one.
+     *
+     * <p>The decision keys on {@code primaryDeleteRev} — the primary resource's own delete
+     * revision — NOT {@code opt.getMaxModRevision()}. A single opt is shared across every file
+     * in a reconciler tick, so its cumulative max can hold an unrelated earlier create/binding
+     * revision; using it here would make a tombstone-less primary delete take the revision
+     * branch and silently skip {@code AwaitSchemaDeleted}. Because the primary delete is issued
+     * last (after TopN + bindings), its revision is the highest of this drop and fencing on it
+     * also covers the earlier lower-revision deletes of the same drop.
      */
-    private void fenceOnRevisionOrDeletion(final BanyanDBClient client, final StorageManipulationOpt opt,
+    private void fenceOnRevisionOrDeletion(final BanyanDBClient client,
                                            final MetadataRegistry.SchemaMetadata metadata,
+                                           final long primaryDeleteRev,
                                            final String context) throws BanyanDBException {
-        final long rev = opt.getMaxModRevision();
-        if (rev > 0L) {
-            fenceOnRevision(client, opt, context);
+        if (primaryDeleteRev > 0L) {
+            // Drops fence inline (never deferred): a deletion's visibility is per-key and must
+            // not ride a batched revision flush — drops stay correct even under a deferFence opt.
+            doFenceOnRevisionValue(client, primaryDeleteRev, context);
             return;
         }
-        // mod_revision was 0 on every delete — fall back to key-based deletion fence.
+        // mod_revision was 0 on the primary delete — fall back to key-based deletion fence.
         final String kind;
         switch (metadata.getKind()) {
             case MEASURE:
@@ -1290,6 +1410,24 @@ private void checkTopNAggregation(Model model, BanyanDBClient client, StorageMan
      * schema cache the local DAOs read from so this node can translate Model ↔ BanyanDB
      * proto for sample ingest / queries.
      */
+    @Override
+    protected void populateLocalCacheOnly(final Model model, final StorageManipulationOpt opt) {
+        // inspectBackend=false (peer / local-cache-only tick): the main owns the backend
+        // resource; this node only (re)derives its local MetadataRegistry entry so its DAOs
+        // can translate this model. RPC-free, and an overwrite via register*Model — keeps a
+        // peer's cache in lockstep with a reshaped model that re-fires whenCreating, which the
+        // read-side self-heal (fill-if-absent only) cannot do.
+        final DownSamplingConfigService downSamplingConfigService = moduleManager.find(CoreModule.NAME)
+                                                                                 .provider()
+                                                                                 .getService(DownSamplingConfigService.class);
+        registerLocallyByKind(model, downSamplingConfigService);
+    }
+
+    @Override
+    protected void evictLocalCache(final Model model) {
+        MetadataRegistry.INSTANCE.evict(model);
+    }
+
     private void registerLocallyByKind(final Model model,
                                         final DownSamplingConfigService downSamplingConfigService) {
         if (model.isTimeSeries()) {
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBNoneStreamDAO.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBNoneStreamDAO.java
index 09bd256e14a4..98763aa3e115 100644
--- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBNoneStreamDAO.java
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBNoneStreamDAO.java
@@ -40,7 +40,13 @@ public BanyanDBNoneStreamDAO(BanyanDBStorageClient client, StorageBuilder<NoneSt
 
     @Override
     public void insert(Model model, NoneStream noneStream) throws IOException {
+        // Self-heal a missing local schema entry once (RPC-free re-derivation) before failing —
+        // see MetadataRegistry.repopulateLocally. Re-read via the same lookup; throw only if still absent.
         MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findRecordMetadata(model.getName());
+        if (schema == null) {
+            MetadataRegistry.INSTANCE.repopulateLocally(model);
+            schema = MetadataRegistry.INSTANCE.findRecordMetadata(model.getName());
+        }
         if (schema == null) {
             throw new IOException(model.getName() + " is not registered");
         }
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java
index 958b4c615101..a8591e5ee9a3 100644
--- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java
@@ -82,6 +82,7 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
 import java.util.function.Function;
 import java.util.stream.Collectors;
 
@@ -94,7 +95,51 @@
 public enum MetadataRegistry {
     INSTANCE;
 
-    private final Map<String, Schema> registry = new HashMap<>();
+    // ConcurrentHashMap (not HashMap): boot populates single-threaded, but the self-heal path
+    // (repopulateLocally) writes from persistence/query threads concurrently with reads.
+    private final Map<String, Schema> registry = new ConcurrentHashMap<>();
+
+    /**
+     * Re-derive and locally register a model's BanyanDB {@link Schema} with NO server RPC.
+     * Registered once by the active {@code BanyanDBIndexInstaller} at boot and invoked by
+     * {@link #repopulateLocally(Model)} when a read path finds the cache empty for a model whose
+     * dispatch worker is already live — e.g. a {@code withoutSchemaChange} peer apply or a
+     * runtime-rule bundled fall-over rebuilt the worker but skipped the local populate. The
+     * {@code Model} is always known locally and its schema is a pure local derivation, so such a
+     * miss is always re-derivable without touching the backend.
+     */
+    @FunctionalInterface
+    public interface LocalSchemaPopulator {
+        void populateLocally(Model model);
+    }
+
+    private volatile LocalSchemaPopulator localSchemaPopulator;
+
+    /** Register the boot-time, RPC-free local schema populator. Called once by the active installer. */
+    public void registerLocalSchemaPopulator(final LocalSchemaPopulator populator) {
+        this.localSchemaPopulator = populator;
+    }
+
+    /**
+     * Best-effort, RPC-free re-derivation of a model's local {@link Schema} so a read/persist path
+     * can self-heal a missing cache entry instead of throwing {@code "<model> is not registered"}
+     * forever (the registry never evicts, so an entry that was never populated on this node stays
+     * absent otherwise). No-op when no populator is registered (e.g. non-BanyanDB unit tests).
+     * Swallows derivation exceptions so a self-heal attempt is never worse than the pre-existing
+     * throw — the caller re-reads and surfaces its own not-registered error if still absent.
+     */
+    public void repopulateLocally(final Model model) {
+        final LocalSchemaPopulator populator = this.localSchemaPopulator;
+        if (populator == null) {
+            return;
+        }
+        try {
+            populator.populateLocally(model);
+        } catch (final Exception e) {
+            log.debug("local schema self-heal re-derivation failed for model [{}]; "
+                + "caller will surface the not-registered error", model.getName(), e);
+        }
+    }
 
     public StreamModel registerStreamModel(Model model, BanyanDBStorageConfig config) {
         final SchemaMetadata schemaMetadata = parseMetadata(model, config, null);
@@ -398,6 +443,24 @@ public Schema findMetadata(final Model model) {
         return findMetricMetadata(model.getName(), model.getDownsampling());
     }
 
+    /**
+     * Drop the local {@link Schema} cache entry for {@code model}, keyed exactly as
+     * {@link #findMetadata(Model)} looks it up. The registry is otherwise insert-only, so this
+     * is the one path that removes an entry — invoked from {@code ModelInstaller.whenRemoving}
+     * on a runtime-rule hot-remove / reshape so a stale translation never outlives the model.
+     */
+    public void evict(final Model model) {
+        final String key;
+        if (!model.isTimeSeries() || model.isRecord()) {
+            key = model.getName();
+        } else {
+            key = SchemaMetadata.formatName(model.getName(), model.getDownsampling());
+        }
+        if (registry.remove(key) != null) {
+            log.debug("evicted local BanyanDB schema cache entry [{}]", key);
+        }
+    }
+
     private FieldSpec parseFieldSpec(ModelColumn modelColumn) {
         String colName = modelColumn.getColumnName().getStorageName();
         if (String.class.equals(modelColumn.getType())) {
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsDAO.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsDAO.java
index 4f1ff1928e88..d52c70ec5ffd 100644
--- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsDAO.java
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsDAO.java
@@ -63,12 +63,29 @@ public BanyanDBMetricsDAO(BanyanDBStorageClient client, StorageBuilder<Metrics>
         this.storageBuilder = storageBuilder;
     }
 
-    @Override
-    public List<Metrics> multiGet(Model model, List<Metrics> metrics) throws IOException {
+    /**
+     * Resolve the model's BanyanDB schema, self-healing a missing local entry once before failing.
+     * A null here means this node has a live persist worker for the model but its schema cache was
+     * never populated (or lost) — typically a {@code withoutSchemaChange} peer apply or a
+     * runtime-rule bundled fall-over that rebuilt the worker without the populate. Re-derive the
+     * schema locally with no server RPC and re-read; throw only if the entry is still absent, so
+     * a genuinely unknown model still fails fast instead of flooding forever.
+     */
+    private MetadataRegistry.Schema resolveSchema(Model model) throws IOException {
         MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findMetadata(model);
         if (schema == null) {
-            throw new IOException(model.getName() + " is not registered");
+            MetadataRegistry.INSTANCE.repopulateLocally(model);
+            schema = MetadataRegistry.INSTANCE.findMetadata(model);
+            if (schema == null) {
+                throw new IOException(model.getName() + " is not registered");
+            }
         }
+        return schema;
+    }
+
+    @Override
+    public List<Metrics> multiGet(Model model, List<Metrics> metrics) throws IOException {
+        MetadataRegistry.Schema schema = resolveSchema(model);
         final Map<String, List<String>> seriesIDColumns = new HashMap<>();
         if (model.getBanyanDBModelExtension().isIndexMode()) {
             seriesIDColumns.put(ID, new ArrayList<>());
@@ -144,10 +161,7 @@ protected void apply(MeasureQuery query) {
 
     @Override
     public InsertRequest prepareBatchInsert(Model model, Metrics metrics, SessionCacheCallback callback) throws IOException {
-        MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findMetadata(model);
-        if (schema == null) {
-            throw new IOException(model.getName() + " is not registered");
-        }
+        MetadataRegistry.Schema schema = resolveSchema(model);
         MeasureWrite measureWrite = getClient().createMeasureWrite(schema.getMetadata().getGroup(), // group name
                 schema.getMetadata().name(), // measure-name
                 TimeBucket.getTimestamp(metrics.getTimeBucket(), model.getDownsampling())); // timestamp
@@ -161,10 +175,7 @@ public InsertRequest prepareBatchInsert(Model model, Metrics metrics, SessionCac
 
     @Override
     public UpdateRequest prepareBatchUpdate(Model model, Metrics metrics, SessionCacheCallback callback) throws IOException {
-        MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findMetadata(model);
-        if (schema == null) {
-            throw new IOException(model.getName() + " is not registered");
-        }
+        MetadataRegistry.Schema schema = resolveSchema(model);
         MeasureWrite measureWrite = getClient().createMeasureWrite(schema.getMetadata().getGroup(), // group name
                 schema.getMetadata().name(), // measure-name
                 TimeBucket.getTimestamp(metrics.getTimeBucket(), model.getDownsampling())); // timestamp
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/stream/BanyanDBRecordDAO.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/stream/BanyanDBRecordDAO.java
index 8bb0d28f8009..4632982b7631 100644
--- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/stream/BanyanDBRecordDAO.java
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/stream/BanyanDBRecordDAO.java
@@ -50,7 +50,13 @@ public BanyanDBRecordDAO(BanyanDBStorageClient client, StorageBuilder<Record> st
 
     @Override
     public InsertRequest prepareBatchInsert(Model model, Record record) throws IOException {
+        // Self-heal a missing local schema entry once (RPC-free re-derivation) before failing —
+        // see MetadataRegistry.repopulateLocally. Throw only if the entry is still absent.
         MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findMetadata(model);
+        if (schema == null) {
+            MetadataRegistry.INSTANCE.repopulateLocally(model);
+            schema = MetadataRegistry.INSTANCE.findMetadata(model);
+        }
         if (schema == null) {
             throw new IOException(model.getName() + " is not registered");
         }
@@ -60,6 +66,9 @@ public InsertRequest prepareBatchInsert(Model model, Record record) throws IOExc
                 if (record instanceof BanyanDBTrace.MergeTable) {
                     BanyanDBTrace.MergeTable mergeTable = (BanyanDBTrace.MergeTable) record;
                     MetadataRegistry.Schema mergeTableSchema = MetadataRegistry.INSTANCE.findRecordMetadata(mergeTable.getMergeTableName());
+                    if (mergeTableSchema == null) {
+                        throw new IOException(mergeTable.getMergeTableName() + " is not registered");
+                    }
 
                     traceWrite = getClient().createTraceWrite(
                         schema.getMetadata().getGroup(),
diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/test/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistryTest.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/test/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistryTest.java
new file mode 100644
index 000000000000..6e7ef96e35d9
--- /dev/null
+++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/test/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistryTest.java
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.server.storage.plugin.banyandb;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.skywalking.oap.server.core.analysis.DownSampling;
+import org.apache.skywalking.oap.server.core.storage.model.Model;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+/**
+ * Unit coverage for the local schema-cache self-heal on {@link MetadataRegistry}. A read/persist
+ * path that finds the cache empty for a model whose dispatch worker is already live (e.g. a
+ * {@code withoutSchemaChange} peer apply or a runtime-rule bundled fall-over that rebuilt the
+ * worker but skipped the populate) must be able to re-derive the schema locally with no server
+ * RPC, instead of throwing {@code "<model> is not registered"} forever.
+ */
+class MetadataRegistryTest {
+
+    @AfterEach
+    void clearPopulator() {
+        // MetadataRegistry is an enum singleton; clear the populator so global state set by a test
+        // does not leak into others.
+        MetadataRegistry.INSTANCE.registerLocalSchemaPopulator(null);
+    }
+
+    @Test
+    void repopulateLocallyInvokesRegisteredPopulator() {
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("meter_test_metric");
+        final AtomicInteger calls = new AtomicInteger();
+        MetadataRegistry.INSTANCE.registerLocalSchemaPopulator(m -> calls.incrementAndGet());
+
+        MetadataRegistry.INSTANCE.repopulateLocally(model);
+
+        assertEquals(1, calls.get(), "a registered populator must be invoked on a self-heal attempt");
+    }
+
+    @Test
+    void repopulateLocallyIsNoOpWhenNoPopulatorRegistered() {
+        MetadataRegistry.INSTANCE.registerLocalSchemaPopulator(null);
+        final Model model = mock(Model.class);
+        assertDoesNotThrow(() -> MetadataRegistry.INSTANCE.repopulateLocally(model),
+            "self-heal with no populator (e.g. a non-BanyanDB context) must be a no-op");
+    }
+
+    @Test
+    void repopulateLocallySwallowsPopulatorError() {
+        final Model model = mock(Model.class);
+        when(model.getName()).thenReturn("meter_test_metric");
+        MetadataRegistry.INSTANCE.registerLocalSchemaPopulator(m -> {
+            throw new RuntimeException("derivation boom");
+        });
+
+        // A failed re-derivation must never be worse than the pre-existing throw: the caller
+        // re-reads and surfaces its own not-registered error, so repopulateLocally itself must
+        // not propagate.
+        assertDoesNotThrow(() -> MetadataRegistry.INSTANCE.repopulateLocally(model),
+            "a throwing populator must be swallowed so self-heal never worsens the failure");
+    }
+
+    @Test
+    void evictExercisesEveryFindMetadataKeyBranchSafely() {
+        // evict() must key its removal exactly as findMetadata() looks an entry up — across
+        // management (name), record (name) and metric (formatName(name, downSampling)) kinds.
+        // Exercise all three so a wrong-branch or formatName regression surfaces, and confirm
+        // evicting an absent entry is a safe no-op (the registry is otherwise insert-only).
+        final Model management = mock(Model.class);
+        when(management.isTimeSeries()).thenReturn(false);
+        when(management.getName()).thenReturn("management_model");
+
+        final Model record = mock(Model.class);
+        when(record.isTimeSeries()).thenReturn(true);
+        when(record.isRecord()).thenReturn(true);
+        when(record.getName()).thenReturn("record_model");
+
+        final Model metric = mock(Model.class);
+        when(metric.isTimeSeries()).thenReturn(true);
+        when(metric.isRecord()).thenReturn(false);
+        when(metric.getName()).thenReturn("metric_model");
+        when(metric.getDownsampling()).thenReturn(DownSampling.Minute);
+
+        assertDoesNotThrow(() -> {
+            MetadataRegistry.INSTANCE.evict(management);
+            MetadataRegistry.INSTANCE.evict(record);
+            MetadataRegistry.INSTANCE.evict(metric);
+        }, "evict must be a safe no-op for an absent entry across all model kinds");
+    }
+}
diff --git a/test/e2e-v2/cases/runtime-rule/cluster/cluster-flow.sh b/test/e2e-v2/cases/runtime-rule/cluster/cluster-flow.sh
index 0740a9947c6b..50499d20c5fc 100755
--- a/test/e2e-v2/cases/runtime-rule/cluster/cluster-flow.sh
+++ b/test/e2e-v2/cases/runtime-rule/cluster/cluster-flow.sh
@@ -141,6 +141,23 @@ await_hash() {
     done
 }
 
+# Poll a node until its row is ACTIVE with a contentHash different from prev, then echo the new hash.
+# Needed after a STRUCTURAL apply on an already-ACTIVE row: the status stays ACTIVE across the async
+# apply (which returns at FENCING and persists the new content in the background after the schema
+# fence), so await_status "ACTIVE" returns on the first iteration with the OLD hash. Gating on the
+# contentHash advancing is the only signal the new content is durable on this node.
+await_hash_change() {
+    local base="$1" prev="$2" deadline=$(( $(date +%s) + CONVERGE_TIMEOUT_S ))
+    while true; do
+        local got; got="$(list_hash "${base}")"
+        [ -n "${got}" ] && [ "${got}" != "${prev}" ] && { echo "${got}"; return 0; }
+        if [ "$(date +%s)" -ge "${deadline}" ]; then
+            fail "${base} contentHash did not advance past '${prev:0:8}…' within ${CONVERGE_TIMEOUT_S}s (last='${got:0:8}…', applyError='$(list_apply_error "${base}")')"
+        fi
+        sleep 2
+    done
+}
+
 await_absent() {
     local base="$1" deadline=$(( $(date +%s) + CONVERGE_TIMEOUT_S ))
     while true; do
@@ -190,9 +207,10 @@ log "OAP-2 converged to ${hash_initial:0:8}…"
 # --- Phase 2: STRUCTURAL update on OAP-1 — second measure created on no-init --------
 log "=== Phase 2: STRUCTURAL on OAP-1 ==="
 apply_on "${OAP1_BASE}" "${SEED_STRUCT}" "allowStorageChange=true" >/dev/null
-await_status "${OAP1_BASE}" "ACTIVE"
+# Structural apply is async: wait for OAP-1's own contentHash to ADVANCE (the row is ACTIVE before
+# and after, so await_status "ACTIVE" alone would read the stale pre-apply hash) before capturing it.
+hash_struct="$(await_hash_change "${OAP1_BASE}" "${hash_initial}")"
 assert_no_apply_error "${OAP1_BASE}"
-hash_struct="$(list_hash "${OAP1_BASE}")"
 [ "${hash_struct}" != "${hash_initial}" ] || fail "OAP-1 contentHash unchanged after STRUCTURAL apply"
 log "OAP-1 → ACTIVE @ ${hash_struct:0:8}… (was ${hash_initial:0:8}…)"
 await_hash "${OAP2_BASE}" "${hash_struct}"
diff --git a/test/e2e-v2/cases/runtime-rule/lal/lal-flow.sh b/test/e2e-v2/cases/runtime-rule/lal/lal-flow.sh
index aa57fd58bda1..38e964aa343b 100755
--- a/test/e2e-v2/cases/runtime-rule/lal/lal-flow.sh
+++ b/test/e2e-v2/cases/runtime-rule/lal/lal-flow.sh
@@ -76,6 +76,49 @@ list_field() {
     list_row "${catalog}" "${name}" | jq -r '."'"${field}"'" // empty'
 }
 
+# Budget for an async apply to land in /list. A NEW / STRUCTURAL addOrUpdate is async: it returns
+# immediately at FENCING and the rule row is persisted only AFTER a background schema fence, so a
+# single read right after the 2xx can miss the row (or read a stale contentHash). The waiters below
+# poll within this budget. FILTER_ONLY edits persist synchronously, so they return on the first poll.
+APPLY_LAND_S="${APPLY_LAND_S:-200}"
+
+# Poll until the (catalog,name) row reaches expected_status, up to APPLY_LAND_S.
+await_status() {
+    local catalog="$1" name="$2" expected="$3"
+    log "  await /list ${catalog}/${name} status=${expected} (budget ${APPLY_LAND_S}s)"
+    local deadline=$(( $(date +%s) + APPLY_LAND_S )) got=""
+    while true; do
+        got="$(list_field "${catalog}" "${name}" status)"
+        [ "${got}" = "${expected}" ] && return 0
+        if [ "$(date +%s)" -ge "${deadline}" ]; then
+            fail "${catalog}/${name} did not reach status='${expected}' within ${APPLY_LAND_S}s (last='${got}')"
+        fi
+        sleep 2
+    done
+}
+
+# Poll until the (catalog,name) row is ACTIVE with a contentHash different from prev, then echo the
+# new hash. The wait-condition for a swap whose status stays ACTIVE before and after the apply, so
+# the contentHash advancing is the only signal the new content actually landed (subsumes the
+# async-persist window and a possible STRUCTURAL classification of the swap).
+await_hash_changed() {
+    local catalog="$1" name="$2" prev="$3"
+    log "  await /list ${catalog}/${name} contentHash≠${prev:0:8}… (budget ${APPLY_LAND_S}s)"
+    local deadline=$(( $(date +%s) + APPLY_LAND_S )) status="" hash=""
+    while true; do
+        status="$(list_field "${catalog}" "${name}" status)"
+        hash="$(list_field "${catalog}" "${name}" contentHash)"
+        if [ "${status}" = "ACTIVE" ] && [ -n "${hash}" ] && [ "${hash}" != "${prev}" ]; then
+            echo "${hash}"
+            return 0
+        fi
+        if [ "$(date +%s)" -ge "${deadline}" ]; then
+            fail "${catalog}/${name} contentHash did not advance past '${prev:0:8}…' within ${APPLY_LAND_S}s (last status='${status}' hash='${hash:0:8}…')"
+        fi
+        sleep 2
+    done
+}
+
 apply_rule() {
     local catalog="$1" name="$2" body="$3"
     admin runtime-rule add --catalog "${catalog}" --name "${name}" -f "${body}" >/dev/null \
@@ -155,15 +198,13 @@ log "OAP ready"
 # --- Phase 0: apply log-mal aggregation -----------------------------------------------
 log "=== Phase 0: apply log-mal aggregation rule ==="
 apply_rule "${MAL_CATALOG}" "${MAL_NAME}" "${SEED_MAL}"
-mal_status="$(list_field "${MAL_CATALOG}" "${MAL_NAME}" status)"
-[ "${mal_status}" = "ACTIVE" ] || fail "MAL rule expected ACTIVE, got '${mal_status}'"
+await_status "${MAL_CATALOG}" "${MAL_NAME}" "ACTIVE"
 log "log-mal → ACTIVE"
 
 # --- Phase 1: apply LAL v1 ------------------------------------------------------------
 log "=== Phase 1: apply LAL v1 (extractor stamps step=v1) ==="
 apply_rule "${LAL_CATALOG}" "${LAL_NAME}" "${SEED_V1}"
-status="$(list_field "${LAL_CATALOG}" "${LAL_NAME}" status)"
-[ "${status}" = "ACTIVE" ] || fail "v1 expected ACTIVE, got '${status}'"
+await_status "${LAL_CATALOG}" "${LAL_NAME}" "ACTIVE"
 hash_v1="$(list_field "${LAL_CATALOG}" "${LAL_NAME}" contentHash)"
 [ -n "${hash_v1}" ] || fail "v1 contentHash empty"
 log "v1 → ACTIVE @ ${hash_v1:0:8}…"
@@ -172,10 +213,7 @@ await_metric_for_step "v1"
 # --- Phase 2: swap to LAL v2 (same key, step flips to v2) -----------------------------
 log "=== Phase 2: swap to LAL v2 (extractor stamps step=v2) ==="
 apply_rule "${LAL_CATALOG}" "${LAL_NAME}" "${SEED_V2}"
-status="$(list_field "${LAL_CATALOG}" "${LAL_NAME}" status)"
-[ "${status}" = "ACTIVE" ] || fail "v2 expected ACTIVE, got '${status}'"
-hash_v2="$(list_field "${LAL_CATALOG}" "${LAL_NAME}" contentHash)"
-[ "${hash_v2}" != "${hash_v1}" ] || fail "v2 contentHash unchanged from v1 (${hash_v2:0:8}…)"
+hash_v2="$(await_hash_changed "${LAL_CATALOG}" "${LAL_NAME}" "${hash_v1}")"
 log "v2 → ACTIVE @ ${hash_v2:0:8}… (was ${hash_v1:0:8}…) — swap applied"
 await_metric_for_step "v2"
 
diff --git a/test/e2e-v2/cases/runtime-rule/mal-storage/runtime-rule-flow.sh b/test/e2e-v2/cases/runtime-rule/mal-storage/runtime-rule-flow.sh
index d9832ec10a33..860ddd2f807f 100755
--- a/test/e2e-v2/cases/runtime-rule/mal-storage/runtime-rule-flow.sh
+++ b/test/e2e-v2/cases/runtime-rule/mal-storage/runtime-rule-flow.sh
@@ -196,40 +196,123 @@ assert_apply_status() {
     || fail "expected applyStatus=${expected}, not found in: ${actual}"
 }
 
-# GET /runtime/rule/list and ensure the row matches the expected status. Returns
-# the matching JSON line on stdout for callers that want to inspect contentHash.
+# Budget for the async apply state machine to reach a terminal phase on GET /runtime/rule/status.
+APPLY_TERMINAL_S="${APPLY_TERMINAL_S:-200}"
+
+# Drive the new async apply surface to a terminal phase. A STRUCTURAL addOrUpdate (and a
+# /delete?mode=revertToBundled) returns immediately at FENCING with an applyId; the row is persisted
+# and dispatch resumed in the BACKGROUND, after the schema fence. Given the apply's JSON response,
+# extract its applyId and poll GET /runtime/rule/status until the phase is terminal:
+#   APPLIED / DEGRADED → the durable row was written (DEGRADED == committed-and-durable, only the
+#                        cluster-wide fence confirmation lagged) → return 0
+#   FAILED            → a pre-commit error, nothing was committed → fail
+#   anything else (FENCING / DDL / ROLLING_OUT / PENDING / UNKNOWN) → keep polling
+# A synchronous apply (filter_only / inactivate / default delete) carries no applyId, so this is a
+# no-op — that response is already durable on return. swctl has no runtime-rule `status` subcommand,
+# so this goes through curl (the status endpoint lives on the same REST port). Passing catalog+name
+# lets the main answer from the durable rule row once the live apply-id is TTL-evicted, so a slow
+# poll converges instead of false-timing-out.
+await_apply_terminal() {
+  local resp="$1"
+  local rule_name="${2:-${NAME}}"
+  local apply_id
+  apply_id="$(echo "${resp}" | jq -r '.applyId // empty' 2>/dev/null || true)"
+  if [[ -z "${apply_id}" ]]; then
+    return 0
+  fi
+  log "runtime-rule status → polling (≤${APPLY_TERMINAL_S}s) for apply ${apply_id} of ${CATALOG}/${rule_name} to reach a terminal phase"
+  local deadline=$(( $(date +%s) + APPLY_TERMINAL_S ))
+  local body phase=""
+  while :; do
+    body="$(curl -s "${REST_BASE}/runtime/rule/status?applyId=${apply_id}&catalog=${CATALOG}&name=${rule_name}" 2>/dev/null || true)"
+    phase="$(echo "${body}" | jq -r '.phase // empty' 2>/dev/null || true)"
+    case "${phase}" in
+      APPLIED|DEGRADED)
+        log "  ✓ apply ${apply_id} → ${phase} (durable)"
+        return 0
+        ;;
+      FAILED)
+        fail "apply ${apply_id} of ${CATALOG}/${rule_name} reached FAILED: ${body}"
+        ;;
+    esac
+    if (( $(date +%s) >= deadline )); then
+      fail "apply ${apply_id} of ${CATALOG}/${rule_name} did not reach a terminal phase within ${APPLY_TERMINAL_S}s (last phase='${phase}', body: ${body})"
+    fi
+    sleep 2
+  done
+}
+
+# Budget for an async structural apply to land in /list. A structural addOrUpdate returns
+# immediately at FENCING (accepted, not yet durable): the rule row is persisted only AFTER the
+# background schema fence confirms, and BanyanDB's meta→data-node schema sync can take 1-2 minutes,
+# so a single /list read right after the 2xx can miss the row. The /list assertions poll within this
+# budget (covers the fence timeout + the sync). ES/JDBC have no such fence — they land in under a
+# second, so the poll returns on its first iteration there.
+APPLY_LAND_S="${APPLY_LAND_S:-200}"
+
+# Poll GET /runtime/rule/list until the row for (catalog, rule_name) shows the expected status,
+# up to APPLY_LAND_S. Returns the matching JSON line on stdout for callers that inspect contentHash.
+#
+# Optional 3rd arg differ_hash: when set, the poll additionally requires the row's contentHash to
+# differ from it. This is the wait-condition for a STRUCTURAL update of an ALREADY-ACTIVE row — the
+# status is ACTIVE both before and after the async apply, so a status-only poll would return on the
+# first iteration with the OLD (pre-apply) contentHash, before the background fence→persist tail
+# has written the new content. Gating on "status==expected AND contentHash advanced" blocks until
+# the new content is durable and visible.
 list_row() {
   local expected_status="$1"
   local rule_name="${2:-${NAME}}"
-  log "runtime-rule list → looking for ${CATALOG}/${rule_name} status=${expected_status}"
-  local lines
-  lines="$(admin runtime-rule list)" \
-    || fail "runtime-rule list failed"
-  local match
-  match="$(echo "${lines}" | jq -c ".rules[] | select(.catalog==\"${CATALOG}\" and .name==\"${rule_name}\")" 2>/dev/null || true)"
-  [[ -n "${match}" ]] \
-    || fail "/list has no row for ${CATALOG}/${rule_name} (got: ${lines})"
-  local actual_status
-  actual_status="$(echo "${match}" | jq -r '.status')"
-  [[ "${actual_status}" == "${expected_status}" ]] \
-    || fail "expected /list status=${expected_status}, got '${actual_status}' (row: ${match})"
-  echo "${match}"
+  local differ_hash="${3:-}"
+  log "runtime-rule list → waiting (≤${APPLY_LAND_S}s) for ${CATALOG}/${rule_name} status=${expected_status}${differ_hash:+ contentHash≠${differ_hash:0:8}…}"
+  local deadline=$(( $(date +%s) + APPLY_LAND_S ))
+  local lines match actual_status="" actual_hash=""
+  while :; do
+    lines="$(admin runtime-rule list)" \
+      || fail "runtime-rule list failed"
+    match="$(echo "${lines}" | jq -c ".rules[] | select(.catalog==\"${CATALOG}\" and .name==\"${rule_name}\")" 2>/dev/null || true)"
+    if [[ -n "${match}" ]]; then
+      actual_status="$(echo "${match}" | jq -r '.status')"
+      actual_hash="$(echo "${match}" | jq -r '.contentHash')"
+      if [[ "${actual_status}" == "${expected_status}" \
+            && ( -z "${differ_hash}" || "${actual_hash}" != "${differ_hash}" ) ]]; then
+        echo "${match}"
+        return 0
+      fi
+    fi
+    if (( $(date +%s) >= deadline )); then
+      if [[ -n "${match}" ]]; then
+        fail "expected /list status=${expected_status}${differ_hash:+ with advanced contentHash}, got status='${actual_status}' hash='${actual_hash}' within ${APPLY_LAND_S}s (row: ${match})"
+      fi
+      fail "/list has no row for ${CATALOG}/${rule_name} within ${APPLY_LAND_S}s (got: ${lines})"
+    fi
+    sleep 2
+  done
 }
 
-# Assert that /list does NOT have a row for the given (catalog, name).
+# Poll until /list has NO row (or status n/a) for the given (catalog, name), up to APPLY_LAND_S.
+# A /delete?mode=revertToBundled runs the async apply pipeline (the bundled re-apply), so the row's
+# removal can lag the same way a structural apply's appearance does.
 list_no_row() {
   local rule_name="${1:-${NAME}}"
-  log "runtime-rule list → expect NO row for ${CATALOG}/${rule_name}"
-  local lines match
-  lines="$(admin runtime-rule list)" \
-    || fail "runtime-rule list failed"
-  match="$(echo "${lines}" | jq -c ".rules[] | select(.catalog==\"${CATALOG}\" and .name==\"${rule_name}\")" 2>/dev/null || true)"
-  if [[ -n "${match}" ]]; then
-    local status
+  log "runtime-rule list → waiting (≤${APPLY_LAND_S}s) for NO row for ${CATALOG}/${rule_name}"
+  local deadline=$(( $(date +%s) + APPLY_LAND_S ))
+  local lines match status
+  while :; do
+    lines="$(admin runtime-rule list)" \
+      || fail "runtime-rule list failed"
+    match="$(echo "${lines}" | jq -c ".rules[] | select(.catalog==\"${CATALOG}\" and .name==\"${rule_name}\")" 2>/dev/null || true)"
+    if [[ -z "${match}" ]]; then
+      return 0
+    fi
     status="$(echo "${match}" | jq -r '.status')"
-    [[ "${status}" == "n/a" ]] \
-      || fail "/list still has row for ${CATALOG}/${rule_name} status=${status} (row: ${match})"
-  fi
+    if [[ "${status}" == "n/a" ]]; then
+      return 0
+    fi
+    if (( $(date +%s) >= deadline )); then
+      fail "/list still has row for ${CATALOG}/${rule_name} status=${status} within ${APPLY_LAND_S}s (row: ${match})"
+    fi
+    sleep 2
+  done
 }
 
 # Per-phase entity scope. SHAPE-BREAK reshapes the metric from SERVICE to
@@ -445,6 +528,7 @@ log "=== Phase 1: CREATE seed-rule.yaml ==="
 step_set "create"
 resp="$(post_rule "${SEED_RULES_DIR}/seed-rule.yaml")"
 assert_apply_status "structural_applied" "${resp}"
+await_apply_terminal "${resp}"
 list_row "ACTIVE" >/dev/null
 hash_initial="$(list_row ACTIVE | jq -r '.contentHash')"
 log "  initial contentHash=${hash_initial}"
@@ -455,7 +539,10 @@ log "=== Phase 2: UPDATE-FILTER seed-rule-filter-only.yaml ==="
 step_set "update_filter"
 resp="$(post_rule "${SEED_RULES_DIR}/seed-rule-filter-only.yaml")"
 assert_apply_status "filter_only_applied" "${resp}"
-hash_filter_only="$(list_row ACTIVE | jq -r '.contentHash')"
+await_apply_terminal "${resp}"
+# FILTER_ONLY persists synchronously (no applyId), so the new hash is already durable here; the
+# differ-gate still hardens the read against any list lag and proves the row actually advanced.
+hash_filter_only="$(list_row ACTIVE "${NAME}" "${hash_initial}" | jq -r '.contentHash')"
 [[ "${hash_filter_only}" != "${hash_initial}" ]] \
   || fail "FILTER_ONLY apply did not advance /list contentHash"
 log "  contentHash advanced to ${hash_filter_only}"
@@ -466,7 +553,11 @@ log "=== Phase 3: UPDATE-STRUCTURAL seed-rule-structural.yaml ==="
 step_set "structural"
 resp="$(post_rule "${SEED_RULES_DIR}/seed-rule-structural.yaml" "allowStorageChange=true")"
 assert_apply_status "structural_applied" "${resp}"
-hash_structural="$(list_row ACTIVE | jq -r '.contentHash')"
+await_apply_terminal "${resp}"
+# STRUCTURAL update of an already-ACTIVE row: status stays ACTIVE across the async apply, so gate on
+# the contentHash advancing past the filter-only hash, not just on status — otherwise the read races
+# the background fence→persist tail and returns the stale pre-apply hash.
+hash_structural="$(list_row ACTIVE "${NAME}" "${hash_filter_only}" | jq -r '.contentHash')"
 [[ "${hash_structural}" != "${hash_filter_only}" ]] \
   || fail "STRUCTURAL apply did not advance /list contentHash"
 log "  contentHash advanced to ${hash_structural}"
@@ -569,6 +660,7 @@ log "=== Phase 5h: HAPPY-PATH + RESTART dynamic-LAYER round-trip ==="
 struct_baseline="$(latest_bucket_id_for_step "e2e_rr_requests" "structural")"
 resp="$(post_rule "${SEED_RULES_DIR}/seed-rule-sibling-with-layer.yaml" "" "${SIBLING_NAME}")"
 assert_apply_status "structural_applied" "${resp}"
+await_apply_terminal "${resp}" "${SIBLING_NAME}"
 list_row "ACTIVE" "${SIBLING_NAME}" >/dev/null
 sleep 2
 layers_after_create="$(swctl --display yaml \
@@ -644,6 +736,7 @@ step_set "shape_break_new"
 log "  POST INSTANCE-scope rule v4"
 resp="$(post_rule "${SEED_RULES_DIR}/seed-rule-instance.yaml")"
 assert_apply_status "structural_applied" "${resp}"
+await_apply_terminal "${resp}"
 hash_shape_break="$(list_row ACTIVE | jq -r '.contentHash')"
 log "  contentHash after shape break = ${hash_shape_break}"
 # Rule v4 is INSTANCE-scope; swctl now needs --instance-name to resolve
@@ -670,6 +763,7 @@ resp="$(post_rule "${SEED_RULES_DIR}/seed-rule-instance.yaml")"
 status="$(echo "${resp}" | jq -r '.applyStatus // empty')"
 [[ "${status}" == "structural_applied" || "${status}" == "no_change" ]] \
   || fail "ACTIVATE: unexpected applyStatus=${status} (full: ${resp})"
+await_apply_terminal "${resp}"
 list_row "ACTIVE" >/dev/null
 await_metric_for_step "e2e_rr_requests" "activate"
 # NOTE: we do NOT re-assert "no step=inactivate rows" here. Phase 7's in-window
diff --git a/test/e2e-v2/cases/storage/expected/config-dump.yml b/test/e2e-v2/cases/storage/expected/config-dump.yml
index e2a37ee996cb..960e6efc809b 100644
--- a/test/e2e-v2/cases/storage/expected/config-dump.yml
+++ b/test/e2e-v2/cases/storage/expected/config-dump.yml
@@ -183,6 +183,7 @@
   "receiver-pprof.provider": "default",
   "receiver-profile.provider": "default",
   "receiver-register.provider": "default",
+  "receiver-runtime-rule.default.deferredFenceTimeoutSeconds": "180",
   "receiver-runtime-rule.default.refreshRulesPeriod": "30",
   "receiver-runtime-rule.default.selfHealThresholdSeconds": "60",
   "receiver-runtime-rule.provider": "default",