feat(service): async context propagation for task executor by flyingImer · Pull Request #4061 · apache/polaris

flyingImer · 2026-03-25T21:00:22Z

This PR adds RequestIdHolder and a concrete context propagation helper for TaskExecutorImpl. Fixes #3444

Problem

When TaskExecutorImpl schedules async work, the task runs on a different thread with a fresh CDI request scope. Request-scoped context (realm, principal, request ID) was previously propagated via ad-hoc hardcoded logic, and request IDs were not propagated at all, since the only way to read them was through RESTEasy's internal CurrentRequestManager API, which is unavailable on task threads.

Solution

RequestIdHolder is a new @RequestScoped CDI bean replacing the removed ServiceProducers.requestIdSupplier() that depended on RESTEasy internals. It produces RequestIdSupplier via CDI so any component can inject it without depending on JAX-RS types. RequestIdFilter now writes to this holder on each request.
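
To illustrate the holder-plus-supplier shape described above, here is a minimal plain-Java sketch that runs without a CDI container. It is not the code in this PR: `RequestIdHolderSketch` and `asSupplier()` are hypothetical names, `Supplier<String>` stands in for `RequestIdSupplier`, and the `@RequestScoped` / `@Produces` wiring is only indicated in comments.

```java
import java.util.function.Supplier;

// Stand-in for a @RequestScoped holder bean (one instance per request in real CDI).
final class RequestIdHolderSketch {
  private String requestId;

  // In the PR, a request filter writes the ID on each request.
  void set(String requestId) {
    this.requestId = requestId;
  }

  String get() {
    return requestId;
  }

  // Stand-in for a CDI producer method: consumers depend on the supplier only,
  // not on JAX-RS types. The supplier reads the holder lazily, at call time.
  Supplier<String> asSupplier() {
    return this::get;
  }
}
```

Because the supplier defers to the holder on every call, a consumer that obtained it before the filter ran still sees the ID written later in the same request.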

TaskContextPropagator is a package-private helper that captures realm, principal, and request ID on the request thread and restores them into the task thread's fresh CDI request scope. It directly injects RealmContextHolder, PolarisPrincipalHolder, and RequestIdHolder. No new SPI or extension point is introduced. The implementation follows the same pattern as Bootstrapper.
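
The capture/restore flow can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the PR's implementation: the `ThreadLocal`-backed `Holder` stands in for the `@RequestScoped` holder beans, realm and principal are reduced to strings, and `restore(...)`, `clear()`, and `roundTrip(...)` are hypothetical names. Only `capture()` and `CapturedTaskContext` are names that appear in this PR.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for a request-scoped holder: one value per thread.
final class Holder<T> {
  private final ThreadLocal<T> value = new ThreadLocal<>();

  T get() { return value.get(); }
  void set(T v) { value.set(v); }
  void clear() { value.remove(); }
}

// Immutable snapshot taken on the request thread (simplified to strings).
record CapturedTaskContext(String realmId, String principalName, String requestId) {}

final class TaskContextPropagatorSketch {
  final Holder<String> realm = new Holder<>();
  final Holder<String> principal = new Holder<>();
  final Holder<String> requestId = new Holder<>();

  // Runs on the request thread, before the task is submitted.
  CapturedTaskContext capture() {
    return new CapturedTaskContext(realm.get(), principal.get(), requestId.get());
  }

  // Runs on the worker thread, inside its fresh (empty) scope.
  void restore(CapturedTaskContext captured) {
    realm.set(captured.realmId());
    principal.set(captured.principalName());
    requestId.set(captured.requestId());
  }

  void clear() {
    realm.clear();
    principal.clear();
    requestId.clear();
  }

  // Round trip: capture on the caller thread, restore and read on a worker thread.
  static String roundTrip(String realmId, String principalName, String reqId) {
    TaskContextPropagatorSketch propagator = new TaskContextPropagatorSketch();
    propagator.realm.set(realmId);
    propagator.principal.set(principalName);
    propagator.requestId.set(reqId);
    CapturedTaskContext captured = propagator.capture();

    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<String> result = executor.submit(() -> {
        propagator.restore(captured);
        try {
          return propagator.realm.get() + "/" + propagator.principal.get()
              + "/" + propagator.requestId.get();
        } finally {
          propagator.clear(); // do not leak values into the next task on this thread
        }
      });
      return result.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }
}
```

The worker thread starts with empty holders, so without the `restore` call every read on that thread would return `null`.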

CurrentRequestManager is no longer referenced anywhere in the codebase.

Out of scope (follow-up candidates)

MDC propagation: request ID is not currently written to SLF4J MDC on task threads. Can be added in a follow-up.
X-Request-ID header validation: client-supplied header is used verbatim. Pre-existing behavior in RequestIdFilter, not introduced by this PR.

Checklist

Don't disclose security issues! (contact security@apache.org)
Clearly explained why the changes are needed, or linked related issues: Fixes #3444 (Provide a robust CDI way to inject request_ids)
Added/updated tests with good coverage, or manually tested (and explained how)
- Unit tests for TaskContextPropagator (capture, restore, round-trip)
- TaskExecutorImplTest updated for new constructor signature
Added comments for complex logic
Updated CHANGELOG.md (if needed)
Updated documentation in site/content/in-dev/unreleased (if needed)

Disclaimer

Javadoc was written mainly with the help of a coding agent.

dimas-b

Hi @flyingImer ! Good idea to normalize the code that deals with async context values propagation! Some comments and suggestions below.

jbonofre

Overall good to me.

Can you please fix the spotless issues ? I will do a new pass after.

Thanks !

flyingImer · 2026-03-30T19:47:30Z

Thanks @dimas-b and @jbonofre. I’ve addressed the earlier feedback and fixed the CI issues. The latest run is green now.

When you have a chance, could you please take another look at the current diff? Thanks!

flyingImer · 2026-04-02T17:15:27Z

@dimas-b I think your latest comments convinced me the shape issue is central enough to address in this PR.

My current cut is to keep the scope narrow, but switch this SPI to a state/action model so callers no longer deal with raw Object state, and TaskExecutorImpl only deals with one captured object per propagator.

I may still keep a generic cleanup hook with a default no-op, but the intent there would be generic lifecycle cleanup, not letting MDC shape the main propagation contract.

I also plan to clean up the smaller @Unremovable and RequestIdHolder points while touching this area.

Does that sound like the right cut?

dimas-b · 2026-04-02T20:42:35Z

@flyingImer : I'd prefer to keep MDC out of this PR (feel free to improve MDC data in a follow-up PR). I believe it is a totally different concern, not related to Request Context. Let's keep this PR focused on CDI concerns.

Looking forward to a new diff.

dimas-b

Thanks for bearing with me, @flyingImer 🙂

dimas-b · 2026-04-03T23:48:55Z

+   */
+  CapturedTaskContext capture() {
+    return new CapturedTaskContext(
+        realmContextHolder.get(),


nit: for good measure, it might be best to create a copy for the realm context too. It's similar to the principal in many aspects.

Request ID is just a String, so it's fine to reuse it.

Thanks for calling this out! Let me tackle it in a follow up pr later

callContext is cloned at submission and already carries the realm context we care about, so CapturedTaskContext doesn't need to store realm. In fact, downstream code uses the one from the callContext in multiple places:

https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L215.

https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L290

Looks like we didn't change these places. I'd suggest not having the realmContext here, to avoid future fragmentation, since we now have two ways to access realmContext.

You're right that realm currently flows through two paths on the worker (callContext.getRealmContext() at L206/L276 and the holder via captured state). The fragmentation is real but between the clone path (pre-existing) and the holder path (this PR), not within CapturedTaskContext itself.

Holder propagation is structurally required regardless: RangerPolarisAuthorizerFactory and PolarisEventMetadataFactory @Inject RealmContext directly, so the holder must be populated even if callContext is also passed.

Current CapturedTaskContext shape ({realm, principal, requestId}) matches the follow-up's target state. The CallContext.copy() removal follow-up (thread on TaskExecutorImpl.java:148) drops the callContext parameter and has the worker produce CallContext via CDI, collapsing the two paths into one without changing CapturedTaskContext.

Moving realm out now would require adding it back in the follow-up. Principal precedent: principal is captured as a value, not bundled in a bean, so keeping realm symmetric aligns with existing convention.

My comment is NOT related to holder and principal. It is about passing the realm context twice. Why do we need to add the realm context back? We can always access it via the call context.

The two realm paths today are a transitional overlap: (a) callContext clone parameter, (b) CapturedTaskContext → RealmContextHolder. Path (b) is the mechanism taking over, path (a) is the legacy being retired.

After the follow-up (thread), callContext stops being a cross-thread parcel on the worker side. It becomes an @Inject-produced bean in the worker's request scope, the same role it has on HTTP threads. CDI produces it using RealmContext from the holder that TaskContextPropagator populated. At that point, the only mechanism for realm to reach the worker is captured → holder → CDI. Realm must be in CapturedTaskContext for that to work.

So the current shape ({realm, principal, requestId} captured) is the end-state shape, not a duplication choice. Dropping realm from CapturedTaskContext in this PR would need to re-add it in the follow-up.

@flyrain Stepping back to make sure we're aligned on both the near-term shape and the longer-term direction.

Why CapturedTaskContext.realmContext stays

Reading your concern as "don't let worker threads grow a second realm access path on top of what CallContext already provides" — that's a legitimate design principle, and I think the PR's shape actually serves it rather than fighting it.

RealmContextHolder isn't introduced here. It's been in the repo since the initial commit, and existing HTTP-path code already injects realm through it (e.g. RangerPolarisAuthorizerFactory, PolarisEventMetadataFactory). What PR #4061 adds is a capture/restore pair that populates the holder on the worker's fresh request scope, so the existing CDI convention keeps working across the async boundary instead of breaking at it.

CapturedTaskContext.realmContext is specifically the value that feeds that holder on the worker side. Dropping it would leave the worker's holder empty, which breaks not just @Inject RealmContext but also the CallContext producer itself (the producer reads RealmContext through the holder). So it's load-bearing, not additive.

What the follow-up collapses

The current two-path situation on the worker (clone .getRealmContext() vs holder-backed access) is transitional. The follow-up @adutra acked does:

- Drop the CallContext parameter from the three worker methods (tryHandleTask, handleTaskWithTracing, handleTask)
- Replace it with @Inject CallContext as a field
- Drop the callContext.copy() call in addTaskHandlerContext

After that, the worker's CallContext is produced by CDI from the populated holder, the same way HTTP threads produce it today. There is one data source (RealmContextHolder) and two access forms (@Inject RealmContext and ctx.getRealmContext()) that converge on the same instance. This is the same composition pattern the HTTP path already uses.
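
A small plain-Java sketch of that end state, where both access forms resolve to the same instance. The types are simplified stand-ins rather than the Polaris classes (the real `CallContext` exposes more than the realm), and `WorkerScopeSketch` with its `produce...` methods is a hypothetical substitute for CDI producers.

```java
// Simplified stand-ins for the Polaris types discussed in this thread.
interface RealmContext {
  String getRealmIdentifier();
}

interface CallContext {
  RealmContext getRealmContext();
}

final class RealmContextHolder {
  private RealmContext realmContext;

  RealmContext get() { return realmContext; }
  void set(RealmContext realmContext) { this.realmContext = realmContext; }
}

// Hypothetical model of the worker's request scope after the follow-up.
final class WorkerScopeSketch {
  final RealmContextHolder holder = new RealmContextHolder();

  // Stand-in for `@Inject RealmContext`: reads the holder directly.
  RealmContext produceRealmContext() {
    return holder.get();
  }

  // Stand-in for the CallContext producer: built from the same holder-backed realm.
  CallContext produceCallContext() {
    RealmContext realm = produceRealmContext();
    return () -> realm;
  }
}
```

With the holder populated once on the worker, `produceRealmContext()` and `produceCallContext().getRealmContext()` return the identical object, which is the single-data-source property argued for above.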

If "one access pattern" is the goal

Worth naming that there's a direction here. The two forms aren't symmetric (the table below was created with the help of Claude Code):

| | `@Inject RealmContext` | `ctx.getRealmContext()` |
| --- | --- | --- |
| Source | `RealmContextHolder` | same, via `CallContext` producer |
| Instance identity | original | original (post-follow-up); lambda repackaging (today's clone path) |
| Dependency declared | narrow | full bag |
| CDI idiomatic | yes | wrapper |
| Covers every realm-access scenario | yes | yes |

@Inject RealmContext covers every scenario ctx.getRealmContext() does, and it's the narrower, more idiomatic form. The reverse direction (collapsing toward ctx.getRealmContext()) would require retrofitting every existing @Inject RealmContext site to go through CallContext, which widens dependencies and is the path the codebase has historically not taken.

So if we want to reduce to a single access API in the longer run, I'd suggest the right move is to deprecate CallContext.getRealmContext() in favor of @Inject RealmContext. Separate RFC, not in this PR's scope, but I'm happy to open it after this lands.

Want to confirm we're aligned on:

- CapturedTaskContext.realmContext stays in this PR (it's the infrastructure, not a duplicate)
- The follow-up collapses the transitional two-path state to one data source
- Any further collapse to a single access API is a separate RFC, and the direction is deprecating ctx.getRealmContext() rather than the other way around

If any of those three don't match your read, happy to dig in further.

I think CapturedTaskContext.realmContext should still be a fresh object reusing only the realm ID (String) from the parent request context.

RealmContext can be a CDI bean, and in that case it will not be reusable after its owner context is terminated.

I'm fine with dealing with CallContext in a follow-up PR.
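
One way to read the suggestion above, sketched in plain Java: take the realm ID on the request thread and wrap it in a fresh, detached object. `RealmContext` here is a simplified stand-in, and `ScopedRealmContext`, `endScope()`, and `detachedCopy(...)` are hypothetical names used only to simulate a scope-bound bean; none of this is the PR's code.

```java
// Simplified stand-in for the Polaris RealmContext type.
interface RealmContext {
  String getRealmIdentifier();
}

final class RealmContextCopies {
  // Simulates a scope-bound bean: usable only while its owning scope is active.
  static final class ScopedRealmContext implements RealmContext {
    private final String realmId;
    private boolean scopeActive = true;

    ScopedRealmContext(String realmId) {
      this.realmId = realmId;
    }

    void endScope() {
      scopeActive = false;
    }

    @Override
    public String getRealmIdentifier() {
      if (!scopeActive) {
        throw new IllegalStateException("owning scope has ended");
      }
      return realmId;
    }
  }

  // Reads the ID eagerly (on the request thread) and returns a detached copy
  // that holds only the String, so it outlives the source's scope.
  static RealmContext detachedCopy(RealmContext source) {
    String realmId = source.getRealmIdentifier();
    return () -> realmId;
  }
}
```

The detached copy keeps working after `endScope()`, while the original starts throwing, which is the failure mode the comment warns about when the captured object is itself a scoped bean.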

flyingImer · 2026-04-07T00:39:49Z

Hi @jbonofre, I addressed the earlier feedback and the latest CI is green now.
Dimas already approved the current diff.
When you have a chance, could you take a quick final look? Thanks!

adutra · 2026-04-08T13:00:44Z

Worth noting: this PR introduces a new TaskContextPropagator bean. The bean is functionally equivalent to a custom org.eclipse.microprofile.context.spi.ThreadContextProvider.

I wonder if it wouldn't be cleaner to just wire up a custom ThreadContextProvider instead of providing an equivalent class that must be invoked explicitly at the beginning of each task.

The custom ThreadContextProvider would still require to clear ThreadContext.CDI + @ActivateRequestContext + manual population of beans. So it wouldn't look radically different for sure, but we would at least benefit from implicit execution.

dimas-b · 2026-04-08T14:21:20Z

> I wonder if it wouldn't be cleaner to just wire up a custom ThreadContextProvider instead of providing an equivalent class that must be invoked explicitly at the beginning of each task.

+1

dimas-b · 2026-04-08T14:24:56Z

A note on the "Holder" class pattern (RealmContextHolder, PolarisPrincipalHolder, etc.):

These classes are meant to allow Polaris code to manage corresponding request-scoped data without relying on the REST framework (ContainerRequestContext) during async task execution.

flyingImer · 2026-04-22T21:39:45Z

> Worth noting: this PR introduces a new TaskContextPropagator bean. The bean is functionally equivalent to a custom org.eclipse.microprofile.context.spi.ThreadContextProvider.
>
> I wonder if it wouldn't be cleaner to just wire up a custom ThreadContextProvider instead of providing an equivalent class that must be invoked explicitly at the beginning of each task.
>
> The custom ThreadContextProvider would still require to clear ThreadContext.CDI + @ActivateRequestContext + manual population of beans. So it wouldn't look radically different for sure, but we would at least benefit from implicit execution.

@adutra @dimas-b
I spiked this. It can work, but the cost outweighs the benefit. You're right that .cleared(CDI) + @ActivateRequestContext + manual holder population stays the same either way. The difference is in what the SPI route adds on top IIUC.

- The provider is ServiceLoader-instantiated, so no @Inject. Holder lookups go through CDI.current(). That's livable on its own, but resolving the principal during capture triggers Mutiny context propagation, which re-enters the provider. You need a recursion guard to break the cycle.
- The provider also fires globally, not just for task submissions. Bootstrapper has no request scope when it submits, so you need a scope-active check. Same for any future executor user.
- There's also the question of who owns the worker's request scope. .cleared(CDI) and the custom provider both want to manage it. Either you rely on SmallRye's ordering between built-in and ServiceLoader providers, or the provider takes over entirely. Neither is clean.

Bottom line: ~50 lines of defensive code to remove ~3 explicit lines from TaskExecutorImpl. The explicit helper avoids all of that because it's a regular CDI bean.

The implicit execution benefit is real but thin here. We're propagating three app-specific values at one call site, not a cross-cutting concern. I'd keep the explicit helper. WDYT?
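
For readers unfamiliar with the recursion guard mentioned above, here is a generic sketch of the pattern in plain Java. It does not use the MicroProfile `ThreadContextProvider` API and is not the spiked code; `ReentranceGuard` and `captureOnce(...)` are hypothetical names.

```java
import java.util.function.Supplier;

final class ReentranceGuard {
  // Per-thread flag: true while a capture is in progress on this thread.
  private static final ThreadLocal<Boolean> CAPTURING = ThreadLocal.withInitial(() -> false);

  // Runs `capture` unless one is already running on this thread,
  // in which case `fallback` is returned to break the cycle.
  static <T> T captureOnce(Supplier<T> capture, T fallback) {
    if (CAPTURING.get()) {
      return fallback; // nested call triggered by the capture itself
    }
    CAPTURING.set(true);
    try {
      return capture.get();
    } finally {
      CAPTURING.set(false);
    }
  }
}
```

A nested `captureOnce` call made from inside the supplier returns the fallback instead of recursing, and the flag is reset afterwards so later captures on the same thread run normally.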

dimas-b · 2026-04-23T03:40:48Z

@flyingImer : so, you prefer to go with the current approach in this PR?

flyingImer · 2026-04-23T19:02:49Z

> @flyingImer : so, you prefer to go with the current approach in this PR?

@dimas-b yes, I believe this is a balanced approach at the moment

adutra · 2026-04-27T08:44:43Z

> That's livable on its own, but resolving the principal during capture triggers Mutiny context propagation, which re-enters the provider. You need a recursion guard to break the cycle.

That's true, I had to implement a re-entrance guard in my own attempt. Not the end of the world, but I agree that it adds some complexity.

OK, I think I'm fine with the current approach.

flyingImer · 2026-04-28T00:34:33Z

@dimas-b @adutra @jbonofre I think the group is aligned on the PR implementation direction, and it is now reflected in the PR. Could you please take another look once you get a chance? I would like to get it merged so that I can follow up with the other corresponding PRs.

flyingImer · 2026-04-28T21:34:24Z

@dimas-b @adutra @jbonofre Addressed the latest inline comments: TaskExecutorImpl is now package-private (a4750ba), and the CallContext.copy() removal is covered in the thread reply as a follow-up PR. CI is green, open threads resolved. Could you take a final pass?

flyrain

Thanks @flyingImer for the change. The PR looks great overall. The only major concern is the extra realmContext we passed into the task execution.

flyrain · 2026-04-30T01:41:03Z

+    return new CapturedTaskContext(
+        realmContextHolder.get(),
+        ImmutablePolarisPrincipal.builder().from(polarisPrincipal).build(),
+        requestIdHolder.get());


I think we should also include the callContext bean here. I'm fine with a followup though.

The direction agreed on the TaskExecutorImpl.java:148 thread is removing callContext from the worker path (follow-up PR), not bundling it into CapturedTaskContext. See the thread on TaskContextPropagator.java:75 for the fragmentation discussion.

Can you post the link? I don't think we can easily remove the call context. There are downstream references to its field PolarisCallContext.

adutra's ack on the follow-up direction: #4061 (comment)

To be precise: CallContext the class stays, and downstream code keeps calling ctx.getPolarisCallContext() / ctx.getRealmContext() unchanged. What changes is how callContext appears on the worker.

Today, the submitter clones callContext and passes the clone through tryHandleTask / handleTaskWithTracing / handleTask as a parameter. It's a plain object ferried across threads.

After the follow-up, the worker gets callContext the same way HTTP threads already get it: via @Inject CallContext, with CDI producing it in the worker's request scope from the populated holders. The role shifts from "parcel passed across the boundary" to "bean produced in the local scope", which is its normal role elsewhere in the codebase. Consumer code at the call site is unchanged.

flyrain · 2026-04-30T01:49:35Z

+   */
+  CapturedTaskContext capture() {
+    return new CapturedTaskContext(
+        realmContextHolder.get(),


callContext is cloned at submission and already carries the realm context we care about, so CapturedTaskContext doesn't need to store realm. In fact, downstream code uses the one from the callContext in multiple places:

https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L215.

https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L290

Looks like we didn't change these places. I'd suggest not having the realmContext here, to avoid future fragmentation, since we now have two ways to access realmContext.

dimas-b

LGTM overall, just a couple of minor concerns remaining from my side. One is below. I'll comment separately on the other one.

dimas-b · 2026-05-06T21:48:03Z

+    try {
+      return Optional.ofNullable(requestIdHolder.get());
+    } catch (ContextNotActiveException e) {
+      // No active request scope (e.g. background thread without @ActivateRequestContext).


I'm still hesitant about this... This method is called on creating a PolarisEventMetadata. If request context is not active at that time, it would be a logical (coding) mistake.

I tend to think we should not try to catch this exception. WDYT?
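
The two options under discussion can be contrasted in a small plain-Java sketch. `ScopeNotActiveException` is a local stand-in for `jakarta.enterprise.context.ContextNotActiveException`, `RequestIdLookupSketch` is a hypothetical class, and returning `Optional.empty()` from the catch block is an assumption, since the quoted hunk does not show that line.

```java
import java.util.Optional;

// Local stand-in for jakarta.enterprise.context.ContextNotActiveException.
final class ScopeNotActiveException extends RuntimeException {
  ScopeNotActiveException() {
    super("request scope is not active");
  }
}

final class RequestIdLookupSketch {
  private final boolean scopeActive;
  private final String requestId;

  RequestIdLookupSketch(boolean scopeActive, String requestId) {
    this.scopeActive = scopeActive;
    this.requestId = requestId;
  }

  // Simulates reading a request-scoped holder.
  private String readHolder() {
    if (!scopeActive) {
      throw new ScopeNotActiveException();
    }
    return requestId;
  }

  // Lenient variant: an inactive scope is mapped to "no request ID" (assumed behavior).
  Optional<String> lenient() {
    try {
      return Optional.ofNullable(readHolder());
    } catch (ScopeNotActiveException e) {
      return Optional.empty();
    }
  }

  // Strict variant: an inactive scope surfaces as an error at the call site.
  Optional<String> strict() {
    return Optional.ofNullable(readHolder());
  }
}
```

The lenient form hides a missing scope behind an empty result, while the strict form makes the coding mistake visible immediately, which is the trade-off raised in the comment above.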

github-project-automation Bot added this to Basic Kanban Board Mar 25, 2026

github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Mar 25, 2026

flyingImer mentioned this pull request Mar 25, 2026

Provide a robust CDI way to inject request_ids #3444

Open

flyingImer force-pushed the async branch from dfb3a00 to 0968869 Compare March 25, 2026 21:10

dimas-b reviewed Mar 25, 2026

flyingImer requested a review from dimas-b March 25, 2026 22:16

jbonofre reviewed Mar 27, 2026

flyingImer requested a review from jbonofre March 27, 2026 18:50

dimas-b reviewed Mar 30, 2026

flyingImer requested a review from dimas-b April 2, 2026 21:22

dimas-b reviewed Apr 2, 2026

flyingImer changed the title ~~feat(service): async context propagation SPI for task executor~~ feat(service): async context propagation for task executor Apr 3, 2026

flyingImer requested a review from dimas-b April 3, 2026 21:09

dimas-b previously approved these changes Apr 3, 2026

github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Apr 3, 2026

adutra reviewed Apr 7, 2026

dimas-b mentioned this pull request Apr 10, 2026

JDBC: Replace coarse-grained synchronized methods with per-realm locking #4054

Merged

6 tasks

flyingImer dismissed dimas-b’s stale review via f5b878e April 17, 2026 01:21

flyingImer force-pushed the async branch from e1e1005 to f5b878e Compare April 17, 2026 01:21

flyingImer force-pushed the async branch from f5b878e to 7853252 Compare April 22, 2026 21:52

flyingImer requested a review from adutra April 23, 2026 18:52

flyingImer requested a review from dimas-b April 23, 2026 18:52

flyingImer force-pushed the async branch 2 times, most recently from 757d98e to fbbd34e Compare April 27, 2026 22:47

adutra reviewed Apr 28, 2026

adutra previously approved these changes Apr 29, 2026

flyrain reviewed Apr 30, 2026

flyingImer dismissed adutra’s stale review via f55d384 April 30, 2026 17:47

flyingImer force-pushed the async branch from a4750ba to f55d384 Compare April 30, 2026 17:47

flyingImer added 13 commits May 5, 2026 19:14

feat(service): async context propagation SPI for task executor

215ddd9

docs: add async context propagation to changelog

01680a7

refactor(service): align RequestIdHolder with sibling holder patterns

8cab873

chore(service): align test method names with repo testXxx convention

1fbd057

chore(service): apply spotless formatting

0369536

chore: incorporate with comments

8f52108

fix: CI

512595c

refactor(service): update AsyncContextPropagator to use RestoreAction for context management

3f770ad

refactor(service): replace async context propagation SPI with concrete helper

bbeb784

refactor(service): use RequestIdHolder in event metadata factory

dadc46f

refactor(service): make CapturedTaskContext public and top-level

4aba4b0

Move CapturedTaskContext from a nested record in TaskContextPropagator to a standalone public record, as it is exposed in the protected handleTaskWithTracing method signature.

refactor(service): make TaskExecutorImpl package-private

2c4eb2f

docs(service): trim RequestIdHolder javadoc to contract-only

59e4389

flyingImer force-pushed the async branch from f55d384 to 59e4389 Compare May 5, 2026 19:14

dimas-b reviewed May 6, 2026
