feat(service): async context propagation for task executor#4061
flyingImer wants to merge 13 commits into
dimas-b left a comment
Hi @flyingImer! Good idea to normalize the code that deals with propagating async context values! Some comments and suggestions below.
jbonofre left a comment
Overall looks good to me.
Can you please fix the Spotless issues? I will do another pass after that.
Thanks!
@dimas-b I think your latest comments convinced me that the shape issue is central enough to address in this PR. My current cut is to keep the scope narrow, but switch this SPI to a state/action model so callers no longer deal with raw …

I may still keep a generic cleanup hook with a default no-op, but the intent there would be generic lifecycle cleanup, not letting MDC shape the main propagation contract. I also plan to clean up the smaller …

Does that sound like the right cut?
@flyingImer: I'd prefer to keep MDC out of this PR (feel free to improve MDC data in a follow-up PR). I believe it is a totally different concern, unrelated to the request context. Let's keep this PR focused on CDI concerns. Looking forward to a new diff.
dimas-b left a comment
Thanks for bearing with me, @flyingImer 🙂
```java
 */
CapturedTaskContext capture() {
  return new CapturedTaskContext(
      realmContextHolder.get(),
```
nit: for good measure, it might be best to create a copy for the realm context too. It's similar to the principal in many aspects.
Request ID is just a String, so it's fine to reuse it.
Thanks for calling this out! Let me tackle it in a follow-up PR.
callContext is cloned at submission and already carries the realm context we care about, so CapturedTaskContext doesn't need to store realm. In fact, downstream code uses the one from callContext in multiple places:
- https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L215.
- https://github.com/flyrain/polaris/blob/cfb175441de39bae1bdc5b362958b2029e56b614/runtime/service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L290
Looks like we didn't change these places. I'd suggest not keeping realmContext here, to avoid future fragmentation, since we now have two ways to access realmContext.
You're right that realm currently flows through two paths on the worker (callContext.getRealmContext() at L206/L276 and the holder via captured state). The fragmentation is real, but it lies between the clone path (pre-existing) and the holder path (this PR), not within CapturedTaskContext itself.
Holder propagation is structurally required regardless: RangerPolarisAuthorizerFactory and PolarisEventMetadataFactory @Inject RealmContext directly, so the holder must be populated even if callContext is also passed.
The current CapturedTaskContext shape ({realm, principal, requestId}) matches the follow-up's target state. The CallContext.copy() removal follow-up (thread on TaskExecutorImpl.java:148) drops the callContext parameter and has the worker produce CallContext via CDI, collapsing the two paths into one without changing CapturedTaskContext.
Moving realm out now would require adding it back in the follow-up. Principal precedent: principal is captured as a value, not bundled in a bean, so keeping realm symmetric aligns with existing convention.
My comment is NOT related to the holder or the principal. It is about passing the realm context twice. Why would we need to add the realm context back? We can always access it via the call context.
The two realm paths today are a transitional overlap: (a) the callContext clone parameter, and (b) CapturedTaskContext → RealmContextHolder. Path (b) is the mechanism taking over; path (a) is the legacy being retired.
After the follow-up (thread), callContext stops being a cross-thread parcel on the worker side. It becomes an @Inject-produced bean in the worker's request scope, the same role it has on HTTP threads. CDI produces it using RealmContext from the holder that TaskContextPropagator populated. At that point, the only mechanism for realm to reach the worker is captured → holder → CDI. Realm must be in CapturedTaskContext for that to work.
So the current shape ({realm, principal, requestId} captured) is the end-state shape, not a duplication choice. Dropping realm from CapturedTaskContext in this PR would only mean re-adding it in the follow-up.
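A minimal sketch of the captured → holder hand-off described above, in plain Java with no CDI container. Holder, Principal, and TaskContextPropagatorSketch are hypothetical simplifications, not the PR's code; the real TaskContextPropagator injects RealmContextHolder, PolarisPrincipalHolder, and RequestIdHolder, and its exact signatures may differ. What it shows is that the worker's holders stay empty until restore() runs, so realm has to travel inside CapturedTaskContext for anything downstream to read it.

```java
import java.util.Objects;

// Hypothetical stand-in for Polaris's RealmContext (assumed to expose the realm ID).
interface RealmContext {
  String getRealmIdentifier();
}

// Hypothetical stand-in for the authenticated principal.
record Principal(String name) {}

// Simplified holder: the real holders are request-scoped CDI beans.
final class Holder<T> {
  private T value;

  void set(T newValue) {
    value = Objects.requireNonNull(newValue, "value");
  }

  T get() {
    return value;
  }
}

// The three values captured on the request thread.
record CapturedTaskContext(RealmContext realmContext, Principal principal, String requestId) {}

final class TaskContextPropagatorSketch {
  private TaskContextPropagatorSketch() {}

  // Runs on the submitting (request) thread.
  static CapturedTaskContext capture(
      Holder<RealmContext> realm, Holder<Principal> principal, Holder<String> requestId) {
    return new CapturedTaskContext(realm.get(), principal.get(), requestId.get());
  }

  // Runs on the worker thread, against the empty holders of its fresh request scope.
  static void restore(
      CapturedTaskContext captured,
      Holder<RealmContext> realm,
      Holder<Principal> principal,
      Holder<String> requestId) {
    realm.set(captured.realmContext());
    principal.set(captured.principal());
    if (captured.requestId() != null) { // the request ID is optional in this sketch
      requestId.set(captured.requestId());
    }
  }
}
```

On the request thread you would call capture(...) once at submission; inside the worker's newly activated request scope, restore(...) fills the holders before any task code runs.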
@flyrain Stepping back to make sure we're aligned on both the near-term shape and the longer-term direction.
Why CapturedTaskContext.realmContext stays
Reading your concern as "don't let worker threads grow a second realm access path on top of what CallContext already provides" — that's a legitimate design principle, and I think the PR's shape actually serves it rather than fighting it.
RealmContextHolder isn't introduced here. It's been in the repo since the initial commit, and existing HTTP-path code already injects realm through it (e.g. RangerPolarisAuthorizerFactory, PolarisEventMetadataFactory). What PR #4061 adds is a capture/restore pair that populates the holder on the worker's fresh request scope, so the existing CDI convention keeps working across the async boundary instead of breaking at it.
CapturedTaskContext.realmContext is specifically the value that feeds that holder on the worker side. Dropping it would leave the worker's holder empty, which breaks not just @Inject RealmContext but also the CallContext producer itself (the producer reads RealmContext through the holder). So it's load-bearing, not additive.
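To make the load-bearing claim concrete, here is a hypothetical producer that, like the one described above, reads realm only through the holder. The types are simplified stand-ins (the real CallContext carries more state, and the real holder API may differ). With an unpopulated holder, production fails instead of quietly yielding a CallContext without a realm.

```java
import java.util.Objects;

// Hypothetical stand-ins; not the real Polaris types.
interface RealmContext {
  String getRealmIdentifier();
}

final class CallContext {
  private final RealmContext realmContext;

  CallContext(RealmContext realmContext) {
    this.realmContext = Objects.requireNonNull(realmContext);
  }

  RealmContext getRealmContext() {
    return realmContext;
  }
}

final class RealmContextHolder {
  private RealmContext value;

  void set(RealmContext newValue) {
    value = Objects.requireNonNull(newValue);
  }

  RealmContext get() {
    if (value == null) {
      // Nothing was restored into this scope, e.g. realm was dropped from the captured state.
      throw new IllegalStateException("RealmContext not set in this request scope");
    }
    return value;
  }
}

// Sketch of a producer that reads realm only through the holder.
final class CallContextProducerSketch {
  private CallContextProducerSketch() {}

  static CallContext produce(RealmContextHolder holder) {
    return new CallContext(holder.get());
  }
}
```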
What the follow-up collapses
The current two-path situation on the worker (clone .getRealmContext() vs holder-backed access) is transitional. The follow-up @adutra acked does:
- Drop the CallContext parameter from the three worker methods (tryHandleTask, handleTaskWithTracing, handleTask)
- Replace it with @Inject CallContext as a field
- Drop the callContext.copy() call in addTaskHandlerContext
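A before/after sketch of the worker-side change listed above, using hypothetical simplified types rather than the PR's actual code. WorkerToday receives the copied CallContext as a parameter; WorkerAfterFollowUp obtains it from its own scope. The Supplier is an assumption standing in for the client proxy CDI would inject for an @Inject CallContext field, assuming CallContext is produced as a request-scoped bean.

```java
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins; the real worker methods take more parameters.
interface RealmContext {
  String getRealmIdentifier();
}

final class CallContext {
  private final RealmContext realmContext;

  CallContext(RealmContext realmContext) {
    this.realmContext = realmContext;
  }

  RealmContext getRealmContext() {
    return realmContext;
  }
}

// Today: the submitter copies CallContext and ferries it across threads as a parameter.
final class WorkerToday {
  String handleTask(long taskId, CallContext callContext) {
    return taskId + "@" + callContext.getRealmContext().getRealmIdentifier();
  }
}

// After the follow-up: CallContext is resolved in the worker's own scope.
// The Supplier stands in for the CDI client proxy behind an @Inject field.
final class WorkerAfterFollowUp {
  private final Supplier<CallContext> callContext;

  WorkerAfterFollowUp(Supplier<CallContext> callContext) {
    this.callContext = callContext;
  }

  String handleTask(long taskId) {
    return taskId + "@" + callContext.get().getRealmContext().getRealmIdentifier();
  }
}
```

The consumer code inside handleTask is the same in both versions; only the origin of the CallContext changes.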
After that, the worker's CallContext is produced by CDI from the populated holder, the same way HTTP threads produce it today. That leaves one data source (RealmContextHolder) and two access styles (@Inject RealmContext and ctx.getRealmContext()) that converge on the same instance, which is the composition pattern the HTTP path already uses.
If "one access pattern" is the goal
It's worth naming the direction here. The two access styles aren't symmetric (the table below was created with help from Claude Code):
| | @Inject RealmContext | ctx.getRealmContext() |
|---|---|---|
| Source | RealmContextHolder | same, via CallContext producer |
| Instance identity | original | original (post-follow-up); lambda repackaging (today's clone path) |
| Dependency declared | narrow | full bag |
| CDI idiomatic | yes | wrapper |
| Covers every realm-access scenario | yes | yes |
@Inject RealmContext covers every scenario ctx.getRealmContext() does, and it's the narrower, more idiomatic form. The reverse direction (collapsing toward ctx.getRealmContext()) would require retrofitting every existing @Inject RealmContext site to go through CallContext, which widens dependencies and is the path the codebase has historically not taken.
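The "Dependency declared" and "Instance identity" rows can be illustrated with two hypothetical consumers: one declares only RealmContext, the other takes the whole CallContext to reach the same value. When the CallContext is produced from the holder (the post-follow-up state), both end up holding the same instance. All types here are simplified stand-ins.

```java
import java.util.Objects;

// Hypothetical stand-ins; not the real Polaris types.
interface RealmContext {
  String getRealmIdentifier();
}

final class RealmContextHolder {
  private RealmContext value;

  void set(RealmContext newValue) {
    value = Objects.requireNonNull(newValue);
  }

  RealmContext get() {
    return Objects.requireNonNull(value, "realm not set");
  }
}

final class CallContext {
  // In Polaris, CallContext carries more than the realm (e.g. PolarisCallContext); omitted here.
  private final RealmContext realmContext;

  CallContext(RealmContext realmContext) {
    this.realmContext = realmContext;
  }

  RealmContext getRealmContext() {
    return realmContext;
  }
}

// Narrow dependency: declares exactly what it uses.
final class NarrowConsumer {
  private final RealmContext realm;

  NarrowConsumer(RealmContext realm) {
    this.realm = realm;
  }

  RealmContext realm() {
    return realm;
  }
}

// Wide dependency: takes the whole CallContext to reach one value.
final class WideConsumer {
  private final CallContext ctx;

  WideConsumer(CallContext ctx) {
    this.ctx = ctx;
  }

  RealmContext realm() {
    return ctx.getRealmContext();
  }
}
```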
So if we want to reduce to a single access API in the long run, I'd suggest deprecating CallContext.getRealmContext() in favor of @Inject RealmContext. That would be a separate RFC, outside this PR's scope, but I'm happy to open it after this lands.
Want to confirm we're aligned on:
- CapturedTaskContext.realmContext stays in this PR (it's the infrastructure, not a duplicate)
- The follow-up collapses the transitional two-path state to one data source
- Any further collapse to a single access API is a separate RFC, and the direction is deprecating ctx.getRealmContext() rather than the other way around
If any of those three don't match your read, happy to dig in further.
I think CapturedTaskContext.realmContext should still be a fresh object reusing only the realm ID (String) from the parent request context.
RealmContext can be a CDI bean, and in that case it will not be reusable after its owner context is terminated.
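A sketch of this suggestion: build a detached RealmContext that keeps only the realm ID string, read eagerly while the source scope is still active. The RealmContext interface here is a stand-in that assumes the realm ID is exposed via getRealmIdentifier(), and RealmContextCopies.detachedCopy is a hypothetical helper name.

```java
// Hypothetical stand-in for Polaris's RealmContext (assumed to expose the realm ID as a String).
interface RealmContext {
  String getRealmIdentifier();
}

final class RealmContextCopies {
  private RealmContextCopies() {}

  // Returns a RealmContext that holds only the realm ID string, so it stays usable after the
  // scope that owned the original (possibly a CDI bean) has been destroyed.
  static RealmContext detachedCopy(RealmContext original) {
    String realmId = original.getRealmIdentifier(); // read now, while the source scope is active
    return () -> realmId;
  }
}
```

In capture(), the value from realmContextHolder.get() would then go through detachedCopy(...) before being stored, mirroring how the principal is copied with ImmutablePolarisPrincipal.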
I'm fine with dealing with CallContext in a follow-up PR.
Hi @jbonofre, I addressed the earlier feedback and the latest CI is now green.
Worth noting: this PR introduces a new … I wonder if it wouldn't be cleaner to just wire up a custom … The custom …
+1
A note on the "Holder" class pattern (…): These classes are meant to allow Polaris code to manage corresponding request-scoped data without relying on the REST framework (…).
@adutra @dimas-b
Bottom line: ~50 lines of defensive code to remove ~3 explicit lines from TaskExecutorImpl. The explicit helper avoids all of that because it's a regular CDI bean. The implicit execution benefit is real but thin here: we're propagating three app-specific values at one call site, not a cross-cutting concern. I'd keep the explicit helper. WDYT?
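For comparison, the explicit-helper shape argued for above comes down to one capture on the submitting thread and one restore inside the task. This sketch uses a real ExecutorService but hypothetical stand-ins: RequestIdHolder here is a plain class (the real one is a @RequestScoped CDI bean), CapturedTaskContext has a single field, and ExplicitPropagationSketch is an illustrative name. A fresh holder per task approximates the fresh request scope; the real helper also restores realm and principal.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical stand-in: a per-scope holder (plain class, not a CDI bean).
final class RequestIdHolder {
  private String requestId;

  void set(String id) {
    requestId = id;
  }

  String get() {
    return requestId;
  }
}

// Captured values, reduced to the request ID for brevity.
record CapturedTaskContext(String requestId) {}

final class ExplicitPropagationSketch {
  private ExplicitPropagationSketch() {}

  // On the submitting thread: read the values explicitly, once.
  static CapturedTaskContext capture(RequestIdHolder requestScope) {
    return new CapturedTaskContext(requestScope.get());
  }

  // Submits a task that restores the captured values into a fresh "scope" on the worker thread.
  static <T> Future<T> submit(
      ExecutorService executor, CapturedTaskContext captured, Function<RequestIdHolder, T> task) {
    return executor.submit(() -> {
      RequestIdHolder workerScope = new RequestIdHolder(); // stands in for a fresh request scope
      workerScope.set(captured.requestId()); // explicit restore, visible at the call site
      return task.apply(workerScope);
    });
  }
}
```

Because every step is an ordinary method call, there is no executor-wide hook to guard against re-entrance; the trade-off is that each submission site has to call capture and restore itself.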
@flyingImer: so you prefer to go with the current approach in this PR?
@dimas-b Yes, I believe this is a balanced approach for now.
That's true, I had to implement a re-entrancy guard in my own attempt. Not the end of the world, but I agree that it adds some complexity. OK, I think I'm fine with the current approach.
Force-pushed from 757d98e to fbbd34e
flyrain left a comment
Thanks @flyingImer for the change. The PR looks great overall. The only major concern is the extra realmContext we pass into the task execution.
```java
return new CapturedTaskContext(
    realmContextHolder.get(),
    ImmutablePolarisPrincipal.builder().from(polarisPrincipal).build(),
    requestIdHolder.get());
```
I think we should also include the callContext bean here. I'm fine with a followup though.
The direction agreed on the TaskExecutorImpl.java:148 thread is removing callContext from the worker path (follow-up PR), not bundling it into CapturedTaskContext. See the thread on TaskContextPropagator.java:75 for the fragmentation discussion.
Can you post the link? I don't think we can easily remove the call context. There are downstream references to its PolarisCallContext field.
adutra's ack on the follow-up direction: #4061 (comment)
To be precise: CallContext the class stays, and downstream code keeps calling ctx.getPolarisCallContext() / ctx.getRealmContext() unchanged. What changes is how callContext appears on the worker.
Today, the submitter clones callContext and passes the clone through tryHandleTask / handleTaskWithTracing / handleTask as a parameter. It's a plain object ferried across threads.
After the follow-up, the worker gets callContext the same way HTTP threads already get it: via @Inject CallContext, with CDI producing it in the worker's request scope from the populated holders. The role shifts from "parcel passed across the boundary" to "bean produced in the local scope", which is its normal role elsewhere in the codebase. Consumer code at the call site is unchanged.
… for context management
Move CapturedTaskContext from a nested record in TaskContextPropagator to a standalone public record, as it is exposed in the protected handleTaskWithTracing method signature.
dimas-b left a comment
LGTM overall, just a couple of minor concerns remaining from my side. One is below. I'll comment separately on the other one.
```java
try {
  return Optional.ofNullable(requestIdHolder.get());
} catch (ContextNotActiveException e) {
  // No active request scope (e.g. background thread without @ActivateRequestContext).
```
I'm still hesitant about this. This method is called when creating a PolarisEventMetadata. If the request context is not active at that time, it would be a logic (coding) error.
I tend to think we should not try to catch this exception. WDYT?
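The two options under discussion, side by side, as a runnable sketch. ContextNotActiveException is a local stand-in for jakarta.enterprise.context.ContextNotActiveException (an unchecked exception), and ScopedRequestIdHolder mimics a request-scoped bean accessed with no active scope; both are hypothetical. The lenient variant turns a missing scope into "no request ID", while the strict variant lets the exception surface as the coding error described in the comment.

```java
import java.util.Optional;

// Stand-in for jakarta.enterprise.context.ContextNotActiveException, so no CDI container is needed.
class ContextNotActiveException extends RuntimeException {
  ContextNotActiveException(String message) {
    super(message);
  }
}

// Hypothetical holder whose get() fails when no request scope is active,
// similar to calling a request-scoped bean's client proxy outside a request context.
final class ScopedRequestIdHolder {
  private final boolean scopeActive;
  private final String requestId;

  ScopedRequestIdHolder(boolean scopeActive, String requestId) {
    this.scopeActive = scopeActive;
    this.requestId = requestId;
  }

  String get() {
    if (!scopeActive) {
      throw new ContextNotActiveException("No active request scope");
    }
    return requestId;
  }
}

final class RequestIdLookup {
  private RequestIdLookup() {}

  // Lenient variant: a missing scope silently becomes "no request ID".
  static Optional<String> lenient(ScopedRequestIdHolder holder) {
    try {
      return Optional.ofNullable(holder.get());
    } catch (ContextNotActiveException e) {
      return Optional.empty();
    }
  }

  // Strict variant: a missing scope propagates to the caller as an error.
  static Optional<String> strict(ScopedRequestIdHolder holder) {
    return Optional.ofNullable(holder.get());
  }
}
```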
This PR adds RequestIdHolder and a concrete context propagation helper for TaskExecutorImpl. Fixes #3444
Problem
When TaskExecutorImpl schedules async work, the task runs on a different thread with a fresh CDI request scope. Request-scoped context (realm, principal, request ID) was previously propagated via ad-hoc hardcoded logic, and request IDs were not propagated at all, since the only way to read them was through RESTEasy's internal CurrentRequestManager API, which is unavailable on task threads.
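The core problem can be reproduced in miniature. In this analogy a ThreadLocal stands in for thread-bound request-scoped state (it is not how the CDI request scope is actually implemented): a value set on the submitting thread is simply not visible to a task running on a pool thread, so it has to be captured and handed over explicitly. LostContextDemo is a hypothetical name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

// Analogy only: a ThreadLocal stands in for thread-bound request-scoped state.
final class LostContextDemo {
  private LostContextDemo() {}

  static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

  // Sets a request ID on the calling ("request") thread and returns what a pooled task sees.
  static String requestIdSeenByTask(ExecutorService pool)
      throws InterruptedException, ExecutionException {
    REQUEST_ID.set("req-1");
    try {
      Callable<String> readOnWorker = REQUEST_ID::get; // get() runs on the pool thread
      return pool.submit(readOnWorker).get();
    } finally {
      REQUEST_ID.remove(); // clean up the request thread
    }
  }
}
```

The task returns null: the pool thread has its own, empty slot.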
Solution
RequestIdHolder is a new @RequestScoped CDI bean replacing the removed ServiceProducers.requestIdSupplier() that depended on RESTEasy internals. It produces RequestIdSupplier via CDI so any component can inject it without depending on JAX-RS types. RequestIdFilter now writes to this holder on each request.
TaskContextPropagator is a package-private helper that captures realm, principal, and request ID on the request thread and restores them into the task thread's fresh CDI request scope. It directly injects RealmContextHolder, PolarisPrincipalHolder, and RequestIdHolder. No new SPI or extension point is introduced. The implementation follows the same pattern as Bootstrapper.
CurrentRequestManager is no longer referenced anywhere in the codebase.
Out of scope (follow-up candidates)
Checklist
- Unit tests for TaskContextPropagator (capture, restore, round-trip)
- TaskExecutorImplTest updated for new constructor signature
Disclaimer
Javadoc was mainly written with the assistance of a coding agent.