[improve][fn] PIP-484: Expose incremental window events via IncrementalWindowFunction#25967
Dream95 wants to merge 1 commit into
…talWindowFunction Signed-off-by: Dream95 <zhou_8621@163.com>
overall LGTM, just 2 points:
david-streamlio left a comment
Review of PIP-484. Overall this is a clear, well-scoped proposal — good background, an honest backward-compat analysis, a concrete example, and a diagram; it passes the "can a reader understand it without hours of code reading" health check. The inline comments below are (1) a few required template sections that are missing and (2) some public-API design points to nail down. I've left out the two points already raised on the PR (the getFunctionClassParent NPE dependency and list mutability).
| ## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations | ||
| There is no wire-protocol change between Functions Workers. No special geo-replication considerations apply. |
The template requires three sections that are currently missing — could you add them, even if brief?
- Security Considerations: this is a pure API addition with no new endpoints, so a sentence confirming "no new REST/protocol surface, no new auth or multi-tenancy implications" is enough.
- Monitoring / Metrics: please state explicitly "no new metrics; runtime behavior is unchanged."
- Alternatives: the most important one. Why a new interface rather than (a) `default` methods on `WindowFunction`, (b) an overloaded `process(Window, ...)`, or (c) a config flag? Documenting why these were rejected will pre-empt the obvious review questions. It's also the right place to defend the name `IncrementalWindowFunction`, since it exposes expired events too, not just increments.
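For reference, alternative (a) could look roughly like the sketch below. This is an illustration only, not code from the PIP: `WindowLike` and `WindowFnA` are hypothetical, simplified stand-ins (the real `WindowFunction.process` takes further parameters, shown only as `...` in the PR description), and the method name `processWindow` is made up for the example.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the proposed Window<T>; only get() is mirrored here. */
interface WindowLike<T> {
    List<T> get();
}

/** Hypothetical, simplified stand-in for WindowFunction; not the real Pulsar interface. */
interface WindowFnA<X, T> {
    T process(Collection<X> input);

    /** Alternative (a): an opt-in hook whose default falls back to the full collection. */
    default T processWindow(WindowLike<X> window) {
        return process(window.get());
    }
}

/** An existing implementation that knows nothing about the new hook. */
final class LegacyCount implements WindowFnA<String, Integer> {
    @Override
    public Integer process(Collection<String> input) {
        return input.size();
    }
}
```

Under these assumptions, `LegacyCount` compiles unchanged and still receives the full collection, while a new implementation could override `processWindow` to read the incremental lists. Whether this holds for the real interface depends on details the sketch leaves out.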
There was a problem hiding this comment.
Option A seems better than the current approach of adding new interfaces. Let me think about whether there are any compatibility issues.
| } | ||
| ``` | ||
| The existing internal `Window.java` is replaced by a reference to the `api-java` interface (or removed entirely, with `WindowImpl` implementing the new public interface directly). |
Promoting an internal type to public API is exactly the surface the PIP process exists to scrutinize, so this shouldn't be left as an either/or ("replaced by a reference … or removed entirely"). Please commit to one approach and spell out what happens to any existing references to the old org.apache.pulsar.functions.windowing.Window (even though it's an internal package today).
| * @param inputWindow the window view for this activation, providing access to | ||
| * all current events ({@link Window#get()}), | ||
| * newly added events ({@link Window#getNew()}), and | ||
| * expired events ({@link Window#getExpired()}). |
In addition to the list-mutability question already raised on this PR, please document the lifetime of the Window reference: is it valid only during the process() call, or may a user retain it across triggers? Lifetime/ownership contracts matter once this interface is public.
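To make the question concrete, here is a minimal sketch of what user code would need to do if the contract turned out to be "valid only during the `process()` call". That contract is an assumption for illustration; the PIP has not stated it. `LiveWindow` and `RetainingFunction` are hypothetical names, not Pulsar types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the proposed Window<T>; only getNew() is mirrored here. */
interface LiveWindow<T> {
    List<T> getNew();
}

/** Retains events across triggers by copying them, never by holding the window itself. */
final class RetainingFunction {
    private final List<String> seen = new ArrayList<>();

    void process(LiveWindow<String> window) {
        // Copy the elements now: under the assumed contract, the runtime may
        // reuse or clear the backing list once this call returns.
        seen.addAll(window.getNew());
    }

    List<String> seen() {
        return List.copyOf(seen);
    }
}
```

If the PIP instead guarantees immutable snapshots that stay valid after the call, the copy is unnecessary, which is why the Javadoc should say which of the two applies.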
| #### 3a. Add field | ||
| ```java | ||
| protected IncrementalWindowFunction<T, X> incrementalWindowFunction; |
The public interface is declared IncrementalWindowFunction<X, T> (X=input, T=output), but this executor field is <T, X>. This matches the internal WindowFunction<T,X> convention, so it's defensible — but the doc shows both orderings without comment, which will trip readers. A one-line note clarifying the convention would help.
| if (userClassObject instanceof java.util.function.Function) { | ||
| // existing logic, unchanged | ||
| bareWindowFunction = ...; | ||
| } else if (userClassObject instanceof IncrementalWindowFunction) { |
The dispatch order here is Function → IncrementalWindowFunction → WindowFunction. A user class implementing both IncrementalWindowFunction and WindowFunction (or both Function and IncrementalWindowFunction) resolves by this precedence. Since that becomes an observable public-API contract, please state the ordering explicitly and confirm it's intentional.
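The precedence can be sketched in isolation as follows. `IncrementalFn` and `WindowFn` are hypothetical marker interfaces standing in for `IncrementalWindowFunction` and `WindowFunction`, and `DispatchSketch.resolve` only mirrors the `instanceof` chain quoted above; it is not the executor's actual code.

```java
import java.util.function.Function;

/** Hypothetical stand-ins for the two Pulsar interfaces; not the real API. */
interface IncrementalFn<X, T> { }
interface WindowFn<X, T> { }

final class DispatchSketch {
    /** Mirrors the quoted order: Function, then the incremental interface, then WindowFunction. */
    static String resolve(Object userClassObject) {
        if (userClassObject instanceof Function) {
            return "Function";
        } else if (userClassObject instanceof IncrementalFn) {
            return "IncrementalWindowFunction";
        } else if (userClassObject instanceof WindowFn) {
            return "WindowFunction";
        }
        throw new IllegalArgumentException("unsupported user class: " + userClassObject.getClass());
    }
}

/** A user class implementing both window interfaces takes the incremental path. */
final class Both implements IncrementalFn<String, Void>, WindowFn<String, Void> { }
```

With this ordering, `DispatchSketch.resolve(new Both())` returns `"IncrementalWindowFunction"`, and anything that is also a `java.util.function.Function` never reaches the window branches. That is the behavior the proposal would be committing to.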
| |------|--------| | ||
| | `FunctionConfigUtils.doJavaChecks()` | Add `IncrementalWindowFunction` to the allowed user-class interfaces. | | ||
| | `FunctionCommon.getFunctionClassParent()` | When `windowConfig` is set, resolve `IncrementalWindowFunction` before `WindowFunction` so input/output type inference for SerDe and schema checks stays correct. | | ||
Not required by the template, but reviewers usually ask: a sentence on intended test coverage (executor dispatch for each interface type, and deployment-validation acceptance of the new interface) would strengthen the proposal.
| | `List<T> getNew()` | Events added since the last trigger | | ||
| | `List<T> getExpired()` | Events removed since the last trigger | | ||
| | `Long getStartTimestamp()` | Window start time (non-null for time-based windows, otherwise `null`) | | ||
| | `Long getEndTimestamp()` | Window end time (watermark in event-time mode, system time in processing-time mode) | |
getStartTimestamp() documents its null behavior, but getEndTimestamp()'s description implies it is never null. Please confirm and capture this in the Javadoc, since both methods are now public.
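Until the Javadoc settles this, callers would have to treat both values defensively. A minimal sketch, assuming either timestamp may be `null` (confirmed only for `getStartTimestamp()` in the table above); `TimedWindow` and `WindowSpans` are hypothetical names used for illustration:

```java
import java.util.OptionalLong;

/** Hypothetical stand-in exposing only the two timestamp accessors from the table. */
interface TimedWindow {
    Long getStartTimestamp();
    Long getEndTimestamp();
}

final class WindowSpans {
    /** Returns end - start in milliseconds, or empty when either timestamp is null. */
    static OptionalLong spanMillis(TimedWindow w) {
        Long start = w.getStartTimestamp();
        Long end = w.getEndTimestamp();
        if (start == null || end == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(end - start);
    }
}
```

If `getEndTimestamp()` is documented as never null, the second check can go away; stating that in the Javadoc saves every caller from guessing.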
A few PIP-process items (separate from the proposal content):
Motivation
Pulsar Window Functions currently invoke `WindowFunction.process(Collection<Record<X>>, ...)` with all messages in the window on every trigger. Internally, `WindowManager` already classifies events into full, newly added, and expired lists on each activation, but `WindowFunctionExecutor` drops `getNew()` and `getExpired()` before calling the user function.
This makes incremental computation inefficient for sliding windows and forces users to manually diff full collections.
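As an illustration of the difference, the sketch below keeps a running sum over a sliding window by applying only the added and expired events, next to the full rescan that a plain `WindowFunction` needs today. `WindowView`, `SimpleWindow`, and `IncrementalSum` are hypothetical stand-ins written for this example; they mirror the method names in the proposal but are not the Pulsar API, and the `List<T>` return type of `get()` is an assumption.

```java
import java.util.List;

/** Hypothetical stand-in for the proposed Window<T>; not the real Pulsar type. */
interface WindowView<T> {
    List<T> get();        // all events currently in the window
    List<T> getNew();     // events added since the last trigger
    List<T> getExpired(); // events removed since the last trigger
}

/** Plain holder used only to drive the example. */
final class SimpleWindow<T> implements WindowView<T> {
    private final List<T> all;
    private final List<T> added;
    private final List<T> expired;

    SimpleWindow(List<T> all, List<T> added, List<T> expired) {
        this.all = all;
        this.added = added;
        this.expired = expired;
    }

    @Override public List<T> get() { return all; }
    @Override public List<T> getNew() { return added; }
    @Override public List<T> getExpired() { return expired; }
}

/** Maintains a running sum by applying only the delta on each trigger. */
final class IncrementalSum {
    private long sum;

    long onTrigger(WindowView<Long> w) {
        for (long v : w.getNew()) sum += v;      // work proportional to new events
        for (long v : w.getExpired()) sum -= v;  // plus expired events
        return sum;
    }

    /** The full rescan: work proportional to the whole window on every trigger. */
    static long fullRecompute(WindowView<Long> w) {
        long s = 0;
        for (long v : w.get()) s += v;
        return s;
    }
}
```

Both approaches produce the same totals; the incremental version touches only the events that changed, which is where the saving for large sliding windows with small slide intervals comes from.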
Modifications
This PR adds PIP-484, which proposes:
- Adding a `Window<T>` interface to the public API (`get()`, `getNew()`, `getExpired()`, timestamps).
- Adding an `IncrementalWindowFunction<X, T>` interface that receives `Window<Record<X>>` on each trigger.
- Dispatching to the new interface in `WindowFunctionExecutor` with no new configuration.
- Updating validation (`FunctionConfigUtils`, `FunctionCommon`) to accept the new interface.
Existing `WindowFunction` implementations remain fully backward compatible.