[Feature] Support TTL on ShortTermMemory#657
Conversation
|
Hi, @da-daken, thanks for your contribution. Due to the imminent code freeze for version 0.3, I will prioritize handling some must-have items for this release. This may delay my review of your two PRs, but I estimate that I will complete the reviews within a week at the latest. |
weiqingy
left a comment
There was a problem hiding this comment.
Thanks for picking this up — a few cross-cutting questions before this lands:
-
Python mirror is missing.
python/flink_agents/api/core_options.pymirrors every entry in the JavaAgentExecutionOptionstoday (MAX_RETRIES,RETRY_WAIT_INTERVAL, the async flags, etc.). The three new short-term-memory TTL options are not added on the Python side, so a Python user can't reach them via the typed API — they'd have to fall back to raw string keys. Could we add the mirrors (and a matching test) in this PR? CLAUDE.md is explicit about Java↔Python parity for shared options. -
Are we comfortable exposing
StateTtlConfig.UpdateTypeandStateTtlConfig.StateVisibilitydirectly in the public API? Two of the new options take Flink internal enums as their value type, in a class underorg.apache.flink.agents.api.agents. That couples the public surface to Flink's state API forever, and makes the Python mirror painful (we'd need to redefine the enums on the Python side or fall back to strings). One alternative is to wrap them in our own enums (ShortTermMemoryTtlUpdate/ShortTermMemoryTtlVisibility) with one-to-one mappings — costs a few lines and a translation step, buys API independence. What do you think?
| } | ||
|
|
||
| @Test | ||
| void testTTLConfigurationApplied() throws Exception { |
There was a problem hiding this comment.
Two gaps in coverage worth closing while we're here:
- TTL disabled path. No test exercises the early-return in
maybeEnableShortTermMemoryTTL(TTL_MS = 0or unset). That branch is the default for every existing pipeline, so a regression that accidentally enables TTL with 0ms would be silent. A short scenario withttlMs = 0confirming the thirdevent1returnsEXISTINGregardless of sleep would cover it. - Default update-type / visibility. Both tests explicitly override the update-type to
OnCreateAndWrite. The option defaults (OnReadAndWrite/NeverReturnExpired) are never actually exercised, so a regression in how the defaults are wired would slip through. One scenario that sets onlyTTL_MSand asserts the same expiry behavior would catch that.
| /** | ||
| * When {@link AgentExecutionOptions#SHORT_TERM_MEMORY_STATE_TTL_MS} is positive, attaches Flink | ||
| * {@link StateTtlConfig} to the short-term memory {@link MapStateDescriptor}. Unset, null, or | ||
| * non-positive values disable TTL (Flink does not allow zero/negative TTL). |
There was a problem hiding this comment.
Nice — recording "Flink does not allow zero/negative TTL" at the workaround site spares future maintainers a trip through Flink source.
# Conflicts: # python/flink_agents/api/core_options.py
@weiqingy Thanks for your review. You're right, I submitted the new code. PTAL |
| new ConfigOption<>("rag.async", Boolean.class, true); | ||
|
|
||
| public static final ConfigOption<Long> SHORT_TERM_MEMORY_STATE_TTL_MS = | ||
| new ConfigOption<>("short-term-memory.state-ttl.ms", Long.class, 0L); |
There was a problem hiding this comment.
Nit: 0L doubles as the "TTL disabled" sentinel, but that contract only lives in OperatorStateManager.maybeEnableShortTermMemoryTTL. A one-line javadoc here — e.g. "Set to a positive value in milliseconds to enable TTL; 0 (the default) disables it" — would spare future readers a trip into the runtime. Same applies to the two enum options below: worth noting they're only consulted when TTL_MS > 0.
| } | ||
|
|
||
| @Test | ||
| void testTTLConfigurationNotApplied() throws Exception { |
There was a problem hiding this comment.
Nit: testTTLConfigurationNotApplied vs testTTLConfigurationApplied (line 109) is a bit misleading — TTL is configured in both; what differs is whether enough time elapsed for entries to expire (sleep_ms=0 vs 2000). Names like testValueStillVisibleBeforeTTLExpiry / testValueExpiresAfterTTL would express the actual assertion. Same suggestion for the Python mirror in short_term_memory_ttl_test.py:126,159.
|
LGTM — both prior concerns are addressed cleanly. Only two small readability nits inline. Thanks for the quick turnaround. |
Linked issue: #580
Purpose of change
Support controlling TTL on Short Term memory.
Tests
ut
API
no api
Documentation
doc-neededdoc-not-neededdoc-included