refactor(ltm): redesign long-term memory with append-only incremental contexts #8144
RC-CHN wants to merge 30 commits into
Conversation
Hey - I've found 2 issues, and left some high level feedback:
- The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
- In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.
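The running-counter idea in the second point can be sketched as a thin wrapper that keeps the byte total up to date on every mutation, so trimming never re-traverses the deque. Attribute and method names here are illustrative, not the PR's actual API:

```python
from collections import defaultdict, deque


class ByteTrackedRaw:
    """Sketch: maintain a running encoded-byte total per umo so that
    size checks are O(1) instead of summing the whole deque."""

    def __init__(self):
        self.raw_records: dict[str, deque] = defaultdict(deque)
        self._raw_bytes: dict[str, int] = defaultdict(int)

    def append(self, umo: str, line: str) -> None:
        self.raw_records[umo].append(line)
        self._raw_bytes[umo] += len(line.encode())

    def popleft(self, umo: str) -> str:
        removed = self.raw_records[umo].popleft()
        self._raw_bytes[umo] -= len(removed.encode())
        return removed

    def total_bytes(self, umo: str) -> int:
        # O(1) lookup instead of `sum(len(s.encode()) for s in dq)`
        return self._raw_bytes[umo]
```

The trade-off is that every code path mutating the deque must go through the wrapper, or the counter drifts.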
## Individual Comments
### Comment 1
<location path="astrbot/builtin_stars/astrbot/long_term_memory.py" line_range="290-299" />
<code_context>
+    # Trimming
+    # =========================================================================
+
+    def _trim_raw_records(self, umo: str) -> None:
+        """Only evict entries before the cursor; entries at or after the cursor are never touched (issue #2)."""
+        dq = self.raw_records[umo]
+        cursor = self._raw_cursor[umo]
+
+        # 1. Unconditionally drop entries before the cursor (already consumed)
+        while dq and cursor > 0:
+            dq.popleft()
+            cursor -= 1
+        self._raw_cursor[umo] = cursor
+
+        # 2. Continue evicting from the front by size (bounds total memory in extreme cases)
+        total = sum(len(s.encode()) for s in dq)
+        while total > MAX_RAW_BYTES and dq and cursor > 0:
+            removed = dq.popleft()
+            total -= len(removed.encode())
</code_context>
<issue_to_address>
**issue (bug_risk):** Size-based trimming branch is effectively dead due to cursor reset logic.
In `_trim_raw_records`, the first loop always decrements `cursor` to 0 and then writes it back to `self._raw_cursor[umo]`. As a result, in the size-based loop `while total > MAX_RAW_BYTES and dq and cursor > 0:`, `cursor` is always 0 and the loop never runs, so `MAX_RAW_BYTES` is never enforced.
To preserve the intended behavior (always drop fully-consumed entries, and then optionally drop additional consumed entries to satisfy `MAX_RAW_BYTES`), you’ll need to decouple the notion of “consumed index” from the deque length. For example, track how many entries are removed in the first loop and use that to derive which entries are safe to drop in the size-based phase, rather than relying on `cursor > 0` after the first loop.
</issue_to_address>
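A minimal sketch of one possible fix, with phase 2 no longer gated on the already-zeroed `cursor` (class and method names are illustrative, the byte limit is a stand-in, and dropping unconsumed entries in phase 2 is one possible reading of the intended policy):

```python
from collections import defaultdict, deque

MAX_RAW_BYTES = 256 * 1024  # illustrative; the PR hard-codes its own value


class RawStore:
    def __init__(self):
        self.raw_records: dict[str, deque] = defaultdict(deque)
        self._raw_cursor: dict[str, int] = defaultdict(int)

    def trim(self, umo: str, max_bytes: int = MAX_RAW_BYTES) -> None:
        dq = self.raw_records[umo]
        cursor = self._raw_cursor[umo]

        # Phase 1: unconditionally drop fully-consumed entries.
        consumed = min(cursor, len(dq))
        for _ in range(consumed):
            dq.popleft()
        self._raw_cursor[umo] = cursor - consumed  # normally 0

        # Phase 2: enforce the byte budget on whatever remains. The PR
        # gated this loop on `cursor > 0`, which phase 1 had already
        # zeroed, so it never ran; here the oldest remaining entries are
        # dropped instead, always keeping at least one.
        total = sum(len(s.encode()) for s in dq)
        while total > max_bytes and len(dq) > 1:
            total -= len(dq.popleft().encode())
```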
### Comment 2
<location path="tests/unit/test_long_term_memory.py" line_range="207-216" />
<code_context>
+ def test_tool_call_then_result_then_bot(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for `_build_segments` when a tool result appears without a preceding tool call.
Current `_build_segments` tests only cover well-formed tool flows (`<T:CALL>` → `<T:RES>` → `<BOT>`). Please add a case where a `<T:RES>` appears without a prior `<T:CALL>`, e.g.:
```python
def test_tool_result_without_call_then_bot(self):
lines = [
"<T:RES id=orphan>data</T:RES>",
"<BOT/14:30>: ok",
]
result = _build_segments(lines)
# assert behavior: either a valid tool segment or clean ignore, no exception,
# and an intact assistant segment.
```
This helps ensure `_build_segments` behaves predictably with partial or inconsistent histories.
</issue_to_address>
Code Review
This pull request introduces Long Term Memory (LTM) v2, which significantly improves chatroom memory management by implementing incremental context building, support for tool-call history, and memory-efficient message tracking using deques and cursors. The changes also include a fallback mechanism for LLM compression and extensive unit tests. Feedback focuses on several critical areas: a potential memory leak in the contexts dictionary which is currently append-only, a logic error in the size-based trimming of raw records that renders some code unreachable, and the risk of KeyError crashes when parsing malformed tool-call records. Additionally, there is a discrepancy between the system prompt's description of bot message markers and the actual role-based formatting sent to the LLM.
    self.contexts: dict[str, list[dict]] = defaultdict(list)
    """Accumulated LLM context. Modified by ContextManager and then retained."""
The self.contexts dictionary is append-only and never pruned. In long-running sessions or active group chats, this will lead to a memory leak as the list of segments grows indefinitely. While append-only contexts help with KV cache hits, you should still implement a maximum context length (e.g., based on the provider's window or a safe segment count) to prevent unbounded memory growth.
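One way to honour such a cap while keeping most of the KV-cache benefit, as a sketch only (function name, cap, and the keep-the-system-head pruning policy are all illustrative, not the PR's code):

```python
MAX_SEGMENTS = 200  # illustrative cap; could come from provider_ltm_settings


def prune_contexts(contexts: list[dict], max_segments: int = MAX_SEGMENTS) -> list[dict]:
    """Bound an append-only context list. When the cap is exceeded, drop
    the oldest segments but keep a leading system/summary message so the
    persona survives. The prefix only changes at prune time, so KV cache
    hits remain frequent between prunes."""
    if len(contexts) <= max_segments:
        return contexts
    head = [m for m in contexts[:1] if m.get("role") == "system"]
    tail = contexts[-(max_segments - len(head)):]
    return head + tail
```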
    async with self._lock:
        umo = event.unified_msg_origin

        # Record the pre-write index → precise exclusion in on_req_llm (issue #1, #9)
        raw_idx = len(self.raw_records[umo])
        event.set_extra("_ltm_raw_idx", raw_idx)
handle_message appends to raw_records but never triggers trimming. In groups that rarely interact with the bot, raw_records will grow indefinitely because _trim_raw_records is only called during an agent run. Trimming should be performed here (before calculating raw_idx) to ensure memory usage remains bounded. Note that since this logic is synchronous and does not contain 'await' calls, it is executed atomically in the asyncio event loop and does not require an explicit lock.
    umo = event.unified_msg_origin
    self._trim_raw_records(umo)
    # Record the pre-write index → precise exclusion in on_req_llm (issue #1, #9)
    raw_idx = len(self.raw_records[umo])
    event.set_extra("_ltm_raw_idx", raw_idx)

References
- In a single-threaded asyncio event loop, synchronous functions (code blocks without 'await') are executed atomically and will not be interrupted by other coroutines. Therefore, they are safe from race conditions when modifying shared state within that block.
…ol call re-persistence & add truncate tool
…strategies exclusive
…records_max_bytes
Hey - I've found 2 issues, and left some high level feedback:
- LongTermMemory.on_agent_done holds the asyncio.Lock while awaiting LLM summary calls, which can block all LTM operations for that process during slow network I/O; consider releasing the lock before invoking the provider and then re-acquiring it only to apply the resulting summary/compaction updates.
- LongTermMemory.remove_session mutates raw_records, contexts, cursors, and summary state without acquiring _lock, which can race with concurrent handle_message/on_req_llm/on_agent_done; consider wrapping the cleanup in the same lock to keep internal structures consistent.
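The lock-narrowing pattern from the first point can be sketched like this (class and attribute names are illustrative; `summarise` stands in for the real provider call):

```python
import asyncio


class Summariser:
    """Sketch: snapshot state under the lock, await the slow LLM call
    with the lock released, then re-acquire only to apply the result."""

    def __init__(self, summarise):
        self._lock = asyncio.Lock()
        self.lines: list[str] = []
        self.summary = ""
        self._summarise = summarise  # stand-in for the LLM provider call

    async def compact(self) -> None:
        async with self._lock:
            snapshot = list(self.lines)  # copy state while holding the lock
        # Slow network I/O happens with the lock released, so other LTM
        # operations for this process are not blocked meanwhile.
        result = await self._summarise(snapshot)
        async with self._lock:
            self.summary = result
            del self.lines[: len(snapshot)]  # drop only what was summarised
```

Note that messages appended between the snapshot and the re-acquire are preserved, since only the snapshotted prefix is deleted.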
## Individual Comments
### Comment 1
<location path="astrbot/core/config/default.py" line_range="4155-4113" />
<code_context>
+ "provider_ltm_settings.ltm_compaction_strategy": "llm_summary",
+ },
+ },
+ "provider_ltm_settings.ltm_summary_provider_id": {
+ "description": "LTM 摘要模型",
+ "type": "string",
+ "_special": "select_provider",
+ "hint": "llm_summary 策略使用的模型,留空使用当前聊天模型。",
+ "condition": {
+ "provider_ltm_settings.ltm_compaction_strategy": "llm_summary",
+ },
</code_context>
<issue_to_address>
**issue (bug_risk):** The "leave empty to use current chat model" hint for LTM summary provider does not match the implementation.
The hint suggests `ltm_summary_provider_id` can be left empty to use the current chat model, but `LongTermMemory.on_agent_done` only calls `_compact_with_llm_summary` when `provider_id` is truthy, and `_compact_with_llm_summary` requires a concrete provider ID. So an empty value actually disables LTM summarisation.
To fix this, either make an empty `ltm_summary_provider_id` resolve a provider (e.g. via `context.get_using_provider(umo=...)` inside `_compact_with_llm_summary`), or update the hint to clarify that an empty value turns off summarisation, so users aren’t misled about whether summaries are running.
</issue_to_address>
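The first fix option reduces to a small resolution step. A sketch, with the two lookup callables standing in for the real context methods (their names and signatures are assumptions here):

```python
def resolve_summary_provider(provider_id, get_using_provider, get_provider_by_id):
    """Resolve the LTM summary provider. An empty/None provider_id falls
    back to the session's current chat provider, matching the behaviour
    the config hint promises, instead of silently disabling summaries."""
    if provider_id:
        return get_provider_by_id(provider_id)
    return get_using_provider()
```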
### Comment 2
<location path="tests/unit/test_long_term_memory.py" line_range="674-683" />
<code_context>
+ async def test_tool_call_recorded_in_raw(self, ltm, mock_event):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test to cover `history_tool_result_truncate=False` so that long tool outputs are not unexpectedly truncated
`on_agent_done` uses `history_tool_result_truncate` and `history_tool_result_max_chars` to optionally truncate tool results before persisting them. We already test `_truncate_tool_result_for_history` and that tool chains are recorded into contexts, but there’s no test for the case where `history_tool_result_truncate` is explicitly `False`.
Please add a test that:
1. Configures an `ltm` whose `cfg` returns `history_tool_result_truncate=False` for the relevant group.
2. Sends a tool message with very long `content` through `on_agent_done`.
3. Asserts that the stored tool result in `raw_records` and the derived `contexts` contains the full content (no truncation marker).
This will ensure the config flag is respected and avoid silent data loss when truncation is disabled.
Suggested implementation:
```python
    return LongTermMemory(MagicMock(), ctx)

@pytest.mark.asyncio
async def test_tool_result_not_truncated_when_disabled(self, ltm, mock_event):
    """When history_tool_result_truncate=False, long tool results are not truncated."""
    from unittest.mock import MagicMock
    from astrbot.api.provider import LLMResponse

    # 1. Configure ltm's cfg so the relevant group has history_tool_result_truncate=False.
    #    Also set max_chars very low so truncation would definitely fire if it were enabled.
    def cfg_side_effect(group, key, default=None):
        if key == "history_tool_result_truncate":
            return False
        if key == "history_tool_result_max_chars":
            return 10
        return default

    # If a cfg method already exists, override it with side_effect; otherwise adjust to the implementation.
    if hasattr(ltm, "cfg"):
        original_cfg = ltm.cfg
        ltm.cfg = MagicMock(side_effect=cfg_side_effect)
    else:
        ltm.cfg = MagicMock(side_effect=cfg_side_effect)
        original_cfg = None
    try:
        # 2. Build a very long tool output and push it through the full on_agent_done storage path.
        long_content = "TOOL-RESULT-" + ("X" * 5000)
        tool_msg = MagicMock()
        tool_msg.role = "tool"
        tool_msg.tool_call_id = "tool-1"
        tool_msg.name = "dummy_tool"
        tool_msg.content = long_content
        response = LLMResponse(
            id="test_tool_result_not_truncated_when_disabled",
            model="test-model",
            choices=[
                LLMResponse.Choice(
                    index=0,
                    message=tool_msg,
                    finish_reason="stop",
                )
            ],
        )
        # Trigger the persistence logic via on_agent_done
        await ltm.on_agent_done(mock_event, response)

        # 3. Assert raw_records and contexts both contain the full content with no truncation marker.
        #    This assumes ltm exposes raw_records/contexts or similar structures;
        #    adjust to the actual attribute names if they differ.
        raw_repr = repr(getattr(ltm, "raw_records", getattr(ltm, "raw_store", "")))
        ctx_repr = repr(getattr(ltm, "contexts", getattr(ltm, "context_store", "")))
        assert long_content in raw_repr
        assert long_content in ctx_repr
        # Common truncation markers (adjust to the implementation): e.g. "…", "...(truncated)", "[TRUNCATED]"
        assert "TRUNCATED" not in raw_repr.upper()
        assert "TRUNCATED" not in ctx_repr.upper()
    finally:
        # Restore the original cfg (if any) to avoid affecting other tests
        if original_cfg is not None:
            ltm.cfg = original_cfg

@pytest.mark.asyncio
async def test_tool_call_recorded_in_raw(self, ltm, mock_event):
    """Tool calls are recorded into contexts as assistant(tool_calls) + tool messages."""
    from astrbot.api.provider import LLMResponse
    from astrbot.api.message_components import Plain

    mock_event.get_messages.return_value = [Plain(text="@bot weather")]
    await ltm.handle_message(mock_event)

    from unittest.mock import MagicMock
    tc_msg = MagicMock()
    tc_msg.role = "assistant"
```
1. Adjust the test's config and storage access to `LongTermMemory`'s real interface:
   - If the config interface is not `ltm.cfg(group, key, default=None)`, change the parameter signature and call style in `cfg_side_effect` to the config API you actually use (e.g. `ltm.cfg("history_tool_result_truncate", group)` or `ltm.cfg.get(...)`).
   - Change how `raw_repr`/`ctx_repr` are obtained to match your real persistence structures, for example:
     - If there is `ltm.raw_records` and each record is `{"tool_result": {"content": ...}}`, you can use:
       ```python
       raw_records = ltm.raw_records
       assert any(r["tool_result"]["content"] == long_content for r in raw_records)
       ```
     - If contexts live in `ltm.contexts` or `ltm.context_store`, pull the content out by the actual field names and assert it contains `long_content` with no truncation marker.
2. If `on_agent_done` has a different signature (e.g. `await ltm.on_agent_done(response, mock_event)` or extra parameters), adjust the call order and arguments accordingly.
3. If the truncation marker differs from `"TRUNCATED"` (e.g. `"…(truncated)"`, `"...(truncated)"`, `"[TRUNCATED]"`), replace the marker checked in the assertions with the one actually used.
</issue_to_address>
…ry_provider_id is empty
Motivation
Fixes #8080
Rewrite the long-term memory (LTM) module from a ring buffer to an append-only architecture that keeps context prefixes stable across requests — enabling KV cache hits and the associated cost discounts (typically 1/10 of standard pricing across OpenAI, Anthropic, DeepSeek, and cloud providers).
Modifications / 改动点

Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`
- Replaced the `max_cnt` ring buffer with `raw_records` (deque) + `_raw_cursor` + `contexts` (append-only list). Old segments are never rebuilt.
- `_build_segments()` converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
- `<BOT/>` markers replace `[You/]` to avoid nickname collisions.
- `on_agent_done` records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
- `asyncio.Lock` for concurrency safety; `remove_session()` for cleanup.

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`
- `@on_llm_response` → `@on_agent_done` for accurate tool-chain recording.
- `group_icl_enable=true` skips the Conversation DB query (`conversation=None`).

Config: `astrbot/builtin_stars/astrbot/default.py`
- `context_limit_reached_strategy` → `"llm_compress"`.

Agent runner: `astrbot/core/astr_main_agent.py`
- `_get_compress_provider` auto-falls back to the main chat provider when `llm_compress_provider_id` is unset, preventing silent truncation.

Tests: `tests/unit/test_long_term_memory.py` (new, 47 tests)
- Pure functions: extract, parse, truncate, build_segments (31 tests).
- Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety (16 tests).
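For readers unfamiliar with the marker format, a toy sketch of the kind of line-to-segment conversion `_build_segments()` performs. The marker syntax is inferred from the review snippets above, and this sketch is illustrative only; the real parser also handles `<T:CALL>`, parallel tools, and multi-step chains:

```python
import re


def build_segments_sketch(lines: list[str]) -> list[dict]:
    """Toy conversion of raw chat lines into OpenAI-format messages.
    <BOT/HH:MM>: marks a bot reply, <T:RES id=...> a tool result;
    anything else is treated as a user line."""
    segments = []
    for line in lines:
        m = re.match(r"<BOT/\d{2}:\d{2}>: (.*)", line)
        if m:
            segments.append({"role": "assistant", "content": m.group(1)})
            continue
        m = re.match(r"<T:RES id=(\S+)>(.*)</T:RES>", line)
        if m:
            segments.append(
                {"role": "tool", "tool_call_id": m.group(1), "content": m.group(2)}
            )
            continue
        segments.append({"role": "user", "content": line})
    return segments
```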
This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Tested on personal self-hosted astrbot.

Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
/ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 `requirements.txt` 和 `pyproject.toml` 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Refactor group long-term memory to an append-only, incrementally built context model integrated with agent completion hooks, while tightening context compression behavior and isolating request-time context guarding from persistent history management.