
fix: Reduce LLM context call cost risk #8139

Open

Sisyphbaous-DT-Project wants to merge 2 commits into AstrBotDevs:master from Sisyphbaous-DT-Project:修复-issue-8080-缓存成本保护

Hidden character warning: the head ref may contain hidden characters: "\u4fee\u590d-issue-8080-\u7f13\u5b58\u6210\u672c\u4fdd\u62a4"
Conversation

@Sisyphbaous-DT-Project (Contributor) commented May 10, 2026

Motivation

This PR fixes issues directly related to call cost and caching, covering the default context strategy, group chat context awareness, image captioning, and the hidden risk when plugins modify the system prompt.

The previous default behavior was unfriendly to new users:

  • By default, no limit was placed on the number of history turns, so input tokens kept growing in long conversations;
  • Once the limit was hit, only 1 turn was dropped per round, so the context window slid every round and easily broke the prefix cache;
  • The dynamic chat history for group ICL was concatenated directly into system_prompt, so every change in the group chat changed the system prompt with it;
  • Once group image captioning was enabled, every image could trigger a separate multimodal request;
  • When a plugin modified the system prompt via OnLLMRequestEvent, it was hard to quickly pin down which plugin broke the cache.

These points are handled with minimal changes: no existing feature entry points are touched; only default parameters and the injection method are tightened, so the default behavior is friendlier to cost and caching.

Modifications

1. Adjust the default context strategy

Modified astrbot/core/config/default.py and astrbot/core/astr_main_agent.py:

  • max_context_length: -1 → 30
  • dequeue_context_length: 1 → 10

New users no longer keep unlimited context by default; once the limit is hit, 10 turns are dropped at once, preventing input tokens from growing without bound in long chats and the prefix cache from being broken over and over.
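To make the cache effect concrete, here is a minimal sketch of the batch-dequeue behavior these defaults imply (drop_oldest_turns and the turn representation are illustrative assumptions, not AstrBot's actual context manager API):

```python
# Illustrative sketch only; names and data shapes are assumptions.
MAX_CONTEXT_LENGTH = 30      # new default: retain at most 30 turns
DEQUEUE_CONTEXT_LENGTH = 10  # new default: drop 10 turns in one go

def drop_oldest_turns(history: list[dict]) -> list[dict]:
    """Trim history so the kept prefix stays byte-stable for ~10 turns."""
    if len(history) <= MAX_CONTEXT_LENGTH:
        return history
    # Dropping a whole batch (instead of 1 turn per request) means the next
    # DEQUEUE_CONTEXT_LENGTH requests reuse an identical prefix, so a
    # provider-side prefix cache keeps hitting instead of sliding every turn.
    return history[DEQUEUE_CONTEXT_LENGTH:]
```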

2. Stop injecting group chat context into the system prompt

Modified astrbot/builtin_stars/astrbot/long_term_memory.py.

The dynamic group ICL chat history is no longer concatenated into req.system_prompt; it is instead injected as temporary extra content on the current user message (see the sketch below):

  • The model still sees the group chat context for the current turn;
  • The system prompt is no longer polluted, reducing the risk of prefix cache invalidation;
  • The content is flagged with mark_as_temp(), so it is never written into the conversation history, preventing group context from piling up there turn after turn.
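A minimal sketch of this injection pattern, with simplified stand-ins for the real request and content types (LLMRequest, UserContentPart, and inject_group_context are illustrative; only mark_as_temp() is named after the mechanism this PR describes):

```python
# Simplified stand-ins; AstrBot's real request/content types differ.
from dataclasses import dataclass, field

@dataclass
class UserContentPart:
    text: str
    temp: bool = False

    def mark_as_temp(self) -> "UserContentPart":
        # Flagged parts are skipped when the conversation is persisted.
        self.temp = True
        return self

@dataclass
class LLMRequest:
    system_prompt: str
    user_parts: list[UserContentPart] = field(default_factory=list)

def inject_group_context(req: LLMRequest, group_context: str) -> None:
    # The model sees the group history this turn, the system prompt stays
    # byte-identical across turns, and history writers drop temp parts.
    req.user_parts.insert(0, UserContentPart(group_context).mark_as_temp())
```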

3. Add a token budget for group ICL

Added provider_ltm_settings.group_icl_token_budget, with a default of 4000.

Group chat context is trimmed to an approximate token budget, keeping only the more recent group messages; when trimming occurs, a warning is logged so administrators notice that context was compressed (a trimming sketch follows below). The default for group_message_max_cnt is also lowered from 300 to 50, reducing memory and context pressure under the default configuration.
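A sketch of the budget trimming under the heuristic estimator this PR uses (the helper names here are illustrative; the 0.6/0.3 weights match the code quoted in the review below, where calibrating the CJK weight is also discussed):

```python
# Heuristic estimator and newest-first trimming; names are illustrative.
def estimate_tokens(text: str) -> int:
    cjk = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
    return int(cjk * 0.6 + (len(text) - cjk) * 0.3)

def trim_to_budget(messages: list[str], budget: int) -> tuple[list[str], int]:
    """Keep the newest messages whose combined estimate fits the budget;
    return (kept_messages, omitted_count) so callers can log a warning."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, len(messages) - len(kept)
```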

4. Add preflight cost logging for LLM requests

Modified astrbot/core/agent/runners/tool_loop_agent_runner.py.

A preflight log entry is written before each model call, recording the provider / model, the estimated input tokens, and the number of images. If the token estimate is high or there are many images, a warning is emitted so high-cost requests can be located quickly.
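A sketch of the preflight check (the two threshold constants match the ones added in this PR, per the review below; the function shape is a simplified assumption):

```python
import logging

logger = logging.getLogger(__name__)

# Thresholds as added in this PR; the `>` comparison means the image
# warning fires from the second image onward (see the review nitpick below).
REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
REQUEST_WARN_IMAGE_COUNT = 1

def preflight_log(provider: str, model: str, est_tokens: int, images: int) -> None:
    logger.info(
        "LLM preflight: provider=%s model=%s est_input_tokens=%d images=%d",
        provider, model, est_tokens, images,
    )
    if est_tokens > REQUEST_WARN_ESTIMATED_INPUT_TOKENS:
        logger.warning("High estimated input tokens for this request: %d", est_tokens)
    if images > REQUEST_WARN_IMAGE_COUNT:
        logger.warning("Multiple images in one request: %d", images)
```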

5. Warn when a hook modifies the system prompt

Modified astrbot/core/pipeline/process_stage/method/agent_sub_stages/internal.py.

The system prompt is compared before and after OnLLMRequestEvent; if a plugin or hook changed it, a warning is logged together with the before/after lengths, making it easy to track down which plugin invalidated the cache.
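A sketch of this guard (a minimal shape, assuming an awaitable hook dispatcher; the real pipeline code differs):

```python
import logging

logger = logging.getLogger(__name__)

async def run_hooks_with_prompt_guard(req, dispatch_hooks) -> None:
    before = req.system_prompt
    await dispatch_hooks(req)  # OnLLMRequestEvent handlers may mutate req
    after = req.system_prompt
    if after != before:
        # Log lengths only: enough to spot the offending plugin without
        # dumping potentially sensitive prompt content into the logs.
        logger.warning(
            "system_prompt modified by a hook: %d -> %d chars; "
            "prefix cache for this session is likely invalidated",
            len(before), len(after),
        )
```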

6. Add a cost notice for group image captioning

When image_caption is enabled for group ICL, every group image may trigger an extra multimodal request; a warning is now logged to remind users that this setting can incur additional cost.

7. Add unit tests

Added tests/unit/test_long_term_memory_cost_safety.py, covering:

  • Group chat context is no longer written into system_prompt

  • Group chat context is injected as temporary user content

  • Temporary group chat context is not persisted to history

  • Trimmed group chat context stays within the token budget

This is NOT a breaking change.

Screenshots or Test Results

The following checks were run:

python -m pytest tests\unit\test_long_term_memory_cost_safety.py -q
python -m pytest tests\agent\test_context_manager.py tests\test_tool_loop_agent_runner.py::test_normal_completion_without_max_step -q
python -m ruff check .
python -m ruff format . --check

This change introduces no new dependencies.


Checklist

  • If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • My changes do not introduce malicious code.

Summary by Sourcery

Tighten default context and group chat handling to reduce LLM request cost and cache disruption while adding observability around expensive requests and prompt hooks.

New Features:

  • Add a configurable token budget for group chat ICL context to bound injected history size per request.
  • Introduce preflight logging of estimated input tokens and image count for each LLM request.
  • Emit warnings when request hooks modify the system prompt or when group image captioning may trigger extra multimodal calls.

Enhancements:

  • Adjust default context length settings to limit retained turns and drop more messages at once when the limit is reached.
  • Change group chat ICL to inject recent history as temporary user content instead of modifying the system prompt, and trim it to fit within the configured token budget.
  • Lower the default maximum stored group messages and document the new group ICL token budget option in configuration metadata.

Tests:

  • Add unit tests to ensure group chat context is injected via temporary user content, is not persisted in history, and respects the configured token budget.

@dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) labels on May 10, 2026
@sourcery-ai Bot left a comment
Hey - I've found 2 issues, and left some high level feedback:

  • The repeated logger.warning calls for group image captioning and for every high-cost request (token/image count) may be quite noisy in busy group chats; consider downgrading some to info or adding simple rate-limiting/aggregation so they remain actionable without flooding logs.
  • In _trim_text_to_token_budget, the final loop that repeatedly shortens result and recomputes _estimate_text_tokens on every iteration could be costly for long strings; you could avoid the per-character loop by tightening the binary search condition or by trimming in larger chunks to keep the complexity closer to O(n log n) (illustrated in the sketch below).
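For illustration, one way to realize that suggestion (an editor's sketch assuming the estimator is monotonic in prefix length; not code from this PR): binary-search the largest prefix that fits the budget, which needs only O(log n) estimator calls instead of a per-character loop.

```python
def trim_text_to_token_budget(text: str, budget: int, estimate) -> str:
    """Return the largest prefix of `text` whose estimate fits `budget`."""
    if estimate(text) <= budget:
        return text
    lo, hi = 0, len(text)  # invariant: text[:lo] always fits the budget
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if estimate(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo]
```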
Individual Comments

Comment 1 (astrbot/builtin_stars/astrbot/long_term_memory.py, lines 34-43):

```diff
             max_cnt = int(cfg["provider_ltm_settings"]["group_message_max_cnt"])
         except BaseException as e:
             logger.error(e)
-            max_cnt = 300
+            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
+        max_cnt = max(1, max_cnt)
+        try:
+            group_icl_token_budget = int(
+                cfg["provider_ltm_settings"].get(
+                    "group_icl_token_budget",
+                    self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
+                )
+            )
+        except BaseException as e:
+            logger.error(e)
+            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
+        group_icl_token_budget = max(1, group_icl_token_budget)
         image_caption_prompt = cfg["provider_settings"]["image_caption_prompt"]
```

**suggestion (bug_risk):** Catching `BaseException` here is broader than needed and may hide unexpected issues.

This `except BaseException` will also catch `KeyboardInterrupt` and `SystemExit`, which we usually want to propagate. Since only config parsing and `int` casting can fail here, narrowing to `Exception` (or even `ValueError`/`TypeError`) would avoid swallowing truly exceptional conditions while still handling malformed configs safely.

Suggested implementation:

```python
        except (ValueError, TypeError, KeyError) as e:
            logger.error(e)
            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES

```

```python
        except (ValueError, TypeError) as e:
            logger.error(e)
            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET

```

Comment 2 (astrbot/core/agent/runners/tool_loop_agent_runner.py, lines 114-117):

```diff
     TOOL_RESULT_MAX_ESTIMATED_TOKENS = 27_500
     TOOL_RESULT_PREVIEW_MAX_ESTIMATED_TOKENS = 7000
+    REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
+    REQUEST_WARN_IMAGE_COUNT = 1
     EMPTY_OUTPUT_RETRY_ATTEMPTS = 3
     EMPTY_OUTPUT_RETRY_WAIT_MIN_S = 1
```

**nitpick:** The `REQUEST_WARN_IMAGE_COUNT` threshold name/usage is slightly confusing.

With `REQUEST_WARN_IMAGE_COUNT = 1` and the check `image_count > self.REQUEST_WARN_IMAGE_COUNT`, the warning only fires for 2+ images. That’s a reasonable behavior, but the name suggests “warn at 1 image.” Either switch the comparison to `>=` if you want to warn on the first image, or rename the constant (e.g., `REQUEST_WARN_IMAGE_COUNT_THRESHOLD`) to better reflect the current `>` semantics.



@gemini-code-assist Bot left a comment
Code Review

This pull request implements token budgeting and cost safety measures for group chat context (ICL). Key changes include introducing a token estimation mechanism, truncating chat history to fit a configurable budget, and updating default context limits to more conservative values. It also adds pre-flight logging for high-token or multi-image requests and moves group context from the system prompt to temporary user content parts to prevent persistence in history. Review feedback identifies efficiency and accuracy issues in the token estimation logic, recommends against catching BaseException, suggests preventing log flooding from image caption warnings, and points out a missing warning case for truncated single messages.

Comment on lines +78 to +80 (high):

```python
chinese_count = len([c for c in text if "\u4e00" <= c <= "\u9fff"])
other_count = len(text) - chinese_count
return int(chinese_count * 0.6 + other_count * 0.3)
```

This token estimation logic has two problems:

  1. Efficiency: the list comprehension [c for c in text if ...] builds a complete character list in memory, an unnecessary overhead for long text such as group chat history. Use sum(1 for c in text if ...) instead.
  2. Accuracy: the 0.6 multiplier for Chinese characters is far too low. In the tokenizers of mainstream models (e.g. GPT-4o, DeepSeek), one Chinese character typically maps to 1.5 to 2.0 tokens. Using 0.6 severely underestimates actual consumption and undermines this PR's goal of reducing cost risk. A multiplier of about 1.5 is recommended.

Suggested change:

```python
chinese_count = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
other_count = len(text) - chinese_count
return int(chinese_count * 1.5 + other_count * 0.3)
```

Comment on lines 34 to +47 (medium):

```diff
         except BaseException as e:
             logger.error(e)
-            max_cnt = 300
+            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
+        max_cnt = max(1, max_cnt)
+        try:
+            group_icl_token_budget = int(
+                cfg["provider_ltm_settings"].get(
+                    "group_icl_token_budget",
+                    self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
+                )
+            )
+        except BaseException as e:
+            logger.error(e)
+            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```

Catching BaseException is discouraged: it also catches SystemExit and KeyboardInterrupt, which can prevent the program from exiting normally and make debugging harder. Catch specific exceptions (such as ValueError, KeyError) instead, or at least Exception.

Suggested change:

```diff
-        except BaseException as e:
+        except Exception as e:
             logger.error(e)
             max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
         max_cnt = max(1, max_cnt)
         try:
             group_icl_token_budget = int(
                 cfg["provider_ltm_settings"].get(
                     "group_icl_token_budget",
                     self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
                 )
             )
-        except BaseException as e:
+        except Exception as e:
             logger.error(e)
             group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```

Comment on lines +213 to +217
logger.warning(
"Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
event.unified_msg_origin,
cfg["image_caption_provider_id"],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

handle_message 中对每张图片都输出 warning 日志会导致日志洪泛(Log Flooding),尤其是在活跃的群聊中。这会干扰管理员查看其他重要日志。建议增加一个标记位,确保每个会话或每次启动仅针对该配置提醒一次。

                        if not getattr(self, "_image_caption_warned", False):
                            logger.warning(
                                "Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
                                event.unified_msg_origin,
                                cfg["image_caption_provider_id"],
                            )
                            self._image_caption_warned = True

Comment on the omitted > 0 check (medium):

```python
    self.session_chats[event.unified_msg_origin],
    cfg["group_icl_token_budget"],
)
if omitted > 0:
```

The current logic only warns when omitted > 0, i.e. when whole messages were dropped. If the group chat holds a single message and that message itself was truncated (omitted stays 0), no warning fires. Also check whether chats_str carries the truncation marker.

Suggested change:

```diff
-if omitted > 0:
+if omitted > 0 or chats_str.startswith("[truncated]"):
```

@RC-CHN (Member) commented May 11, 2026

The refactor will be handled in #8144.

(screenshot)

I ran a preliminary test on my own deployed instance, and it shows the new strategy hits the KV cache fairly well. Personally, I think the existing group chat context management mechanism carries legacy issues that call for a refactor, so further patching it is no longer really necessary. Thank you all the same for your contribution.
