fix: Reduce the cost risk of LLM context calls #8139
Hey - I've found 2 issues, and left some high level feedback:
- The repeated `logger.warning` calls for group image captioning and for every high-cost request (token/image count) may be quite noisy in busy group chats; consider downgrading some to `info` or adding simple rate-limiting/aggregation so they remain actionable without flooding logs.
- In `_trim_text_to_token_budget`, the final loop that repeatedly shortens `result` and recomputes `_estimate_text_tokens` on every iteration could be costly for long strings; you could avoid the per-character loop by tightening the binary search condition or by trimming in larger chunks to keep the complexity closer to O(n log n).
## Individual Comments
### Comment 1
<location path="astrbot/builtin_stars/astrbot/long_term_memory.py" line_range="34-43" />
<code_context>
max_cnt = int(cfg["provider_ltm_settings"]["group_message_max_cnt"])
except BaseException as e:
logger.error(e)
- max_cnt = 300
+ max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
+ max_cnt = max(1, max_cnt)
+ try:
+ group_icl_token_budget = int(
+ cfg["provider_ltm_settings"].get(
+ "group_icl_token_budget",
+ self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
+ )
+ )
+ except BaseException as e:
+ logger.error(e)
+ group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
+ group_icl_token_budget = max(1, group_icl_token_budget)
image_caption_prompt = cfg["provider_settings"]["image_caption_prompt"]
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catching `BaseException` here is broader than needed and may hide unexpected issues.
This `except BaseException` will also catch `KeyboardInterrupt` and `SystemExit`, which we usually want to propagate. Since only config parsing and `int` casting can fail here, narrowing to `Exception` (or even `ValueError`/`TypeError`) would avoid swallowing truly exceptional conditions while still handling malformed configs safely.
Suggested implementation:
```python
except (ValueError, TypeError, KeyError) as e:
logger.error(e)
max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
```
```python
except (ValueError, TypeError) as e:
logger.error(e)
group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/agent/runners/tool_loop_agent_runner.py" line_range="114-117" />
<code_context>
TOOL_RESULT_MAX_ESTIMATED_TOKENS = 27_500
TOOL_RESULT_PREVIEW_MAX_ESTIMATED_TOKENS = 7000
+ REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
+ REQUEST_WARN_IMAGE_COUNT = 1
EMPTY_OUTPUT_RETRY_ATTEMPTS = 3
EMPTY_OUTPUT_RETRY_WAIT_MIN_S = 1
</code_context>
<issue_to_address>
**nitpick:** The `REQUEST_WARN_IMAGE_COUNT` threshold name/usage is slightly confusing.
With `REQUEST_WARN_IMAGE_COUNT = 1` and the check `image_count > self.REQUEST_WARN_IMAGE_COUNT`, the warning only fires for 2+ images. That’s a reasonable behavior, but the name suggests “warn at 1 image.” Either switch the comparison to `>=` if you want to warn on the first image, or rename the constant (e.g., `REQUEST_WARN_IMAGE_COUNT_THRESHOLD`) to better reflect the current `>` semantics.
</issue_to_address>
Code Review
This pull request implements token budgeting and cost safety measures for group chat context (ICL). Key changes include introducing a token estimation mechanism, truncating chat history to fit a configurable budget, and updating default context limits to more conservative values. It also adds pre-flight logging for high-token or multi-image requests and moves group context from the system prompt to temporary user content parts to prevent persistence in history. Review feedback identifies efficiency and accuracy issues in the token estimation logic, recommends against catching BaseException, suggests preventing log flooding from image caption warnings, and points out a missing warning case for truncated single messages.
```python
chinese_count = len([c for c in text if "\u4e00" <= c <= "\u9fff"])
other_count = len(text) - chinese_count
return int(chinese_count * 0.6 + other_count * 0.3)
```
The token estimation logic here has two problems:
- Efficiency: the list comprehension `[c for c in text if ...]` materializes a full character list in memory, an unnecessary overhead for long text such as group chat history. Use `sum(1 for c in text if ...)` instead.
- Accuracy: the multiplier `0.6` for Chinese characters is far too low. In the tokenizers of mainstream models (e.g. GPT-4o, DeepSeek), one Chinese character usually maps to 1.5-2.0 tokens, so `0.6` severely underestimates actual consumption and undermines this PR's goal of reducing cost risk. A multiplier of around `1.5` is recommended.
Suggested change:

```python
chinese_count = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
other_count = len(text) - chinese_count
return int(chinese_count * 1.5 + other_count * 0.3)
```
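To illustrate the impact of the multiplier, compare both estimates on a purely CJK string (the numbers are heuristic, not real tokenizer counts):

```python
def estimate_tokens(text: str, cjk_multiplier: float) -> int:
    # Generator form avoids materializing the character list in memory.
    cjk = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
    return int(cjk * cjk_multiplier + (len(text) - cjk) * 0.3)

sample = "你" * 1000  # 1000 CJK characters
print(estimate_tokens(sample, 0.6))  # 600  -> likely a 2-3x underestimate
print(estimate_tokens(sample, 1.5))  # 1500 -> closer to typical tokenizer output
```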
Code under review (new state of the diff):

```python
except BaseException as e:
    logger.error(e)
    max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
max_cnt = max(1, max_cnt)
try:
    group_icl_token_budget = int(
        cfg["provider_ltm_settings"].get(
            "group_icl_token_budget",
            self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
        )
    )
except BaseException as e:
    logger.error(e)
    group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```
Catching `BaseException` is discouraged: it also catches `SystemExit` and `KeyboardInterrupt`, which can prevent the program from exiting normally and make debugging harder. Catch specific exceptions (e.g. `ValueError`, `KeyError`) instead, or at least narrow to `Exception`.
Suggested change:

```python
except Exception as e:
    logger.error(e)
    max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
max_cnt = max(1, max_cnt)
try:
    group_icl_token_budget = int(
        cfg["provider_ltm_settings"].get(
            "group_icl_token_budget",
            self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
        )
    )
except Exception as e:
    logger.error(e)
    group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```
```python
logger.warning(
    "Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
    event.unified_msg_origin,
    cfg["image_caption_provider_id"],
)
```
Emitting a warning for every image in `handle_message` causes log flooding, especially in active group chats, and drowns out other logs that administrators need to see. Consider adding a flag so the reminder is emitted only once per session or per startup.
```python
if not getattr(self, "_image_caption_warned", False):
    logger.warning(
        "Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
        event.unified_msg_origin,
        cfg["image_caption_provider_id"],
    )
    self._image_caption_warned = True
```

On the trimming call site:

```python
    self.session_chats[event.unified_msg_origin],
    cfg["group_icl_token_budget"],
)
if omitted > 0:
```
|
The refactoring will be handled in #8144.

Motivation / 动机
This PR addresses issues directly related to call cost / caching: the default context policy, group chat context awareness (ICL), image captioning, and the hazard of plugins modifying the system prompt.
The previous default behavior was not friendly to new users:

- Group chat context was stitched into the `system_prompt`, so every round of group chat caused the system prompt to change along with it;
- when `OnLLMRequestEvent` hooks modified the system prompt, it was hard to quickly pin down which plugin broke the cache.

These points are handled with minimal changes: no existing feature entry points are touched; only the default parameters and the injection method are tightened, making the default behavior friendlier to cost and caching.
Modifications / 改动点
1. Adjust the default context policy

Modified `astrbot/core/config/default.py` and `astrbot/core/astr_main_agent.py`:

- `max_context_length`: `-1` → `30`
- `dequeue_context_length`: `1` → `10`

New users no longer keep unlimited context by default; once the limit is reached, 10 rounds are dropped in one batch, so token usage no longer grows without bound in long chats and the prefix cache is not constantly invalidated.
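A rough sketch of the dequeue behavior described above (illustrative names, not the actual AstrBot implementation):

```python
MAX_CONTEXT_LENGTH = 30      # rounds kept before trimming kicks in
DEQUEUE_CONTEXT_LENGTH = 10  # rounds dropped in one batch

def maybe_dequeue(rounds: list) -> list:
    """Drop a whole batch of the oldest rounds once the cap is exceeded.

    Dropping in batches keeps the remaining prefix stable for many turns,
    instead of shifting it on every message and breaking prefix caching.
    """
    if len(rounds) > MAX_CONTEXT_LENGTH:
        return rounds[DEQUEUE_CONTEXT_LENGTH:]
    return rounds
```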
2. Group chat context is no longer injected into the system prompt

Modified `astrbot/builtin_stars/astrbot/long_term_memory.py`. The dynamic group chat history for group ICL is no longer appended to `req.system_prompt`; it is now injected as temporary extra content on the current user message, marked with `mark_as_temp()` so it is never written into the conversation history. This prevents group chat context from piling up in the history round after round.

3. Add a token budget for group ICL

Added `provider_ltm_settings.group_icl_token_budget`, with a default of `4000`. Group chat context is trimmed to an approximate token budget, keeping only the more recent group messages; when trimming occurs a warning is logged so administrators can tell that the context was compressed. The default of `group_message_max_cnt` is also lowered from `300` to `50` to reduce memory and context pressure under the default configuration.

4. Add preflight cost logging for LLM requests
Modified `astrbot/core/agent/runners/tool_loop_agent_runner.py`. A preflight log is emitted before each model request, printing the provider / model, estimated input tokens, and image count. If the estimated tokens are high or there are many images, a warning is emitted, making high-cost requests easy to locate.
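The preflight check described above could look roughly like this (thresholds taken from the diff shown earlier; the function shape and call site are assumptions, not the actual runner code):

```python
import logging

logger = logging.getLogger(__name__)

REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
REQUEST_WARN_IMAGE_COUNT = 1

def preflight_log(provider: str, model: str, est_input_tokens: int, image_count: int) -> bool:
    """Log request stats before calling the model; warn on likely high-cost requests.

    Returns True when a warning was emitted, for easy testing.
    """
    logger.info(
        "LLM request preflight: provider=%s model=%s est_input_tokens=%d images=%d",
        provider, model, est_input_tokens, image_count,
    )
    if (est_input_tokens > REQUEST_WARN_ESTIMATED_INPUT_TOKENS
            or image_count > REQUEST_WARN_IMAGE_COUNT):
        logger.warning(
            "High-cost LLM request: est_input_tokens=%d images=%d",
            est_input_tokens, image_count,
        )
        return True
    return False
```

Note the `>` comparison: with `REQUEST_WARN_IMAGE_COUNT = 1` the warning fires only at 2+ images, which is the naming concern raised in the review above.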
5. Warn when the system prompt is modified by a hook

Modified `astrbot/core/pipeline/process_stage/method/agent_sub_stages/internal.py`. The system prompt is compared before and after `OnLLMRequestEvent`; if a plugin or hook changed it, a warning is logged with the lengths before and after the change, making it easier to track down which plugin invalidated the cache.

6. Add a cost reminder for group image captioning
When `image_caption` is enabled for group ICL, every group image may trigger an extra multimodal request. A warning is now emitted to remind users that this configuration may incur extra cost.

7. Add unit tests
Added `tests/unit/test_long_term_memory_cost_safety.py`, covering:

- group chat context is no longer written into `system_prompt`
- group chat context is injected as temporary user content
- temporary group chat context is not persisted to history
- trimmed group chat context does not exceed the token budget
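The budget-based trimming from change 3 can be sketched at the message level like this (a hypothetical helper; the real implementation lives in `long_term_memory.py` and uses the PR's token estimator):

```python
def trim_messages_to_budget(messages, budget, estimate):
    """Keep the newest messages whose combined estimated tokens fit the budget.

    Returns (kept_messages_oldest_first, omitted_count); the caller can log a
    warning when omitted_count > 0, as the PR does.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept, len(messages) - len(kept)
```

With `estimate=len` and a budget of 5, `["aaaa", "bb", "ccc"]` keeps the two newest messages and omits the oldest.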
This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
The following checks were run:

- `python -m ruff check .`
- `python -m ruff format . --check`

This change introduces no new dependencies.
Checklist / 检查清单
If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
/ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 `requirements.txt` 和 `pyproject.toml` 文件相应位置。
My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Tighten default context and group chat handling to reduce LLM request cost and cache disruption while adding observability around expensive requests and prompt hooks.