
fix: Reduce LLM context call cost risk #8139

Open

Sisyphbaous-DT-Project wants to merge 2 commits into AstrBotDevs:master from Sisyphbaous-DT-Project:修复-issue-8080-缓存成本保护

Hidden character warning: the head ref may contain hidden characters: "\u4fee\u590d-issue-8080-\u7f13\u5b58\u6210\u672c\u4fdd\u62a4"
Conversation

@Sisyphbaous-DT-Project (Contributor) commented May 10, 2026

Motivation

This PR fixes issues directly related to call cost and caching, covering the default context strategy, group chat context awareness, image captioning, and the hidden risk when plugins modify the system prompt.

The previous default behavior was unfriendly to new users:

  • By default, no limit was placed on the number of history turns, so input tokens kept growing in long conversations;
  • Once the limit was hit, only 1 turn was dropped per round, so the context window slid every round and easily broke the prefix cache;
  • The dynamic chat history for group ICL was concatenated directly into system_prompt, so every change in the group chat changed the system prompt with it;
  • Once group image captioning was enabled, every image could trigger a separate multimodal request;
  • When a plugin modified the system prompt via OnLLMRequestEvent, it was hard to quickly pin down which plugin broke the cache.

These points are handled with minimal changes: no existing feature entry points are touched; only default parameters and the injection method are tightened, so the default behavior is friendlier to cost and caching.

Modifications

1. Adjust the default context strategy

Modified astrbot/core/config/default.py and astrbot/core/astr_main_agent.py:

  • max_context_length: -1 → 30
  • dequeue_context_length: 1 → 10

New users no longer keep unlimited context by default; once the limit is hit, 10 turns are dropped at once, preventing input tokens from growing without bound in long chats and the prefix cache from being broken over and over.
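To make the cache effect concrete, here is a minimal sketch of the batch-dequeue behavior these defaults imply (drop_oldest_turns and the turn representation are illustrative assumptions, not AstrBot's actual context manager API):

```python
# Illustrative sketch only; names and data shapes are assumptions.
MAX_CONTEXT_LENGTH = 30      # new default: retain at most 30 turns
DEQUEUE_CONTEXT_LENGTH = 10  # new default: drop 10 turns in one go

def drop_oldest_turns(history: list[dict]) -> list[dict]:
    """Trim history so the kept prefix stays byte-stable for ~10 turns."""
    if len(history) <= MAX_CONTEXT_LENGTH:
        return history
    # Dropping a whole batch (instead of 1 turn per request) means the next
    # DEQUEUE_CONTEXT_LENGTH requests reuse an identical prefix, so a
    # provider-side prefix cache keeps hitting instead of sliding every turn.
    return history[DEQUEUE_CONTEXT_LENGTH:]
```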

2. Stop injecting group chat context into the system prompt

Modified astrbot/builtin_stars/astrbot/long_term_memory.py.

The dynamic group ICL chat history is no longer concatenated into req.system_prompt; it is instead injected as temporary extra content on the current user message (see the sketch below):

  • The model still sees the group chat context for the current turn;
  • The system prompt is no longer polluted, reducing the risk of prefix cache invalidation;
  • The content is flagged with mark_as_temp(), so it is never written into the conversation history, preventing group context from piling up there turn after turn.
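A minimal sketch of this injection pattern, with simplified stand-ins for the real request and content types (LLMRequest, UserContentPart, and inject_group_context are illustrative; only mark_as_temp() is named after the mechanism this PR describes):

```python
# Simplified stand-ins; AstrBot's real request/content types differ.
from dataclasses import dataclass, field

@dataclass
class UserContentPart:
    text: str
    temp: bool = False

    def mark_as_temp(self) -> "UserContentPart":
        # Flagged parts are skipped when the conversation is persisted.
        self.temp = True
        return self

@dataclass
class LLMRequest:
    system_prompt: str
    user_parts: list[UserContentPart] = field(default_factory=list)

def inject_group_context(req: LLMRequest, group_context: str) -> None:
    # The model sees the group history this turn, the system prompt stays
    # byte-identical across turns, and history writers drop temp parts.
    req.user_parts.insert(0, UserContentPart(group_context).mark_as_temp())
```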

3. Add a token budget for group ICL

Added provider_ltm_settings.group_icl_token_budget, with a default of 4000.

Group chat context is trimmed to an approximate token budget, keeping only the more recent group messages; when trimming occurs, a warning is logged so administrators notice that context was compressed (a trimming sketch follows below). The default for group_message_max_cnt is also lowered from 300 to 50, reducing memory and context pressure under the default configuration.
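A sketch of the budget trimming under the heuristic estimator this PR uses (the helper names here are illustrative; the 0.6/0.3 weights match the code quoted in the review below, where calibrating the CJK weight is also discussed):

```python
# Heuristic estimator and newest-first trimming; names are illustrative.
def estimate_tokens(text: str) -> int:
    cjk = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
    return int(cjk * 0.6 + (len(text) - cjk) * 0.3)

def trim_to_budget(messages: list[str], budget: int) -> tuple[list[str], int]:
    """Keep the newest messages whose combined estimate fits the budget;
    return (kept_messages, omitted_count) so callers can log a warning."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, len(messages) - len(kept)
```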

4. Add preflight cost logging for LLM requests

Modified astrbot/core/agent/runners/tool_loop_agent_runner.py.

A preflight log entry is written before each model call, recording the provider / model, the estimated input tokens, and the number of images. If the token estimate is high or there are many images, a warning is emitted so high-cost requests can be located quickly.
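A sketch of the preflight check (the two threshold constants match the ones added in this PR, per the review below; the function shape is a simplified assumption):

```python
import logging

logger = logging.getLogger(__name__)

# Thresholds as added in this PR; the `>` comparison means the image
# warning fires from the second image onward (see the review nitpick below).
REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
REQUEST_WARN_IMAGE_COUNT = 1

def preflight_log(provider: str, model: str, est_tokens: int, images: int) -> None:
    logger.info(
        "LLM preflight: provider=%s model=%s est_input_tokens=%d images=%d",
        provider, model, est_tokens, images,
    )
    if est_tokens > REQUEST_WARN_ESTIMATED_INPUT_TOKENS:
        logger.warning("High estimated input tokens for this request: %d", est_tokens)
    if images > REQUEST_WARN_IMAGE_COUNT:
        logger.warning("Multiple images in one request: %d", images)
```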

5. Warn when a hook modifies the system prompt

Modified astrbot/core/pipeline/process_stage/method/agent_sub_stages/internal.py.

The system prompt is compared before and after OnLLMRequestEvent; if a plugin or hook changed it, a warning is logged together with the before/after lengths, making it easy to track down which plugin invalidated the cache.
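A sketch of this guard (a minimal shape, assuming an awaitable hook dispatcher; the real pipeline code differs):

```python
import logging

logger = logging.getLogger(__name__)

async def run_hooks_with_prompt_guard(req, dispatch_hooks) -> None:
    before = req.system_prompt
    await dispatch_hooks(req)  # OnLLMRequestEvent handlers may mutate req
    after = req.system_prompt
    if after != before:
        # Log lengths only: enough to spot the offending plugin without
        # dumping potentially sensitive prompt content into the logs.
        logger.warning(
            "system_prompt modified by a hook: %d -> %d chars; "
            "prefix cache for this session is likely invalidated",
            len(before), len(after),
        )
```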

6. Add a cost notice for group image captioning

When image_caption is enabled for group ICL, every group image may trigger an extra multimodal request; a warning is now logged to remind users that this setting can incur additional cost.

7. Add unit tests

Added tests/unit/test_long_term_memory_cost_safety.py, covering:

  • Group chat context is no longer written into system_prompt

  • Group chat context is injected as temporary user content

  • Temporary group chat context is not persisted to history

  • Trimmed group chat context stays within the token budget

This is NOT a breaking change.

Screenshots or Test Results

The following checks were run:

python -m pytest tests\unit\test_long_term_memory_cost_safety.py -q
python -m pytest tests\agent\test_context_manager.py tests\test_tool_loop_agent_runner.py::test_normal_completion_without_max_step -q
python -m ruff check .
python -m ruff format . --check

This change introduces no new dependencies.


Checklist

  • If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • My changes do not introduce malicious code.

Summary by Sourcery

Tighten default context and group chat handling to reduce LLM request cost and cache disruption while adding observability around expensive requests and prompt hooks.

New Features:

  • Add a configurable token budget for group chat ICL context to bound injected history size per request.
  • Introduce preflight logging of estimated input tokens and image count for each LLM request.
  • Emit warnings when request hooks modify the system prompt or when group image captioning may trigger extra multimodal calls.

Enhancements:

  • Adjust default context length settings to limit retained turns and drop more messages at once when the limit is reached.
  • Change group chat ICL to inject recent history as temporary user content instead of modifying the system prompt, and trim it to fit within the configured token budget.
  • Lower the default maximum stored group messages and document the new group ICL token budget option in configuration metadata.

Tests:

  • Add unit tests to ensure group chat context is injected via temporary user content, is not persisted in history, and respects the configured token budget.

@dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) labels on May 10, 2026
@sourcery-ai Bot left a comment
Hey - I've found 2 issues, and left some high level feedback:

  • The repeated logger.warning calls for group image captioning and for every high-cost request (token/image count) may be quite noisy in busy group chats; consider downgrading some to info or adding simple rate-limiting/aggregation so they remain actionable without flooding logs.
  • In _trim_text_to_token_budget, the final loop that repeatedly shortens result and recomputes _estimate_text_tokens on every iteration could be costly for long strings; you could avoid the per-character loop by tightening the binary search condition or by trimming in larger chunks to keep the complexity closer to O(n log n) (illustrated in the sketch below).
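For illustration, one way to realize that suggestion (an editor's sketch assuming the estimator is monotonic in prefix length; not code from this PR): binary-search the largest prefix that fits the budget, which needs only O(log n) estimator calls instead of a per-character loop.

```python
def trim_text_to_token_budget(text: str, budget: int, estimate) -> str:
    """Return the largest prefix of `text` whose estimate fits `budget`."""
    if estimate(text) <= budget:
        return text
    lo, hi = 0, len(text)  # invariant: text[:lo] always fits the budget
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if estimate(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo]
```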
Individual Comments

Comment 1 (astrbot/builtin_stars/astrbot/long_term_memory.py, lines 34-43):

```diff
             max_cnt = int(cfg["provider_ltm_settings"]["group_message_max_cnt"])
         except BaseException as e:
             logger.error(e)
-            max_cnt = 300
+            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
+        max_cnt = max(1, max_cnt)
+        try:
+            group_icl_token_budget = int(
+                cfg["provider_ltm_settings"].get(
+                    "group_icl_token_budget",
+                    self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
+                )
+            )
+        except BaseException as e:
+            logger.error(e)
+            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
+        group_icl_token_budget = max(1, group_icl_token_budget)
         image_caption_prompt = cfg["provider_settings"]["image_caption_prompt"]
```

**suggestion (bug_risk):** Catching `BaseException` here is broader than needed and may hide unexpected issues.

This `except BaseException` will also catch `KeyboardInterrupt` and `SystemExit`, which we usually want to propagate. Since only config parsing and `int` casting can fail here, narrowing to `Exception` (or even `ValueError`/`TypeError`) would avoid swallowing truly exceptional conditions while still handling malformed configs safely.

Suggested implementation:

```python
        except (ValueError, TypeError, KeyError) as e:
            logger.error(e)
            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES

```

```python
        except (ValueError, TypeError) as e:
            logger.error(e)
            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET

```

Comment 2 (astrbot/core/agent/runners/tool_loop_agent_runner.py, lines 114-117):

```diff
     TOOL_RESULT_MAX_ESTIMATED_TOKENS = 27_500
     TOOL_RESULT_PREVIEW_MAX_ESTIMATED_TOKENS = 7000
+    REQUEST_WARN_ESTIMATED_INPUT_TOKENS = 16_000
+    REQUEST_WARN_IMAGE_COUNT = 1
     EMPTY_OUTPUT_RETRY_ATTEMPTS = 3
     EMPTY_OUTPUT_RETRY_WAIT_MIN_S = 1
```

**nitpick:** The `REQUEST_WARN_IMAGE_COUNT` threshold name/usage is slightly confusing.

With `REQUEST_WARN_IMAGE_COUNT = 1` and the check `image_count > self.REQUEST_WARN_IMAGE_COUNT`, the warning only fires for 2+ images. That’s a reasonable behavior, but the name suggests “warn at 1 image.” Either switch the comparison to `>=` if you want to warn on the first image, or rename the constant (e.g., `REQUEST_WARN_IMAGE_COUNT_THRESHOLD`) to better reflect the current `>` semantics.



@gemini-code-assist Bot left a comment
Code Review

This pull request implements token budgeting and cost safety measures for group chat context (ICL). Key changes include introducing a token estimation mechanism, truncating chat history to fit a configurable budget, and updating default context limits to more conservative values. It also adds pre-flight logging for high-token or multi-image requests and moves group context from the system prompt to temporary user content parts to prevent persistence in history. Review feedback identifies efficiency and accuracy issues in the token estimation logic, recommends against catching BaseException, suggests preventing log flooding from image caption warnings, and points out a missing warning case for truncated single messages.

Comment on lines +78 to +80 (high):

```python
chinese_count = len([c for c in text if "\u4e00" <= c <= "\u9fff"])
other_count = len(text) - chinese_count
return int(chinese_count * 0.6 + other_count * 0.3)
```

This token estimation logic has two problems:

  1. Efficiency: the list comprehension [c for c in text if ...] builds a complete character list in memory, an unnecessary overhead for long text such as group chat history. Use sum(1 for c in text if ...) instead.
  2. Accuracy: the 0.6 multiplier for Chinese characters is far too low. In the tokenizers of mainstream models (e.g. GPT-4o, DeepSeek), one Chinese character typically maps to 1.5 to 2.0 tokens. Using 0.6 severely underestimates actual consumption and undermines this PR's goal of reducing cost risk. A multiplier of about 1.5 is recommended.

Suggested change:

```python
chinese_count = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
other_count = len(text) - chinese_count
return int(chinese_count * 1.5 + other_count * 0.3)
```

Comment on lines 34 to +47 (medium):

```diff
         except BaseException as e:
             logger.error(e)
-            max_cnt = 300
+            max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
+        max_cnt = max(1, max_cnt)
+        try:
+            group_icl_token_budget = int(
+                cfg["provider_ltm_settings"].get(
+                    "group_icl_token_budget",
+                    self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
+                )
+            )
+        except BaseException as e:
+            logger.error(e)
+            group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```

Catching BaseException is discouraged: it also catches SystemExit and KeyboardInterrupt, which can prevent the program from exiting normally and make debugging harder. Catch specific exceptions (such as ValueError, KeyError) instead, or at least Exception.

Suggested change:

```diff
-        except BaseException as e:
+        except Exception as e:
             logger.error(e)
             max_cnt = self.DEFAULT_MAX_GROUP_MESSAGES
         max_cnt = max(1, max_cnt)
         try:
             group_icl_token_budget = int(
                 cfg["provider_ltm_settings"].get(
                     "group_icl_token_budget",
                     self.DEFAULT_GROUP_ICL_TOKEN_BUDGET,
                 )
             )
-        except BaseException as e:
+        except Exception as e:
             logger.error(e)
             group_icl_token_budget = self.DEFAULT_GROUP_ICL_TOKEN_BUDGET
```

Comment on lines +213 to +217
logger.warning(
"Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
event.unified_msg_origin,
cfg["image_caption_provider_id"],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

handle_message 中对每张图片都输出 warning 日志会导致日志洪泛(Log Flooding),尤其是在活跃的群聊中。这会干扰管理员查看其他重要日志。建议增加一个标记位,确保每个会话或每次启动仅针对该配置提醒一次。

                        if not getattr(self, "_image_caption_warned", False):
                            logger.warning(
                                "Group ICL image caption is enabled. Each group image may trigger an extra multimodal request. umo=%s, provider=%s",
                                event.unified_msg_origin,
                                cfg["image_caption_provider_id"],
                            )
                            self._image_caption_warned = True

Comment on the omitted > 0 check (medium):

```python
    self.session_chats[event.unified_msg_origin],
    cfg["group_icl_token_budget"],
)
if omitted > 0:
```

The current logic only warns when omitted > 0, i.e. when whole messages were dropped. If the group chat holds a single message and that message itself was truncated (omitted stays 0), no warning fires. Also check whether chats_str carries the truncation marker.

Suggested change:

```diff
-if omitted > 0:
+if omitted > 0 or chats_str.startswith("[truncated]"):
```

@RC-CHN (Member) commented May 11, 2026

The refactor will be handled in #8144.

(screenshot)

I ran a preliminary test on my own deployed instance, and it shows the new strategy hits the KV cache fairly well. Personally, I think the existing group chat context management mechanism carries legacy issues that call for a refactor, so further patching it is no longer really necessary. Thank you all the same for your contribution.
