Skip to content

fix(browser_planner): gate multimodal content on per-model vision support#251

Open
haroldship wants to merge 4 commits into
mainfrom
fix/214-hybrid-web-agent-multimodal-content
Open

fix(browser_planner): gate multimodal content on per-model vision support#251
haroldship wants to merge 4 commits into
mainfrom
fix/214-hybrid-web-agent-multimodal-content

Conversation

@haroldship
Copy link
Copy Markdown
Collaborator

@haroldship haroldship commented May 20, 2026

Bug fix

Fixes #214

Summary

When advanced_features.use_vision = true (the default), load_prompt_with_image unconditionally builds the browser planner's user message as a list-typed multimodal [image_url, text] block. LLM endpoints that don't accept multimodal content (e.g. WatsonX served via an openai-compatible gateway) then reject the request with 400 messages[1].content must be a string, killing hybrid-mode web-agent runs before the first step.

This change introduces a per-model enable_vision flag (resolved by the new model_supports_vision helper, defaulting to True to preserve existing behavior for vision-capable models). Both the prompt template structure (load_prompt_with_image) and the runtime image attachment (BrowserPlannerAgent.run) now gate on settings.advanced_features.use_vision AND model_supports_vision(model_config). Operators whose LLM endpoint rejects multimodal content can set enable_vision = false on the relevant agent.*.model config (e.g. agent.planner.model) to force string-only user content.

Why a per-model flag

The bug surfaces when the endpoint (not the model) cannot deserialize multimodal content blocks. Detecting endpoint capabilities at runtime is brittle, and silently downgrading vision would mask misconfigurations. An explicit enable_vision = false opt-out is small, discoverable, and additive — existing model configs continue to work unchanged because the default is True.

Testing

Rebased onto main (69b7e40a) — branch contains a single commit 9b7bad82.

New regression tests in tests/unit/test_load_prompt_with_image_vision_gate.py (7 tests):

  • model_supports_vision defaults: None config and missing-attribute config both default to True; explicit enable_vision=False and enable_vision=True are honored.
  • Human message is a string when the model disables vision, even with global use_vision=True (the regression for [Bug]: Web agent in hybrid mode fails with "messages[1].content must be a string" (400 invalid_request_error) #214).
  • Human message is multimodal (list with image_url + text items) when both global setting and per-model flag are True.
  • Human message is a string when the global setting is False regardless of per-model flag.

Test results (against this branch, rebased on main)

Suite Result
tests/unit/test_load_prompt_with_image_vision_gate.py (new) ✅ 7 / 7 passed
src/cuga/backend/tools_env/registry/tests/ ✅ all passed
src/cuga/backend/cuga_graph/nodes/api/variables_manager/tests/ ✅ all passed
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/ (non-e2b) ✅ all passed
tests/unit/test_task_analyzer_app_matching.py ✅ all passed
tests/unit/test_cuga_lite_bind_tools.py ✅ all passed
src/cuga/backend/cuga_graph/nodes/cuga_lite/executors/tests/test_e2b_lite.py ⚠️ 3 failed — RuntimeError: e2b-code-interpreter package not installed (env-only; CI runs these with uv sync --extra e2b per src/scripts/run_tests.sh)
src/cuga/sdk_core/tests/ (live LLM integration) ⚠️ flaky locally — pass individually, fail when run in large batches due to external-service rate limits / setup. Unrelated to this diff.
ruff check (touched files) ✅ clean
ruff format --check (touched files) ✅ clean
CodeRabbit CLI (--type committed --base main) ✅ no findings against this diff

Aggregate: 211 passing across the unit-test subset relevant to this change, with the only failures being env/network issues that also fail on main.

  • Verified fix locally; tests pass

Summary by CodeRabbit

  • Bug Fixes
    • Validate per-model vision support before attaching images; if a model rejects multimodal input, disable vision for subsequent attempts and retry with text-only payloads.
  • New Features
    • Added a per-model vision capability check and a constructor-level toggle to control vision usage.
  • Tests
    • Added tests covering vision gating, multimodal vs. text-only prompt formats, defaults, and fallback behavior when vision is disabled.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ac00501-b555-4aba-a32e-2a868f9e7dd5

📥 Commits

Reviewing files that changed from the base of the PR and between 24ef14f and be8a969.

📒 Files selected for processing (1)
  • src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py

📝 Walkthrough

Walkthrough

Gates multimodal prompts using a per-model capability check; BrowserPlannerAgent receives an effective vision flag, conditionally attaches images, detects vision-rejection responses, disables vision, and retries text-only. Tests cover model- and global-level gating and prompt formatting.

Changes

Per-model vision capability detection and gating

Layer / File(s) Summary
Model vision capability detection and prompt formatting
src/cuga/backend/llm/utils/helpers.py
Adds model_supports_vision() (defaults true; reads enable_vision) and updates load_prompt_with_image() to use global setting AND per-model capability to choose string vs multimodal human message.
BrowserPlannerAgent vision capability integration
src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py
Constructor gains use_vision_effective; run() sets data["use_vision"] from it, conditionally attaches the last image, detects multimodal rejection markers, disables vision on rejection, removes img, updates data, and retries the chain with text-only payload. create() computes and injects use_vision_effective.
Vision gating regression tests
tests/unit/test_load_prompt_with_image_vision_gate.py
New pytest module exercising model_supports_vision() defaults and gating; asserts load_prompt_with_image() yields string when vision disabled and multimodal list containing both image_url and text when enabled.

Sequence Diagram(s)

sequenceDiagram
  participant Factory as create()
  participant Settings as settings
  participant ModelCheck as model_supports_vision()
  participant Agent as BrowserPlannerAgent
  participant LLM as chain.ainvoke

  Factory->>Settings: read advanced_features.use_vision
  Factory->>ModelCheck: check dyna_model capability
  ModelCheck->>Factory: return enable_vision?
  Factory->>Agent: __init__(use_vision_effective=...)
  Agent->>Agent: store use_vision_effective
  Agent->>LLM: chain.ainvoke(data with img if use_vision_effective)
  LLM-->>Agent: returns or raises vision rejection
  Agent->>Agent: on rejection set use_vision_effective=False, remove img
  Agent->>LLM: chain.ainvoke(text-only data)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I packed an image for the call,
but the model frowned and dropped it all.
So now I check per-model and global too,
retry with words when lists won't do.
Hooray — no more image-induced fall!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(browser_planner): gate multimodal content on per-model vision support' clearly summarizes the main change—gating multimodal behavior based on per-model vision configuration.
Linked Issues check ✅ Passed The changes fully address issue #214 by preventing list-typed multimodal messages to models rejecting them, introducing per-model enable_vision config, preserving multimodal by default, and adding comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are in scope: BrowserPlannerAgent refactored to use per-model vision gating, model_supports_vision helper added, load_prompt_with_image updated for dual gating, and unit tests added—all directly addressing issue #214.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/214-hybrid-web-agent-multimodal-content

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cuga/sdk_core/tests/test_sdk_token_usage_tracker.py`:
- Around line 86-118: The test currently creates a local ActivityTracker and
compares it with the tracker inside TokenUsageTracker, but the SDK should expose
a shared tracker on the agent; update the test to use the agent's shared tracker
instead of creating a new one. Specifically, remove the local `tracker =
ActivityTracker()` and instead obtain the tracker from the agent (e.g., `tracker
= agent._activity_tracker` or after building callbacks assert
`token_tracker.tracker is agent._activity_tracker`), reset `tracker.prompts =
[]`, then proceed to call `tracker.collect_prompt(...)` and assert prompts; also
replace the identity assertion `token_tracker.tracker is tracker` with
`token_tracker.tracker is agent._activity_tracker` (or equivalent) to match the
fixed SDK behavior.

In `@src/cuga/sdk.py`:
- Around line 1346-1366: The ActivityTracker is recreated each time in
_build_callbacks(), preventing accumulation across calls; initialize
self._activity_tracker once in __init__ (create ActivityTracker() there) and
then change _build_callbacks() to use that instance (e.g.,
TokenUsageTracker(self._activity_tracker)) so all invocations from
_apply_callbacks(), invoke(), stream() and graph creation share the same
tracker; if needed, lazily create self._activity_tracker inside
_build_callbacks() only when it is None to preserve backward compatibility.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aeaf6429-3498-49e7-814c-ae256d66bb68

📥 Commits

Reviewing files that changed from the base of the PR and between 69b7e40 and 819d5b5.

📒 Files selected for processing (5)
  • src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py
  • src/cuga/backend/llm/utils/helpers.py
  • src/cuga/sdk.py
  • src/cuga/sdk_core/tests/test_sdk_token_usage_tracker.py
  • tests/unit/test_load_prompt_with_image_vision_gate.py

Comment thread src/cuga/sdk_core/tests/test_sdk_token_usage_tracker.py
Comment thread src/cuga/sdk.py
…n support

When the global `advanced_features.use_vision = true` setting is on,
`load_prompt_with_image` would unconditionally wrap the browser planner's
human message as a list-typed multimodal `[image_url, text]` block. LLM
endpoints that don't accept multimodal content (e.g. WatsonX served via an
openai-compatible gateway) then reject the request with
`400 messages[1].content must be a string`, killing hybrid-mode web-agent
runs before the first step (issue #214).

Add a per-model `enable_vision` flag (resolved by `model_supports_vision`,
defaulting to `True` to preserve existing behavior for vision-capable
models) and gate both the prompt template structure and the runtime image
attachment on `use_vision AND model_supports_vision`. Operators whose LLM
endpoint rejects multimodal content can set `enable_vision = false` on the
relevant `agent.*.model` config to force string-only user content.

Closes #214
@haroldship haroldship force-pushed the fix/214-hybrid-web-agent-multimodal-content branch from 819d5b5 to 9b7bad8 Compare May 20, 2026 14:35
@haroldship
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@haroldship haroldship requested a review from sami-marreed May 20, 2026 16:19
"""
if model_config is None:
return True
return bool(getattr(model_config, 'enable_vision', True))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this checks if the enable_vision is given from model config? not like based on model e.g. claude 4.6 supports vision why do we need to add flag

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question — the short answer is the bug isn't actually about model capability; it's about endpoint capability.

The 400 in #214 came from a setup where the model itself does support vision (Llama 4 Maverick served via WatsonX, which is multimodal-capable), but the openai-compatible gateway in front of it doesn't accept multimodal content blocks — it rejects messages[1].content if it's a list. So a name-based allowlist ("claude-4.6 → multimodal OK") would have given the wrong answer here: it would have said yes, sent the list-typed payload, and produced the same 400.

The reason for the model name failing as a signal:

  • Same model, multiple endpoints — e.g. WatsonX-via-openai-compatible vs WatsonX native. One accepts multimodal, the other doesn't.
  • Provider names drift fast (litellm prefixes, openrouter aliases, custom model_name strings). A registry of "these names support vision" would need constant upkeep and would still miss the gateway dimension.
  • Silently downgrading vision when we think the model doesn't support it would mask configuration mistakes — e.g. someone pointing at the wrong endpoint and not noticing because we quietly stripped images.

The enable_vision flag defaults to True, so anyone whose endpoint already works keeps working with no change. The only people who need to set it are those with a non-multimodal gateway — and for them, an explicit opt-out is more discoverable than "why did my screenshots stop being sent?". Open to a follow-up that adds a name-based default if that's useful for the common case, but I'd want it to be additive (a hint, not authoritative) so we can still override per endpoint.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so can we try catch error when image doesn't work and simply then try without

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea — implemented in 24ef14f. BrowserPlannerAgent.run() now attempts the multimodal call first; if it raises with a vision-rejection signature (content must be a string, image_url, vision, etc.) and we actually attached an image, it logs a warning, flips self.use_vision_effective = False so the next turn doesn't repeat the rejection, and retries the current turn text-only (also flipping the use_vision template var so the prompt's image block stays in sync). Unrelated exceptions re-raise unchanged. The static enable_vision flag stays as a fast opt-out for configs where you already know the endpoint is text-only — but it's no longer required for correctness.

@haroldship
Copy link
Copy Markdown
Collaborator Author

Merged latest main into this branch (clean merge, no conflicts despite #250's cuga_agent_core refactor — this PR only touches browser_planner_agent.py and llm/utils/helpers.py, which #250 didn't move).

uv run ruff format + uv run ruff check: clean.
PR-specific tests (tests/unit/test_load_prompt_with_image_vision_gate.py): 7/7 passing.
uv run pytest tests/unit/: 189 passing, 56 deselected. The deselected tests (test_manage_publish_sync.py, test_workspace_sandbox.py, test_tool_use_failed_recovery.py, test_skill_loader.py) are all pre-existing failures on main at 27bd35e — unrelated to this PR.

@haroldship
Copy link
Copy Markdown
Collaborator Author

Local stability test (uv run run_stability_tests.py --method local): 10/10 passed, 100% pass rate after the main merge.

…al payload

Per review feedback, the per-model ``enable_vision`` flag isn't always known
ahead of time (especially when the same model is fronted by different
openai-compatible gateways). Catch vision-rejection errors at call time, log
a warning, disable vision on this agent for subsequent calls, and retry the
current invocation with a text-only payload.

Detection is a string-based heuristic against the exception message
(``content must be a string``, ``image_url``, ``vision``, etc.); unrelated
exceptions are re-raised unchanged.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py (1)

24-31: Consider monitoring vision-rejection patterns in production logs.

The marker set is intentionally heuristic but may not cover all provider-specific rejection messages. Monitor warnings from line 88-91 in production to identify new patterns and expand this tuple as needed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py`
around lines 24 - 31, The tuple _VISION_REJECTION_MARKERS is a heuristic list of
vision-rejection phrases and should be monitored and expanded over time; update
the code around the warning emission that references _VISION_REJECTION_MARKERS
(the warning block that currently logs when a vision rejection is detected) to
emit a structured production metric/event (and include the raw provider message)
whenever a match occurs so you can aggregate and inspect new rejection patterns
in logs/monitoring; use that telemetry to periodically extend
_VISION_REJECTION_MARKERS with newly observed phrases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py`:
- Around line 88-91: The warning in browser_planner_agent.py currently logs only
the exception type in the logger.warning call in BrowserPlannerAgent (the
logger.warning call around the vision rejection handling); update that log to
include the exception message as well (e.g., include str(exc) or format as
f"{type(exc).__name__}: {exc}") so the actual rejection reason/API error is
captured alongside the type for diagnosing entries added to
_VISION_REJECTION_MARKERS. Ensure you modify the logger.warning invocation that
currently uses type(exc).__name__ to include the exception message in the same
structured/logged message.
- Around line 34-37: Change the function signature for
_looks_like_vision_rejection to accept Exception instead of BaseException to
avoid catching system-level exceptions; update the parameter type in the def
line and any internal references (msg = str(exc).lower()) remains the same, and
ensure uses of _VISION_REJECTION_MARKERS are unchanged so the heuristic still
checks markers in the exception message.
- Around line 75-80: ActivityTracker is a singleton with a shared mutable list
images that's read in browser_planner_agent (the tracker variable) without
synchronization, risking race conditions between len(tracker.images) and
tracker.images[-1]; fix by adding thread-safety inside ActivityTracker:
introduce a threading.Lock (or RLock) and wrap all mutations (e.g.,
collect_image) and reads with that lock, and expose a safe accessor like
ActivityTracker.get_last_image() or ActivityTracker.snapshot_images() that
returns a stable copy or the last image under the lock; then update
browser_planner_agent to call that accessor (e.g., img =
tracker.get_last_image()) and only attach data['img'] if the accessor returns a
non-None value.

---

Nitpick comments:
In
`@src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py`:
- Around line 24-31: The tuple _VISION_REJECTION_MARKERS is a heuristic list of
vision-rejection phrases and should be monitored and expanded over time; update
the code around the warning emission that references _VISION_REJECTION_MARKERS
(the warning block that currently logs when a vision rejection is detected) to
emit a structured production metric/event (and include the raw provider message)
whenever a match occurs so you can aggregate and inspect new rejection patterns
in logs/monitoring; use that telemetry to periodically extend
_VISION_REJECTION_MARKERS with newly observed phrases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d821a190-00a6-443f-9bbe-d4c1c8100d73

📥 Commits

Reviewing files that changed from the base of the PR and between 9b7bad8 and 24ef14f.

📒 Files selected for processing (1)
  • src/cuga/backend/cuga_graph/nodes/browser/browser_planner_agent/browser_planner_agent.py

Copy link
Copy Markdown
Contributor

@sami-marreed sami-marreed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the #214 fix and the rebase.

Scope looks correct (helpers + browser planner + unit tests). The old CodeRabbit threads on sdk.py and test_sdk_token_usage_tracker.py are out of scope for this PR — fine to leave resolved.

Design LGTM: per-model enable_vision plus runtime multimodal→text-only fallback is the right approach for gateways that reject list-typed content. The retry in BrowserPlannerAgent.run() addresses the try/catch concern.

Requesting two small fixes on browser_planner_agent.py (inline). Once those are in, good to merge from my side.

"""
if model_config is None:
return True
return bool(getattr(model_config, 'enable_vision', True))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design LGTM. Explicit enable_vision on the model config is the right signal when the failure is the endpoint/gateway rejecting multimodal content, not the model lacking vision. Default True keeps existing setups unchanged.

No change needed in helpers.py — the runtime fallback in BrowserPlannerAgent.run() covers the try/catch / retry-without-image case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — agreed, helpers.py stays as the fast opt-out via enable_vision = false, and the runtime try/catch in BrowserPlannerAgent.run() handles the case where the gateway/endpoint rejects multimodal content despite the model supporting it.

Comment on lines +34 to +37
def _looks_like_vision_rejection(exc: BaseException) -> bool:
"""Heuristic: did the endpoint reject the request because of image content?"""
msg = str(exc).lower()
return any(marker in msg for marker in _VISION_REJECTION_MARKERS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change _looks_like_vision_rejection(exc: BaseException) to exc: Exception. run() already catches Exception; BaseException is too broad (e.g. KeyboardInterrupt, SystemExit) and shouldn't be part of this path.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in be8a969 — signature is now _looks_like_vision_rejection(exc: Exception).

Comment on lines +88 to +91
logger.warning(
"Vision request rejected ({}); disabling vision for this agent and retrying text-only.",
type(exc).__name__,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include str(exc) in the vision-rejection warning, not only type(exc).__name__. We need the actual API error text to validate and extend _VISION_REJECTION_MARKERS over time.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in be8a969exc is now interpolated into the warning alongside the type name, so the raw API error text is captured for future updates to _VISION_REJECTION_MARKERS.

…ge read

Review feedback follow-ups:

- ``_looks_like_vision_rejection`` now takes ``exc: Exception`` (was
  ``BaseException``) — interpreter-control exceptions (``KeyboardInterrupt``,
  ``SystemExit``) were never meant to flow through this heuristic, and
  ``run()`` already catches ``Exception``.
- Vision-rejection warning now includes ``str(exc)`` alongside the type, so
  the actual API error text is captured for validating/extending
  ``_VISION_REJECTION_MARKERS`` over time.
- Image attach is now an atomic try/except snapshot on ``tracker.images[-1]``
  instead of length-check-then-index, so concurrent ``collect_image`` calls
  on the shared ``ActivityTracker`` singleton can't cause an ``IndexError``
  or stale-index attachment between the two reads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Web agent in hybrid mode fails with "messages[1].content must be a string" (400 invalid_request_error)

2 participants