Skip to content

Retrievable tool-output overflow: implement and wire an ArtifactStore so large tool outputs are preserved, not discarded #2148

Description

@MervinPraison

Summary

When a tool returns a large result, the agent runtime truncates it to a fixed character budget and discards the overflow. The truncated content is lost — the agent cannot page back to it. This degrades reliability on exactly the workflows that matter most in CLI-first/coding contexts: grepping a repo, reading a large file, capturing a build/test log, or a verbose API response.

The core SDK already defines the right abstraction for this — ArtifactStoreProtocol and ArtifactRef (with head / tail / grep / chunk / load) in praisonaiagents/context/artifacts.py — but there is no concrete implementation and it is not wired into the tool-execution truncation path. This issue is about finishing that loop.

Current behaviour

  • praisonaiagents/agent/tool_execution.py (~lines 309–345) truncates tool output to a head+tail preview and throws the middle away:
    limit = getattr(self, 'tool_output_limit', 16000)
    if len(result_str) > limit:
        tail_size = min(limit // 5, 2000)
        head = result_str[:limit - tail_size]
        tail = result_str[-tail_size:] if tail_size > 0 else ""
        truncated = f"{head}\n...[{len(result_str):,} chars, showing first/last portions]...\n{tail}"
    The dropped content is unrecoverable; nothing is persisted and no reference is returned to the model.
  • Built-in shell tooling caps output with a hardcoded byte limit (max_output_size default ~10000 in praisonaiagents/tools/shell_tools.py), tail-biased, with no spill.
  • Limits are hardcoded (16000 chars in the agent loop, ~10000 bytes in shell) — there is no CLI flag, no YAML key, and no Config knob to tune them, and no line-based (max_lines) limit or head/tail direction control.
  • ArtifactStoreProtocol, ArtifactRef, ArtifactMetadata, GrepMatch are all defined in praisonaiagents/context/artifacts.py (the protocol exposes store, load, tail, head, grep, chunk, delete, list_artifacts), but:
    • the only FileSystemArtifactStore reference is inside the module docstring (illustration), not a real class;
    • no concrete adapter exists in praisonaiagents or praisonai;
    • nothing in the tool loop ever constructs an ArtifactRef or calls the store.

Desired behaviour

When a tool result exceeds the configured budget, the runtime should:

  1. Spill the full output to a content-addressed file via a concrete ArtifactStore.
  2. Return to the model a compact preview plus a retrievable ArtifactRef (using the existing ArtifactRef.to_inline()), so the model knows the full output exists and where it is.
  3. Expose lightweight follow-up operations the agent can call to page through the preserved output on demand: head, tail, grep, chunk, load (the protocol already specifies these).
  4. Make the budget configurable (per-run and per-tool) with max_bytes, max_lines, and head/tail direction, surfaced via Python config, YAML, and a CLI flag.
  5. Garbage-collect spilled artifacts after a retention window so the store does not grow unbounded.

This turns a lossy, silent truncation into a durable, browsable artifact — the agent retries less, wastes fewer tokens re-running tools, and large outputs stop blowing the context window.

Layer placement

  • Primary layer: core (praisonaiagents)
  • Why not core → (this is core): The truncation/spill happens inside the agent tool-execution loop and the built-in tools; the protocol it completes (ArtifactStoreProtocol) already lives in core. A default FileSystemArtifactStore adapter + the wiring belong next to the protocol so every entry point (Python, YAML, CLI) benefits uniformly.
  • Why not wrapper: Wrapper would only cover the CLI path; the data-loss bug affects every Agent run regardless of entry point, so a wrapper-only fix leaves the SDK and YAML paths broken. (Wrapper still gets a thin surface — see secondary touch.)
  • Why not tools: This is lifecycle behaviour of the tool-execution loop (how the runtime handles any tool's output), not a single agent-callable integration. Re-implementing per tool in praisonai-tools would duplicate logic and miss MCP/plugin tools.
  • Why not plugins: It is not a policy/guardrail hook — it is core runtime output handling that must run on the hot path by default, not an optional lifecycle plugin.
  • Secondary touch (optional): wrapper exposes a CLI flag (e.g. --tool-output-max-bytes / --tool-output-max-lines) and config/YAML keys (tool_output.max_bytes, tool_output.max_lines, tool_output.direction, tool_output.retention_days); the existing OutputConfig/ExecutionConfig consolidation is the natural home for the Python surface.
  • 3-way surface (CLI + YAML + Python): yes

Proposed approach

  1. Add a concrete FileSystemArtifactStore adapter in core implementing ArtifactStoreProtocol, persisting under the existing data dir (e.g. ~/.praisonai/artifacts/<agent_id>/<run_id>/), content-addressed by SHA256, with the metadata already modelled in ArtifactRef/ArtifactMetadata.
  2. In tool_execution.py, replace lossy truncation with: build the preview and store(...) the full result, then attach ArtifactRef.to_inline() to what the model sees. Keep the existing zero-overhead fast path for small outputs (no store call when under budget).
  3. Add agent-callable retrieval tools (artifact_head, artifact_tail, artifact_grep, artifact_chunk, artifact_load) thin-wrapping the store, registered lazily so they cost nothing until an overflow occurs.
  4. Introduce a ToolOutputConfig (or extend OutputConfig/ExecutionConfig) for max_bytes, max_lines, direction, retention_days; thread it through Python, YAML loader, and a CLI flag on praisonai run.
  5. Add a retention/GC sweep (age-based, mirroring the existing cache/checkpoint cleanup conventions) so artifacts are pruned.

Resolution sketch

  • praisonaiagents/context/artifacts.py — add FileSystemArtifactStore(ArtifactStoreProtocol) (currently only a docstring stub).
  • praisonaiagents/agent/tool_execution.py (~309–345) — spill-on-overflow + return ArtifactRef; preserve fast path for small results.
  • praisonaiagents/tools/ — new lazy artifact_* retrieval tools backed by the store.
  • praisonaiagents/config/ToolOutputConfig (max_bytes, max_lines, direction, retention_days); follow the False=disabled / True=defaults / Config=custom consolidation pattern.
  • praisonaiagents/paths.pyget_artifacts_dir() helper + GC hook.
  • praisonai/praisonai/cli/commands/run.py — surface the budget flags; YAML loader maps tool_output: keys.
  • Backward compatible: defaults reproduce today's preview behaviour; spill + retrieval are additive and opt-in-by-default-safe (no API breakage).

Severity

High. This is silent data loss on the agent's hot path. It directly causes failed/looping tool use, wasted tokens, and context-window overflows in long CLI/coding sessions — the headline use case. The fix is well-scoped because the protocol and reference types already exist; only the implementation and wiring are missing.

Validation

  • Real value: Production CLI/coding runs routinely produce tool outputs far above the 16000-char/10000-byte caps (repo greps, large file reads, build/test logs). Today that content is gone; the agent cannot recover it.
  • Traced in code: lossy truncation at praisonaiagents/agent/tool_execution.py:~309–345; hardcoded shell cap in praisonaiagents/tools/shell_tools.py; complete-but-unimplemented ArtifactStoreProtocol/ArtifactRef in praisonaiagents/context/artifacts.py (the FileSystemArtifactStore mention is docstring-only; no concrete class, no callers).
  • Reference pattern exists in the wild: mature terminal coding agents spill overflowing tool output to a temp file, hand the model a path/preview, let it page back via head/tail/grep, and GC after a retention window — exactly the API ArtifactStoreProtocol already specifies.
  • Layer-validated: single primary layer (core); fix on the hot path benefits Python + YAML + CLI uniformly; wrapper/tools/plugins rejected for the reasons above.
  • Aligned with design principles: protocol-first (implements an existing protocol), lazy (retrieval tools and store calls only engage on overflow), safe defaults (additive; current preview preserved), backward compatible.

Acceptance criteria

  • Concrete ArtifactStore adapter implementing the existing protocol.
  • Tool outputs over budget are spilled and surfaced to the model as a preview + ArtifactRef, with no loss of the full content.
  • Agent can retrieve more via head/tail/grep/chunk/load.
  • Budget (max_bytes, max_lines, direction) configurable via Python, YAML, and CLI.
  • Age-based GC of stored artifacts.
  • No regression in import time or the small-output fast path.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingclaudeAuto-trigger Claude analysisdocumentationImprovements or additions to documentation

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions