Context
Text/structured external integrations are already well served for agents: MCP servers, Tools, and OpenAPI Actions cover them cleanly, and LibreChat + @librechat/agents handle the wiring (including programmatic tool calling into the sandbox). No changes needed there.
The gap appears when files are involved. There is currently no good way for sandbox-executed code to use an external API that consumes or produces binary files, because the sandbox's network access is deliberately restricted (for good reasons).
Net effect: anything shaped like "send file(s) + params → external API → get file(s)/text back" has no supported path today.
Worked example: DeepL document translation
A user uploads contract.docx in chat and asks the agent to translate it to German.
- DeepL's Document API is multipart upload → poll → download of binary files (up to ~30 MB).
- The agent's code runs in the sandbox, which has the file in the session store but no way to reach DeepL.
- MCP tools and Actions can't carry the document in, or bring the translated file back.
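The three hops the sandbox code would have to drive can be modeled as a small state machine. This is a hypothetical illustration only. The endpoint paths (`POST /v2/document`, `POST /v2/document/{document_id}`, `POST /v2/document/{document_id}/result`), the `document_id`/`document_key` handle and the status values reflect our reading of DeepL's public v2 document docs and should be checked against them. `nextStep` and the `Step` type are made up for this sketch.

```typescript
// Hypothetical model of DeepL's document flow: upload -> poll status -> download.
// Paths and status values follow our reading of DeepL's v2 docs; verify before relying on them.

type DeeplStatus = "queued" | "translating" | "done" | "error";

// Handle returned by the upload call; both values are needed for the later calls.
interface DocHandle {
  document_id: string;
  document_key: string;
}

type Step =
  | { kind: "upload"; method: "POST"; path: string; multipart: { file: string; target_lang: string } }
  | { kind: "poll"; method: "POST"; path: string; body: { document_key: string } }
  | { kind: "download"; method: "POST"; path: string; body: { document_key: string } }
  | { kind: "failed"; reason: string };

// Decide the next request from what is known so far.
// handle === null means nothing has been uploaded yet.
function nextStep(
  fileName: string,
  targetLang: string,
  handle: DocHandle | null,
  status: DeeplStatus | null,
): Step {
  if (handle === null) {
    // Hop 1: multipart upload of the binary file plus parameters.
    return {
      kind: "upload",
      method: "POST",
      path: "/v2/document",
      multipart: { file: fileName, target_lang: targetLang },
    };
  }
  const base = `/v2/document/${encodeURIComponent(handle.document_id)}`;
  const body = { document_key: handle.document_key };
  switch (status) {
    case "done":
      // Hop 3: download the translated binary.
      return { kind: "download", method: "POST", path: `${base}/result`, body };
    case "error":
      return { kind: "failed", reason: "DeepL reported an error for this document" };
    default:
      // Hop 2: no status yet, queued, or translating -> poll again.
      return { kind: "poll", method: "POST", path: base, body };
  }
}
```

Both the upload hop and the download hop carry a binary body. That is exactly the part neither the sandbox (no network) nor an MCP tool or Action can handle today.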
Possible directions
Option 1 — General capability: configurable "egress services" in the code-interpreter egress gateway
Extend the existing egress gateway (service/src/egress-gateway.ts) — which already does grant validation, the egress ledger, and scoped session-object read/write — with an operator-configured service registry and a generic, deny-by-default forwarder. The adapter for each external API would be a gateway configuration.
- New grant claim: allowed_services (deny-by-default), minted by the worker per execution.
- New route family: POST /services/:service/*, grant-gated and ledger-counted.
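Below is a minimal sketch of how the gateway could gate the new route on the new claim, deny-by-default. The `Grant` shape, `authorizeServiceRoute` and the in-memory `ledger` map are hypothetical stand-ins for the existing grant validation and egress ledger in service/src/egress-gateway.ts. They do not mirror those real interfaces.

```typescript
// Hypothetical grant claims; only allowed_services is new in this proposal.
interface Grant {
  session_id: string;
  allowed_services?: string[]; // absent or empty => no services (deny-by-default)
}

type Decision =
  | { ok: true; service: string; upstreamPath: string }
  | { ok: false; status: 403 | 404; reason: string };

const ROUTE_PREFIX = "/services/";

// Parse POST /services/:service/* and check the service against the grant.
function authorizeServiceRoute(grant: Grant, method: string, path: string): Decision {
  if (method !== "POST" || !path.startsWith(ROUTE_PREFIX)) {
    return { ok: false, status: 404, reason: "not a service route" };
  }
  const rest = path.slice(ROUTE_PREFIX.length);
  const slash = rest.indexOf("/");
  if (slash <= 0) {
    return { ok: false, status: 404, reason: "missing service or upstream path" };
  }
  const service = rest.slice(0, slash);
  const upstreamPath = rest.slice(slash); // keeps the leading "/"
  const allowed = grant.allowed_services ?? [];
  if (!allowed.includes(service)) {
    return { ok: false, status: 403, reason: `service ${service} not granted` };
  }
  return { ok: true, service, upstreamPath };
}

// Hypothetical ledger: count every forwarded call per session and service.
const ledger = new Map<string, number>();

function recordEgress(grant: Grant, service: string): number {
  const key = `${grant.session_id}:${service}`;
  const n = (ledger.get(key) ?? 0) + 1;
  ledger.set(key, n);
  return n;
}
```

In this sketch, an absent or empty allowed_services claim denies every service. Grants minted without the new claim therefore keep the current no-egress behavior.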
Example registry entry:
services:
  deepl:
    upstream: https://api.deepl.com
    allowed_methods: [POST]
    allowed_paths: ["/v2/document", "/v2/document/*"]
    inject_header: { Authorization: "DeepL-Auth-Key ${DEEPL_API_KEY}" }
    max_body_bytes: 31457280
    rate: 60/min
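Here is one way the gateway could interpret an entry like this. Methods and paths are checked against static allowlists, oversized bodies are rejected up front, and `rate` becomes a per-service fixed-window budget. Several details are assumptions, not existing gateway behavior:
- the glob semantics, where a trailing `/*` matches any deeper path, which DeepL's `/v2/document/{id}/result` needs;
- the `N/sec|min|hour` rate syntax;
- all type and function names.

```typescript
// Hypothetical in-memory form of a registry entry.
interface ServiceConfig {
  upstream: string;
  allowed_methods: string[];
  allowed_paths: string[]; // exact paths, or a prefix ending in "/*"
  inject_header: Record<string, string>;
  max_body_bytes: number;
  rate: string; // e.g. "60/min"
}

// Assumption: a trailing "/*" matches any non-empty suffix, including further segments.
function pathAllowed(pattern: string, path: string): boolean {
  if (pattern.endsWith("/*")) {
    const prefix = pattern.slice(0, -1); // keep the trailing "/"
    return path.startsWith(prefix) && path.length > prefix.length;
  }
  return path === pattern;
}

// Returns null when the request is allowed, otherwise a denial reason.
function checkRequest(cfg: ServiceConfig, method: string, path: string, bodyBytes: number): string | null {
  if (!cfg.allowed_methods.includes(method)) return "method not allowed";
  if (path.includes("..") || path.includes("\\")) return "suspicious path";
  if (!cfg.allowed_paths.some((p) => pathAllowed(p, path))) return "path not allowed";
  if (bodyBytes > cfg.max_body_bytes) return "body too large";
  return null;
}

// Parse "N/sec|min|hour" into a per-window budget (assumed syntax).
function parseRate(rate: string): { limit: number; windowMs: number } {
  const m = /^(\d+)\/(sec|min|hour)$/.exec(rate);
  if (!m) throw new Error(`bad rate: ${rate}`);
  const windows = { sec: 1000, min: 60000, hour: 3600000 };
  return { limit: Number(m[1]), windowMs: windows[m[2] as "sec" | "min" | "hour"] };
}

// Simple fixed-window limiter; nowMs is injected so behavior is deterministic.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // Start a new window.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```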
Pros: no LibreChat code changes; adding a new file-API integration becomes config-only after the first PR; reuses grants/ledger/object-scope; sandbox keeps clone_newnet: true; secret stays out of the sandbox.
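To keep the secret out of the sandbox, the gateway would expand `${...}` placeholders in `inject_header` from its own environment at forward time. It would also overwrite any same-named header the sandbox sent. This is a minimal sketch under those assumptions. `resolveTemplate`, `buildUpstreamHeaders` and the stripped-header list are illustrative. The environment is passed in explicitly instead of being read from `process.env`.

```typescript
// Replace ${NAME} placeholders with values from the gateway's own environment.
// Fails closed if a referenced secret is missing, instead of sending an empty credential.
function resolveTemplate(template: string, env: Record<string, string | undefined>): string {
  return template.replace(/\$\{([A-Z0-9_]+)\}/g, (_match: string, name: string) => {
    const value = env[name];
    if (value === undefined || value === "") {
      throw new Error(`missing secret ${name}`);
    }
    return value;
  });
}

// Headers the sandbox must never control on the upstream request (illustrative list).
const STRIPPED_FROM_SANDBOX = ["authorization", "cookie", "host", "proxy-authorization"];

function buildUpstreamHeaders(
  sandboxHeaders: Record<string, string>,
  injectHeader: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  const injectedNames = Object.keys(injectHeader).map((k) => k.toLowerCase());
  for (const [name, value] of Object.entries(sandboxHeaders)) {
    const lower = name.toLowerCase();
    // Drop sensitive headers and anything the gateway is about to inject.
    if (STRIPPED_FROM_SANDBOX.includes(lower) || injectedNames.includes(lower)) continue;
    out[lower] = value;
  }
  // Injected headers are applied last so they always win.
  for (const [name, template] of Object.entries(injectHeader)) {
    out[name.toLowerCase()] = resolveTemplate(template, env);
  }
  return out;
}
```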
Cons / open points: the gateway is the component that runs in front of untrusted code, so this adds the one capability the architecture deliberately denies (external egress). It would need at least:
- a strict static allowlist of upstreams and paths (SSRF);
- no redirect following;
- response-header hygiene (never reflect the injected secret);
- care around body size and DoS: passing files by reference avoids streaming large bodies through the proxy, whereas a streaming/passthrough variant for large bodies would reopen the proxy's DoS model.
Keeping secrets in the gateway config is a tradeoff versus isolating them in a separate adapter service.
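The SSRF, redirect and header-hygiene points can be made concrete with two small checks. Both function names are hypothetical:
- `buildUpstreamUrl` only accepts a single absolute path. It re-checks the origin after WHATWG URL parsing, because inputs like `//host` or `/\host` are parsed as a new authority.
- `sanitizeResponse` refuses 3xx responses. It passes back only an allowlist of headers that don't echo an injected secret.

For the forwarding call itself, `fetch`'s `redirect: "error"` option would make redirects fail instead of being followed.

```typescript
// Build the upstream URL strictly from the static registry entry.
// Rejects anything that would change scheme/host/port.
function buildUpstreamUrl(upstream: string, path: string): URL {
  if (!path.startsWith("/") || path.startsWith("//")) {
    throw new Error("path must be a single absolute path");
  }
  const base = new URL(upstream);
  const url = new URL(path, base);
  // "/\evil.example" is parsed as a new authority for https URLs, so re-check the origin.
  if (url.origin !== base.origin) {
    throw new Error("upstream origin changed");
  }
  return url;
}

// Response headers that may be passed back to the sandbox (allowlist, not denylist).
const PASSTHROUGH_RESPONSE_HEADERS = ["content-type", "content-length", "content-disposition"];

function sanitizeResponse(
  status: number,
  headers: Record<string, string>,
  injectedSecrets: string[],
): Record<string, string> {
  if (status >= 300 && status < 400) {
    // Never follow or forward redirects; the target could be anywhere.
    throw new Error(`upstream redirect (${status}) refused`);
  }
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!PASSTHROUGH_RESPONSE_HEADERS.includes(lower)) continue;
    // Drop any header that would echo an injected secret back to untrusted code.
    if (injectedSecrets.some((s) => s !== "" && value.includes(s))) continue;
    out[lower] = value;
  }
  return out;
}
```

The path allowlist should then be evaluated on the normalized `url.pathname`, not the raw input, so that `..` segments can't slip past it.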
Option 2 — Accept the LibreChat custom-tool path
This path already works (as it does for the Image Tools) but involves much more friction.
The file handling in the Image Tools relies on a lot of less-standardized plumbing, and it is also quite complex, since it supports multi-file upload, image editing, etc.
The DeepL example might be simpler, but we should solve this structurally; otherwise every use case involving files will hit the same wall inside LibreChat.
Questions
- Strategic direction: Should we build the general file-egress capability in the code-interpreter egress gateway (Option 1), or standardize on the LibreChat custom-tool path (Option 2) for file-based external APIs?
- If you prefer keeping this out of the gateway, what's the intended pattern for file-bearing external APIs from agent workflows — is custom LibreChat tooling the expected/supported route?
- If Option 1 is welcome: is configuring per-service adapters inside the egress gateway acceptable, or do you prefer the adapters (and their secrets) to live in separate services that the gateway only allowlists as upstreams?
- Any constraints we should respect up front — grant-claim schema, ledger accounting, body-size/streaming policy, secret handling — so a PR aligns with the security model?
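On the body-size/streaming question, the by-reference variant mentioned under Option 1 works like this: the sandbox names session objects instead of streaming bytes, and the gateway resolves them through its scoped session-object read. This is a minimal sketch. The `ServiceCall`/`FileRef` shapes and the `readObject` callback are hypothetical and are not the current gateway API.

```typescript
// Hypothetical by-reference request: the sandbox sends object references, not file bytes.
interface FileRef {
  field: string; // multipart field name expected upstream, e.g. "file"
  objectRef: string; // id of an object in the caller's session store
}

interface ServiceCall {
  params: Record<string, string>; // plain form fields, e.g. { target_lang: "DE" }
  files: FileRef[];
}

interface ResolvedPart {
  field: string;
  bytes: Uint8Array;
}

// Stand-in for the gateway's scoped session-object read (assumed signature).
type ReadObject = (sessionId: string, objectRef: string) => Uint8Array | undefined;

// Resolve every reference inside the grant's session scope; fail closed on anything missing.
function resolveFiles(call: ServiceCall, sessionId: string, readObject: ReadObject): ResolvedPart[] {
  return call.files.map((ref) => {
    const bytes = readObject(sessionId, ref.objectRef);
    if (bytes === undefined) {
      throw new Error(`object ${ref.objectRef} not found in session ${sessionId}`);
    }
    return { field: ref.field, bytes };
  });
}
```

The response side would mirror this: the gateway writes the translated file back through the scoped session-object write and returns only a reference to the sandbox.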
Happy to draft a detailed design (gateway forwarder + allowed_services claim + by-reference broker protocol + threat model) with DeepL as the first adapter, once there's a steer on direction.