feat(curl): config-driven URL allowlist to bypass schema-mode rewrite #1677

Open
globalsecurepayments wants to merge 5 commits into rtk-ai:develop from
globalsecurepayments:fleet-url-bypass-cr-783
Conversation

@globalsecurepayments

Problem

rtk curl rewrites every curl URL to rtk curl URL and pipes the response
through rtk json --schema, which produces field-type literals (field: int,
field: string) and a (N) array-length suffix. That's a token-savings win for
human-facing exploration of arbitrary third-party APIs, but it actively breaks
downstream JSON parsing for private or internal APIs whose responses are
consumed by parsers (jq, python json.load, agents that round-trip JSON,
anything that needs the literal payload).

Today the only escape hatch is [hooks] exclude_commands = ["curl"], which
disables curl rewriting entirely — losing the token savings on every other
curl invocation in the same session.

Proposal

Add a narrow opt-in mechanism: a substring allowlist that bypasses the rewrite
on a per-URL basis. Default empty, so users who don't configure anything see
identical behavior to today.

[curl]
bypass_url_markers = [
  "localhost:8080/api/",
  "//internal.example.com/v1/",
]

Each marker is a substring match against the full command segment (after
env-prefix stripping, e.g. PROXY=… curl …), so it composes cleanly with
curl's flag positioning (-X POST, -H, -d, etc). Bypass is per-segment:
in a && curl <internal> | jq, the curl segment passes through unchanged
while a and the pipe target rewrite as normal.
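
A minimal sketch of the matching rule described above, assuming a simple whitespace-based env-prefix stripper. The helper names (`strip_env_prefix`, `is_env_assignment`, `should_bypass_curl`) are hypothetical and are not taken from the PR's diff; skipping empty markers is also an assumption, added because an empty string is a substring of every command.

```rust
/// Returns true for tokens shaped like `NAME=value` (hypothetical helper).
fn is_env_assignment(token: &str) -> bool {
    match token.split_once('=') {
        Some((name, _)) => {
            !name.is_empty()
                && !name.starts_with(|c: char| c.is_ascii_digit())
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        None => false,
    }
}

/// Strips leading `NAME=value` assignments such as `PROXY=... curl ...`.
/// Simplified: splits on whitespace, so quoted values containing spaces
/// are not handled.
fn strip_env_prefix(segment: &str) -> &str {
    let mut rest = segment.trim_start();
    loop {
        let token_end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let token = &rest[..token_end];
        if is_env_assignment(token) {
            rest = rest[token_end..].trim_start();
        } else {
            return rest;
        }
    }
}

/// Decides whether one command segment should skip the curl rewrite.
/// The marker is matched against the whole segment, not just the URL,
/// so flags before or after the URL do not affect the result.
fn should_bypass_curl(segment: &str, markers: &[String]) -> bool {
    let cmd = strip_env_prefix(segment);
    let is_curl = cmd == "curl" || cmd.starts_with("curl ");
    // Assumption: empty markers are ignored rather than matching everything.
    is_curl
        && markers
            .iter()
            .any(|m| !m.is_empty() && cmd.contains(m.as_str()))
}
```

With the two markers from the config example, `curl -X POST http://localhost:8080/api/users` would bypass the rewrite, while `curl http://localhost:9090/api/users` would not, because the port is part of the marker.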

Mirrors the existing #196 bypass shape for gh --json/--jq/--template (detect
a structured-output consumer, skip schema-mode filtering).

API surface

  • New crate::core::config::CurlConfig { bypass_url_markers: Vec<String> } in
    src/core/config.rs. Default empty.
  • New crate::discover::registry::RewriteOptions { curl_bypass_url_markers }.
  • New rewrite_command_with_options(cmd, excluded, &opts) for callers that
    want explicit control (tests, callers that already have config in hand).
  • Existing rewrite_command(cmd, excluded) is unchanged for callers — it now
    reads Config.curl.bypass_url_markers from disk internally and forwards.
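
The shape of this API surface can be sketched as follows. Only the type, field, and function names come from the list above; the `Option<String>` return type, the `load_curl_config` helper, and the function bodies are assumptions made so the example compiles on its own, not the PR's actual code.

```rust
/// `[curl]` config section; defaults to an empty marker list.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CurlConfig {
    pub bypass_url_markers: Vec<String>,
}

/// Options passed explicitly to the rewrite entry point.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct RewriteOptions {
    pub curl_bypass_url_markers: Vec<String>,
}

/// Toy stand-in for the real rewrite logic. Assumption: `None` means
/// "leave the command unchanged", `Some(new)` is the rewritten command.
pub fn rewrite_command_with_options(
    cmd: &str,
    excluded: &[String],
    opts: &RewriteOptions,
) -> Option<String> {
    let program = cmd.split_whitespace().next()?;
    if excluded.iter().any(|e| e == program) {
        return None;
    }
    if program != "curl" {
        return None;
    }
    let bypass = opts
        .curl_bypass_url_markers
        .iter()
        .any(|m| cmd.contains(m.as_str()));
    if bypass {
        None
    } else {
        Some(format!("rtk {}", cmd.trim_start()))
    }
}

/// Hypothetical loader; the real one would read the config file from disk.
fn load_curl_config() -> CurlConfig {
    CurlConfig::default()
}

/// Existing entry point keeps its signature and forwards to the new one.
pub fn rewrite_command(cmd: &str, excluded: &[String]) -> Option<String> {
    let config = load_curl_config();
    let opts = RewriteOptions {
        curl_bypass_url_markers: config.bypass_url_markers,
    };
    rewrite_command_with_options(cmd, excluded, &opts)
}
```

Keeping `rewrite_command` as a thin wrapper means existing call sites need no changes, while tests can build a `RewriteOptions` value directly and avoid touching the filesystem.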

Tests

  • 4 new config tests: default-empty, [curl] roundtrip, missing-section
    tolerance.
  • 9 new registry tests: localhost / loopback / hostname marker bypass; POST
    with headers + payload; multi-marker OR; default-empty preserves rewrite;
    unmatched URL still rewrites; port-specific narrowness; per-segment behavior
    in compound commands.
  • Full suite: cargo test: 1699 passed, 0 failed, 6 ignored on
    aarch64-apple-darwin against master @ v0.38.0.
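
The per-segment case in compound commands might be exercised along these lines. This is a rough sketch, not the registry's real splitter: `split_segments` and `bypass_flags` are hypothetical names, and the splitter ignores quoting, so an operator inside a quoted argument would be split incorrectly.

```rust
/// Naively splits a shell command on `&&`, `||`, `|`, and `;`.
/// Assumption: no operators appear inside quoted arguments.
fn split_segments(cmd: &str) -> Vec<&str> {
    let bytes = cmd.as_bytes();
    let mut segments = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        let op_len = match bytes[i] {
            b'&' if bytes.get(i + 1) == Some(&b'&') => 2,
            b'|' if bytes.get(i + 1) == Some(&b'|') => 2,
            b'|' | b';' => 1,
            _ => 0,
        };
        if op_len > 0 {
            segments.push(cmd[start..i].trim());
            i += op_len;
            start = i;
        } else {
            i += 1;
        }
    }
    segments.push(cmd[start..].trim());
    segments
}

/// For each segment, reports whether it is a curl call matching a marker.
fn bypass_flags(cmd: &str, markers: &[&str]) -> Vec<bool> {
    split_segments(cmd)
        .into_iter()
        .map(|seg| {
            let is_curl = seg == "curl" || seg.starts_with("curl ");
            is_curl && markers.iter().any(|m| seg.contains(m))
        })
        .collect()
}
```

For `cd /tmp && curl http://localhost:8080/api/users | jq .name` with the marker `localhost:8080/api/`, only the middle segment is flagged for bypass; the first and last segments remain eligible for whatever rewrites apply to them.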

Real-world motivation

Hit while running rtk against an internal API server: agents reading
curl … | jq got field: string instead of the actual JSON values, leading
to three misfiled "the API is broken" reports before realising the schema
output was the rewrite, not the upstream response. The narrow allowlist
preserves token savings for the 95% case while letting users opt private
endpoints out cleanly.

@CLAassistant

CLAassistant commented May 2, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ globalsecurepayments
❌ github-actions[bot]
❌ rtk-release-bot[bot]

@pszymkowiak pszymkowiak added effort-medium (1-2 days, a few files) and enhancement (New feature or request) labels May 2, 2026
@pszymkowiak
Collaborator

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

Type feature
🟢 Risk low

Summary

Adds a config-driven URL substring allowlist ([curl] bypass_url_markers) that lets users opt specific curl invocations out of RTK's automatic rtk curl ... | rtk json --schema rewrite. This preserves raw JSON output for private/internal APIs consumed by downstream parsers, while keeping the default schema-mode rewrite for all other curl calls.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Linked issues: #196


Analyzed automatically by wshm · This is an automated analysis, not a human review.

github-actions Bot and others added 5 commits May 2, 2026 15:57
`rtk curl` rewrites every `curl URL` to `rtk curl URL` and pipes the
response through `rtk json --schema`, which produces field-type literals
(`field: int`) and a `(N)` array-length suffix. That's a token-savings
win for human-facing exploration of arbitrary third-party APIs, but it
ACTIVELY BREAKS downstream JSON parsing (jq, `python json.load`,
agent-side filtering, anything that round-trips JSON) for private or
internal APIs whose responses are consumed by parsers rather than read
by humans.

Today the only escape hatch is `[hooks] exclude_commands = ["curl"]`,
which disables curl rewriting entirely — losing the token savings on the
3rd-party APIs that motivated the rewrite in the first place.

This change adds a narrow opt-in mechanism: a substring allowlist that
opts specific URLs out of the rewrite while leaving everything else
rewriting as before.

  [curl]
  bypass_url_markers = [
    "localhost:8080/api/",
    "//internal.example.com/v1/",
  ]

Each marker is matched as a substring against the full command segment
(after env-prefix stripping), so it composes cleanly with curl's flag
positioning (`-X POST`, `-H`, `-d`, etc). Bypass is per-segment for
compound commands: `a && curl <internal> | jq` bypasses the curl segment
while `a` and the pipe target still rewrite as normal.

Default `bypass_url_markers = []` preserves historical behavior — users
who don't configure anything see no change.

API surface
-----------
* New `crate::core::config::CurlConfig { bypass_url_markers: Vec<String> }`
* New `crate::discover::registry::RewriteOptions { curl_bypass_url_markers }`
* New `rewrite_command_with_options(cmd, excluded, &opts)` — takes
  options explicitly, useful for tests and callers that already have
  config in hand.
* Existing `rewrite_command(cmd, excluded)` is unchanged for callers but
  now reads `Config.curl.bypass_url_markers` internally and forwards.

Mirrors the existing rtk-ai#196 bypass shape for `gh --json/--jq/--template`:
detect a structured-output consumer and skip schema-mode filtering.

Tests
-----
* 4 new config tests (default empty / [curl] roundtrip / missing
  section).
* 9 new registry tests covering: localhost / loopback / hostname
  marker bypass; POST with headers + payload; multi-marker OR;
  default-empty preserves rewrite; unmatched URL still rewrites;
  port-specific narrowness; per-segment behavior in compound commands.
* Full suite: 1699 passed, 0 failed, 6 ignored.
@globalsecurepayments globalsecurepayments force-pushed the fleet-url-bypass-cr-783 branch from 7aa6224 to e63665a Compare May 2, 2026 14:59
@globalsecurepayments globalsecurepayments changed the base branch from master to develop May 2, 2026 14:59
Labels

effort-medium (1-2 days, a few files), enhancement (New feature or request)

3 participants