feat(curl): config-driven URL allowlist to bypass schema-mode rewrite#1677
feat(curl): config-driven URL allowlist to bypass schema-mode rewrite#1677globalsecurepayments wants to merge 5 commits intortk-ai:developfrom
Conversation
|
|
📊 Automated PR Analysis
SummaryAdds a config-driven URL substring allowlist ( Review Checklist
Linked issues: #196 Analyzed automatically by wshm · This is an automated analysis, not a human review. |
`rtk curl` rewrites every `curl URL` to `rtk curl URL` and pipes the
response through `rtk json --schema`, which produces field-type literals
(`field: int`) and a `(N)` array-length suffix. That's a token-savings
win for human-facing exploration of arbitrary third-party APIs, but it
ACTIVELY BREAKS downstream JSON parsing (jq, `python json.load`,
agent-side filtering, anything that round-trips JSON) for private or
internal APIs whose responses are consumed by parsers rather than read
by humans.
Today the only escape hatch is `[hooks] exclude_commands = ["curl"]`,
which disables curl rewriting entirely — losing the token savings on the
3rd-party APIs that motivated the rewrite in the first place.
This change adds a narrow opt-in mechanism: a substring allowlist that
opts specific URLs out of the rewrite while leaving everything else
rewriting as before.
[curl]
bypass_url_markers = [
"localhost:8080/api/",
"//internal.example.com/v1/",
]
Each marker is matched as a substring against the full command segment
(after env-prefix stripping), so it composes cleanly with curl's flag
positioning (`-X POST`, `-H`, `-d`, etc). Bypass is per-segment for
compound commands: `a && curl <internal> | jq` bypasses the curl segment
while `a` and the pipe target still rewrite as normal.
Default `bypass_url_markers = []` preserves historical behavior — users
who don't configure anything see no change.
API surface
-----------
* New `crate::core::config::CurlConfig { bypass_url_markers: Vec<String> }`
* New `crate::discover::registry::RewriteOptions { curl_bypass_url_markers }`
* New `rewrite_command_with_options(cmd, excluded, &opts)` — takes
options explicitly, useful for tests and callers that already have
config in hand.
* Existing `rewrite_command(cmd, excluded)` is unchanged for callers but
now reads `Config.curl.bypass_url_markers` internally and forwards.
Mirrors the existing rtk-ai#196 bypass shape for `gh --json/--jq/--template`:
detect a structured-output consumer and skip schema-mode filtering.
Tests
-----
* 4 new config tests (default empty / [curl] roundtrip / missing
section).
* 9 new registry tests covering: localhost / loopback / hostname
marker bypass; POST with headers + payload; multi-marker OR;
default-empty preserves rewrite; unmatched URL still rewrites;
port-specific narrowness; per-segment behavior in compound commands.
* Full suite: 1699 passed, 0 failed, 6 ignored.
7aa6224 to
e63665a
Compare
Problem
rtk curlrewrites everycurl URLtortk curl URLand pipes the responsethrough
rtk json --schema, which produces field-type literals (field: int,field: string) and a(N)array-length suffix. That's a token-savings win forhuman-facing exploration of arbitrary third-party APIs, but it actively breaks
downstream JSON parsing for private or internal APIs whose responses are
consumed by parsers (jq,
python json.load, agents that round-trip JSON,anything that needs the literal payload).
Today the only escape hatch is
[hooks] exclude_commands = ["curl"], whichdisables curl rewriting entirely — losing the token savings on every other
curlinvocation in the same session.Proposal
Add a narrow opt-in mechanism: a substring allowlist that bypasses the rewrite
on a per-URL basis. Default empty, so users who don't configure anything see
identical behavior to today.
Each marker is a substring match against the full command segment (after
env-prefix stripping, e.g.
PROXY=… curl …), so it composes cleanly withcurl's flag positioning (
-X POST,-H,-d, etc). Bypass is per-segment:in
a && curl <internal> | jq, the curl segment passes through unchangedwhile
aand the pipe target rewrite as normal.Mirrors the existing #196 bypass shape for
gh --json/--jq/--template(detecta structured-output consumer, skip schema-mode filtering).
API surface
crate::core::config::CurlConfig { bypass_url_markers: Vec<String> }insrc/core/config.rs. Default empty.crate::discover::registry::RewriteOptions { curl_bypass_url_markers }.rewrite_command_with_options(cmd, excluded, &opts)for callers thatwant explicit control (tests, callers that already have config in hand).
rewrite_command(cmd, excluded)is unchanged for callers — it nowreads
Config.curl.bypass_url_markersfrom disk internally and forwards.Tests
[curl]roundtrip, missing-sectiontolerance.
with headers + payload; multi-marker OR; default-empty preserves rewrite;
unmatched URL still rewrites; port-specific narrowness; per-segment behavior
in compound commands.
cargo test→ 1699 passed, 0 failed, 6 ignored onaarch64-apple-darwinagainstmaster @ v0.38.0.Real-world motivation
Hit while running
rtkagainst an internal API server: agents readingcurl … | jqgotfield: stringinstead of the actual JSON values, leadingto three misfiled "the API is broken" reports before realising the schema
output was the rewrite, not the upstream response. The narrow allowlist
preserves token savings for the 95% case while letting users opt private
endpoints out cleanly.