Skip to content

fix(security): SSRF sub-resource + fail-closed, webhook 500, trusted rate-limit IP#2

Merged
vedantggwp merged 1 commit into
masterfrom
cc/security-hardening-followups
Jun 11, 2026
Merged

fix(security): SSRF sub-resource + fail-closed, webhook 500, trusted rate-limit IP#2
vedantggwp merged 1 commit into
masterfrom
cc/security-hardening-followups

Conversation

@vedantggwp

Copy link
Copy Markdown
Owner

Summary

Post-merge security follow-ups from the PR #1 review (3 independent sub-agents + orchestrator verification). All fixes verified locally.

scan-service SSRF — scanner.ts → new request-guard.ts

  • Validate every http(s) request (main nav, redirect hops, AND sub-resources) via DNS-resolving checkHostSafety — closes the hostname sub-resource SSRF hole (only literal IPs were blocked before).
  • Fail closed: guard errors now abort() instead of continue().
  • Pass through data:/blob:; per-scan host cache.
  • Extracted to its own module; 7 offline unit tests added.

webhook idempotency — app/app/api/webhook/route.ts

  • Return 500 on non-23505 insert failures so Stripe retries. A 200 silently dropped a paid report. The pre-insert lookup + 23505 guard make retries safe.

rate-limit key — app/lib/client-ip.ts + 6 routes

  • Derive client IP from x-real-ip / last XFF hop, not the spoofable left-most x-forwarded-for token.

Test plan

  • scan-service: tsc clean, 59/59 vitest (7 new request-guard tests)
  • app: tsc --noEmit clean
  • Integration: live scan against a redirect/sub-resource target

Deliberately deferred (need E2E I can't run blind)

  • DNS-rebind IP-pinning — resolve-then-connect TOCTOU; needs --host-resolver-rules/CDP peer-IP + a rebind harness.
  • Coupon max_uses atomicity — atomic RPC + caller rejection; touches the Stripe path.
  • report-status dropping email — UI-coupled.

…rate-limit IP

Post-merge follow-ups from the PR #1 review (3 sub-agents + verification).

scan-service SSRF (scanner.ts -> request-guard.ts):
- Validate EVERY http(s) request -- main nav, redirect hops, AND sub-resources
  -- via DNS-resolving checkHostSafety, not just literal-IP hosts. Closes the
  hostname sub-resource SSRF hole (e.g. an <img src> to a name resolving to a
  private/metadata IP).
- Fail CLOSED: guard errors now abort the request (was req.continue()).
- Pass through non-network schemes (data:/blob:). Per-scan host cache.
- Extracted to request-guard.ts; 7 offline unit tests.

webhook (route.ts):
- Return 500 on non-23505 insert failures so Stripe retries; a 200 silently
  dropped a paid report. Pre-insert lookup + 23505 guard make retries safe.

rate-limit (client-ip.ts + all 6 routes):
- Derive client IP from x-real-ip / last XFF hop, not the spoofable left-most
  x-forwarded-for token.

Verified: scan-service tsc clean + 59/59 tests; app tsc --noEmit clean.
DNS-rebind IP-pinning + coupon max_uses atomicity tracked as follow-ups.
@vedantggwp vedantggwp merged commit f606612 into master Jun 11, 2026
6 checks passed
@devin-ai-integration

Copy link
Copy Markdown

Code Review — PR #2 (Security Hardening)

Overall: Significant security improvement. The SSRF guard extraction, fail-closed policy, and trusted IP derivation are all well-executed. This is the most impactful PR in the set. Detailed findings below:


🔴 Bugs / Security Issues

1. DNS-rebind TOCTOU is still exploitable (acknowledged but worth emphasizing)

request-guard.ts resolves hostnames via checkHostSafety and caches the verdict, but the actual HTTP connection is made separately by Chromium. Between the DNS resolve in the guard and the TCP connect in Chromium, a DNS-rebind attack can switch the A record from a public IP to 169.254.169.254. The per-host cache actually makes this worse — once a host is marked safe, all subsequent requests to it skip re-validation entirely, so a rebind after the first request is invisible.

The PR description correctly defers this ("needs --host-resolver-rules/CDP peer-IP + a rebind harness"), but the cache design amplifies the risk. Consider:

  • Adding a TTL to the cache (e.g. 5-10 seconds) so cached verdicts expire
  • Or using Puppeteer's CDP Fetch.requestPaused + Network.getResponseBody to check the actual connected IP

2. getClientIp trusts x-real-ip unconditionally (client-ip.ts:15-16)

const realIp = req.headers.get("x-real-ip")?.trim();
if (realIp) return realIp;

x-real-ip is only trustworthy if the upstream proxy (Vercel's edge in this case) sets it and strips any client-supplied value. On Vercel, the reliable header is actually x-forwarded-for (Vercel appends the real IP as the rightmost hop). If a client sends a spoofed x-real-ip: 1.2.3.4 header and the proxy doesn't strip it, the rate limit is trivially bypassed.

Recommendation: Verify Vercel's behavior. If Vercel doesn't set/overwrite x-real-ip, remove the x-real-ip preference and rely only on the rightmost XFF hop — or use Vercel's guaranteed x-vercel-forwarded-for header.

3. Rate-limit key collision on /api/scan

(Also noted in PR #4 review) — app/api/scan/route.ts:11 uses checkRateLimit(clientIp, 5, 60_000) with no prefix, while every other route uses prefix:${clientIp}. If any future route accidentally also uses a bare clientIp key, the counters will collide. This should be scan:${clientIp}.


🟡 Edge Cases

4. guardRequest race condition with Puppeteer

The page.on('request') handler calls void guardRequest(req, safeHosts) (scanner.ts:80). Because guardRequest is async and void drops the promise, multiple requests can fire concurrently. If two requests to the same never-seen host arrive simultaneously:

  • Both see safeHosts.get(host) === undefined
  • Both call checkHostSafety(host) — duplicated DNS lookups
  • Both set safeHosts.set(host, ...) — the second write wins

This is functionally correct (both will get the same verdict) but wastes DNS lookups. For correctness it's fine; for performance under load, consider storing the pending Promise in the map so concurrent callers await the same resolution.

5. checkHostSafety treats DNS resolution failure as "invalid" but doesn't distinguish transient errors

url-validator.ts:199-201:

if (resolved.length === 0) {
  return { valid: false, reason: 'Could not resolve hostname' };
}

A temporary DNS failure (e.g. resolver timeout) will block a legitimate site. Combined with the per-host cache in guardRequest, a transient DNS failure caches false for the entire scan — every subsequent sub-resource from that host is silently aborted. This is fail-closed (correct from a security perspective), but it could cause confusing scan failures. Consider not caching negative verdicts caused by DNS resolution failure (vs. explicitly private resolution).

6. hops.at(-1) ?? "unknown" is redundant (client-ip.ts:24)

The filter((h) => h.length > 0) on line 23 guarantees hops contains only non-empty strings, and the if (hops.length > 0) on line 24 guarantees at least one element. hops.at(-1) will never be undefined here, so the ?? "unknown" fallback is dead code. Not a bug, but it suggests the author wasn't fully confident in the type narrowing — worth simplifying for clarity.


🟢 Improvements

7. Tests don't cover the blob: scheme

request-guard.test.ts tests data: passthrough but not blob: or about: — both of which are explicitly called out in the code comment (line 22). Adding a blob: test would close the coverage gap.

8. Tests don't mock checkHostSafety

The tests for guardRequest call the real checkHostSafety (which calls real DNS). This works for literal IPs (no DNS needed) and localhost (caught statically), but means:

  • The allows a public literal IP test (line 33-37) is actually hitting the real IP check, not a mock
  • There's no test for "hostname resolves to private IP" — the key SSRF scenario this PR fixes

Consider adding a test with a mocked checkHostSafety that returns { valid: false } for a hostname, verifying the request is aborted.

9. Webhook idempotency — the 500 return is correct but the error response body leaks internals

return NextResponse.json(
  { error: "Failed to persist report" },
  { status: 500 },
);

This is fine — the error message is generic. Good practice.

10. The report-status route uses Request instead of NextRequest (route.ts line 22)

export async function POST(req: Request) {
  const clientIp = getClientIp(req);

getClientIp accepts Request, so this works. But all other routes use NextRequest for consistency. Using Request means you lose access to NextRequest-specific methods (like req.nextUrl) if needed later. Minor but worth standardizing.


Summary

The security posture is significantly improved by this PR. The SSRF sub-resource hole and the fail-open default were both real vulnerabilities, and the fix is well-structured. The main remaining concern is the x-real-ip trust assumption — verify Vercel's header behavior to ensure rate-limiting can't be trivially bypassed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant