Skip to content

Security: fix SSRF in POST /api/imports — validate VideoUrl against YouTube allowlist#178

Open
davidortinau wants to merge 4 commits into
mainfrom
op-fix-ssrf-video-url-validation
Open

Security: fix SSRF in POST /api/imports — validate VideoUrl against YouTube allowlist#178
davidortinau wants to merge 4 commits into
mainfrom
op-fix-ssrf-video-url-validation

Conversation

@davidortinau
Copy link
Copy Markdown
Owner

What

An authenticated user could submit any URL to POST /api/imports and the server would make an outbound HTTP request to it before performing any host or scheme validation. This includes Azure IMDS (http://169.254.169.254/metadata/instance), RFC-1918 addresses, and any other internal network resource — potentially exposing managed-identity credentials or internal services.

Fix

Added a YouTube allowlist guard in StartImport (before the VideoImport record is created or the pipeline is started). Any VideoUrl that does not satisfy all three conditions is rejected immediately with 400 Bad Request:

  • Parses as an absolute URI
  • Scheme is https
  • Host is exactly www.youtube.com, youtube.com, or youtu.be

The fix is API-boundary-only — no changes to VideoImportPipelineService or downstream services were needed.

Changes

File Change
src/SentenceStudio.Api/ImportEndpoints.cs 6-line allowlist guard after the null/whitespace check
tests/SentenceStudio.Api.Tests/ImportEndpointsSsrfTests.cs New test class (13 cases)
tests/SentenceStudio.Api.Tests/Infrastructure/TestJwtGenerator.cs Optional userProfileId param added to GenerateToken

Tests

ImportEndpointsSsrfTests covers:

Rejection (400) cases:

  • Azure IMDS: http://169.254.169.254/metadata/instance, http://169.254.169.254/
  • RFC-1918: http://10.0.0.1/internal, http://192.168.1.1/admin
  • Loopback: http://localhost/evil
  • Valid YouTube host, wrong scheme: http://www.youtube.com/...
  • Non-YouTube HTTPS host: https://evil.com/...
  • Wrong scheme altogether: ftp://www.youtube.com/...
  • Unparseable input: not-a-url-at-all
  • Empty string

Acceptance (non-400) cases:

  • https://www.youtube.com/watch?v=...
  • https://youtube.com/watch?v=...
  • https://youtu.be/...

Verification

  • Build passes
  • 120 unit/classification tests pass
  • 32 integration tests fail with a pre-existing IChatClient/AiService DI error that exists on main before this branch — not caused by this change

davidortinau and others added 4 commits April 24, 2026 10:57
- Orchestration log: DX24 vocab crash fix workflow (commit c9b1d0a)
- Session log: ExposureCount NULL→0 backfill, verified on device
- Decision merge: inbox → decisions.md (3 entries: lexical patch, validation strategy, directive)
- Deleted: inbox files + jayne-locale-screenshots (deduped)
- Wash history: append CRITICAL RULE for SQLite migration defense-in-depth (DEFAULT + idempotent backfill)
- No archival needed (no entries >30d old in decisions.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Orchestrated multi-agent session for new generic data import feature architecture:

- Zoe (Lead): Full MVP architecture covering UX flows, content detection (heuristic-first,
  AI fallback), data model (no new tables), parsing pipeline, AI integration, service layer,
  scope phasing. Placement revision: separate `/import-content` page (not tab on existing
  `/import`), per Captain directive.

- Wash (Data Layer Scout): Surveyed existing YouTube pipeline pattern, found file import
  UI/service gap, confirmed no schema changes needed, identified dedup inconsistency for
  standardization.

- River (AI Strategy): Designed 5 AI prompt tasks with confidence thresholds, heuristic
  fallbacks, structured DTOs via `SendPrompt<T>`, permissive grading principles.

- Kaylee (UI Scout): Documented form patterns, file picker abstractions, resource lookup
  UI, navigation placement. Recommendations for ContentImportService, InputFile, preview
  table reuse.

- Copilot Directive: Confirmed scope separation — Video Subscriptions (existing) separate
  from generic content import (new) to keep features focused and growth paths independent.

Artifacts:
- 5 orchestration logs (one per agent) in .squad/orchestration-log/
- Session log in .squad/log/2026-04-24T22:31-data-import-architecture-plan.md
- 6 decisions merged into .squad/decisions.md (168KB total)
- All agent histories updated with import-plan context

Next: Implementation team begins ContentImportService + ImportContent.razor page. River
engineers 3 prompt templates (Format Inference, Content Classification, Extraction).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captain resolved 7 open questions from architecture proposal:

1. Dedup default: Skip and reuse existing (safest, matches YouTube pipeline)
2. Source attribution: Deferred to v2
3. Single-column imports: Use AI to translate on import (new mvp-single-column-translate work item)
4. Mobile clipboard paste: Not needed for MVP
5. Replace-all mode: Deferred to v2
6. Renamed YouTube page: 'Media Import' at /media-import (not 'Video Subscriptions')
7. Separate-page placement: Confirmed — /import-content independent from /media-import

Updated decisions.md with final rulings appended and prior superseded mentions annotated.
All prior architecture content preserved; only updated with Captain's decisions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n POST /api/imports

Add a YouTube URL allowlist guard in StartImport before the import record
is created or the pipeline is started. Any URL that is not an https:// URL
on www.youtube.com, youtube.com, or youtu.be is rejected with 400 Bad Request.

- ImportEndpoints.cs: insert 6-line guard after the null check (lines 67-72)
- TestJwtGenerator.cs: add optional userProfileId parameter so integration
  tests can supply the claim required by the /api/imports endpoint
- ImportEndpointsSsrfTests.cs: 13 new tests covering Azure IMDS, RFC-1918
  addresses, loopback, non-https YouTube, non-YouTube HTTPS hosts, unparseable
  input, and three valid YouTube URLs that must pass the guard

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant