feat(rss): RSS feeds ingestion MVP — 4 curated threat-intel feeds#49

Merged
Setounkpe7 merged 18 commits into dev from feat/rss-feeds-ingestion
Jun 19, 2026

@Setounkpe7

Owner

RSS feeds ingestion (MVP)

Adds 4 curated threat-intel RSS/Atom feeds as new data sources, built as a generic config-driven collector. The decision came out of a multi-agent convergence (Threat Detection / Security / Backend / Product) and was validated by the maintainer. The work was executed via subagent-driven TDD, with a review after every task and a final whole-branch review.

Feeds (MVP)

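| source_name     | kind | threat_type | confidence |
|-----------------|------|-------------|------------|
| cisa_advisories | rss  | advisory    | 80         |
| cisa_ics        | rss  | advisory    | 80         |
| cisco_psirt     | rss  | advisory    | 70         |
| dfir_report     | rss  | report      | 50         |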

The feed URLs are plausible but unverified patterns (marked verify-before-enable); the loader skips invalid entries and the scheduler backs off dead feeds. Nothing polls in prod until dev is deployed to main.
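To make the "loader skips invalid entries, scheduler backs off dead feeds" behaviour concrete, here is a minimal sketch. It is not the code from this PR: `FeedConfig`, `load_feeds`, `backoff_seconds`, the validation rules, and the backoff constants are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumed allow-lists, taken from the values shown in the feed table.
ALLOWED_KINDS = {"rss"}
ALLOWED_THREAT_TYPES = {"advisory", "report"}


@dataclass(frozen=True)
class FeedConfig:
    source_name: str
    url: str
    kind: str
    threat_type: str
    confidence: int


def load_feeds(raw_entries: list) -> list[FeedConfig]:
    """Return only the well-formed entries; malformed ones are skipped, not fatal."""
    feeds: list[FeedConfig] = []
    for entry in raw_entries:
        try:
            feed = FeedConfig(
                source_name=str(entry["source_name"]),
                url=str(entry["url"]),
                kind=str(entry["kind"]),
                threat_type=str(entry["threat_type"]),
                confidence=int(entry["confidence"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing field or wrong type: skip this entry
        if not feed.source_name or urlsplit(feed.url).scheme != "https":
            continue
        if feed.kind not in ALLOWED_KINDS:
            continue
        if feed.threat_type not in ALLOWED_THREAT_TYPES:
            continue
        if not 0 <= feed.confidence <= 100:
            continue
        feeds.append(feed)
    return feeds


def backoff_seconds(consecutive_failures: int, base: int = 900, cap: int = 86_400) -> int:
    """Capped exponential backoff for a feed that keeps failing (constants are assumptions)."""
    return min(cap, base * 2 ** consecutive_failures)
```

With these assumed constants, a feed with no failures is retried after 15 minutes, and a persistently dead feed settles at one poll per day instead of being hammered.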

Mechanism (two paths)

  • CVE/GHSA-bearing item → fans out to enrich every matched existing threat (multi-source bonus, tags). No merge_candidate, no duplicate threat.
  • No-CVE item → becomes an idempotent advisory/report threat, deduped on (source_id, external_id).
  • RSS sources never set canonical CVSS/severity.
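The two paths above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Item`, `Store`, `route`, and the in-memory dictionaries are hypothetical stand-ins for the real models and database, and the GHSA pattern is deliberately loose. The PR text does not say what happens to a CVE-bearing item that matches no existing threat; this sketch does nothing in that case.

```python
import re
from dataclasses import dataclass, field

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# Loose approximation of the GHSA-xxxx-xxxx-xxxx identifier shape.
GHSA_RE = re.compile(r"\bGHSA(?:-[0-9a-z]{4}){3}\b", re.IGNORECASE)


@dataclass
class Item:
    source_id: str
    external_id: str
    text: str


@dataclass
class Store:
    # vulnerability id -> existing threat record (hypothetical shape)
    threats_by_vuln_id: dict = field(default_factory=dict)
    # (source_id, external_id) -> advisory/report item
    advisories: dict = field(default_factory=dict)


def extract_vuln_ids(text: str) -> set[str]:
    """Collect normalized CVE and GHSA identifiers mentioned in the text."""
    cves = {m.upper() for m in CVE_RE.findall(text)}
    ghsas = {"GHSA-" + m[5:].lower() for m in GHSA_RE.findall(text)}
    return cves | ghsas


def route(item: Item, store: Store) -> str:
    vuln_ids = extract_vuln_ids(item.text)
    if vuln_ids:
        # Path 1: enrich every matched existing threat; never create a new one.
        for vuln_id in vuln_ids:
            threat = store.threats_by_vuln_id.get(vuln_id)
            if threat is not None:
                threat["sources"].add(item.source_id)
        return "enrich"
    # Path 2: idempotent advisory/report keyed on (source_id, external_id).
    key = (item.source_id, item.external_id)
    if key in store.advisories:
        return "duplicate"
    store.advisories[key] = item
    return "created"
```

Feeding the same no-CVE item twice yields "created" and then "duplicate", which is the idempotency property the second bullet describes.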

Security

  • defusedxml safety gate (XXE / entity-bomb) before feedparser.
  • SSRF guard: https-only; public IPs only (loopback, RFC 1918, link-local including 169.254.169.254, CGNAT 100.64.0.0/10, reserved, and multicast ranges are blocked); follow_redirects=False; 5 MiB streamed size cap.
  • clean_text on all text fields; extracted IOCs quarantined (confidence < 100, unverified-ioc tag).
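As an illustration of the address checks listed in the SSRF bullet, the sketch below uses only the standard-library `ipaddress` module. The names `is_blocked_ip` and `check_url` are hypothetical and are not the PR's `assert_fetchable_url`; DNS resolution is left out on purpose, so the caller passes in the already-resolved addresses. The explicit 100.64.0.0/10 entry is needed because `ipaddress` reports that range as neither private nor global.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges the boolean flags below do not cover (RFC 6598 shared address space).
_EXTRA_BLOCKED = (ipaddress.ip_network("100.64.0.0/10"),)


def is_blocked_ip(raw: str) -> bool:
    """True if the address must not be fetched from."""
    ip = ipaddress.ip_address(raw)
    # Judge IPv4-mapped IPv6 addresses by the IPv4 address they wrap.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    ):
        return True
    return any(ip.version == net.version and ip in net for net in _EXTRA_BLOCKED)


def check_url(url: str, resolved_ips: list[str]) -> None:
    """Raise ValueError unless the URL is https and every resolved IP is public."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https URLs are fetchable")
    if not parts.hostname:
        raise ValueError("URL has no host")
    if not resolved_ips:
        raise ValueError("host did not resolve")
    for raw in resolved_ips:
        if is_blocked_ip(raw):
            raise ValueError(f"blocked address: {raw}")
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and internal addresses. A check like this does not close the DNS-rebinding window between validation and connection.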

Tests & quality

  • 405 tests passing (+37 RSS tests); Log4Shell cross-source dedup canary green at every ingestion-touching step.
  • mypy --strict + ruff clean on src/. No Alembic migration (all columns/enums pre-existed).

Notes

  • The autonomous-path cross-tick TOCTOU is documented; the durable UNIQUE(source_id, external_id) fix needs a migration and is deferred.
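For context on why a durable UNIQUE(source_id, external_id) constraint closes a check-then-insert race, here is a small sketch using the standard-library `sqlite3` module as a stand-in. The project's actual database, table name, and columns are not shown in this PR, so `threats`, `title`, and `insert_once` are assumptions made for illustration only.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical threats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE threats (
            id INTEGER PRIMARY KEY,
            source_id TEXT NOT NULL,
            external_id TEXT NOT NULL,
            title TEXT NOT NULL,
            UNIQUE (source_id, external_id)
        )
        """
    )
    return conn


def insert_once(conn: sqlite3.Connection, source_id: str, external_id: str, title: str) -> bool:
    """Insert the row unless the key already exists; return True if a row was written.

    The database enforces uniqueness atomically, so there is no window
    between a separate "does it exist?" query and the insert.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO threats (source_id, external_id, title) VALUES (?, ?, ?)",
        (source_id, external_id, title),
    )
    conn.commit()
    return cur.rowcount == 1
```

An application-level "select, then insert if missing" check can let two ticks both see "missing" and both insert; with the constraint in place, the second insert is simply ignored.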
  • The read-side persona default-exclusion (?include_advisories) + source-freshness is a separate Plan 2 (not in this PR).
  • Plan: docs/superpowers/plans/2026-06-19-rss-feeds-ingestion-plan.md; subsystem docs: docs/COLLECTORS.md.

🤖 Generated with Claude Code

Setounkpe7 and others added 18 commits June 19, 2026 14:26
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…llectedEvent

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add _net.py SSRF guard (assert_fetchable_url) for outbound feed URLs
- Add rss.py: RSSCollector subclassing BaseCollector, config-driven per feed
- Streams feed bytes with 5 MiB cap, defusedxml-gated parsing via _feedparse
- Extracts CVEs, GHSAs, IOCs; tags unverified-ioc/actively-exploited/ransomware
- Sets enrichment_mode=True, threat_type, indicator_confidence on CollectedEvent
- ClassVar shadows suppressed with type: ignore[misc] per mypy --strict contract
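A streamed size cap like the one mentioned in this commit could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in rss.py: `read_capped` and `FeedTooLarge` are hypothetical names, and the chunk source is any iterable of bytes instead of a real HTTP response.

```python
from collections.abc import Iterable

MAX_FEED_BYTES = 5 * 1024 * 1024  # 5 MiB, matching the cap stated in the PR


class FeedTooLarge(ValueError):
    """Raised when a feed body exceeds the configured size cap."""


def read_capped(chunks: Iterable[bytes], limit: int = MAX_FEED_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the cap would be exceeded."""
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > limit:
            # Stop before buffering the oversized chunk; later chunks are never read.
            raise FeedTooLarge(f"feed body exceeds {limit} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

Enforcing the cap while streaming, instead of checking the length after a full download, means an oversized or endless response is abandoned early and never held in memory in full.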

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _BLOCKED_NETWORKS + _is_blocked() helper to _net.py to catch
  RFC 6598 CGNAT (100.64.0.0/10) which ipaddress flags miss
- Add test_rejects_cgnat to test_net.py
- Document intentionally-unused `since` param in RSSCollector.fetch
- Add DNS-rebinding window NOTE comment near resolver call

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CLI parity, test hardening

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt (ruff src tests)

uv.lock + requirements.lock now include feedparser/sgmllib3k/defusedxml so the
prod image and pip-audit resolve them (fixes docker runtime smoke). Wrap 3 long
test lines and fix an import-sort flagged by 'ruff check src tests' in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ruff format --check gated PR #49 (7 files). Whitespace-only; no behavior
change. ruff check + the affected test files remain green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Setounkpe7 Setounkpe7 merged commit 3d731ff into dev Jun 19, 2026
11 checks passed
@Setounkpe7 Setounkpe7 deleted the feat/rss-feeds-ingestion branch June 19, 2026 22:30