feat(rss): RSS feeds ingestion MVP — 4 curated threat-intel feeds #49
Merged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…llectedEvent
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add _net.py SSRF guard (assert_fetchable_url) for outbound feed URLs
- Add rss.py: RSSCollector subclassing BaseCollector, config-driven per feed
- Streams feed bytes with 5 MiB cap, defusedxml-gated parsing via _feedparse
- Extracts CVEs, GHSAs, IOCs; tags unverified-ioc/actively-exploited/ransomware
- Sets enrichment_mode=True, threat_type, indicator_confidence on CollectedEvent
- ClassVar shadows suppressed with type: ignore[misc] per mypy --strict contract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
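To illustrate the "Extracts CVEs, GHSAs" step, here is a minimal sketch of identifier extraction from entry text. It is not the code in rss.py: the function name `extract_ids`, both regex patterns, and the normalization rules are assumptions made for illustration, and the GHSA pattern is deliberately looser than GitHub's actual identifier alphabet.

```python
import re

# Assumed patterns, not taken from the PR.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
GHSA_RE = re.compile(r"\bGHSA(?:-[0-9a-z]{4}){3}\b", re.IGNORECASE)


def extract_ids(text: str) -> tuple[list[str], list[str]]:
    """Return (cves, ghsas) found in text, deduplicated and sorted."""
    # CVE ids are normalized to upper case.
    cves = sorted({m.upper() for m in CVE_RE.findall(text)})
    # GHSA ids keep an upper-case prefix and a lower-case body.
    ghsas = sorted({"GHSA" + m[4:].lower() for m in GHSA_RE.findall(text)})
    return cves, ghsas
```

Deduplicating through a set before sorting keeps the output stable when a feed entry repeats the same identifier in its title and summary.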
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _BLOCKED_NETWORKS + _is_blocked() helper to _net.py to catch RFC 6598 CGNAT (100.64.0.0/10) which ipaddress flags miss
- Add test_rejects_cgnat to test_net.py
- Document intentionally-unused `since` param in RSSCollector.fetch
- Add DNS-rebinding window NOTE comment near resolver call
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tag/summary union
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CLI parity, test hardening
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt (ruff src tests)
uv.lock + requirements.lock now include feedparser/sgmllib3k/defusedxml so the prod image and pip-audit resolve them (fixes docker runtime smoke). Wrap 3 long test lines and fix an import-sort flagged by 'ruff check src tests' in CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ruff format --check gated PR #49 (7 files). Whitespace-only; no behavior change. ruff check + the affected test files remain green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # uv.lock
RSS feeds ingestion (MVP)
Adds 4 curated threat-intel RSS/Atom feeds as new data sources, built as a generic config-driven collector. Decision driven by a multi-agent convergence (Threat Detection / Security / Backend / Product) and validated by the maintainer; executed via subagent-driven TDD with a task review after every task and a final whole-branch review.
Feeds (MVP)
- cisa_advisories
- cisa_ics
- cisco_psirt
- dfir_report
Mechanism (two paths)
- merge_candidate, no duplicate threat.
- advisory/report threat, deduped on (source_id, external_id).
Security
- defusedxml safety gate (XXE / entity-bomb) before feedparser.
- https-only, public IPs only (loopback/RFC1918/link-local/169.254.169.254/CGNAT 100.64/10/reserved/multicast blocked), follow_redirects=False, 5 MiB streamed size cap.
- clean_text on all text fields; extracted IOCs quarantined (confidence < 100, unverified-ioc tag).
Tests & quality
- mypy --strict + ruff clean on src/. No Alembic migration (all columns/enums pre-existed).
Notes
- UNIQUE(source_id, external_id) fix needs a migration and is deferred.
- ?include_advisories) + source-freshness is a separate Plan 2 (not in this PR).
- docs/superpowers/plans/2026-06-19-rss-feeds-ingestion-plan.md; subsystem docs: docs/COLLECTORS.md.
🤖 Generated with Claude Code
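The 5 MiB streamed size cap listed under Security could look roughly like the sketch below. It is not the PR's code: `read_capped`, `FeedTooLargeError`, and `MAX_FEED_BYTES` are hypothetical names, and the chunk source is modeled as a plain iterable of bytes rather than a real HTTP response stream.

```python
from typing import Iterable

MAX_FEED_BYTES = 5 * 1024 * 1024  # 5 MiB, the cap stated in the PR description


class FeedTooLargeError(ValueError):
    """Raised when a feed body exceeds the cap (hypothetical name)."""


def read_capped(chunks: Iterable[bytes], limit: int = MAX_FEED_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the cap would be exceeded."""
    buf = bytearray()
    for chunk in chunks:
        # Check before appending so an oversized body is never fully buffered.
        if len(buf) + len(chunk) > limit:
            raise FeedTooLargeError(f"feed body exceeds {limit} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

Enforcing the limit while streaming, rather than trusting a Content-Length header, means a server that understates or omits the length still cannot force an unbounded read.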