Cluster related stories pre-AI and propagate grouped IDs through briefing flow#67

Draft
Copilot wants to merge 4 commits into main from copilot/feat-cluster-related-stories

Conversation

Copilot AI commented May 8, 2026

This change adds deterministic pre-AI story clustering so multiple sources covering the same incident can be synthesized into one briefing item, while same-topic-but-different stories remain separate. It preserves deterministic ranking/diversity behavior by grouping before scoring and carrying grouped identity through downstream payloads.

  • Deterministic pre-AI clustering

    • Added cluster_related_stories() in src/wazzup/feeds.py.
    • Clustering uses a combination of:
      • normalized title match,
      • keyword overlap (with stopword filtering and overlap threshold),
      • anchor signals (e.g. CVE/APT/KB/numeric/long tokens),
      • canonical URL path token overlap,
      • publication-time proximity window.
    • Group winners are selected via existing item_priority, with clustered members stored in related_items.
  • Pipeline integration

    • Updated src/wazzup/pipeline.py to run clustering on windowed items before score_items(...).
    • This keeps AI inputs story-grouped while retaining existing ordering/selection semantics.
  • Grouped identity propagation

    • Updated the duplicate_group_id derivation in src/wazzup/scoring.py to hash the sorted IDs of all grouped items (the item plus its related_items), so group tracking stays deterministic across the scoring, prompt, and briefing flows.
  • Fixtures + coverage for clustering boundaries

    • Added tests/fixtures/story-clustering.xml with duplicate, near-duplicate, and same-topic-different-story examples.
    • Extended tests to validate:
      • duplicate collapsing,
      • near-duplicate clustering,
      • non-clustering of distinct stories sharing topic keywords (tests/test_feeds.py),
      • grouped output behavior in generated briefing/articles (tests/test_pipeline.py),
      • inclusion of grouped relatedItems context in the prompt payload (tests/test_ai.py).
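
To make the clustering boundary concrete, here is a minimal sketch of how a few of the signals listed above could combine. It is not the code in src/wazzup/feeds.py: the names `Story`, `same_story`, and `cluster_sketch`, the stopword list, the 0.5 overlap threshold, the 48-hour window, and the anchor regex are all assumptions made for illustration. URL path overlap is omitted, and the earliest-published story is kept as the group winner instead of using item_priority.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical values: the PR does not state its stopwords, threshold, or window.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is", "are", "with"}
MIN_KEYWORD_OVERLAP = 0.5
MAX_TIME_GAP = timedelta(hours=48)
ANCHOR_RE = re.compile(r"\b(?:cve-\d{4}-\d+|apt\d+|kb\d+)\b")


@dataclass
class Story:
    id: str
    title: str
    published: datetime
    related_items: list = field(default_factory=list)


def normalize_title(title: str) -> str:
    # Lowercase and keep only alphanumeric/hyphen tokens.
    return " ".join(re.findall(r"[a-z0-9-]+", title.lower()))


def keywords(title: str) -> set:
    return {t for t in normalize_title(title).split() if t not in STOPWORDS}


def anchors(title: str) -> set:
    # Strong identifiers such as CVE, APT, or KB numbers.
    return set(ANCHOR_RE.findall(title.lower()))


def same_story(a: Story, b: Story) -> bool:
    if abs(a.published - b.published) > MAX_TIME_GAP:
        return False
    if normalize_title(a.title) == normalize_title(b.title):
        return True
    ka, kb = keywords(a.title), keywords(b.title)
    if not ka or not kb:
        return False
    overlap = len(ka & kb) / min(len(ka), len(kb))
    # Keyword overlap alone is not enough: a shared anchor is also required,
    # so same-topic stories about different incidents stay separate.
    return overlap >= MIN_KEYWORD_OVERLAP and bool(anchors(a.title) & anchors(b.title))


def cluster_sketch(stories):
    clusters = []
    # Sort by (published, id) so the result does not depend on input order.
    for story in sorted(stories, key=lambda s: (s.published, s.id)):
        for winner in clusters:
            if same_story(winner, story):
                winner.related_items.append(story)
                break
        else:
            clusters.append(story)
    return clusters


t0 = datetime(2026, 5, 8, 12, 0)
stories = [
    Story("a", "Vendor patches CVE-2026-1234 in gateway product", t0),
    Story("b", "CVE-2026-1234: gateway product patch released by vendor", t0 + timedelta(hours=2)),
    Story("c", "Vendor patches CVE-2026-9999 in mail product", t0 + timedelta(hours=3)),
]
out = cluster_sketch(stories)
```

With these made-up inputs, "a" and "b" share most keywords and the same CVE, so "b" lands in the related_items of "a". Story "c" overlaps on topic words but carries a different CVE, so it stays a separate item.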
# src/wazzup/pipeline.py
window_items = filter_items_to_window(items, content_window_start, content_window_end)
window_items = cluster_related_stories(window_items)
scored = score_items(window_items, sources, app_config, now)
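
For the grouped identity step, one way to derive a stable group ID from the winner and its clustered members is sketched below. The function name `duplicate_group_id_sketch`, the choice of SHA-256, the newline separator, and the 16-character truncation are assumptions for illustration; the PR only states that sorted grouped item IDs are hashed.

```python
import hashlib


def duplicate_group_id_sketch(item_id, related_ids=()):
    # Deduplicate and sort so the result does not depend on which member
    # was chosen as the group winner or on the order of related items.
    ids = sorted({item_id, *related_ids})
    digest = hashlib.sha256("\n".join(ids).encode("utf-8")).hexdigest()
    # Truncation length is an arbitrary choice for readability.
    return digest[:16]


group_a = duplicate_group_id_sketch("a", ["b", "c"])
group_b = duplicate_group_id_sketch("c", ["a", "b"])
```

Because the IDs are sorted before hashing, `group_a` and `group_b` are equal, which is the property that lets scoring, prompt, and briefing stages refer to the same group without sharing state.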

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • feeds.feedburner.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • github.blog
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • googleonlinesecurity.blogspot.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • info.linuxserver.io
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • krebsonsecurity.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • rss.nytimes.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • thalpius.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • thehackernews.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • therecord.media
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.apple.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.autosport.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.bleepingcomputer.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.cisecurity.org
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.economist.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.engadget.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.espn.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.formula1.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.ft.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.motorsport.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.nba.com
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.nist.gov
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.racefans.net
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)
  • www.zandvoortsecourant.nl
    • Triggering command: /usr/bin/python3 python3 -m unittest discover -s tests (dns block)
    • Triggering command: /usr/bin/python3 python3 -m unittest tests.test_feeds tests.test_ai tests.test_pipeline (dns block)
    • Triggering command: /usr/bin/python3 python3 -m wazzup.pipeline --fixture-dir tests/fixtures --force-briefing auto (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI and others added 3 commits May 8, 2026 18:27
Copilot AI changed the title [WIP] Add clustering for related stories before AI briefing generation Cluster related stories pre-AI and propagate grouped IDs through briefing flow May 8, 2026
Copilot AI requested a review from DevSecNinja May 8, 2026 18:31
Development

Successfully merging this pull request may close these issues.

feat: cluster related stories before AI briefing generation

2 participants