feat: replace file-per-briefing tree with single paginated feed.json #70

@DevSecNinja

Problem

The current publisher writes a hierarchical file tree (briefings/YYYY/MM/DD/hourly-HH.yaml, articles/YYYY/MM/DD.yaml, manifest.yaml) with parallel JSON mirrors for every file. This means:

  • The PWA must fetch latest.json first, then make a second request for the actual briefing JSON, before it can render anything.
  • Retention logic requires recursive file scanning, date-parsing from path segments, and directory pruning (enforce_retention, prune_empty_directories).
  • The manifest file must be kept in sync with every write.
  • Adding historical browsing to the PWA requires multiple round-trips or loading the full manifest.

Proposed architecture

Replace the tree with two file shapes:

| File | Contents |
| --- | --- |
| data/feed.json | Latest N briefings inline (e.g. last 7 days), source health, build metadata |
| data/archive/YYYY-MM.json | Older briefings by month, written once and never mutated |

The PWA loads only feed.json on startup — one fetch, everything needed to render. Archive files are fetched on demand if the user browses history.
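To make the proposed shape concrete, here is a minimal sketch of what feed.json could look like when assembled in Python. Every field name (`schema_version`, `generated_at`, `source_health`, `briefings`, `id`, `articles`) and the `build_feed` helper are assumptions for illustration; the issue does not fix a schema.

```python
import json
from datetime import datetime, timezone
from typing import Any, TypedDict


class Briefing(TypedDict):
    id: str                         # hypothetical identifier
    generated_at: str               # ISO 8601 timestamp with UTC offset
    articles: list[dict[str, Any]]  # article payload, shape not specified here


class Feed(TypedDict):
    schema_version: int             # hypothetical, lets the PWA detect shape changes
    generated_at: str               # build metadata
    source_health: dict[str, str]   # hypothetical: source name -> status
    briefings: list[Briefing]       # newest first


def build_feed(briefings: list[Briefing], source_health: dict[str, str], now: datetime) -> Feed:
    """Assemble the single document the PWA would fetch on startup."""
    # Sorting the ISO strings is only safe if all timestamps share one format and offset.
    ordered = sorted(briefings, key=lambda b: b["generated_at"], reverse=True)
    return {
        "schema_version": 1,
        "generated_at": now.isoformat(),
        "source_health": source_health,
        "briefings": ordered,
    }


if __name__ == "__main__":
    demo = build_feed(
        [{"id": "example", "generated_at": "2025-01-15T08:00:00+00:00", "articles": []}],
        {"example-source": "ok"},
        datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    )
    print(json.dumps(demo, indent=2))
```

Keeping the newest briefing at index 0 means the PWA can render `briefings[0]` immediately, without the intermediate latest.json lookup.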

What goes away

  • manifest.yaml / manifest.json
  • latest.yaml / latest.json
  • articles/ tree
  • briefings/ tree
  • enforce_retention recursive file scan
  • prune_empty_directories
  • All write_manifest logic

What changes

  • publisher.py reads the existing feed.json (restored from the Release asset), prepends the new briefing, trims entries older than retention_days, and writes the updated file.
  • validate_data.py validates feed.json shape instead of walking the tree.
  • sw.js caches feed.json as a single versioned blob — simpler than caching an unbounded set of per-briefing URLs.
  • The Release asset (wazzup-state.zip) contains only feed.json + the monthly archive files.
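The publisher change described above (prepend, trim by retention_days, hand off older entries to the monthly archives) might look roughly like the following. This is a sketch, not the actual publisher.py: the `update_feed` name, the `briefings` and `generated_at` keys, and the return shape are assumptions, and file I/O is left out so the logic stays testable.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def update_feed(feed, new_briefing, retention_days, now):
    """Return (updated_feed, rolled_out) without touching the filesystem.

    rolled_out maps "YYYY-MM" to the entries that fell outside the window,
    i.e. what would be written to data/archive/YYYY-MM.json.
    Timestamps and `now` are assumed to be timezone-aware.
    """
    cutoff = now - timedelta(days=retention_days)
    # Prepend so the list stays newest-first.
    entries = [new_briefing, *feed.get("briefings", [])]

    kept = []
    rolled_out = defaultdict(list)
    for entry in entries:
        ts = datetime.fromisoformat(entry["generated_at"])
        if ts >= cutoff:
            kept.append(entry)
        else:
            rolled_out[ts.strftime("%Y-%m")].append(entry)

    updated = {**feed, "generated_at": now.isoformat(), "briefings": kept}
    return updated, dict(rolled_out)


if __name__ == "__main__":
    existing = {"briefings": [{"generated_at": "2025-01-01T08:00:00+00:00"}]}
    fresh = {"generated_at": "2025-02-15T08:00:00+00:00"}
    now = datetime(2025, 2, 15, 12, 0, tzinfo=timezone.utc)
    feed, archive = update_feed(existing, fresh, retention_days=7, now=now)
    print(len(feed["briefings"]), sorted(archive))
```

Returning the rolled-out entries grouped by month keeps the retention decision in one place; the caller only has to merge each group into the matching archive file.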

Trade-offs

Pros

  • Single fetch to render — faster PWA startup
  • Retention is a list slice, not a filesystem walk
  • Service worker caching is trivial
  • Publisher logic shrinks significantly

Cons

  • feed.json grows with the rolling window (35 days × up to 3 briefings/day ≈ 105 entries at most); still small in practice
  • Monthly archive files are append-only and never pruned — fine for a personal tool, worth noting
  • Historical per-briefing URLs (currently stable) would change shape

Acceptance criteria

  • publisher.py produces data/feed.json with the last N briefings and source health inline
  • publisher.py writes data/archive/YYYY-MM.json for entries rolling out of the feed window
  • validate_data.py validates the new shapes
  • PWA renders from a single feed.json fetch with no intermediate latest.json hop
  • sw.js cache strategy updated
  • All existing unit tests updated or replaced
  • CI pipeline (task ci + task pipeline:generate:fixtures + task validate:data) passes
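For the validation criterion, a shape check over feed.json could be as small as the sketch below. It assumes a feed object with a `briefings` list (newest first), a `source_health` object, and a timezone-aware ISO 8601 `generated_at` per briefing; those names and the `validate_feed` function are hypothetical, not taken from the current validate_data.py.

```python
from datetime import datetime


def validate_feed(feed):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(feed, dict):
        return ["feed must be a JSON object"]
    briefings = feed.get("briefings")
    if not isinstance(briefings, list):
        return ["'briefings' must be a list"]

    errors = []
    if not isinstance(feed.get("source_health"), dict):
        errors.append("'source_health' must be an object")

    previous = None
    for i, entry in enumerate(briefings):
        if not isinstance(entry, dict):
            errors.append(f"briefings[{i}] must be an object")
            continue
        try:
            ts = datetime.fromisoformat(entry.get("generated_at"))
        except (TypeError, ValueError):
            errors.append(f"briefings[{i}].generated_at is not an ISO 8601 timestamp")
            continue
        if ts.tzinfo is None:
            errors.append(f"briefings[{i}].generated_at has no UTC offset")
            continue
        # The feed is expected to be newest-first.
        if previous is not None and ts > previous:
            errors.append(f"briefings[{i}] is newer than the entry before it")
        previous = ts
    return errors
```

Collecting all problems instead of raising on the first one tends to give more useful CI output when a fixture is malformed in several places.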
