Problem
The current publisher writes a hierarchical file tree (briefings/YYYY/MM/DD/hourly-HH.yaml, articles/YYYY/MM/DD.yaml, manifest.yaml) with parallel JSON mirrors for every file. This means:
- The PWA must fetch latest.json first, then make a second request for the actual briefing JSON before it can render anything.
- Retention logic requires recursive file scanning, date-parsing from path segments, and directory pruning (enforce_retention, prune_empty_directories).
- The manifest file must be kept in sync with every write.
- Adding historical browsing to the PWA requires multiple round-trips or loading the full manifest.
Proposed architecture
Replace the tree with two file shapes:
| File | Contents |
|---|---|
| data/feed.json | Latest N briefings inline (e.g. last 7 days), source health, build metadata |
| data/archive/YYYY-MM.json | Older briefings by month, written once and never mutated |
The PWA loads only feed.json on startup — one fetch, everything needed to render. Archive files are fetched on demand if the user browses history.
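To make the feed.json shape concrete, here is a minimal Python sketch of how a publisher might assemble it. The key names (schema_version, generated_at, source_health, briefings) and the per-briefing id and published_at fields are hypothetical; the proposal only specifies that the file holds recent briefings, source health, and build metadata. Briefings are ordered newest-first so the PWA can render the first entry without sorting.

```python
import json
from datetime import datetime, timezone
from typing import Any


def build_feed(
    briefings: list[dict[str, Any]],
    source_health: dict[str, Any],
    generated_at: datetime,
) -> dict[str, Any]:
    """Assemble a feed.json document. All key names here are hypothetical."""
    return {
        "schema_version": 1,  # hypothetical: lets the PWA detect future shape changes
        "generated_at": generated_at.astimezone(timezone.utc).isoformat(),  # build metadata
        "source_health": source_health,
        # Newest first, so the PWA can render briefings[0] directly.
        "briefings": sorted(briefings, key=lambda b: b["published_at"], reverse=True),
    }


if __name__ == "__main__":
    # Illustrative sample values only.
    feed = build_feed(
        briefings=[
            {"id": "2024-05-01T08", "published_at": "2024-05-01T08:00:00+00:00", "items": []},
            {"id": "2024-05-01T14", "published_at": "2024-05-01T14:00:00+00:00", "items": []},
        ],
        source_health={"example-source": {"ok": True}},
        generated_at=datetime(2024, 5, 1, 14, 5, tzinfo=timezone.utc),
    )
    print(json.dumps(feed, indent=2))
```

Storing published_at as consistently formatted UTC ISO-8601 strings (explicit +00:00 offset) means string order matches chronological order, which is what the sort relies on.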
What goes away
- manifest.yaml / manifest.json
- latest.yaml / latest.json
- articles/ tree
- briefings/ tree
- enforce_retention recursive file scan
- prune_empty_directories
- All write_manifest logic
What changes
- publisher.py reads the existing feed.json (restored from the Release asset), prepends the new briefing, trims entries older than retention_days, moves the trimmed entries into data/archive/YYYY-MM.json, and writes the updated file.
- validate_data.py validates the feed.json shape instead of walking the tree.
- sw.js caches feed.json as a single versioned blob, which is simpler than caching an unbounded set of per-briefing URLs.
- The Release asset (wazzup-state.zip) contains only feed.json plus the monthly archive files.
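The publisher change reduces retention to a list operation. The sketch below assumes feed.json is an object with a briefings list ordered newest-first, and that each briefing has hypothetical id and published_at fields, with published_at an ISO-8601 timestamp carrying a UTC offset (so it compares correctly against a timezone-aware now). roll_feed and update_feed_file are illustrative names, not existing functions in publisher.py. Returning the rolled-out entries lets a separate step hand them to the monthly archive.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any


def roll_feed(
    feed: dict[str, Any],
    new_briefing: dict[str, Any],
    retention_days: int,
    now: datetime,
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Prepend new_briefing, then split entries into (updated feed, rolled-out entries)."""
    cutoff = now - timedelta(days=retention_days)
    # Drop any earlier copy of the same briefing so re-running a slot is idempotent.
    existing = [b for b in feed.get("briefings", []) if b["id"] != new_briefing["id"]]
    kept: list[dict[str, Any]] = []
    rolled_out: list[dict[str, Any]] = []
    for briefing in [new_briefing, *existing]:
        published = datetime.fromisoformat(briefing["published_at"])
        # Because the list is newest-first, the kept entries form a prefix (a slice).
        (kept if published >= cutoff else rolled_out).append(briefing)
    return {**feed, "briefings": kept}, rolled_out


def update_feed_file(
    path: Path,
    new_briefing: dict[str, Any],
    retention_days: int,
    now: datetime,
) -> list[dict[str, Any]]:
    """Load feed.json if it was restored, roll it forward, and write it back."""
    feed = json.loads(path.read_text()) if path.exists() else {"briefings": []}
    updated, rolled_out = roll_feed(feed, new_briefing, retention_days, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(updated, indent=2))
    tmp.replace(path)  # rename over the old file so readers never see a partial write
    return rolled_out
```

Writing to a temporary file and renaming it keeps a crashed run from leaving a half-written feed.json behind on POSIX filesystems.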
Trade-offs
Pros
- Single fetch to render — faster PWA startup
- Retention is a list slice, not a filesystem walk
- Service worker caching is trivial
- Publisher logic shrinks significantly
Cons
- feed.json grows with the rolling window (35 days × up to 3 briefings/day = at most 105 entries); still small in practice
- Monthly archive files are append-only and never pruned — fine for a personal tool, worth noting
- Historical per-briefing URLs (currently stable) would change shape
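Because entries leave the rolling window a few at a time, a given month's archive file can receive briefings across several publisher runs. The sketch below therefore merges into data/archive/YYYY-MM.json rather than overwriting it, appending only ids it has not seen and never editing or deleting existing entries, consistent with the append-only note above. It assumes hypothetical id and published_at fields, and buckets by the month of each timestamp's own offset (UTC if timestamps are stored in UTC); append_to_archive is an illustrative name.

```python
import json
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any


def archive_month(briefing: dict[str, Any]) -> str:
    """Return the YYYY-MM bucket for a briefing (hypothetical published_at field)."""
    return datetime.fromisoformat(briefing["published_at"]).strftime("%Y-%m")


def append_to_archive(archive_dir: Path, rolled_out: list[dict[str, Any]]) -> list[Path]:
    """Merge rolled-out briefings into per-month files; existing entries are kept unchanged."""
    by_month: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for briefing in rolled_out:
        by_month[archive_month(briefing)].append(briefing)

    archive_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for month, entries in sorted(by_month.items()):
        path = archive_dir / f"{month}.json"
        archive = json.loads(path.read_text()) if path.exists() else {"month": month, "briefings": []}
        seen = {b["id"] for b in archive["briefings"]}
        for briefing in entries:
            # Append-only: skip ids already archived instead of overwriting them.
            if briefing["id"] not in seen:
                archive["briefings"].append(briefing)
                seen.add(briefing["id"])
        archive["briefings"].sort(key=lambda b: b["published_at"], reverse=True)
        path.write_text(json.dumps(archive, indent=2))
        written.append(path)
    return written
```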
Acceptance criteria
- publisher.py produces data/feed.json with the last N briefings and source health inline
- publisher.py writes data/archive/YYYY-MM.json for entries rolling out of the feed window
- validate_data.py validates the new shapes
- PWA startup makes a single feed.json fetch with no intermediate latest.json hop
- sw.js cache strategy updated
- task ci, task pipeline:generate:fixtures, and task validate:data all pass
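Replacing the tree walk in validate_data.py could look like a single function over the parsed document. This is a sketch under assumptions: it checks hypothetical key names (generated_at, source_health, briefings, and per-briefing id and published_at), requires timezone-aware timestamps, rejects duplicate ids, and enforces newest-first order. validate_feed is an illustrative name; the real schema and error reporting in validate_data.py may differ.

```python
from datetime import datetime
from typing import Any, Optional


def validate_feed(doc: Any) -> list[str]:
    """Return a list of problems found in a parsed feed.json; an empty list means valid."""
    if not isinstance(doc, dict):
        return ["feed.json must be a JSON object"]
    errors: list[str] = []
    # Top-level keys (hypothetical names) and the JSON type each must have.
    for key, kind in (("generated_at", str), ("source_health", dict), ("briefings", list)):
        if not isinstance(doc.get(key), kind):
            errors.append(f"{key}: expected {kind.__name__}")
    briefings = doc.get("briefings")
    if not isinstance(briefings, list):
        return errors

    seen_ids: set[str] = set()
    previous: Optional[datetime] = None
    for i, briefing in enumerate(briefings):
        if not isinstance(briefing, dict) or not isinstance(briefing.get("id"), str):
            errors.append(f"briefings[{i}]: missing string id")
            continue
        if briefing["id"] in seen_ids:
            errors.append(f"briefings[{i}]: duplicate id {briefing['id']!r}")
        seen_ids.add(briefing["id"])
        try:
            published = datetime.fromisoformat(briefing["published_at"])
        except (KeyError, TypeError, ValueError):
            errors.append(f"briefings[{i}]: published_at is not an ISO-8601 timestamp")
            continue
        if published.tzinfo is None:
            errors.append(f"briefings[{i}]: published_at has no UTC offset")
            continue
        # Newest-first: each entry must be no later than the one before it.
        if previous is not None and published > previous:
            errors.append(f"briefings[{i}]: not in newest-first order")
        previous = published
    return errors
```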