Context
This is a spike/investigation issue to evaluate replacing the current YAML/JSON file tree with a single SQLite database served via GitHub Pages and read in the browser using sql.js (SQLite compiled to WebAssembly).
Raised from architecture discussion — see the alternative proposal in #(feed.json issue).
Proposed architecture
| Role | Current | With SQLite |
| --- | --- | --- |
| Served to browser | public/data/*.json via Pages | public/data/wazzup.db via Pages |
| Persisted between pipeline runs | wazzup-state.zip on Releases | wazzup.db on Releases (same file, dual-purpose) |
Pipeline flow
- Download wazzup.db from the news-state Release asset
- Insert new briefings, run retention DELETE (no filesystem walk needed)
- Write public/data/wazzup.db
- Pages deploy serves it to the browser
- Re-upload .db to Releases for the next run
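
The insert-and-retention step in the flow above could look roughly like the following. This is a minimal sketch, not the actual publisher.py: the `briefings` table, its columns, and the `publish` function are hypothetical names chosen for illustration, and the 35-day window is taken from the retention figure mentioned elsewhere in this issue.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 35  # retention window referenced in this issue


def publish(conn, briefings, today):
    """Insert new briefings, then delete rows older than the retention window.

    `briefings` is an iterable of (published_on, source, title) tuples.
    The table layout is a hypothetical example, not the real schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS briefings ("
        " id INTEGER PRIMARY KEY,"
        " published_on TEXT NOT NULL,"  # ISO 8601 date, so string order == date order
        " source TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO briefings (published_on, source, title) VALUES (?, ?, ?)",
            briefings,
        )
        cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
        cur = conn.execute("DELETE FROM briefings WHERE published_on < ?", (cutoff,))
    return cur.rowcount  # number of rows removed by retention
```

Storing dates as ISO 8601 text keeps the retention rule to a single comparison, which is what replaces the directory walk and pruning in the file-tree approach.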
What goes away
- All YAML/JSON output files and the publisher file-tree logic
- manifest.yaml, latest.json, enforce_retention, prune_empty_directories
- validate_data.py tree walk (replaced by SQL schema checks)
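
As an illustration of what "SQL schema checks" might mean in place of the tree walk, the sketch below combines SQLite's built-in `PRAGMA integrity_check` with a column check via `PRAGMA table_info`. The expected table and column names are assumptions made up for this example; the real checks would follow whatever schema the spike settles on.

```python
import sqlite3

# Hypothetical expected schema: table name -> required column names.
EXPECTED_COLUMNS = {"briefings": {"id", "published_on", "source", "title"}}


def validate_db(conn):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        problems.append(f"integrity_check: {status}")
    for table, required in EXPECTED_COLUMNS.items():
        # table_info yields one row per column; index 1 holds the column name.
        # An unknown table simply yields no rows.
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not present:
            problems.append(f"missing table: {table}")
        elif required - present:
            problems.append(f"{table}: missing columns {sorted(required - present)}")
    return problems
```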
What's added
- sqlite3 stdlib usage in publisher.py (zero new runtime dependencies)
- sql.js WASM (~1 MB) bundled or CDN-loaded in the PWA
- Schema migration strategy for the .db file
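
One possible migration strategy, sketched here under assumptions, is to track the schema version in SQLite's `PRAGMA user_version` and apply an ordered list of statements on each pipeline run. The `MIGRATIONS` list and `migrate` function are hypothetical; the issue does not prescribe a particular approach.

```python
import sqlite3

# Hypothetical migrations: the statement at index i upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE briefings ("
    " id INTEGER PRIMARY KEY, published_on TEXT NOT NULL, title TEXT NOT NULL)",
    "ALTER TABLE briefings ADD COLUMN source TEXT NOT NULL DEFAULT ''",
]


def migrate(conn):
    """Apply pending migrations and return the resulting schema version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for target, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; target is a trusted int.
        conn.execute(f"PRAGMA user_version = {target}")
        conn.commit()
        version = target
    return version
```

Because the same file is both the Release asset and the served artifact, running `migrate` right after download would upgrade an older wazzup.db in place before new rows are inserted. The sketch does not wrap each step in an explicit transaction, so a production version would need to decide how to handle a failure between the statement and the version bump.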
Known trade-offs to evaluate during the spike
| Trade-off | Impact |
| --- | --- |
| sql.js WASM size | ~1 MB download before the PWA can render; measure real impact on Time to Interactive |
| Whole-DB download | Browser cannot range-request a subset; full DB downloaded on every cold load. Measure DB size at 35-day retention. |
| No partial SW caching | The service worker must treat the DB as an opaque blob; individual briefing caching is not possible |
| Binary diff opacity | .db changes are not human-readable in CI logs; debugging requires tooling |
| Pipeline rewrite scope | publisher.py and all related tests require a full rewrite |
| Richer frontend queries | SQL enables filtering by date, source, score, or keyword without a backend; assess whether this is actually needed |
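
For the "Whole-DB download" row, DB size at 35-day retention can be estimated before the pipeline is rewritten by filling a throwaway database with synthetic rows. The sketch below is only a rough estimator: the table layout, the default of 20 briefings per day, and the 1500-character body are all placeholder assumptions that should be replaced with figures from real data.

```python
import os
import sqlite3
import tempfile
from datetime import date, timedelta


def measure_db_size(days=35, briefings_per_day=20, body_chars=1500):
    """Build a throwaway DB of synthetic rows and return its size in bytes.

    All defaults are placeholder assumptions, not measurements.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wazzup.db")
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE briefings ("
            " id INTEGER PRIMARY KEY, published_on TEXT, body TEXT)"
        )
        start = date(2024, 1, 1)
        rows = [
            ((start + timedelta(days=d)).isoformat(), "x" * body_chars)
            for d in range(days)
            for _ in range(briefings_per_day)
        ]
        with conn:  # single transaction for all inserts
            conn.executemany(
                "INSERT INTO briefings (published_on, body) VALUES (?, ?)", rows
            )
        conn.execute("VACUUM")  # compact the file before measuring
        conn.close()
        return os.path.getsize(path)
```

The result is the uncompressed file size; GitHub Pages may serve the file compressed, so the transfer size should be measured separately against the decision criteria.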
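Spike goals
- publisher.py writing to SQLite instead of YAML/JSON
- .db file size at 35-day retention with realistic data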
Decision criteria
Recommend adoption if:
- DB size at max retention stays under ~2 MB
- Time to Interactive is within 500 ms of the current JSON approach
- The PWA gains query capabilities that are otherwise impractical to add
Otherwise, close as "won't do" and link to the feed.json issue as the preferred path.