spike: evaluate SQLite + sql.js as an alternative data layer

## Context

This is a spike/investigation issue to evaluate replacing the current YAML/JSON file tree with a single SQLite database served via GitHub Pages and read in the browser using [sql.js](https://github.com/sql-js/sql-js) (SQLite compiled to WebAssembly).

Raised from architecture discussion — see the alternative proposal in #(feed.json issue).

## Proposed architecture

| Role | Current | With SQLite |
|------|---------|-------------|
| Served to browser | `public/data/*.json` via Pages | `public/data/wazzup.db` via Pages |
| Persisted between pipeline runs | `wazzup-state.zip` on Releases | `wazzup.db` on Releases (same file, dual-purpose) |

### Pipeline flow
1. Download `wazzup.db` from the `news-state` Release asset
2. Insert new briefings, run retention `DELETE` — no filesystem walk needed
3. Write `public/data/wazzup.db`
4. Pages deploy serves it to the browser
5. Re-upload `.db` to Releases for the next run

### What goes away
- All YAML/JSON output files and the publisher file-tree logic
- `manifest.yaml`, `latest.json`, `enforce_retention`, `prune_empty_directories`
- `validate_data.py` tree walk (replaced by SQL schema checks)

### What's added
- `sqlite3` stdlib usage in `publisher.py` (zero new runtime dependencies)
- sql.js WASM (~1 MB) bundled or CDN-loaded in the PWA
- Schema migration strategy for the `.db` file

## Known trade-offs to evaluate during the spike

| | Impact |
|--|--------|
| sql.js WASM size | ~1 MB download before the PWA can render — measure real impact on Time to Interactive |
| Whole-DB download | Browser cannot range-request a subset; full DB downloaded on every cold load. Measure DB size at 35-day retention. |
| No partial SW caching | The service worker must treat the DB as an opaque blob; individual briefing caching is not possible |
| Binary diff opacity | `.db` changes are not human-readable in CI logs; debugging requires tooling |
| Pipeline rewrite scope | `publisher.py` and all related tests require full rewrite |
| Richer frontend queries | SQL enables filtering by date, source, score, keyword without backend — assess if this is actually needed |

## Spike goals

- [ ] Prototype `publisher.py` writing to SQLite instead of YAML/JSON
- [ ] Measure `.db` file size at 35-day retention with realistic data
- [ ] Measure PWA Time to Interactive with sql.js vs current JSON approach
- [ ] Assess service worker strategy for a binary DB blob
- [ ] Document a go/no-go recommendation with data

## Decision criteria

Recommend adoption if:
- DB size at max retention stays under ~2 MB
- Time to Interactive is within 500 ms of the current JSON approach
- The PWA gains query capabilities that are otherwise impractical to add

Otherwise, close as "won't do" and link to the `feed.json` issue as the preferred path.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spike: evaluate SQLite + sql.js as an alternative data layer #69

Context

Proposed architecture

Pipeline flow

What goes away

What's added

Known trade-offs to evaluate during the spike

Spike goals

Decision criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Role	Current	With SQLite
Served to browser	`public/data/*.json` via Pages	`public/data/wazzup.db` via Pages
Persisted between pipeline runs	`wazzup-state.zip` on Releases	`wazzup.db` on Releases (same file, dual-purpose)

	Impact
sql.js WASM size	~1 MB download before the PWA can render — measure real impact on Time to Interactive
Whole-DB download	Browser cannot range-request a subset; full DB downloaded on every cold load. Measure DB size at 35-day retention.
No partial SW caching	The service worker must treat the DB as an opaque blob; individual briefing caching is not possible
Binary diff opacity	`.db` changes are not human-readable in CI logs; debugging requires tooling
Pipeline rewrite scope	`publisher.py` and all related tests require full rewrite
Richer frontend queries	SQL enables filtering by date, source, score, keyword without backend — assess if this is actually needed

spike: evaluate SQLite + sql.js as an alternative data layer #69

Description

Context

Proposed architecture

Pipeline flow

What goes away

What's added

Known trade-offs to evaluate during the spike

Spike goals

Decision criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions