Summary
Evaluate a future architecture where Wazzup uses a non-hosted SQLite database as the internal pipeline state store, persisted through the existing GitHub Release asset flow, while continuing to publish small static JSON/YAML outputs for the PWA and integrations.
Context
The current architecture intentionally avoids a database: GitHub Actions runs the pipeline, generated YAML is canonical, JSON mirrors are published to GitHub Pages, and rolling state is stored in the news-state Release asset. This keeps the app simple, inspectable, static-hosting friendly, and easy to recover.
SQLite may become useful if the product grows toward richer retained history, source analytics, full-text/archive search, monthly recaps, or more complex deduplication/querying.
Proposed direction
Use SQLite only as an internal backend/pipeline state store, not as the primary browser contract:
- Use Python's built-in
sqlite3 in the backend pipeline.
- Persist the SQLite file in the existing
wazzup-state.zip Release asset.
- Keep publishing
latest.json, date-partitioned briefing JSON, article JSON, source status JSON, and manifest JSON to GitHub Pages.
- Optionally publish a downloadable
wazzup.sqlite archive later for power-user debugging/search.
- Avoid making the PWA depend on
sql.js unless a dedicated offline archive/search feature justifies the extra WASM/runtime complexity.
Potential benefits
- Transactional single-file state with indexes.
- Easier historical queries across articles, sources, scores, duplicate groups, citations, and briefings.
- Better foundation for monthly recaps, source reliability stats, archive search, and future API/MCP surfaces.
- Less awkward than managing increasingly complex state across many generated YAML files.
Risks and trade-offs
- Adds schema design, migration, validation, backup, and corruption-handling concerns.
- Reduces human inspectability compared with YAML/JSON release assets.
- Requires careful export logic so existing static consumers do not become DB-aware.
- Could be premature while the current 35-day static state remains small and simple.
Acceptance criteria for investigation
- Compare current YAML/JSON state against SQLite-backed state for complexity, debuggability, and reliability.
- Sketch a minimal schema for sources, content items, scored items, duplicate groups, briefings, citations, source health, and run metadata.
- Define migration/versioning strategy for the SQLite file.
- Define which JSON/YAML outputs remain public contracts.
- Decide whether SQLite should stay internal-only or whether a downloadable DB artifact is worthwhile.
- Document a recommendation in
docs/architecture.md or an ADR before implementation.
Non-goals
- Do not introduce a hosted database or always-on service.
- Do not replace the PWA's small static JSON entrypoints with a mandatory database download.
- Do not add
sql.js to the frontend unless there is a concrete offline archive/search use case.
Summary
Evaluate a future architecture where Wazzup uses a non-hosted SQLite database as the internal pipeline state store, persisted through the existing GitHub Release asset flow, while continuing to publish small static JSON/YAML outputs for the PWA and integrations.
Context
The current architecture intentionally avoids a database: GitHub Actions runs the pipeline, generated YAML is canonical, JSON mirrors are published to GitHub Pages, and rolling state is stored in the
news-stateRelease asset. This keeps the app simple, inspectable, static-hosting friendly, and easy to recover.SQLite may become useful if the product grows toward richer retained history, source analytics, full-text/archive search, monthly recaps, or more complex deduplication/querying.
Proposed direction
Use SQLite only as an internal backend/pipeline state store, not as the primary browser contract:
sqlite3in the backend pipeline.wazzup-state.zipRelease asset.latest.json, date-partitioned briefing JSON, article JSON, source status JSON, and manifest JSON to GitHub Pages.wazzup.sqlitearchive later for power-user debugging/search.sql.jsunless a dedicated offline archive/search feature justifies the extra WASM/runtime complexity.Potential benefits
Risks and trade-offs
Acceptance criteria for investigation
docs/architecture.mdor an ADR before implementation.Non-goals
sql.jsto the frontend unless there is a concrete offline archive/search use case.