Skip to content

Evaluate SQLite-backed pipeline state with static JSON exports #71

@DevSecNinja

Description

@DevSecNinja

Summary

Evaluate a future architecture where Wazzup uses a non-hosted SQLite database as the internal pipeline state store, persisted through the existing GitHub Release asset flow, while continuing to publish small static JSON/YAML outputs for the PWA and integrations.

Context

The current architecture intentionally avoids a database: GitHub Actions runs the pipeline, generated YAML is canonical, JSON mirrors are published to GitHub Pages, and rolling state is stored in the news-state Release asset. This keeps the app simple, inspectable, static-hosting friendly, and easy to recover.

SQLite may become useful if the product grows toward richer retained history, source analytics, full-text/archive search, monthly recaps, or more complex deduplication/querying.

Proposed direction

Use SQLite only as an internal backend/pipeline state store, not as the primary browser contract:

  • Use Python's built-in sqlite3 in the backend pipeline.
  • Persist the SQLite file in the existing wazzup-state.zip Release asset.
  • Keep publishing latest.json, date-partitioned briefing JSON, article JSON, source status JSON, and manifest JSON to GitHub Pages.
  • Optionally publish a downloadable wazzup.sqlite archive later for power-user debugging/search.
  • Avoid making the PWA depend on sql.js unless a dedicated offline archive/search feature justifies the extra WASM/runtime complexity.

Potential benefits

  • Transactional single-file state with indexes.
  • Easier historical queries across articles, sources, scores, duplicate groups, citations, and briefings.
  • Better foundation for monthly recaps, source reliability stats, archive search, and future API/MCP surfaces.
  • Less awkward than managing increasingly complex state across many generated YAML files.

Risks and trade-offs

  • Adds schema design, migration, validation, backup, and corruption-handling concerns.
  • Reduces human inspectability compared with YAML/JSON release assets.
  • Requires careful export logic so existing static consumers do not become DB-aware.
  • Could be premature while the current 35-day static state remains small and simple.

Acceptance criteria for investigation

  • Compare current YAML/JSON state against SQLite-backed state for complexity, debuggability, and reliability.
  • Sketch a minimal schema for sources, content items, scored items, duplicate groups, briefings, citations, source health, and run metadata.
  • Define migration/versioning strategy for the SQLite file.
  • Define which JSON/YAML outputs remain public contracts.
  • Decide whether SQLite should stay internal-only or whether a downloadable DB artifact is worthwhile.
  • Document a recommendation in docs/architecture.md or an ADR before implementation.

Non-goals

  • Do not introduce a hosted database or always-on service.
  • Do not replace the PWA's small static JSON entrypoints with a mandatory database download.
  • Do not add sql.js to the frontend unless there is a concrete offline archive/search use case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions