diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml
index e78fa52..840e625 100644
--- a/.github/workflows/pages.yml
+++ b/.github/workflows/pages.yml
@@ -34,6 +34,7 @@ jobs:
       artifact-path: public
       install-command: |-
         python3 -m pip install --break-system-packages -r requirements.txt
-      test-command: PYTHONPATH=src python3 scripts/check_format.py && python3 scripts/lint.py && PYTHONPATH=src python3 -m unittest discover -s tests && python3 -m compileall -q src scripts
+      test-command: |-
+        PYTHONPATH=src python3 scripts/check_format.py && python3 scripts/lint.py && PYTHONPATH=src python3 -m unittest discover -s tests && python3 -m compileall -q src scripts
       build-command: PYTHONPATH=src python3 scripts/pages_build.py
       cloudflare-preview: false
diff --git a/docs/architecture.md b/docs/architecture.md
index e3330db..5fdb2ee 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -37,20 +37,20 @@ flowchart LR
 
 ## Runtime components
 
-| Component | Responsibility | Current implementation |
-| -------------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
-| Source configuration | Defines feeds, short source tags, categories, weights, headers, and interest hints. | [../config/sources.yml](../config/sources.yml) and [../config/interests.yml](../config/interests.yml). |
-| Fetcher | Retrieves RSS and Atom XML feeds. | `urllib.request` based Python fetcher in [../src/wazzup/feeds.py](../src/wazzup/feeds.py). |
-| Normalizer | Converts source entries into `ContentItem` records. | Pure functions with fixtures. |
-| Deduplicator | Groups duplicate or near-duplicate articles. | Canonical URL + raw ref/GUID + normalized title/day transitive groups. |
-| Ranker | Scores items against interests, source quality, recency, and coverage. | Deterministic scoring plus optional AI reranking later. |
-| Curator | Selects and orders the most relevant items from the ranked list for the briefing. | AI curation provider abstraction; `wazzup-curator` agent for Copilot CLI, deterministic passthrough for fake. |
-| Summarizer | Generates article and briefing summaries from the curated item selection. | AI summary provider abstraction with prompt versioning; `wazzup-writer` agent for Copilot CLI. |
+| Component | Responsibility | Current implementation |
+| --------------------- | -------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
+| Source configuration | Defines feeds, short source tags, categories, weights, headers, and interest hints. | [../config/sources.yml](../config/sources.yml) and [../config/interests.yml](../config/interests.yml). |
+| Fetcher | Retrieves RSS and Atom XML feeds. | `urllib.request` based Python fetcher in [../src/wazzup/feeds.py](../src/wazzup/feeds.py). |
+| Normalizer | Converts source entries into `ContentItem` records. | Pure functions with fixtures. |
+| Deduplicator | Groups duplicate or near-duplicate articles. | Canonical URL + raw ref/GUID + normalized title/day transitive groups. |
+| Ranker | Scores items against interests, source quality, recency, and coverage. | Deterministic scoring plus optional AI reranking later. |
+| Curator | Selects and orders the most relevant items from the ranked list for the briefing. | AI curation provider abstraction; `wazzup-curator` agent for Copilot CLI, deterministic passthrough for fake. |
+| Summarizer | Generates article and briefing summaries from the curated item selection. | AI summary provider abstraction with prompt versioning; `wazzup-writer` agent for Copilot CLI. |
 | Transparency reporter | Explains run inputs, source health, selection, and AI providers for auditability. | AI report provider abstraction; `wazzup-transparency-reporter` agent for Copilot CLI, deterministic fake report for tests. |
-| Publisher | Writes canonical static JSON, source health, transparency reports, `latest`, and `manifest` files. | [../src/wazzup/publisher.py](../src/wazzup/publisher.py). |
-| State store | Persists generated data across scheduled runs without commits. | `news-state` GitHub Release asset `wazzup-state.zip`. |
-| Delivery adapters | Pushes selected briefings to external channels. | Not implemented yet. |
-| Frontend | Displays latest briefing and source health. | Static vanilla PWA in [../public](../public). |
+| Publisher | Writes canonical static JSON, source health, transparency reports, `latest`, and `manifest` files. | [../src/wazzup/publisher.py](../src/wazzup/publisher.py). |
+| State store | Persists generated data across scheduled runs without commits. | `news-state` GitHub Release asset `wazzup-state.zip`. |
+| Delivery adapters | Pushes selected briefings to external channels. | Not implemented yet. |
+| Frontend | Displays latest briefing and source health. | Static vanilla PWA in [../public](../public). |
 
 ## Pipeline flow
 
@@ -420,7 +420,7 @@ The future MCP server should depend on the domain contracts in [../src/wazzup](.
 | Risk | Mitigation |
 | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
 | GitHub Pages exposes personal interests | Public output is accepted for the current deployment; keep source preferences and prompts minimal and support private/static alternatives later. |
-| Repository bloat from generated data | Store rolling state in a GitHub Release asset and deploy Pages artifacts without committing generated JSON. |
+| Repository bloat from generated data | Store rolling state in a GitHub Release asset and deploy Pages artifacts without committing generated JSON. |
 | AI hallucinations | Require citations, validate structured output, keep source links visible. |
 | AI provider cost spikes | Cap item count today; add summary caching and token/monthly accounting before relying on strict budget guarantees. |
 | Scheduled workflows delayed | Treat schedules as best-effort and compute windows from timestamps. |
diff --git a/docs/github-actions.md b/docs/github-actions.md
index e3ecf43..971ef4e 100644
--- a/docs/github-actions.md
+++ b/docs/github-actions.md
@@ -2,19 +2,19 @@
 
 ## Workflow overview
 
-| Workflow | Trigger | Responsibility |
-| ------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
-| CI | Pull request and manual dispatch | Formatting, syntax linting, tests, compile checks, fixture generation, and generated-data validation. |
-| Lint | Pull request and manual dispatch | Reusable organization lint workflow from `DevSecNinja/.github`. |
-| Auto-fix formatting | Manual dispatch | Reusable organization formatting workflow that commits dprint/yamlfmt fixes back to the branch. |
-| News | Hourly schedule with a local two-hour active-window cadence gate and manual dispatch | Fetch feeds, generate a rolling briefing, validate data, persist release-backed state, and upload a short-lived `public` artifact for debugging. |
-| Pages | Explicit dispatch from `News` after state persists, push to `main`, and manual dispatch | Deploy PWA and static JSON data to GitHub Pages through the reusable `DevSecNinja/.github` Pages workflow. |
-| Config Sync | Weekly and manual dispatch | Open PRs when shared repo config from `DevSecNinja/.github` drifts. |
-| Label Sync | Daily, manual dispatch, and label config changes | Sync repository labels from the org base labels plus repo-specific labels. |
-| Labeler | Pull requests, issues, and manual dispatch | Apply area/type labels using shared labeler automation. |
-| Live smoke | Not implemented yet | Optional real feed and AI provider canary checks with strict budgets. |
-| Archive cleanup | Not implemented yet | Keep release-backed rolling state compact and optionally publish monthly recap archives. |
-| Release automation | Not implemented yet | Future release-please workflow driven by Conventional Commits. |
+| Workflow | Trigger | Responsibility |
+| ------------------- | --------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| CI | Pull request and manual dispatch | Formatting, syntax linting, tests, compile checks, fixture generation, and generated-data validation. |
+| Lint | Pull request and manual dispatch | Reusable organization lint workflow from `DevSecNinja/.github`. |
+| Auto-fix formatting | Manual dispatch | Reusable organization formatting workflow that commits dprint/yamlfmt fixes back to the branch. |
+| News | Hourly schedule with a local two-hour active-window cadence gate and manual dispatch | Fetch feeds, generate a rolling briefing, validate data, persist release-backed state, and upload a short-lived `public` artifact for debugging. |
+| Pages | Explicit dispatch from `News` after state persists, push to `main`, and manual dispatch | Deploy PWA and static JSON data to GitHub Pages through the reusable `DevSecNinja/.github` Pages workflow. |
+| Config Sync | Weekly and manual dispatch | Open PRs when shared repo config from `DevSecNinja/.github` drifts. |
+| Label Sync | Daily, manual dispatch, and label config changes | Sync repository labels from the org base labels plus repo-specific labels. |
+| Labeler | Pull requests, issues, and manual dispatch | Apply area/type labels using shared labeler automation. |
+| Live smoke | Not implemented yet | Optional real feed and AI provider canary checks with strict budgets. |
+| Archive cleanup | Not implemented yet | Keep release-backed rolling state compact and optionally publish monthly recap archives. |
+| Release automation | Not implemented yet | Future release-please workflow driven by Conventional Commits. |
 
 ## Recommended workflow boundaries
 
@@ -278,9 +278,9 @@ Expected secrets:
 | Secret | Purpose |
 | ---------------------------- | ----------------------------------------------------------------------------------------------------------- |
 | `COPILOT_REQUESTS_PAT` | Preferred repository secret containing a fine-grained PAT for Copilot CLI with Copilot Requests permission. |
-| `COPILOT_GITHUB_TOKEN` | Alternative secret name accepted by the News workflow. |
-| `COPILOT_MODEL` | Optional Copilot CLI model override for curator and transparency; defaults to `claude-sonnet-4.6`. |
-| `COPILOT_WRITER_MODEL` | Optional Copilot CLI model override for the briefing writer; defaults to `claude-opus-4.8`. |
+| `COPILOT_GITHUB_TOKEN` | Alternative secret name accepted by the News workflow. |
+| `COPILOT_MODEL` | Optional Copilot CLI model override for curator and transparency; defaults to `claude-sonnet-4.6`. |
+| `COPILOT_WRITER_MODEL` | Optional Copilot CLI model override for the briefing writer; defaults to `claude-opus-4.8`. |
 | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint. |
 | `AZURE_OPENAI_API_KEY` | Azure OpenAI API key. |
 | `OPENAI_API_KEY` | Optional alternative provider. |
@@ -331,10 +331,10 @@ The scheduled `News` workflow is responsible for mutating release-backed state.
 
 Observed failure modes and fixes:
 
-| Failure | Cause | Fix |
-| -------------------------------------------------------- | -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
-| News run failed in Copilot CLI | `COPILOT_GITHUB_TOKEN` was empty while `AI_PROVIDER=copilot-cli`. | Select effective provider first and fall back to `fake` when no Copilot token secret exists. |
-| Pages deploy failed validating `public/data/latest.json` | Reusable workflow received empty `GH_TOKEN`, state restore skipped, and `public/data` was missing. | Restore public release asset without a token in Pages builds and make missing retained state fatal for `pages:build`. |
+| Failure | Cause | Fix |
+| -------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
+| News run failed in Copilot CLI | `COPILOT_GITHUB_TOKEN` was empty while `AI_PROVIDER=copilot-cli`. | Select effective provider first and fall back to `fake` when no Copilot token secret exists. |
+| Pages deploy failed validating `public/data/latest.json` | Reusable workflow received empty `GH_TOKEN`, state restore skipped, and `public/data` was missing. | Restore public release asset without a token in Pages builds and make missing retained state fatal for `pages:build`. |
 | Pages deploy failed during mise install | Reusable workflow install command could not pass `github.token`, so mise made unauthenticated GitHub API calls and hit rate limits. | Avoid mise in Pages deploy; install Python deps directly and run `scripts/pages_build.py`. |
 
 ## Security hardening
diff --git a/docs/requirements.md b/docs/requirements.md
index 0e74921..735e34a 100644
--- a/docs/requirements.md
+++ b/docs/requirements.md
@@ -59,28 +59,28 @@ Implemented deviations from the original target:
 
 ## Functional requirements
 
-| ID | Requirement | Priority |
-| ------ | --------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
-| FR-001 | Maintain a configurable list of feeds, short source tags, source categories, source weights, and user interests. | Must |
-| FR-002 | Fetch configured RSS/Atom feeds every two hours from 07:00 through 21:59 local time in GitHub Actions. JSON Feed support remains deferred. | Must |
-| FR-003 | Deduplicate articles using canonical URL, feed item GUID, title similarity, and publication timestamp. | Must |
-| FR-004 | Store normalized article metadata in stable, versioned JSON, without committing generated data to `main`. | Must |
-| FR-005 | Score articles against user interests, recency, source reliability, and duplicate coverage. | Must |
-| FR-006 | Generate hourly summaries from the current local day's retained feed items while suppressing items already featured earlier that day. | Should |
-| FR-007 | Generate a morning briefing at 07:00 local time covering overnight updates since the previous evening briefing. | Must |
-| FR-008 | Generate an evening briefing at 20:00 local time covering the day since 07:00. | Must |
-| FR-009 | Include citations/source links for every summary bullet. | Must |
-| FR-010 | Expose latest summaries and article indexes as static JSON. | Must |
-| FR-011 | Provide a minimal responsive frontend for the latest rolling day view, earlier-today grouping, previous-day summary, and source health. | Must |
-| FR-012 | Support installable PWA behavior with offline reading of recently loaded briefings. | Should |
-| FR-013 | Support user notification options without requiring a custom always-on server. | Should |
-| FR-014 | Provide a Home Assistant-friendly integration surface for briefings. | Could |
-| FR-015 | Ingest podcast releases and transcript metadata. | Could |
-| FR-016 | Recommend podcast episodes worth listening to based on interests and transcript/description relevance. | Could |
-| FR-017 | Define contracts that can later back a REST API, agent tool, or MCP server. | Must |
-| FR-018 | Provide observability outputs for workflow runs, source failures, AI provider cost, and item counts. | Should |
-| FR-019 | Support interchangeable AI runners without changing ranking, storage, or frontend contracts. | Must |
-| FR-020 | Generate a monthly recap from retained daily and evening briefings. | Could |
+| ID | Requirement | Priority |
+| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
+| FR-001 | Maintain a configurable list of feeds, short source tags, source categories, source weights, and user interests. | Must |
+| FR-002 | Fetch configured RSS/Atom feeds every two hours from 07:00 through 21:59 local time in GitHub Actions. JSON Feed support remains deferred. | Must |
+| FR-003 | Deduplicate articles using canonical URL, feed item GUID, title similarity, and publication timestamp. | Must |
+| FR-004 | Store normalized article metadata in stable, versioned JSON, without committing generated data to `main`. | Must |
+| FR-005 | Score articles against user interests, recency, source reliability, and duplicate coverage. | Must |
+| FR-006 | Generate hourly summaries from the current local day's retained feed items while suppressing items already featured earlier that day. | Should |
+| FR-007 | Generate a morning briefing at 07:00 local time covering overnight updates since the previous evening briefing. | Must |
+| FR-008 | Generate an evening briefing at 20:00 local time covering the day since 07:00. | Must |
+| FR-009 | Include citations/source links for every summary bullet. | Must |
+| FR-010 | Expose latest summaries and article indexes as static JSON. | Must |
+| FR-011 | Provide a minimal responsive frontend for the latest rolling day view, earlier-today grouping, previous-day summary, and source health. | Must |
+| FR-012 | Support installable PWA behavior with offline reading of recently loaded briefings. | Should |
+| FR-013 | Support user notification options without requiring a custom always-on server. | Should |
+| FR-014 | Provide a Home Assistant-friendly integration surface for briefings. | Could |
+| FR-015 | Ingest podcast releases and transcript metadata. | Could |
+| FR-016 | Recommend podcast episodes worth listening to based on interests and transcript/description relevance. | Could |
+| FR-017 | Define contracts that can later back a REST API, agent tool, or MCP server. | Must |
+| FR-018 | Provide observability outputs for workflow runs, source failures, AI provider cost, and item counts. | Should |
+| FR-019 | Support interchangeable AI runners without changing ranking, storage, or frontend contracts. | Must |
+| FR-020 | Generate a monthly recap from retained daily and evening briefings. | Could |
 
 ### Functional requirement implementation notes
 
@@ -90,7 +90,7 @@ Implemented deviations from the original target:
 | Deduplication | Canonical URL tracking-parameter stripping, raw ref/GUID key, normalized title plus publication day, transitive duplicate groups, source-priority winner selection. | Semantic title similarity and duplicate-group publication metadata. |
 | Briefings | Daytime two-hour scheduled briefing plus automatic local-time morning/evening due detection, and manually forced auto/hourly/morning/evening generation. | Daily/hourly view routing in the frontend. |
 | AI | `fake` deterministic provider and `copilot-cli` provider. Tokenless scheduled runs fall back to `fake`. | API providers, Ollama/Foundry providers, strict token/cost accounting. |
-| Frontend | Latest briefing, previous-day summary, source/category tags, article temperature, and source-health PWA from JSON data. | Rich saved-item navigation, server-side notifications, Home Assistant UI. |
+| Frontend | Latest briefing, previous-day summary, source/category tags, article temperature, and source-health PWA from JSON data. | Rich saved-item navigation, server-side notifications, Home Assistant UI. |
 | Data validation | Runtime validation through `wazzup.validate_data` and tests. | Published JSON Schema files and schema-version migration tooling. |
 
 ## Non-functional requirements
diff --git a/docs/testing.md b/docs/testing.md
index 0d5fa7f..6900a87 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -11,14 +11,14 @@
 
 ## Test pyramid
 
-| Layer | Purpose | Examples | CI trigger |
-| ----------- | ------------------------------------------------------ | ---------------------------------------------------------------------------------------------- | -------------------------------- |
-| Unit | Validate pure logic. | Date windows, scoring, dedupe, feed normalization, provider selection. | Pull request and manual CI. |
-| Contract | Validate generated data shape and provider interfaces. | `ContentItem`, `Briefing`, `latest.json`, release-state layout. | Pull request and manual CI. |
-| Integration | Validate components together with fixtures. | Parse saved RSS samples, run pipeline with fake AI provider, publish data to temp dir. | Pull request and manual CI. |
-| Frontend | Validate rendering and accessibility basics. | Load fixture `latest.json`, render briefing, keyboard navigation, service worker registration. | Planned. |
-| End-to-end | Validate deployed/static behavior. | Build app, restore release state, validate Pages artifact. | News and Pages workflows. |
-| Live smoke | Validate external services safely. | Fetch a small allowlisted feed, optional AI provider canary with tiny prompt. | Scheduled or manual only. |
+| Layer | Purpose | Examples | CI trigger |
+| ----------- | ------------------------------------------------------ | ---------------------------------------------------------------------------------------------- | --------------------------- |
+| Unit | Validate pure logic. | Date windows, scoring, dedupe, feed normalization, provider selection. | Pull request and manual CI. |
+| Contract | Validate generated data shape and provider interfaces. | `ContentItem`, `Briefing`, `latest.json`, release-state layout. | Pull request and manual CI. |
+| Integration | Validate components together with fixtures. | Parse saved RSS samples, run pipeline with fake AI provider, publish data to temp dir. | Pull request and manual CI. |
+| Frontend | Validate rendering and accessibility basics. | Load fixture `latest.json`, render briefing, keyboard navigation, service worker registration. | Planned. |
+| End-to-end | Validate deployed/static behavior. | Build app, restore release state, validate Pages artifact. | News and Pages workflows. |
+| Live smoke | Validate external services safely. | Fetch a small allowlisted feed, optional AI provider canary with tiny prompt. | Scheduled or manual only. |
 
 ## Implemented test suite
 
diff --git a/tests/test_repo_automation.py b/tests/test_repo_automation.py
index 0e5f575..d7cf7b1 100644
--- a/tests/test_repo_automation.py
+++ b/tests/test_repo_automation.py
@@ -20,7 +20,6 @@ def test_repo_automation_workflows_are_onboarded(self) -> None:
             self.assertTrue(path.exists(), workflow)
             content = path.read_text(encoding="utf-8")
             self.assertIn("DevSecNinja/.github/.github/workflows/", content)
-            self.assertIn("# renovate: datasource=github-tags depName=DevSecNinja/.github", content)
 
     def test_lint_autofix_and_hooks_are_configured(self) -> None:
         lint_config_paths = ["dprint.json", ".yamlfmt.yaml", ".yamllint.yaml", ".lefthook.toml"]