diff --git a/.github/agents/wazzup-writer.agent.md b/.github/agents/wazzup-writer.agent.md index 693ea7c..0f6c212 100644 --- a/.github/agents/wazzup-writer.agent.md +++ b/.github/agents/wazzup-writer.agent.md @@ -6,15 +6,18 @@ tools: [execute, edit] user-invocable: true argument-hint: "Path to prompt.json and requested output file" --- + You are the Wazzup briefing writer. Your job is to turn ranked source items into concise, source-grounded English news briefing JSON for a single technical reader. ## Boundaries + - Read only the input file named in the user prompt. - Write only the output file named in the user prompt. - Do not fetch web pages or add claims that are not present in the input item data. - Do not include Markdown fences, commentary, analysis notes, or prose outside the requested JSON object. ## Writing Rules + - Preserve the input item order so the newest hourly articles stay at the top, except when merging related items into one synthesized bullet. - Merge closely related input items into one synthesized bullet when they describe the same story, campaign, incident, vendor, product, or affected organization. - Every bullet must include citations containing source item IDs from the input. @@ -27,6 +30,7 @@ You are the Wazzup briefing writer. Your job is to turn ranked source items into - Never mention scoring internals such as source weight, score, recency bonus, or duplicate group IDs. ## Output Contract + Write strict JSON with this shape: ```json diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index b97eaa4..ccaca88 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -68,11 +68,11 @@ mise exec -- task install The CI workflow ([.github/workflows/ci.yml](.github/workflows/ci.yml)) runs, in order: -1. `task install` – installs `PyYAML`. -2. `task ci` – runs `format:check`, `lint`, `test`, `build`. +1. `task install` – installs `PyYAML`. +2. 
`task ci` – runs `format:check`, `lint`, `test`, `build`. 3. `task pipeline:generate:fixtures` – deterministic briefing from `tests/fixtures` with `AI_PROVIDER=fake`. -4. `task validate:data` – schema/shape checks of `public/data`. +4. `task validate:data` – schema/shape checks of `public/data`. Always reproduce that exact order locally before pushing. The individual pieces (validated to work in this devcontainer) are: diff --git a/config/sources.yml b/config/sources.yml index ca6aa79..3dc649a 100644 --- a/config/sources.yml +++ b/config/sources.yml @@ -525,7 +525,8 @@ sources: - standings fetch: timeoutSeconds: 8 - notes: "Official NBA news feed for league-wide headlines. This endpoint can be slow or unreachable from server-side runners, so it uses a shorter timeout." + notes: "Official NBA news feed for league-wide headlines. This endpoint can be slow or unreachable from server-side runners, + so it uses a shorter timeout." - id: espn-nba name: ESPN NBA diff --git a/docs/architecture.md b/docs/architecture.md index 72a5481..db28e4b 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -249,6 +249,7 @@ Example fields: - `latestEveningBriefingUrl` - `generatedAt` - `health` +- `runStatus` (last attempted/successful run timestamps, provider, briefing kind, item/source counts, and status) ## Scheduling model @@ -406,12 +407,12 @@ The future MCP server should depend on the domain contracts in [../src/wazzup](. 
## Key risks and mitigations -| Risk | Mitigation | -| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| Risk | Mitigation | +| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | | GitHub Pages exposes personal interests | Public output is accepted for the current deployment; keep source preferences and prompts minimal and support private/static alternatives later. | -| Repository bloat from generated data | Store rolling state in a GitHub Release asset and deploy Pages artifacts without committing generated YAML/JSON. | -| AI hallucinations | Require citations, validate structured output, keep source links visible. | -| AI provider cost spikes | Cap item count today; add summary caching and token/monthly accounting before relying on strict budget guarantees. | -| Scheduled workflows delayed | Treat schedules as best-effort and compute windows from timestamps. | -| Feed parsing failures | Isolate source failures and publish source health. | -| Copyright issues | Store metadata and summaries only; avoid republishing full content. | +| Repository bloat from generated data | Store rolling state in a GitHub Release asset and deploy Pages artifacts without committing generated YAML/JSON. | +| AI hallucinations | Require citations, validate structured output, keep source links visible. | +| AI provider cost spikes | Cap item count today; add summary caching and token/monthly accounting before relying on strict budget guarantees. | +| Scheduled workflows delayed | Treat schedules as best-effort and compute windows from timestamps. | +| Feed parsing failures | Isolate source failures and publish source health. | +| Copyright issues | Store metadata and summaries only; avoid republishing full content. 
| diff --git a/docs/github-actions.md b/docs/github-actions.md index 2798b8a..311d170 100644 --- a/docs/github-actions.md +++ b/docs/github-actions.md @@ -2,19 +2,19 @@ ## Workflow overview -| Workflow | Trigger | Responsibility | -| ------------------- | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -| CI | Pull request and manual dispatch | Formatting, syntax linting, tests, compile checks, fixture generation, and generated-data validation. | -| Lint | Pull request and manual dispatch | Reusable organization lint workflow from `DevSecNinja/.github`. | -| Auto-fix formatting | Manual dispatch | Reusable organization formatting workflow that commits dprint/yamlfmt fixes back to the branch. | -| News hourly | Hourly schedule with local cadence gate and manual dispatch | Fetch feeds, generate a rolling briefing, validate data, persist release-backed state, and upload a short-lived `public` artifact for debugging. | +| Workflow | Trigger | Responsibility | +| ------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | +| CI | Pull request and manual dispatch | Formatting, syntax linting, tests, compile checks, fixture generation, and generated-data validation. | +| Lint | Pull request and manual dispatch | Reusable organization lint workflow from `DevSecNinja/.github`. | +| Auto-fix formatting | Manual dispatch | Reusable organization formatting workflow that commits dprint/yamlfmt fixes back to the branch. 
| +| News hourly | Hourly schedule with local cadence gate and manual dispatch | Fetch feeds, generate a rolling briefing, validate data, persist release-backed state, and upload a short-lived `public` artifact for debugging. | | Pages | Successful `News hourly` workflow run, push to `main`, and manual dispatch | Deploy PWA and static YAML/JSON data to GitHub Pages through the reusable `DevSecNinja/.github` Pages workflow. | -| Config Sync | Weekly and manual dispatch | Open PRs when shared repo config from `DevSecNinja/.github` drifts. | -| Label Sync | Daily, manual dispatch, and label config changes | Sync repository labels from the org base labels plus repo-specific labels. | -| Labeler | Pull requests, issues, and manual dispatch | Apply area/type labels using shared labeler automation. | -| Live smoke | Not implemented yet | Optional real feed and AI provider canary checks with strict budgets. | -| Archive cleanup | Not implemented yet | Keep release-backed rolling state compact and optionally publish monthly recap archives. | -| Release automation | Not implemented yet | Future release-please workflow driven by Conventional Commits. | +| Config Sync | Weekly and manual dispatch | Open PRs when shared repo config from `DevSecNinja/.github` drifts. | +| Label Sync | Daily, manual dispatch, and label config changes | Sync repository labels from the org base labels plus repo-specific labels. | +| Labeler | Pull requests, issues, and manual dispatch | Apply area/type labels using shared labeler automation. | +| Live smoke | Not implemented yet | Optional real feed and AI provider canary checks with strict budgets. | +| Archive cleanup | Not implemented yet | Keep release-backed rolling state compact and optionally publish monthly recap archives. | +| Release automation | Not implemented yet | Future release-please workflow driven by Conventional Commits. 
| ## Recommended workflow boundaries @@ -44,12 +44,12 @@ Release Please remains deferred until the app has an explicit first release/vers The preferred current path is Copilot CLI because it is GitHub-native and can run directly inside a scheduled workflow. The pipeline should still expose a provider abstraction so the same request can be handled by Copilot CLI, an API provider, Ollama, or a fake test provider. -| Runner | When to use | Workflow implications | -| ------------- | --------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Runner | When to use | Workflow implications | +| ------------- | --------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Copilot CLI | Preferred first production runner. | Install `@github/copilot`, set `COPILOT_GITHUB_TOKEN` from a fine-grained PAT with Copilot Requests permission, run `copilot -p` with `--model`, `--agent`, `--no-ask-user`, and restricted `--allow-tool`. | -| API provider | Fallback or production runner when strict structured output, model selection, or accounting is easier through an API. | Store provider keys in Actions secrets and call through the pipeline adapter. | -| Ollama | Optional local-model experiment or privacy-focused smoke run. | Install/start Ollama, pull/cache a small model, expect slower CPU inference on GitHub-hosted runners. | -| Fake provider | CI and deterministic tests. | No secrets or network calls. 
| +| API provider | Fallback or production runner when strict structured output, model selection, or accounting is easier through an API. | Store provider keys in Actions secrets and call through the pipeline adapter. | +| Ollama | Optional local-model experiment or privacy-focused smoke run. | Install/start Ollama, pull/cache a small model, expect slower CPU inference on GitHub-hosted runners. | +| Fake provider | CI and deterministic tests. | No secrets or network calls. | ## Implemented CI workflow @@ -159,6 +159,19 @@ jobs: The workflow triggers hourly because GitHub cron is UTC-only and does not understand `Europe/Amsterdam` daylight-saving transitions. A first cadence step computes the local hour and continues every hour from 06:00 to 21:59, then only on even local hours from 22:00 to 05:59. Manual dispatch always runs. +### Manual catch-up for delayed or missed runs + +When a scheduled run is delayed or missed, trigger **News hourly** manually through `workflow_dispatch`: + +1. Open **Actions → News hourly → Run workflow**. +2. Keep `forceBriefing=auto` for a normal catch-up, or select `hourly`/`morning`/`evening` to force a specific briefing kind. +3. Keep `aiProvider=copilot-cli` unless you intentionally want the deterministic `fake` provider. +4. Run once, then check `public/data/latest.json`: `runStatus.lastAttemptedRunAt` should be fresh. `runStatus.lastSuccessfulRunAt` advances only on a run whose `runStatus.status` is `ok` (or when no earlier successful run is recorded); a degraded run, for example one with a failed source, keeps the previous value. + +### Lightweight stale-run alert path + +The PWA marks pipeline status as **Stale** when `runStatus.lastAttemptedRunAt` is more than 150 minutes old (`STALE_RUN_THRESHOLD_MINUTES` in `public/app.js`). For an operational alert, add a small scheduled workflow that applies the same check to `public/data/latest.json` and opens an issue or sends a notification when the threshold is exceeded. + Operational learning: the first live News hourly run failed because Copilot CLI was requested but the token secret was empty. The workflow now selects an effective provider before installing Node/Copilot.
If `copilot-cli` is requested without `COPILOT_REQUESTS_PAT` or `COPILOT_GITHUB_TOKEN`, it logs a warning and uses `AI_PROVIDER=fake` so the release state and Pages deployment path can still be validated end to end. After enabling the Copilot PAT, one live run failed because Copilot CLI wrote JSON without the required `sections` array. The provider now treats invalid structured Copilot output as an AI-provider failure and falls back to the deterministic summary shape, recording `provider.type: copilot-cli-fallback` and the validation reason instead of failing the whole state/deploy pipeline. diff --git a/docs/requirements.md b/docs/requirements.md index b27a228..ba36a14 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -61,26 +61,26 @@ Implemented deviations from the original target: | ID | Requirement | Priority | | ------ | --------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -| FR-001 | Maintain a configurable list of feeds, short source tags, source categories, source weights, and user interests. | Must | -| FR-002 | Fetch configured RSS/Atom feeds hourly from GitHub Actions. JSON Feed support remains deferred. | Must | -| FR-003 | Deduplicate articles using canonical URL, feed item GUID, title similarity, and publication timestamp. | Must | -| FR-004 | Store normalized article metadata in stable, versioned YAML with generated JSON browser mirrors, without committing generated data to `main`. | Must | -| FR-005 | Score articles against user interests, recency, source reliability, and duplicate coverage. | Must | -| FR-006 | Generate hourly summaries from the current local day's retained feed items while suppressing items already featured earlier that day. | Should | -| FR-007 | Generate a morning briefing at 07:00 local time covering the previous day plus overnight updates since 20:00. 
| Must | -| FR-008 | Generate an evening briefing at 20:00 local time covering the day since 07:00. | Must | -| FR-009 | Include citations/source links for every summary bullet. | Must | -| FR-010 | Expose latest summaries and article indexes as static YAML plus JSON browser mirrors. | Must | -| FR-011 | Provide a minimal responsive frontend for the latest rolling day view, earlier-today grouping, previous-day summary, and source health. | Must | -| FR-012 | Support installable PWA behavior with offline reading of recently loaded briefings. | Should | -| FR-013 | Support user notification options without requiring a custom always-on server. | Should | -| FR-014 | Provide a Home Assistant-friendly integration surface for briefings. | Could | -| FR-015 | Ingest podcast releases and transcript metadata. | Could | -| FR-016 | Recommend podcast episodes worth listening to based on interests and transcript/description relevance. | Could | -| FR-017 | Define contracts that can later back a REST API, agent tool, or MCP server. | Must | -| FR-018 | Provide observability outputs for workflow runs, source failures, AI provider cost, and item counts. | Should | -| FR-019 | Support interchangeable AI runners without changing ranking, storage, or frontend contracts. | Must | -| FR-020 | Generate a monthly recap from retained daily and evening briefings. | Could | +| FR-001 | Maintain a configurable list of feeds, short source tags, source categories, source weights, and user interests. | Must | +| FR-002 | Fetch configured RSS/Atom feeds hourly from GitHub Actions. JSON Feed support remains deferred. | Must | +| FR-003 | Deduplicate articles using canonical URL, feed item GUID, title similarity, and publication timestamp. | Must | +| FR-004 | Store normalized article metadata in stable, versioned YAML with generated JSON browser mirrors, without committing generated data to `main`. 
| Must | +| FR-005 | Score articles against user interests, recency, source reliability, and duplicate coverage. | Must | +| FR-006 | Generate hourly summaries from the current local day's retained feed items while suppressing items already featured earlier that day. | Should | +| FR-007 | Generate a morning briefing at 07:00 local time covering the previous day plus overnight updates since 20:00. | Must | +| FR-008 | Generate an evening briefing at 20:00 local time covering the day since 07:00. | Must | +| FR-009 | Include citations/source links for every summary bullet. | Must | +| FR-010 | Expose latest summaries and article indexes as static YAML plus JSON browser mirrors. | Must | +| FR-011 | Provide a minimal responsive frontend for the latest rolling day view, earlier-today grouping, previous-day summary, and source health. | Must | +| FR-012 | Support installable PWA behavior with offline reading of recently loaded briefings. | Should | +| FR-013 | Support user notification options without requiring a custom always-on server. | Should | +| FR-014 | Provide a Home Assistant-friendly integration surface for briefings. | Could | +| FR-015 | Ingest podcast releases and transcript metadata. | Could | +| FR-016 | Recommend podcast episodes worth listening to based on interests and transcript/description relevance. | Could | +| FR-017 | Define contracts that can later back a REST API, agent tool, or MCP server. | Must | +| FR-018 | Provide observability outputs for workflow runs, source failures, AI provider cost, and item counts. | Should | +| FR-019 | Support interchangeable AI runners without changing ranking, storage, or frontend contracts. | Must | +| FR-020 | Generate a monthly recap from retained daily and evening briefings. 
| Could | ### Functional requirement implementation notes @@ -95,19 +95,19 @@ Implemented deviations from the original target: ## Non-functional requirements -| ID | Requirement | Target | -| ------- | --------------- | -------------------------------------------------------------------------------------------- | -| NFR-001 | Maintainability | Modular pipeline with typed contracts, adapters, and small functions. | -| NFR-002 | Testability | Deterministic tests using fixture feeds and mocked AI provider responses. | -| NFR-003 | Security | Secrets only in GitHub Actions secrets; no secrets in static output or logs. | -| NFR-004 | Privacy | Explicit choice whether generated briefings and interests can be public. | +| ID | Requirement | Target | +| ------- | --------------- | ------------------------------------------------------------------------------------------------------------------- | +| NFR-001 | Maintainability | Modular pipeline with typed contracts, adapters, and small functions. | +| NFR-002 | Testability | Deterministic tests using fixture feeds and mocked AI provider responses. | +| NFR-003 | Security | Secrets only in GitHub Actions secrets; no secrets in static output or logs. | +| NFR-004 | Privacy | Explicit choice whether generated briefings and interests can be public. | | NFR-005 | Cost control | Configurable max items per run is implemented; token/monthly budget accounting and summary caching remain deferred. | -| NFR-006 | Reliability | Workflow should fail gracefully per feed and continue processing healthy sources. | -| NFR-007 | Portability | Core pipeline should run locally, in GitHub Actions, and later in an API service. | -| NFR-008 | Performance | Frontend initial load should remain lightweight; static data should be chunked by date. | -| NFR-009 | Accessibility | Frontend should meet WCAG 2.1 AA basics: keyboard navigation, contrast, semantic HTML. 
| -| NFR-010 | Auditability | Every summary should record model/provider, prompt version, generation time, and source IDs. | -| NFR-011 | Release hygiene | Commits must follow Conventional Commits so release-please can be introduced later. | +| NFR-006 | Reliability | Workflow should fail gracefully per feed and continue processing healthy sources. | +| NFR-007 | Portability | Core pipeline should run locally, in GitHub Actions, and later in an API service. | +| NFR-008 | Performance | Frontend initial load should remain lightweight; static data should be chunked by date. | +| NFR-009 | Accessibility | Frontend should meet WCAG 2.1 AA basics: keyboard navigation, contrast, semantic HTML. | +| NFR-010 | Auditability | Every summary should record model/provider, prompt version, generation time, and source IDs. | +| NFR-011 | Release hygiene | Commits must follow Conventional Commits so release-please can be introduced later. | ## Current scope @@ -200,7 +200,7 @@ Define versioned contracts for sources, content items, summaries, scores, and de | Decision | Current choice | | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Data visibility | Public GitHub Pages is acceptable for the current deployment. | +| Data visibility | Public GitHub Pages is acceptable for the current deployment. | | AI runner | Copilot CLI is the default; explore Ollama, Foundry, or API providers later. | | Primary delivery | PWA first. | | Frontend stack | Implemented as vanilla HTML/CSS/JavaScript with no frontend build step. TypeScript/Web Components can be introduced later only if they add clear value. 
| diff --git a/public/app.js b/public/app.js index b98e84f..590f46f 100644 --- a/public/app.js +++ b/public/app.js @@ -17,6 +17,9 @@ const SEEN_BRIEFING_ITEMS_STORAGE_KEY = 'wazzup:seenBriefingItems'; const HIDE_SEEN_STORAGE_KEY = 'wazzup:hideSeen'; const SEEN_VISIBILITY_RATIO = 0.85; const SEEN_DWELL_MS = 1500; +const HOURS_TO_MINUTES = 60; +const STALE_RUN_THRESHOLD_MINUTES = 2 * HOURS_TO_MINUTES + 30; +const CATCH_UP_WORKFLOW_NAME = 'News hourly'; let briefingSeenObserver = null; let briefingSeenTimers = new WeakMap(); @@ -500,10 +503,53 @@ function renderHero(briefing) { heroSummaryEl.textContent = normalized?.description || 'No notable updates were found in today’s rolling briefing.'; } -function renderSources(status) { +function pipelineStatusBadge(runStatus, stale) { + if (stale) return { text: 'Stale', bad: true, warn: false }; + switch (runStatus?.status) { + case 'degraded_provider': + case 'degraded_provider_and_sources': + return { text: 'AI degraded', bad: true, warn: false }; + case 'degraded_sources': + return { text: 'Source degraded', bad: false, warn: true }; + default: + return { text: 'Healthy', bad: false, warn: false }; + } +} + +function pipelineStatusClassName(badge) { + if (badge.bad) return 'status status--bad'; + if (badge.warn) return 'status status--warn'; + return 'status'; +} + +function runAgeMinutes(runStatus) { + const lastAttemptedRunAt = runStatus?.lastAttemptedRunAt; + if (!lastAttemptedRunAt) return null; + const ageMs = Date.now() - new Date(lastAttemptedRunAt).getTime(); + if (!Number.isFinite(ageMs)) return null; + if (ageMs < 0) return 0; + return Math.floor(ageMs / 60000); +} + +function runIsStale(runStatus) { + const ageMinutes = runAgeMinutes(runStatus); + return ageMinutes !== null && ageMinutes > STALE_RUN_THRESHOLD_MINUTES; +} + +function renderSources(status, latest) { const sources = (status.sources || []) .slice() .sort((sourceA, sourceB) => sourceA.sourceId.localeCompare(sourceB.sourceId)); + const runStatus = 
latest?.runStatus || {}; + const stale = runIsStale(runStatus); + const badge = pipelineStatusBadge(runStatus, stale); + const generatedAt = runStatus.lastSuccessfulRunAt || latest?.generatedAt; + const provider = runStatus.provider || 'unknown'; + const generatedItemCount = Number(runStatus.generatedItemCount || 0); + const staleHint = stale + ? `

Latest pipeline run looks stale. Run ${escapeHtml(CATCH_UP_WORKFLOW_NAME)} manually from Actions (workflow_dispatch) to catch up.

` + : ''; + const badgeClassName = pipelineStatusClassName(badge); const items = sources .map( (source) => `
  • @@ -516,6 +562,8 @@ function renderSources(status) { sourcesEl.innerHTML = `

    Source health

    ${sources.filter((source) => source.ok).length}/${sources.length} sources healthy

    +

    ${badge.text} ${generatedAt ? `Generated ${escapeHtml(formatDate(generatedAt))}` : 'Generated time unavailable'} · ${escapeHtml(provider)} · ${escapeHtml(generatedItemCount)} items

    + ${staleHint} `; } @@ -744,7 +792,7 @@ async function main() { renderHero(briefing); renderBriefing(todayBriefing, seenState); observeBriefingItems(seenState); - renderSources(status); + renderSources(status, latest); await renderYesterday(manifest, latest, briefing); await renderFooter(buildInfo); if ('serviceWorker' in navigator) { diff --git a/public/styles.css b/public/styles.css index 1bcfe96..c6c62e6 100644 --- a/public/styles.css +++ b/public/styles.css @@ -345,6 +345,21 @@ main { .status { color: var(--success); } .status--bad { color: var(--warning); } +.status--warn { color: #fcd34d; } + +.pipeline-meta { + display: flex; + flex-wrap: wrap; + align-items: center; + gap: 0.5rem; +} + +.pipeline-meta code { + border: 1px solid var(--border); + border-radius: 0.4rem; + padding: 0.05rem 0.35rem; + background: rgba(15, 23, 42, 0.5); +} .source-list, .yesterday-summary { diff --git a/src/wazzup/publisher.py b/src/wazzup/publisher.py index 7c1e985..5259474 100644 --- a/src/wazzup/publisher.py +++ b/src/wazzup/publisher.py @@ -151,6 +151,8 @@ def publish_outputs( "sources": [status.to_dict() for status in statuses], }, ) + failed_source_count = len([status for status in statuses if not status.ok]) + run_status = build_run_status(kind, generated_at, scored_items, summary, statuses, failed_source_count, previous_latest) latest = { "schemaVersion": 1, "canonicalFormat": "yaml", @@ -177,8 +179,9 @@ def publish_outputs( "health": { "ok": all(status.ok for status in statuses), "sourceCount": len(statuses), - "failedSourceCount": len([status for status in statuses if not status.ok]), + "failedSourceCount": failed_source_count, }, + "runStatus": run_status, } write_data(data_dir / "latest.yaml", latest) enforce_retention(data_dir, generated_at, app_config.retention_days) @@ -186,6 +189,54 @@ def publish_outputs( return latest +def build_run_status( + kind: BriefingKind, + generated_at: datetime, + scored_items: list[ScoredItem], + summary: SummaryResponse, + 
statuses: list[SourceStatus], + failed_source_count: int, + previous_latest: dict[str, Any], +) -> dict[str, Any]: + provider_type = summary.provider.get("type") + provider_type_text = provider_type.strip() if isinstance(provider_type, str) else "" + provider = provider_type_text if provider_type_text else "unknown" + provider_fallback_reason = summary.provider.get("fallbackReason") + provider_status = "degraded" if provider.endswith("-fallback") else "ok" + source_status = "degraded" if failed_source_count > 0 else "ok" + if provider_status == "degraded" and source_status == "degraded": + status = "degraded_provider_and_sources" + elif provider_status == "degraded": + status = "degraded_provider" + elif source_status == "degraded": + status = "degraded_sources" + else: + status = "ok" + previous_run_status = previous_latest.get("runStatus") + previous_last_successful = ( + previous_run_status.get("lastSuccessfulRunAt") + if isinstance(previous_run_status, dict) and isinstance(previous_run_status.get("lastSuccessfulRunAt"), str) + else None + ) + last_successful_run_at = isoformat(generated_at) if status == "ok" else (previous_last_successful or isoformat(generated_at)) + run_status = { + "schemaVersion": 1, + "status": status, + "sourceStatus": source_status, + "providerStatus": provider_status, + "lastAttemptedRunAt": isoformat(generated_at), + "lastSuccessfulRunAt": last_successful_run_at, + "provider": provider, + "briefingKind": str(kind), + "sourceCount": len(statuses), + "failedSourceCount": failed_source_count, + "generatedItemCount": len(scored_items), + } + if provider_status == "degraded" and isinstance(provider_fallback_reason, str) and provider_fallback_reason.strip(): + run_status["providerFallbackReason"] = provider_fallback_reason + return run_status + + def relative_data_url(data_dir: Path, path: Path) -> str: return path.relative_to(data_dir).as_posix() diff --git a/src/wazzup/validate_data.py b/src/wazzup/validate_data.py index 3f6acb7..fde5dcf 
100644 --- a/src/wazzup/validate_data.py +++ b/src/wazzup/validate_data.py @@ -115,6 +115,7 @@ def validate_data_dir(data_dir: Path) -> None: "latestBriefingUrl", "latestArticlesUrl", "health", + "runStatus", ], "latest.json", ) @@ -129,6 +130,26 @@ def validate_data_dir(data_dir: Path) -> None: load_json(data_dir / "sources" / "status.json") load_yaml(data_dir / "sources" / "status.yaml") load_yaml(data_dir / "manifest.yaml") + run_status = latest.get("runStatus") + if not isinstance(run_status, dict): + raise ValidationError("latest.json runStatus must be an object") + require_keys( + run_status, + [ + "schemaVersion", + "status", + "sourceStatus", + "providerStatus", + "lastAttemptedRunAt", + "lastSuccessfulRunAt", + "provider", + "briefingKind", + "sourceCount", + "failedSourceCount", + "generatedItemCount", + ], + "latest.json runStatus", + ) def parse_args(argv: Sequence[str] | None = None) -> argparse.Namespace: diff --git a/tests/test_pipeline.py b/tests/test_pipeline.py index 89f6c05..c9c65ac 100644 --- a/tests/test_pipeline.py +++ b/tests/test_pipeline.py @@ -174,6 +174,19 @@ def test_generate_static_data_from_fixtures(self) -> None: self.assertIn("description", briefing_json["sections"][0]["bullets"][0]) status_json = json.loads((public_dir / "data" / "sources" / "status.json").read_text(encoding="utf-8")) self.assertTrue(all("lastArticleAt" in source for source in status_json["sources"])) + run_status = latest.get("runStatus") + self.assertIsInstance(run_status, dict) + self.assertEqual(1, run_status["schemaVersion"]) + self.assertEqual("degraded_sources", run_status["status"]) + self.assertEqual("degraded", run_status["sourceStatus"]) + self.assertEqual("ok", run_status["providerStatus"]) + self.assertEqual("fake", run_status["provider"]) + self.assertEqual("hourly", run_status["briefingKind"]) + self.assertEqual(len(status_json["sources"]), run_status["sourceCount"]) + self.assertEqual(latest["health"]["failedSourceCount"], 
run_status["failedSourceCount"]) + self.assertEqual(len(articles_json["items"]), run_status["generatedItemCount"]) + self.assertEqual(latest["generatedAt"], run_status["lastAttemptedRunAt"]) + self.assertEqual(latest["generatedAt"], run_status["lastSuccessfulRunAt"]) finally: if previous_provider is None: os.environ.pop("AI_PROVIDER", None) diff --git a/tests/test_publisher.py b/tests/test_publisher.py index ea165fb..179f72e 100644 --- a/tests/test_publisher.py +++ b/tests/test_publisher.py @@ -7,8 +7,8 @@ from pathlib import Path from wazzup.ai import SummaryResponse -from wazzup.models import AppConfig, ContentItem, ScoredItem -from wazzup.publisher import build_briefing, enforce_retention, write_data, write_manifest +from wazzup.models import AppConfig, ContentItem, ScoredItem, SourceStatus +from wazzup.publisher import build_briefing, build_run_status, enforce_retention, write_data, write_manifest class PublisherTests(unittest.TestCase): @@ -92,6 +92,33 @@ def test_build_briefing_includes_related_source_citations(self) -> None: self.assertEqual(["item-primary", "item-related"], briefing["sourceItemIds"]) self.assertEqual(["primary-source", "related-source"], [citation["sourceId"] for citation in briefing["citations"]]) + def test_build_run_status_keeps_last_successful_on_degraded_run(self) -> None: + first_generated_at = datetime(2026, 5, 6, 9, tzinfo=UTC) + first_status = build_run_status( + "hourly", + first_generated_at, + [], + SummaryResponse(headline="ok", sections=[{"title": "Top", "bullets": []}], provider={"type": "fake"}), + [SourceStatus("one", True, "2026-05-06T09:00:00Z", 1, "ok")], + 0, + {}, + ) + second_generated_at = datetime(2026, 5, 6, 10, tzinfo=UTC) + second_status = build_run_status( + "hourly", + second_generated_at, + [], + SummaryResponse(headline="degraded", sections=[{"title": "Top", "bullets": []}], provider={"type": "fake"}), + [SourceStatus("one", False, "2026-05-06T10:00:00Z", 0, "timeout")], + 1, + {"runStatus": first_status}, + 
) + + self.assertEqual("ok", first_status["status"]) + self.assertEqual("degraded_sources", second_status["status"]) + self.assertEqual("2026-05-06T10:00:00Z", second_status["lastAttemptedRunAt"]) + self.assertEqual("2026-05-06T09:00:00Z", second_status["lastSuccessfulRunAt"]) + if __name__ == "__main__": unittest.main()
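The cadence gate described in `docs/github-actions.md` (every local hour from 06:00 to 21:59, only even local hours from 22:00 to 05:59, manual dispatch always) can be sketched in Python with `zoneinfo`. This is a hypothetical illustration, not the workflow's actual step: the function name `should_run`, the `manual_dispatch` flag, and treating a naive datetime as UTC are assumptions made here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LOCAL_TZ = ZoneInfo("Europe/Amsterdam")


def should_run(now_utc: datetime, manual_dispatch: bool = False) -> bool:
    """Decide whether an hourly trigger at now_utc should run the pipeline (hypothetical helper)."""
    if manual_dispatch:
        return True  # Manual dispatch always runs.
    if now_utc.tzinfo is None:
        # Assumption: naive datetimes are UTC, matching GitHub's UTC-only cron.
        now_utc = now_utc.replace(tzinfo=timezone.utc)
    # Converting through zoneinfo applies CET/CEST daylight-saving rules automatically.
    local_hour = now_utc.astimezone(LOCAL_TZ).hour
    if 6 <= local_hour <= 21:
        return True  # Daytime: every hour from 06:00 to 21:59 local time.
    return local_hour % 2 == 0  # Overnight (22:00-05:59): even local hours only.
```

One consequence worth checking: on the spring daylight-saving night this rule leaves a three-hour gap between runs (00:00 CET, which is 23:00 UTC, to 04:00 CEST, which is 02:00 UTC). That is longer than the PWA's 150-minute stale threshold, so the **Stale** badge can appear briefly even when every scheduled run fires on time.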
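A minimal sketch of the stale check that a scheduled alert workflow could run, assuming the job has a local copy of `public/data/latest.json`. It mirrors `runAgeMinutes()` and `runIsStale()` from `public/app.js`: whole minutes since `runStatus.lastAttemptedRunAt`, future timestamps clamped to zero, missing or malformed timestamps treated as not stale, and stale only when the age is strictly greater than 150 minutes. The helper names are hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

# Mirrors STALE_RUN_THRESHOLD_MINUTES in public/app.js (2 * 60 + 30 = 150 minutes).
STALE_RUN_THRESHOLD_MINUTES = 2 * 60 + 30


def parse_timestamp(value: str) -> datetime | None:
    """Parse an ISO 8601 timestamp, returning None when it is malformed."""
    try:
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: a timestamp without an offset is UTC.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def run_age_minutes(latest: dict[str, Any], now: datetime) -> int | None:
    """Whole minutes since runStatus.lastAttemptedRunAt, or None if unavailable."""
    run_status = latest.get("runStatus")
    if not isinstance(run_status, dict):
        return None
    value = run_status.get("lastAttemptedRunAt")
    attempted = parse_timestamp(value) if isinstance(value, str) else None
    if attempted is None:
        return None
    # A timestamp in the future (clock skew) counts as zero minutes old.
    return max(0, int((now - attempted).total_seconds() // 60))


def is_stale(latest: dict[str, Any], now: datetime) -> bool:
    age = run_age_minutes(latest, now)
    return age is not None and age > STALE_RUN_THRESHOLD_MINUTES


def check_latest_file(path: Path) -> bool:
    """Load a latest.json file and report whether its last attempted run is stale."""
    latest = json.loads(path.read_text(encoding="utf-8"))
    return is_stale(latest, datetime.now(timezone.utc))
```

A workflow step could call `check_latest_file()` and fail, or open an issue, when it returns `True`. Because a missing `runStatus` is treated as not stale (matching the PWA), an alert job may also want to flag that case separately.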
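`validate_data.py` currently checks only that the `runStatus` keys exist. A stricter check could also confirm value types and that `status` agrees with `sourceStatus` and `providerStatus`, using the same precedence as `build_run_status()` in `publisher.py`. The sketch below is hypothetical: `run_status_errors()` and `expected_status()` are illustrative names, and it returns a list of problems instead of raising the module's `ValidationError`.

```python
from __future__ import annotations

from typing import Any

# Keys copied from the require_keys(...) call for runStatus in validate_data.py.
REQUIRED_RUN_STATUS_KEYS = (
    "schemaVersion", "status", "sourceStatus", "providerStatus",
    "lastAttemptedRunAt", "lastSuccessfulRunAt", "provider", "briefingKind",
    "sourceCount", "failedSourceCount", "generatedItemCount",
)
COMPONENT_STATUSES = {"ok", "degraded"}
COUNT_KEYS = ("sourceCount", "failedSourceCount", "generatedItemCount")


def expected_status(source_status: str, provider_status: str) -> str:
    """Combine component statuses with the same precedence as build_run_status."""
    if provider_status == "degraded" and source_status == "degraded":
        return "degraded_provider_and_sources"
    if provider_status == "degraded":
        return "degraded_provider"
    if source_status == "degraded":
        return "degraded_sources"
    return "ok"


def run_status_errors(run_status: Any) -> list[str]:
    """Return human-readable problems; an empty list means the object looks valid."""
    if not isinstance(run_status, dict):
        return ["runStatus must be an object"]
    missing = [key for key in REQUIRED_RUN_STATUS_KEYS if key not in run_status]
    if missing:
        return [f"runStatus is missing keys: {', '.join(missing)}"]
    errors: list[str] = []
    for key in ("sourceStatus", "providerStatus"):
        value = run_status[key]
        if not isinstance(value, str) or value not in COMPONENT_STATUSES:
            errors.append(f"runStatus.{key} must be 'ok' or 'degraded'")
    for key in COUNT_KEYS:
        value = run_status[key]
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            errors.append(f"runStatus.{key} must be a non-negative integer")
    if not errors:
        # Cross-field checks only make sense once the individual fields are valid.
        if run_status["failedSourceCount"] > run_status["sourceCount"]:
            errors.append("runStatus.failedSourceCount exceeds sourceCount")
        expected = expected_status(run_status["sourceStatus"], run_status["providerStatus"])
        if run_status["status"] != expected:
            errors.append(f"runStatus.status should be {expected!r}")
    return errors
```

The sample values in a check like this can come straight from the degraded run in `test_build_run_status_keeps_last_successful_on_degraded_run`: one failed source out of one, `fake` provider, status `degraded_sources`.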