Improve topic-focused retrieval coverage for writing (increase recall, reduce missed related articles)

## Problem
When drafting an article for a specific topic/query, the system does not consistently surface all relevant analyzed articles. This reduces factual coverage and makes the output feel under-grounded in the Telegram-derived corpus.

Today we have:
- Topic discovery via embedding clustering
- Query-driven writing via nearest-neighbor retrieval over article embeddings

However, users report that many related articles are not being included for a given topic.

## Goals
- Increase recall for topic/query-driven writing so the retrieved source set is more complete and representative.
- Preserve precision (avoid flooding with unrelated sources).
- Keep the system transparent: users should be able to see why an article was selected.

## Proposed approach
- Add a two-stage retrieval strategy:
  1) Broad candidate fetch (larger K, e.g. 50-200) using pgvector cosine distance
  2) Re-rank + diversify (MMR-style) and downselect to N sources (e.g. 6-12)
- Incorporate additional signals:
  - Topic linkage graph (topic_articles)
  - Recency / popularity (views/forwards where available)
  - Source/domain quality heuristics (optional)
- Add debug output / telemetry:
  - Print top candidates with distance/similarity
  - Persist retrieval diagnostics in DB for later audit
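
The two-stage strategy above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `articles` table, its `id` and `embedding` columns, the `Candidate` type, `mmr_select`, and the `lambda_` trade-off parameter are all assumptions made for the example. The only library fact relied on is that `<=>` is pgvector's cosine-distance operator. The re-ranker is plain-Python MMR so it can run without a database.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Stage 1 (assumed schema): broad candidate fetch ordered by cosine distance.
# `<=>` is pgvector's cosine-distance operator; `articles`, `id`, and
# `embedding` are hypothetical names, not taken from the real schema.
CANDIDATE_SQL = """
SELECT id, embedding, embedding <=> %(query_vec)s::vector AS distance
FROM articles
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""


@dataclass(frozen=True)
class Candidate:
    """One row from the broad fetch (hypothetical shape)."""
    article_id: int
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Candidate],
    n: int = 8,
    lambda_: float = 0.7,
) -> List[Tuple[Candidate, float]]:
    """Stage 2: greedy MMR re-rank, returning (candidate, query_similarity).

    Each step picks the candidate maximizing
    lambda_ * sim(query, c) - (1 - lambda_) * max sim(c, already_selected).
    """
    relevance = {
        c.article_id: cosine_similarity(query, c.embedding) for c in candidates
    }
    # Sort by id so ties resolve the same way on every run.
    remaining = sorted(candidates, key=lambda c: c.article_id)
    selected: List[Tuple[Candidate, float]] = []

    while remaining and len(selected) < n:
        def mmr_score(c: Candidate) -> float:
            redundancy = max(
                (cosine_similarity(c.embedding, s.embedding) for s, _ in selected),
                default=0.0,
            )
            return lambda_ * relevance[c.article_id] - (1.0 - lambda_) * redundancy

        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        selected.append((best, relevance[best.article_id]))
    return selected


if __name__ == "__main__":
    # Toy data: articles 1 and 2 are near-duplicates, 3 covers another angle.
    query_vec = (1.0, 0.0)
    pool = [
        Candidate(1, (0.98, 0.17)),
        Candidate(2, (0.97, 0.21)),
        Candidate(3, (0.87, -0.50)),
    ]
    for cand, sim in mmr_select(query_vec, pool, n=2, lambda_=0.5):
        # Debug output of the kind proposed under "Add debug output / telemetry".
        print(f"article_id={cand.article_id} similarity={sim:.4f}")
```

With `lambda_=1.0` the selection reduces to plain nearest-neighbor ranking, which may be a convenient baseline when comparing coverage before and after the change.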

## Acceptance criteria
- Given a stable analyzed DB snapshot, writing for a query returns a source set with higher topical coverage than the current single-stage nearest-neighbor retrieval for the same query (fewer related articles missed).
- Retrieval prints, or can be configured to print, the selected sources and their similarity scores.
- Unit/integration test added for retrieval selection logic (deterministic with fixed embeddings).
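
A deterministic test with fixed embeddings might look something like the sketch below. It is written pytest-style but needs no plugins. `select_sources` is a hypothetical stand-in for whatever selection function is eventually implemented, and the article ids and vectors are invented fixtures. The point illustrated is that ties are broken on article id, so the result does not depend on row order.

```python
import math
from typing import Dict, List, Sequence

# Invented fixture: tiny hand-written embeddings keyed by article id.
FIXED_EMBEDDINGS: Dict[int, Sequence[float]] = {
    101: (1.0, 0.0),
    102: (0.0, 1.0),
    103: (1.0, 0.0),   # exact duplicate of 101, to exercise tie-breaking
    104: (-1.0, 0.0),
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sources(
    query: Sequence[float], embeddings: Dict[int, Sequence[float]], n: int
) -> List[int]:
    """Hypothetical stand-in selector: top-n by similarity, ties by article id."""
    scored = [
        (cosine_similarity(query, vec), article_id)
        for article_id, vec in embeddings.items()
    ]
    # Highest similarity first; equal scores fall back to ascending id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:n]]


def test_selection_is_deterministic() -> None:
    query = (1.0, 0.0)
    first = select_sources(query, FIXED_EMBEDDINGS, n=3)
    # Same data in reversed insertion order must give the same answer.
    reversed_rows = dict(reversed(list(FIXED_EMBEDDINGS.items())))
    second = select_sources(query, reversed_rows, n=3)
    assert first == second == [101, 103, 102]


def test_downselect_respects_n() -> None:
    assert len(select_sources((1.0, 0.0), FIXED_EMBEDDINGS, n=2)) == 2
```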

## Notes
- Keep TLS verification enabled; no changes to security posture.
- Do not call external web search; retrieval should rely only on our analyzed corpus.

## Tasks
- [ ] Implement broad candidate retrieval + reranking
- [ ] Add MMR/diversification (or similar)
- [ ] Add diagnostics (console + optional persisted metadata)
- [ ] Add tests

