Local LLM news research pipeline.
Tsuzuri is an MVP Python implementation of a local-LLM-powered news research pipeline. It searches, fetches, summarizes, renders a cited Markdown report, and optionally uploads artifacts to Nextcloud WebDAV.
The current implementation contains a minimal runnable local pipeline. It can search through SearXNG, filter URLs, fetch HTML documents, summarize through Ollama, save local artifacts, and optionally upload artifacts to WebDAV.
This project is ready for local, personal, and demo use through the CLI, HTTP API, Docker Compose, and React web UI. It is not yet hardened for public multi-user production deployment.
Production gaps to address before public exposure:
- No authentication or authorization on the API/UI.
- Run state is held only in memory and is lost when the API process exits.
- Run history is not reconstructed from existing outputs/artifacts yet.
- Cluster/global reduce summarization is not implemented.
- Discord notification is not implemented.
Implemented:
- Pydantic schemas for pipeline data.
- URL normalization, deduplication, domain filtering, and document-type routing.
- Rule-based query expansion.
- Async SearXNG JSON API client.
- HTML fetch validation and extraction with httpx and trafilatura.
- PDF fetching with PyMuPDF.
- Local artifact storage under outputs/{run_id}/.
- Minimal orchestrator command for search, filtering, HTML fetch, summarization, report rendering, and artifact saving.
- Optional WebDAV artifact upload with warn-and-continue failure behavior.
- Citation extraction, validation, and final Markdown source rendering.
- FastAPI HTTP API for external applications.
- React dark-themed web UI with progress polling and final report preview.
- Docker and Docker Compose support.
- Unit tests for the implemented modules.
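The URL normalization, deduplication, and domain filtering steps listed above can be illustrated with a minimal sketch. This is not Tsuzuri's actual code: the function names, the `TRACKING_PARAMS` set, and the exact normalization rules (lowercased host, dropped fragment, sorted query) are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to drop; not taken from Tsuzuri.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and tracking params."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Treat "/a/" and "/a" as the same document; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), "")
    )


def dedupe_and_filter(urls, blocked_domains=frozenset()):
    """Keep the first occurrence of each normalized URL, skipping blocked hosts."""
    seen = set()
    kept = []
    for url in urls:
        normalized = normalize_url(url)
        host = urlsplit(normalized).hostname or ""
        # A blocked domain also blocks its subdomains.
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue
        if normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept
```

Normalizing before comparing is what lets two search results that differ only in case, trailing slash, or tracking parameters collapse into one fetch.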
Not implemented yet:
- Authentication / access control.
- Persistent run history across API restarts.
- Discord notification.
- Cluster/global reduce summarization.
Requirements:
- Python 3.11 or newer.
- uv.
- Optional: Nix with direnv support for the repository flake workflow.
Synchronize dependencies:
just sync
Or directly:
uv sync
Copy .env.example to .env when enabling external services later:
cp .env.example .env
Do not commit real secrets.
Run the default local pipeline:
just deploy
Run with a custom query:
just deploy "AI regulation latest developments"
Or call the CLI directly:
PYTHONPATH=src uv run python -m tsuzuri.cli run "AI regulation latest developments"
Artifacts are saved under outputs/{run_id}/. WebDAV upload is attempted when
webdav_base_url, NEXTCLOUD_USERNAME, and NEXTCLOUD_PASSWORD are available.
Upload failures are reported as warnings and do not fail the run.
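The warn-and-continue behavior described here can be sketched as follows. The function names and the `upload_one` callable are hypothetical stand-ins for whatever WebDAV client the project uses. Only the three required setting names come from the text above.

```python
from typing import Callable, Iterable, Mapping

# Setting names as given above; how Tsuzuri actually stores them is not shown here.
REQUIRED_KEYS = ("webdav_base_url", "NEXTCLOUD_USERNAME", "NEXTCLOUD_PASSWORD")


def webdav_configured(config: Mapping[str, str]) -> bool:
    """True only when every required value is present and non-empty."""
    return all(config.get(key) for key in REQUIRED_KEYS)


def upload_all(
    paths: Iterable[str],
    upload_one: Callable[[str], None],
    config: Mapping[str, str],
) -> list[str]:
    """Try each upload and return warnings instead of raising."""
    if not webdav_configured(config):
        return []  # upload is optional, so it is skipped silently in this sketch
    warnings = []
    for path in paths:
        try:
            upload_one(path)
        except Exception as exc:  # broad on purpose: no upload error may fail the run
            warnings.append(f"upload failed for {path}: {exc}")
    return warnings
```

Returning the warnings as data, rather than logging and discarding them, lets the caller attach them to the run result.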
Start the API server:
just api
Health check:
curl http://127.0.0.1:8000/healthz
Run the pipeline from an external app:
curl -X POST http://127.0.0.1:8000/runs \
  -H 'Content-Type: application/json' \
  -d '{"query":"AI regulation latest developments"}'
The response includes run counts, warnings, and the local path to
final_report.md.
The API starts runs in the background. Poll GET /runs/{run_id} for progress,
then fetch GET /runs/{run_id}/final-report after completion.
The React web UI is served from the API after building frontend assets:
just frontend-build
just api
Open http://127.0.0.1:8000/ui/ for the dark-themed run dashboard with progress
polling and final report preview.
For frontend-only development:
just frontend-dev
Build the local image:
just docker-build
Run the API with Docker Compose:
just docker-up
Stop it:
just docker-down
The compose service exposes the API on http://127.0.0.1:8000, mounts
./outputs for artifacts, and reads .env for runtime configuration and
secrets when present.
The Docker image builds the React frontend and serves it from /ui/ in the same
FastAPI container.
You can run Tsuzuri from outside this repository with only a Compose file,
.env, and an outputs/ directory.
Copy example.compose.yml to your deployment directory as compose.yml:
mkdir -p tsuzuri-deploy/outputs
cd tsuzuri-deploy
curl -fsSLo compose.yml \
  https://raw.githubusercontent.com/uPiscium/Tsuzuri/v0.1.1/example.compose.yml
Create .env:
TSUZURI_SEARXNG_BASE_URL=https://your-searxng.example.com
TSUZURI_OLLAMA_BASE_URL=https://your-ollama.example.com
TSUZURI_OLLAMA_MODEL=gemma4:26b
TSUZURI_WEBDAV_BASE_URL=https://your-nextcloud.example.com/remote.php/dav/files/your-user/NAS/Tsuzuri
TSUZURI_QUERY_TIMEOUT_S=10.0
TSUZURI_FETCH_TIMEOUT_S=30.0
TSUZURI_OLLAMA_TIMEOUT_S=60.0
TSUZURI_UPLOAD_TIMEOUT_S=15.0
TSUZURI_MAX_CONCURRENT_FETCHES=3
TSUZURI_MIN_SUCCESS_CHARS=200
TSUZURI_MAX_MAP_DOCUMENTS=5
TSUZURI_SEARCH_LANGUAGE=en
TSUZURI_SEARCH_CATEGORIES=news,general
TSUZURI_ALLOWED_LANGUAGES=en,ja
TSUZURI_USER_AGENT=Tsuzuri/0.1
NEXTCLOUD_USERNAME=your-nextcloud-user
NEXTCLOUD_PASSWORD=your-nextcloud-app-password
DISCORD_WEBHOOK_URL=
Start the service:
docker compose up --build
Open the web UI:
http://127.0.0.1:8000/ui/
Run all checks:
just check-all
Focused commands:
just lint src/
just typecheck src/
just test tests/
Format source and tests:
just format src/
uv run ruff format tests/
uv run ruff check --fix tests/
Non-secret defaults live in settings.toml for local runs. Docker deployments can
use TSUZURI_* environment variables instead of mounting settings.toml.
Secret values are expected through environment variables, with placeholders in
.env.example:
- TSUZURI_SEARXNG_BASE_URL
- TSUZURI_OLLAMA_BASE_URL
- TSUZURI_OLLAMA_MODEL
- TSUZURI_WEBDAV_BASE_URL
- NEXTCLOUD_USERNAME
- NEXTCLOUD_PASSWORD
- DISCORD_WEBHOOK_URL
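As an illustration of environment-based configuration, the sketch below reads a few `TSUZURI_*` variables into a typed settings object. The `Settings` class, the `load_settings` function, and the fallback values are hypothetical. The project's real loader and its defaults in settings.toml may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    searxng_base_url: str
    ollama_base_url: str
    ollama_model: str
    fetch_timeout_s: float
    max_concurrent_fetches: int
    allowed_languages: tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from TSUZURI_* variables, using fallbacks when unset."""

    def get(name: str, fallback: str) -> str:
        # Treat empty strings like unset variables.
        return env.get(f"TSUZURI_{name}", "").strip() or fallback

    languages = tuple(
        item.strip()
        for item in get("ALLOWED_LANGUAGES", "en,ja").split(",")
        if item.strip()
    )
    # Fallbacks below are placeholders for this example, not project defaults.
    return Settings(
        searxng_base_url=get("SEARXNG_BASE_URL", "http://127.0.0.1:8080"),
        ollama_base_url=get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        ollama_model=get("OLLAMA_MODEL", "gemma4:26b"),
        fetch_timeout_s=float(get("FETCH_TIMEOUT_S", "30.0")),
        max_concurrent_fetches=int(get("MAX_CONCURRENT_FETCHES", "3")),
        allowed_languages=languages,
    )
```

Parsing numbers and lists once at startup means a malformed value such as a non-numeric timeout fails immediately rather than in the middle of a run.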
Next implementation slices:
- Add deterministic document quality filtering.
- Add cluster/global reduce summarization.
- Add persistent run history from outputs/{run_id}/summary.json.
- Add API/UI authentication before public deployment.
- Add Discord notification.
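One possible shape for the planned run-history slice is sketched below. It is an illustration under stated assumptions rather than the project's design: the `load_run_history` name is hypothetical, and the contents of `summary.json` are treated as opaque JSON because its schema is not described here.

```python
import json
from pathlib import Path


def load_run_history(outputs_dir: str) -> dict[str, dict]:
    """Map each run_id to its parsed summary.json, skipping unreadable runs."""
    history = {}
    for summary_path in sorted(Path(outputs_dir).glob("*/summary.json")):
        run_id = summary_path.parent.name  # directory name doubles as the run id
        try:
            history[run_id] = json.loads(summary_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # A partially written or corrupt summary should not block startup.
            continue
    return history
```

Calling something like this once at API startup would repopulate the in-memory run state from disk, so completed runs survive a restart.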