feat(wave4): reliability — threadpools, timeouts, WAL, batching, OCR guard by vikranthreddimasu · Pull Request #15 · vikranthreddimasu/notebook-lm

vikranthreddimasu · 2026-04-18T01:54:44Z

Context

Wave 4 of 5 from PLAN.md. This one targets the structural weaknesses the tech audit flagged as "threats to actually works": unbounded timeouts, event-loop-blocking inference, OOM-able embedding batches, reads blocking on writes in SQLite, silent no-op ingestions for scanned PDFs, conversation splits when a meta event drops, startup deadlocks on slow Ollama.

Event loop stays responsive

`aembed` on the EmbeddingBackend protocol. `SentenceTransformersBackend` offloads `model.encode()` to the default threadpool (a 200–500ms encode was stalling concurrent health checks and sidebar refreshes). `HashEmbeddingBackend` runs inline.
Async query variants. `VectorStoreManager.aquery` / `aquery_across_notebooks` / `aquery_document_summaries` use `aembed`. RAGService `prepare_prompt` / `prepare_prompt_cross_notebook` and `LlamaIndexRAGService.query` now actually async all the way down.
LlamaCpp backend. `generate` and `stream_generate` use `loop.run_in_executor` for the blocking `llama.create_completion` calls. Streams pull chunks on a worker thread and yield on the event loop. Previously every second of inference froze all other requests.
Ollama stream timeout. `httpx.AsyncClient(timeout=httpx.Timeout(connect=5, read=300, write=10, pool=5))` instead of `timeout=None`. A hung Ollama instance no longer pins the connection forever.

OOM and re-upload guards

Batched embedding. `add_chunks` embeds in batches of 64 instead of one mega-call. A 500-page PDF no longer OOMs sentence-transformers.
Upsert, not add. `collection.upsert` makes re-uploading a file succeed cleanly. The old path failed mid-batch with `DuplicateIDError` and left Chroma half-updated.
OCR guard for scanned PDFs. `_load_pdf` now raises a clear `DocumentLoaderError` ("run it through OCR first") if extracted text is below a minimum threshold. Before, a scanned PDF "succeeded" with 0 chunks and queries returned "no relevant documents found" forever.

SQLite stays unblocked

`PRAGMA journal_mode = WAL` + `busy_timeout = 5000` in `notebook_store`, `conversation_store`, `metrics_store`. Readers don't block on writers. The sidebar refresh during stream completion used to stall under default DELETE journal.

Data integrity on stream

`conversation_id` echoed on `done` events too. If the `meta` event drops (parse error, transient glitch), the frontend still learns the id via done and doesn't fork the thread. `useChat` guards with `!activeConversationId` so it only sets when missing.

Startup correctness

Lazy Ollama resolution. Model resolution moved to an `asyncio` lifespan; a slow `/api/tags` round-trip at boot no longer delays FastAPI startup or trips Electron's `waitForBackend` timeout. `app.py` imports consolidated at the top (were E402-violating before).
Split token config defaults. `llm_context_window` 4096 (was 2048), `llm_max_tokens` 1024 (was 2048). One knob was doing two jobs — it capped replies AND silently truncated long RAG prompts.
Chroma telemetry off. `Settings(anonymized_telemetry=False)`. Offline-first is the whole promise; also stops the recurring "Failed to send telemetry event" log noise.

Frontend hardening

API timeouts. `request()` composes any caller-provided `AbortSignal` with a default 30s timeout. Uploads get their own 5-minute budget via `uploadDocument`. Previously a stalled endpoint hung the UI indefinitely.
User-facing error. Timeout errors flow through the Wave-3 `humanizeError` translator so the user sees "The request took too long…" instead of an `AbortError` stack dive.

Verification

`cd apps/desktop && npm run build` — clean (272 modules, 864ms)
`cd backend && uv run ruff check .` — clean
`cd backend && uv run pytest -q` — 33 passed

Intentionally not in this PR

4.1 Backend-crash recovery IPC (bigger Electron refactor)
4.3 Exponential backoff for 503/ECONNREFUSED (fiddly retry policy)
4.9 Stream-to-disk upload + size limit (multipart streaming rewrite)
4.13 Per-conversation asyncio lock (needs a store-level primitive)
4.15 `list_documents` via summaries collection (risks regressing docs that failed summary)
4.16 Defer summary LLM to a background task queue

These are planned for a follow-up or Wave 4b depending on priority.

🤖 Generated with Claude Code

…guard Wave 4 of 5 from .gstack/qa-reports/PLAN.md. The structural weaknesses the tech audit called out as "threats to actually works" — unbounded timeouts, event-loop-blocking inference, OOM-able embedding batches, reads blocking on writes in SQLite, silent no-op ingestions for scanned PDFs, conversation splits when a meta event drops, startup deadlocks on slow Ollama. ## Event-loop stays responsive now - Embedding backend grows an aembed() that offloads model.encode() to the default threadpool (SentenceTransformers) or runs inline (HashEmbedding is trivial). VectorStoreManager grows aquery / aquery_across_notebooks / aquery_document_summaries that use it. RAGService.prepare_prompt and prepare_prompt_cross_notebook are now truly async; a 200–500ms encode no longer stalls health checks and concurrent sidebar refreshes. - LlamaCppBackend.generate / stream_generate run llama_cpp's blocking calls via loop.run_in_executor. Streams pull chunks on a worker thread and yield on the event loop. Without this, every second of inference froze every other request. - OllamaBackend.stream_generate now uses a finite httpx.Timeout (connect 5s, read 300s, write 10s, pool 5s). Previous timeout=None held the connection open indefinitely when Ollama hung. ## OOM and re-upload guards - VectorStoreManager.add_chunks embeds in batches of 64 instead of handing the whole document to model.encode in one call. A 500-page PDF no longer OOMs sentence-transformers. - add_chunks also uses collection.upsert instead of add, so re-uploading a file succeeds cleanly instead of failing mid-batch with DuplicateIDError and leaving the collection in an inconsistent state. - Scanned PDFs with no embedded text now raise a clear DocumentLoaderError ("run it through OCR first") instead of silently ingesting 0 chunks and serving "no relevant documents found" on every subsequent query. ## SQLite stays unblocked - All three stores (notebook_store, conversation_store, metrics_store) now set PRAGMA journal_mode = WAL and PRAGMA busy_timeout = 5000. Readers no longer block on writers — the sidebar refresh during stream completion used to stall under the default DELETE journal. ## Data integrity on stream - Backend now echoes conversation_id on the `done` event as well as `meta`. If the meta event drops (parse error, transient network), the frontend still learns the id via done and doesn't create a second conversation on the next send. useChat honors both with a `!activeConversationId` guard so we only set when it's missing. ## Startup and shipping correctness - Ollama model resolution deferred to an asyncio lifespan so a slow /api/tags round-trip at boot no longer delays FastAPI startup and trips Electron's waitForBackend timeout. app.py imports consolidated at the top of the file (was E402-violating before). - Config split: llm_context_window default 4096 (was 2048), llm_max_tokens default 1024 (was 2048). Previously both were 2048 — the same knob was used for "how much context the model can hold" AND "how many tokens to generate", which quietly truncated long RAG prompts AND capped replies. - Chroma telemetry disabled via Settings(anonymized_telemetry=False). Keeps the offline-first promise and stops filling logs with "Failed to send telemetry event ClientCreateCollectionEvent" warnings. ## Frontend hardening - api.ts request() accepts an optional timeoutMs and composes caller signals with a default 30s AbortController. Prior code could hang on any stalled endpoint forever. Uploads get their own 5-minute budget rather than the default. - Network errors surface through the errorMessages humanizer that Wave 3 added, so the user sees "The backend isn't reachable…" instead of a raw "Failed to fetch". ## Verified - cd apps/desktop && npm run build — clean (272 modules, 864ms) - cd backend && uv run ruff check . — clean - cd backend && uv run pytest -q — 33 passed ## Intentionally not in this PR (for a later wave or follow-up) - 4.1 Backend-crash recovery IPC — bigger Electron main-process refactor - 4.3 Exponential backoff for 503/ECONNREFUSED — fiddly retry-policy work - 4.9 Stream-to-disk upload + size limit — requires multipart-streaming rewrite at the FastAPI boundary - 4.13 Per-conversation asyncio lock — needs a store-level primitive - 4.15 list_documents via summaries — risks regressing docs that failed summary generation; needs a unified doc registry first - 4.16 Document summary LLM deferred to background — needs a task queue Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vikranthreddimasu merged commit 78b5ccc into main Apr 18, 2026
2 checks passed

vikranthreddimasu mentioned this pull request Apr 18, 2026

feat(wave5): polish — pane resize, fuzzy palette, wizard cohesion, reduced motion #16

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(wave4): reliability — threadpools, timeouts, WAL, batching, OCR guard#15

feat(wave4): reliability — threadpools, timeouts, WAL, batching, OCR guard#15
vikranthreddimasu merged 1 commit into
mainfrom
feat/wave4-reliability

vikranthreddimasu commented Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant