Arivu is a local-first Electron desktop application for Retrieval-Augmented Generation (RAG) over your own documents. It runs entirely on your machine with no data sent to the cloud unless you explicitly configure a cloud LLM or embedding provider.
Every uploaded file goes through the following sequential stages. Progress and status are tracked in the database and surfaced in the UI.
Upload
│
▼
Validate file type
│
▼
Save to disk
│
▼ status: "parsing" | progress: 10%
Parse — extract raw text using file-type-specific loaders
│
▼ status: "chunking" | progress: 20%
Chunk — RecursiveCharacterTextSplitter
│ chunk_size: 1000 chars (configurable)
│ chunk_overlap: 200 chars (configurable)
│ separators: ["\n\n", "\n", ". ", " ", ""]
│
▼ status: "indexing" | progress: 40%
Initialise embeddings model
│
▼ status: "indexing" | progress: 60%
RAPTOR (only if document has >10 chunks)
│ → UMAP dimensionality reduction (2D, cosine)
│ → Gaussian Mixture Model clustering (k auto-selected by BIC)
│ → LLM summarises each cluster
│ → Hierarchical levels: level 0 = raw chunks, level 1+ = summaries
│
▼ status: "indexing" | progress: 80%
Add all chunks + summaries to ChromaDB vectorstore
│
▼ status: "indexed" | progress: 100%
Persist chunk metadata to SQLite
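The chunking stage uses LangChain's RecursiveCharacterTextSplitter. The sketch below is a simplified, dependency-free re-implementation of the same idea, not LangChain's code. It splits on the coarsest separator present (paragraphs, then lines, then sentences, then words, then characters), recurses into pieces that are still too long, and greedily packs small pieces into chunks of at most `chunk_size` characters while carrying up to `chunk_overlap` characters into the next chunk. The function names are hypothetical, and unlike LangChain this version does not keep separators at chunk boundaries.

```python
from typing import List, Optional, Sequence

# Separator priority from the ingestion settings above.
DEFAULT_SEPARATORS = ["\n\n", "\n", ". ", " ", ""]


def _merge(pieces: Sequence[str], sep: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily pack pieces (each <= chunk_size) into chunks joined by sep,
    carrying trailing pieces forward so neighbouring chunks share up to
    chunk_overlap characters."""
    chunks: List[str] = []
    window: List[str] = []
    length = 0  # always equals len(sep.join(window))
    for piece in pieces:
        added = len(piece) + (len(sep) if window else 0)
        if window and length + added > chunk_size:
            chunks.append(sep.join(window))
            # Drop pieces from the front until what is left fits the overlap
            # budget and leaves room for the incoming piece.
            while window and (length > chunk_overlap or length + len(sep) + len(piece) > chunk_size):
                removed = window.pop(0)
                length -= len(removed) + (len(sep) if window else 0)
            added = len(piece) + (len(sep) if window else 0)
        window.append(piece)
        length += added
    if window:
        chunks.append(sep.join(window))
    return chunks


def split_text(
    text: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
    separators: Optional[Sequence[str]] = None,
) -> List[str]:
    """Split on the coarsest separator present, recursing into oversized
    pieces with the remaining, finer separators."""
    seps = list(DEFAULT_SEPARATORS if separators is None else separators)
    sep, finer = seps[-1], []
    for i, candidate in enumerate(seps):
        if candidate == "" or candidate in text:
            sep, finer = candidate, seps[i + 1:]
            break
    pieces = text.split(sep) if sep else list(text)

    chunks: List[str] = []
    pending: List[str] = []  # small pieces waiting to be merged
    for piece in pieces:
        if not piece:
            continue
        if len(piece) <= chunk_size:
            pending.append(piece)
            continue
        # Flush what we have, then break the oversized piece down further.
        if pending:
            chunks.extend(_merge(pending, sep, chunk_size, chunk_overlap))
            pending = []
        if finer:
            chunks.extend(split_text(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)  # nothing finer to split on; emit as-is
    if pending:
        chunks.extend(_merge(pending, sep, chunk_size, chunk_overlap))
    return chunks
```

For example, 500 four-character words separated by spaces produce three chunks of at most 1,000 characters, each beginning with the last 40 words of the previous chunk, which is how the overlap preserves context across chunk boundaries.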
RAPTOR Hierarchical Summarization
RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) builds a tree of summaries over large documents, so that retrieval can draw on both fine-grained chunks and higher-level summaries that span many sections of a document.
When it activates: documents with more than 10 chunks after splitting.
Steps:
1. Compute embeddings for all base chunks
2. Reduce them to 2 dimensions with UMAP (cosine distance)
3. Cluster with a Gaussian Mixture Model; the number of clusters is chosen automatically by minimising the Bayesian Information Criterion (BIC), capped at sqrt(n_chunks)
4. The configured LLM writes a concise summary of each cluster
5. Summaries are stored in ChromaDB with level=1 (or higher for recursive passes) alongside the base chunks at level=0
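The control flow of these steps can be sketched as follows. This is a minimal illustration, not Arivu's implementation: embedding, UMAP reduction, GMM fitting and LLM summarisation are passed in as hypothetical callables (`bic_for`, `cluster`, `summarise`), and the rule for recursive passes (keep clustering while more than 10 items remain, up to an assumed `max_levels=3`) is an assumption, since the README only says that higher levels come from recursive passes.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Node:
    text: str
    level: int  # 0 = raw chunk, 1+ = RAPTOR summary


def choose_num_clusters(n_items: int, bic_for_k: Callable[[int], float]) -> int:
    """Return the k in 1..floor(sqrt(n_items)) with the lowest BIC (ties go to the smaller k)."""
    max_k = max(1, math.isqrt(n_items))
    return min(range(1, max_k + 1), key=bic_for_k)


def build_raptor_tree(
    chunks: Sequence[str],
    bic_for: Callable[[List[str], int], float],      # hypothetical: BIC of a k-component GMM on UMAP-reduced embeddings
    cluster: Callable[[List[str], int], List[int]],  # hypothetical: GMM cluster label for each text
    summarise: Callable[[List[str]], str],           # hypothetical: LLM summary of one cluster
    min_chunks: int = 10,
    max_levels: int = 3,                             # assumed recursion cap
) -> List[Node]:
    nodes = [Node(text, 0) for text in chunks]
    current = list(chunks)
    level = 0
    # Only cluster when there are more than min_chunks items at the current level.
    while len(current) > min_chunks and level < max_levels:
        k = choose_num_clusters(len(current), lambda kk: bic_for(current, kk))
        groups: Dict[int, List[str]] = {}
        for text, label in zip(current, cluster(current, k)):
            groups.setdefault(label, []).append(text)
        level += 1
        # One summary per cluster; the summaries become the next level's inputs.
        current = [summarise(members) for _, members in sorted(groups.items())]
        nodes.extend(Node(summary, level) for summary in current)
    return nodes
```

Because k is capped at sqrt(n), each pass produces far fewer summaries than inputs, so the tree narrows quickly: 200 chunks give at most 14 level-1 summaries, which give at most 3 level-2 summaries.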
At query time: the top-5 highest-level RAPTOR summaries are injected as global context ahead of the regular retrieved chunks.
Query Pipeline
User question
│
▼
Prepare chat history (last 6 messages)
│
▼ (if history present)
Condense question — LLM rewrites follow-up into a self-contained query
│
▼ (if multi-query enabled)
Multi-query expansion — LLM generates 3 alternative phrasings
│
▼
Retrieve chunks — for each query variant:
│ search_type: "similarity" (cosine) or "mmr" (Maximal Marginal Relevance)
│ k: configurable (default 5)
│ optional min_score filter
│
▼ (if reranking enabled)
Cross-encoder reranking — BAAI/bge-reranker-base scores all candidates,
│ top-k kept, rest discarded
│
▼
Fetch RAPTOR global context — top-5 level-1+ summaries for the project
│
▼
Merge context — RAPTOR summaries prepended to retrieved chunks,
│ deduplicated, capped at max_context_tokens
│
▼
LLM generation — stuff-documents chain
│
▼
Post-process — extract sources, scores, debug metadata
│
▼
Persist to chat history → return answer + sources + (optional) debug info
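The reranking and context-merging steps can be sketched as two small functions. This is an illustration under stated assumptions: `score` is a hypothetical stand-in for the BAAI/bge-reranker-base cross-encoder, and tokens are approximated by whitespace-separated words, whereas the real pipeline presumably uses a proper tokenizer for `max_context_tokens`. Stopping at the first item that would overflow the budget, rather than skipping it, is also an assumption.

```python
from typing import Callable, List, Sequence


def rerank(
    question: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],  # hypothetical stand-in for the cross-encoder
    top_k: int = 5,
) -> List[str]:
    """Score every (question, chunk) pair, keep the top_k, discard the rest."""
    return sorted(candidates, key=lambda chunk: score(question, chunk), reverse=True)[:top_k]


def merge_context(
    summaries: Sequence[str],
    chunks: Sequence[str],
    max_context_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),  # crude whitespace proxy
) -> List[str]:
    """Prepend RAPTOR summaries, drop duplicates, stop once the token budget is spent."""
    merged: List[str] = []
    seen = set()
    used = 0
    for text in [*summaries, *chunks]:
        key = text.strip()
        if key in seen:
            continue
        cost = count_tokens(text)
        if used + cost > max_context_tokens:
            break  # keep priority order: a later, smaller item never jumps the queue
        seen.add(key)
        merged.append(text)
        used += cost
    return merged
```

Putting summaries first means that when the budget is tight, global context survives and the lowest-ranked retrieved chunks are the ones cut.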
Retrieval modes:
| Mode | Description |
|---|---|
| similarity | Standard cosine similarity search. Fast and deterministic. |
| mmr | Maximal Marginal Relevance. Trades some relevance for diversity to avoid redundant chunks. |
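MMR is usually implemented as a greedy loop: each step picks the candidate that maximises `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected. The pure-Python sketch below shows that loop. `lambda_mult=0.5` mirrors LangChain's default; the README does not say which value Arivu uses. In LangChain, MMR first fetches a larger candidate pool by similarity (`fetch_k`) and reranks it; here `docs` plays the role of that pool.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 5, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of k docs chosen greedily by Maximal Marginal Relevance."""
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy is 0 for the first pick, so it is simply the most relevant doc.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `docs = [[1, 1, 0.1], [1, 1, 0.12], [1, 0.3, 0]]` and query `[1, 1, 0]`, doc 1 is nearly a copy of doc 0, so `mmr(..., k=2)` returns `[0, 2]`, while `lambda_mult=1.0` (pure relevance, equivalent to similarity mode) returns `[0, 1]`.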
Query translation options:
| Option | Effect |
|---|---|
| Condense question | Rewrites follow-up questions using conversation history into a standalone query |
| Multi-query | Generates 3 alternative phrasings and merges results for better recall |
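Multi-query merging can be sketched as a de-duplicated union of the results for each phrasing, in first-seen order. This is a sketch under assumptions: `rephrase` and `retrieve` are hypothetical stand-ins for the LLM call and the vector search, and whether the original question is searched alongside the 3 alternatives is not stated in the README, so it is exposed here as a hypothetical `include_original` flag.

```python
from typing import Callable, List


def multi_query_retrieve(
    question: str,
    rephrase: Callable[[str, int], List[str]],  # hypothetical: LLM returns n alternative phrasings
    retrieve: Callable[[str], List[str]],       # hypothetical: vector search returning chunk texts
    n_variants: int = 3,
    include_original: bool = True,              # assumption, not documented
) -> List[str]:
    variants = rephrase(question, n_variants)
    queries = [question, *variants] if include_original else variants
    seen = set()
    merged: List[str] = []
    for query in queries:
        for chunk in retrieve(query):
            # A chunk found by several phrasings is kept once, at its first position.
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Recall improves because a chunk only needs to match one of the phrasings to be retrieved; the cost is one extra LLM call plus one search per variant.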
Study Mode Pipeline
Fetch all vectorstore documents for the project
│
▼
Prioritise RAPTOR summaries (level ≥ 1):
│ If available: top 20 summaries sorted by level (highest first)
│ Fallback: raw chunks (limit 50)
│
▼
Format context with source info
│
▼
LLM generation with study prompt
│ mode: quiz | summary | flashcards
│ count: number of items
│ topic: optional focus topic
│
▼
Return content + sources
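The context-selection and prompt-assembly steps of Study Mode can be sketched as below. The limits (20 summaries, 50 raw chunks) and the three modes come from the diagram; the `Doc` type, the `[Source: ...]` formatting and the prompt wording are hypothetical, since the README does not show Arivu's actual study prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

STUDY_MODES = {"quiz", "summary", "flashcards"}


@dataclass
class Doc:
    text: str
    source: str
    level: int = 0  # 0 = raw chunk, 1+ = RAPTOR summary


def select_study_docs(docs: Sequence[Doc], max_summaries: int = 20, max_chunks: int = 50) -> List[Doc]:
    """Prefer RAPTOR summaries, highest level first; otherwise fall back to raw chunks."""
    summaries = [d for d in docs if d.level >= 1]
    if summaries:
        # sorted() is stable even with reverse=True, so equal levels keep their store order.
        return sorted(summaries, key=lambda d: d.level, reverse=True)[:max_summaries]
    return [d for d in docs if d.level == 0][:max_chunks]


def build_study_prompt(docs: Sequence[Doc], mode: str, count: int, topic: Optional[str] = None) -> str:
    """Assemble a hypothetical study prompt with source-tagged context."""
    if mode not in STUDY_MODES:
        raise ValueError(f"unknown study mode: {mode!r}")
    context = "\n\n".join(f"[Source: {d.source}]\n{d.text}" for d in docs)
    focus = f" Focus topic: {topic}." if topic else ""
    return f"Mode: {mode}. Number of items: {count}.{focus}\n\nMaterial:\n{context}"
```

Preferring summaries lets a quiz or summary cover a whole project within one prompt, which the 50-chunk fallback can only approximate for large document sets.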
LLM & Embedding Providers
LLM Backends
| Backend | Detection | Default model | Notes |
|---|---|---|---|
| Ollama (default) | Model name does not contain gpt, openai, o1, o3 | llama3 | Requires Ollama running locally at http://localhost:11434 |