Ambrew

A RAG system that lets you chat with your documents and highlights the exact source of its answers. 🔍

Why This Exists

When you upload a PDF to ChatGPT, Gemini, or most RAG-based assistants and ask a question, the response cites, at best, the document as a whole and not the exact passage. You are left trusting a black box. You cannot quickly verify whether the AI grounded its answer in the source or simply generated a plausible-sounding paraphrase.

Ambrew solves this by running a fully local pipeline where every citation is interactive. Clicking a citation jumps to the exact bounding-box region on the exact page it came from. You can inspect the source evidence, page numbers, and relevance scores before ever opening the file.

How It Works

Ambrew separates its document processing into two distinct flows: the Ingestion Pipeline (which runs at upload time to ingest, parse, enrich, and index files) and the Query / Retrieval Pipeline (which runs at search time to fetch context, synthesize responses, and enforce citation integrity).

1. Ingestion Pipeline

```mermaid
graph TD
    Upload["Upload File or URL"] --> Route{"Ingestion Mode?"}

    %% Fast Mode
    Route -->|Fast| FastExtract["Extract Text & Tables"]

    %% Academic Mode
    Route -->|Academic| AcadExtract["Run Academic Extraction<br/>using CodeFormulaV2"]
    AcadExtract --> VLM["AI describes figures<br/>(qwen2.5vl:3b)"]

    %% Auto Mode
    Route -->|Auto| AutoFastPass["Run Fast Pass First"]
    AutoFastPass --> AutoCheck{"Formula or figure<br/>details missing?"}
    AutoCheck -->|Yes| AcadExtract
    AutoCheck -->|No| BBoxCapture

    FastExtract --> BBoxCapture["Shared Bounding-Box<br/>Capture & Extraction"]
    VLM --> BBoxCapture

    BBoxCapture --> ChunkNorm["Normalize, chunk, merge<br/>& overlap segments"]
    ChunkNorm --> Embed["Dense (BAAI/bge-base-en-v1.5)<br/>+ Sparse (Qdrant/bm25) Embedding"]
    Embed --> Storage["Save to local databases<br/>(Qdrant & SQLite)"]
```

Ingestion Implementation Details

- Format Routing & Fallbacks: Standard formats such as PDF, DOCX, PPTX, XLSX, HTML, Markdown, CSV, JSON, and images are parsed natively. Pandoc pre-converts EPUB and RTF files to DOCX before processing. Format-specific extractors (e.g., PyMuPDF) act as fallbacks if loading through Docling fails.
- Ollama VLM Warmup: Academic mode runs a pre-flight connection check and warms up the local qwen2.5vl:3b model with a lightweight test prompt before document extraction, which avoids empty outputs and timeouts.
- Deterministic Post-Processing: Chunks pass through a fixed sequence of steps: metadata normalization (taking the union of page bounding boxes), figure envelope unification (consolidating the bounding boxes of figures split across multiple chunks), token-based splitting, iterative merging of short chunks, and a 64-token sliding-window overlap.
- Stable Fingerprinting: Each chunk receives a final, unique chunk_id derived from a stable SHA-256 fingerprint of its contents, which supports duplicate detection and lets ingestion skip chunks that already exist.
- SSRF Protection: Direct URL ingestion validates every redirect hop and blocks private, loopback, link-local, multicast, and cloud metadata addresses (including 169.254.169.254). Downloaded HTML pages are converted to clean Markdown before ingestion.
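
The stable fingerprinting step can be illustrated with a small sketch. This is not Ambrew's actual implementation: the function names, the choice to hash the document ID and page number together with the text, and the whitespace normalization are all assumptions made for illustration.

```python
import hashlib
import unicodedata


def chunk_fingerprint(doc_id: str, page: int, text: str) -> str:
    """Return a SHA-256 hex digest for a chunk (hypothetical field choice)."""
    # Normalize Unicode and collapse whitespace so cosmetic differences
    # in extraction do not change the fingerprint.
    normalized = " ".join(unicodedata.normalize("NFKC", text).split())
    # Join fields with a separator that is unlikely to appear in the data.
    payload = "\x1f".join([doc_id, str(page), normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def skip_existing(chunks: list[dict], known_ids: set[str]) -> list[dict]:
    """Attach a chunk_id to each chunk and keep only chunks not seen before."""
    fresh = []
    for chunk in chunks:
        chunk_id = chunk_fingerprint(chunk["doc_id"], chunk["page"], chunk["text"])
        if chunk_id in known_ids:
            continue  # duplicate: already indexed or repeated in this batch
        known_ids.add(chunk_id)
        fresh.append({**chunk, "chunk_id": chunk_id})
    return fresh
```

Because the digest depends only on the chunk's own fields, re-ingesting the same file produces the same IDs, so a second upload can be skipped without comparing text directly.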

2. Query / Retrieval Pipeline

```mermaid
graph TD
    Query["User Query"] --> Rewrite["Conversation-Aware Rewrite<br/>(qwen2.5:3b)"]
    Rewrite --> Scope{"Scope Filter:<br/>Specific Doc vs All?"}

    Scope --> DenseSearch["Dense Retrieval<br/>(BAAI/bge-base-en-v1.5)"]
    Scope --> SparseSearch["Sparse Retrieval<br/>(Qdrant/bm25)"]

    DenseSearch --> RRFFusion["Reciprocal Rank<br/>Fusion (RRF)"]
    SparseSearch --> RRFFusion

    RRFFusion --> Rerank["Cross-Encoder Reranking<br/>(ms-marco-MiniLM-L-6-v2)"]
    Rerank --> LLMGen["LLM Generation<br/>(qwen2.5:3b)"]
    LLMGen --> OptionC["Citation Integrity Check"]
    OptionC --> Response["Streamed NDJSON Response"]
```
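
The fusion node in the diagram above combines two ranked lists into one. The sketch below shows the general Reciprocal Rank Fusion formula in plain Python; it is an illustration of the technique, not the call Ambrew makes to Qdrant. The function name and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one scored ranking."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every item it contains,
        # so only positions matter, not the raw dense or BM25 scores.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["a", "b", "c"]   # hypothetical dense retrieval order
sparse = ["b", "d", "a"]  # hypothetical sparse (BM25) retrieval order
fused = reciprocal_rank_fusion([dense, sparse])
```

Working on ranks rather than scores is what lets the two retrievers be combined even though their score scales are not comparable.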

Query & Retrieval Implementation Details

- Pre-Fusion Scope Filter: When a query is scoped to specific documents, the document ID filter is applied inside Qdrant during the dense and sparse prefetch operations, so out-of-scope chunks never enter the candidate pool.
- Selective Heuristic Rewriter: A local heuristic checks for pronouns or short queries. If it triggers, a local LLM (qwen2.5:3b) reformulates the query into a standalone search query. If the heuristic does not trigger, or the rewrite fails, the pipeline falls back to the user's original query.
- Reciprocal Rank Fusion (RRF): Merges the ranked results of the dense vector search and the sparse BM25 search from the Qdrant retrieval step, combining the semantic strengths of the former with the lexical strengths of the latter.
- Option C Post-Processing: Citation integrity is enforced by expanding bracket ranges (e.g., [1-3] -> [1][2][3]), filtering out unreferenced source chunks, renumbering cited sources in contiguous ascending order, remapping inline citation numbers in the response text, and dropping orphan citations with logged warnings.
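
The Option C steps listed above can be sketched in a few lines. Everything named here is hypothetical: the function `enforce_citation_integrity`, the `sources` mapping keyed by citation number, and the `index` field are illustrative, and renumbering by ascending original number is one assumed reading of "contiguous ascending order".

```python
import logging
import re

logger = logging.getLogger(__name__)

RANGE_RE = re.compile(r"\[(\d+)\s*-\s*(\d+)\]")  # matches [1-3]
CITE_RE = re.compile(r"\[(\d+)\]")               # matches [2]


def enforce_citation_integrity(
    text: str, sources: dict[int, dict]
) -> tuple[str, list[dict]]:
    """Expand ranges, drop orphans, and renumber citations contiguously."""

    def expand(match: re.Match) -> str:
        lo, hi = int(match.group(1)), int(match.group(2))
        if lo > hi:
            return match.group(0)  # leave malformed ranges untouched
        return "".join(f"[{n}]" for n in range(lo, hi + 1))

    text = RANGE_RE.sub(expand, text)

    # Keep only cited numbers that actually have a source behind them.
    cited = sorted({int(n) for n in CITE_RE.findall(text)})
    mapping = {
        old: new
        for new, old in enumerate((n for n in cited if n in sources), start=1)
    }

    def remap(match: re.Match) -> str:
        old = int(match.group(1))
        if old not in mapping:
            logger.warning("Dropping orphan citation [%d]", old)
            return ""
        return f"[{mapping[old]}]"

    text = CITE_RE.sub(remap, text)
    # Unreferenced sources are filtered out simply by not being in the mapping.
    kept = [{**sources[old], "index": new} for old, new in mapping.items()]
    return text, kept
```

The remapping runs in a single regex pass over the original text, so a renumbered citation is never rewritten a second time.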

What Makes This Different?

Most RAG interfaces reference source files as a whole, leaving you to scroll through pages to find the relevant text. Ambrew implements exact-region grounding:

- Ingestion Bounding Boxes: Document ingestion captures the physical bounding box coordinates of text segments, figures, tables, and formulas.
- Highlighted Visual Overlay: When you click a citation in the chat panel, Ambrew opens a PDF viewer (via react-pdf) and renders a highlighted overlay directly over the source region.
- Graceful Fallbacks: For document types or legacy files where bounding boxes are unavailable (e.g., text files, pre-migration uploads), the viewer falls back to opening the correct page.
- Local-First Privacy: All vector embedding, database storage, and LLM reasoning run locally. No data leaves your machine unless you explicitly configure external providers.

Features

Ingestion

- Multi-Format Parsing: Ingests PDF, DOCX, PPTX, XLSX, HTML, Markdown, TXT, EPUB, RTF, images, CSV, JSON, and raw URLs.
- Document Ingestion Modes:
  - Fast: Extracts text and tables only (skips vision-language processing).
  - Academic: Uses a local vision-language model (qwen2.5vl:3b) to extract and describe figures, formulas, and photos.
  - Auto: Runs the fast pass first and escalates to Academic extraction when formula or figure details are missing.
- Secure URL Ingestion: Fetches web pages directly. Includes SSRF protection (blocks private, loopback, and cloud metadata addresses, and validates HTTP redirects per hop) and converts pages to clean Markdown before vectorization.
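
The address checks behind SSRF protection can be sketched with the standard library. This is a minimal illustration under stated assumptions, not Ambrew's code: `is_blocked_address` and `validate_url` are hypothetical names, and the injectable `resolve` parameter exists only so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip: str) -> bool:
    """Return True for addresses a URL fetcher should refuse to contact."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so it is judged as IPv4.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # covers 169.254.169.254 (cloud metadata)
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def validate_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError if the URL's host resolves to a blocked address."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    for info in resolve(parsed.hostname, port, proto=socket.IPPROTO_TCP):
        ip = info[4][0]  # sockaddr tuple starts with the address string
        if is_blocked_address(ip):
            raise ValueError(f"blocked address {ip} for host {parsed.hostname!r}")
```

For per-hop validation, a fetcher would disable automatic redirects in its HTTP client and call `validate_url` on each `Location` target before following it.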

Querying & Citations

- Document-Scoped Querying: Limit searches to a single target document or search your entire local library.
- LaTeX & Markdown Chat Rendering: Seamlessly renders tables, code blocks, lists, and complex LaTeX mathematical expressions in both streaming and finalized states. Citations inside LaTeX blocks are parsed out to keep them clickable.
- Strict Streaming Protocol: Streams response tokens and final source lists sequentially in NDJSON to guarantee layout stability.
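
To make the streaming protocol concrete, here is a rough sketch of an NDJSON stream that emits tokens first and the source list last. The event shapes (`type`, `content`, `sources`) and both function names are assumptions for illustration; Ambrew's actual wire format may differ.

```python
import json
from collections.abc import Iterable, Iterator


def ndjson_stream(tokens: Iterable[str], sources: list[dict]) -> Iterator[str]:
    """Yield one JSON object per line: token events, then a final sources event."""
    for token in tokens:
        yield json.dumps({"type": "token", "content": token}) + "\n"
    # The source list is sent once, after all tokens, so the client can
    # render the answer incrementally and attach citations at the end.
    yield json.dumps({"type": "sources", "sources": sources}) + "\n"


def read_ndjson(lines: Iterable[str]) -> tuple[str, list[dict]]:
    """Reassemble the answer text and source list from NDJSON lines."""
    answer: list[str] = []
    sources: list[dict] = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank keep-alive lines
        event = json.loads(line)
        if event["type"] == "token":
            answer.append(event["content"])
        elif event["type"] == "sources":
            sources = event["sources"]
    return "".join(answer), sources
```

In a FastAPI backend, a generator like `ndjson_stream` could be wrapped in a `StreamingResponse` with an NDJSON media type; that wiring is omitted here.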

Evidence & Trust

- Option C Citation Integrity: Post-processor guarantees that the visible citations list matches what is referenced in the text, preventing crashes or orphan links.
- Evidence Panel: Displays filename, page number, snippet, and relevance scores before opening files.
- File-Serving Security: Resolves file requests using database-driven document ID lookup instead of exposing direct filesystem paths, with path-containment checks preventing path traversal.
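
The ID lookup plus path-containment idea can be sketched as follows. The `registry` dictionary stands in for the database lookup, and the function name and error types are hypothetical choices for this example.

```python
from pathlib import Path


def resolve_document_path(
    doc_id: str, registry: dict[str, str], storage_root: Path
) -> Path:
    """Map a document ID to a file path that is guaranteed to sit under storage_root."""
    stored = registry.get(doc_id)  # stand-in for a database lookup by ID
    if stored is None:
        raise KeyError(f"unknown document id: {doc_id!r}")
    root = storage_root.resolve()
    # resolve() collapses ".." segments and symlinks before the check.
    candidate = (root / stored).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes storage root: {stored!r}")
    return candidate
```

Clients only ever send a document ID, so a request cannot name a filesystem path directly; the containment check is a second line of defense in case a stored path is malformed.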

Conversations

- Persistent Sidebar: SQLite-backed navigation sidebar to switch between past research sessions.
- Dynamic Titles: Automatically generates lightweight titles from the first message of a thread.
- State Preservation: Keeps chat messages, sources, compliance warnings, and bounding box states across restarts.

Settings & Providers

- Local & Cloud Generation: Uses local Ollama by default (qwen2.5:3b for generation/rewriting and qwen2.5vl:3b for VLM ingestion).
- OpenAI-Compatible APIs: Connect to external endpoints (OpenAI, Cerebras, SiliconFlow, Groq, Gemini) with custom keys, models, and temperature limits.

Tech Stack

| Component | Technology | Description |
| --- | --- | --- |
| Backend | FastAPI, Python 3.13, Uvicorn | High-performance asynchronous API server |
| Frontend | React, Vite, TypeScript, Tailwind CSS | Local-first interactive UI (no heavy UI frameworks) |
| PDF Rendering | `react-pdf` | Document viewing and canvas-based bounding box highlights |
| Vector DB | Qdrant | Vector indexing and hybrid/scoped searches |
| Relational DB | SQLite | Relational database for settings and conversation history |
| LLM Engine | Ollama | Run-time model hosting for local text/VLM tasks |
| Retrieval Models | `BAAI/bge-base-en-v1.5` / FastEmbed BM25 | Hybrid retrieval model configuration |
| Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder matching for precise ranking |

Screenshots & Demo

Interactive Workspace

The workspace features a persistent sidebar for managing conversation history, a central chat panel with inline citation grounding, and an interactive Evidence Panel on the right displaying document matches with relevance indicators.

Click-to-Highlight PDF Viewer

Clicking a cited source in the chat automatically opens the PDF viewer, jumps to the exact page, and highlights the source bounding box of the cited text or figures.

Document Library & Ingestion

Upload documents using local parsing modes (Fast, Academic, or Auto) or paste direct URLs for SSRF-protected web extraction. You can manage and delete files from your library in real time.

Getting Started

Prerequisites

- Python 3.13
- Node.js (a version supported by Vite)
- Ollama running locally with the following models pulled:
```
ollama pull qwen2.5:3b
ollama pull qwen2.5vl:3b
```

Local Setup

1. Configure the Environment

Copy .env.example to .env and adjust the variables if needed:

```
cp .env.example .env
```

2. Backend Installation & Run

```
# TODO: Verify if dependency installation should be run via uv (e.g. `uv sync`) or pip.
# The project contains a `pyproject.toml` and `uv.lock`.
# If using uv:
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Start the FastAPI server
# TODO: Verify if the entrypoint is run via `python -m src.rag_backend.api.main` or via the pyproject script `ambrew-api`
uvicorn src.rag_backend.api.main:app --reload
```

3. Frontend Installation & Run

```
cd frontend
npm install
npm run dev
```

Open http://localhost:3000 in your browser.

Testing & Validation

Ambrew ships with a suite of 222 passing automated tests that cover the document ingestion, retrieval, context assembly, and streaming layers. The tests target long-session reliability, security, and strict citation integrity.

License

Distributed under the MIT License. See LICENSE for more details.
