Fujinami RAG Service

A hybrid Retrieval-Augmented Generation (RAG) system that combines Microsoft GraphRAG, Semantic Kernel, and LanceDB to answer questions over your document collections using locally-hosted Ollama models.

Features

Hybrid search — blends dense vector search (LanceDB) with graph-based retrieval (GraphRAG knowledge graph) for richer answers
Three query modes — vector, hybrid, and global (community-level summaries)
Multi-collection — manage independent document collections via a REST API
Rich document ingestion — powered by Docling; supports documents (.pdf, .docx, .xlsx, .pptx, .md, .tex, .html, .csv, and more), images (.png, .jpg, .jpeg, .tiff, .bmp, .webp), audio (.wav, .mp3, .m4a, .aac, .ogg, .flac), and video (.mp4, .avi, .mov); embedded pictures are described inline by a VLM via Docling's built-in picture-description pipeline
Streaming responses — optional token-by-token streaming on query endpoints
Built-in Web UI — zero-configuration browser interface served at /
Fully local — all LLM, embedding, and VLM calls go to Ollama; no cloud APIs required
RAGAS evaluation — score RAG responses against 10 built-in metrics (Faithfulness, LLM Context Recall, LLM Context Precision, Context Precision without reference, Response Relevancy, Factual Correctness, Noise Sensitivity, Semantic Similarity, BLEU, ROUGE) using a locally-hosted LLM

Architecture Overview

Documents (.pdf .docx .xlsx .pptx .md .txt …)
Images   (.png .jpg .tiff .webp …)
Audio    (.wav .mp3 .m4a …)
Video    (.mp4 .avi .mov …)
        │
        ▼
  DocumentLoader  ──▶  Docling DocumentConverter
    (docling[asr])         ├─ OCR + table extraction
                           ├─ VLM picture description (llava:7b)
                           └─ export_to_markdown()
        │
        ▼
  ┌─────────────────────────────────┐
  │         Index Pipeline          │
  │                                 │
  │  GraphRAG CLI  ──▶  entities,   │
  │  (subprocess)       communities │
  │                     reports     │
  │                                 │
  │  SK Embeddings ──▶  LanceDB     │
  │  (bge-m3:567m)      chunks      │
  └─────────────────────────────────┘
        │
        ▼
  FastAPI server  ──▶  Web UI  /  REST API
        │
        ▼
  Query (vector | hybrid | global)
        │
        ▼
  llama3.2:3b  →  answer + source chunks

See docs/dataflow-ragService.md for full pipeline diagrams.

Requirements

Requirement	Version
Python	3.12 (3.13+ not supported by onnxruntime)
uv	latest
Ollama	running locally on port `11434`

Required Ollama models

Pull these before first use:

# Chat and query-time embedding models (on the local query server)
ollama pull llama3.2:3b
ollama pull bge-m3:567m

# Index-time embeddings and VLM for picture description (on the index server, which can be a remote GPU machine)
ollama pull bge-m3:567m
ollama pull llava:7b   # used by Docling's picture-description pipeline
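To confirm the models are present before first use, you can ask each Ollama server which models it has. The sketch below is a hypothetical helper (not part of this repository) that reads Ollama's `GET /api/tags` endpoint; the `missing_models` and `fetch_tags` names are illustrative, and the default model set mirrors the model names above.

```python
import json
import urllib.request

# Default model names from this README; adjust if your .env overrides them.
REQUIRED_MODELS = {"llama3.2:3b", "bge-m3:567m", "llava:7b"}


def missing_models(tags: dict, required: set[str] = REQUIRED_MODELS) -> set[str]:
    """Return the required model names absent from an Ollama /api/tags payload."""
    installed = {m.get("name", "") for m in tags.get("models", [])}
    return set(required) - installed


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET <base_url>/api/tags from an Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

For example, `missing_models(fetch_tags(), {"llama3.2:3b", "bge-m3:567m"})` returns an empty set when the local query server has both chat-time models. If indexing and chat use different Ollama instances, run the check once per server with the set of models that server needs.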

Setup

1. Create a `.env` file

# Remote Ollama server used during indexing (embeddings + VLM), e.g. http://<gpu-host>:11434
OLLAMA_INDEX_URL=

# Local Ollama server used at query time, e.g. http://localhost:11434
OLLAMA_CHAT_URL=

# Model names
CHAT_MODEL=llama3.2:3b
EMBEDDING_MODEL=bge-m3:567m
VLM_MODEL=llava:7b

# Optional: VLM HTTP timeout in seconds (default 180)
VLM_TIMEOUT=180

# Model used for RAGAS evaluation (needs large context window, e.g. gemma4:e4b)
RAGAS_MODEL=gemma4:e4b

# Optional: Ollama request timeout for RAGAS evaluation in seconds (default 1800)
OLLAMA_TIMEOUT=1800

If you only have one Ollama instance, set both OLLAMA_INDEX_URL and OLLAMA_CHAT_URL to the same URL.
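The service's own configuration code is not shown in this README. As a hypothetical sketch of how these variables fit together, the loader below reads them from an environment mapping (for example, after a `.env` loader such as python-dotenv has populated `os.environ`) and applies the documented defaults for `VLM_TIMEOUT` and `OLLAMA_TIMEOUT`. The `Settings`/`load_settings` names, the model-name fallbacks, the `http://localhost:11434` fallback URL, and the rule that a single configured URL is reused for both roles are all assumptions.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    index_url: str
    chat_url: str
    chat_model: str
    embedding_model: str
    vlm_model: str
    vlm_timeout: float
    ragas_model: str
    ollama_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the README's defaults."""
    # Assumption: with only one URL set, the same Ollama instance serves both roles.
    chat_url = env.get("OLLAMA_CHAT_URL") or env.get("OLLAMA_INDEX_URL") or "http://localhost:11434"
    index_url = env.get("OLLAMA_INDEX_URL") or chat_url
    return Settings(
        index_url=index_url,
        chat_url=chat_url,
        chat_model=env.get("CHAT_MODEL") or "llama3.2:3b",
        embedding_model=env.get("EMBEDDING_MODEL") or "bge-m3:567m",
        vlm_model=env.get("VLM_MODEL") or "llava:7b",
        vlm_timeout=float(env.get("VLM_TIMEOUT") or 180),  # documented default: 180 s
        ragas_model=env.get("RAGAS_MODEL") or "gemma4:e4b",
        ollama_timeout=float(env.get("OLLAMA_TIMEOUT") or 1800),  # documented default: 1800 s
    )
```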

2. Create the virtual environment and install dependencies

# Install uv (once)
pip install uv

# Create .venv and install dependencies
uv venv
uv pip install -r requirements.txt

3. Start the development server

uv run poe dev
# equivalent to: uvicorn api:app --reload

Open http://localhost:8000 in your browser.

Usage

Web UI

Navigate to http://localhost:8000 for the built-in interface. From there you can:

Create and manage collections
Upload documents
Trigger indexing (with optional entity type selection)
Run queries with vector, hybrid, or global mode

REST API

Interactive docs are available at http://localhost:8000/docs.

Collections

GET    /collections                    # list all collections
POST   /collections                    # create a collection  { "name": "my-docs" }
PATCH  /collections/{name}             # rename               { "new_name": "new-name" }
DELETE /collections/{name}             # delete collection and all its data

Documents

GET    /collections/{name}/documents              # list uploaded documents
POST   /collections/{name}/documents              # upload a file (multipart/form-data)
DELETE /collections/{name}/documents/{filename}   # delete a document
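Because unsupported extensions are rejected at upload with HTTP 422, a client can warn before sending a file. This hypothetical pre-check uses only the extensions named in this README (the feature list says the document formats continue with "and more"), so a `False` result means "not listed here", not "will be rejected"; the server remains the authority.

```python
from pathlib import Path

# Extensions named in this README; deliberately incomplete (see lead-in).
KNOWN_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".md", ".tex", ".html", ".csv", ".txt",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp",
    ".wav", ".mp3", ".m4a", ".aac", ".ogg", ".flac",
    ".mp4", ".avi", ".mov",
}


def is_known_upload(filename: str) -> bool:
    """True if the file's extension is listed in this README (case-insensitive)."""
    return Path(filename).suffix.lower() in KNOWN_EXTENSIONS
```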

Indexing

POST /collections/{name}/index          # trigger indexing (async, returns task_id)
                                        # body (optional): { "entity_types": ["person", "org"] }
GET  /collections/{name}/index/{task_id} # poll indexing status
GET  /tasks                              # list all pending/running tasks

Querying

POST /collections/{name}/query

{
  "query": "What are the main roles in the system?",
  "method": "hybrid",
  "top_k": 5,
  "stream": false
}

Field	Values	Default
`method`	`vector`, `hybrid`, or `global`	`hybrid`
`top_k`	integer	`5`
`stream`	`true` or `false`	`false`

The response includes `answer`, `sources` (chunk excerpts with document references), and `graphrag_context`.
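A request to the query endpoint can be built with the standard library. This is a hypothetical client sketch assuming the dev server address; the `method` check mirrors the table above, and rejecting a non-positive `top_k` is an assumption rather than documented server validation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # dev server address from this README
METHODS = {"vector", "hybrid", "global"}


def query_request(collection: str, query: str, method: str = "hybrid",
                  top_k: int = 5, stream: bool = False) -> urllib.request.Request:
    """Build POST /collections/{name}/query with the documented body fields and defaults."""
    if method not in METHODS:
        raise ValueError(f"method must be one of {sorted(METHODS)}, got {method!r}")
    if top_k < 1:  # client-side guard (assumption)
        raise ValueError("top_k must be a positive integer")
    body = {"query": query, "method": method, "top_k": top_k, "stream": stream}
    return urllib.request.Request(
        f"{BASE_URL}/collections/{urllib.parse.quote(collection, safe='')}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `json.load(urllib.request.urlopen(req))` and read `answer`, `sources`, and `graphrag_context` from the result. Streaming (`"stream": true`) is not covered, since its wire format is not described here.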

RAGAS Evaluation

GET  /api/metrics                  # list available metrics and their required fields
POST /api/evaluate/single          # evaluate a single sample
POST /api/evaluate/batch           # evaluate a batch from a JSON or CSV file

Single evaluation (POST /api/evaluate/single):

{
  "user_input": "What are the main roles in the system?",
  "response": "The main roles are Master, User, and Viewer.",
  "retrieved_contexts": ["Masters can manage …", "Viewers can only read …"],
  "reference": "The system has three roles: Master, User, and Viewer.",
  "metrics": ["faithfulness", "llm_context_recall", "response_relevancy"]
}

Returns { "scores": { "faithfulness": 0.95, "llm_context_recall": 0.88, … } }.

Batch evaluation (POST /api/evaluate/batch):

Upload a .json file (an array of sample objects) or a .csv file as multipart/form-data, with a `metrics` form field containing a JSON-encoded list of metric IDs.
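The batch endpoint expects a multipart body, which Python's standard library does not build for you. The sketch below encodes one by hand. The file's form-field name is not documented in this README, so `file_field="file"` is an assumption to check against the interactive docs at `/docs`, and the `batch_request` helper is hypothetical.

```python
import json
import uuid
import urllib.request

BASE_URL = "http://localhost:8000"  # dev server address from this README


def batch_request(filename: str, content: bytes, metrics: list[str],
                  file_field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST to /api/evaluate/batch (file_field name is assumed)."""
    boundary = uuid.uuid4().hex
    file_type = "text/csv" if filename.lower().endswith(".csv") else "application/json"
    # Text part: the metrics field as a JSON-encoded list.
    metrics_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metrics"\r\n\r\n'
        f"{json.dumps(metrics)}\r\n"
    ).encode("utf-8")
    # File part header; the raw file bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/evaluate/batch",
        data=metrics_part + file_header + content + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

If adding a dependency is acceptable, an HTTP client with built-in multipart support is simpler; this version only avoids third-party packages.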

Available metric IDs:

ID	Display Name	Required Fields	LLM	Embeddings
`faithfulness`	Faithfulness	`user_input`, `response`, `retrieved_contexts`	✓
`llm_context_recall`	LLM Context Recall	`user_input`, `retrieved_contexts`, `reference`	✓
`llm_context_precision`	LLM Context Precision	`user_input`, `retrieved_contexts`, `reference`	✓
`context_precision_without_reference`	Context Precision (No Ref)	`user_input`, `response`, `retrieved_contexts`	✓
`response_relevancy`	Response Relevancy	`user_input`, `response`	✓	✓
`factual_correctness`	Factual Correctness	`response`, `reference`	✓
`noise_sensitivity`	Noise Sensitivity	`user_input`, `retrieved_contexts`, `response`, `reference`	✓
`semantic_similarity`	Semantic Similarity	`response`, `reference`		✓
`bleu_score`	BLEU Score	`response`, `reference`
`rouge_score`	ROUGE Score	`response`, `reference`
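Before calling either evaluation endpoint, a client can check that each sample carries the fields its metrics need. This hypothetical helper transcribes the table above; `GET /api/metrics` remains the authoritative source, and treating empty values as missing is an assumption.

```python
# Required fields per metric ID, transcribed from the table above.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "faithfulness": ("user_input", "response", "retrieved_contexts"),
    "llm_context_recall": ("user_input", "retrieved_contexts", "reference"),
    "llm_context_precision": ("user_input", "retrieved_contexts", "reference"),
    "context_precision_without_reference": ("user_input", "response", "retrieved_contexts"),
    "response_relevancy": ("user_input", "response"),
    "factual_correctness": ("response", "reference"),
    "noise_sensitivity": ("user_input", "retrieved_contexts", "response", "reference"),
    "semantic_similarity": ("response", "reference"),
    "bleu_score": ("response", "reference"),
    "rouge_score": ("response", "reference"),
}


def missing_fields(sample: dict, metrics: list[str]) -> dict[str, list[str]]:
    """Map each requested metric to the required fields the sample lacks (empty values count as missing)."""
    problems: dict[str, list[str]] = {}
    for metric in metrics:
        if metric not in REQUIRED_FIELDS:
            raise KeyError(f"unknown metric ID: {metric}")
        absent = [field for field in REQUIRED_FIELDS[metric] if not sample.get(field)]
        if absent:
            problems[metric] = absent
    return problems
```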

Project Structure

Fujinami/
├── .env                        # environment variables (create this)
├── python/
│   ├── api.py                  # FastAPI application and all HTTP endpoints
│   ├── ragService.py           # RagService: indexing + search logic
│   ├── document_loader.py      # Docling-based loader; converts all supported formats to markdown
│   ├── ragas_runner.py         # RAGAS metric registry and async evaluation runner
│   ├── models.py               # Pydantic request/response schemas
│   ├── install_dependency.py   # Dependency installer script
│   ├── pyproject.toml          # Project metadata and poe tasks
│   ├── static/
│   │   └── index.html          # Single-page Web UI
│   ├── data/                   # Uploaded source documents (per collection)
│   └── ragdata/                # GraphRAG artifacts + LanceDB vector store (per collection)
└── docs/
    └── dataflow-ragService.md  # Detailed pipeline and data-flow documentation

Query Modes

Mode	How it works	Best for
`vector`	Dense cosine similarity over LanceDB chunk embeddings	Precise factual lookups
`hybrid`	Vector search + GraphRAG local search combined	General question answering
`global`	GraphRAG community-level summary search	Broad thematic / cross-document questions

Entity Types

When triggering indexing you can pass a list of entity types to tune the GraphRAG knowledge graph extraction:

organization  person  geo  event  concept  technology  product  process  system

Omitting `entity_types` uses the GraphRAG defaults.

Error Handling

Condition	Behaviour
Docling models not downloaded	The first call to `DocumentConverter` triggers an automatic download (~1 GB of layout/OCR models); to avoid this at runtime, bake the models into the Docker image with `RUN python -c "from docling.document_converter import DocumentConverter; DocumentConverter()"`
VLM picture description fails or times out	Warning logged by Docling; image rendered as placeholder; indexing continues
Unsupported file extension	File rejected at upload with HTTP 422
`graphrag index` subprocess fails	Indexing task transitions to `error`; detail message returned
Ollama server unreachable	HTTP 500 propagated to API caller
