Sado: RAGAS Evaluation Server

A FastAPI backend with a browser-based UI for evaluating RAG (Retrieval-Augmented Generation) outputs using RAGAS metrics. All LLM and embedding inference runs locally via Ollama.

Features

Single evaluation — score one RAG sample interactively through the browser UI
Batch evaluation — upload a .json or .csv file to score many samples at once
10 built-in metrics — Faithfulness, Context Recall, Context Precision, Response Relevancy, Factual Correctness, Noise Sensitivity, Semantic Similarity, BLEU, ROUGE, and more
100% local — no external API keys required; everything runs through Ollama
Zero-build frontend — single self-contained static/index.html with no framework or build step

Prerequisites

Python 3.11 or 3.12
Ollama running locally with at least one LLM and one embedding model pulled
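
Before starting the server it can help to confirm that both models are actually pulled. The following is a minimal sketch, not part of this repository: it queries Ollama's `/api/tags` endpoint and compares the result with the model names you plan to use. The helper names (`normalize`, `missing_models`, `list_pulled_models`) are hypothetical.

```python
import json
import urllib.request


def normalize(name: str) -> str:
    """Ollama lists models with a tag; a bare name refers to the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(available: list[str], required: list[str]) -> list[str]:
    """Return the required model names that do not appear in the available list."""
    pulled = {normalize(n) for n in available}
    return [r for r in required if normalize(r) not in pulled]


def list_pulled_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask Ollama's /api/tags endpoint which models are pulled locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]
```

Calling `missing_models(list_pulled_models(), ["llama3.2", "nomic-embed-text"])` against a running Ollama instance returns an empty list when both models are available.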

Setup

1. Clone and install dependencies

git clone <repo-url>
cd Sado
python install_dependency.py

2. Configure environment

Create a .env file in the project root:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.2
OLLAMA_EMBED_MODEL=nomic-embed-text

Variable	Description
`OLLAMA_BASE_URL`	Base URL of your Ollama instance
`OLLAMA_LLM_MODEL`	Model name for LLM-based metrics (e.g. `llama3.2`, `qwen3`)
`OLLAMA_EMBED_MODEL`	Model name for embedding-based metrics (e.g. `nomic-embed-text`)
`OLLAMA_NUM_CTX`	(optional) Context window size; defaults to `8192`
`OLLAMA_MAX_TOKENS`	(optional) Max tokens for LLM responses; defaults to a value derived from `OLLAMA_NUM_CTX`
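
As an illustration of how these variables fit together, here is a minimal sketch of reading them in Python. It is not the code in ragas_runner.py: the `OllamaSettings` class and `load_settings` function are hypothetical, the first three variables are assumed to be required because the table does not mark them optional, and `OLLAMA_MAX_TOKENS` is left as `None` when unset because the derivation rule is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("OLLAMA_BASE_URL", "OLLAMA_LLM_MODEL", "OLLAMA_EMBED_MODEL")


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str
    llm_model: str
    embed_model: str
    num_ctx: int
    max_tokens: Optional[int]  # None: leave the derivation to the server


def load_settings(env: Mapping[str, str] = os.environ) -> OllamaSettings:
    """Read the variables from the table above, failing early on missing ones."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    max_tokens = env.get("OLLAMA_MAX_TOKENS")
    return OllamaSettings(
        base_url=env["OLLAMA_BASE_URL"].rstrip("/"),
        llm_model=env["OLLAMA_LLM_MODEL"],
        embed_model=env["OLLAMA_EMBED_MODEL"],
        num_ctx=int(env.get("OLLAMA_NUM_CTX", "8192")),  # documented default
        max_tokens=int(max_tokens) if max_tokens else None,
    )
```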

3. Start the server

uvicorn server:app --reload --port 8040

Open http://localhost:8040 in your browser.

Usage

Single Evaluation

Fill in the fields in the Single tab:

Field	Description
`user_input`	The original question or query
`response`	The answer generated by your RAG system
`retrieved_contexts`	One context chunk per line
`reference`	Ground-truth answer (required by some metrics)

Select one or more metrics and click Evaluate.

Batch Evaluation

Upload a file in the Batch tab. Supported formats:

JSON — array of objects:

[
  {
    "user_input": "What is the capital of France?",
    "response": "Paris",
    "retrieved_contexts": ["France is a country in Europe. Its capital is Paris."],
    "reference": "Paris"
  }
]
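
A quick local check before uploading can catch malformed files early. The sketch below is an illustration only and does not mirror the server's own validation: `check_batch` is a hypothetical helper that verifies the top-level array and the type of `retrieved_contexts`.

```python
import json


def check_batch(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top level must be an array of objects"]
    problems = []
    for i, sample in enumerate(data):
        if not isinstance(sample, dict):
            problems.append(f"sample {i}: not an object")
            continue
        contexts = sample.get("retrieved_contexts", [])
        # retrieved_contexts is a list of strings, one per context chunk
        if not (isinstance(contexts, list) and all(isinstance(c, str) for c in contexts)):
            problems.append(f"sample {i}: retrieved_contexts must be a list of strings")
    return problems
```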

CSV — columns matching field names; retrieved_contexts must be a JSON array string:

user_input,response,retrieved_contexts,reference
"What is the capital of France?","Paris","[""France is a country. Its capital is Paris.""]","Paris"

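Hand-writing the doubled quotes in the retrieved_contexts column is error-prone. One way to produce a CSV in the shape shown above is to let Python's `csv` and `json` modules handle the escaping. This is a minimal sketch; `samples_to_csv` is a hypothetical helper, not part of this repository.

```python
import csv
import io
import json

COLUMNS = ["user_input", "response", "retrieved_contexts", "reference"]


def samples_to_csv(samples: list[dict]) -> str:
    """Serialize samples to CSV text, encoding retrieved_contexts as a JSON array string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        row = {col: sample.get(col, "") for col in COLUMNS}
        # The csv module doubles the inner quotes of the JSON string for us.
        row["retrieved_contexts"] = json.dumps(sample.get("retrieved_contexts", []))
        writer.writerow(row)
    return buf.getvalue()
```

Write the returned text to a .csv file and upload it in the Batch tab.
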
Available Metrics

Metric	Required Fields	Needs LLM	Needs Embeddings
Faithfulness	`user_input`, `response`, `retrieved_contexts`	Yes	No
LLM Context Recall	`user_input`, `retrieved_contexts`, `reference`	Yes	No
LLM Context Precision	`user_input`, `retrieved_contexts`, `reference`	Yes	No
Context Precision (No Reference)	`user_input`, `response`, `retrieved_contexts`	Yes	No
Response Relevancy	`user_input`, `response`	Yes	Yes
Factual Correctness	`response`, `reference`	Yes	No
Noise Sensitivity	`user_input`, `retrieved_contexts`, `response`, `reference`	Yes	No
Semantic Similarity	`response`, `reference`	No	Yes
BLEU Score	`response`, `reference`	No	No
ROUGE Score	`response`, `reference`	No	No
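
The Required Fields column determines which metrics a given sample can be scored on. The sketch below copies that column into a dictionary and filters by the fields a sample actually has. It is keyed by the display names in the table, since the full list of metric identifiers used by the API is not given here; `runnable_metrics` is a hypothetical helper.

```python
REQUIRED_FIELDS = {
    "Faithfulness": {"user_input", "response", "retrieved_contexts"},
    "LLM Context Recall": {"user_input", "retrieved_contexts", "reference"},
    "LLM Context Precision": {"user_input", "retrieved_contexts", "reference"},
    "Context Precision (No Reference)": {"user_input", "response", "retrieved_contexts"},
    "Response Relevancy": {"user_input", "response"},
    "Factual Correctness": {"response", "reference"},
    "Noise Sensitivity": {"user_input", "retrieved_contexts", "response", "reference"},
    "Semantic Similarity": {"response", "reference"},
    "BLEU Score": {"response", "reference"},
    "ROUGE Score": {"response", "reference"},
}


def runnable_metrics(sample: dict) -> list[str]:
    """List metrics whose required fields are all present and non-empty in the sample."""
    present = {key for key, value in sample.items() if value}
    return [name for name, needed in REQUIRED_FIELDS.items() if needed <= present]
```

For example, a sample without a `reference` can still be scored on Faithfulness, Context Precision (No Reference), and Response Relevancy.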

API Reference

Endpoint	Method	Description
`/api/metrics`	`GET`	List all available metrics and their metadata
`/api/evaluate/single`	`POST`	Evaluate a single RAG sample (JSON body)
`/api/evaluate/batch`	`POST`	Evaluate a file of samples (multipart form)
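
The batch endpoint expects a multipart form upload. The sketch below builds such a request with the standard library only. Note the assumptions: the form field name `file` is a guess (check server.py for the real name), and any additional form fields, such as a metric selection, are not documented in this README and are therefore omitted. `build_batch_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_batch_request(filename: str, content: bytes,
                        base_url: str = "http://localhost:8040",
                        field_name: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/evaluate/batch.

    field_name="file" is an assumption, not confirmed by this README.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/evaluate/batch",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` once the server is running.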

`POST /api/evaluate/single`

{
  "user_input": "...",
  "response": "...",
  "retrieved_contexts": ["..."],
  "reference": "...",
  "metrics": ["faithfulness", "bleu_score"]
}

Returns:

{
  "scores": {
    "faithfulness": 0.85,
    "bleu_score": 0.42
  }
}
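
The same call can be made from a script. This is a minimal client sketch using only the standard library; it assumes the server is reachable at http://localhost:8040 as in the Setup section, and the function names `build_single_request` and `evaluate_single` are hypothetical.

```python
import json
import urllib.request


def build_single_request(sample: dict, metrics: list[str],
                         base_url: str = "http://localhost:8040") -> urllib.request.Request:
    """Build the JSON POST request for /api/evaluate/single."""
    body = json.dumps({**sample, "metrics": metrics}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/evaluate/single",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate_single(sample: dict, metrics: list[str],
                    base_url: str = "http://localhost:8040") -> dict:
    """Send the request and return the "scores" object from the response."""
    request = build_single_request(sample, metrics, base_url)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["scores"]
```

LLM-based metrics can take a while on local models, so a generous timeout on the client side may be worth adding.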

Docker

docker build -t ragas-server .
docker run -p 8040:8040 --env-file .env ragas-server

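Note: The container needs network access to your Ollama instance. If Ollama runs on the host, set OLLAMA_BASE_URL in .env to use host.docker.internal as the hostname on Mac/Windows (for example http://host.docker.internal:11434), or run the container with --network host on Linux.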

Project Structure

server.py               FastAPI app, REST endpoints, static file serving
ragas_runner.py         Ollama LLM/embedding setup, metric registry, evaluate()
static/index.html       Single-page browser UI
install_dependency.py   Installs all Python dependencies
Dockerfile              Container image definition
.env                    Runtime config (gitignored)

Adding a New Metric

1. Import the metric class from ragas.metrics.collections in ragas_runner.py.
2. Add an entry to METRIC_REGISTRY with required_fields, needs_llm, needs_embedding, and cls.
3. No changes needed in server.py or index.html; both pick it up automatically.
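
A registry entry might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in ragas_runner.py: the registry is assumed to be a dict keyed by metric id, `ExampleMetric` and the key `example_metric` are hypothetical stand-ins for a real class from ragas.metrics.collections, and `check_entry` is a hypothetical helper.

```python
class ExampleMetric:
    """Hypothetical stand-in for a class imported from ragas.metrics.collections."""


# Stand-in for the real METRIC_REGISTRY; its exact shape is an assumption.
METRIC_REGISTRY: dict[str, dict] = {}

METRIC_REGISTRY["example_metric"] = {
    "required_fields": ["response", "reference"],  # sample fields the metric reads
    "needs_llm": False,        # True if the metric calls the Ollama LLM
    "needs_embedding": False,  # True if the metric calls the embedding model
    "cls": ExampleMetric,      # the metric class to instantiate
}

ENTRY_KEYS = {"required_fields", "needs_llm", "needs_embedding", "cls"}


def check_entry(name: str) -> None:
    """Raise KeyError if a registry entry lacks one of the four documented keys."""
    missing = ENTRY_KEYS - set(METRIC_REGISTRY[name])
    if missing:
        raise KeyError(f"{name}: missing {sorted(missing)}")
```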
