Sherlock RAG

A retrieval-augmented generation (RAG) application that lets you ask questions about the complete Sherlock Holmes canon.

How It Works

User Question
     │
     ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Embed the  │────▶│  Vector      │────▶│  Top-K      │
│  question   │     │  similarity  │     │  relevant   │
│             │     │  search      │     │  passages   │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                                                ▼
                                         ┌─────────────┐
                         Answer ◀────────│  LLM with   │
                         + Sources       │  context    │
                                         └─────────────┘

Ingestion: All 9 books are downloaded from Project Gutenberg, split, chunked, and embedded into a ChromaDB vector store.
Retrieval: When you ask a question, it's embedded and compared against stored chunks via cosine similarity. The top 15 most relevant passages are retrieved.
Generation: The retrieved passages are sent to an LLM as context along with your question.
Sources: Every answer includes citations showing which stories the retrieved passages came from.
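The retrieval step above can be illustrated with a minimal, dependency-free sketch. This is not the project's code: ChromaDB performs the search in the real app, and the function names `cosine_similarity` and `top_k_passages` as well as the toy 2-D vectors are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_passages(question_vec, chunks, k=15):
    """Return up to k (score, text) pairs, most similar first.

    chunks is a list of (text, vector) pairs (hypothetical layout).
    """
    scored = [(cosine_similarity(question_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-D vectors stand in for real embeddings (illustrative data only).
chunks = [
    ("hound on the moor", [1.0, 0.0]),
    ("a scandal in Bohemia", [0.0, 1.0]),
    ("footprints of a gigantic hound", [0.9, 0.1]),
]
results = top_k_passages([1.0, 0.0], chunks, k=2)
```

With real embeddings the vectors have hundreds of dimensions, but the ranking logic is the same: score every stored chunk against the question and keep the best `k`.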

Quick Start

Prerequisites

Python 3.11+
An OpenAI API key (or any OpenAI-compatible API)

Setup

# Clone the repo
git clone https://github.com/C187/Sherlock-RAG.git
cd Sherlock-RAG

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r backend/requirements.txt

# Set up your API key
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# Download the Sherlock Holmes texts
python download_texts.py

# Ingest texts
python backend/ingest.py

# Start the server
uvicorn backend.server:app --reload

Open http://localhost:8000 in your browser.

Using Local Models

Sherlock RAG works with any OpenAI-compatible API, including locally hosted model servers that expose one.
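As a rough sketch of what switching endpoints might look like, the helper below resolves the LLM settings from environment-style variables. Only `OPENAI_API_KEY` and the default model `gpt-4.1-nano` come from this README; the variable names `OPENAI_BASE_URL` and `LLM_MODEL`, the function `resolve_llm_config`, and the example local URL and model name are assumptions. Check `.env.example` for the names the project actually reads.

```python
import os

DEFAULT_BASE_URL = "https://api.openai.com/v1"
DEFAULT_MODEL = "gpt-4.1-nano"  # Default model named in this README


def resolve_llm_config(env=None):
    """Build LLM settings from a mapping of environment variables.

    OPENAI_BASE_URL and LLM_MODEL are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/"),
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
    }


# Example: pointing at a local OpenAI-compatible server (illustrative values).
local = resolve_llm_config({
    "OPENAI_BASE_URL": "http://localhost:11434/v1/",
    "OPENAI_API_KEY": "not-needed-locally",
    "LLM_MODEL": "llama3",
})
```

Many local servers ignore the API key, but OpenAI-compatible clients usually still require a non-empty value, hence the placeholder.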

Project Structure

Sherlock-RAG/
├── backend/
│   ├── server.py          # FastAPI app with RAG query endpoint
│   ├── ingest.py          # Text chunking and vector store ingestion
│   └── requirements.txt
├── frontend/
│   ├── index.html         # Chat interface
│   ├── styles.css         # UI styles
│   ├── app.js             # Client logic
│   └── baker-street.jpg   # Background image
├── data/
│   ├── raw/               # Downloaded text files (gitignored)
│   └── chroma_db/         # Vector store (gitignored)
├── download_texts.py      # Fetches texts from Project Gutenberg
├── .env.example
└── README.md

Tech Stack

Vector Store: ChromaDB with sentence-transformer embeddings (MiniLM-L6-v2)
Backend: FastAPI (Python)
LLM: OpenAI API (default: gpt-4.1-nano, configurable for any OpenAI-compatible endpoint)
Frontend: Vanilla HTML/CSS/JS

API

`POST /api/query`

Request:

{
  "question": "Which cases involve poison?",
  "top_k": 15
}

Response:

{
  "answer": "Several cases in the Holmes canon involve poison...",
  "sources": [
    {
      "book": "A Study in Scarlet",
      "story": "THE LAURISTON GARDENS MYSTERY",
      "text": "..."
    }
  ]
}
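For calling the endpoint from a script, a minimal client using only the standard library might look like the sketch below. It assumes the server is running at `http://localhost:8000` as in the Quick Start and that the request and response match the shapes shown above; the helper names `build_query_request`, `ask`, and `format_sources` are illustrative and not part of the project.

```python
import json
import urllib.request


def build_query_request(question, top_k=15, base_url="http://localhost:8000"):
    """Prepare a POST request for /api/query without sending it."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, top_k=15, base_url="http://localhost:8000"):
    """Send the question to a running server and return the parsed JSON reply."""
    request = build_query_request(question, top_k, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def format_sources(result):
    """Render each cited source as 'Book: STORY'."""
    return [f'{s["book"]}: {s["story"]}' for s in result.get("sources", [])]


# Usage, with the server started via uvicorn:
#   result = ask("Which cases involve poison?")
#   print(result["answer"])
#   print("\n".join(format_sources(result)))
```

Separating request construction from sending keeps the payload easy to inspect or test without a live server.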

`GET /api/health`

`GET /api/stats`

License

MIT License. The Sherlock Holmes texts are in the public domain.
