🧠 RAG from Scratch

Retrieval-Augmented Generation implemented step by step in Python — no magic, just code you can understand and extend.

🤔 What is RAG and why does it matter?

Large Language Models (LLMs) like GPT-4 are powerful but have a critical limitation: they don't know what they don't know. Their knowledge is frozen at training time, so they cannot see private or recent data, and when asked about it they tend to hallucinate plausible-sounding answers.

RAG addresses this by giving the LLM an external knowledge base it can look things up in at inference time.

Instead of:

User question → LLM → Answer (possibly hallucinated)

RAG does:

User question → Search knowledge base → Inject relevant context → LLM → Grounded answer

This pattern underpins many enterprise AI applications today.

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                   INDEXING PIPELINE                  │
│                                                     │
│  Documents → Chunking → Embeddings → Vector Store   │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                  RETRIEVAL PIPELINE                  │
│                                                     │
│  Query → Embed Query → Similarity Search → Top-K    │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                 GENERATION PIPELINE                  │
│                                                     │
│  Context + Query → Prompt Template → LLM → Answer   │
└─────────────────────────────────────────────────────┘
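
The three stages can be sketched in plain Python with no external libraries. This is a toy illustration, not the code in this repo: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the vector store, and generation stops at building the prompt that would be sent to an LLM. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, chunk_size=200):
    """Indexing: documents -> chunks -> embeddings -> in-memory store."""
    store = []
    for doc in documents:
        for start in range(0, len(doc), chunk_size):
            chunk = doc[start:start + chunk_size]
            store.append((chunk, embed(chunk)))
    return store


def retrieve(store, query, k=2):
    """Retrieval: embed the query, rank chunks by similarity, keep top-k."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context_chunks, query):
    """Generation (first half): fill a prompt template with context + query."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


docs = [
    "FAISS is a library for similarity search over dense vectors.",
    "Chroma is an open-source embedding database.",
]
store = build_index(docs)
top_chunks = retrieve(store, "What is FAISS used for?", k=1)
prompt = build_prompt(top_chunks, "What is FAISS used for?")
print(prompt)
```

In a real pipeline, each function would be swapped for the corresponding component (an embedding model, a vector store, an LLM call) while the data flow stays the same.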

📁 Repository Structure

rag-from-scratch/
│
├── 01_basic_rag/
│   ├── simple_rag.py          # Minimal RAG in ~50 lines
│   └── README.md              # Explanation of each step
│
├── 02_chunking_strategies/
│   ├── fixed_chunking.py      # Split by character count
│   ├── semantic_chunking.py   # Split by meaning
│   └── README.md
│
├── 03_embeddings/
│   ├── openai_embeddings.py   # Using OpenAI Ada
│   ├── local_embeddings.py    # Using HuggingFace (free)
│   └── README.md
│
├── 04_vector_stores/
│   ├── faiss_store.py         # Local vector store
│   ├── chroma_store.py        # ChromaDB integration
│   └── README.md
│
├── 05_advanced_rag/
│   ├── reranking.py           # Improve retrieval with reranking
│   ├── hyde.py                # Hypothetical Document Embeddings
│   └── README.md
│
├── notebooks/
│   └── rag_walkthrough.ipynb  # Full interactive tutorial
│
├── data/
│   └── sample_docs/           # Sample documents for testing
│
├── requirements.txt
├── .env.example
└── README.md
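
As a rough idea of what splitting by character count involves, here is a minimal sketch of fixed-size chunking with overlap. It is not the contents of `fixed_chunking.py`; the function name `chunk_text` and its parameters are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text cut at a
    boundary keeps some surrounding context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

With these toy settings, each chunk starts 7 characters after the previous one, so neighbouring chunks share their last and first 3 characters.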

🚀 Quick Start

1. Clone and install

git clone https://github.com/SuarezPM/rag-from-scratch.git
cd rag-from-scratch
pip install -r requirements.txt

2. Set up your API key

cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
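
A missing key usually surfaces later as an authentication error, so it can help to check for it up front. The helper below is a hypothetical sketch using only the standard library. It assumes the variable has already been loaded into the environment (for example by calling `load_dotenv()` from python-dotenv, which is listed in the requirements); the function name `require_api_key` is not part of this repo.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Copy .env.example to .env and add your key."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```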

3. Run the basic RAG example

python 01_basic_rag/simple_rag.py

🔑 Core Concepts Covered

Concept                   What you'll learn
Document Loading          How to ingest PDFs, text files, web pages
Text Chunking             Why chunk size matters and how to choose it
Embeddings                How text becomes numbers that capture meaning
Vector Similarity         Cosine similarity and dot product: how retrieval works
Prompt Engineering        How to inject context so the LLM uses it correctly
Hallucination Prevention  Techniques to keep answers grounded in sources
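
To make the vector similarity row concrete, the sketch below compares dot product and cosine similarity on made-up 2-D vectors (real embeddings have hundreds or thousands of dimensions). The vectors and variable names are illustrative assumptions. The point is that dot product rewards vector length, while cosine similarity compares direction only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
short_on_topic = [0.5, 0.0]   # same direction as the query, small magnitude
long_off_topic = [2.0, 2.0]   # different direction, large magnitude

for name, vec in [("short_on_topic", short_on_topic), ("long_off_topic", long_off_topic)]:
    print(name, "dot =", dot(query, vec), "cosine =", round(cosine_similarity(query, vec), 3))
```

Here the dot product ranks `long_off_topic` first, while cosine similarity ranks `short_on_topic` first. When all vectors are normalized to unit length, the two measures produce the same ranking.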

💡 The Minimal RAG — 50 lines

Here's the core idea, stripped to its essence:

from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Import paths assume the langchain 0.1+ package split (pre-1.0 layout);
# the older `langchain.document_loaders`-style paths are deprecated.

# 0. CONFIGURE: read OPENAI_API_KEY from .env into the environment
load_dotenv()

# 1. LOAD your documents
loader = TextLoader("data/my_document.txt")
docs = loader.load()

# 2. CHUNK: split into pieces of at most 500 characters, overlapping by 50
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. EMBED: convert each chunk to a vector and index it in FAISS
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

# 4. RETRIEVE + GENERATE: fetch the 3 most similar chunks for each query
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

# 5. ASK: `invoke` replaces the deprecated `run`; the answer is under "result"
result = qa_chain.invoke({"query": "What does the document say about X?"})
print(result["result"])
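
To make step 4 less opaque: `RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt. The sketch below imitates that idea in plain Python. The template wording, the size budget, and the function name `stuff_prompt` are illustrative assumptions, not LangChain's actual prompt or implementation.

```python
PROMPT_TEMPLATE = """Use only the context below to answer the question.
If the context does not contain the answer, say "I don't know".

Context:
{context}

Question: {question}
Answer:"""


def stuff_prompt(chunks, question, max_chars=2000):
    """Concatenate retrieved chunks into one prompt, within a size budget."""
    selected = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before overflowing the context budget
        selected.append(entry)
        used += len(entry)
    return PROMPT_TEMPLATE.format(context="\n\n".join(selected), question=question)


# Hypothetical retrieved chunks, standing in for the retriever's output.
retrieved = [
    "Chunk size is set to 500 characters.",
    "Chunk overlap is set to 50 characters.",
    "The retriever returns the top 3 chunks.",
]
print(stuff_prompt(retrieved, "How many chunks does the retriever return?"))
```

Numbering the chunks and telling the model to admit when the context lacks an answer are two simple ways to keep the output grounded in the sources.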

📚 Learning Path

This repo is structured to go from zero to production-ready:

1. Start with 01_basic_rag/ to understand the full pipeline end to end
2. Then 02_chunking_strategies/, because chunking affects quality more than people think
3. Then 03_embeddings/ to understand what embeddings actually are
4. Then 04_vector_stores/ for local vs hosted stores and their tradeoffs
5. Finally 05_advanced_rag/ for techniques used in production systems
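
As a preview of the reranking idea in the last step, here is a toy two-stage retrieval sketch: a cheap first pass selects candidates, and a second scorer reorders only that shortlist. Both scoring functions are simple stand-ins (word overlap and adjacent-word matching) chosen so the example runs without a model; real systems typically pair an embedding search with a dedicated reranking model. All names are hypothetical and not taken from `reranking.py`.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_pass_score(query, doc):
    """Cheap score: number of distinct query words that appear in the doc."""
    return len(set(tokens(query)) & set(tokens(doc)))


def rerank_score(query, doc):
    """Costlier score: count query word pairs that appear adjacent, in order."""
    q, d = tokens(query), tokens(doc)
    doc_bigrams = set(zip(d, d[1:]))
    return sum(1 for pair in zip(q, q[1:]) if pair in doc_bigrams)


def retrieve_and_rerank(query, docs, candidates=3, k=1):
    # Stage 1: keep a generous candidate set using the cheap score.
    shortlist = sorted(
        docs, key=lambda d: first_pass_score(query, d), reverse=True
    )[:candidates]
    # Stage 2: reorder only the shortlist using the costlier score.
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)[:k]


docs = [
    "Store the vector, then search the index for similarity.",
    "Vector similarity search finds the nearest neighbours.",
    "Chunking splits documents into pieces.",
]
best = retrieve_and_rerank("vector similarity search", docs)
print(best)
```

The first two documents tie on word overlap, and the second stage breaks the tie in favour of the one that contains the query words as a phrase.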

🛠️ Requirements

langchain>=0.1.0
langchain-openai>=0.0.5
langchain-community
faiss-cpu>=1.7.4
chromadb>=0.4.0
python-dotenv>=1.0.0
tiktoken>=0.5.0

👤 Author

Pablo Suarez — AI Software Engineer
Bridging the gap between data science research and production-ready AI systems.

📄 License

MIT — use it, learn from it, build on it.
