RAG

Retrieval-Augmented Generation (RAG) pipeline for ingesting and querying PDF documents using ChromaDB and local language models.

Description

Processes PDF documents, generates embeddings locally, and stores them in ChromaDB. Supports natural-language queries, with results reranked by a CrossEncoder.

Main flow:

run_ingestor.py → src/pdf_ingestor.py: ingests PDFs, generates embeddings and stores them in ChromaDB
prompting-chromadb-reranker.py: queries the vector store with CrossEncoder reranking
prompting-improved-direct-chromadb-context-collection-works-works.py: simplified query version
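
The reranking step in the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in prompting-chromadb-reranker.py: the function names `rerank` and `cross_encoder_scorer` are hypothetical, and the CrossEncoder model name is an assumption, since this README does not say which reranker model the script loads.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score each (query, candidate) pair and return the top_k best, highest first."""
    if not candidates:
        return []
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    # Sort by score, descending; sorted() is stable, so ties keep retrieval order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]


def cross_encoder_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
    """Return a scoring callable backed by a sentence-transformers CrossEncoder.

    The default model name is an assumption made for this sketch.
    """
    from sentence_transformers import CrossEncoder  # imported lazily: heavy dependency

    model = CrossEncoder(model_name)
    return model.predict  # accepts a list of (query, passage) pairs
```

In a setup like this one, `candidates` would be the passages returned by the vector store for the query, and `rerank(question, candidates, cross_encoder_scorer())` would reorder them before they are passed to the language model. Retrieving more candidates than are finally kept gives the reranker room to promote passages the embedding search ranked low.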

Requirements

Python 3.10+
ChromaDB installed locally

Setup

# 1. Copy the config template
cp config.example.py config.py

# 2. Edit config.py with your local paths and LM Studio URL

# 3. Install dependencies
pip install -r requirements.txt
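
For orientation, a config.py might look something like the sketch below. Every variable name and value here is a hypothetical placeholder, because config.example.py is not reproduced in this README; use the names that the template actually defines. The URL shown is LM Studio's usual default local server address and is likewise an assumption about your setup.

```python
from pathlib import Path

# Hypothetical names and values: check config.example.py for the real ones.
PDF_DIR = Path("data/pdfs")            # folder containing the PDFs to ingest
CHROMA_PATH = Path("chroma_db")        # where the local ChromaDB store is persisted
COLLECTION_NAME = "documents"          # name of the ChromaDB collection
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # model named in the Notes section
LM_STUDIO_URL = "http://localhost:1234/v1"  # assumed LM Studio local server endpoint
```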

Usage

# Ingest PDFs
python run_ingestor.py

# Query with reranking
python prompting-chromadb-reranker.py

# Simplified query
python "prompting-improved-direct-chromadb-context-collection-works-works.py"

Structure

RAG/
├── src/
│   └── pdf_ingestor.py                                          # Core ingestion engine
├── run_ingestor.py                                              # Ingestion entry point
├── prompting-chromadb-reranker.py                               # Query with reranking
├── prompting-improved-direct-chromadb-context-collection-...py  # Simplified query
├── ingestion-multi-pdf-PyMuPDF-direct Chroma-testqueries-works.py  # Alternative ingestion
├── config.example.py                                            # Config template (copy to config.py)
└── requirements.txt

Status

Done: PDF ingestion with batching and logging
Done: Local embeddings with sentence-transformers
Done: Query with CrossEncoder reranking
Planned: User interface
Planned: Support for other document formats

Notes

Embeddings generated with sentence-transformers/all-MiniLM-L6-v2
Vector store persisted in local ChromaDB (path configurable in config.py)
Local paths and LM Studio URL are configured in config.py (see config.example.py)
Iterative development scripts are preserved in mabaeyens/RAG-experiments (private)
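
As a rough illustration of how the pieces in these notes fit together, the sketch below embeds text chunks and writes them to a persistent ChromaDB collection in batches, logging progress as it goes. It is not the code in src/pdf_ingestor.py: the function names, the ID scheme, the metadata layout, and the batch size are all assumptions made for the example.

```python
import logging
from typing import Callable, Iterator, Sequence

logger = logging.getLogger("ingest_sketch")


def batched(items: Sequence[str], size: int) -> Iterator[tuple[int, Sequence[str]]]:
    """Yield (start_index, slice) pairs containing at most `size` items each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def ingest_chunks(
    collection,
    chunks: Sequence[str],
    embed: Callable[[list[str]], list[list[float]]],
    source: str,
    batch_size: int = 64,
) -> int:
    """Embed chunks and add them to a Chroma-style collection, one batch at a time."""
    stored = 0
    for start, batch in batched(chunks, batch_size):
        texts = list(batch)
        collection.add(
            ids=[f"{source}-{start + i}" for i in range(len(texts))],  # hypothetical ID scheme
            documents=texts,
            embeddings=embed(texts),
            metadatas=[{"source": source} for _ in texts],
        )
        stored += len(texts)
        logger.info("stored %d/%d chunks from %s", stored, len(chunks), source)
    return stored


def open_local_store(path: str, collection_name: str,
                     model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Return (collection, embed) backed by local ChromaDB and sentence-transformers."""
    import chromadb  # imported lazily so the helpers above have no heavy dependencies
    from sentence_transformers import SentenceTransformer

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(collection_name)
    model = SentenceTransformer(model_name)
    return collection, lambda texts: model.encode(texts).tolist()
```

A call such as `collection, embed = open_local_store("chroma_db", "documents")` followed by `ingest_chunks(collection, chunks, embed, source="report.pdf")` would then persist the chunks under the given path. Writing in batches keeps memory use bounded for large PDFs and gives the log a natural progress checkpoint.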

🛠️ Development Workflow: Human-AI Collaboration

This project is the result of a strategic collaboration between human design and AI-assisted code generation.

Architecture & Logic: Fully defined by the author. This includes system structure, business rules, data flow, and implementation strategy.
Code Generation: The syntactic implementation and line-by-line code writing were performed by Claude Code, following precise, iterative instructions provided by the author.
Supervision & Refinement: All code was manually reviewed, tested, and adjusted to ensure quality, consistency, and compliance with project standards.

This approach demonstrates the ability to direct advanced AI tools to accelerate development without sacrificing creative control or technical quality.

📄 License

This project is licensed under the MIT License. You can find the full text in the LICENSE file.

Note on authorship: Although much of the source code was generated by an AI, the creative direction, architecture, and final integration are human work. Usage rights are granted under the terms of the MIT License.

🚀 Contributing

Feel free to fork this project!

If you find a bug, open an issue.
If you have an improvement, submit a Pull Request.
Feel free to use this code in your own projects!
