Rag and Bone

Cloud-hostable RAG server with Google Gemini, LangChain 1.1, and FastAPI

A RAG (Retrieval-Augmented Generation) server designed for easy cloud deployment, with structured logging, Prometheus metrics, health checks, and deployment scripts built in.

✨ Features

🧠 Core RAG Capabilities

Modern LangChain 1.1 LCEL - Clean, composable chains using LangChain Expression Language
Google Gemini Integration - Powered by Gemini 2.0 Flash and text-embedding-004
Persistent Vector Storage - ChromaDB with persistent storage (survives restarts)
Advanced Retrieval - MMR (maximal marginal relevance) search with score thresholds and configurable k
Source Attribution - Responses include source documents with metadata
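To make the MMR retrieval idea concrete, here is a minimal, dependency-free sketch of the selection rule: each pick balances similarity to the query against similarity to documents already chosen. This illustrates the general technique only; it is not the code this server runs, and the function names, the `lambda_mult` trade-off parameter, and the toy vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5, score_threshold=0.0):
    """Pick up to k document indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    Documents whose query similarity is below score_threshold are never picked.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = [i for i, r in enumerate(relevance) if r >= score_threshold]
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
print(mmr_select(query, docs, k=2))                   # diversity-aware pick
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With the default trade-off, the near-duplicate document 1 is skipped in favor of document 2; with `lambda_mult=1.0` the two most similar documents are returned regardless of overlap.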

🚀 Server Features

Comprehensive Observability - Structured JSON logging + Prometheus metrics
Health Checks - /health and /ready endpoints for orchestration
Rate Limiting - Configurable per-IP rate limiting (60 req/min default)
Error Handling - Global error handling with graceful degradation
Configuration Management - Environment-based config with pydantic-settings
Input Validation - Pydantic models with length limits and sanitization

🛠️ Developer Experience

FastAPI + LangServe - Automatic /invoke, /batch, and /stream endpoints
Interactive API Docs - Auto-generated OpenAPI documentation at /docs
Structured Logging - JSON logs with request tracing and correlation IDs
Metrics - Prometheus-compatible metrics endpoint
Testing Suite - Comprehensive test script for all endpoints
GCP Deployment - Automated deployment scripts for Google Cloud Platform
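As an illustration of what JSON logs with correlation IDs can look like, the sketch below uses only the standard library. It is not the contents of `logging_config.py`; the field names, the `rag_example` logger name, and the `new_correlation_id` helper are hypothetical choices for the example.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Holds the correlation ID for the request currently being handled.
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        return json.dumps(payload)


def new_correlation_id():
    """Generate an ID and bind it to the current context (one per request)."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


logger = logging.getLogger("rag_example")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# In a web server this would be called once at the start of each request.
new_correlation_id()
logger.info("query received")
```

Because the ID lives in a `ContextVar`, every log line emitted while handling one request carries the same value, which is what makes the lines traceable as a group.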

🚀 Quick Start

Prerequisites

Python 3.13+
Google Gemini API key (available from Google AI Studio)

Local Development

# Clone the repository
git clone https://github.com/andynicholson/rag-and-bone.git
cd rag-and-bone

# Set up virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env and add your GOOGLE_API_KEY

# Run the server
python3 app.py

🎉 Your server is now running at http://localhost:8000

Visit http://localhost:8000/docs for the interactive API documentation.

🧪 Test the API

# Run comprehensive tests
./test-api.sh local

# Or test individual endpoints:

# Query with source attribution
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is the answer to this question?", "include_sources": true}'

# Health check
curl "http://localhost:8000/health?detailed=true"

# Prometheus metrics
curl http://localhost:8000/metrics
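The same `/query` call can be made from Python. The sketch below uses only the standard library and mirrors the curl example; the helper names are hypothetical, and because this README does not document the response schema, the decoded JSON is returned as-is.

```python
import json
import urllib.request


def build_query_request(question, include_sources=True, base_url="http://localhost:8000"):
    """Build the same POST /query request as the curl example."""
    body = json.dumps({"input": question, "include_sources": include_sources})
    return urllib.request.Request(
        f"{base_url}/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(question, **kwargs):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the local server running, `query("What is the answer to this question?")` sends the request; `build_query_request` can be inspected on its own without any network access.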

☁️ Deployment

GCP Deployment

Prerequisites

Google Cloud account with billing enabled
gcloud CLI installed and configured
SSH key configured

Deploy Steps

# 1. Edit deployment configuration
vim deploy-config.sh  # Set VM_NAME, ZONE, etc.

# 2. Create VM (first time only)
./recreate-vm.sh

# 3. Deploy application
./deploy.sh

# 4. Test remote deployment
./test-api.sh remote

View Logs

# Real-time logs
gcloud compute ssh rag-server --zone=australia-southeast1-a \
  -- "sudo journalctl -u rag-server -f"

Debug Issues

./debug.sh  # Comprehensive diagnostics

See GCP_DEPLOYMENT_GUIDE.md for detailed instructions.

Docker Deployment

Docker support is coming soon. The commands below show the intended workflow once a Dockerfile is added to the repository:

# Build image
docker build -t rag-and-bone .

# Run container
docker run -p 8000:8000 --env-file .env rag-and-bone

📁 Project Structure

rag-and-bone/
├── 📄 app.py                 # Main FastAPI application
├── ⚙️  config.py              # Configuration management
├── 📝 logging_config.py      # Structured logging
├── 🎯 prompts.py             # Versioned prompt templates
├── 📊 metrics.py             # Prometheus metrics
├── 🔧 middleware.py          # Rate limiting & error handling
├── 📥 ingest.py              # Document ingestion CLI
├── 🧪 load_sample_data.py   # Load sample docs for testing
├── 🔍 inspect_chroma.py     # ChromaDB inspection tool
├── 📋 requirements.txt       # Python dependencies
├── 🌍 env.example            # Environment variables template
├── 🚀 deploy.sh             # Deployment automation
├── 🛠️  recreate-vm.sh        # VM creation script
├── 🐛 debug.sh              # Debugging helper
├── 🧪 test-api.sh           # API testing script
├── ⚡ startup-script.sh     # GCP VM initialization
├── 🔧 rag-server.service    # Systemd service
├── 📖 GCP_DEPLOYMENT_GUIDE.md
└── 📜 LICENSE

🏗️ Architecture

RAG Pipeline Flow

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Document   │ --> │   Embedding  │ --> │   ChromaDB    │
│   Loader    │     │ (text-embed) │     │ Vector Store  │
└─────────────┘     └──────────────┘     └───────────────┘
                                                  │
                                                  ↓
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Response   │ <-- │  Gemini 2.0  │ <-- │   Retriever   │
│   (JSON)    │     │    Flash     │     │  (MMR Search) │
└─────────────┘     └──────────────┘     └───────────────┘

LangChain LCEL Chain

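from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# retriever, format_docs, prompt, and llm are assumed to be defined earlier.
# The incoming question is sent to the retriever and also passed through as "input".
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)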
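The chain refers to a `format_docs` helper that this README does not show. A plausible minimal version, assuming each retrieved document exposes `page_content` and `metadata` as LangChain's `Document` does, might look like the sketch below. The `Doc` stand-in class and the `collect_sources` helper are hypothetical and exist only to keep the example dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document with text and metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def collect_sources(docs):
    """List the distinct 'source' metadata values, in first-seen order."""
    seen = []
    for doc in docs:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen


retrieved = [
    Doc("Chunk one.", {"source": "a.md"}),
    Doc("Chunk two.", {"source": "b.md"}),
    Doc("Chunk three.", {"source": "a.md"}),
]
print(format_docs(retrieved))
print(collect_sources(retrieved))
```

The joined string is what fills the `context` slot of the prompt, while the collected sources are the kind of information a response with source attribution would carry.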

🌐 API Endpoints

Health & Monitoring

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Root endpoint with API information |
| `/health` | GET | Health check (add `?detailed=true` for stats) |
| `/metrics` | GET | Prometheus metrics |

Query Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/query` | POST | Enhanced query with source attribution |
| `/rag/invoke` | POST | Single query (LangServe) |
| `/rag/batch` | POST | Multiple queries (LangServe) |
| `/rag/stream` | POST | Streaming response (LangServe) |
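The LangServe routes follow LangServe's usual request convention: a single value under `input` for invoke and stream, and a list under `inputs` for batch. The sketch below only builds those request bodies; the helper names are hypothetical, and the exact response fields should be checked against `/docs` on a running server.

```python
import json


def invoke_payload(question):
    """Body for POST /rag/invoke and POST /rag/stream: one input value."""
    return json.dumps({"input": question})


def batch_payload(questions):
    """Body for POST /rag/batch: a list of inputs under the plural key."""
    return json.dumps({"inputs": list(questions)})


print(invoke_payload("What is RAG?"))
print(batch_payload(["What is RAG?", "What is MMR?"]))
```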

Document Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/ingest` | POST | Ingest a document |

Documentation

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/docs` | GET | Interactive OpenAPI documentation |

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google Gemini API key | Required |
| `ENVIRONMENT` | Environment name | `development` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `RATE_LIMIT_PER_MINUTE` | Rate limit per IP | `60` |
| `CHROMA_PERSIST_DIRECTORY` | ChromaDB storage path | `./chroma_db` |

See env.example for all configuration options.
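To show how these variables and defaults fit together, here is a dependency-free sketch of a settings loader. The project itself uses pydantic-settings in `config.py`; this standard-library stand-in is only an illustration, and the `Settings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """The documented variables, with the documented defaults."""

    google_api_key: str
    environment: str = "development"
    log_level: str = "INFO"
    rate_limit_per_minute: int = 60
    chroma_persist_directory: str = "./chroma_db"


def load_settings(env=None):
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # The only variable without a default: fail early if it is missing.
        raise ValueError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        environment=env.get("ENVIRONMENT", "development"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
    )


settings = load_settings({"GOOGLE_API_KEY": "example-key", "LOG_LEVEL": "DEBUG"})
print(settings.log_level, settings.rate_limit_per_minute)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the loader easy to test.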

Models

Embeddings: models/text-embedding-004
LLM: gemini-2.0-flash

🛠️ Development

Code Formatting

# Format code
black app.py

# Sort imports
isort app.py

VS Code/Cursor will auto-format on save (configured in pyproject.toml).

Adding Documents

Load Sample Data (for testing):

# Load sample documents
python3 load_sample_data.py

# Reset collection and load sample data
python3 load_sample_data.py --reset

Using CLI:

python3 ingest.py /path/to/documents --name "My Docs"

Using API:

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "Document text", "metadata": {"source": "manual"}}'

Inspecting ChromaDB

# Show statistics
python3 inspect_chroma.py stats

# List all documents
python3 inspect_chroma.py list

# Search for documents
python3 inspect_chroma.py search "your query"

# Filter by metadata
python3 inspect_chroma.py filter --field category --value benefits

🔒 Security Considerations

Rate Limiting

⚠️ The server includes in-memory rate limiting (60 requests/minute per IP by default). This state resets on server restart and is not shared between worker processes or instances. For production deployments, consider persistent rate limiting backed by Redis.
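For readers unfamiliar with in-memory rate limiting, the sketch below shows one common approach, a per-IP sliding window. It is not the code in `middleware.py`, which this README does not show; the class name, the sliding-window algorithm, and the injectable `clock` are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        """Return True and record the request if the client is under its limit."""
        now = self.clock()
        recent = self.hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = InMemoryRateLimiter(limit=60, window=60.0)
print(limiter.allow("203.0.113.7"))
```

All state lives in the `hits` dictionary of a single process, which is why it disappears on restart; a Redis-backed limiter would keep the same logic but store the timestamps externally.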

CORS Configuration

⚠️ The default configuration allows all origins (CORS_ORIGINS=["*"]). For production, restrict it to the origins you actually serve:

CORS_ORIGINS=["https://yourdomain.com","https://app.yourdomain.com"]

Input Validation

API endpoints enforce maximum input lengths:

Query input: 2,000 characters
Document ingestion: 50,000 characters

Adjust these limits in app.py based on your requirements.
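The length limits above can be pictured with a small validation sketch. The project uses Pydantic models for this; the standard-library version below is a stand-in, and the names, the whitespace stripping, the empty-input check, and the `include_sources` default are all assumptions rather than the behavior of `app.py`.

```python
from dataclasses import dataclass

MAX_QUERY_CHARS = 2_000
MAX_DOCUMENT_CHARS = 50_000


def validate_text(value, max_chars, field_name):
    """Strip surrounding whitespace, then reject empty or over-long input."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{field_name} must not be empty")
    if len(cleaned) > max_chars:
        raise ValueError(f"{field_name} exceeds {max_chars} characters")
    return cleaned


@dataclass
class QueryRequest:
    """Hypothetical request model for the query endpoint."""

    input: str
    include_sources: bool = False

    def __post_init__(self):
        self.input = validate_text(self.input, MAX_QUERY_CHARS, "input")


@dataclass
class IngestRequest:
    """Hypothetical request model for the ingest endpoint."""

    content: str

    def __post_init__(self):
        self.content = validate_text(self.content, MAX_DOCUMENT_CHARS, "content")


print(QueryRequest("  What is RAG?  ").input)
```

Rejecting oversized input before it reaches the embedding model or the LLM keeps a single request from consuming a disproportionate share of API quota.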

💰 Cost Estimates

GCP VM

| Instance Type | Monthly Cost | Recommended For |
|---------------|--------------|-----------------|
| `e2-medium` | ~$24 | Production workloads |
| `e2-small` | ~$12 | Development/testing |

Gemini API

Free tier available with rate limits
Pay-per-use after free tier
See Google AI Pricing

🐛 Troubleshooting

Common Issues

Server won't start

# Check if port is already in use
lsof -i :8000

# Check logs
tail -f logs/rag-server.log

ChromaDB errors

# Inspect database
python3 inspect_chroma.py stats

# Reset database (WARNING: deletes all data)
rm -rf ./chroma_db

# Optionally load sample data for testing
python3 load_sample_data.py
# Or reset and load sample data
python3 load_sample_data.py --reset

Gemini API errors

Check your API key is set correctly in .env
Verify you haven't exceeded free tier limits
Check Google AI Status

See GCP_DEPLOYMENT_GUIDE.md for comprehensive troubleshooting.

📄 License

This project is dual-licensed:

Open Source: GNU General Public License v3.0 - Free for open source projects
Commercial: Contact intothemist@gmail.com for commercial licensing options

Copyright

🤝 Contributing

Contributions are welcome! Here's how:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow Black code style
Add tests for new features
Update documentation as needed
Use conventional commits (feat:, fix:, docs:, etc.)

🙏 Acknowledgments

Built with:

LangChain - RAG framework
FastAPI - Web framework
Google Gemini - LLM and embeddings
ChromaDB - Vector database

📬 Contact

A P Nicholson - intothemist@gmail.com

Project Link: https://github.com/andynicholson/rag-and-bone

Made with ❤️ and ☕
