Rag and Bone

Cloud hostable RAG server with Google Gemini, LangChain 1.1, and FastAPI

License: GPL v3 | Python 3.13+ | FastAPI | LangChain | Code style: black

A RAG (Retrieval-Augmented Generation) server designed for easy cloud deployment with comprehensive observability and professional tooling.

Features • Quick Start • Deployment • API Docs • Contributing


✨ Features

🧠 Core RAG Capabilities

  • Modern LangChain 1.1 LCEL - Clean, composable chains using LangChain Expression Language
  • Google Gemini Integration - Powered by Gemini 2.0 Flash and text-embedding-004
  • Persistent Vector Storage - ChromaDB with persistent storage (survives restarts)
  • Advanced Retrieval - MMR search with score thresholds and configurable k
  • Source Attribution - Responses include source documents with metadata
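
To make the "MMR search" bullet more concrete, here is a minimal, dependency-free sketch of maximal marginal relevance selection. It is not the ChromaDB or LangChain implementation; the function names, the `lambda_mult` default, and the toy vectors are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k docs, trading relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: documents 0 and 1 are near-duplicates, document 2 is different.
toy_docs = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]
print(mmr_select([1.0, 0.0], toy_docs, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` the selection reduces to plain relevance ranking, while lower values favour diversity among the returned chunks.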

🚀 Server Features

  • Comprehensive Observability - Structured JSON logging + Prometheus metrics
  • Health Checks - /health and /ready endpoints for orchestration
  • Rate Limiting - Configurable per-IP rate limiting (60 req/min default)
  • Error Handling - Global error handling with graceful degradation
  • Configuration Management - Environment-based config with pydantic-settings
  • Input Validation - Pydantic models with length limits and sanitization

🛠️ Developer Experience

  • FastAPI + LangServe - Automatic /invoke, /batch, and /stream endpoints
  • Interactive API Docs - Auto-generated OpenAPI documentation at /docs
  • Structured Logging - JSON logs with request tracing and correlation IDs
  • Metrics - Prometheus-compatible metrics endpoint
  • Testing Suite - Comprehensive test script for all endpoints
  • GCP Deployment - Automated deployment scripts for Google Cloud Platform
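
The structured logging bullet can be illustrated with the standard library alone. The sketch below is an assumption about the general approach, not the contents of logging_config.py; the `JsonFormatter` class, the `rag_sketch` logger name, and the `correlation_id` field name are hypothetical.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via the `extra` argument on the logging call; None if absent.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def new_correlation_id() -> str:
    """Generate an ID that ties together all log lines for one request."""
    return uuid.uuid4().hex


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("rag_sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# One ID per request lets you filter every log line for that request.
logger.info("query received", extra={"correlation_id": new_correlation_id()})
```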

🚀 Quick Start

Prerequisites

  • Python 3.13+
  • A Google Gemini API key (GOOGLE_API_KEY)

Local Development

# Clone the repository
git clone https://github.com/andynicholson/rag-and-bone.git
cd rag-and-bone

# Set up virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env and add your GOOGLE_API_KEY

# Run the server
python3 app.py

🎉 Your server is now running at http://localhost:8000

Visit http://localhost:8000/docs for the interactive API documentation.

🧪 Test the API

# Run comprehensive tests
./test-api.sh local

# Or test individual endpoints:

# Query with source attribution
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is the answer to this question?", "include_sources": true}'

# Health check
curl "http://localhost:8000/health?detailed=true"

# Prometheus metrics
curl http://localhost:8000/metrics
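
The same /query call can be made from Python using only the standard library. This is a sketch: the helper names `build_query_request` and `send` are hypothetical, and because the response schema is not documented here, the result is returned as a plain dict.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local server address from Quick Start


def build_query_request(
    question: str, include_sources: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    body = json.dumps({"input": question, "include_sources": include_sources})
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_query_request("What is the answer to this question?"))` mirrors the curl command above.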

☁️ Deployment

GCP Deployment


Prerequisites

  • Google Cloud account with billing enabled
  • gcloud CLI installed and configured
  • SSH key configured

Deploy Steps

# 1. Edit deployment configuration
vim deploy-config.sh  # Set VM_NAME, ZONE, etc.

# 2. Create VM (first time only)
./recreate-vm.sh

# 3. Deploy application
./deploy.sh

# 4. Test remote deployment
./test-api.sh remote

View Logs

# Real-time logs
gcloud compute ssh rag-server --zone=australia-southeast1-a \
  -- "sudo journalctl -u rag-server -f"

Debug Issues

./debug.sh  # Comprehensive diagnostics

See GCP_DEPLOYMENT_GUIDE.md for detailed instructions.

Docker Deployment

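Docker support is coming soon. The project structure does not yet list a Dockerfile, so treat the commands below as the planned workflow:
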
# Build image
docker build -t rag-and-bone .

# Run container
docker run -p 8000:8000 --env-file .env rag-and-bone

📁 Project Structure

rag-and-bone/
├── 📄 app.py                 # Main FastAPI application
├── ⚙️  config.py              # Configuration management
├── 📝 logging_config.py      # Structured logging
├── 🎯 prompts.py             # Versioned prompt templates
├── 📊 metrics.py             # Prometheus metrics
├── 🔧 middleware.py          # Rate limiting & error handling
├── 📥 ingest.py              # Document ingestion CLI
├── 🧪 load_sample_data.py   # Load sample docs for testing
├── 🔍 inspect_chroma.py     # ChromaDB inspection tool
├── 📋 requirements.txt       # Python dependencies
├── 🌍 env.example            # Environment variables template
├── 🚀 deploy.sh             # Deployment automation
├── 🛠️  recreate-vm.sh        # VM creation script
├── 🐛 debug.sh              # Debugging helper
├── 🧪 test-api.sh           # API testing script
├── ⚡ startup-script.sh     # GCP VM initialization
├── 🔧 rag-server.service    # Systemd service
├── 📖 GCP_DEPLOYMENT_GUIDE.md
└── 📜 LICENSE

🏗️ Architecture

RAG Pipeline Flow

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Document   │ --> │   Embedding  │ --> │   ChromaDB    │
│   Loader    │     │ (text-embed) │     │ Vector Store  │
└─────────────┘     └──────────────┘     └───────────────┘
                                                  │
                                                  ↓
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Response   │ <-- │  Gemini 2.0  │ <-- │   Retriever   │
│   (JSON)    │     │    Flash     │     │  (MMR Search) │
└─────────────┘     └──────────────┘     └───────────────┘

LangChain LCEL Chain

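from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# retriever, format_docs, prompt, and llm are assumed to be defined elsewhere.
# The dict fans the question out: one copy goes through retrieval to build
# the context, the other is passed through unchanged as "input".
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)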
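
The chain above refers to a `format_docs` helper that is not shown. A common shape for such a helper is sketched below; the `Doc` dataclass is a hypothetical stand-in that only mirrors the `page_content` and `metadata` attributes of a LangChain document, and the blank-line separator is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join retrieved chunks into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


print(format_docs([Doc("First chunk."), Doc("Second chunk.")]))
```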

🌐 API Endpoints

Health & Monitoring

| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Root endpoint with API information |
| /health | GET | Health check (add ?detailed=true for stats) |
| /metrics | GET | Prometheus metrics |
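
The /metrics endpoint serves the Prometheus text format, which is line-oriented and easy to inspect by hand. The parser below is a minimal sketch for quick checks: it ignores optional timestamps, and the metric names in the sample are invented, not taken from metrics.py.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample line ('name{labels} value') to its float value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical sample in the Prometheus text format.
SAMPLE = """\
# HELP http_requests_total Total requests
# TYPE http_requests_total counter
http_requests_total{method="POST",path="/query"} 3
process_up 1
"""
print(parse_metrics(SAMPLE))
```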

Query Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /query | POST | Enhanced query with source attribution |
| /rag/invoke | POST | Single query (LangServe) |
| /rag/batch | POST | Multiple queries (LangServe) |
| /rag/stream | POST | Streaming response (LangServe) |
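
The LangServe endpoints use a different request body from /query. The sketch below shows the usual LangServe shapes, with a single value under `input` for /rag/invoke and a list under `inputs` for /rag/batch. It assumes the chain accepts a plain question string, and the helper names are hypothetical.

```python
import json
from typing import List


def invoke_payload(question: str) -> dict:
    """Body for POST /rag/invoke: a single input value."""
    return {"input": question}


def batch_payload(questions: List[str]) -> dict:
    """Body for POST /rag/batch: a list of input values."""
    return {"inputs": list(questions)}


print(json.dumps(invoke_payload("What is RAG?")))
print(json.dumps(batch_payload(["What is RAG?", "What is MMR?"])))
```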

Document Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| /ingest | POST | Ingest a document |

Documentation

| Endpoint | Method | Description |
|----------|--------|-------------|
| /docs | GET | Interactive OpenAPI documentation |

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| GOOGLE_API_KEY | Google Gemini API key | Required |
| ENVIRONMENT | Environment name | development |
| LOG_LEVEL | Logging level | INFO |
| RATE_LIMIT_PER_MINUTE | Rate limit per IP | 60 |
| CHROMA_PERSIST_DIRECTORY | ChromaDB storage path | ./chroma_db |

See env.example for all configuration options.
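
As an illustration of how these variables and defaults fit together, here is a standard-library sketch. The project itself uses pydantic-settings, so this is a simplified stand-in rather than the contents of config.py; the `Settings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Values from the environment variable table, with its defaults."""

    google_api_key: str
    environment: str = "development"
    log_level: str = "INFO"
    rate_limit_per_minute: int = 60
    chroma_persist_directory: str = "./chroma_db"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, failing fast if the API key is missing."""
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        environment=env.get("ENVIRONMENT", "development"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./chroma_db"),
    )
```

Passing a plain dict, for example `load_settings({"GOOGLE_API_KEY": "..."})`, makes the loader easy to exercise without touching the real environment.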

Models

  • Embeddings: models/text-embedding-004
  • LLM: gemini-2.0-flash

🛠️ Development

Code Formatting

# Format code
black app.py

# Sort imports
isort app.py

VS Code/Cursor will auto-format on save (configured in pyproject.toml).

Adding Documents

Load Sample Data (for testing):

# Load sample documents
python3 load_sample_data.py

# Reset collection and load sample data
python3 load_sample_data.py --reset

Using CLI:

python3 ingest.py /path/to/documents --name "My Docs"

Using API:

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "Document text", "metadata": {"source": "manual"}}'

Inspecting ChromaDB

# Show statistics
python3 inspect_chroma.py stats

# List all documents
python3 inspect_chroma.py list

# Search for documents
python3 inspect_chroma.py search "your query"

# Filter by metadata
python3 inspect_chroma.py filter --field category --value benefits

🔒 Security Considerations

Rate Limiting

⚠️ The server includes in-memory rate limiting (60 requests/minute per IP by default). Because the counters live in process memory, they reset on server restart and are not shared between multiple server processes or instances. For production deployments, consider persistent rate limiting backed by a shared store such as Redis.
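
A minimal sketch of per-IP, in-memory rate limiting shows why this state is lost on restart: the counters are just a dictionary in the server process. This is not the code in middleware.py; the `SlidingWindowLimiter` class and its sliding-window strategy are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(
        self,
        limit: int = 60,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        """Record the request and return True if it is within the limit."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A Redis-backed variant would keep the same `allow` interface but store the timestamps outside the process.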

CORS Configuration

⚠️ The default configuration allows all origins (CORS_ORIGINS=["*"]). For production:

CORS_ORIGINS=["https://yourdomain.com","https://app.yourdomain.com"]
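
The value above is a JSON list, so it can be checked before deployment with a few lines of Python. The helpers below are a hypothetical sketch of that check and of exact-match origin testing; they do not reproduce how the server's CORS middleware is wired up.

```python
import json
from typing import List


def parse_cors_origins(raw: str) -> List[str]:
    """Parse a JSON-list value such as the CORS_ORIGINS example."""
    origins = json.loads(raw)
    if not isinstance(origins, list) or not all(
        isinstance(origin, str) for origin in origins
    ):
        raise ValueError("CORS_ORIGINS must be a JSON list of strings")
    return origins


def is_origin_allowed(origin: str, allowed: List[str]) -> bool:
    """A wildcard entry allows everything; otherwise require an exact match."""
    return "*" in allowed or origin in allowed
```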

Input Validation

API endpoints enforce maximum input lengths:

  • Query input: 2,000 characters
  • Document ingestion: 50,000 characters

Adjust these limits in app.py based on your requirements.
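
The length limits can be pictured as a small validation helper. The project enforces them with Pydantic models, so this standard-library version is only a sketch; the constant and function names are hypothetical, and rejecting empty input is an assumption.

```python
MAX_QUERY_CHARS = 2_000      # query limit stated above
MAX_DOCUMENT_CHARS = 50_000  # ingestion limit stated above


def validate_text(text: str, max_chars: int) -> str:
    """Strip surrounding whitespace and reject empty or over-long input."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input must not be empty")
    if len(cleaned) > max_chars:
        raise ValueError(f"input exceeds {max_chars} characters")
    return cleaned


print(validate_text("  What is RAG?  ", MAX_QUERY_CHARS))
```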


💰 Cost Estimates

GCP VM

| Instance Type | Monthly Cost | Recommended For |
|---------------|--------------|-----------------|
| e2-medium | ~$24 | Production workloads |
| e2-small | ~$12 | Development/testing |

Gemini API

  • Free tier available with rate limits
  • Pay-per-use after free tier
  • See Google AI Pricing

🐛 Troubleshooting

Common Issues

Server won't start

# Check if port is already in use
lsof -i :8000

# Check logs
tail -f logs/rag-server.log

ChromaDB errors

# Inspect database
python3 inspect_chroma.py stats

# Reset database (WARNING: deletes all data)
rm -rf ./chroma_db

# Optionally load sample data for testing
python3 load_sample_data.py
# Or reset and load sample data
python3 load_sample_data.py --reset

Gemini API errors

  • Check your API key is set correctly in .env
  • Verify you haven't exceeded free tier limits
  • Check Google AI Status

See GCP_DEPLOYMENT_GUIDE.md for comprehensive troubleshooting.


📄 License

This project is dual-licensed:

Copyright

Copyright (C) 2026 A P Nicholson intothemist@gmail.com


🤝 Contributing

Contributions are welcome! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow Black code style
  • Add tests for new features
  • Update documentation as needed
  • Use conventional commits (feat:, fix:, docs:, etc.)

🙏 Acknowledgments

Built with:


📬 Contact

A P Nicholson - intothemist@gmail.com

Project Link: https://github.com/andynicholson/rag-and-bone


Made with ❤️ and ☕
