Simplify AI

Full-Stack SaaS RAG Platform — Secure document retrieval, conversational QA, and semantic search.

📌 Overview

Simplify AI is a decoupled full-stack application that gives users secure workspaces to upload, index, and query private documents (PDF, DOCX, TXT, MD). The system generates vector embeddings, stores them in a vector index, and uses grounded prompts to stream contextual answers with page-level citations in real time.

⚡ Live Deployments

Production Web App: https://simplify-ai-lilac.vercel.app/
API Health Endpoint: https://saas-rag-production.up.railway.app/api/v1/health (FastAPI backend deployed on Railway)

⚠️ The Problem

Many Retrieval-Augmented Generation (RAG) tools are built as monolithic scripts or generic wrappers around API endpoints. These designs suffer from:

- Gateway Timeouts: API threads block while parsing large documents or creating hundreds of embeddings.
- Weak Security: Single-token schemes expose sessions to hijacking if the token is stolen.
- High Latency: Users wait for the complete LLM response before seeing any output.
- No Contextual Grounding: Generic replies lack verifiable references, which lets hallucinations go unnoticed.

💡 The Solution

Simplify AI addresses these issues using a production-adjacent, decoupled architecture:

- Asynchronous Processing: Immediate API responses with processing flags, delegating extraction and embedding tasks to background execution queues.
- Multi-Stage Auth Rotation: Short-lived JWT access tokens paired with long-lived rotated refresh tokens and revocation lists.
- Low-Latency Stream Injection: Progressive token delivery using Server-Sent Events (SSE) and native browser streams.
- Grounded Verification: A citation metadata pipeline that links every generated block to verified source page numbers and text excerpts.
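
The progressive token delivery over SSE mentioned above can be sketched in a few lines. This is a minimal illustration of the wire format only, not the project's actual handler: the `delta` field name and the `[DONE]` end marker are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token delta as one SSE event: a 'data:' line plus a blank line."""
    for token in tokens:
        # JSON-encoding the payload escapes newlines, so a token can never
        # break the event framing.
        yield f"data: {json.dumps({'delta': token})}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield "data: [DONE]\n\n"


def parse_sse(stream: str) -> list[str]:
    """Client-side view: split on blank lines and collect the deltas."""
    deltas = []
    for event in stream.split("\n\n"):
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        deltas.append(json.loads(payload)["delta"])
    return deltas


wire = "".join(sse_events(["Simplify", " AI", " streams."]))
print("".join(parse_sse(wire)))
```

In a FastAPI route, a generator like `sse_events` would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.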

✨ Features

- SaaS-Ready Security: Email verification via SMTP One-Time Passwords (OTP), and JWT access/refresh token rotation with signature-based MongoDB revocation.
- Multi-Format Parsing: Automatic structure detection and content extraction for .pdf, .docx, .txt, and .md files.
- Semantic Vector Indexes: Splits parsed text with character-overlapping splitters, converts chunks to 3072-dimensional embeddings via Gemini, and indexes them in Pinecone namespaces.
- SSE Token Streaming: Pushes generator-driven chunk deltas from FastAPI to Next.js using StreamingResponse; the browser consumes them through fetch stream readers.
- Grounded Citations: Maps similarity search matches back to database-backed chunk excerpts and renders clickable citation cards.
- Responsive Dark Mode: Minimalist interface built on Tailwind CSS and Radix UI primitives, with state managed by Zustand.
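
To make the character-overlapping chunking concrete, here is a simplified fixed-window sketch using the 800-character size and 200-character overlap quoted in this README. It is a stand-in for illustration: the project uses a LangChain splitter, which may also respect separators such as paragraph breaks, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully present in at least one chunk.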

🏗️ Architecture

```mermaid
graph TD
    %% Service boundaries
    subgraph Client [Client UI - Next.js / Vercel]
        Next[Next.js App Router]
        Zustand[Zustand State Store]
    end

    subgraph API [API Service - FastAPI / Railway]
        Fast[FastAPI Web Server]
        Auth[JWT & OTP Security Manager]
        RAG[RAG Orchestration Service]
    end

    subgraph Infrastructure [Data Infrastructure]
        Mongo[(MongoDB Atlas - Metadata & Revocations)]
        Supa[(Supabase Storage - Binary Source Files)]
        Pine[(Pinecone Vector Database - Vector Embeddings)]
        Gemini[Google Gemini API - Embeddings & Inference]
    end

    %% Network flows
    Next -->|HTTPS REST Request| Fast
    Next -->|Fetch SSE stream| Fast
    Fast --> Auth
    Fast --> RAG

    Auth -->|Read/Write User States| Mongo
    RAG -->|Upload Source File| Supa
    RAG -->|Embed Text Chunks| Gemini
    RAG -->|Upsert/Query Vectors| Pine
    RAG -->|Read/Write Chat Context| Mongo

    classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef active fill:#e1f5fe,stroke:#01579b,stroke-width:2px;

    class Next,Fast,RAG active;
```

🧬 RAG Pipeline Flow

```
[ Upload File ]
       │
       ▼
[ Extension & Size Validation ]
       │
       ▼
[ Upload Binary to Supabase Storage ]
       │
       ▼
[ Parse & Extract Text (pypdf/docx) ]
       │
       ▼
[ LangChain Text Chunking ] (800 chars, 200 overlap)
       │
       ▼
[ Gemini 3072-D Embeddings ] (gemini-embedding-001)
       │
       ├──────────────────────────────────┐
       ▼                                  ▼
[ Store Text Excerpts & Metadata ]   [ Upsert to Pinecone ]
(MongoDB Atlas mappings)             (Namespace isolation)
```

1. Ingest & Validate: The client posts a file. The backend checks the file format and enforces a 25 MB size ceiling.
2. Persist Source: The raw binary is uploaded to Supabase Storage, which keeps large files out of the database.
3. Extract & Segment: A background worker parses the text and splits it into overlapping segments with a character-based splitter, so context carries across chunk boundaries.
4. Vector Generation: Text segments are sent to models/gemini-embedding-001 to generate dense vector embeddings.
5. Index & Database Mappings:
   - Embeddings are upserted into Pinecone within the user's isolated namespace.
   - Excerpts, page numbers, offsets, and document links are stored in MongoDB Atlas.
6. Semantic Similarity Retrieval: When a query arrives, the backend embeds the query, retrieves the top $k=8$ matches from Pinecone, and reads the original text chunks from MongoDB.
7. Grounded Generation: The server passes the retrieved context and system instructions to gemini-2.5-flash, instructing it to answer only from the provided context.
8. SSE Streams: FastAPI streams delta chunks to the client, and Next.js parses the tokens and renders interactive citation cards that match the source excerpts.
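
The retrieval and grounded-generation steps above can be sketched without any external service. In this illustration, in-memory dictionaries stand in for a Pinecone namespace and the MongoDB chunk collection, toy 3-dimensional vectors stand in for real embeddings, and the function names and prompt wording are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], namespace_vectors: dict, k: int = 8) -> list[str]:
    """Rank chunk ids in one namespace by similarity to the query."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in namespace_vectors.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]


def build_prompt(question: str, chunk_ids: list[str], chunk_store: dict) -> str:
    """Number each excerpt so the model can cite it as [1], [2], ..."""
    lines = ["Answer using only the context below. If the answer is not there, say so.", ""]
    for n, cid in enumerate(chunk_ids, start=1):
        chunk = chunk_store[cid]
        lines.append(f"[{n}] (page {chunk['page']}) {chunk['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Toy 3-dimensional vectors stand in for the 3072-dimensional embeddings.
vectors = {"c1": [1.0, 0.0, 0.0], "c2": [0.9, 0.1, 0.0], "c3": [0.0, 1.0, 0.0]}
chunk_store = {
    "c1": {"page": 2, "text": "Refunds are issued within 14 days."},
    "c2": {"page": 3, "text": "Refund requests require an order number."},
    "c3": {"page": 9, "text": "Office hours are 9 to 5."},
}
ids = top_k([1.0, 0.0, 0.0], vectors, k=2)
prompt = build_prompt("How long do refunds take?", ids, chunk_store)
print(ids)
print(prompt)
```

Because each excerpt carries its page number and a stable index, the client can turn a `[1]` in the answer into a citation card that points at the stored excerpt.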

📸 Screenshots

Document Library	Conversational RAG Panel

Workspace Settings (SaaS Verification)

🛠️ Tech Stack

| Layer | Technology | Role |
| --- | --- | --- |
| Frontend UI | Next.js 15, React 19, TypeScript, Tailwind CSS | App Router client layout; state stores managed by Zustand. |
| Backend API | FastAPI 0.115, Python 3.11+, Uvicorn | Async route handlers, validation schemas (Pydantic v2). |
| Database | MongoDB Atlas (via `motor` driver) | Mappings, chat logs, user schemas, and token denylists. |
| Vector DB | Pinecone | Dense vector indexes and namespace queries. |
| Storage | Supabase Storage | Secure bucket hosting for raw binaries. |
| AI Models | Google Gemini API | `gemini-2.5-flash` (inference), `gemini-embedding-001` (embeddings). |
| Mail | SMTP Service | User signup and email OTP delivery. |

🚀 Local Setup

Prerequisites

- Node.js 22+
- Python 3.11+
- Credentials for MongoDB, Pinecone, Supabase, and the Google Gemini API (connection URI and API keys).

1. Backend Setup

Navigate to the backend/ directory:
```
cd backend
```

Create and activate a Python virtual environment:

```
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Copy the environment variables template and configure it:

```
cp .env.example .env
# Edit .env with your private credentials
```

Run the web server:

```
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

2. Frontend Setup

Navigate to the frontend/ directory:
```
cd ../frontend
```
Install package dependencies:
```
npm install
```

Copy environment configurations:

```
cp .env.example .env.local
# Edit .env.local with your Next.js configuration
```

Start the Next.js development server:
```
npm run dev
```
Open your browser and navigate to http://localhost:3000.

📋 Environment Variables

Backend Setup (`backend/.env`)

```
APP_ENV=development
DEBUG=true
API_V1_PREFIX=/api/v1
CORS_ORIGINS=http://localhost:3000

# MongoDB Metadata
MONGODB_URI=mongodb+srv://...
MONGODB_DB_NAME=simplify

# Vector Configurations
VECTOR_STORE_PROVIDER=pinecone
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX_NAME=simplify-documents
PINECONE_NAMESPACE=simplify
PINECONE_DIMENSION=3072

# Storage & AI Mappings
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
SUPABASE_BUCKET=documents
GEMINI_API_KEY=your-gemini-key
GEMINI_CHAT_MODEL=models/gemini-2.5-flash
GEMINI_EMBEDDING_MODEL=models/gemini-embedding-001

# Security & Mail
JWT_SECRET_KEY=generate-a-strong-random-key
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USERNAME=your-email@gmail.com
SMTP_PASSWORD=your-app-password
SMTP_FROM_EMAIL=no-reply@simplify.ai
SMTP_FROM_NAME="Simplify AI"
```
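
As a rough illustration of how a backend might read and type-convert a few of these variables, here is a standard-library sketch. The project's actual settings loader is not shown in this README, so the `Settings` class, the `load_settings` helper, the fallback defaults, and the comma-separated format for `CORS_ORIGINS` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_v1_prefix: str
    cors_origins: tuple[str, ...]
    pinecone_dimension: int
    access_token_expire_minutes: int
    refresh_token_expire_days: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the variables and convert them to typed values."""
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return Settings(
        api_v1_prefix=env.get("API_V1_PREFIX", "/api/v1"),
        # Assumption: multiple origins are separated by commas.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        pinecone_dimension=int(env.get("PINECONE_DIMENSION", "3072")),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        refresh_token_expire_days=int(env.get("REFRESH_TOKEN_EXPIRE_DAYS", "7")),
        debug=env.get("DEBUG", "false").lower() == "true",
    )


settings = load_settings(
    {"CORS_ORIGINS": "http://localhost:3000, https://example.com", "DEBUG": "true"}
)
print(settings.cors_origins, settings.pinecone_dimension, settings.debug)
```

Converting values once at startup means a malformed number such as `PINECONE_DIMENSION=abc` fails immediately rather than during the first upsert.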

Frontend Setup (`frontend/.env.local`)

```
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000
NEXT_PUBLIC_MAX_DOCUMENTS_PER_CHAT=8
```

📁 Project Structure

```
simplify-ai/
├── backend/
│   ├── app/
│   │   ├── api/             # REST routing entry points
│   │   ├── core/            # Configs, authorization middlewares
│   │   ├── db/              # MongoDB clients, session setups
│   │   ├── models/          # Data schemas and entities
│   │   ├── repositories/    # Database abstraction queries
│   │   ├── services/        # RAG pipeline, parsing, JWT managers
│   │   └── main.py          # FastAPI application initialization
│   ├── requirements.txt
│   └── Dockerfile
└── frontend/
    ├── app/                 # Next.js App Router (Layouts/Routes)
    ├── components/          # Reusable Radix UI & Citation widgets
    ├── lib/                 # Auth context, state store (Zustand), API fetchers
    ├── package.json
    └── tailwind.config.ts
```

🔑 Engineering Decisions

1. Multi-Stage Token Rotation

- Problem: In single-token API designs, a stolen client-side JWT gives an attacker access for the token's entire, often long, lifetime.
- Solution: Access/refresh token rotation. Access tokens are short-lived (30 minutes), while refresh tokens last 7 days and are rotated on every validation cycle.
- Security Enforcement: When a user logs out, the access token signature is stored in a MongoDB denylist collection with a TTL index matching the token's expiration timestamp, so a discarded token cannot be reused to hijack the session.
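
The rotation and denylist behavior described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the hand-rolled HS256 signing stands in for a JWT library, a plain dictionary stands in for the MongoDB denylist collection, the `kind` and `jti` claims are hypothetical, and denylisting the old refresh token on rotation is an assumption about how the two mechanisms interact.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"demo-secret"  # placeholder; a real key would come from JWT_SECRET_KEY


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _claims(payload: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))


def sign_token(sub: str, kind: str, ttl_seconds: int, now: float) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    # A random jti makes every token, and therefore every signature, unique.
    claims = {"sub": sub, "kind": kind, "exp": int(now) + ttl_seconds,
              "jti": secrets.token_hex(8)}
    payload = _b64(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64(sig)}"


def verify_token(token: str, now: float, denylist: dict) -> dict:
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(),
                             hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = _claims(payload)
    if claims["exp"] <= now:
        raise ValueError("expired")
    if sig in denylist:
        raise ValueError("revoked")
    return claims


def revoke(token: str, denylist: dict) -> None:
    _, payload, sig = token.split(".")
    # Keyed by signature; a TTL index would drop the entry at this timestamp.
    denylist[sig] = _claims(payload)["exp"]


def rotate(refresh_token: str, now: float, denylist: dict) -> tuple[str, str]:
    claims = verify_token(refresh_token, now, denylist)
    if claims["kind"] != "refresh":
        raise ValueError("not a refresh token")
    revoke(refresh_token, denylist)  # the old refresh token cannot be replayed
    access = sign_token(claims["sub"], "access", 30 * 60, now)
    refresh = sign_token(claims["sub"], "refresh", 7 * 24 * 3600, now)
    return access, refresh


denylist: dict = {}
now = time.time()
old_refresh = sign_token("user-1", "refresh", 7 * 24 * 3600, now)
access, new_refresh = rotate(old_refresh, now, denylist)
try:
    rotate(old_refresh, now, denylist)
except ValueError as exc:
    print("replay rejected:", exc)
```

Storing only the signature keeps denylist entries small, and setting each entry to expire when the token itself expires keeps the collection from growing without bound.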

2. Segregation of Data Boundaries

To keep database operations fast, datastores are segregated by workload:
- Supabase Storage hosts the heavy raw binary files (PDFs, DOCX).
- Pinecone is used exclusively for vector similarity search, which keeps CPU-intensive similarity calculations off the document database.
- MongoDB Atlas stores metadata references and text chunks, facilitating high-speed citation retrieval.

🔒 Security Specifications

- Role-Based Access Control (RBAC): Custom middleware checks token scopes and restricts document deletion to administrative roles.
- Secure Cookies: Auth tokens are transferred via HttpOnly, Secure, and SameSite cookies. HttpOnly keeps tokens out of reach of injected scripts (XSS), and SameSite limits cross-site request forgery.
- Email Verification: New signups must verify email ownership with a one-time password (OTP) delivered over SMTP before their profiles are activated.
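
For reference, the cookie attributes named above look like this when built with Python's standard `http.cookies` module. The cookie name, the `Lax` SameSite policy, the path, and the helper name are assumptions for the sketch; the README does not state which values the backend uses.

```python
from http.cookies import SimpleCookie


def auth_cookie_header(name: str, token: str, max_age_seconds: int) -> str:
    """Build the value of a Set-Cookie header for an auth token."""
    cookie = SimpleCookie()
    cookie[name] = token
    morsel = cookie[name]
    morsel["httponly"] = True       # not readable from JavaScript
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Lax"      # assumed policy; "Strict" is the tighter option
    morsel["max-age"] = max_age_seconds
    morsel["path"] = "/"
    return morsel.OutputString()


header = auth_cookie_header("access_token", "abc.def.ghi", 30 * 60)
print(header)
```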

⚡ Performance Tuning

- Background Tasks: Document chunking and embedding pipelines run as FastAPI BackgroundTasks, so the API responds before processing finishes.
- Low-Level Stream Buffering: Uses Next.js streams to write token deltas into UI elements without triggering full React page re-renders.
- Isolated Namespacing: Pinecone vector queries are scoped to the user's namespace, which shrinks the search space and improves query throughput.
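
The respond-first, process-later pattern behind the background tasks item can be shown with plain `asyncio`. This sketch deliberately swaps FastAPI's `BackgroundTasks` for `asyncio.create_task` so it runs without a web server; the `documents` dictionary stands in for MongoDB metadata, and the function names and status strings are hypothetical.

```python
import asyncio

# Document id -> status; stands in for the metadata stored in MongoDB.
documents: dict[str, str] = {}


async def process_document(doc_id: str) -> None:
    await asyncio.sleep(0.01)  # placeholder for parsing, chunking, and embedding
    documents[doc_id] = "ready"


async def upload(doc_id: str, tasks: set) -> dict:
    """Return a processing flag at once and leave the heavy work to a task."""
    documents[doc_id] = "processing"
    task = asyncio.create_task(process_document(doc_id))
    tasks.add(task)  # hold a reference so the task is not garbage-collected
    task.add_done_callback(tasks.discard)
    return {"id": doc_id, "status": "processing"}


async def main() -> tuple[dict, str]:
    tasks: set = set()
    response = await upload("doc-1", tasks)  # returns before processing ends
    await asyncio.gather(*tasks)             # a client would poll instead
    return response, documents["doc-1"]


response, final_status = asyncio.run(main())
print(response["status"], "->", final_status)
```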

🔮 Future Improvements

- Celery + Redis Worker Pools: Offload document processing to isolated, distributed worker processes.
- Redis Cache Integration: Migrate JWT denylists to an in-memory Redis node to achieve sub-millisecond lookup times during API authorization.
- Stripe Billing Integration: Implement credit quotas and subscription management to enforce usage limits.

🤝 Contributing

Contributions are welcome. Please open an issue first to discuss changes before submitting a pull request.

1. Fork the repository.
2. Create a branch (`git checkout -b feature/improvement`).
3. Commit your changes (`git commit -m 'feat: description'`).
4. Push to your branch (`git push origin feature/improvement`).
5. Open a pull request.

📄 License

MIT License — see LICENSE for details.
