Full-Stack SaaS RAG Platform: secure document retrieval, conversational QA, and semantic search.
Simplify AI is a decoupled full-stack application that provides users with secure workspaces to upload, index, and query private documents (PDF, DOCX, TXT, MD). The system generates vector embeddings, stores them in a spatial index, and uses grounded prompts to stream contextual answers with page-level citations in real time.
- Production Web App: https://simplify-ai-lilac.vercel.app/
- API Health Endpoint: https://saas-rag-production.up.railway.app/api/v1/health (FastAPI backend deployed on Railway)
Many Retrieval-Augmented Generation (RAG) tools are built as monolithic scripts or generic wrappers around API endpoints. These designs suffer from:
- Gateway Timeouts: Blocking API threads while parsing large documents or creating hundreds of embeddings.
- Weak Security: Hardcoded single-token structures that risk session hijacking if tokens are stolen.
- High Latency: Waiting for complete LLM inference blocks to finish before returning responses to the user.
- No Contextual Grounding: Generating generic replies that lack verifiable references, leading to AI hallucinations.
Simplify AI addresses these issues using a production-adjacent, decoupled architecture:
- Asynchronous Processing: Immediate API responses with `processing` flags, delegating extraction and embedding tasks to background execution queues.
- Multi-Stage Auth Rotation: Short-lived JWT access tokens paired with long-lived rotated refresh tokens and revocation lists.
- Low-Latency Stream Injection: Progressive token delivery using Server-Sent Events (SSE) and native browser streams.
- Grounded Verification: A citation metadata pipeline that links every generated block to verified source page numbers and text excerpts.
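
The asynchronous processing pattern described above can be sketched with the standard library alone. This is a simplified illustration, not the project's code: `upload`, `process_document`, and the in-memory `STATUS` dict are hypothetical names, and `asyncio.sleep(0)` stands in for the real parsing and embedding work.

```python
import asyncio

# Hypothetical in-memory status store; the real service would persist this.
STATUS: dict = {}


async def process_document(doc_id: str) -> None:
    """Placeholder for text extraction, chunking, and embedding."""
    await asyncio.sleep(0)  # yield control, as slow I/O would
    STATUS[doc_id] = "ready"


async def upload(doc_id: str, tasks: list) -> dict:
    """Schedule the heavy work and answer immediately with a processing flag."""
    STATUS[doc_id] = "processing"
    tasks.append(asyncio.create_task(process_document(doc_id)))
    return {"id": doc_id, "status": "processing"}


async def main():
    tasks: list = []
    response = await upload("doc-1", tasks)  # returns before the work finishes
    await asyncio.gather(*tasks)             # later, the background work completes
    return response, STATUS["doc-1"]


response, final_status = asyncio.run(main())
```

The caller receives the `processing` response first and can poll for the final status afterwards, which is what keeps large uploads from holding a request open.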
- SaaS-Ready Security: Email verification via SMTP One-Time Passwords (OTP), and JWT access/refresh token rotation with signature-based MongoDB revocation.
- Multi-Format Parsing: Automatic structure detection and content extraction for `.pdf`, `.docx`, `.txt`, and `.md` files.
- Semantic Vector Indexes: Chunks parsed text using character-overlapping splitters, converts chunks to 3072-dimensional embeddings via Gemini, and indexes them in Pinecone namespaces.
- SSE Token Streaming: Pushes generator-driven chunk deltas from FastAPI to Next.js using `StreamingResponse`, consumed on the client through `fetch` and browser-side stream readers.
- Grounded Citations: Maps similarity search nodes back to database-backed chunk excerpts, rendering clickable citation cards.
- Responsive Dark Mode: Minimalist interface built on Tailwind CSS, Radix UI primitives, and state managed by Zustand.
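
To make the SSE token streaming feature more concrete, here is a minimal sketch of how token deltas could be framed in the `text/event-stream` wire format. The helper names `sse_event` and `stream_tokens`, the `delta` payload key, and the final `done` event are assumptions for illustration; in a FastAPI app a generator like this would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event frame, terminated by a blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # JSON-encoding keeps newlines inside a token from breaking the frame.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token delta, then a closing 'done' event."""
    for token in tokens:
        yield sse_event({"delta": token})
    yield sse_event({}, event="done")


frames = list(stream_tokens(["Hello", " world"]))
```

Each frame can be rendered by the client as soon as it arrives, so the user sees text before the model has finished generating.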
graph TD
%% Service boundaries
subgraph Client [Client UI - Next.js / Vercel]
Next[Next.js App Router]
Zustand[Zustand State Store]
end
subgraph API [API Service - FastAPI / Railway]
Fast[FastAPI Web Server]
Auth[JWT & OTP Security Manager]
RAG[RAG Orchestration Service]
end
subgraph Infrastructure [Data Infrastructure]
Mongo[(MongoDB Atlas - Metadata & Revocations)]
Supa[(Supabase Storage - Binary Source Files)]
Pine[(Pinecone Vector Database - Vector Embeddings)]
Gemini[Google Gemini API - Embeddings & Inference]
end
%% Network flows
Next -->|HTTPS REST Request| Fast
Next -->|Fetch SSE stream| Fast
Fast --> Auth
Fast --> RAG
Auth -->|Read/Write User States| Mongo
RAG -->|Upload Source File| Supa
RAG -->|Embed Text Chunks| Gemini
RAG -->|Upsert/Query Vectors| Pine
RAG -->|Read/Write Chat Context| Mongo
classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px;
classDef active fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
class Next,Fast,RAG active;
[ Upload File ]
       │
       ▼
[ Extension & Size Validation ]
       │
       ▼
[ Upload Binary to Supabase Storage ]
       │
       ▼
[ Parse Extract Text (pypdf/docx) ]
       │
       ▼
[ LangChain Text Chunking ] (800 chars, 200 overlap)
       │
       ▼
[ Gemini 3072-D Embeddings ] (gemini-embedding-001)
       │
       ├──────────────────────────────────┐
       ▼                                  ▼
[ Store Text Excerpts & Metadata ]   [ Upsert to Pinecone ]
(MongoDB Atlas mappings)             (Namespace isolation)
- Ingest & Validate: The client posts a file. The backend checks format restrictions and enforces a `25MB` ceiling.
- Persist Source: The raw binary is uploaded to Supabase Storage, segregating database records from raw binaries.
- Extract & Segment: A background worker parses text, creating overlapping segments via character-based splitters to maintain cross-chunk context.
- Vector Generation: Text segments are sent to `models/gemini-embedding-001` to generate dense vector embeddings.
- Index & Database Mappings:
  - Embeddings are upserted into Pinecone within the user's isolated namespace.
  - Excerpts, page numbers, offsets, and document links are indexed in MongoDB Atlas.
- Semantic Similarity Retrieval: When a query is received, the backend generates an embedding of the query, retrieves the top $k=8$ matching nodes from Pinecone, and reads the original text chunks from MongoDB.
- Grounded Generation: The server feeds the retrieved context and system instructions into `gemini-2.5-flash`, instructing it to answer only from the provided context.
- SSE Streams: FastAPI streams delta chunks to the client, while Next.js parses the tokens and renders interactive citation cards matching the source excerpts.
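
The segmentation step can be illustrated with a plain sliding window. This sketch is not the LangChain splitter the pipeline uses (which also considers separators); it only shows how an 800-character window with a 200-character overlap produces chunks and the offsets that could be stored alongside them. The function name `chunk_text` and the dict layout are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break  # the last window reached the end of the document
        start += step
    return chunks


# A 2000-character sample yields windows starting at 0, 600, and 1200.
sample = "".join(chr(97 + i % 26) for i in range(2000))
chunks = chunk_text(sample)
```

Because consecutive windows share 200 characters, a sentence that straddles a boundary still appears whole in at least one chunk.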

Screenshots: Document Library, Conversational RAG Panel, Workspace Settings (SaaS Verification).

| Layer | Technology | Role |
|---|---|---|
| Frontend UI | Next.js 15, React 19, TypeScript, Tailwind CSS | App Router client layout; state stores managed by Zustand. |
| Backend API | FastAPI 0.115, Python 3.11+, Uvicorn | Async route handlers, validation schemas (Pydantic v2). |
| Database | MongoDB Atlas (via motor driver) | Mappings, chat logs, user schemas, and token denylists. |
| Vector DB | Pinecone | Dense vector indexes and namespace queries. |
| Storage | Supabase Storage | Secure bucket hosting for raw binaries. |
| AI Models | Google Gemini API | gemini-2.5-flash (inference), gemini-embedding-001 (embeddings). |
| Email | SMTP Service | User signup and email OTP delivery. |
- Node.js 22+
- Python 3.11+
- MongoDB, Pinecone, Supabase, and Google Gemini API keys.
- Navigate to the `backend/` directory:
    cd backend
- Create and activate a Python virtual environment:
    python -m venv .venv
    # Windows:
    .venv\Scripts\activate
    # macOS/Linux:
    source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Copy the environment variables template and configure it:
    cp .env.example .env
    # Edit .env with your private credentials
- Run the web server:
    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
- Navigate to the `frontend/` directory:
    cd ../frontend
- Install package dependencies:
    npm install
- Copy environment configurations:
    cp .env.example .env.local
    # Edit with Next.js configurations
- Start the Next.js development server:
    npm run dev
- Open your browser and navigate to http://localhost:3000.
APP_ENV=development
DEBUG=true
API_V1_PREFIX=/api/v1
CORS_ORIGINS=http://localhost:3000
# Document Database (Metadata)
MONGODB_URI=mongodb+srv://...
MONGODB_DB_NAME=simplify
# Vector Configurations
VECTOR_STORE_PROVIDER=pinecone
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX_NAME=simplify-documents
PINECONE_NAMESPACE=simplify
PINECONE_DIMENSION=3072
# Storage & AI Mappings
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
SUPABASE_BUCKET=documents
GEMINI_API_KEY=your-gemini-key
GEMINI_CHAT_MODEL=models/gemini-2.5-flash
GEMINI_EMBEDDING_MODEL=models/gemini-embedding-001
# Security & Mail
JWT_SECRET_KEY=generate-a-strong-random-key
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USERNAME=your-email@gmail.com
SMTP_PASSWORD=your-app-password
SMTP_FROM_EMAIL=no-reply@simplify.ai
SMTP_FROM_NAME="Simplify AI"

# Frontend (.env.local)
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000
NEXT_PUBLIC_MAX_DOCUMENTS_PER_CHAT=8

simplify-ai/
├── backend/
│   ├── app/
│   │   ├── api/            # REST routing entry points
│   │   ├── core/           # Configs, authorization middlewares
│   │   ├── db/             # MongoDB clients, session setups
│   │   ├── models/         # Data schemas and entities
│   │   ├── repositories/   # Database abstraction queries
│   │   ├── services/       # RAG pipeline, parsing, JWT managers
│   │   └── main.py         # FastAPI application initialization
│   ├── requirements.txt
│   └── Dockerfile
└── frontend/
    ├── app/                # Next.js App Router (Layouts/Routes)
    ├── components/         # Reusable Radix UI & Citation widgets
    ├── lib/                # Auth context, state store (Zustand), API fetchers
    ├── package.json
    └── tailwind.config.ts
- Problem: In single-token API designs, if a client-side JWT is stolen, malicious actors gain indefinite system access.
- Solution: Implemented access/refresh token rotation. Access tokens are short-lived (`30 minutes`), while refresh tokens remain valid for `7 days` and are rotated each time they are used.
- Security Enforcement: When a user logs out, the access token signature is stored in a MongoDB denylist collection with a TTL index matching the token's expiration timestamp. This blocks session hijacking attempts that reuse discarded tokens.
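
The denylist behaviour described above can be sketched with an in-memory stand-in. This is an illustration under stated assumptions, not the project's implementation: `TokenDenylist` is a hypothetical class, a plain dict replaces the MongoDB collection, and the lazy deletion on lookup only mimics what a TTL index does automatically.

```python
import time
from typing import Optional


class TokenDenylist:
    """In-memory stand-in for a denylist collection with a TTL index."""

    def __init__(self) -> None:
        self._expires_at: dict = {}

    @staticmethod
    def _signature(token: str) -> str:
        # A JWT has three dot-separated segments; the last one is the signature.
        return token.rsplit(".", 1)[-1]

    def revoke(self, token: str, expires_at: float) -> None:
        """Record the signature until the token would have expired anyway."""
        self._expires_at[self._signature(token)] = expires_at

    def is_revoked(self, token: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = self._signature(token)
        expiry = self._expires_at.get(key)
        if expiry is None:
            return False
        if expiry <= now:
            # Mimic TTL cleanup: an expired token fails its own `exp` check,
            # so the denylist entry is no longer needed.
            del self._expires_at[key]
            return False
        return True


denylist = TokenDenylist()
token = "header.payload.signature"  # placeholder, not a real JWT
denylist.revoke(token, expires_at=1_000.0 + 30 * 60)
```

Tying each entry's lifetime to the token's own expiry keeps the denylist from growing without bound.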
- To keep database operations fast, datastores are segregated based on operational tasks:
- Supabase Storage hosts the heavy raw binary files (PDFs, DOCX).
- Pinecone is used exclusively for vector similarity search, preventing CPU-intensive calculations on relational or document servers.
- MongoDB Atlas stores metadata references and text chunks, facilitating high-speed citation retrieval.
- Role-Based Access Control (RBAC): Custom middlewares check token scopes, restricting document deletion to administrative roles.
- Secure Cookies: Auth tokens are transferred via HttpOnly, Secure, and SameSite cookies, so client-side scripts cannot read them, which limits token theft through XSS.
- Email Verification: New signups must verify email ownership with a one-time code delivered over SMTP before their profile is activated.
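
As a rough illustration of the email verification step, the sketch below issues and checks a one-time code using only the standard library. The six-digit length, the ten-minute lifetime, and the names `issue_otp`, `verify_otp`, and `OtpRecord` are assumptions made for the example; sending the code over SMTP is left out.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtpRecord:
    digest: str        # hash of the code, so the raw code is never stored
    expires_at: float  # Unix timestamp after which the code is rejected


def _digest(code: str) -> str:
    return hashlib.sha256(code.encode()).hexdigest()


def issue_otp(ttl_seconds: int = 600, now: Optional[float] = None):
    """Return the code to email and the record to persist."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # cryptographically random digits
    return code, OtpRecord(_digest(code), now + ttl_seconds)


def verify_otp(record: OtpRecord, submitted: str, now: Optional[float] = None) -> bool:
    """Accept the code only before expiry, using a constant-time comparison."""
    now = time.time() if now is None else now
    if now >= record.expires_at:
        return False
    return hmac.compare_digest(record.digest, _digest(submitted))


code, record = issue_otp(now=0.0)
```

Using `secrets` rather than `random` matters here, because verification codes must not be predictable.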
- Background Tasks: Document chunking and embedding pipelines run on FastAPI's `BackgroundTasks`, releasing the API request thread immediately.
- Low-Level Stream Buffering: Uses Next.js streams to output token deltas to UI elements without triggering full React page re-renders.
- Isolated Namespacing: Pinecone vector queries are filtered by user namespaces, reducing retrieval search space and increasing query throughput.
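
The effect of namespace isolation on retrieval can be shown with a toy in-memory index. This is not the Pinecone client API: `NamespacedIndex` and its methods are hypothetical, and a brute-force cosine similarity stands in for the vector database's search. The default `top_k=8` mirrors the retrieval depth mentioned earlier.

```python
import math
from collections import defaultdict


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedIndex:
    """Toy vector index that keeps each user's vectors in a separate namespace."""

    def __init__(self) -> None:
        self._vectors = defaultdict(dict)

    def upsert(self, namespace: str, vector_id: str, values: list) -> None:
        self._vectors[namespace][vector_id] = values

    def query(self, namespace: str, vector: list, top_k: int = 8) -> list:
        # Only the caller's namespace is scanned; other users' vectors
        # are never scored or returned.
        candidates = self._vectors.get(namespace, {})
        scored = [(vid, cosine(vector, vals)) for vid, vals in candidates.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = NamespacedIndex()
index.upsert("user-a", "doc1#0", [1.0, 0.0])
index.upsert("user-a", "doc1#1", [0.6, 0.8])
index.upsert("user-b", "doc9#0", [1.0, 0.0])
hits = index.query("user-a", [1.0, 0.0])
```

Here `user-b`'s vector matches the query exactly, yet it never appears in `user-a`'s results, which is the isolation property the namespace provides.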
- Celery + Redis Worker Pools: Offload document calculations to isolated distributed worker processes.
- Redis Cache Integration: Migrate JWT denylists to an in-memory Redis node to achieve sub-millisecond lookup times during API authorization.
- Stripe Billing Integration: Implement credit quotas and subscription management to enforce usage limits.
Contributions are welcome. Please open an issue first to discuss changes before submitting a pull request.
- Fork the Repository.
- Create a branch (`git checkout -b feature/improvement`).
- Commit your changes (`git commit -m 'feat: description'`).
- Push to your branch (`git push origin feature/improvement`).
- Open a Pull Request.
MIT License: see LICENSE for details.


