DocuMind - Secure AI Document RAG Platform

DocuMind is a multi-user Retrieval-Augmented Generation (RAG) platform. It protects user documents with Google OAuth authentication, local or cloud PostgreSQL storage, and ChromaDB vector indexing that is isolated per user through metadata filtering.
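
The metadata-filtering idea can be sketched in plain Python. This is a minimal in-memory illustration, not DocuMind's code: `Chunk`, `search_for_user`, and the word-overlap score are hypothetical stand-ins for the real vector store and its similarity search. What it shows is that the user_id filter runs before any ranking, so another user's chunks are never candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stored chunk; the real schema is not shown in this README."""
    user_id: int
    document_id: int
    text: str


def search_for_user(chunks: List[Chunk], user_id: int, query: str, k: int = 3) -> List[Chunk]:
    """Return up to k chunks owned by user_id, ranked by a toy overlap score."""
    # Isolation step: discard every chunk that belongs to someone else first.
    owned = [c for c in chunks if c.user_id == user_id]
    query_terms = set(query.lower().split())

    def score(chunk: Chunk) -> int:
        # Toy relevance: number of query words that appear in the chunk.
        return len(query_terms & set(chunk.text.lower().split()))

    ranked = sorted(owned, key=score, reverse=True)
    return [c for c in ranked if score(c) > 0][:k]


store = [
    Chunk(user_id=1, document_id=10, text="quarterly revenue report"),
    Chunk(user_id=2, document_id=20, text="confidential revenue forecast"),
]
results = search_for_user(store, user_id=1, query="revenue")
print([c.document_id for c in results])  # only user 1's document is listed
```

Both chunks match the query, but only the one owned by user 1 can be returned, because the ownership filter does not depend on the query text.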

Key Features

Google OAuth 2.0 Integration: Users sign in with Google Login, and their accounts are provisioned automatically.
Session Security (HttpOnly Cookies): The backend signs session tokens as JWTs and stores them in HttpOnly, SameSite cookies. Client-side scripts cannot read these cookies, which limits token theft through XSS.
Strict Document Isolation: PostgreSQL collections are queried with metadata filters that match the authenticated user's database user_id, so a user cannot query, list, or delete another user's documents.
Relational Data Mapping: SQLAlchemy models track Users, Documents, and Vectors in PostgreSQL (Neon serverless setup with pgvector).
Real-time Status Polling: The frontend vault tracks document parsing and embedding status dynamically.
Animated Dark UI: Designed with TailwindCSS v4 and Framer Motion for a modern, glassmorphic dark-theme console experience.
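
The session-security feature above can be illustrated with the standard library alone. This is a sketch under assumptions, not the backend's implementation: the README does not name the JWT library in use, and `sign_jwt`, `verify_jwt`, `session_cookie_header`, the cookie name `session`, the `SameSite=Lax` value, and the one-hour lifetime are all hypothetical choices made for the example. A real service would normally rely on a maintained JWT library instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from http.cookies import SimpleCookie
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: str) -> str:
    """Build a compact HS256 JWT by hand (illustration only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: str) -> Optional[dict]:
    """Return the payload if the signature matches and `exp` is in the future."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None  # not three dot-separated parts
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison of the expected and presented signatures.
    if not hmac.compare_digest(_b64url(expected).encode(), signature_b64.encode()):
        return None
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    # Tokens without an `exp` claim are rejected as well.
    if payload.get("exp", 0) < time.time():
        return None
    return payload


def session_cookie_header(token: str, max_age: int = 3600) -> str:
    """Format a Set-Cookie value that client-side scripts cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True   # hidden from document.cookie
    cookie["session"]["samesite"] = "Lax"  # assumed policy
    cookie["session"]["secure"] = True     # sent over HTTPS only
    cookie["session"]["max-age"] = max_age
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


demo_secret = "demo-secret-for-illustration"
token = sign_jwt({"sub": "42", "exp": int(time.time()) + 3600}, demo_secret)
set_cookie_value = session_cookie_header(token)
```

Because the cookie carries the `HttpOnly` attribute, an injected script cannot read the token, while `verify_jwt` rejects any token whose payload or signature has been altered.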

Architecture Diagram

flowchart TD
    User([User]) --> |Uploads PDF| API_Upload[FastAPI /api/upload]
    
    subgraph Ingestion Pipeline
        API_Upload --> Extractor[Extract Text]
        Extractor --> Chunker[Chunking]
        Chunker --> Embedder[Embedding Model]
        Extractor --> Summarizer[LLM Summary Generation]
    end
    
    Embedder --> |Insert Chunks & Embeddings| DB[(Neon PostgreSQL\npgvector)]
    Summarizer --> |Insert Summary & Topics| DB
    
    User --> |Chat Query| API_Chat[FastAPI /api/chat]
    API_Chat --> Router{Intent Router}
    
    Router -->|GREETING / SMALL_TALK| LLM_Greet[LLM Greeting Prompt]
    Router -->|DOC_SUMMARY / DOC_OVERVIEW| DB_Sum[Fetch Stored Summaries\nfrom PostgreSQL]
    Router -->|DOC_QUERY| DB_Vec[pgvector Similarity Search\n+ BM25 Reranking]
    
    DB_Sum --> LLM_RAG[LLM RAG Prompt]
    DB_Vec --> LLM_RAG
    
    LLM_Greet --> Response[Streaming Response]
    LLM_RAG --> Response
    Response --> User
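
The chat branch of the diagram can be sketched as a small dispatcher. This is only an illustration of the control flow: the README does not say how the intent router classifies a message (it may well use an LLM), so the keyword sets, `classify`, and `route` below are hypothetical, and the paired labels in the diagram (GREETING / SMALL_TALK, DOC_SUMMARY / DOC_OVERVIEW) are collapsed into one intent each.

```python
from enum import Enum
from typing import List


class Intent(Enum):
    GREETING = "GREETING"
    DOC_SUMMARY = "DOC_SUMMARY"
    DOC_QUERY = "DOC_QUERY"


# Hypothetical keyword lists standing in for the real classifier.
GREETING_WORDS = {"hi", "hello", "hey", "thanks", "there"}
SUMMARY_WORDS = {"summary", "summarize", "overview"}


def classify(message: str) -> Intent:
    """Pick an intent from simple keyword checks (assumption, not the real router)."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    if words & SUMMARY_WORDS:
        return Intent.DOC_SUMMARY
    # Treat the message as a greeting only if it contains nothing but greeting words.
    if words and words <= GREETING_WORDS:
        return Intent.GREETING
    # Everything else falls through to document retrieval.
    return Intent.DOC_QUERY


def route(message: str) -> List[str]:
    """Return the pipeline steps the diagram assigns to the message's intent."""
    intent = classify(message)
    if intent is Intent.GREETING:
        return ["llm_greeting_prompt"]
    if intent is Intent.DOC_SUMMARY:
        return ["fetch_stored_summaries", "llm_rag_prompt"]
    return ["pgvector_similarity_search", "bm25_rerank", "llm_rag_prompt"]


print(route("Hello there!"))
print(route("Give me a summary of my report"))
print(route("What is the refund policy?"))
```

Only the DOC_QUERY branch touches the vector index. Greetings skip retrieval entirely, and summary requests reuse the summaries stored at ingestion time.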

Folder Structure

DocuMind/
├── backend/
│   ├── alembic/                # Database migrations
│   ├── chroma_db/              # Persistent Chroma database storage
│   ├── src/
│   │   ├── auth/               # Auth package (Google verify, JWT, dependencies)
│   │   ├── database/           # DB package (SQLAlchemy connections and models)
│   │   ├── data_loader.py      # Document parser (PDF, TXT, CSV, DOCX, JSON, Excel)
│   │   ├── embedding.py        # Text splitter & embedding pipeline
│   │   ├── search.py           # RAG retrieval & Groq LLM logic
│   │   └── vectorstore.py      # Chroma Store client with user isolation
│   ├── Dockerfile
│   ├── main.py                 # FastAPI application router
│   └── requirements.txt        # Backend python dependencies
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── dashboard/      # Protected dashboard workspaces
│   │   │   ├── login/          # Google sign-in landing card
│   │   │   ├── globals.css     # Global styles with TailwindCSS import
│   │   │   ├── layout.tsx      # Root html layout shell
│   │   │   └── page.tsx        # Auto-routing landing page
│   │   └── middleware.ts       # Server-side auth route guard middleware
│   ├── package.json
│   └── tsconfig.json
├── docker-compose.yml          # FastAPI + local PostgreSQL config
└── README.md

Setup & Configuration

Prerequisites

Node.js
Python (3.10+)
Google Cloud Console client credentials (configured origin )
Groq API Key (for LLM RAG inference)

1. Environment Variables

Create .env in the backend/ directory:

GROQ_API_KEY="your-groq-api-key"
DATABASE_URL="postgresql://neondb_owner:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.tech/neondb?sslmode=require"
GOOGLE_CLIENT_ID="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
JWT_SECRET="rag-for-docs-super-secret-key-change-this-in-production"
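
A startup check for these variables might look like the sketch below. It is an assumption about how one could validate the configuration, not part of the project: `check_backend_env` is a hypothetical helper, and it takes a plain mapping so it works whether the values come from `os.environ` or from a parsed `.env` file.

```python
from typing import List, Mapping

# Variable names taken from the backend .env example above.
REQUIRED_VARS = ("GROQ_API_KEY", "DATABASE_URL", "GOOGLE_CLIENT_ID", "JWT_SECRET")
# The sample secret from the README; it should never reach production.
SAMPLE_JWT_SECRET = "rag-for-docs-super-secret-key-change-this-in-production"


def check_backend_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    if env.get("JWT_SECRET") == SAMPLE_JWT_SECRET:
        problems.append("JWT_SECRET still uses the sample value")
    return problems


example_env = {
    "GROQ_API_KEY": "your-groq-api-key",
    "DATABASE_URL": "postgresql://user:password@localhost:5432/documind",
    "GOOGLE_CLIENT_ID": "example.apps.googleusercontent.com",
    "JWT_SECRET": SAMPLE_JWT_SECRET,
}
for problem in check_backend_env(example_env):
    print(f"config warning: {problem}")
```

Passing `os.environ` instead of `example_env` runs the same check against the live process environment.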

Create .env in the frontend/ directory:

NEXT_PUBLIC_BACKEND_URL="http://localhost:8000"

How to Run

Development Mode

Step A: Run the Backend

Open a terminal at the repository root and set up the backend:

cd backend
# Create the virtual environment
uv venv
# Activate it (Windows)
.venv\Scripts\activate
# Activate it (macOS/Linux)
# source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt

Apply the database migrations to PostgreSQL:
```
python -m alembic upgrade head
```
Run the FastAPI development server:
```
uvicorn main:app --reload
```
The backend API will run on http://localhost:8000.

Step B: Run the Frontend

Open another terminal at the repository root and start the frontend:
```
cd frontend
npm install
npm run dev
```
The client application will run on http://localhost:3000.

Production Deployment via Docker Compose

To run the stack with a local PostgreSQL server, run the following from the repository root:

docker-compose up --build

This launches a PostgreSQL container mapped to port 5432 and builds/starts the FastAPI backend container listening on port 8000.
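
To confirm that both containers are accepting connections, a small readiness probe such as the following can help. It is a sketch, not part of the repository: `port_is_open` and `wait_for_ports` are hypothetical helpers, and the retry count and delay are arbitrary defaults.

```python
import socket
import time
from typing import Iterable, List


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host: str, ports: Iterable[int], attempts: int = 30,
                   delay: float = 1.0) -> List[int]:
    """Poll until every port accepts connections; return the ports still closed."""
    pending = list(ports)
    for attempt in range(attempts):
        pending = [p for p in pending if not port_is_open(host, p)]
        if not pending:
            return []
        if attempt < attempts - 1:
            time.sleep(delay)  # give the containers time to start
    return pending
```

Calling `wait_for_ports("localhost", [5432, 8000])` after `docker-compose up --build` returns an empty list once PostgreSQL and the FastAPI backend both accept TCP connections. An open port does not prove that the application behind it is healthy.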
