MindVault

MindVault is a beautiful, full‑stack document Q&A system that lets users upload PDF documents and ask natural language questions about their contents. It combines PDF parsing, semantic chunking, vector embeddings, and a local LLM for retrieval‑augmented generation (RAG) so answers are grounded in your documents.

Features

- Upload PDF files and automatically extract and index their text
- Chunking strategy to keep context windows manageable
- Embeddings for semantic search using a local embedding model
- Retrieval of the most relevant document chunks and answer generation with a local LLM
- Minimal, modern React frontend (Vite) and small Express backend

Demo Screenshots

Add screenshots in frontend/public and reference them here to show the UI and PDF viewer.

Why MindVault?

- Great for knowledge workers who want instant answers from internal documents
- Designed for local-first development with Ollama and MongoDB
- Small codebase that's easy to understand, extend, and demo

Project Structure

Top-level layout (only the most relevant files are shown):

frontend/       # React + Vite UI
backend/        # Node.js + Express API and processing pipeline
WARP.md         # Project guidance and local dev notes
README.md       # This file

Backend highlights (backend/src)

server.js — Express server entry
config/db.js — MongoDB connection
routes/ — API route definitions: upload, query, auth, studySession
controllers/ — Request handlers for upload, query, and other actions
utils/ — PDF parsing, chunking, and embedding helpers

Frontend highlights (frontend/src)

App.jsx, main.jsx — React app entry
components/PDFViewer.jsx — Lightweight PDF viewing UI
pages/StudySession.jsx — Core page for asking questions about documents

Architecture Overview

Upload PDF → stored in uploads/ and parsed with pdf-parse
Text is chunked (configurable chunk size) to keep context manageable
Each chunk is converted to an embedding via a local embedding model (Ollama nomic-embed-text)
Chunks + embeddings saved in MongoDB via Mongoose models
On user query: embed the question, compute cosine similarity, pick top N chunks, and send context + question to LLM (Ollama mistral) to generate an answer
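
The chunking step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual helper: the function name `chunkText`, the character-based sizes, the default values, and the overlap between neighbouring chunks are all assumptions made for the example.

```javascript
// Hypothetical helper: split extracted PDF text into fixed-size chunks.
// Sizes are counted in characters; the defaults are illustrative only.
// The overlap (an assumption, not stated in this README) repeats the tail
// of each chunk at the start of the next so sentences cut at a boundary
// still appear whole in one of the two chunks.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  // Collapse runs of whitespace, which PDF extraction tends to produce.
  const clean = text.replace(/\s+/g, " ").trim();
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < clean.length; start += step) {
    chunks.push(clean.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= clean.length) break;
  }
  return chunks;
}
```

A smaller chunk size gives more precise retrieval but less context per chunk; a larger one does the opposite, so the value is worth tuning per document set.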

This design keeps the retrieval step simple and effective while letting the LLM produce concise, grounded answers.
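
To make the retrieval step concrete, here is a small sketch of cosine-similarity ranking and prompt assembly. The function names, the `{ text, embedding }` chunk shape, the default of three chunks, and the prompt wording are assumptions for illustration; the real controllers may differ.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query embedding and keep the best n.
// Each chunk is assumed to look like { text, embedding }.
function topChunks(queryEmbedding, chunks, n = 3) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, n);
}

// Combine the selected chunks and the question into one prompt for the LLM.
function buildPrompt(question, selected) {
  const context = selected.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n");
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}
```

Scanning every chunk like this is linear in the number of stored chunks, which is fine for a handful of documents and is the main reason a dedicated vector index becomes attractive at larger scale.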

Quick Start (Local Development)

Prerequisites

Node.js (v16+ recommended)
npm or pnpm
MongoDB running locally at mongodb://127.0.0.1:27017/MindVault
Ollama running at http://localhost:11434 with models:
- nomic-embed-text (embeddings)
- mistral (generation)
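
Both models are reached over Ollama's HTTP API. The sketch below shows what the two requests might look like, using Ollama's `/api/embeddings` and `/api/generate` endpoints. Whether the backend calls these exact endpoints is an assumption, and the helper names are hypothetical. The global `fetch` used in `postJson` requires Node 18 or newer.

```javascript
const OLLAMA_URL = "http://localhost:11434"; // address from the prerequisites

// Request for an embedding of `text` from the nomic-embed-text model.
function embeddingRequest(text) {
  return {
    url: `${OLLAMA_URL}/api/embeddings`,
    body: { model: "nomic-embed-text", prompt: text },
  };
}

// Request for a single, non-streamed completion from the mistral model.
function generationRequest(prompt) {
  return {
    url: `${OLLAMA_URL}/api/generate`,
    body: { model: "mistral", prompt, stream: false },
  };
}

// POST a JSON body and parse the JSON reply (global fetch needs Node 18+).
async function postJson({ url, body }) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return res.json();
}

// Usage, with Ollama running and both models pulled:
//   const { embedding } = await postJson(embeddingRequest("hello"));
//   const { response } = await postJson(generationRequest("Say hi"));
```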

Start services and apps

Install dependencies

cd backend && npm install
cd ../frontend && npm install

Start backend (development)

cd backend
npm run dev

Start frontend (Vite)

cd frontend
npm run dev

Open your browser and go to the Vite dev URL (usually http://localhost:5173)

Notes

The backend expects Ollama and MongoDB to be reachable. See WARP.md for exact commands to start these services.
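
If the backend fails on startup, a quick preflight check can tell you whether Ollama is up and has both models installed. The sketch below relies on Ollama's `GET /api/tags` endpoint, which lists local models; the function names are hypothetical and this check is not part of the repository.

```javascript
const REQUIRED_MODELS = ["nomic-embed-text", "mistral"];

// Given the parsed JSON from GET /api/tags, return the required models
// that are not installed. Ollama reports names such as "mistral:latest",
// so the tag after the colon is dropped before comparing.
function missingModels(tags, required = REQUIRED_MODELS) {
  const installed = new Set((tags.models || []).map((m) => m.name.split(":")[0]));
  return required.filter((name) => !installed.has(name));
}

// Query a running Ollama instance (global fetch needs Node 18+).
async function checkOllama(baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return missingModels(await res.json());
}

// Usage:
//   checkOllama().then((missing) => console.log("missing models:", missing));
```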

API Endpoints

POST /api/upload — Upload a PDF to be processed and indexed. Send multipart form data with the PDF in a field named file.
POST /api/query — Ask a question about indexed documents. JSON body with question and optional fileId or context filters.

See backend/src/routes and backend/src/controllers for exact request shapes and additional endpoints like auth and study session management.
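
As a rough client-side illustration of the two endpoints above, the sketch below builds both requests. The base URL and port are hypothetical (this README does not state the backend port), the response shapes are not shown because they are not documented here, and the routes may also require an auth token.

```javascript
// Hypothetical base URL: adjust to wherever the Express server listens.
const API_BASE = "http://localhost:5000";

// JSON request for POST /api/query; fileId is optional.
function queryRequest(question, fileId) {
  const body = { question };
  if (fileId !== undefined) body.fileId = fileId;
  return {
    url: `${API_BASE}/api/query`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Multipart request for POST /api/upload with the PDF in the "file" field.
// FormData and Blob are globals in browsers and in Node 18+.
function uploadRequest(pdfBytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), filename);
  // No Content-Type header: fetch sets the multipart boundary itself.
  return { url: `${API_BASE}/api/upload`, options: { method: "POST", body: form } };
}

// Usage:
//   const { url, options } = queryRequest("What is the refund policy?");
//   const res = await fetch(url, options);
```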

Configuration & Environment

The backend uses environment variables in backend/.env to configure MongoDB, JWT secrets, and other settings. Review and update backend/.env before running in production.
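
One way to read such settings is a small loader that applies local defaults and fails fast when a secret is missing. The variable names below (`PORT`, `MONGO_URI`, `OLLAMA_URL`, `JWT_SECRET`) and the default port are hypothetical; check backend/.env and the backend source for the names the project actually uses.

```javascript
// Hypothetical config loader; the environment variable names are
// illustrative and may not match the ones in backend/.env.
function loadConfig(env = process.env) {
  const config = {
    port: Number(env.PORT || 5000),
    mongoUri: env.MONGO_URI || "mongodb://127.0.0.1:27017/MindVault",
    ollamaUrl: env.OLLAMA_URL || "http://localhost:11434",
    jwtSecret: env.JWT_SECRET,
  };
  if (!Number.isInteger(config.port) || config.port <= 0) {
    throw new Error("PORT must be a positive integer");
  }
  // Refuse to start without a secret instead of falling back to a default.
  if (!config.jwtSecret) {
    throw new Error("JWT_SECRET must be set");
  }
  return config;
}
```

Failing at startup keeps a misconfigured deployment from running with a guessable or empty signing secret.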

Development Notes

The RAG pipeline is intentionally small and synchronous for clarity. If you scale this app, consider:
- Moving embedding generation to a background job queue
- Using a dedicated vector DB (e.g., Pinecone, Milvus) for large datasets
- Adding rate limiting and stricter auth for public deployments

Contributing

Contributions are welcome! If you'd like to help:

Fork the repo
Create a topic branch (feature/bugfix)
Open a PR with a clear description and small, focused changes

Please add tests for new behavior. Backend unit tests are not included yet — adding Jest unit tests for the RAG pipeline would be a great first contribution.

License

This repository does not include a license file. Add a license (MIT, Apache, etc.) if you plan to publish the code.

Acknowledgements

Built with love and curiosity. Thanks to the authors of Ollama, pdf-parse, Vite, and Mongoose.

