Your personal application for intelligent content management.
SmartDrive is a full-stack, cloud-native application designed to be your personal content manager. It allows you to securely upload, process, and search a wide variety of files — including documents, images, and media.
At its core, SmartDrive uses a sophisticated AI pipeline to automatically understand file content, generate concise summaries, and make everything instantly searchable through an intuitive web interface.
The project is built on a modern microservice architecture, ensuring scalability, resilience, and efficiency across diverse workloads — from simple document parsing to intensive audio transcription.
In today’s world, we’re overwhelmed with unstructured data: meeting transcripts, research papers, scanned documents, personal photos, and more. Finding what you need is like searching for a needle in a haystack.
Commercial cloud storage solutions exist — but they often lack deep content analysis and force you to trust a third party with your most sensitive data.
SmartDrive solves this by providing an intelligent, flexible, privacy-aware AI pipeline.
✅ Privacy-Aware Processing
Uses open-source AI models (e.g., OCR, Whisper) in a secure, containerized environment — reducing API costs and limiting raw data exposure.
✅ Intelligent Routing
A local classifier analyzes images to decide between an OCR or captioning workflow, applying the best tool for the job.
✅ State-of-the-Art Summarization & Captioning
For high-quality reasoning, SmartDrive uses the Google Gemini API, combining powerful third-party summarization with cost-effective local processing.
Open-Source Models Used:
- Document Parsing: `unstructured`
- Image OCR: `EasyOCR`
- Audio and Video Transcription: `faster-whisper` and `ffmpeg`
| Category | Technologies |
|---|---|
| Frontend | React, Next.js (App Router), TypeScript, Tailwind CSS, Axios |
| Backend | Node.js, Express.js, TypeScript, MongoDB, Mongoose |
| Authentication | JWT (Access & Refresh Tokens), bcrypt |
| AI / Processing | Python, unstructured, EasyOCR, faster-whisper, ffmpeg, Google Gemini API |
| Database & Search | Weaviate (Vector Database), MongoDB (Metadata) |
| Cloud & DevOps | Google Cloud Platform (GCP), Docker, Cloud Run, Cloud Storage (GCS), Pub/Sub, Cloud Build, GitHub Actions |
SmartDrive uses a decoupled, event-driven architecture for scalability and resilience.
Data Flow:
- Upload: An authenticated user uploads a file via the Next.js frontend.
- Backend Router: The Node.js backend saves the file to a user-specific GCS bucket, stores `fileHash` in MongoDB, and publishes a Pub/Sub message to a topic chosen by MIME type.
- Specialized Processing: One of three Python microservices, each listening to its own Pub/Sub topic, picks up the message.
- Content Extraction: The microservice downloads the file from GCS and uses specialized models (e.g., `unstructured`, `EasyOCR`, `faster-whisper`) to extract content.
- Summarization & Embedding: The microservice sends the extracted content to the Google Gemini API for summarization, then creates a vector embedding.
- Indexing: The microservice saves the final summary, metadata, and embedding to the matching Weaviate collection.
Because each microservice consumes its own Pub/Sub topic, a long-running transcription job does not block faster tasks such as document parsing.
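The routing step of the data flow can be sketched as follows. This is only an illustration of the logic, written in Python for consistency with the processing services (the actual backend is Node.js). The topic names, the payload fields, and the choice of SHA-256 for `fileHash` are assumptions, not details taken from the project.

```python
import hashlib
import json

# Hypothetical topic names; the real project may name its topics differently.
DOCUMENT_TOPIC = "smartdrive-data-extract"
IMAGE_TOPIC = "smartdrive-image-extract"
MEDIA_TOPIC = "smartdrive-media-extract"


def select_topic(mime_type: str) -> str:
    """Pick the Pub/Sub topic for a file based on its MIME type."""
    major = mime_type.split("/", 1)[0].lower()
    if major == "image":
        return IMAGE_TOPIC
    if major in ("audio", "video"):
        return MEDIA_TOPIC
    # Everything else (PDF, Office formats, plain text) goes to the document service.
    return DOCUMENT_TOPIC


def build_message(file_bytes: bytes, mime_type: str, gcs_path: str) -> tuple[str, bytes]:
    """Return the target topic and a JSON-encoded message body."""
    payload = {
        "gcsPath": gcs_path,      # hypothetical field name
        "mimeType": mime_type,    # hypothetical field name
        # Assumption: fileHash is a SHA-256 hex digest of the file contents.
        "fileHash": hashlib.sha256(file_bytes).hexdigest(),
    }
    return select_topic(mime_type), json.dumps(payload).encode("utf-8")
```

Keeping the topic decision in one small function means a new file type only needs a new branch and a new subscriber; existing services are unaffected.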
- Clean user registration and login forms
- Centralized state management with `AuthContext`
- Drag-and-drop file uploader with status indicators
- Federated search interface across all data types
- Personalized dashboard showing recent uploads and search results
- Secure file actions (view, download, delete)
- Secure API endpoints for file operations
- Custom JWT authentication with bcrypt
- Authentication middleware for protected routes
- Pub/Sub message router based on MIME type
- Federated search endpoint combining multiple Weaviate collections
- Secure GCS signed URL generation for file access
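One way a federated search endpoint could combine hits from several Weaviate collections is to tag each hit with its source and merge by vector distance. The sketch below is in Python for illustration only (the backend itself is Node.js). The hit shape (`summary`, `distance`) is hypothetical, and it assumes all collections share one embedding model so that distances are comparable.

```python
import heapq
from typing import Iterable


def merge_results(per_collection: dict[str, Iterable[dict]], limit: int = 10) -> list[dict]:
    """Merge hits from several collections into one list ordered by vector distance."""
    tagged = (
        {**hit, "collection": name}  # remember which collection each hit came from
        for name, hits in per_collection.items()
        for hit in hits
    )
    # A smaller distance means a closer match to the query vector.
    return heapq.nsmallest(limit, tagged, key=lambda hit: hit["distance"])
```

For example, passing the per-collection results for `SmartDriveDocuments`, `SmartDriveImages`, and `SmartDriveMedia` with `limit=10` would yield the ten closest hits overall, regardless of file type.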
Responsibility: Process standard documents like .pdf, .docx, .pptx, .xlsx, .txt.
Technologies:
- `unstructured`: advanced text parsing
- Google Gemini API: summarization
Cloud Integration:
- Listens to `smartdrive-data-extract-sub` Pub/Sub subscription
- Downloads files from GCS
- Saves summaries and vectors to `SmartDriveDocuments` collection in Weaviate
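The listen, download, extract, summarize, and index sequence above might be wired together roughly as shown below. This is a sketch under stated assumptions, not the project's code: the message body is assumed to be JSON with a hypothetical `gcsPath` field, and the five step functions are placeholders injected by the caller. The `ack()` and `nack()` calls mirror the methods on a Pub/Sub subscriber message.

```python
import json
from typing import Callable


def handle_message(
    message,
    download: Callable[[str], bytes],
    extract: Callable[[bytes], str],
    summarize: Callable[[str], str],
    embed: Callable[[str], list[float]],
    index: Callable[[dict], None],
) -> bool:
    """Run one file through the pipeline; ack on success, nack on failure."""
    try:
        job = json.loads(message.data)  # assumed JSON payload
        text = extract(download(job["gcsPath"]))
        summary = summarize(text)
        index({"gcsPath": job["gcsPath"], "summary": summary, "vector": embed(summary)})
    except Exception:
        # Negative acknowledgement: Pub/Sub will redeliver the message later.
        message.nack()
        return False
    message.ack()
    return True
```

Injecting the steps as callables keeps the handler easy to test without GCS, Gemini, or Weaviate credentials, and the same shape could serve the image and media services with different `extract` functions.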
Responsibility: Process image files (.png, .jpg, etc.)
Technologies:
- OpenCV (Laplacian variance for classification)
- EasyOCR: OCR
- Google Gemini API: summarization or captioning
Cloud Integration:
- Listens to `smartdrive-image-extract-sub` Pub/Sub subscription
- Downloads images from GCS
- Saves results to `SmartDriveImages` collection in Weaviate
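The Laplacian-variance classification mentioned above can be illustrated without OpenCV. With OpenCV, the usual idiom is `cv2.Laplacian(gray, cv2.CV_64F).var()`; the dependency-free sketch below applies the standard 4-neighbour Laplacian to interior pixels only, so its values will differ slightly at the borders. The threshold and the direction of the decision (high variance routes to OCR) are assumptions for illustration.

```python
from statistics import pvariance

# Hypothetical cut-off; a real value would need tuning on sample images.
SHARPNESS_THRESHOLD = 100.0


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale image."""
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, len(gray) - 1)
        for x in range(1, len(gray[0]) - 1)
    ]
    return pvariance(responses)


def choose_workflow(gray: list[list[float]], threshold: float = SHARPNESS_THRESHOLD) -> str:
    """Route an image to OCR or captioning based on edge sharpness."""
    # Assumption: many sharp, high-contrast edges suggest text, so route to OCR.
    return "ocr" if laplacian_variance(gray) >= threshold else "caption"
```

The input must be at least 3x3 pixels so that there is an interior to measure.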
Responsibility: Process audio (.mp3, .wav) and video (.mp4) files
Technologies:
- ffmpeg: audio extraction from video
- faster-whisper: audio-to-text transcription
- Google Gemini API: summarization
Cloud Integration:
- Listens to `smartdrive-media-extract-sub` Pub/Sub subscription
- Downloads media files from GCS
- Saves summarized transcript + vectors to `SmartDriveMedia` collection in Weaviate
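The audio-extraction step could look something like the following, which only builds the ffmpeg argument list so it can be inspected or tested without running ffmpeg. The function name, the `.wav` output, and the 16 kHz mono settings are assumptions chosen because Whisper models operate on 16 kHz audio; the project may use different options.

```python
from pathlib import Path


def build_ffmpeg_command(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that extracts a mono WAV audio track from a video file."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input file
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate
        audio_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, and the output path (the last element) handed to the transcription step.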
- Node.js (v18+)
- Python (v3.11+)
- Docker Desktop
- gcloud CLI
- ffmpeg
# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
# Set up the backend
cd smartdrive-backend
cp .env.example .env # or create .env as shown below
npm install
npm run dev
# Set up each Python microservice in a separate terminal (image extractor shown; repeat for the others)
cd smartdrive-image-extractor
python -m venv .venv
source .venv/bin/activate # or .\.venv\Scripts\activate on Windows
pip install uv
uv sync
cp .env.example .env # or create .env as shown below
uv run main.py
# Set up the frontend in a separate terminal
cd smartdrive-frontend
cp .env.local.example .env.local
npm install
npm run dev