A document-based question-answering chatbot that lets users query their own document knowledge base using Retrieval-Augmented Generation (RAG). Built with Flask and Pinecone, with Cohere providing both the embeddings and the language model.

Features:
- Document Embedding & Storage: Automatically processes PDF documents and stores embeddings in Pinecone
- Intelligent Chunking: Uses hierarchical document splitting for optimal retrieval
- RAG-Powered Responses: Combines document retrieval with Cohere's language model for accurate answers
- Real-time Chat Interface: Clean, responsive web interface for seamless conversations
- Session Management: Persistent chat history across page refreshes
- Audio Input: Audio input is available in English, Hindi, and Bengali

Tech stack:
- Backend: Flask (Python)
- Vector Database: Pinecone
- Embeddings: Cohere
- Language Model: Cohere
- Document Processing: LangChain + PyPDF
- Frontend: Bootstrap 5 + Vanilla JavaScript
- Speech-to-Text: OpenAI Whisper
- Speaker Diarization: Pyannote Audio
- Audio Processing: FFmpeg

Prerequisites:
- Python 3.9+
- Pinecone account and API key
- Cohere account and API key
- HuggingFace account and token
- FFmpeg (for audio conversion)
git clone https://github.com/cr7ritesh/smartheal-chatbot.git
cd Runverve
pip install -r requirements.txt

Windows:
🎬 Install FFmpeg on Windows by TechwithMonir

macOS:
brew install ffmpeg

Ubuntu/Debian:
sudo apt update
sudo apt install ffmpeg

Create a .env file in the root directory:
PINECONE_API_KEY=your_pinecone_api_key_here
COHERE_API_KEY=your_cohere_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here

- Create a docs folder in the project root
- Place your PDF files in the docs folder
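
Before running the embedding step, it can help to confirm that the setup is complete. The following is a minimal sketch and not part of the repository: `preflight` is a hypothetical helper that checks the three keys from the .env example, the docs folder, and whether `ffmpeg` is on the PATH. It reads a plain mapping, so the .env values are assumed to have been loaded into the environment already (for example by python-dotenv, if the project uses it).

```python
import shutil
from pathlib import Path

# Key names taken from the .env example; the helper itself is hypothetical.
REQUIRED_KEYS = ("PINECONE_API_KEY", "COHERE_API_KEY", "HUGGINGFACE_TOKEN")


def preflight(env, docs_dir="docs"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []

    # 1. API keys and tokens must be present and non-empty.
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"missing environment variable: {key}")

    # 2. The documents folder must exist and contain at least one PDF.
    docs = Path(docs_dir)
    if not docs.is_dir():
        problems.append(f"folder not found: {docs_dir}")
    elif not any(p.suffix.lower() == ".pdf" for p in docs.iterdir()):
        problems.append(f"no PDF files in: {docs_dir}")

    # 3. FFmpeg is needed for audio conversion.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    return problems
```

Calling `preflight(os.environ)` returns an empty list when everything is in place; otherwise each entry names one thing to fix.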
python store_embed.py

This will:
- Load all PDFs from the folder
- Create intelligent chunks using hierarchical splitting
- Generate embeddings using Cohere
- Store everything in Pinecone
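
To make the chunking step concrete, here is a minimal, dependency-free sketch of one way hierarchical splitting can work. It is an illustration only: the function name `split_hierarchically` and the `max_chars` parameter are assumptions, and the project's own splitter (built on LangChain) may behave differently. The idea is to prefer paragraph boundaries, fall back to sentence boundaries, and cut by character count only as a last resort.

```python
import re


def split_hierarchically(text, max_chars=500):
    """Split text into chunks of at most max_chars characters.

    Tries paragraph boundaries first, then sentence boundaries, and only
    falls back to a hard character cut for a single oversized sentence.
    """
    pieces = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chars:
            pieces.append(para)
            continue
        # Paragraph too long: descend one level and split into sentences.
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            # Hard cut: last resort for a sentence longer than max_chars.
            for start in range(0, len(sentence), max_chars):
                pieces.append(sentence[start:start + max_chars])

    # Greedily merge neighbouring pieces (joined by a space) so that
    # chunks are not needlessly small.
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece.strip()
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded and stored together with its text, so that retrieval can return the passage itself rather than only a vector ID.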
python app.py

Visit http://localhost:5000 to start chatting with your documents!
- Add PDF files to the documents folder
- Run python store_embed.py to update embeddings
- The web app will automatically use the updated knowledge base
- Open the web interface at http://localhost:5000
- Ensure the system shows "Documents loaded and ready"
- Type your questions in the chat interface
- Get AI-powered responses based on your document content
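
Behind the chat interface, a RAG answer comes from two steps: retrieve the chunks most similar to the question, then hand them to the language model as context. The sketch below illustrates that flow with stand-ins and is not the app's actual code: a plain list of `(vector, text)` pairs takes the place of Pinecone, the vectors are toy values rather than Cohere embeddings, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=2):
    """Return the top_k chunk texts ranked by similarity to the query.

    `index` is a list of (vector, text) pairs, a stand-in for the
    vector database.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]


def build_prompt(question, chunks):
    """Assemble the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the running app, the query vector would come from the embedding model and the finished prompt would be sent to the language model; keeping the instruction "using only the context" in the prompt is one common way to keep answers tied to the documents.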