A hands-free, conversational AI assistant for exploring Indian tourism. Speak your questions (no buttons needed) and get instant answers, both spoken and written.
- Features
- Prerequisites
- Installation
- Usage
- Configuration
- Deployment
- Architecture
- Customization & Extensibility
- Troubleshooting
- License
- Credits
User Mic → Frontend (SpeechRecognition + MediaRecorder)
↓
WebSocket (socket.io)
↓
Backend (FastAPI)
Whisper (speech-to-text) → GPT-4o → gTTS (text-to-speech) → Response
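The diagram traces one request through three backend stages. As a rough illustration, here is a Python sketch of that flow with each stage injected as an async function. The names handle_utterance and Reply and the fake_* stubs are hypothetical: the stubs stand in for the real Whisper, GPT-4o and gTTS calls, and streaming of partial answers is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical stage signatures: audio -> question -> answer -> MP3 bytes.
Transcribe = Callable[[bytes], Awaitable[str]]   # Whisper in the real backend
Answer = Callable[[str], Awaitable[str]]         # GPT-4o
Synthesize = Callable[[str], Awaitable[bytes]]   # gTTS


@dataclass
class Reply:
    question: str
    text: str
    audio: bytes


async def handle_utterance(
    audio: bytes, transcribe: Transcribe, answer: Answer, synthesize: Synthesize
) -> Reply:
    """Run one recorded utterance through speech-to-text, the LLM, then text-to-speech."""
    question = await transcribe(audio)
    text = await answer(question)
    mp3 = await synthesize(text)
    return Reply(question=question, text=text, audio=mp3)


# Stub stages so the flow can be exercised without any external service.
async def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_answer(question: str) -> str:
    return f"You asked: {question}"


async def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = asyncio.run(
    handle_utterance(b"Where is the Taj Mahal?", fake_transcribe, fake_answer, fake_synthesize)
)
print(reply.text)  # You asked: Where is the Taj Mahal?
```

Injecting the stages keeps the orchestration testable; swapping a stub for a real client changes nothing else in the flow.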
- Hands-free interaction: Speak to ask questions about Indian cities, monuments, cuisine, culture, and more.
- Silence detection: Automatically stops recording when you finish speaking.
- Pause/Stop commands: Say "pause" or "stop" at any time to interrupt the response.
- Real-time streaming: Partial answers appear as they are generated, followed by the full text and audio.
- 5-second fallback: If TTS fails or takes longer than 5 seconds, the text answer is still displayed.
- Automatic resume: The bot listens for your next query right after replying.
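To make the 5-second fallback concrete, here is a minimal Python sketch built on asyncio.wait_for. It assumes, hypothetically, that the backend sends audio to the client as a base64 string next to the text; answer_with_fallback, gtts_mp3 and the stub TTS functions are illustrative names, not the project's code. gTTS is blocking, so the sketch runs it in a worker thread.

```python
import asyncio
import base64
import io
from typing import Awaitable, Callable, Dict, Optional

TTS_TIMEOUT_S = 5.0  # the 5-second fallback window from the feature list


async def gtts_mp3(text: str, lang: str = "en") -> bytes:
    """Render text to MP3 bytes with gTTS in a worker thread (gTTS is blocking)."""
    from gtts import gTTS  # lazy import: only needed when real TTS is used

    def render() -> bytes:
        buf = io.BytesIO()
        gTTS(text=text, lang=lang).write_to_fp(buf)
        return buf.getvalue()

    return await asyncio.get_running_loop().run_in_executor(None, render)


async def answer_with_fallback(
    text: str,
    synthesize: Callable[[str], Awaitable[bytes]] = gtts_mp3,
    timeout: float = TTS_TIMEOUT_S,
) -> Dict[str, Optional[str]]:
    """Always return the text; attach base64 MP3 audio only if TTS succeeds in time."""
    try:
        mp3: Optional[bytes] = await asyncio.wait_for(synthesize(text), timeout)
    except Exception:  # asyncio.TimeoutError or any TTS error -> text-only reply
        mp3 = None
    audio = base64.b64encode(mp3).decode("ascii") if mp3 else None
    return {"text": text, "audio": audio}


# Stub TTS functions for trying the fallback offline.
async def fast_tts(text: str) -> bytes:
    return b"abc"


async def slow_tts(text: str) -> bytes:
    await asyncio.sleep(10)
    return b"abc"


async def broken_tts(text: str) -> bytes:
    raise RuntimeError("TTS service unavailable")


print(asyncio.run(answer_with_fallback("Namaste!", fast_tts)))
# {'text': 'Namaste!', 'audio': 'YWJj'}
print(asyncio.run(answer_with_fallback("Namaste!", slow_tts, timeout=0.1)))
# {'text': 'Namaste!', 'audio': None}
```

A thread cannot be cancelled, so a timed-out gTTS call keeps running in the background; the client simply receives the text-only reply without waiting for it.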
- Node.js (>=14.x)
- Python (>=3.8)
- pip for installing Python packages
- OpenAI API key (API_KEY)
- Optional: Pinecone for memory and search (PINE_CONE_DB, PINECONE_ENV)
- Navigate to the backend folder:
    cd guide_ai/backend
- Create a .env file and add your keys:
    API_KEY=sk-...
    # (Optional) Pinecone keys:
    PINE_CONE_DB=...
    PINECONE_ENV=...
- Install dependencies:
    pip install -r requirements.txt
- Start the server:
    uvicorn asgi_app:app --host 0.0.0.0 --port 5000
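The backend's .env file uses plain KEY=VALUE lines. The sketch below shows one way such a file could be read with the standard library only. parse_dotenv and load_dotenv_file are hypothetical helpers (the project may well use python-dotenv's load_dotenv instead), inline comments after a value are not handled, and the sample values are placeholders.

```python
import os
from pathlib import Path
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks and '#' comment lines, drop matching quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_dotenv_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Copy .env values into os.environ; real environment variables win unless override=True."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_dotenv(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


sample = """
API_KEY=sk-test
# (Optional) Pinecone keys:
PINE_CONE_DB=my-index
PINECONE_ENV="us-east-1"
"""
print(parse_dotenv(sample))
# {'API_KEY': 'sk-test', 'PINE_CONE_DB': 'my-index', 'PINECONE_ENV': 'us-east-1'}
```

Letting existing environment variables win is what allows a hosting platform's settings to override a leftover local .env file.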
- From the project root, navigate to the frontend:
    cd guide_ai/frontend
- Create a .env file in this folder (if needed) and set:
    REACT_APP_API_URL=http://localhost:5000
- Install dependencies:
    npm install
- Start the React app:
    npm start
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
- Open the frontend in your browser.
- Click Start Listening or simply start speaking.
- Ask about any Indian tourist spot or topic.
- The bot answers in both text and audio.
- Say "pause" or click Stop Chatting at any time to interrupt.
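The pause/stop behavior could be implemented as a keyword check on each recognized transcript. This Python sketch is an assumption about how that check might work (in this app it likely runs in the browser, next to SpeechRecognition); is_interrupt, STOP_WORDS and the max_words limit are hypothetical.

```python
import re
from typing import FrozenSet

# Hypothetical command vocabulary: the two words named in this README.
STOP_WORDS: FrozenSet[str] = frozenset({"pause", "stop"})


def is_interrupt(
    transcript: str, stop_words: FrozenSet[str] = STOP_WORDS, max_words: int = 3
) -> bool:
    """True if a short utterance contains a stop word as a whole word.

    Longer utterances are treated as questions rather than commands.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    return 0 < len(words) <= max_words and any(word in stop_words for word in words)


for heard in ["Stop!", "okay, pause please", "Which bus stop is closest to the Red Fort?"]:
    print(heard, "->", is_interrupt(heard))
```

The word limit is a design choice: it keeps a question such as "Which bus stop is closest to the Red Fort?" from cutting off the answer, and matching whole words means "unstoppable" never triggers it.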
- Adjust silence sensitivity in App.tsx:
    const SILENCE_THRESHOLD = 0.01;
    const SILENCE_DURATION = 1000; // in ms
- Change the voice-activity threshold:
    const VOICE_THRESHOLD = 0.015;
- Backend TTS settings live in asgi_app.py, which uses gTTS:
    from gtts import gTTS
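How App.tsx combines the silence and voice thresholds is not shown here, so the following is one plausible reading, written in Python for illustration: a frame at or above VOICE_THRESHOLD marks speech as started, and SILENCE_DURATION ms of frames below SILENCE_THRESHOLD afterwards ends the recording. The functions rms and find_stop_time and the 100 ms frame size are assumptions.

```python
import math
from typing import Iterable, Optional, Sequence

# Values copied from the README; how App.tsx really uses them is assumed.
SILENCE_THRESHOLD = 0.01   # frame level below this counts as silence
SILENCE_DURATION = 1000    # ms of continuous silence that ends recording
VOICE_THRESHOLD = 0.015    # frame level at or above this counts as speech


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def find_stop_time(levels: Iterable[float], frame_ms: int = 100) -> Optional[int]:
    """Return the time in ms at which recording would stop, or None if it never does.

    Silence only ends a recording after speech has been heard, so the user
    can take a moment before starting to talk.
    """
    heard_voice = False
    silent_ms = 0
    for i, level in enumerate(levels):
        if level >= VOICE_THRESHOLD:
            heard_voice = True
        if level < SILENCE_THRESHOLD:
            silent_ms += frame_ms
            if heard_voice and silent_ms >= SILENCE_DURATION:
                return (i + 1) * frame_ms
        else:
            silent_ms = 0  # any non-silent frame restarts the silence timer
    return None


levels = [0.05] * 5 + [0.001] * 10  # 0.5 s of speech, then 1 s of silence
print(find_stop_time(levels, frame_ms=100))  # 1500
```

Under this reading, raising SILENCE_DURATION tolerates longer pauses mid-sentence, while raising VOICE_THRESHOLD makes background noise less likely to count as speech.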
- Frontend: Vercel or Netlify
- Backend: Render.com, Railway, or self-host on a VPS
- Set environment variables in your deployment platform to match your .env files, and point REACT_APP_API_URL at the deployed backend URL.
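On hosted platforms the backend usually cannot rely on a local .env file or a fixed port. A minimal sketch, assuming the host injects a PORT variable (Render and Railway both do): resolve_settings is a hypothetical helper that fails fast when API_KEY is missing and reports whether both Pinecone variables are present.

```python
import os
from typing import Dict, Mapping, Optional, Union

PINECONE_KEYS = ("PINE_CONE_DB", "PINECONE_ENV")  # optional, per the README


def resolve_settings(
    env: Optional[Mapping[str, str]] = None, default_port: int = 5000
) -> Dict[str, Union[int, bool]]:
    """Fail fast if API_KEY is missing and use the PORT the host injects, if any."""
    env = os.environ if env is None else env
    if not env.get("API_KEY"):
        raise RuntimeError("API_KEY is not set in the deployment environment")
    return {
        "port": int(env.get("PORT", default_port)),
        "pinecone_enabled": all(env.get(key) for key in PINECONE_KEYS),
    }


print(resolve_settings({"API_KEY": "sk-...", "PORT": "10000"}))
# {'port': 10000, 'pinecone_enabled': False}
```

With a host-assigned port, the start command would become, for example, uvicorn asgi_app:app --host 0.0.0.0 --port $PORT.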
- Pinecone integration: for long-term memory or retrieval-augmented generation.
- LLM fine-tuning: tailor the GPT prompts, or fine-tune a model, for niche domains.
- Authentication: add user accounts for personalized experiences.
- Multilingual support: configure the Web Speech API recognition language and the gTTS output language for other languages.
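For multilingual support, the recognition language set on the frontend (a BCP 47 tag such as ta-IN) has to be mapped to a gTTS language code. Here is a small Python sketch of that mapping; tts_lang_for and the GTTS_INDIAN_LANGS subset are assumptions, so check the full list with gtts.lang.tts_langs() before relying on it.

```python
# Assumed subset of gTTS language codes for widely spoken Indian languages;
# verify against gtts.lang.tts_langs() for the installed gTTS version.
GTTS_INDIAN_LANGS = frozenset({"en", "bn", "gu", "hi", "kn", "ml", "mr", "ta", "te", "ur"})


def tts_lang_for(recognition_lang: str, fallback: str = "en") -> str:
    """Map a BCP 47 recognition tag such as 'ta-IN' to a gTTS lang code like 'ta'."""
    primary = recognition_lang.replace("_", "-").split("-")[0].lower()
    return primary if primary in GTTS_INDIAN_LANGS else fallback


for tag in ["hi-IN", "ta-IN", "en-IN", "as-IN"]:
    print(tag, "->", tts_lang_for(tag))
```

Unsupported languages fall back to English so the bot still speaks. For English answers with an Indian accent, gTTS also accepts tld="co.in" alongside lang="en".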
- No audio: make sure microphone permissions are granted in the browser.
- Playback fails: check that the browser supports MP3 playback and that CORS is configured on the backend.
- Connection errors: verify that the backend is running and that REACT_APP_API_URL is correct.
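A quick way to tell a stopped backend from a wrong REACT_APP_API_URL is to test the TCP connection directly. backend_reachable is a hypothetical helper that uses only the standard library; it confirms that something is listening on the URL's host and port, not that CORS or the socket.io handshake is configured correctly.

```python
import socket
from urllib.parse import urlparse


def backend_reachable(api_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in api_url succeeds."""
    parsed = urlparse(api_url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

For example, backend_reachable("http://localhost:5000") should return True while uvicorn is running; if it returns False, start the backend or fix the URL before looking at CORS.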
This project is licensed under the MIT License.