Vision Assistant v1.1 adds the following accessibility features:
- 8 Languages - English, Indonesian, Spanish, French, German, Portuguese, Japanese, Mandarin Chinese
- Face Recognition + Training - Learn who's who with voice commands
- 3D Audio Localization - Know where sounds come from with 8-directional awareness
- Intelligent Obstacle Detection - Audio-based hazard warnings
- 100+ Voice Commands - Full voice control in all languages
Vision Assistant is an AI-powered voice-controlled visual assistance system designed to empower visually impaired individuals by providing real-time scene understanding, text recognition, object detection, navigation assistance, and intelligent audio awareness through natural voice interaction.
- Multi-Language Voice I/O - 8 languages with dynamic switching
- Face Recognition - Detect, recognize, and learn faces with voice commands
- Audio Localization - 3D sound positioning and classification
- Intent Recognition - Understand 100+ voice commands
- Audio-Guided Navigation - GPS + sound-based directions
- Emergency Alerts - Multi-language emergency notifications
- Persistent Data - Face database, user preferences, history
- Full Voice Control - No visual UI required
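
To make the "8-directional awareness" in the feature list more concrete, here is a minimal sketch of how a sound's azimuth could be bucketed into eight spoken directions. The function name, the direction labels, and the 45-degree sector layout are assumptions for illustration only; they are not taken from the Vision Assistant codebase.

```python
# Hypothetical helper: the labels and the sector layout are assumptions,
# not the project's actual audio localization code.
DIRECTIONS = [
    "front", "front-right", "right", "back-right",
    "back", "back-left", "left", "front-left",
]


def direction_from_azimuth(azimuth_deg: float) -> str:
    """Map an azimuth (0 = straight ahead, increasing clockwise) to one of 8 sectors."""
    # Shift by half a sector (22.5 degrees) so each label is centered on its angle,
    # then divide the circle into eight 45-degree slices.
    sector = int(((azimuth_deg % 360) + 22.5) // 45) % 8
    return DIRECTIONS[sector]


if __name__ == "__main__":
    for angle in (0, 50, 90, 180, 270, 350):
        print(angle, direction_from_azimuth(angle))
```

Centering each sector on its nominal angle means a sound at 350 degrees is still announced as "front" rather than "front-left", which is usually what a listener expects.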
# 1. Clone repository
git clone https://github.com/Khaf-dev/aiforus.git
cd aiforus
# 2. Create virtual environment
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure (optional)
cp .env.example .env
# Edit .env with your API keys

# English (default)
python app.py
# Indonesian
python app.py --lang=id
# Spanish
python app.py --lang=es

| Code | Language | Status |
|---|---|---|
| en | English | Default |
| id | Indonesian | Full support |
| es | Spanish | Full support |
| fr | French | Full support |
| de | German | Full support |
| pt | Portuguese | Full support |
| ja | Japanese | Full support |
| zh | Mandarin Chinese | Full support |
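
As a rough illustration of how a `--lang` option like the one above could be validated against the supported codes, the following sketch uses `argparse` from the standard library. The language codes come from the table; the parser itself, including the `parse_args` helper name, is an assumption and may differ from what `app.py` actually does.

```python
import argparse

# Codes copied from the table above; everything else here is illustrative.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "id": "Indonesian",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "pt": "Portuguese",
    "ja": "Japanese",
    "zh": "Mandarin Chinese",
}


def parse_args(argv=None):
    """Parse command-line options, rejecting unsupported language codes."""
    parser = argparse.ArgumentParser(description="Vision Assistant launcher (sketch)")
    parser.add_argument(
        "--lang",
        default="en",
        choices=sorted(SUPPORTED_LANGUAGES),
        help="two-letter language code (default: en)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--lang=id"])
    print(f"Starting in {SUPPORTED_LANGUAGES[args.lang]}")
```

Using `choices` lets `argparse` print the list of valid codes and exit when an unknown code is passed, so no separate validation step is needed.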

Language switching:
- "Change language to Indonesian"
- "Switch to Spanish"
- "Speak French"

Face recognition:
- "Enroll John" / "Register face as Sarah"
- "Who do you know?" / "Forget John"
- "Face statistics"

Audio awareness:
- "What do you hear?" / "Detect sounds"
- "Check ahead" / "Detect obstacles"
- "Classify sound" / "What sound is that?"

Vision and scene understanding:
- "Describe the scene" / "What do you see?"
- "Read any text" / "Detect objects"
- "Identify people" / "Where am I?"
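
For the language-switching commands listed above ("Change language to Indonesian", "Switch to Spanish", "Speak French"), a simple keyword matcher is enough to show the idea of turning a spoken phrase into a language code. This is only a sketch: the project describes its own intent recognition, which may work quite differently, and the function name and regular expression below are hypothetical.

```python
import re
from typing import Optional

# Language names and codes taken from the supported-languages table.
LANGUAGE_CODES = {
    "english": "en",
    "indonesian": "id",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "portuguese": "pt",
    "japanese": "ja",
    "mandarin chinese": "zh",
}

# Assumed phrasings, based only on the three example commands in this README.
_SWITCH_PATTERN = re.compile(
    r"^(?:change language to|switch to|speak)\s+(.+)$", re.IGNORECASE
)


def parse_language_switch(command: str) -> Optional[str]:
    """Return a language code if the command asks to switch language, else None."""
    match = _SWITCH_PATTERN.match(command.strip().rstrip(".!?"))
    if not match:
        return None
    # Unknown language names fall through to None instead of raising.
    return LANGUAGE_CODES.get(match.group(1).strip().lower())


if __name__ == "__main__":
    for phrase in ("Change language to Indonesian", "Speak French", "Describe the scene"):
        print(repr(phrase), "->", parse_language_switch(phrase))
```

Returning `None` for anything that is not a recognized switch request lets the caller pass the command on to the next handler.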

Minimum requirements:
- OS: Windows 10+, macOS 10.14+, Ubuntu 18.04+
- Python: 3.8+
- RAM: 4GB (8GB recommended)
- Storage: 2GB (for ML models)
- Microphone: Required
- Camera: Optional (for vision features)

Recommended configuration:
- OS: Windows 11, macOS 12+, Ubuntu 20.04+
- Python: 3.10+
- RAM: 8GB+
- GPU: NVIDIA (CUDA) for faster processing
- Microphone: External USB mic for audio localization

Recommended hardware:
- CPU: Intel i5/AMD Ryzen 5 or equivalent
- GPU: NVIDIA GPU with CUDA (optional, for faster processing)
- RAM: 8GB or more
- Storage: SSD (faster model loading)
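
Before installing, it can help to confirm that the local interpreter meets the Python versions listed above (3.8 as the minimum, 3.10 as the recommended version). The sketch below is a standalone check using only the standard library; the `check_python` helper and its return labels are illustrative and not part of the project.

```python
import platform
import sys

# Thresholds taken from the requirements lists in this README.
MIN_PYTHON = (3, 8)
RECOMMENDED_PYTHON = (3, 10)


def check_python(version_info=sys.version_info) -> str:
    """Classify a Python version as 'unsupported', 'supported', or 'recommended'."""
    current = tuple(version_info[:2])
    if current < MIN_PYTHON:
        return "unsupported"
    if current < RECOMMENDED_PYTHON:
        return "supported"
    return "recommended"


if __name__ == "__main__":
    print(f"{platform.system()} / Python {platform.python_version()}: {check_python()}")
```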
aiforus/
├── app.py                      # Main application entry point
├── config.yaml                 # Configuration file
├── requirements.txt            # Python dependencies
├── .env.example                # Environment variables template
│
├── ai_modules/                 # Core AI/ML modules
│   ├── vision_processor.py     # Computer vision (YOLOv8, EasyOCR)
│   ├── speech_engine.py        # Voice I/O (pyttsx3, SpeechRecognition)
│   ├── llm_handler.py          # Language model (OpenAI/Local)
│   └── neural_core.py          # Model management
│
├── features/                   # Feature modules
│   ├── navigation.py           # GPS/directions
│   ├── object_detection.py     # Object recognition
│   ├── text_reader.py          # OCR pipeline
│   └── face_recognition.py     # Face detection
│
├── database/                   # Data persistence
│   ├── models.py               # SQLAlchemy ORM models
│   └── db_handler.py           # Database operations
│
├── tests/                      # Testing
│   ├── validation.py           # Bootstrap validation
│   └── __init__.py
│
├── documentation/              # Project documentation
│   ├── README.md               # This file
│   ├── INSTALLATION.md         # Detailed setup guide
│   └── CONTRIBUTING.md         # Contribution guidelines
│
├── demo_features.py            # Feature demonstration
├── demo_exit_feature.py        # Exit feature demo
├── test_features.py            # Feature testing
│
├── LICENSE                     # MIT License
└── .github/
    └── copilot-instructions.md # AI agent guidelines

| Component | Library | Version |
|---|---|---|
| Computer Vision | YOLOv8, OpenCV, EasyOCR | Latest |
| Speech | pyttsx3, SpeechRecognition, gTTS | 2.90+, 3.10+, 2.3+ |
| Language Model | OpenAI, Transformers | 1.0+, 4.31+ |
| Deep Learning | PyTorch, TorchVision | 2.0+, 0.15+ |
| Backend | FastAPI, SQLAlchemy | 0.100+, 2.0+ |
| Configuration | PyYAML, python-dotenv | Latest |
| Navigation | geopy, geocoder | Latest |
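
The minimum versions in the table above can be checked against what is actually installed with `importlib.metadata` from the standard library. The sketch below is an assumption-laden helper rather than project code: the `MINIMUMS` mapping simply restates the table using the usual PyPI distribution names, and the version parsing only looks at the first two numeric components.

```python
import re
from importlib import metadata

# Minimum (major, minor) versions restated from the table; PyPI names are assumed.
MINIMUMS = {
    "pyttsx3": (2, 90),
    "SpeechRecognition": (3, 10),
    "gTTS": (2, 3),
    "openai": (1, 0),
    "transformers": (4, 31),
    "torch": (2, 0),
    "torchvision": (0, 15),
    "fastapi": (0, 100),
    "SQLAlchemy": (2, 0),
}


def parse_version(text: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.0.1+cu118'."""
    parts = re.findall(r"\d+", text.split("+")[0])
    parts = (parts + ["0", "0"])[:2]  # pad so '3' becomes (3, 0)
    return tuple(int(p) for p in parts)


def check_installed(minimums=MINIMUMS) -> dict:
    """Return 'ok', 'too old', or 'missing' for each distribution name."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_installed().items():
        print(f"{name}: {status}")
```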
- Object Detection: YOLOv8 (nano - ~6MB)
- Text Recognition: EasyOCR (English, extensible)
- Face Detection: OpenCV Cascade Classifier
- Language Understanding: GPT-3.5-turbo or local alternatives
- Speech Synthesis: pyttsx3 (offline) or Google TTS (online)
# OpenAI Configuration (optional)
OPENAI_API_KEY=your_api_key_here
# Speech Configuration
SPEECH_LANGUAGE=en
SPEECH_RATE=150
# Device Configuration
DEVICE=cpu # Use 'cuda' if GPU available
# Feature Flags
ENABLE_NAVIGATION=true
ENABLE_FACE_RECOGNITION=false
ENABLE_TEXT_EXTRACTION=true
ENABLE_OBJECT_DETECTION=true

app:
  name: "Vision Assistant"
  debug: true
  version: "1.0.0"
speech:
  language: "en"
  speech_rate: 150
  use_google_tts: false
ai:
  llm_provider: "openai"  # or "local"
  vision_model: "yolov8n"
  text_model: "easyocr"

# Run the application
python app.py
# Run in debug mode
python app.py --debug
# Test imports only
python app.py --test-import
# Run validation and feature tests
python tests/validation.py
python test_features.py

# Linux: Install camera utilities (the v4l-utils package provides v4l2-ctl)
sudo apt install v4l-utils
# Check camera permissions
ls -l /dev/video*
sudo usermod -a -G video $USER

# Install audio libraries
pip install --upgrade pyttsx3 pyaudio

The first run downloads ~2GB of models. Use offline mode or:
pip install --no-cache-dir -r requirements.txt

- Startup Time: 2-3 seconds (after model cache)
- Voice Command Response: 1-2 seconds
- Object Detection: 100-200ms per image (CPU)
- Text Recognition: 200-500ms per image (CPU)
- Memory Usage: 300-500MB idle, 800MB-1GB during processing
- Use GPU if available: Set `DEVICE=cuda`
- Use a smaller YOLOv8 variant for faster detection
- Batch process images for efficiency
- Cache frequently accessed data
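
The `DEVICE` setting and the `ENABLE_*` feature flags from the `.env` example could be read along the following lines. This is a sketch under stated assumptions: the helper names (`env_flag`, `resolve_device`, `load_settings`) are hypothetical, the defaults simply mirror the example values shown earlier, and the project's real configuration loading may differ.

```python
import os


def env_flag(name: str, default: bool = False, environ=os.environ) -> bool:
    """Interpret an environment variable such as ENABLE_NAVIGATION=true as a bool."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Fall back to CPU when CUDA is requested but not actually available."""
    if requested.strip().lower() == "cuda" and cuda_available:
        return "cuda"
    return "cpu"


def load_settings(environ=os.environ, cuda_available: bool = False) -> dict:
    """Collect the settings named in the .env example into one dictionary."""
    return {
        "device": resolve_device(environ.get("DEVICE", "cpu"), cuda_available),
        "speech_language": environ.get("SPEECH_LANGUAGE", "en"),
        "speech_rate": int(environ.get("SPEECH_RATE", "150")),
        "enable_navigation": env_flag("ENABLE_NAVIGATION", True, environ),
        "enable_face_recognition": env_flag("ENABLE_FACE_RECOGNITION", False, environ),
        "enable_text_extraction": env_flag("ENABLE_TEXT_EXTRACTION", True, environ),
        "enable_object_detection": env_flag("ENABLE_OBJECT_DETECTION", True, environ),
    }


if __name__ == "__main__":
    print(load_settings())
```

In a real setup, `cuda_available` would typically come from `torch.cuda.is_available()`, and the `.env` file would be loaded into the environment first, for example with `load_dotenv()` from python-dotenv. Falling back to CPU keeps the application usable when `DEVICE=cuda` is set on a machine without a working GPU.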
# Note: the await calls below must run inside an async function (for example via asyncio.run).
assistant = VisionAssistant()
await assistant.describe_environment(detailed=True)
await assistant.read_text_around()
await assistant.identify_objects()
await assistant.recognize_faces()

from ai_modules.speech_engine import SpeechEngine
engine = SpeechEngine()
engine.speak("Hello!")
command = await engine.listen()

from ai_modules.llm_handler import LLMHandler
llm = LLMHandler(use_openai=True)
query = "What do you see?"
intent = await llm.understand_intent(query)
response = await llm.generate_response(query)

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE file for details.
- Khaf-dev - Initial development and architecture
- Contributors welcome! See CONTRIBUTING.md
- YOLOv8 by Ultralytics for object detection
- EasyOCR for text recognition
- PyTorch community for deep learning framework
- OpenAI for language models
- Community for accessibility feedback and testing
- Mobile app support (iOS/Android)
- Improved face recognition with training
- Multi-language support
- Enhanced navigation with real-time obstacles
- Sound localization
- Edge device optimization (Raspberry Pi)
- Offline-first architecture
- Custom voice training
- Integration with smart home devices
- Biometric authentication
- Email: rifyatkaffa@gmail.com
- GitHub Issues: Report bugs
- Documentation: Full docs
- Bug reports: Include OS, Python version, error logs
Q: Does it work without internet?
A: Yes! Core vision and speech features work offline. OpenAI features require internet.
Q: Can I use it without a camera?
A: Yes! Chat features work without camera. Vision features are optional.
Q: Is it GDPR/Privacy compliant?
A: Data is stored locally by default. No data is sent to servers without explicit consent.
Q: How can I improve accuracy?
A: Better lighting, clear speech, and a well-positioned camera help significantly.
Q: Can I train custom models?
A: Yes! See documentation for fine-tuning guides.
- All voice data processed locally
- Camera feed never stored by default
- Database encrypted at rest (configurable)
- No telemetry without consent
- GDPR/CCPA compliant
Made with ❤️ for accessibility and inclusion
Last Updated: February 2026
Version: 1.1.0