VoiceGuide

AI-powered real-time navigation assistant for visually impaired users.

VoiceGuide is a camera-based assistive application that helps blind and low-vision users understand their surroundings through real-time audio descriptions and interactive voice queries. The system captures visual input, processes it using AI, and delivers clear verbal feedback to improve situational awareness.

Purpose

VoiceGuide aims to improve accessibility by enabling greater independence for visually impaired individuals through real-time, AI-driven assistance.

Overview

VoiceGuide provides two primary modes:

Continuous navigation assistance through automatic scene descriptions
Interactive querying through voice-based questions about the environment

The application is designed to be lightweight, accessible, and usable directly through a web browser without requiring installation.

Live demo: https://wics-hackathon-kappa.vercel.app/

For best performance, open in Safari on mobile devices.

Features

Real-Time Navigation Mode

Continuously captures frames from the camera
Generates real-time scene descriptions
Provides periodic audio feedback about surroundings
Helps identify objects, obstacles, and spatial context
Starts automatically when the app is opened
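
The periodic behaviour above could be sketched as a timer that skips a tick while the previous request is still running, so slow network responses do not pile up. This is only an illustration: `startNavigationLoop`, `captureFrame`, `describeFrame`, `speak`, and the 4-second interval are hypothetical names and values, not taken from the VoiceGuide source.

```javascript
// Hypothetical sketch: run one describe-and-speak cycle per interval,
// skipping a tick when the previous cycle is still in flight.
function startNavigationLoop({ captureFrame, describeFrame, speak, intervalMs = 4000 }) {
  let busy = false;

  async function tick() {
    if (busy) return false; // previous cycle still running; skip this tick
    busy = true;
    try {
      const frame = await captureFrame();
      const description = await describeFrame(frame);
      if (description) await speak(description);
      return true;
    } catch (err) {
      console.error('Navigation cycle failed:', err);
      return false;
    } finally {
      busy = false;
    }
  }

  const timer = setInterval(tick, intervalMs);
  return { tick, stop: () => clearInterval(timer) };
}
```

Calling `stop()` on the returned object clears the timer, which would be the natural hook for pausing navigation while Ask Mode is recording.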

Ask Mode

Allows users to ask questions using voice input
Processes speech into text and generates contextual responses
Answers questions based on the current camera view
Activated with a double tap anywhere on the screen
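
One way to detect the double tap is to compare the timestamps of consecutive taps. The sketch below assumes a 300 ms window; `createDoubleTapDetector` and `toggleAskMode` are hypothetical names used for illustration only.

```javascript
// Hypothetical helper: reports a double tap when two taps land within
// `maxGapMs` of each other. Timestamps are passed in so it is easy to test.
function createDoubleTapDetector(maxGapMs = 300) {
  let lastTap = null;
  return function onTap(timestampMs) {
    if (lastTap !== null && timestampMs - lastTap <= maxGapMs) {
      lastTap = null; // reset so a third quick tap starts a new pair
      return true;
    }
    lastTap = timestampMs;
    return false;
  };
}

// Browser wiring (assumed): toggle Ask Mode on each detected double tap.
// const isDoubleTap = createDoubleTapDetector();
// document.addEventListener('pointerup', (event) => {
//   if (isDoubleTap(event.timeStamp)) toggleAskMode(); // toggleAskMode is hypothetical
// });
```

Listening on the whole document rather than on a button matches the "anywhere on the screen" behaviour, since the user cannot be expected to find a specific target.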

Audio System

Converts responses into natural-sounding speech
Uses a queue system to prevent overlapping audio
Ensures clear and sequential delivery of alerts
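
A queue like the one described above can be built by chaining promises, so each clip starts only after the previous one has finished. This is a minimal sketch, not the project's actual implementation; `SpeechQueue` and the `play` callback are hypothetical.

```javascript
// Hypothetical sketch: a queue that plays one clip at a time.
// `play` is any function that returns a promise resolving when playback ends.
class SpeechQueue {
  constructor(play) {
    this.play = play;
    this.tail = Promise.resolve(); // end of the current chain of clips
    this.pending = 0;
  }

  enqueue(text) {
    this.pending += 1;
    this.tail = this.tail
      .then(() => this.play(text))
      // Log and continue so one failed clip does not block later ones.
      .catch((err) => console.error('Playback failed:', err))
      .finally(() => { this.pending -= 1; });
    return this.tail; // resolves once this clip has finished
  }
}
```

In the browser, `play` could wrap an `Audio` element and resolve on its `ended` event. The `pending` counter gives the caller a cheap way to drop stale navigation messages when the queue grows too long.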

Camera Integration

Uses live camera feed through browser APIs
Requires user permission for camera access
Works on both desktop and mobile browsers
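
The camera steps above map onto the standard `navigator.mediaDevices.getUserMedia` call plus a canvas for grabbing still frames. The sketch below assumes the rear camera is preferred and that frames are sent as JPEG data URLs; the helper names and the 0.7 quality value are hypothetical.

```javascript
// Hypothetical helper: constraints that prefer the rear camera on phones.
function buildCameraConstraints() {
  return { video: { facingMode: { ideal: 'environment' } }, audio: false };
}

// Browser-only: ask for permission and attach the stream to a <video> element.
async function startCamera(videoEl) {
  const stream = await navigator.mediaDevices.getUserMedia(buildCameraConstraints());
  videoEl.srcObject = stream;
  videoEl.playsInline = true; // keeps iOS Safari from switching to fullscreen playback
  await videoEl.play();
  return stream;
}

// Browser-only: draw the current frame to a canvas and return a JPEG data URL.
function captureFrame(videoEl, quality = 0.7) {
  const canvas = document.createElement('canvas');
  canvas.width = videoEl.videoWidth;
  canvas.height = videoEl.videoHeight;
  canvas.getContext('2d').drawImage(videoEl, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL('image/jpeg', quality);
}
```

Using `ideal` rather than `exact` for `facingMode` lets desktop browsers fall back to whatever webcam is available instead of rejecting the request.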

Cross-Platform Compatibility

Accessible via modern browsers
Optimized for mobile usage, including iOS devices

Target Audience

Individuals who are blind or visually impaired
Elderly users who require navigation assistance
Users with limited situational awareness
Accessibility-focused organizations and researchers
Developers exploring assistive AI solutions

How to Use

Start navigation: Open the app. Once camera access is granted, VoiceGuide starts providing navigation support automatically.

Stop navigation: Close the app.

Start Ask Mode: Double tap anywhere on the screen. The app will say "Listening" to indicate it is ready to hear your question.

Stop Ask Mode and get response: Double tap again to stop recording. The app will say "Answering" and provide a spoken response based on your question and the current camera view.

Example interaction

Open the app
Allow camera access
Hear automatic navigation guidance
Double tap anywhere to enter Ask Mode
Speak your question
Double tap again to stop recording
Hear the app respond with an answer

Tech Stack

Layer	Technology
Frontend	React + Vite
Styling	Tailwind CSS
Vision AI	OpenAI Vision models
Speech to text	OpenAI Whisper
Text to speech	OpenAI TTS
Camera	MediaStream API
Audio	Web Audio API

Setup

1. Clone the repository

git clone https://github.com/mmehta29/WicsHackathon.git
cd WicsHackathon

2. Install dependencies

npm install

3. Configure environment variables

Create a .env.local file in the root directory:

VITE_OPENAI_API_KEY=your-api-key-here
VITE_OPENAI_NAVIGATE_MODEL=gpt-4o-mini
VITE_OPENAI_ASK_MODEL=gpt-4o
VITE_OPENAI_TTS_VOICE=nova

4. Run the application

npm run dev

5. Access the app

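Open the local URL that Vite prints in the terminal (http://localhost:5173 by default). The hosted demo is also available at https://wics-hackathon-kappa.vercel.app/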
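
The variables from step 3 could be read through a small helper that fails early when the key is missing. This is a sketch under assumptions: `loadConfig` is a hypothetical name, and the fallback values simply mirror the sample `.env.local` above.

```javascript
// Hypothetical helper: read VoiceGuide settings from an env object.
// In a Vite app the caller would pass `import.meta.env`.
function loadConfig(env) {
  const apiKey = env.VITE_OPENAI_API_KEY;
  if (!apiKey || apiKey === 'your-api-key-here') {
    throw new Error('VITE_OPENAI_API_KEY is not set in .env.local');
  }
  return {
    apiKey,
    // Fallbacks below are assumptions copied from the sample file.
    navigateModel: env.VITE_OPENAI_NAVIGATE_MODEL || 'gpt-4o-mini',
    askModel: env.VITE_OPENAI_ASK_MODEL || 'gpt-4o',
    ttsVoice: env.VITE_OPENAI_TTS_VOICE || 'nova',
  };
}
```

Note that Vite inlines every `VITE_`-prefixed variable into the client bundle, so the key is visible to anyone who loads the page. For anything beyond local testing, routing requests through a small server-side proxy would likely be safer.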

How It Works

The camera captures frames from the user's environment
Frames are processed by a vision model to generate descriptions
Text responses are converted into speech
Audio is delivered using a queue to avoid overlap
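
The first three steps could look roughly like the following calls to the OpenAI REST API. The endpoints and request shapes are the public Chat Completions and speech endpoints; the helper names, the prompt text, the `max_tokens` value, and the `tts-1` model choice are assumptions for illustration, not confirmed details of VoiceGuide.

```javascript
// Hypothetical helper: build a Chat Completions request body that pairs a
// text prompt with one camera frame encoded as a data URL.
function buildVisionRequest(model, prompt, imageDataUrl) {
  return {
    model,
    max_tokens: 80, // short replies keep the spoken output brief (assumed value)
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'image_url', image_url: { url: imageDataUrl, detail: 'low' } },
        ],
      },
    ],
  };
}

function authHeaders(apiKey) {
  return { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` };
}

// Send one frame to the vision model and return its text description.
async function describeFrame(apiKey, model, imageDataUrl) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: authHeaders(apiKey),
    body: JSON.stringify(
      buildVisionRequest(model, 'Briefly describe obstacles and objects ahead.', imageDataUrl)
    ),
  });
  if (!response.ok) throw new Error(`Vision request failed: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}

// Convert text to speech and return the audio as a Blob.
async function synthesizeSpeech(apiKey, voice, text) {
  const response = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: authHeaders(apiKey),
    body: JSON.stringify({ model: 'tts-1', voice, input: text }),
  });
  if (!response.ok) throw new Error(`Speech request failed: ${response.status}`);
  return response.blob();
}
```

Requesting `detail: 'low'` and a small token limit trades description richness for lower latency, which matters more for navigation prompts than for Ask Mode answers.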

For Ask Mode:

User double taps the screen to begin recording
The app announces it is listening
The user asks a question
The user double taps again to stop recording
The app announces it is answering
The question is processed using speech to text and visual context
The response is converted to speech and played aloud
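
The Ask Mode steps above form a small state machine, which could be modelled as a lookup table. The state and event names here (`idle`, `listening`, `answering`, `doubleTap`, `responseSpoken`) are hypothetical labels; only the spoken announcements come from the description above.

```javascript
// Hypothetical transition table for the Ask Mode flow.
const ASK_TRANSITIONS = {
  idle: { doubleTap: { next: 'listening', announce: 'Listening' } },
  listening: { doubleTap: { next: 'answering', announce: 'Answering' } },
  answering: { responseSpoken: { next: 'idle', announce: null } },
};

// Returns the next state and the phrase to speak, if any.
function askModeStep(state, event) {
  const transition = ASK_TRANSITIONS[state] && ASK_TRANSITIONS[state][event];
  // Unknown events (for example a double tap while answering) leave the state unchanged.
  return transition
    ? { state: transition.next, announce: transition.announce }
    : { state, announce: null };
}
```

Keeping the transitions in one table makes it easy to see that a double tap during an answer is ignored rather than starting a second recording.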

Future Improvements

Reduced latency for faster navigation alerts
Object detection with distance estimation
Indoor navigation support
Offline functionality
Integration with wearable devices

Team

Developed during the WiCS Hackathon by a team focused on accessibility and applied artificial intelligence.
