Real-time American Sign Language Recognition in the Browser
SignSync is a web application that recognizes American Sign Language (ASL) gestures in real time using your webcam. All machine learning inference runs directly in your browser, with no server-side processing required.
- Features
- Recognition Capabilities
- Architecture
- Tech Stack
- Getting Started
- Project Structure
- Model Configuration
- Model Training and Conversion
- Deployment
- API Reference
- Contributing
- Troubleshooting
- License
- Real-time Recognition - Instant ASL gesture detection using your webcam
- Browser-based Inference - All ML processing runs locally in your browser for privacy and speed
- Multi-landmark Tracking - Simultaneous pose, face, and hand tracking for comprehensive gesture analysis
- Visual Feedback - Live landmark visualization overlay on video feed
- Confidence Scoring - Each prediction includes a confidence percentage
- Swappable Models - Easy model updates via configuration file
- Privacy-first - No video data sent to servers; all processing is client-side
- Mobile-friendly - Responsive design works on desktop and mobile browsers
- Free Tier Compatible - Designed to run on Render's free hosting tier
SignSync is designed to recognize:
| Category | Examples |
|---|---|
| ASL Alphabets | A-Z (26 letters) |
| Numbers | 0-9 |
| Common Words | hello, thank you, I love you, etc. |
| Simple Sentences | Multi-sign phrases and expressions |
Note: The currently deployed model recognizes a subset of gestures. Additional gestures can be added by training and deploying an updated model.
SignSync uses a client-side machine learning architecture where all inference runs in the browser:
+------------------+ +----------------------+ +------------------+
| Webcam | --> | MediaPipe (WASM) | --> | TensorFlow.js |
| Video Feed | | Landmark Extraction | | LSTM Model |
+------------------+ +----------------------+ +------------------+
| |
v v
+------------------+ +------------------+
| 1692 Keypoints | | Prediction |
| per Frame | | + Confidence |
+------------------+ +------------------+
MediaPipe extracts 1692 keypoint values per frame (553 landmarks in total) from three landmark models: pose, face, and hand, with the hand model covering both hands:
| Model | Landmarks | Coordinates | Total Values |
|---|---|---|---|
| Pose | 33 landmarks | x, y, z, visibility | 132 |
| Face | 478 landmarks | x, y, z | 1434 |
| Left Hand | 21 landmarks | x, y, z | 63 |
| Right Hand | 21 landmarks | x, y, z | 63 |
| Total | - | - | 1692 |
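The arithmetic behind the 1692 total can be checked with a short sketch. The function names, the pose/face/left-hand/right-hand concatenation order, and the zero-filling of undetected groups are assumptions made for illustration, not details confirmed by the project code.

```python
# Landmark counts and per-landmark dimensions, taken from the table above.
POSE_LANDMARKS, POSE_DIMS = 33, 4    # x, y, z, visibility
FACE_LANDMARKS, FACE_DIMS = 478, 3   # x, y, z
HAND_LANDMARKS, HAND_DIMS = 21, 3    # x, y, z (per hand)

FRAME_SIZE = (
    POSE_LANDMARKS * POSE_DIMS
    + FACE_LANDMARKS * FACE_DIMS
    + 2 * HAND_LANDMARKS * HAND_DIMS
)  # 132 + 1434 + 63 + 63


def flatten_group(landmarks, count, dims):
    """Flatten one landmark group into a flat list of floats.

    Assumption: a group that was not detected is filled with zeros.
    """
    if not landmarks:
        return [0.0] * (count * dims)
    if len(landmarks) != count:
        raise ValueError(f"expected {count} landmarks, got {len(landmarks)}")
    flat = []
    for point in landmarks:
        if len(point) != dims:
            raise ValueError(f"expected {dims} values per landmark, got {len(point)}")
        flat.extend(float(value) for value in point)
    return flat


def flatten_frame(pose=None, face=None, left_hand=None, right_hand=None):
    """Concatenate all groups into one frame vector (order is an assumption)."""
    return (
        flatten_group(pose, POSE_LANDMARKS, POSE_DIMS)
        + flatten_group(face, FACE_LANDMARKS, FACE_DIMS)
        + flatten_group(left_hand, HAND_LANDMARKS, HAND_DIMS)
        + flatten_group(right_hand, HAND_LANDMARKS, HAND_DIMS)
    )
```

Whatever order is used in practice, it has to match the order used when the model was trained.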
The LSTM model processes 30-frame sequences (approximately 1 second of video at 30fps):
- Input shape: `[batch, 30, 1692]`
- Output: Probability distribution over gesture classes
- Inference: Runs on every frame once the buffer is full (sliding window)
- Webcam captures video frames at 30fps
- MediaPipe (running via WebAssembly) extracts landmarks from each frame
- Buffer accumulates 30 frames of keypoint data
- TensorFlow.js runs the LSTM model on the buffered sequence
- UI displays the predicted gesture with confidence score
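The buffering step in this flow is a sliding window: once 30 frames have arrived, every new frame evicts the oldest one and triggers another inference. The sketch below illustrates the idea in Python (the application itself does this in `static/script.js`); the `SlidingWindow` class and the `predict` callable are hypothetical names.

```python
from collections import deque

SEQUENCE_LENGTH = 30  # frames per window, as described above


class SlidingWindow:
    """Keep the most recent frames and run inference once the buffer is full."""

    def __init__(self, predict, length=SEQUENCE_LENGTH):
        # deque(maxlen=...) drops the oldest frame automatically on append.
        self._frames = deque(maxlen=length)
        self._predict = predict  # hypothetical callable: list of frames -> result

    def push(self, keypoints):
        """Add one frame; return a prediction, or None while still filling."""
        self._frames.append(keypoints)
        if len(self._frames) < self._frames.maxlen:
            return None
        return self._predict(list(self._frames))
```

With this structure the first prediction appears after 30 frames, and each later frame produces a new one.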
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | HTML5, CSS3, JavaScript | User interface and interaction |
| ML - Landmarks | MediaPipe Vision Tasks | Pose, face, and hand landmark extraction |
| ML - Inference | TensorFlow.js | LSTM model execution in browser |
| Backend | Flask | Static file serving |
| WSGI | Gunicorn | Production server |
| Deployment | Render | Cloud hosting (free tier) |
- Privacy: Video never leaves the user's device
- Speed: No network latency for inference
- Cost: No GPU servers required; runs on free hosting
- Scalability: Server load is minimal (static files only)
- Python 3.11 or higher
- A modern web browser (Chrome, Firefox, Edge, Safari)
- Webcam access
- Clone the repository: `git clone https://github.com/theChosen-1/SignSync-WebApp.git`, then `cd SignSync-WebApp`
- Create a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`
- Set up environment variables (optional for development): `cp .env.example .env`
- Run the development server: `python app.py`
- Open your browser and navigate to http://localhost:5001
- Click the Start button to enable your webcam
- Position yourself so your upper body is visible
- Perform ASL gestures
- View real-time predictions with confidence scores
SignSync-WebApp/
├── app.py # Flask application (routes only)
├── requirements.txt # Python dependencies
├── render.yaml # Render deployment configuration
├── convert_model.py # Keras to TensorFlow.js converter
├── .env.example # Environment variable template
│
├── template/
│ ├── index.html # Main application page
│ └── docs.html # "How It Works" documentation
│
├── static/
│ ├── script.js # MediaPipe + TensorFlow.js integration
│ ├── style.css # Main application styles
│ ├── docStyle.css # Documentation page styles
│ ├── model_config.json # Model configuration and labels
│ ├── images/ # Static images
│ └── tfjs_model/ # TensorFlow.js model files
│ ├── model.json # Model architecture
│ └── group1-shard*.bin # Model weights
│
├── tests/ # Test suite
│ └── test_app.py # Flask route tests
│
└── gesture_model.keras # Source Keras model (for conversion)
The model behavior is controlled by static/model_config.json:
{
"labels": ["hello", "thanks", "iloveyou"],
"sequenceLength": 30,
"keypointSize": 1692,
"confidenceThreshold": 0.7,
"version": "1.0.0",
"description": "LSTM model for ASL gesture recognition",
"keypointBreakdown": {
"pose": 132,
"face": 1434,
"leftHand": 63,
"rightHand": 63,
"total": 1692
}
}

| Field | Description |
|---|---|
| `labels` | Array of gesture class names (in order of model output) |
| `sequenceLength` | Number of frames to buffer before inference |
| `keypointSize` | Total keypoints per frame (must match model input) |
| `confidenceThreshold` | Minimum confidence for high-confidence display (0-1) |
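As an illustration of how these fields fit together, the sketch below checks a configuration for internal consistency and maps a model output to a label. It is written in Python for brevity; `load_config` and `decode_prediction` are hypothetical helpers, and the real logic lives in the browser code.

```python
import json

# Trimmed example in the shape of static/model_config.json.
EXAMPLE_CONFIG = """
{
  "labels": ["hello", "thanks", "iloveyou"],
  "sequenceLength": 30,
  "keypointSize": 1692,
  "confidenceThreshold": 0.7,
  "keypointBreakdown": {"pose": 132, "face": 1434, "leftHand": 63, "rightHand": 63, "total": 1692}
}
"""


def load_config(text):
    """Parse the config and check that the numbers agree with each other."""
    config = json.loads(text)
    breakdown = config.get("keypointBreakdown")
    if breakdown is not None:
        parts = sum(value for key, value in breakdown.items() if key != "total")
        if parts != config["keypointSize"]:
            raise ValueError(f"breakdown sums to {parts}, not {config['keypointSize']}")
    if not 0.0 <= config["confidenceThreshold"] <= 1.0:
        raise ValueError("confidenceThreshold must be between 0 and 1")
    return config


def decode_prediction(probabilities, config):
    """Return (label, confidence, is_high_confidence) for one model output."""
    labels = config["labels"]
    if len(probabilities) != len(labels):
        raise ValueError("model output size does not match the number of labels")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    return labels[best], confidence, confidence >= config["confidenceThreshold"]


config = load_config(EXAMPLE_CONFIG)
print(decode_prediction([0.1, 0.85, 0.05], config))  # ('thanks', 0.85, True)
```

The length check in `decode_prediction` is the reason `labels` must stay in the same order, and have the same length, as the model's output layer.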
To use a different model:
- Convert your Keras model to TensorFlow.js format (see below)
- Replace files in `static/tfjs_model/`
- Update `static/model_config.json` with new labels
- Deploy or restart the application
The model expects input sequences with shape [batch, 30, 1692]:
- 30 frames of sequential landmark data
- 1692 features per frame (pose + face + hands)
- Features must be MediaPipe's normalized landmark values (x and y are scaled by image width and height, so they fall roughly in the 0-1 range)
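A quick way to catch shape problems before they surface as tensor errors is to validate a sequence and add the batch dimension explicitly. The helper below is a hypothetical sketch using plain Python lists rather than tensors.

```python
def to_batch(sequence, frames=30, features=1692):
    """Validate one sequence and wrap it as a batch of size 1.

    Returns a nested list shaped [1, frames, features].
    """
    if len(sequence) != frames:
        raise ValueError(f"expected {frames} frames, got {len(sequence)}")
    for index, frame in enumerate(sequence):
        if len(frame) != features:
            raise ValueError(
                f"frame {index} has {len(frame)} values, expected {features}"
            )
    return [sequence]
```

The same two checks (frame count and per-frame size) apply whatever tensor library ends up holding the data.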
Use the provided conversion script:
# Create a separate environment (requires Python 3.10 or 3.11)
python3.11 -m venv tfjs_env
source tfjs_env/bin/activate
# Install conversion dependencies
pip install tensorflow==2.15.0 tensorflowjs==4.17.0
# Run conversion
python convert_model.py

Alternatively, use Docker:
docker run -it --rm -v $(pwd):/app -w /app python:3.11-slim bash -c "
pip install tensorflow==2.15.0 tensorflowjs==4.17.0 &&
python convert_model.py
"

The script will:
- Load `gesture_model.keras`
- Convert to TensorFlow.js LayersModel format
- Save to `static/tfjs_model/`
Your Keras model should:
- Accept input shape `(None, 30, 1692)`
- Output a probability distribution over gesture classes
- Use layers supported by TensorFlow.js (LSTM, Dense, Dropout, etc.)
- Fork or push to GitHub
- Create a new Web Service on Render
- Connect your repository
- Render auto-detects `render.yaml` and configures:
  - Python 3.11 runtime
  - Build: `pip install -r requirements.txt`
  - Start: `gunicorn --bind 0.0.0.0:$PORT --workers 2 --threads 2 --timeout 120 app:app`
  - Health check: `/health`
  - Auto-deploy on push
- Your app will be live at `https://your-app.onrender.com`
| Variable | Description | Required |
|---|---|---|
| `SECRET_KEY` | Flask secret key | Yes (auto-generated on Render) |
| `FLASK_DEBUG` | Enable debug mode | No (default: False) |
| `PORT` | Server port | No (default: 10000 on Render) |
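One way these variables could be read at startup is sketched below. The `load_settings` helper, the accepted truthy strings, and the development-only fallback secret are assumptions for illustration; `app.py` may handle them differently.

```python
import os


def load_settings(env=None, default_port=10000):
    """Read SECRET_KEY, FLASK_DEBUG, and PORT from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    debug = env.get("FLASK_DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    secret = env.get("SECRET_KEY")
    if not secret:
        if not debug:
            # Fail fast rather than run production with a guessable key.
            raise RuntimeError("SECRET_KEY must be set when FLASK_DEBUG is off")
        secret = "dev-only-secret"  # assumption: placeholder for local development
    return {
        "secret_key": secret,
        "debug": debug,
        "port": int(env.get("PORT", default_port)),
    }
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to exercise in tests.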
SignSync is optimized for Render's free tier:
- Minimal server load: Flask only serves static files
- No GPU required: All ML runs in browser
- Small footprint: ~50MB total deployment size
- Cold starts: First request may take 30-60 seconds to spin up
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Main application page |
| GET | `/docs` | "How It Works" documentation |
| GET | `/health` | Health check endpoint |
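The body returned by `/health` carries a status, an ISO 8601 timestamp, a version, and a message. A minimal sketch of assembling such a payload follows; `health_payload` is a hypothetical helper, and the use of local time rather than UTC is an assumption.

```python
from datetime import datetime


def health_payload(version, message, now=None):
    """Build a health-check body with an ISO 8601 timestamp."""
    now = datetime.now() if now is None else now  # assumption: naive local time
    return {
        "status": "healthy",
        "timestamp": now.isoformat(),
        "version": version,
        "message": message,
    }
```

In a Flask route, a dictionary like this can be returned directly and is serialized to JSON.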
{
"status": "healthy",
"timestamp": "2024-01-15T12:30:45.123456",
"version": "2.0.0",
"message": "Frontend ready - ML model coming soon"
}

Contributions are welcome! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Run tests: `pytest`
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
# Run all tests
pytest
# Run with coverage report
pytest --cov=app --cov-report=term-missing
# Run specific test file
pytest tests/test_app.py

- Python: Follow PEP 8
- JavaScript: Use consistent formatting (4-space indentation)
- CSS: Use BEM-style class naming
- Add support for more ASL gestures
- Improve model accuracy
- Add mobile-specific optimizations
- Create training data collection tools
- Write documentation or tutorials
- Report and fix bugs
| Issue | Solution |
|---|---|
| Permission denied | Click the camera icon in browser address bar and allow access |
| No camera found | Connect a webcam and refresh the page |
| Camera in use | Close other applications using the camera |
| HTTPS required | Use localhost or deploy with HTTPS |
| Issue | Solution |
|---|---|
| 404 on model files | Ensure static/tfjs_model/model.json exists |
| Tensor shape mismatch | Verify model_config.json matches model architecture |
| WebGL errors | Try a different browser or disable GPU acceleration |
| Issue | Solution |
|---|---|
| Low confidence scores | Ensure good lighting and plain background |
| Incorrect predictions | Keep entire upper body in frame |
| Delayed predictions | Wait for buffer to fill (30 frames) |
| Issue | Solution |
|---|---|
| Build fails | Check requirements.txt for syntax errors |
| App crashes on start | Verify SECRET_KEY is set in production |
| Slow initial load | Normal on free tier (cold start); wait 30-60 seconds |
MIT License
Copyright (c) 2024 inshaal81
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Built with care by inshaal81
