SignSync

Real-time American Sign Language Recognition in the Browser

SignSync is a web application that recognizes American Sign Language (ASL) gestures in real time using your webcam. All machine learning inference runs directly in your browser, with no server-side processing required.

Try the Live Demo

Features

Real-time Recognition - Instant ASL gesture detection using your webcam
Browser-based Inference - All ML processing runs locally in your browser for privacy and speed
Multi-landmark Tracking - Simultaneous pose, face, and hand tracking for comprehensive gesture analysis
Visual Feedback - Live landmark visualization overlay on video feed
Confidence Scoring - Each prediction includes a confidence percentage
Swappable Models - Easy model updates via configuration file
Privacy-first - No video data sent to servers; all processing is client-side
Mobile-friendly - Responsive design works on desktop and mobile browsers
Free Tier Compatible - Designed to run on Render's free hosting tier

Recognition Capabilities

SignSync is designed to recognize:

Category	Examples
ASL Alphabet	A-Z (26 letters)
Numbers	0-9
Common Words	hello, thank you, I love you, etc.
Simple Sentences	Multi-sign phrases and expressions

Note: The currently deployed model recognizes a subset of gestures. Additional gestures can be added by training and deploying an updated model.

Architecture

SignSync uses a client-side machine learning architecture where all inference runs in the browser:

+------------------+     +----------------------+     +------------------+
|     Webcam       | --> |   MediaPipe (WASM)   | --> |   TensorFlow.js  |
|   Video Feed     |     |  Landmark Extraction |     |   LSTM Model     |
+------------------+     +----------------------+     +------------------+
                                   |                           |
                                   v                           v
                         +------------------+         +------------------+
                         |   1692 Keypoints |         |   Prediction     |
                         |    per Frame     |         |   + Confidence   |
                         +------------------+         +------------------+

Keypoint Extraction

MediaPipe extracts 1692 keypoint values per frame from 553 landmarks, using three landmark models (pose, face, and hands, with each hand tracked separately):

Model	Landmarks	Coordinates	Total Values
Pose	33 landmarks	x, y, z, visibility	132
Face	478 landmarks	x, y, z	1434
Left Hand	21 landmarks	x, y, z	63
Right Hand	21 landmarks	x, y, z	63
Total	-	-	1692
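
The total in the table follows from multiplying each landmark count by its values per landmark. A minimal Python sketch of that arithmetic, and of flattening one frame into a single vector, is shown here. The `LAYOUT` dictionary and the `flatten_frame` helper are hypothetical names used for illustration. The concatenation order (pose, face, left hand, right hand) and the zero-filling of undetected parts are assumptions, not documented behavior of the project.

```python
# Landmark counts and values per landmark, taken from the table above.
LAYOUT = {
    "pose": (33, 4),        # x, y, z, visibility
    "face": (478, 3),       # x, y, z
    "left_hand": (21, 3),   # x, y, z
    "right_hand": (21, 3),  # x, y, z
}


def feature_sizes(layout=LAYOUT):
    """Return the number of values each landmark group contributes per frame."""
    return {name: count * dims for name, (count, dims) in layout.items()}


def flatten_frame(groups, layout=LAYOUT):
    """Concatenate landmark groups into one flat list of floats.

    `groups` maps a group name to a list of per-landmark tuples, or None
    when that part was not detected. Missing groups are zero-filled
    (an assumption) so that every frame has the same length.
    """
    flat = []
    for name, (count, dims) in layout.items():
        landmarks = groups.get(name)
        if landmarks is None:
            flat.extend([0.0] * (count * dims))
            continue
        if len(landmarks) != count:
            raise ValueError(f"{name}: expected {count} landmarks, got {len(landmarks)}")
        for point in landmarks:
            if len(point) != dims:
                raise ValueError(f"{name}: expected {dims} values per landmark")
            flat.extend(float(v) for v in point)
    return flat


sizes = feature_sizes()
print(sizes, sum(sizes.values()))
```

Keeping every frame at a fixed length matters because the model input size is fixed; a frame with no visible hands still has to fill all 1692 slots.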

LSTM Processing

The LSTM model processes 30-frame sequences (approximately 1 second of video at 30fps):

Input shape: [batch, 30, 1692]
Output: Probability distribution over gesture classes
Inference: Runs on every frame once the buffer is full (sliding window)
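
The sliding-window behavior described above can be sketched in Python with a fixed-length deque. This is an illustration only: the application itself buffers frames in JavaScript, and the `FrameBuffer` class is a hypothetical name, not part of the project.

```python
from collections import deque

SEQUENCE_LENGTH = 30   # frames per window, from the section above
KEYPOINT_SIZE = 1692   # values per frame


class FrameBuffer:
    """Keep only the most recent frames, forming a sliding window."""

    def __init__(self, length=SEQUENCE_LENGTH, size=KEYPOINT_SIZE):
        self.size = size
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=length)

    def push(self, frame):
        """Add one frame; return the full window, or None while still filling."""
        if len(frame) != self.size:
            raise ValueError(f"expected {self.size} values, got {len(frame)}")
        self.frames.append(list(frame))
        if len(self.frames) < self.frames.maxlen:
            return None
        # Shape [length, size]; a batch axis is added before inference.
        return list(self.frames)
```

Once the buffer is full, every new frame yields a new window that overlaps the previous one by all but one frame, which is why predictions can update on every frame.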

Data Flow

Webcam captures video frames at 30fps
MediaPipe (running via WebAssembly) extracts landmarks from each frame
Buffer accumulates 30 frames of keypoint data
TensorFlow.js runs the LSTM model on the buffered sequence
UI displays the predicted gesture with confidence score
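
The last step, turning the model's probability vector into a displayed gesture, might look like the following Python sketch. The `pick_prediction` function is hypothetical; the 0.7 default mirrors the `confidenceThreshold` value in the sample `static/model_config.json`, and how the real UI treats low-confidence predictions is an assumption.

```python
def pick_prediction(probabilities, labels, threshold=0.7):
    """Choose the most likely label and flag whether it clears the threshold."""
    if len(probabilities) != len(labels):
        raise ValueError("model output size does not match the number of labels")
    # Index of the highest probability; ties resolve to the first such index.
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    return {
        "label": labels[best],
        "confidence": confidence,
        "high_confidence": confidence >= threshold,
    }


labels = ["hello", "thanks", "iloveyou"]
print(pick_prediction([0.10, 0.85, 0.05], labels))
```

The length check is the important part: labels must be listed in the same order as the model's output classes, or the displayed gesture will be wrong even when the model is right.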

Tech Stack

Layer	Technology	Purpose
Frontend	HTML5, CSS3, JavaScript	User interface and interaction
ML - Landmarks	MediaPipe Vision Tasks	Pose, face, and hand landmark extraction
ML - Inference	TensorFlow.js	LSTM model execution in browser
Backend	Flask	Static file serving
WSGI	Gunicorn	Production server
Deployment	Render	Cloud hosting (free tier)

Why This Architecture?

Privacy: Video never leaves the user's device
Speed: No network latency for inference
Cost: No GPU servers required; runs on free hosting
Scalability: Server load is minimal (static files only)

Getting Started

Prerequisites

Python 3.11 or higher
A modern web browser (Chrome, Firefox, Edge, Safari)
Webcam access

Installation

Clone the repository

```
git clone https://github.com/theChosen-1/SignSync-WebApp.git
cd SignSync-WebApp
```

Create a virtual environment

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies

```
pip install -r requirements.txt
```

Set up environment variables (optional for development)

```
cp .env.example .env
```

Run the development server

```
python app.py
```

Open your browser and navigate to http://localhost:5001

Quick Start

Click the Start button to enable your webcam
Position yourself so your upper body is visible
Perform ASL gestures
View real-time predictions with confidence scores

Project Structure

SignSync-WebApp/
├── app.py                      # Flask application (routes only)
├── requirements.txt            # Python dependencies
├── render.yaml                 # Render deployment configuration
├── convert_model.py            # Keras to TensorFlow.js converter
├── .env.example                # Environment variable template
│
├── template/
│   ├── index.html              # Main application page
│   └── docs.html               # "How It Works" documentation
│
├── static/
│   ├── script.js               # MediaPipe + TensorFlow.js integration
│   ├── style.css               # Main application styles
│   ├── docStyle.css            # Documentation page styles
│   ├── model_config.json       # Model configuration and labels
│   ├── images/                 # Static images
│   └── tfjs_model/             # TensorFlow.js model files
│       ├── model.json          # Model architecture
│       └── group1-shard*.bin   # Model weights
│
├── tests/                      # Test suite
│   └── test_app.py             # Flask route tests
│
└── gesture_model.keras         # Source Keras model (for conversion)

Model Configuration

The model behavior is controlled by static/model_config.json:

{
    "labels": ["hello", "thanks", "iloveyou"],
    "sequenceLength": 30,
    "keypointSize": 1692,
    "confidenceThreshold": 0.7,
    "version": "1.0.0",
    "description": "LSTM model for ASL gesture recognition",
    "keypointBreakdown": {
        "pose": 132,
        "face": 1434,
        "leftHand": 63,
        "rightHand": 63,
        "total": 1692
    }
}

Configuration Options

Field	Description
`labels`	Array of gesture class names, in the same order as the model's output classes
`sequenceLength`	Number of frames buffered per inference window (must match the model's input)
`keypointSize`	Total keypoint values per frame (must match the model's input)
`confidenceThreshold`	Minimum confidence (0-1) for a prediction to be displayed as high-confidence
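
Because a mismatch between the configuration and the model only surfaces at inference time, it can help to check the file before deploying. The Python sketch below is one possible check, not a script shipped with the project; `validate_config` is a hypothetical helper, and the rules it applies are inferred from the field descriptions above.

```python
import json


def validate_config(config):
    """Return a list of problems found in a model_config.json-style dict."""
    problems = []
    for field in ("labels", "sequenceLength", "keypointSize", "confidenceThreshold"):
        if field not in config:
            problems.append(f"missing field: {field}")
    if problems:
        return problems
    if not config["labels"]:
        problems.append("labels must not be empty")
    if not 0 <= config["confidenceThreshold"] <= 1:
        problems.append("confidenceThreshold must be between 0 and 1")
    breakdown = config.get("keypointBreakdown")
    if breakdown:
        # Sum the per-group sizes, ignoring the precomputed "total" entry.
        parts = sum(v for k, v in breakdown.items() if k != "total")
        if parts != config["keypointSize"]:
            problems.append(
                f"keypointBreakdown sums to {parts}, not keypointSize {config['keypointSize']}"
            )
    return problems


SAMPLE = json.loads("""
{
    "labels": ["hello", "thanks", "iloveyou"],
    "sequenceLength": 30,
    "keypointSize": 1692,
    "confidenceThreshold": 0.7,
    "keypointBreakdown": {"pose": 132, "face": 1434, "leftHand": 63, "rightHand": 63, "total": 1692}
}
""")
print(validate_config(SAMPLE))  # an empty list means no problems were found
```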

Swapping Models

To use a different model:

Convert your Keras model to TensorFlow.js format (see below)
Replace files in static/tfjs_model/
Update static/model_config.json with new labels
Deploy or restart the application

Model Training and Conversion

Training Data Format

The model expects input sequences with shape [batch, 30, 1692]:

30 frames of sequential landmark data
1692 features per frame (pose + face + hands)
Features use MediaPipe's normalized landmark coordinates (x and y fall in the 0-1 range for landmarks inside the frame)
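
Before training, it may be worth confirming that every sample really has the shape described above. The following Python sketch checks plain nested lists; `batch_shape` is a hypothetical helper, and the all-zero batch is placeholder data standing in for recorded landmark sequences.

```python
SEQUENCE_LENGTH = 30   # frames per sequence
KEYPOINT_SIZE = 1692   # values per frame


def batch_shape(batch, length=SEQUENCE_LENGTH, size=KEYPOINT_SIZE):
    """Validate nested lists shaped [batch, length, size] and return the shape."""
    for s, sequence in enumerate(batch):
        if len(sequence) != length:
            raise ValueError(f"sequence {s}: expected {length} frames, got {len(sequence)}")
        for f, frame in enumerate(sequence):
            if len(frame) != size:
                raise ValueError(
                    f"sequence {s}, frame {f}: expected {size} values, got {len(frame)}"
                )
    return (len(batch), length, size)


# Two all-zero placeholder sequences stand in for real training samples.
batch = [[[0.0] * KEYPOINT_SIZE for _ in range(SEQUENCE_LENGTH)] for _ in range(2)]
print(batch_shape(batch))
```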

Converting Keras to TensorFlow.js

Use the provided conversion script:

# Create a separate environment (requires Python 3.10 or 3.11)
python3.11 -m venv tfjs_env
source tfjs_env/bin/activate

# Install conversion dependencies
pip install tensorflow==2.15.0 tensorflowjs==4.17.0

# Run conversion
python convert_model.py

Alternatively, use Docker:

docker run -it --rm -v "$(pwd)":/app -w /app python:3.11-slim bash -c "
    pip install tensorflow==2.15.0 tensorflowjs==4.17.0 &&
    python convert_model.py
"

The script will:

Load gesture_model.keras
Convert to TensorFlow.js LayersModel format
Save to static/tfjs_model/

Model Requirements

Your Keras model should:

Accept input shape (None, 30, 1692)
Output a probability distribution over gesture classes
Use layers supported by TensorFlow.js (LSTM, Dense, Dropout, etc.)

Deployment

Deploy to Render

Fork or push to GitHub
Create a new Web Service on Render
Connect your repository
Render auto-detects render.yaml and configures:
- Python 3.11 runtime
- Build: pip install -r requirements.txt
- Start: gunicorn --bind 0.0.0.0:$PORT --workers 2 --threads 2 --timeout 120 app:app
- Health check: /health
- Auto-deploy on push
Your app will be live at https://your-app.onrender.com

Environment Variables

Variable	Description	Required
`SECRET_KEY`	Flask secret key	Yes (auto-generated on Render)
`FLASK_DEBUG`	Enable debug mode	No (default: False)
`PORT`	Server port	No (default: 10000 on Render)
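
As a rough illustration of how these variables could be read at startup, here is a Python sketch. It is not taken from `app.py`: the `load_settings` helper, the accepted truthy strings, the development-only fallback secret, and the local port default of 5001 (chosen to match the development URL in Getting Started) are all assumptions.

```python
import os


def load_settings(env=None):
    """Read SECRET_KEY, FLASK_DEBUG, and PORT from an environment mapping."""
    env = os.environ if env is None else env
    # Treat a few common spellings as "on"; anything else means debug is off.
    debug = env.get("FLASK_DEBUG", "False").strip().lower() in ("1", "true", "yes")
    secret = env.get("SECRET_KEY")
    if not secret and not debug:
        raise RuntimeError("SECRET_KEY must be set when FLASK_DEBUG is off")
    return {
        "secret_key": secret or "dev-only-secret",  # fallback for local debugging only
        "debug": debug,
        "port": int(env.get("PORT", "5001")),
    }


print(load_settings({"SECRET_KEY": "change-me", "PORT": "10000"}))
```

Failing fast on a missing `SECRET_KEY` outside debug mode makes a misconfigured production deploy visible at startup instead of at the first request.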

Free Tier Considerations

SignSync is optimized for Render's free tier:

Minimal server load: Flask only serves static files
No GPU required: All ML runs in browser
Small footprint: ~50MB total deployment size
Cold starts: First request may take 30-60 seconds to spin up

API Reference

Routes

Method	Endpoint	Description
GET	`/`	Main application page
GET	`/docs`	"How It Works" documentation
GET	`/health`	Health check endpoint

Health Check Response

{
  "status": "healthy",
  "timestamp": "2024-01-15T12:30:45.123456",
  "version": "2.0.0",
  "message": "Frontend ready - ML model coming soon"
}
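
A monitoring script could interpret this response along the following lines. The Python sketch assumes only the fields shown in the sample above; `is_healthy` is a hypothetical helper and does not fetch anything itself, so the response body has to be obtained separately.

```python
import json
from datetime import datetime


def is_healthy(body):
    """Return True if a /health response body reports a healthy service."""
    try:
        payload = json.loads(body)
        # fromisoformat raises ValueError if the timestamp is malformed.
        datetime.fromisoformat(payload["timestamp"])
    except (ValueError, KeyError, TypeError):
        return False
    return payload.get("status") == "healthy"


sample = """
{
  "status": "healthy",
  "timestamp": "2024-01-15T12:30:45.123456",
  "version": "2.0.0",
  "message": "Frontend ready - ML model coming soon"
}
"""
print(is_healthy(sample))
```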

Contributing

Contributions are welcome! Here's how to get started:

Development Setup

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes
Run tests: pytest
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request

Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=app --cov-report=term-missing

# Run specific test file
pytest tests/test_app.py

Code Style

Python: Follow PEP 8
JavaScript: Use consistent formatting (4-space indentation)
CSS: Use BEM-style class naming

Areas for Contribution

Add support for more ASL gestures
Improve model accuracy
Add mobile-specific optimizations
Create training data collection tools
Write documentation or tutorials
Report and fix bugs

Troubleshooting

Camera Not Working

Issue	Solution
Permission denied	Click the camera icon in browser address bar and allow access
No camera found	Connect a webcam and refresh the page
Camera in use	Close other applications using the camera
HTTPS required	Browsers only allow camera access in a secure context; use localhost or deploy with HTTPS

Model Not Loading

Issue	Solution
404 on model files	Ensure `static/tfjs_model/model.json` exists
Tensor shape mismatch	Verify `model_config.json` matches model architecture
WebGL errors	Try a different browser or disable GPU acceleration

Poor Recognition Accuracy

Issue	Solution
Low confidence scores	Ensure good lighting and plain background
Incorrect predictions	Keep entire upper body in frame
Delayed predictions	Wait for the buffer to fill (30 frames, about 1 second at 30fps)

Deployment Issues

Issue	Solution
Build fails	Check `requirements.txt` for syntax errors
App crashes on start	Verify `SECRET_KEY` is set in production
Slow initial load	Normal on free tier (cold start); wait 30-60 seconds

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Built with care by inshaal81
