BadrinathanTV/LipNet

LipNet: End-to-End Sentence-Level Lipreading

This project is a complete implementation of a LipNet model, which translates video sequences of human lip movements into text sentences. The architecture combines spatiotemporal convolutional layers (3D CNNs) with bidirectional GRUs (BiGRUs) and is trained with Connectionist Temporal Classification (CTC) loss, producing word-level and sentence-level predictions from visual data alone, without audio.
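To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding: take the most likely symbol per frame, collapse consecutive repeats, then drop the blank symbol. The vocabulary, the blank index, and the function names are assumptions made for illustration; the actual tokenization in backend/data.py may differ.

```python
from itertools import groupby

# Hypothetical vocabulary for illustration only. Index 0 is assumed to be
# the CTC blank; the real project may place the blank elsewhere.
VOCAB = ["", *"abcdefghijklmnopqrstuvwxyz", " "]
BLANK = 0


def one_hot(index):
    """Build a fake per-frame probability vector that peaks at `index`."""
    return [1.0 if i == index else 0.0 for i in range(len(VOCAB))]


def ctc_greedy_decode(frame_probs):
    """Decode a sequence of per-frame probability vectors into text.

    1. Pick the highest-scoring symbol for every frame.
    2. Collapse runs of the same symbol into one.
    3. Remove blanks.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [index for index, _ in groupby(best)]
    return "".join(VOCAB[i] for i in collapsed if i != BLANK)


# Frames: b, b, blank, i, n  ->  "bin"
frames = [one_hot(2), one_hot(2), one_hot(BLANK), one_hot(9), one_hot(14)]
print(ctc_greedy_decode(frames))
```

The blank is what lets the model emit a doubled letter: two identical symbols separated by a blank survive the collapse step, while two adjacent identical symbols merge into one.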

🚀 Features

  • End-to-End Deep Learning Architecture: Built with TensorFlow and Keras, fully leveraging GPU acceleration for both training and inference.
  • FastAPI Backend: An asynchronous, lightweight REST API serving predictions dynamically.
  • Interactive UI: A vanilla web frontend that lets you test videos directly against the model. It automatically serves static video files alongside predictions.
  • Built-in Evaluation Suite: Dedicated evaluation script (evaluate.py) to systematically measure Character Error Rate (CER) and Word Error Rate (WER) across datasets.
  • Blazing Fast Setup: Managed with uv for strict, lightning-fast dependency resolution.

🏗️ Project Structure

.
├── backend/                  # FastAPI Application and endpoints
│   ├── app.py                # Main backend API entry point
│   ├── data.py               # Dataset processing, normalization, and tokenization logic
│   ├── model.py              # LipNet 3D CNN + BiGRU Architecture definition
│   └── predict.py            # Inference wrapper for the Checkpoint
├── data/
│   ├── s1/                   # GRID corpus .mpg video samples
│   └── alignments/s1/        # Ground-truth .align transcription files
├── frontend/                 # Static web assets
│   ├── index.html
│   ├── script.js
│   └── style.css
├── models/                   # TensorFlow Checkpoints (`checkpoint` files)
├── evaluate.py               # Comprehensive CER/WER evaluation script
├── pyproject.toml            # Project metadata and dependencies
├── run.sh                    # Quick-start script for the backend
└── uv.lock                   # Deterministic dependency lockfile
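As an illustration of the ground-truth files under data/alignments/s1/, the sketch below turns the text of one .align file into a transcript. It assumes the usual GRID layout of one `start end token` triple per line, with `sil` and `sp` marking silence; the sample text and the helper name are hypothetical and not taken from this repository.

```python
# Tokens assumed to mark silence or short pauses in GRID alignments.
SILENCE_TOKENS = {"sil", "sp"}


def parse_align(text):
    """Return the spoken words of an .align file as one space-joined string."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _start, _end, token = parts
        if token not in SILENCE_TOKENS:
            words.append(token)
    return " ".join(words)


# Illustrative sample, not copied from the dataset.
sample = """0 23750 sil
23750 29500 bin
29500 34000 blue
34000 35500 at
35500 41000 f
41000 47250 two
47250 53000 now
53000 74500 sil
"""
print(parse_align(sample))
```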

⚙️ Setup Instructions

Prerequisites

  • Python 3.13+
  • (Optional but Recommended) NVIDIA GPU + CUDA drivers for accelerated prediction and training.

Installation

We use uv for dependency management.

  1. Install uv if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Clone this repository and navigate to the project directory.
  3. Sync the dependencies into your environment:
uv sync

🖥️ Running the Application

1. Start the Server

Start the FastAPI backend server using the provided shell script or directly via uv:

# Using the shell script
bash run.sh

# Or directly via uv
uv run uvicorn backend.app:app --reload

The server will initialize on http://127.0.0.1:8000.

2. Access the User Interface

The FastAPI application automatically mounts and serves the static frontend content on the root address. Simply open your browser and navigate to: 👉 http://localhost:8000/

From here, you can select videos from the connected GRID corpus or upload custom .mpg files to see real-time lipreading predictions!


📊 Evaluation & Metrics

The project includes an evaluation script (evaluate.py) that calculates the Character Error Rate (CER) and Word Error Rate (WER) across the s1 dataset directory using the Levenshtein distance metric.

On a local benchmark of 1,000 samples, the LipNet model reaches roughly 1.65% WER and 0.69% CER (both are error rates, so lower is better).
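For readers unfamiliar with these metrics, the following is a minimal sketch of how CER and WER can be derived from Levenshtein distance. It is not the code in evaluate.py: the function names are hypothetical, and it assumes corpus-level aggregation (total edits divided by total reference length), whereas the script may average per sample instead.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char")."""
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        edits += levenshtein(ref_units, hyp_units)
        total += len(ref_units)
    return edits / total if total else 0.0


refs = ["bin blue at f two now"]
hyps = ["bin blue at f too now"]
print(f"WER: {error_rate(refs, hyps, 'word'):.2%}")
print(f"CER: {error_rate(refs, hyps, 'char'):.2%}")
```

In this example one of six words is wrong, while only one of 21 characters differs, which is why CER is typically lower than WER on the same predictions.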

To run the evaluation yourself:

# Run over a subset (e.g. 50 files)
uv run python evaluate.py --num_samples 50

# Run evaluation over the entire dataset
uv run python evaluate.py --num_samples -1

The script will output predictions alongside ground-truth text for every sample and summarize the final metrics at the end.
