LipNet: End-to-End Sentence-Level Lipreading

This project is a complete implementation of LipNet, a model that translates video sequences of human lip movements into text sentences. The architecture combines spatiotemporal convolutional networks (3D CNNs) with bidirectional GRUs (BiGRUs) and a Connectionist Temporal Classification (CTC) loss to produce word-level and sentence-level predictions from visual data alone, without audio.
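To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding in plain Python: take the most likely label per video frame, collapse consecutive repeats, then drop the blank symbol. The vocabulary string, the blank index of 0, and the function names are assumptions for illustration; the project's actual decoding in backend/predict.py may differ (for example, by using TensorFlow's built-in CTC decoder).

```python
from itertools import groupby

# Hypothetical vocabulary: index 0 is the CTC blank, 1..26 are a..z, 27 is space.
VOCAB = "_abcdefghijklmnopqrstuvwxyz "
BLANK = 0


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


def ids_to_text(label_ids, vocab=VOCAB):
    """Map label indices back to characters."""
    return "".join(vocab[i] for i in label_ids)


if __name__ == "__main__":
    # Per-frame argmax labels for a short clip (made-up values).
    frames = [2, 2, 0, 9, 9, 0, 14, 14, 14]
    print(ids_to_text(ctc_greedy_decode(frames)))
```

A blank between two identical labels is what lets CTC emit a doubled letter: the frames 12, 0, 12 decode to "ll", while 12, 12 decode to a single "l".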

🚀 Features

End-to-End Deep Learning Architecture: Built with TensorFlow and Keras, fully leveraging GPU acceleration for both training and inference.
FastAPI Backend: An asynchronous, lightweight REST API serving predictions dynamically.
Interactive UI: A plain HTML/CSS/JavaScript frontend that lets you test videos directly against the model. The backend serves the static video files alongside the predictions.
Built-in Evaluation Suite: Dedicated evaluation script (evaluate.py) to systematically measure Character Error Rate (CER) and Word Error Rate (WER) across datasets.
Fast Setup: Dependencies are managed with uv and pinned in uv.lock for quick, reproducible installs.

🏗️ Project Structure

.
├── backend/                  # FastAPI Application and endpoints
│   ├── app.py                # Main backend API entry point
│   ├── data.py               # Dataset processing, normalization, and tokenization logic
│   ├── model.py              # LipNet 3D CNN + BiGRU Architecture definition
│   └── predict.py            # Inference wrapper around the model checkpoint
├── data/
│   ├── s1/                   # GRID corpus .mpg video samples
│   └── alignments/s1/        # Ground-truth .align transcription files
├── frontend/                 # Static web assets
│   ├── index.html
│   ├── script.js
│   └── style.css
├── models/                   # TensorFlow Checkpoints (`checkpoint` files)
├── evaluate.py               # Comprehensive CER/WER evaluation script
├── pyproject.toml            # Project metadata and dependencies
├── run.sh                    # Quick-start script for the backend
└── uv.lock                   # Deterministic dependency lockfile
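
As an illustration of what the ground-truth files under data/alignments/s1/ contain, the sketch below turns one .align file into a sentence. It assumes the format used by the public GRID corpus, where each line is "start end token" and the tokens sil and sp mark silence; the sample text and the function name parse_align are hypothetical, and the real loading logic in backend/data.py may differ.

```python
# Illustrative .align content in the assumed GRID format: "<start> <end> <token>".
SAMPLE_ALIGN = """\
0 23750 sil
23750 29500 bin
29500 34000 blue
34000 35500 at
35500 41000 f
41000 47250 two
47250 53000 now
53000 74500 sil
"""


def parse_align(text, silence_tokens=("sil", "sp")):
    """Return the spoken sentence from .align text, skipping silence tokens."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        _start, _end, token = parts
        if token not in silence_tokens:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    print(parse_align(SAMPLE_ALIGN))
```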

⚙️ Setup Instructions

Prerequisites

Python 3.13+
(Optional but Recommended) NVIDIA GPU + CUDA drivers for accelerated prediction and training.

Installation

We use uv for dependency management.

Install uv if you haven't already:

curl -LsSf https://astral.sh/uv/install.sh | sh

Clone this repository and navigate to the project directory.
Sync the dependencies into your environment:

uv sync

🖥️ Running the Application

1. Start the Server

Start the FastAPI backend server using the provided shell script or directly via uv:

# Using the shell script
bash run.sh

# Or directly via uv
uv run uvicorn backend.app:app --reload

The server will initialize on http://127.0.0.1:8000.

2. Access the User Interface

The FastAPI application automatically mounts and serves the static frontend content on the root address. Simply open your browser and navigate to: 👉 http://localhost:8000/

From here, you can select videos from the connected GRID corpus or upload custom .mpg files to see real-time lipreading predictions!

📊 Evaluation & Metrics

The project includes an evaluation script (evaluate.py) that calculates the Character Error Rate (CER) and Word Error Rate (WER) across the s1 dataset directory using the Levenshtein distance metric.
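
For reference, both metrics can be sketched in plain Python as the Levenshtein distance between reference and prediction, divided by the reference length (in words for WER, in characters for CER). This is a minimal per-sample sketch with hypothetical function names; how evaluate.py aggregates scores across samples is not shown here and may differ.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return levenshtein(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word Error Rate: edit distance over whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character Error Rate: edit distance over characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    ref = "bin blue at f two now"
    hyp = "bin blue at f too now"
    print(f"WER: {wer(ref, hyp):.4f}  CER: {cer(ref, hyp):.4f}")
```

In this example one of six words is wrong, so the WER is 1/6, while only one of 21 characters differs, so the CER is 1/21. That is why CER is typically lower than WER on the same predictions.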

On a local benchmark of 1,000 samples, the LipNet model achieves error rates of roughly 1.65% WER and 0.69% CER (lower is better).

To run the evaluation yourself:

# Run over a subset (e.g. 50 files)
uv run python evaluate.py --num_samples 50

# Run evaluation over the entire dataset
uv run python evaluate.py --num_samples -1

The script will output predictions alongside ground-truth text for every sample and summarize the final metrics at the end.
