GitHub - hritika20002/transcript_analyzer: Python tool to evaluate Automatic Speech Recognition (ASR) outputs by calculating Word Error Rate (WER) for multiple models. Ideal for NLP research, ASR benchmarking, and improving transcription accuracy.

Transcript Analyzer (ASR Evaluation)

Description

Transcript Analyzer is a Python-based tool for evaluating Automatic Speech Recognition (ASR) model outputs by calculating Word Error Rate (WER) against a reference transcript. The tool supports comparison of multiple transcription outputs to benchmark model performance. It was developed for academic research and coursework on speech recognition and natural language processing evaluation.
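
As a rough illustration of what the metric measures, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is dependency-free and is not the project's code (the project lists jiwer for this); the function name `word_error_rate` is an assumption made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One word ("the") is missing from a six-word reference: 1 error / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.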

Technologies Used

- Python
- Natural Language Processing (NLP)
- jiwer (WER evaluation library)

Project Features

- Loads and processes reference and hypothesis transcripts
- Computes Word Error Rate (WER) for multiple ASR outputs
- Ranks transcription models based on accuracy
- Simple, script-based evaluation workflow
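
The load, score, and rank workflow listed above could look roughly like the following. This is a sketch under assumptions, not the repository's actual script: the one-file-per-model layout, the use of the file stem as the model name, and the function name `rank_models` are all hypothetical. It calls `jiwer.wer(reference, hypothesis)` when jiwer is installed and otherwise falls back to a small word-level edit distance so the example still runs.

```python
from pathlib import Path

try:
    from jiwer import wer  # the library listed under "Technologies Used"
except ImportError:
    def wer(reference: str, hypothesis: str) -> float:
        """Fallback word-level edit distance, used only if jiwer is missing."""
        ref, hyp = reference.split(), hypothesis.split()
        if not ref:
            raise ValueError("reference transcript is empty")
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i]
            for j, h in enumerate(hyp, start=1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
            prev = curr
        return prev[-1] / len(ref)


def _read_transcript(path) -> str:
    """Read a transcript file and collapse all whitespace to single spaces."""
    return " ".join(Path(path).read_text(encoding="utf-8").split())


def rank_models(reference_path, hypothesis_paths):
    """Return (model_name, wer) pairs sorted from lowest (best) to highest WER."""
    reference = _read_transcript(reference_path)
    scores = {}
    for path in hypothesis_paths:
        # Assumption: each hypothesis file is named after the model that produced it.
        scores[Path(path).stem] = wer(reference, _read_transcript(path))
    return sorted(scores.items(), key=lambda item: item[1])
```

A call such as `rank_models("data/reference.txt", ["data/model_a.txt", "data/model_b.txt"])` (hypothetical file names) would return the models ordered best first.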

Learning Objectives

- ASR evaluation metrics (WER)
- NLP text preprocessing and comparison
- Experimental benchmarking of ML models
- Research-oriented scripting and result analysis

Future Improvements

Advanced ASR Metrics
- Add Character Error Rate (CER) for languages with complex morphology
- Support Sentence Error Rate (SER) for full-utterance evaluation
Why it matters: Shows deeper understanding of speech recognition evaluation beyond WER.
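
A minimal sketch of the two proposed metrics, assuming the usual definitions: CER is the character-level edit distance divided by the number of reference characters, and SER is the fraction of utterances that are not an exact word-for-word match. The function names are hypothetical and nothing here is taken from the repository.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


def sentence_error_rate(references, hypotheses) -> float:
    """Fraction of utterances whose hypothesis differs from its reference."""
    if not references or len(references) != len(hypotheses):
        raise ValueError("need equally sized, non-empty lists of utterances")
    wrong = sum(r.split() != h.split() for r, h in zip(references, hypotheses))
    return wrong / len(references)


print(character_error_rate("kitten", "sitting"))               # 3 edits / 6 characters
print(sentence_error_rate(["a b", "c d"], ["a b", "c x"]))     # 1 of 2 utterances wrong
```

Spaces count as characters in this version of CER; whether to strip them first is a design choice that should be stated alongside any reported numbers.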

Text Normalization Pipeline
- Normalize text before evaluation (lowercasing, punctuation removal, number expansion)
- Handle fillers (uh, um) and hesitations
Why it matters: This is exactly what real ASR research pipelines do.
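
One possible shape for such a pipeline is sketched below. The filler list and the single-digit number expansion are deliberately tiny placeholders chosen for the example, not a complete solution; real number expansion (for instance "21" to "twenty one") needs a dedicated library or rule set.

```python
import string

# Illustrative assumptions: a short filler list and single-digit expansion only.
FILLERS = {"uh", "um", "erm", "hmm"}
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
# Map punctuation to spaces, but keep apostrophes so "don't" stays one word.
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation if c != "'"})


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, expand single digits, and remove fillers."""
    text = text.lower().translate(_PUNCT_TO_SPACE)
    words = (DIGIT_WORDS.get(word, word) for word in text.split())
    return " ".join(word for word in words if word not in FILLERS)


print(normalize("Um, I have 2 cats, uh... and 1 dog!"))
```

The same `normalize` function should be applied to both the reference and every hypothesis before scoring, otherwise the normalization itself shows up as errors.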

Model-wise Result Export
- Export evaluation results to CSV or JSON
- Include model name, WER score, and ranking
Why it matters: Makes results reproducible and research-ready.
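
A sketch of the export step using only the standard library. It assumes the scores are already available as a `{model_name: wer}` dictionary; the function name `export_results` and the column names are choices made for this example.

```python
import csv
import json


def export_results(scores: dict, csv_path, json_path):
    """Write rank, model name, and WER to a CSV file and a JSON file."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # lowest WER first
    rows = [
        {"rank": rank, "model": model, "wer": round(score, 4)}
        for rank, (model, score) in enumerate(ranked, start=1)
    ]

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["rank", "model", "wer"])
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)

    return rows
```

Writing both formats from the same `rows` list keeps the CSV and JSON outputs consistent with each other.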

Visualization of Results
- Plot WER comparison using matplotlib / seaborn
- Bar charts comparing ASR models
Why it matters: Recruiters love seeing “analysis + visualization”.

Dataset Scaling
- Support multiple reference transcripts
- Batch evaluation across datasets
Why it matters: Moves the project from “assignment” → “research tool”.
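
When scoring many reference/hypothesis pairs, one common choice is to pool the errors across the whole dataset rather than average per-utterance WER values, so that long utterances are not under-weighted. The sketch below shows that pooling; the function names are hypothetical and the approach is an assumption about how batch evaluation might be done here.

```python
def word_errors(reference: str, hypothesis: str):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, n_words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += n_words
    if total_words == 0:
        raise ValueError("no reference words to evaluate against")
    return total_errors / total_words


pairs = [("a b c d", "a b c d"), ("e f", "e x")]
print(corpus_wer(pairs))  # 1 error over 6 reference words
```

For these two pairs the per-utterance WERs are 0 and 0.5, so their plain average would be 0.25, while the pooled figure is 1/6. Reports should say which of the two is being used.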

Command-Line Interface (CLI)
- Run evaluations using CLI arguments:
    python evaluate.py --data data/ --metric wer
Why it matters: Signals engineering maturity.
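
The command line shown above could be parsed with `argparse` along these lines. Only `--data` and `--metric` come from the example command; the help texts, the extra `cer` and `ser` choices, and the `build_parser` name are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: python evaluate.py --data data/ --metric wer"""
    parser = argparse.ArgumentParser(
        prog="evaluate.py",
        description="Evaluate ASR transcripts against a reference.",
    )
    parser.add_argument("--data", required=True,
                        help="directory containing reference and hypothesis transcripts")
    parser.add_argument("--metric", choices=["wer", "cer", "ser"], default="wer",
                        help="evaluation metric to compute (default: wer)")
    return parser


# Parse an explicit argument list here so the sketch runs outside a shell.
args = build_parser().parse_args(["--data", "data/", "--metric", "wer"])
print(args.data, args.metric)
```

In a real `evaluate.py`, the script would call `build_parser().parse_args()` with no arguments under the usual `if __name__ == "__main__":` guard so that the values come from the shell.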

Integration with ASR Models
Directly evaluate outputs from:
- OpenAI Whisper
- Google Speech-to-Text
- Vosk / Wav2Vec2
Why it matters: This connects ML theory to real systems.

