Finnish Speech Recognition using LSTM-CTC

A deep learning model for Finnish speech recognition using LSTM networks with CTC (Connectionist Temporal Classification) loss.

Overview

This project implements an LSTM-based speech recognition system trained on Finnish-language audio. It uses both Kielipankki and Hugging Face datasets for training and evaluation.

Prerequisites

Python 3.12.7+

Install dependencies

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt

The exact package versions used for training and for running locally are listed in freeze.txt. Note that pywin32 is commented out in requirements.txt.

Directory Structure

data/
├── kielipankki/
│   ├── dev-test/
│   ├── set1-part1/
│   └── set1-part2/
└── hf/
    └── [huggingface_dataset (common_voice_17)]
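
Before starting a long training run, it can help to confirm that the data folder matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the `hf` entry is only relevant when the Hugging Face data has been pre-downloaded.

```python
from pathlib import Path

# Sub-directories taken from the directory tree in this README.
# Assumption: "hf" only needs to exist when the data is pre-downloaded.
EXPECTED_DIRS = [
    "kielipankki/dev-test",
    "kielipankki/set1-part1",
    "kielipankki/set1-part2",
    "hf",
]


def missing_dirs(data_root):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dirs("data")` returns an empty list when every expected folder is present, and otherwise lists the relative paths that still need to be created or populated.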

Kielipankki Data

For all datasets except dev-test, place the inner folder from each part directly into the kielipankki directory.

Hugging Face Data

If the Hugging Face dataset is not pre-downloaded, provide your Hugging Face token at runtime with `--token`.

Basic Training

python train.py \
    --data /path/to/data/folder \
    --lr 0.001 \
    --epochs 100 \
    --batch_size 32 \
    --num_workers 4

Using Pre-downloaded Data

python train.py \
    --data /path/to/data/folder \
    --predl true \
    --batch_size 32

Using Hugging Face Datasets

python train.py \
    --data /path/to/data/folder \
    --token YOUR_HUGGINGFACE_TOKEN \
    --batch_size 32

Command Line Arguments

Argument	Description	Default
`--data`	Path to data directory	Required
`--token`	Hugging Face API token	None
`--predl`	Use pre-downloaded data	False
`--lr`	Learning rate	0.001
`--epochs`	Number of training epochs	100
`--batch_size`	Batch size for training	32
`--num_workers`	Number of data loading workers	4
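
For reference, the table above could be expressed with `argparse` roughly as follows. This is a sketch under assumptions, not the actual contents of train.py: the function names `build_parser` and `str2bool` are hypothetical, and the boolean parsing is inferred from the `--predl true` usage shown earlier.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false'-style strings, since the examples pass `--predl true`."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser whose flags and defaults mirror the argument table."""
    parser = argparse.ArgumentParser(description="Train the LSTM-CTC model")
    parser.add_argument("--data", required=True, help="Path to data directory")
    parser.add_argument("--token", default=None, help="Hugging Face API token")
    parser.add_argument("--predl", type=str2bool, default=False,
                        help="Use pre-downloaded data")
    parser.add_argument("--lr", type=float, default=0.001, help="Learning rate")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size for training")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="Number of data loading workers")
    return parser
```

With this sketch, `build_parser().parse_args(["--data", "data", "--predl", "true"])` yields a namespace in which `predl` is `True` and every other option keeps its default from the table.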

Model Architecture

The model uses a bidirectional LSTM trained with CTC loss, which maps sequences of audio frames to text without requiring frame-level alignments.
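
To illustrate how CTC output becomes text, the sketch below performs greedy (best-path) decoding: it takes the highest-scoring label for each frame, collapses consecutive repeats, and drops the blank symbol. It is an illustration only. The function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 for the blank are assumptions, and the repository's own decoder may work differently.

```python
BLANK = 0  # assumption: index 0 is the CTC blank (the default in PyTorch's nn.CTCLoss)


def ctc_greedy_decode(frame_scores, alphabet):
    """Decode per-frame scores into text by best-path CTC decoding.

    frame_scores: list of per-frame score lists, one score per label.
    alphabet: list mapping label index to character; index BLANK is unused.
    """
    # Pick the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != BLANK:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)
```

Because repeated labels are merged, a doubled letter such as the "kk" in "kukka" survives only if the frame-level path places a blank between the two k labels. This matters for Finnish, where double consonants and vowels are common.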
