SpExNote 🎙️

Intelligent Chip-based Lecture Recording System with SpEx+ Speaker Extraction and RAG-Enhanced Automated Summarization for Medical Physics Education

Overview

A survey found that 83.63% of dual-degree students need automated lecture assistance because of their intensive schedules. SpExNote integrates IoT hardware with a multi-stage AI pipeline (Silero-VAD, SpEx+ speaker extraction, Thonburian-Whisper Thai ASR, and Qwen-RAG) so that students with scheduling conflicts can keep track of lectures efficiently.

System Architecture

The deployed pipeline has three major stages; a separate pipeline builds the SpEx+ fine-tuning dataset:

Audio Input → [Silero-VAD] → [Thonburian-Whisper ASR] → [Qwen + RAG] → Lecture Summary
                              (deployed pipeline)

Audio Data → [Silero-VAD + ClearerVoice-Studio] → [Pairing & Mixing] → SpEx+ Fine-tune Dataset
                                                  (dataset creation only)

Note: ClearerVoice-Studio is used only during dataset creation for SpEx+ fine-tuning. It is intentionally omitted from the deployment pipeline because Thonburian-Whisper was trained on noisy audio and performs better on audio with its natural noise than on pre-enhanced audio.
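
The deployed path in the diagram above can be sketched as three injected stages. This is a minimal illustration, not the notebook's code: `merge_segments`, `run_pipeline`, and the stage callables are hypothetical names, and the 30-second chunk limit is an assumption based on Whisper's 30-second input window. Segments are assumed to be dicts with `start` and `end` sample indices, the shape Silero-VAD's `get_speech_timestamps` returns by default.

```python
from typing import Callable, Dict, List, Sequence

Segment = Dict[str, int]  # {"start": sample_index, "end": sample_index}


def merge_segments(segments: List[Segment], max_len: int) -> List[Segment]:
    """Greedily merge consecutive speech segments into chunks of at most max_len samples.

    Gaps between merged segments are kept inside the chunk. A single segment
    longer than max_len is passed through unsplit.
    """
    chunks: List[Segment] = []
    for seg in segments:
        if chunks and seg["end"] - chunks[-1]["start"] <= max_len:
            chunks[-1]["end"] = seg["end"]  # extend the current chunk
        else:
            chunks.append({"start": seg["start"], "end": seg["end"]})
    return chunks


def run_pipeline(
    audio: Sequence[float],
    sample_rate: int,
    detect_speech: Callable[[Sequence[float]], List[Segment]],  # e.g. Silero-VAD
    transcribe: Callable[[Sequence[float]], str],               # e.g. Thonburian-Whisper
    summarize: Callable[[str], str],                            # e.g. Qwen + RAG
    max_chunk_s: float = 30.0,                                  # assumed chunk limit
) -> str:
    """VAD -> ASR -> summary, with each stage supplied by the caller."""
    segments = detect_speech(audio)
    chunks = merge_segments(segments, int(max_chunk_s * sample_rate))
    transcript = " ".join(transcribe(audio[c["start"]:c["end"]]) for c in chunks)
    return summarize(transcript)
```

Passing the stages in as callables keeps the sketch independent of any particular model-loading code, so each stage can be swapped or stubbed out while testing.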

Repository Structure

SpExNote/
├── assets/
│   └── pipeline.png              # System architecture diagram
├── poster/
│   └── SpExNote_poster.pdf       # Research poster
├── notebooks/
│   ├── Fine_tune_SpEx_Plus.ipynb # Dataset preparation & SpEx+ fine-tuning (Google Colab)
│   └── Deploy_SpExNote_RAG_LLM.ipynb  # Deployment pipeline with Gradio UI (Google Colab)
├── data/
│   └── youtube_urls.txt          # YouTube URLs used for dataset creation
├── LICENSE
└── README.md

Note: Raw audio files are not included. The dataset was derived from publicly available YouTube content (see data/youtube_urls.txt). Trained model weights are not included due to file size constraints.

Notebooks

1. `Fine_tune_SpEx_Plus.ipynb` — Dataset Preparation & SpEx+ Fine-tuning

Purpose: Creates the training dataset and fine-tunes the SpEx+ speaker extraction model.

Pipeline:

1. Download audio from YouTube using yt-dlp
2. Apply Silero-VAD to segment speech regions
3. Apply ClearerVoice-Studio (FRCRN_SE_16K) to enhance audio segments
4. Pair clean (single-speaker) and interfering (multi-speaker) segments to create mix / clean / ref triplets
5. Fine-tune the SpEx+ model on the resulting dataset
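
The pairing and mixing step can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: `mix_at_snr` and `make_triplet` are hypothetical helpers, the SNR value is chosen by the caller (the source does not state one), and the reference is assumed to be a separate utterance from the same target speaker.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal (0.0 for an empty signal)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) else 0.0


def mix_at_snr(
    clean: Sequence[float], interference: Sequence[float], snr_db: float
) -> Tuple[List[float], List[float]]:
    """Scale the interference so that clean-to-interference power ratio equals snr_db."""
    n = min(len(clean), len(interference))  # truncate both to the shorter signal
    clean_t = list(clean[:n])
    interf_t = list(interference[:n])
    clean_rms, interf_rms = rms(clean_t), rms(interf_t)
    if clean_rms == 0.0 or interf_rms == 0.0:
        raise ValueError("both signals must be non-silent")
    gain = clean_rms / (interf_rms * 10 ** (snr_db / 20))
    mix = [c + gain * i for c, i in zip(clean_t, interf_t)]
    return mix, clean_t


def make_triplet(
    target_utt: Sequence[float],
    ref_utt: Sequence[float],
    interference: Sequence[float],
    snr_db: float,
) -> Dict[str, List[float]]:
    """Build one mix / clean / ref training example."""
    mix, clean = mix_at_snr(target_utt, interference, snr_db)
    return {"mix": mix, "clean": clean, "ref": list(ref_utt)}
```

Keeping `clean` truncated to the same length as `mix` means the two stay sample-aligned, which a waveform-level training loss needs.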

Dataset sources: See data/youtube_urls.txt — Thai educational YouTube videos (single-speaker and multi-speaker), totaling ~15.28 hours of audio.

Base model: SpEx+ by gemengtju/SpEx_Plus

Fine-tuning result: the best validation loss was -8.6007. The model extracts the teacher's voice, but with residual noise from the unfrozen decoder. This stage is experimental and is not integrated into the current deployment.
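
The source does not say which loss produced this value. Speaker extraction models are often trained on a negative scale-invariant signal-to-distortion ratio (SI-SDR) style objective, in which case a more negative loss means cleaner extraction. Assuming that kind of objective, a minimal SI-SDR sketch looks like this; `si_sdr` is an illustrative helper, not code from the notebook.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between an estimated and a reference waveform."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and equal in length")
    n = len(reference)
    # Remove the mean so a constant offset does not count as signal.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    ref_energy = sum(r * r for r in ref)
    scale = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))
```

A training loss built on this would typically be `-si_sdr(estimate, reference)`, possibly combined with other terms.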

2. `Deploy_SpExNote_RAG_LLM.ipynb` — Deployment Pipeline

Purpose: Full inference pipeline deployed as a Gradio web app.

Pipeline:

1. Silero-VAD: detect speech segments
2. Thonburian-Whisper: Thai ASR transcription
3. Qwen Embedding + FAISS: embed lecture PDF documents into a vector DB
4. Qwen LLM + RAG: generate a context-aware lecture summary from the transcript and PDF documents
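
The retrieval half of the last two steps can be illustrated with a toy example. Everything here is a stand-in: a bag-of-words `embed` replaces the Qwen embedding model, a sorted list replaces the FAISS index, and `chunk_text`, `retrieve`, and `build_prompt` are hypothetical names. The chunk size, overlap, and prompt wording are assumptions, and whitespace tokenization would not suit real Thai text.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (stand-in for a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into word windows of `size` words sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    chunks = [" ".join(words[i:i + size]) for i in range(0, stop, step)]
    return [c for c in chunks if c]


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks most similar to the query (stand-in for a vector DB search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(transcript: str, context_chunks: List[str]) -> str:
    """Combine retrieved document chunks and the transcript into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Summarize the lecture transcript using the reference material.\n"
        f"Reference material:\n{context}\n"
        f"Transcript:\n{transcript}\n"
    )
```

In a real deployment the transcript (or parts of it) would serve as the query, and the returned prompt would be sent to the LLM.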

Dataset Sources

Audio data for SpEx+ fine-tuning was sourced from Thai educational YouTube content. See data/youtube_urls.txt for the full list.

| Category | Description |
| --- | --- |
| Single-speaker | Thai lecture recordings (used as target/clean speaker) |
| Multi-speaker | Thai discussion/panel videos (used as interference) |
| Noise sample | Classroom ambient noise |

These videos are used solely for academic, non-commercial research purposes. No audio files are redistributed in this repository.

Dependencies

# Core
pip install torch torchaudio
pip install yt-dlp
pip install gradio
pip install faiss-cpu

# ASR
# Thonburian-Whisper: https://github.com/biodatlab/thonburian-whisper

# VAD
# Silero-VAD: loaded via torch.hub from snakers4/silero-vad

# Speech Enhancement (dataset creation only)
# ClearerVoice-Studio: https://github.com/modelscope/ClearerVoice-Studio

# LLM / Embedding
# Qwen: https://github.com/QwenLM/Qwen
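
Before running the notebooks, it can help to confirm that the pip packages above are importable. The sketch below is an optional convenience and not part of the repository; `missing_packages` is a hypothetical helper, and the mapping pairs each pip name with the module name it installs (`yt-dlp` installs `yt_dlp`, `faiss-cpu` installs `faiss`).

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> top-level module name
REQUIRED: Dict[str, str] = {
    "torch": "torch",
    "torchaudio": "torchaudio",
    "yt-dlp": "yt_dlp",
    "gradio": "gradio",
    "faiss-cpu": "faiss",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names whose modules cannot be found, sorted alphabetically."""
    return sorted(pip for pip, module in required.items() if find_spec(module) is None)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All core packages found.")
```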

Acknowledgements & Third-Party Licenses

This project builds upon the following open-source works:

| Component | Role | Repository | License |
| --- | --- | --- | --- |
| SpEx+ | Speaker extraction model (base) | gemengtju/SpEx_Plus | MIT |
| Silero-VAD | Voice activity detection | snakers4/silero-vad | CC BY-NC-SA 4.0 (models) / MIT (code) |
| Thonburian-Whisper | Thai ASR | biodatlab/thonburian-whisper | MIT |
| Qwen | LLM + Embedding for RAG | QwenLM/Qwen | Apache 2.0 |
| ClearerVoice-Studio | Speech enhancement (dataset only) | modelscope/ClearerVoice-Studio | Apache 2.0 |

Important License Notes

- Silero-VAD models are licensed under CC BY-NC-SA 4.0 (non-commercial use only). This project is academic, non-commercial research.
- Qwen and ClearerVoice-Studio are Apache 2.0 licensed, which permits modification and redistribution.

Results

SpExNote successfully integrates IoT hardware with a multi-stage AI pipeline:

- SpEx+ fine-tuned on 15.28 hours of Thai audio extracts the teacher's voice
- Despite residual noise from the unfrozen decoder, the extracted audio unexpectedly improves Thonburian-Whisper's ASR performance
- RAG grounds the summaries in the lecture documents, addressing the need for automated lecture assistance reported by 83.63% of surveyed students

Future work: Extended fine-tuning of the target speaker extraction (TSE), ASR, and LLM components; expansion to other courses.

License

This project's own code is released under the MIT License — see LICENSE.

Note that third-party model weights and components retain their original licenses as listed above. Users must comply with each component's license terms, particularly the non-commercial restriction of Silero-VAD models.

Citation

If you use this work, please cite:

Tamprasert, N., & Panchalal, P. (2025). SpExNote: Intelligent Chip-based Lecture Recording
System with SpEx+ Speaker Extraction and RAG-Enhanced Automated Summarization for Medical
Physics Education. KMITL.
