🎥 YouTube Transcript RAG Chatbot

A Retrieval-Augmented Generation (RAG) system that enables users to converse with YouTube videos and retrieve timestamp-grounded insights across multiple videos.

The application combines transcript extraction, semantic search, conversational memory, and LLM-powered reasoning to provide explainable answers linked directly to relevant video segments.

🚀 Key Features

📺 Single Video Conversational Assistant

Chat with any YouTube video using its transcript
Multi-turn conversations with chat history awareness
Follow-up question handling through query rewriting
Transcript-grounded responses, with an optional fallback to the LLM's own reasoning when the retrieved context is insufficient

🔍 Multi-Video Knowledge Retrieval

Search YouTube using a natural language query
Automatically discover and retrieve relevant videos
Aggregate information across multiple transcripts
Return timestamp-grounded answers from different videos
Surface direct video references for further exploration

⏱ Timestamp-Aware Retrieval

Unlike traditional transcript chunking approaches, this project preserves temporal information throughout the retrieval pipeline.

Custom time-based transcript chunking
Timestamp metadata attached to every chunk
Source-grounded retrieval
Direct navigation to relevant video segments
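
As a rough illustration of the "direct navigation" idea, the sketch below turns a chunk's start time into a clickable YouTube link and a readable label. The function names `timestamp_url` and `format_timestamp` are hypothetical and not taken from this repository; the only external fact relied on is YouTube's `t` query parameter, which accepts an offset in whole seconds.

```python
def timestamp_url(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given offset."""
    # The `t` parameter takes whole seconds, so fractional starts are truncated.
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


def format_timestamp(start_seconds: float) -> str:
    """Render seconds as H:MM:SS (or M:SS under an hour) for display."""
    total = int(start_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

A retrieved chunk that carries `video_id` and `start` metadata could then be shown as a label such as `1:35` linking to the matching URL.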

🏗️ System Architecture

Single Video Mode

User Question → Query Rewriting → Transcript Retrieval → FAISS Similarity Search → LLM Response Generation → Conversational Memory Update

Multi Video Mode

User Query → YouTube Video Discovery → Transcript Extraction → Timestamp-Aware Chunking → Embedding Generation → FAISS Vector Search → Multi-Video Retrieval → LLM Summarization → Timestamp-Grounded Answers
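
To make the vector search step in these pipelines more concrete, here is a minimal brute-force stand-in written in plain Python. It is not the project's FAISS code: it ranks chunks by cosine similarity between the query embedding and each chunk embedding, which is what an exact inner-product index over normalized vectors would compute. The names `normalize` and `top_k`, and the chunk dictionaries, are assumptions made for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return (score, chunk) pairs for the k chunks most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for vec, chunk in zip(chunk_vecs, chunks):
        # Dot product of unit vectors equals cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because each chunk keeps its `video_id` and `start` metadata through the search, the hits can be traced back to a specific moment in a specific video.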

🛠️ Tech Stack

LLM & Retrieval

HuggingFace Inference API
Meta Llama 3 8B Instruct
Google Gemini 2.5 Flash
LangChain
FAISS Vector Store
Sentence Transformers

Data Sources

YouTube Transcript API
yt-dlp

Frontend

Streamlit

Backend

Python

⚙️ Setup

Clone Repository

git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
cd YOUR_REPO_NAME

Create Virtual Environment

python -m venv myenv

Activate Environment

Windows

myenv\Scripts\activate

Linux / macOS

source myenv/bin/activate

Install Dependencies

pip install -r requirements.txt

🔑 Environment Variables

Create a .env file in the project root:

HUGGINGFACEHUB_API_TOKEN=YOUR_TOKEN
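
How the application reads this file is not shown here; it may well rely on a helper library. As a standard-library-only sketch, the token could be loaded as below, preferring a value already set in the process environment. `parse_env_text` and `load_token` are hypothetical names.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_token(env_path: str = ".env") -> str:
    """Return the Hugging Face token from the environment or the .env file."""
    token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
    if token:
        return token
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            token = parse_env_text(handle.read()).get("HUGGINGFACEHUB_API_TOKEN")
    if not token:
        raise RuntimeError("HUGGINGFACEHUB_API_TOKEN is not set")
    return token
```

Failing early with a clear error when the token is missing tends to be easier to debug than an authentication failure deep inside an API call.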

▶️ Run Application

streamlit run app.py

💡 Engineering Challenges Solved

Preserving Video Timestamps During Retrieval

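Splitting a transcript purely by character or token count typically discards the start times attached to each transcript segment, making it difficult to trace an answer back to the moment in the source video where it was said.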

A custom timestamp-aware chunker was implemented to:

preserve transcript timing information
maintain source attribution
enable timestamp-grounded responses
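
The repository's chunker is not reproduced here, but the general idea can be sketched as follows. The example assumes each transcript segment is a dictionary with `text`, `start`, and `duration` fields (times in seconds) and groups consecutive segments into windows of at most `window_seconds`. The function name and the 60-second default are illustrative assumptions.

```python
def chunk_by_time(segments, window_seconds=60.0):
    """Group transcript segments into chunks spanning at most `window_seconds`."""
    chunks = []
    current_texts = []
    chunk_start = None
    chunk_end = None
    for seg in segments:
        seg_start = seg["start"]
        seg_end = seg_start + seg.get("duration", 0.0)
        # Close the current chunk if adding this segment would exceed the window.
        if current_texts and seg_end - chunk_start > window_seconds:
            chunks.append(
                {"text": " ".join(current_texts), "start": chunk_start, "end": chunk_end}
            )
            current_texts = []
            chunk_start = None
        if chunk_start is None:
            chunk_start = seg_start
        current_texts.append(seg["text"].strip())
        chunk_end = seg_end
    # Flush whatever is left after the last segment.
    if current_texts:
        chunks.append(
            {"text": " ".join(current_texts), "start": chunk_start, "end": chunk_end}
        )
    return chunks
```

Each chunk keeps its own `start` and `end`, so those values can be stored as metadata alongside the embedding and returned with every retrieved passage.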

Multi-Video Answer Aggregation

Responses are consolidated across multiple videos while limiting duplication and prioritizing the most relevant evidence from each source.
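
One possible way to implement this kind of consolidation is sketched below; it is not the project's actual logic. It assumes each retrieved hit is a dictionary with `video_id`, `start`, `text`, and `score` keys (higher score meaning more relevant), drops hits whose text repeats after whitespace and case normalization, and caps how many hits any single video may contribute. The name `aggregate_hits` and its parameters are hypothetical.

```python
from collections import defaultdict


def aggregate_hits(hits, per_video=2, max_total=6):
    """Keep the best-scoring, non-duplicate chunks from each video."""
    by_video = defaultdict(list)
    seen_texts = set()
    # Visit hits from most to least relevant so the cap keeps the best ones.
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        normalized = " ".join(hit["text"].lower().split())
        if normalized in seen_texts:
            continue  # same text already selected, possibly from another video
        if len(by_video[hit["video_id"]]) >= per_video:
            continue  # this video has already contributed its share
        seen_texts.add(normalized)
        by_video[hit["video_id"]].append(hit)
    selected = [hit for video_hits in by_video.values() for hit in video_hits]
    selected.sort(key=lambda h: h["score"], reverse=True)
    return selected[:max_total]
```

Capping per video keeps one long, highly relevant video from crowding out evidence from the others before the LLM summarizes the results.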

Conversational Retrieval

Follow-up questions are rewritten into standalone queries before retrieval, improving retrieval quality and enabling natural conversations over video content.
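
A minimal sketch of this rewriting step is shown below. The prompt wording, the function names, and the `llm` argument (any callable that maps a prompt string to a completion string) are assumptions for illustration and do not reflect the project's actual prompts or model client.

```python
def build_rewrite_prompt(chat_history, follow_up, max_turns=3):
    """Build a prompt asking an LLM to turn a follow-up into a standalone question.

    `chat_history` is a list of (user_message, assistant_message) tuples.
    """
    lines = [
        "Rewrite the follow-up question as a standalone question.",
        "Resolve pronouns and references using the conversation.",
        "Return only the rewritten question.",
        "",
        "Conversation:",
    ]
    # Only the most recent turns are included to keep the prompt short.
    for user_msg, assistant_msg in chat_history[-max_turns:]:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    lines += ["", f"Follow-up: {follow_up}", "Standalone question:"]
    return "\n".join(lines)


def rewrite_query(chat_history, follow_up, llm):
    """Return a standalone query suitable for retrieval."""
    if not chat_history:
        return follow_up  # first turn: nothing to resolve against
    return llm(build_rewrite_prompt(chat_history, follow_up)).strip()
```

For example, after a turn about FAISS, a follow-up such as "Who made it?" would ideally be rewritten to name FAISS explicitly, so the similarity search matches transcript chunks that mention it.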

🔮 Future Improvements

Hybrid Search (Dense + BM25)
Cross-Encoder Reranking
Persistent Vector Database (Chroma / Qdrant)
Streaming Responses
Whisper-based Audio Transcription
Multi-language Support
Agentic Video Research Workflow
Evaluation Pipeline for Retrieval Quality

🏷️ Topics

rag genai langchain faiss huggingface streamlit youtube chatbot retrieval-augmented-generation semantic-search vector-database llm conversational-ai
