adjacentai/rag-telegram-assistant

RAG Telegram Assistant

An intelligent Telegram bot powered by a Retrieval-Augmented Generation (RAG) pipeline, built from scratch to answer questions based on a custom knowledge base. It handles both text and voice messages, maintains conversation history, and cites its sources.

This project serves as a clear, practical demonstration of how to build a modern LLM assistant without high-level frameworks like LangChain, offering a deep dive into the mechanics of a RAG pipeline.

➡️ For a detailed component breakdown and logic, see ARCHITECTURE.md.

🚀 Key Features

  • Data-Grounded Responses (RAG): The bot uses documents from a knowledge base as its primary source of truth, which reduces confabulation.
  • Voice Support: Integrated speech recognition via OpenAI allows users to ask questions using voice messages.
  • Conversation Memory: Remembers recent messages to maintain a coherent dialogue.
  • Source Citations: Cites the source document (by filename) for answers drawn from the knowledge base.
  • Flexible Configuration: Key parameters (GPT model, relevance threshold, chunk size) are managed in a central config file.
  • Usage Limiting: A built-in system to control the number of requests per user.
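
As a rough illustration of the conversation memory and usage limiting features, the sketch below keeps a bounded per-user history and applies a sliding-window request limit. It is not the project's implementation: the names `remember` and `allow_request`, the in-process dictionaries, and the constants are all assumptions made for this example.

```python
import time
from collections import defaultdict, deque

MAX_HISTORY = 6        # hypothetical: messages remembered per user
MAX_REQUESTS = 3       # hypothetical: requests allowed per window
WINDOW_SECONDS = 60.0  # hypothetical: window length in seconds

# Per-user rolling history; deque(maxlen=...) drops the oldest entry automatically.
history = defaultdict(lambda: deque(maxlen=MAX_HISTORY))
# Per-user timestamps of recent requests.
request_times = defaultdict(deque)


def remember(user_id, role, text):
    """Append one message to the user's bounded history."""
    history[user_id].append({"role": role, "content": text})


def allow_request(user_id, now=None):
    """Sliding-window limiter: True while the user is under the limit."""
    now = time.monotonic() if now is None else now
    times = request_times[user_id]
    # Discard timestamps that have fallen out of the window.
    while times and now - times[0] >= WINDOW_SECONDS:
        times.popleft()
    if len(times) >= MAX_REQUESTS:
        return False
    times.append(now)
    return True
```

A handler would call `allow_request(user_id)` before doing any work and `remember(...)` after each exchange. State held in plain dictionaries is lost on restart, so a persistent store would be needed for limits that must survive one.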

🛠️ How It Works

  1. Indexing: Local .txt files from the data/vectordb directory are loaded, split into smaller chunks, and vectorized using OpenAI's embedding models. The vectors are stored in a local FAISS index alongside SQLite metadata.
  2. Retrieval: When a user asks a question, it's also vectorized. The system then searches the FAISS index for the most semantically similar text chunks.
  3. Generation: The retrieved chunks, conversation history, and the user's original question are combined into a comprehensive prompt, which is sent to the GPT model to generate a final, context-aware answer.
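
The three steps above can be sketched end to end in plain Python. To keep the example self-contained, it swaps in a toy bag-of-words vector for OpenAI's embeddings and a brute-force cosine scan for the FAISS index, so it shows the shape of the pipeline rather than the project's code. All function names, the sample documents, and the default sizes and threshold are assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows (assumes size > overlap)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, index, top_k=2, threshold=0.1):
    """Return up to top_k (score, source, chunk) entries above the relevance threshold."""
    q = embed(question)
    scored = [(cosine(q, vec), source, chunk) for source, chunk, vec in index]
    scored = [s for s in scored if s[0] >= threshold]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


def build_prompt(question, hits, history):
    """Combine retrieved chunks (tagged by filename), history, and the question."""
    context = "\n".join(f"[{source}] {chunk}" for _, source, chunk in hits)
    dialogue = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return f"Context:\n{context}\n\nHistory:\n{dialogue}\n\nQuestion: {question}"


# Hypothetical knowledge base: filename -> contents.
docs = {
    "shipping.txt": "Orders ship within two business days.",
    "returns.txt": "Returns are accepted within thirty days of delivery.",
}
# Indexing: chunk every document and store (source, chunk, vector).
index = [(name, chunk, embed(chunk)) for name, text in docs.items() for chunk in chunk_text(text)]
# Retrieval and prompt assembly for one question.
hits = retrieve("How many days until orders ship?", index)
prompt = build_prompt("How many days until orders ship?", hits, [])
```

Tagging each chunk with its filename in the prompt is one simple way to let the model cite sources. The resulting `prompt` string is what would be sent to the GPT model in the generation step.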

⚙️ Getting Started

Prerequisites

Quick start with Docker (recommended)

The fastest path — Redis and the bot come up in one command.

git clone https://github.com/adjacentai/rag-telegram-assistant.git
cd rag-telegram-assistant
cp .env.example .env       # fill in TELEGRAM_BOT_TOKEN and OPENAI_API_KEY
# drop your .txt knowledge files into data/vectordb/
docker compose up --build  # builds the image, starts Redis, indexes, runs the bot

On first run the bot builds the FAISS index from data/vectordb/*.txt automatically, then starts polling. Subsequent restarts reuse the index. To stop everything:

docker compose down
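
The build-once, reuse-afterwards behaviour described above could be decided by a check like the following sketch. It is an assumption about how such a check might look, not the bot's actual logic: the function name and the modification-time comparison are illustrative, and the real bot may only test whether the index exists.

```python
from pathlib import Path


def needs_rebuild(source_dir, index_file):
    """True when the index is missing or older than any .txt source file."""
    index_path = Path(index_file)
    if not index_path.exists():
        return True
    index_mtime = index_path.stat().st_mtime
    # Any knowledge file modified after the index was written makes it stale.
    return any(p.stat().st_mtime > index_mtime for p in Path(source_dir).glob("*.txt"))
```

Comparing modification times is cheap but coarse: it detects edited or added files, while a deleted file would go unnoticed and still requires a manual rebuild.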

Manual setup

1. Clone the repository:

git clone https://github.com/adjacentai/rag-telegram-assistant.git
cd rag-telegram-assistant

2. Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

3. Install dependencies:

make install

4. Configure environment variables: Copy .env.example to .env and fill in the values:

cp .env.example .env

5. Prepare your knowledge base: Place your custom .txt files into the data/vectordb directory.

6. Start Redis (required for FSM storage):

make up        # starts a local Redis container on :6379

7. Build the vector index: Run this once, and again whenever the knowledge base changes:

make index

8. Run the bot:

make run

When you're done:

make down      # stops the Redis container

Your assistant is now live and ready to chat in Telegram.
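
Both setup paths depend on TELEGRAM_BOT_TOKEN and OPENAI_API_KEY being filled in. If the bot fails at startup, a small check like the sketch below can confirm the values are visible to the process. The function name is hypothetical, and the sketch assumes the contents of .env have already been loaded into the environment (for example by Docker Compose or a dotenv loader), since Python does not read .env files on its own.

```python
import os

# The two variables named in the setup instructions.
REQUIRED = ("TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY")


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]
```

Calling `missing_settings()` before the bot starts and exiting with a clear message when the list is non-empty avoids a less obvious failure later, at the first Telegram or OpenAI request.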
