This repository contains the code for the SARAL x RAG Chatbot Streamlit application.
# Make sure you have Anaconda installed.
git clone https://github.com/botmahn/saral_chatbot.git
cd saral_chatbot
conda env create -f environment.yml
conda activate <env-name>  # replace <env-name> with the `name:` field in environment.yml
streamlit run app.py
- PDF and LaTeX support.
- Automated chunking and embedding into ChromaDB.
- Ollama backend.
- Maximal Marginal Relevance (MMR) retrieval.
- LangChain orchestration.
- LangSmith logging.
- Provenance of chunks and section associations.
- Chat history retention.
- Evaluation based on coverage, number of chunks, number of sentences, etc.
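
To illustrate the MMR retrieval feature listed above, here is a minimal, dependency-free sketch of Maximal Marginal Relevance selection. It is not the app's implementation: the function names `cosine` and `mmr_select` and the toy vectors are hypothetical. LangChain vector stores expose MMR through `as_retriever(search_type="mmr", ...)`; the parameters the app actually passes are not stated here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k documents chosen greedily by MMR.

    lambda_mult = 1.0 ranks purely by relevance to the query;
    lower values penalise similarity to already selected documents.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []
    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            # Redundancy: highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 is different.
docs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]
query = [1.0, 0.5]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # relevance and diversity
```

With the diversity term enabled, the second pick shifts away from the near-duplicate of the first pick. This is why MMR is useful when many chunks of a paper repeat the same content.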
- Run the app.
- Upload the paper's main.tex file in the sidebar (I used the DeepSeek-OCR paper).
- Set the DB paths.
- The backend is Ollama; set the Ollama model in the sidebar.
- Enter the query (more detailed queries give better results).
- Choose the speaker script duration using the buttons above the chat input box.
- Run.
- Once generation starts, you can inspect the retrieved chunks and their parent sections.
- After the initial generation finishes, you can chat with the model and choose to refine the slides.
- View the coverage and evaluation metrics in the "evaluation" tab.
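
The steps above mention chunks tied to their parent sections and a coverage score. As a rough sketch of how these could fit together, the code below splits LaTeX source on `\section{...}` commands, tags each chunk with its section title, and computes the fraction of sections represented among the retrieved chunks. The names `chunk_latex_by_section` and `section_coverage`, the fixed-size character chunking, and this definition of coverage are all assumptions for illustration; the app's actual chunker and metrics may differ.

```python
import re
from typing import Dict, List

# Matches \section{Title} and \section*{Title}; does not match \subsection.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def chunk_latex_by_section(tex: str, max_chars: int = 500) -> List[Dict[str, str]]:
    """Split LaTeX source into chunks, each tagged with its parent section."""
    chunks: List[Dict[str, str]] = []
    # re.split with one capture group yields:
    # [text before first section, title1, body1, title2, body2, ...]
    parts = SECTION_RE.split(tex)
    for title, body in zip(parts[1::2], parts[2::2]):
        text = " ".join(body.split())  # collapse whitespace and newlines
        for start in range(0, len(text), max_chars):
            chunks.append(
                {"section": title.strip(), "text": text[start:start + max_chars]}
            )
    return chunks


def section_coverage(
    all_chunks: List[Dict[str, str]],
    retrieved_chunks: List[Dict[str, str]],
) -> float:
    """Fraction of the paper's sections with at least one retrieved chunk."""
    all_sections = {c["section"] for c in all_chunks}
    if not all_sections:
        return 0.0
    hit_sections = {c["section"] for c in retrieved_chunks}
    return len(hit_sections & all_sections) / len(all_sections)


# Hypothetical miniature paper used only for demonstration.
sample_tex = r"""
\section{Introduction}
We motivate the problem.
\section{Method}
We describe the approach.
\section{Results}
We report the numbers.
"""
all_chunks = chunk_latex_by_section(sample_tex)
retrieved = all_chunks[:2]  # pretend the retriever returned the first two chunks
print([c["section"] for c in retrieved])
print(section_coverage(all_chunks, retrieved))
```

Keeping the section title alongside each chunk's text is what makes it possible both to display provenance during generation and to compute section-level coverage afterwards.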