Youssef3082004/CrocoIT_RAG


CrocoIT RAG Chatbot

Python Jupyter LangChain Google Gemini Hugging Face ChromaDB

📑 Description

A customer support chatbot for CrocoIT (Integrated Solutions Gate Inc.), built on Retrieval-Augmented Generation (RAG). It scrapes the official CrocoIT website, processes the content into a local vector database, and uses Google's Gemini 2.5 Flash model to answer user questions from the retrieved context.

✨ Features

  • Automated Web Scraping: Uses LangChain's RecursiveUrlLoader and BeautifulSoup to recursively crawl and extract text from https://crocoit.com.
  • Intelligent Document Retrieval (RAG): Splits extracted website data into manageable chunks and converts them into searchable vector embeddings.
  • Local Vector Database: Utilizes ChromaDB for fast, local similarity search without relying on external cloud vector stores.
  • State-of-the-Art LLM: Powered by Google Gemini 2.5 Flash to generate professional, accurate responses based only on the retrieved company context.
  • Interactive CLI Interface: Provides a seamless, terminal-based chatting experience for testing and interacting with the bot.
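To make the chunking idea concrete, here is a minimal pure-Python sketch of splitting text into overlapping windows. It is a simplification and not the notebook's code: LangChain's RecursiveCharacterTextSplitter prefers to cut at separators such as paragraph and line breaks, while this hypothetical `chunk_text` helper cuts at fixed character offsets. The defaults mirror the chunk size (1000) and overlap (200) listed under Architecture.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; neighbours share `overlap` characters.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.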

🧠 Architecture

  1. Ingestion: Extracts HTML from the target website and cleans it using BeautifulSoup.
  2. Chunking: Splits the text using RecursiveCharacterTextSplitter (chunk size: 1000, overlap: 200).
  3. Embedding: Embeds the text chunks using HuggingFace's all-MiniLM-L6-v2 model.
  4. Storage: Saves the embeddings locally in a chroma_db directory.
  5. Generation: Retrieves the most relevant chunks using Maximum Marginal Relevance (MMR) and passes them to the Gemini LLM with a strict system prompt to formulate the final answer.
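The MMR step in the list above can be sketched in plain Python. This is an illustrative version of the general technique, not the code ChromaDB or LangChain runs internally. The function names are hypothetical, and the `k` and `lambda_mult` defaults are assumptions, since the values used in the notebook are not stated here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximum Marginal Relevance.

    lambda_mult = 1.0 ranks purely by relevance to the query;
    lower values penalise candidates similar to already-selected documents.
    """
    candidates = list(range(len(doc_vecs)))
    selected = []

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Similarity to the closest document picked so far (0 if none yet).
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Compared with a plain top-k similarity search, MMR avoids filling the prompt with several near-identical chunks, which is useful when many pages of a site repeat the same text.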

🚀 Getting Started

Prerequisites

Ensure you have Python 3.9+ installed on your machine. You will also need a Google Gemini API Key.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/CrocoIT_RAG.git
    cd CrocoIT_RAG
  2. Create a Virtual Environment (Recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install Dependencies: All required packages are listed in requirements.txt.

    pip install -r requirements.txt

Configuration

This project securely manages API keys using environment variables. Before running the notebook, you must set up your local environment file.

  1. Create your .env file: Copy the provided .env.example template to create a new .env file in the root directory.

    cp .env.example .env
  2. Add your API Key: Open the newly created .env file and insert your actual Google Gemini API key into the GEMINI_API_KEY variable:

    APP_NAME="CrocoIT Chatbot"
    APP_VERSION="0.1"
    GEMINI_API_KEY="your_actual_api_key_here"

    (Note: The .env file is ignored by Git, so your API key is not committed to the repository.)

Note: If the notebook does not already load the .env file, install the python-dotenv package and call load_dotenv() at the top of the notebook so that Jupyter reads the variables from .env into os.environ.
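As a sketch of the configuration step, the snippet below loads the .env file when python-dotenv is available and fails early if the key is missing or still set to the template placeholder. The `get_gemini_api_key` helper is hypothetical and not part of the notebook; only `load_dotenv()` and the `GEMINI_API_KEY` variable name come from this README.

```python
import os

try:
    # Provided by the python-dotenv package; reads .env into os.environ.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # Without python-dotenv, the variable must already be set in the shell.
    pass


def get_gemini_api_key(env=os.environ):
    """Return the Gemini API key, or raise if it is missing or a placeholder.

    Hypothetical helper for illustration only.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Copy .env.example to .env and add your key."
        )
    return key
```

Checking the key in the first cell gives a clear error before the scrape and embedding steps run, rather than a failure at the first call to the model.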

💻 Usage

  1. Open the Jupyter Notebook:
    jupyter notebook Notebook.ipynb
  2. Run all cells in the notebook sequentially.
    • The notebook will first scrape the CrocoIT website and generate the required embeddings (this may take a moment on the first run).
    • Once the database (chroma_db/) is built, it will initialize the bot interface in the final cell.
  3. Chat with the bot: Interact with the bot directly in the cell output. Type your questions, and type exit or quit to gracefully shut down the chatbot.
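The chat step can be pictured as a simple read-answer loop. The sketch below is an assumption about its general shape, not the notebook's final cell: `chat_loop` and `answer_fn` are hypothetical names, and `answer_fn` stands in for whatever function runs retrieval and calls Gemini.

```python
def chat_loop(answer_fn, input_fn=input, output_fn=print):
    """Read questions until the user types exit or quit.

    answer_fn: callable taking a question string and returning an answer string.
    input_fn / output_fn: injectable so the loop can be tested without a terminal.
    """
    while True:
        try:
            question = input_fn("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # treat Ctrl+D / Ctrl+C like an exit command
        if question.lower() in {"exit", "quit"}:
            break
        if not question:
            continue  # ignore empty input
        output_fn(f"Bot: {answer_fn(question)}")
    output_fn("Goodbye!")


# Example call with a placeholder answer function:
# chat_loop(lambda q: f"(answer to: {q})")
```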

📂 Project Structure

├── Deployment
│   ├── Dockerfile
│   ├── app.py
│   └── requirements.txt
├── assets
│   └── logo.jfif
├── .env.example
├── .gitignore
├── Notebook.ipynb
├── README.md
└── requirements.txt

⚠️ Important Notes

  • Data Persistence: The vector database is saved locally in the chroma_db folder. To force a fresh scrape and re-embedding of the website, delete this folder before running the notebook.
  • Fallback Protocol: The bot's system prompt restricts it to the scraped website context. If the answer is not in that context, the bot is instructed not to guess; it apologizes and directs the user to contact info@crocoit.com.
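A context-restricted prompt with a fallback instruction might be assembled as in the sketch below. The wording of `SYSTEM_TEMPLATE` and the `build_messages` helper are hypothetical; the actual system prompt lives in the notebook and may differ. Only the fallback address info@crocoit.com is taken from this README.

```python
FALLBACK_EMAIL = "info@crocoit.com"

# Hypothetical wording; the notebook's real system prompt may differ.
SYSTEM_TEMPLATE = (
    "You are a customer support assistant for CrocoIT.\n"
    "Answer using ONLY the context below.\n"
    "If the context does not contain the answer, apologize and ask the user "
    "to contact {email}.\n\n"
    "Context:\n{context}"
)


def build_messages(question, chunks):
    """Return (role, content) pairs for a chat model from retrieved chunks."""
    context = "\n\n---\n\n".join(chunks) if chunks else "(no context retrieved)"
    system = SYSTEM_TEMPLATE.format(email=FALLBACK_EMAIL, context=context)
    return [("system", system), ("human", question)]
```

An instruction like this makes unsupported answers less likely but cannot rule them out, so it is worth testing the bot with questions the website does not cover.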

💻 Testing RAG Chatbot

  1. User Question: What services does CrocoIT offer?

    CrocoIT offers a variety of services, including:
    *   Business Coaching
    *   Design and development of custom e-commerce platforms (online stores)
    *   Web application development
    *   Commerce mobile apps
    *   Mobile applications for Android and iOS
    *   Building amazing web and mobile apps
    *   ERP systems
    
  2. User Question: What is 2Loyal?

    2Loyal is an AI-powered loyalty management software designed to enhance customer retention. It offers customizable rewards, real-time advanced analytics, and a flexible points system to drive engagement across all digital channels. Its purpose is to foster lasting relationships and ensure customers feel valued and connected to a business.

Built for the CrocoIT Company Interview Task

About

A smart chatbot that answers customer questions using content from CrocoIT's website.
