A Retrieval-Augmented Generation (RAG) customer support chatbot built for CrocoIT (Integrated Solutions Gate Inc.). It scrapes the official CrocoIT website, processes the content into a local vector database, and uses Google's Gemini 2.5 Flash model to answer user questions with context retrieved from that database.
- Automated Web Scraping: Uses LangChain's `RecursiveUrlLoader` and `BeautifulSoup` to recursively crawl and extract text from https://crocoit.com.
- Intelligent Document Retrieval (RAG): Splits the extracted website text into manageable chunks and converts them into searchable vector embeddings.
- Local Vector Database: Uses `ChromaDB` for fast, local similarity search without relying on an external cloud vector store.
- State-of-the-Art LLM: Powered by Google Gemini 2.5 Flash, which generates professional responses based only on the retrieved company context.
- Interactive CLI: Provides a terminal-style chat loop for testing and interacting with the bot.
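The chunking idea can be illustrated with a simplified fixed-size sliding window that uses this project's chunk size (1000) and overlap (200), assuming lengths are measured in characters. This is a sketch, not `RecursiveCharacterTextSplitter` itself: the real splitter first tries to break on separators such as paragraph and line breaks before falling back to smaller pieces. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and cuts purely by character count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, to avoid tail-only duplicates.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For a 2500-character input, this yields three chunks of 1000, 1000 and 900 characters starting at offsets 0, 800 and 1600. Consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk.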
- Ingestion: Extracts HTML from the target website and cleans it with BeautifulSoup.
- Chunking: Splits the text with `RecursiveCharacterTextSplitter` (chunk size: 1000, overlap: 200).
- Embedding: Embeds the text chunks with HuggingFace's `all-MiniLM-L6-v2` model.
- Storage: Saves the embeddings locally in a `chroma_db` directory.
- Generation: Retrieves the most relevant chunks using Maximal Marginal Relevance (MMR) and passes them to the Gemini LLM with a strict system prompt to produce the final answer.
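Maximal Marginal Relevance balances relevance to the query against redundancy with chunks already selected: at each step it picks the candidate that maximizes `lambda_mult * sim(query, d) - (1 - lambda_mult) * max(sim(d, s) for s in selected)`. The sketch below is a plain-Python illustration over toy vectors, not the notebook's code. In LangChain, MMR is typically enabled with `as_retriever(search_type="mmr")`, where it reranks a larger candidate pool (`fetch_k`) and `lambda_mult` defaults to 0.5; the values this project uses are not stated here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
        k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k documents chosen greedily by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with query `[1, 1]` and chunk vectors `[1, 0.02]`, `[1, 0.05]` and `[0, 1]`, `mmr(query, docs, k=2)` returns `[1, 2]`, while `lambda_mult=1.0` (pure relevance) returns the near-duplicate pair `[1, 0]`. That trade-off is why MMR tends to give the LLM more varied context than plain top-k similarity.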
Make sure Python 3.9+ is installed on your machine. You will also need a Google Gemini API key.
- Clone the repository: `git clone https://github.com/yourusername/CrocoIT_RAG.git`, then `cd CrocoIT_RAG`.
- Create a virtual environment (recommended): `python -m venv venv`, then activate it with `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`).
- Install dependencies: all required packages are listed in `requirements.txt`. Install them with `pip install -r requirements.txt`.
This project reads API keys from environment variables instead of hard-coding them. Before running the notebook, set up your local environment file.
- Create your `.env` file: copy the provided `.env.example` template to a new `.env` file in the root directory with `cp .env.example .env`.
- Add your API key: open the new `.env` file and set `GEMINI_API_KEY` to your actual Google Gemini API key. The file defines three variables, one per line: `APP_NAME="CrocoIT Chatbot"`, `APP_VERSION="0.1"`, and `GEMINI_API_KEY="your_actual_api_key_here"`.
(Note: your `.env` file is ignored by Git, so your API key is not committed to the repository.)
(Optional: if you haven't already, install the `python-dotenv` package and add `from dotenv import load_dotenv` followed by `load_dotenv()` at the top of the notebook, so Jupyter loads the variables from `.env` into `os.environ`.)
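A minimal sketch of that optional step, assuming `python-dotenv` may or may not be installed. The helper name `get_gemini_api_key` and the placeholder check are hypothetical additions for illustration; they fail early with a clear message instead of letting the Gemini client reject an empty or template key later.

```python
import os

try:
    # Optional: python-dotenv copies entries from .env into os.environ
    # (existing environment variables are not overridden by default).
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # Fall back to variables already exported in the shell.

PLACEHOLDER_KEY = "your_actual_api_key_here"  # value shipped in .env.example


def get_gemini_api_key() -> str:
    """Return GEMINI_API_KEY, raising if it is missing or still the template value."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(
            "GEMINI_API_KEY is not set: copy .env.example to .env and add your key."
        )
    return key
```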
- Open the Jupyter notebook: `jupyter notebook Notebook.ipynb`
- Run all cells in the notebook in order.
- The notebook first scrapes the CrocoIT website and generates the required embeddings (this may take a while on the first run).
- Once the database (`chroma_db/`) is built, the final cell starts the bot interface.
- Chat with the bot: interact with it directly in the cell output. Type your questions, and type `exit` or `quit` to shut down the chatbot gracefully.
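A minimal sketch of such a chat loop, assuming a hypothetical `answer_fn` that wraps retrieval and the Gemini call. The prompt strings and the goodbye message are illustrative, not the notebook's actual output. Injecting `read_input` and `write` keeps the loop testable without a terminal, while the defaults (`input` and `print`) work in a notebook cell.

```python
from typing import Callable

EXIT_COMMANDS = {"exit", "quit"}


def chat_loop(
    answer_fn: Callable[[str], str],
    read_input: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Answer questions until the user types exit or quit; return how many were answered."""
    answered = 0
    while True:
        try:
            question = read_input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # Treat Ctrl-D / Ctrl-C / closed input as a request to stop.
        if not question:
            continue  # Ignore blank lines.
        if question.lower() in EXIT_COMMANDS:
            write("Bot: Goodbye!")
            break
        write(f"Bot: {answer_fn(question)}")
        answered += 1
    return answered
```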
Project structure:
├── Deployment
│ ├── Dockerfile
│ ├── app.py
│ └── requirements.txt
├── assets
│ └── logo.jfif
├── .env.example
├── .gitignore
├── Notebook.ipynb
├── README.md
└── requirements.txt
- Data Persistence: The vector database is saved locally in the `chroma_db` folder. To force a fresh scrape and re-embedding of the website, delete this folder before running the notebook.
- Fallback Protocol: The bot's system prompt is strictly configured. If the answer is not in the scraped website context, the bot is instructed not to make one up; instead, it apologizes and directs the user to contact info@crocoit.com.
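Both notes can be sketched in plain Python. `reset_vector_store` removes the persisted folder (the same as deleting `chroma_db` by hand), and `build_system_prompt` shows one way to phrase a context-only prompt with the email fallback. Both function names and the prompt wording are hypothetical; the notebook's actual system prompt is not reproduced here.

```python
import shutil
from pathlib import Path
from typing import Sequence

FALLBACK_EMAIL = "info@crocoit.com"


def reset_vector_store(persist_dir: str = "chroma_db") -> bool:
    """Delete the persisted vector store so the next run re-scrapes and re-embeds.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    path = Path(persist_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def build_system_prompt(context_chunks: Sequence[str]) -> str:
    """Illustrative prompt: answer only from the retrieved context, else fall back to email."""
    context = "\n\n".join(context_chunks)
    return (
        "You are a customer support assistant for CrocoIT. "
        "Answer using only the context below. "
        "If the answer is not in the context, apologize and ask the user to contact "
        f"{FALLBACK_EMAIL}.\n\n"
        f"Context:\n{context}"
    )
```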
- User Question: What services does CrocoIT offer?

  CrocoIT offers a variety of services, including:
  - Business Coaching
  - Design and development of custom e-commerce platforms (online stores)
  - Web application development
  - Commerce mobile apps
  - Mobile applications for Android and iOS
  - Building amazing web and mobile apps
  - ERP systems
- User Question: What is 2Loyal?

  2Loyal is an AI-powered loyalty management software designed to enhance customer retention. It offers customizable rewards, real-time advanced analytics, and a flexible points system to drive engagement across all digital channels. Its purpose is to foster lasting relationships and ensure customers feel valued and connected to a business.