A retrieval-augmented chatbot for the University of St.Gallen Executive Education programmes. The current system covers EMBA HSG, IEMBA HSG, and emba X, supports English and German, and combines scraping, document import, vector retrieval, and a Gradio-based chat interface.
- A multi-agent RAG chat application for programme information and admissions guidance
- A scraping and import pipeline for keeping programme content up to date
- Weaviate-based retrieval across language-specific collections
- A Gradio chat UI plus a separate database management UI
- A growing pytest suite for consent flow, scraping, prompts, and formatting
- Programme-specific support for EMBA HSG, IEMBA HSG, and emba X
- Language handling for English and German
- Lead-agent routing with programme-specific sub-agents
- Response formatting, ambiguity checks, scope guarding, and quality fallback handling
- Booking / handover flow with advisor-specific widgets
- Consent handling and user-profile tracking
- Scraping, chunking, import, and Weaviate collection management
HSG_RAG/
├── docs/ # Architecture and operations documentation
├── src/
│ ├── apps/
│ │ ├── chat/ # Gradio chatbot application
│ │ └── dbapp/ # Database management UI
│ ├── config/ # Runtime config loader
│ ├── const/ # Static response and content constants
│ ├── database/ # Weaviate services and collection strategies
│ ├── notification/ # Notification helpers
│ ├── pipeline/ # Import pipeline orchestration
│ ├── rag/ # Agent chain, prompts, formatting, scope handling
│ ├── scraping/ # Scraper, HTML processing, URL normalization
│ └── utils/ # Shared utilities
├── tests/ # Pytest suite
├── tools/ # Operational scripts
├── config.py # Repository-level default settings
├── main.py # Main CLI entry point
├── pytest.ini # Default pytest behaviour
└── requirements.txt # Python dependencies
Required values depend on the mode you want to run.
See .env.example and docs/configuration_system_documentation.md for the full configuration surface.
The following variables are required in every mode:
OPENAI_API_KEY=...
OPEN_ROUTER_API_KEY=...
WEAVIATE_API_KEY=...
WEAVIATE_CLUSTER_URL=...
Optional but commonly useful:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=...
LANGSMITH_PROJECT=...
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
GROQ_API_KEY=...
This application can be run locally or on a cloud VM using Docker.
- Install Docker on your machine/VM
- Clone this repository
- Fill the .env file with all required environment variables (copy from .env.example)
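Before building the image, it can help to confirm that the .env file defines every required variable. The following is a minimal sketch of such a check, not a script that ships with the repository: the helper names are hypothetical, and it assumes plain KEY=VALUE lines and the four required variables listed earlier.

```python
from pathlib import Path

# Variables required in every mode, per the list earlier in this README.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "OPEN_ROUTER_API_KEY",
    "WEAVIATE_API_KEY",
    "WEAVIATE_CLUSTER_URL",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(text: str, required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are absent or empty."""
    values = parse_env_text(text)
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_required(text)
    if missing:
        print("Missing in .env:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run from the project root, the script reports which required names are missing or empty; it does not validate the values themselves.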
You can build the container using the following command (recommended):
docker build --no-cache -t hsg-rag .
You can use this command to start the container:
docker run --env-file .env \
-p 7860:7860 \
--name hsg-rag \
hsg-rag
After starting the container, open your browser and go to:
http://localhost:7860 (or http://<server-ip>:7860 on a server)
The application can be run directly from the project's root directory.
- Clone the repository.
- Create and activate a virtual environment.
python -m venv venv
source venv/bin/activate
- Install dependencies.
pip install -r requirements.txt
- Create a local .env file from .env.example.
Start the chat UI in German:
python main.py --app de
Start the chat UI in English:
python main.py --app en
Show all CLI options:
python main.py --help
Useful operational commands:
python main.py --scrape
python main.py --scrape --full_scrape
python main.py --imports path/to/file1 path/to/file2
python main.py --weaviate checkhealth
python main.py --weaviate init
python main.py --weaviate redo
python main.py --dbapp
Embedding model changes require a Weaviate collection rebuild and re-import:
python main.py --weaviate redo
python main.py --scrape
# plus python main.py --imports ... for any local source files you maintain
The default cloud embedding path uses OpenRouter openai/text-embedding-3-small
and stores app-generated vectors in Weaviate. The existing scraper restoration
flow is unchanged.
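To show how the flags used in this section fit together, here is a minimal argparse sketch. It is not the project's main.py: the flag names are taken from the commands shown in this section, while the parser layout, help strings, and choice lists are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (layout is assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--app", choices=["en", "de"],
                        help="start the chat UI in the given language")
    parser.add_argument("--dbapp", action="store_true",
                        help="start the database management UI")
    parser.add_argument("--scrape", action="store_true",
                        help="run the scraper")
    parser.add_argument("--full_scrape", action="store_true",
                        help="used together with --scrape")
    parser.add_argument("--imports", nargs="+", metavar="PATH",
                        help="import one or more local files")
    parser.add_argument("--weaviate", choices=["checkhealth", "init", "redo"],
                        help="run a Weaviate maintenance action")
    return parser


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the sketch runs as-is.
    args = build_parser().parse_args(["--scrape", "--full_scrape"])
    print(args.scrape, args.full_scrape, args.imports)
```

For the authoritative list of options and their behaviour, rely on python main.py --help.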
The default pytest configuration only runs tests that do not require network access or external services.
pytest -q
Current default behaviour from pytest.ini:
- network tests are excluded by default
- integration tests are excluded by default
Examples:
pytest -q tests/test_pricing_prompts.py
pytest -q tests/test_tone_and_handover.py
pytest -q -m integration
If optional dependencies are missing, some tests are skipped during collection via tests/conftest.py.
The repository uses config.py as the default configuration source, with environment-based overrides loaded through src/config/configs.py.
Important defaults in the current repository state:
- Available languages: en, de
- Lead response target: 100 words
- Sub-agent response target: 200 words
- User-profile tracking: enabled
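The default-plus-override pattern described in this section can be sketched as follows. This is a simplified illustration, not the code in config.py or src/config/configs.py: the setting names (AVAILABLE_LANGUAGES, LEAD_MAX_WORDS, and so on) are hypothetical, and the rule that an environment variable with the same name overrides a default is an assumption.

```python
import os
from typing import Any, Mapping

# Hypothetical setting names; the values mirror the defaults listed in this README.
DEFAULTS: dict[str, Any] = {
    "AVAILABLE_LANGUAGES": ["en", "de"],
    "LEAD_MAX_WORDS": 100,
    "SUBAGENT_MAX_WORDS": 200,
    "TRACK_USER_PROFILE": True,
}


def _coerce(raw: str, default: Any) -> Any:
    """Convert an environment string to the type of the default value."""
    if isinstance(default, bool):  # check bool before int: bool subclasses int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, list):
        return [item.strip() for item in raw.split(",") if item.strip()]
    return raw


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Start from DEFAULTS and apply any same-named environment overrides."""
    settings = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        if key in env:
            settings[key] = _coerce(env[key], default)
    return settings
```

Passing a plain dictionary instead of os.environ, for example load_settings({"LEAD_MAX_WORDS": "80"}), makes the override logic easy to exercise in a unit test without touching the real environment.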
For details, see:
- docs/configuration_system_documentation.md
- docs/user_profile_tracking.md
- docs/weaviate_database_setup.md
- main.py is the supported entry point for local execution.
- tools/scraping.py is an operational scheduler / scraping helper, not the main app entry.
- The chatbot UI and the database UI are separate applications under src/apps/.
MIT