AI Medical Chatbot

A production-deployed RAG (Retrieval-Augmented Generation) system that answers medical questions from passages retrieved out of verified clinical documents, rather than from the model's unaided recall.

What This Project Demonstrates

This project goes beyond a simple chatbot. It showcases an end-to-end ML engineering workflow including:

Designing and deploying a production RAG pipeline with semantic search over medical PDFs
Building a REST API with FastAPI, complete with request/response schema validation
Containerizing with Docker and deploying to AWS EC2 via Amazon ECR
Automating the full build-test-deploy cycle with GitHub Actions CI/CD
Handling real-world infrastructure challenges (disk management, port mapping, runner configuration)

System Architecture

User Query
    │
    ▼
┌─────────────┐     POST /chat      ┌──────────────────────────────────────────┐
│  Client /   │ ──────────────────► │            FastAPI Server                │
│  Streamlit  │                     │                                          │
└─────────────┘                     │  1. Validate input (Pydantic schema)     │
                                    │  2. Embed query (all-MiniLM-L6-v2)       │
                                    │  3. Retrieve top-k docs from Pinecone    │
                                    │  4. Build prompt with context            │
                                    │  5. Generate answer via Gemini LLM       │
                                    │  6. Return structured JSON response      │
                                    └──────────────────────────────────────────┘
                                                        │
                              ┌─────────────────────────┼─────────────────────┐
                              ▼                         ▼                     ▼
                    ┌──────────────────┐   ┌────────────────────┐   ┌──────────────────┐
                    │  Pinecone Index  │   │  Google Gemini LLM │   │  Sentence        │
                    │  (medical-       │   │  (Generation)      │   │  Transformers    │
                    │   chatbot)       │   └────────────────────┘   │  (Embeddings)    │
                    └──────────────────┘                            └──────────────────┘

CI/CD Pipeline

Push to main
    │
    ▼
┌──────────────────────────────┐
│   CI: GitHub Actions         │
│   (ubuntu-latest runner)     │
│                              │
│  1. Checkout code            │
│  2. Configure AWS creds      │
│  3. Login to Amazon ECR      │
│  4. docker build             │
│  5. docker push → ECR        │
└──────────────┬───────────────┘
               │ on success
               ▼
┌──────────────────────────────┐
│   CD: Self-hosted EC2 Runner │
│                              │
│  1. docker system prune      │
│  2. Pull latest image        │
│  3. docker run on port 8000  │
└──────────────────────────────┘

Tech Stack

Layer	Technology	Purpose
LLM	Google Gemini (via `langchain_google_genai`)	Answer generation
Embeddings	`all-MiniLM-L6-v2` (Sentence Transformers)	Semantic vector encoding
Vector DB	Pinecone (Serverless, cosine similarity, dim=384)	Document retrieval
RAG Framework	LangChain (LCEL chain)	Orchestration
API	FastAPI + Uvicorn	REST endpoints
UI	Streamlit	Web chat interface
Containerization	Docker	Reproducible builds
Registry	Amazon ECR	Docker image storage
Compute	AWS EC2 (self-hosted runner)	Production deployment
CI/CD	GitHub Actions	Automated build & deploy

Project Structure

AI-Medical-Chatbot/
├── .github/
│   └── workflows/
│       └── cicd.yaml          # CI/CD pipeline (build → ECR → EC2)
├── data/                      # Source medical PDFs for ingestion
├── research/                  # Notebooks for experimentation
├── src/
│   ├── helper.py              # PDF loading, chunking, embedding utils
│   ├── prompts.py             # System prompt for the medical RAG chain
│   └── llm.py                 # Gemini LLM initialization
├── app.py                     # FastAPI server with /chat endpoint
├── streamlit_app.py           # Streamlit chat UI
├── store_index.py             # One-time script: chunk PDFs → Pinecone
├── Dockerfile                 # Container definition
├── requirements.txt           # Python dependencies
└── setup.py                   # Package setup

How It Works

1. Data Ingestion (`store_index.py`)

Medical PDFs from the data/ directory are loaded, cleaned, and split into overlapping chunks. Each chunk is embedded using all-MiniLM-L6-v2 (384-dimensional vectors) and stored in a Pinecone serverless index with cosine similarity.
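
As a rough illustration of what "overlapping chunks" means, the sketch below splits text into fixed-size character windows that share a few characters with their neighbours. The function name `split_with_overlap` and the default sizes are assumptions made for this example; the project's actual splitter and settings live in `src/helper.py` and may differ.

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.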

2. RAG Chain (`app.py`)

On every /chat request:

The user query is embedded with the same model used at ingestion (all-MiniLM-L6-v2)
Pinecone retrieves the top-3 most semantically similar chunks
A structured prompt (system prompt + retrieved context + user question) is built via LangChain's LCEL
Google Gemini generates a grounded answer
The answer is returned as a validated JSON response
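
The prompt-assembly step above can be sketched without any framework. The real chain is built with LangChain's LCEL and the system prompt in `src/prompts.py`; the `SYSTEM_PROMPT` wording and the `build_prompt` helper below are hypothetical stand-ins that only show how the three parts fit together.

```python
# Hypothetical wording, not the project's actual system prompt.
SYSTEM_PROMPT = (
    "You are a medical assistant. Answer using only the context below. "
    "If the context does not contain the answer, say you don't know."
)


def build_prompt(question: str, retrieved_chunks: list[str], k: int = 3) -> str:
    """Combine the system prompt, the top-k retrieved chunks, and the user question."""
    # Number each chunk so the context section is easy to inspect when debugging.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}"
        for i, chunk in enumerate(retrieved_chunks[:k], start=1)
    )
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Keeping the context and the question in clearly separated sections makes it easier to see, when an answer looks wrong, whether retrieval or generation was at fault.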

3. Deployment

The app runs in a Docker container on an AWS EC2 instance. Every push to main triggers GitHub Actions to rebuild the image, push it to ECR, and redeploy on EC2 automatically.

Running Locally

Prerequisites

Python 3.10+
Docker (optional)
Pinecone account + API key
Google AI Studio API key

Setup

git clone https://github.com/Spandan752/AI-Medical-Chatbot.git
cd AI-Medical-Chatbot
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Create a .env file:

PINECONE_API_KEY=your_pinecone_api_key
GOOGLE_API_KEY=your_google_api_key
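
A small fail-fast check can catch a missing key before the first request does. This is a sketch only: `missing_env_vars` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment (how the project loads them is not shown here).

```python
import os

REQUIRED_VARS = ("PINECONE_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(env=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example use at startup:
#   missing = missing_env_vars()
#   if missing:
#       raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```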

Ingest Documents (one-time)

# Add your medical PDFs to the data/ directory, then:
python store_index.py

Run the API

python app.py
# API available at http://localhost:8000
# Swagger docs at http://localhost:8000/docs

Run the Streamlit UI

streamlit run streamlit_app.py

Run with Docker

docker build -t ai-medical-chatbot .
docker run -p 8000:8000 \
  -e PINECONE_API_KEY=your_key \
  -e GOOGLE_API_KEY=your_key \
  ai-medical-chatbot

API Reference

`GET /`

Health check.

Response:

{ "status": "ok", "message": "Medical chatbot is running" }

`POST /chat`

Ask a medical question.

Request:

{ "input": "What are the symptoms of type 2 diabetes?" }

Response:

{
  "response": "Type 2 diabetes commonly presents with increased thirst, frequent urination, fatigue, blurred vision, and slow-healing sores..."
}
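
For calling the endpoint from Python, a minimal standard-library client might look like the following. The helper names are made up for this example, and the base URL assumes the local setup described under Running Locally; the `input` and `response` field names come from the request and response shown above.

```python
import json
import urllib.request


def build_chat_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /chat request carrying the question as JSON."""
    body = json.dumps({"input": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_chat_response(raw: bytes) -> str:
    """Extract the answer text from the JSON body returned by /chat."""
    return json.loads(raw)["response"]


# With the API running:
#   with urllib.request.urlopen(build_chat_request("What is hypertension?")) as resp:
#       print(parse_chat_response(resp.read()))
```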

Deploying to AWS

Required GitHub Secrets

Secret	Description
`AWS_ACCESS_KEY_ID`	IAM user access key
`AWS_SECRET_ACCESS_KEY`	IAM user secret key
`AWS_DEFAULT_REGION`	e.g. `us-east-1`
`ECR_REPO`	ECR repository name
`PINECONE_API_KEY`	Pinecone API key
`GOOGLE_API_KEY`	Google Gemini API key

EC2 Setup

Launch an EC2 instance (Ubuntu 22.04; t3.medium or larger recommended)
Install Docker: sudo apt update && sudo apt install -y docker.io
Register a GitHub Actions self-hosted runner on the instance
Open port 8000 in the EC2 security group (inbound TCP)

Once the runner is registered, every push to main builds the image, pushes it to ECR, and redeploys it on the instance.

🧠 Key Engineering Decisions

Why RAG over fine-tuning? Medical knowledge changes; RAG allows updating the knowledge base (adding new PDFs) without retraining. It also keeps answers grounded and reduces hallucination risk — critical in a healthcare context.

Why Pinecone serverless? Zero infrastructure management, automatic scaling, and cosine similarity search that works natively with sentence-transformer embeddings.
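
To make "cosine similarity search" concrete, here is a toy in-memory version of what the index computes conceptually. It is not Pinecone's implementation or API (a real index uses approximate search over far more vectors); `cosine_similarity` and `top_k` are illustrative names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

Cosine similarity compares direction only, so two chunks with similar meaning score highly even if their embedding vectors differ in magnitude.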

Why all-MiniLM-L6-v2? It offers an excellent trade-off between speed and semantic quality for retrieval tasks, runs without a GPU, and produces compact 384-dimensional vectors that keep Pinecone costs low.
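
The size argument can be checked with simple arithmetic. The sketch below estimates raw vector storage only, assuming 32-bit floats; it ignores metadata and index overhead and does not model Pinecone's actual pricing.

```python
DIMENSIONS = 384        # output size of all-MiniLM-L6-v2, as stated above
BYTES_PER_FLOAT32 = 4   # assumption: each component is stored as a 32-bit float


def raw_vector_bytes(num_chunks: int, dims: int = DIMENSIONS) -> int:
    """Bytes needed to hold num_chunks embeddings of the given dimension."""
    return num_chunks * dims * BYTES_PER_FLOAT32
```

Under these assumptions each chunk costs 1,536 bytes of vector data, and a model with 768-dimensional output would need exactly twice that for the same corpus.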

Why FastAPI over Flask? FastAPI supports async request handlers, generates OpenAPI (Swagger) docs automatically, and validates requests and responses with Pydantic models, so less of that plumbing has to be written by hand.

Disclaimer

This chatbot is for informational purposes only and does not constitute medical advice. Always consult a qualified healthcare professional for medical decisions.

License

MIT License — see LICENSE for details.
