SQL-based analysis of hospital operations, patient trends, and financial performance.
The dataset used in this project was sourced from Kaggle: Hospital Management Dataset by Kanak Baghel
It contains synthetic healthcare records, including patient demographics, clinical visits, and financial transactions.
This project presents a comprehensive SQL analysis of a hospital database, covering patient behavior, clinical operations, treatment patterns, and financial performance.
The analysis was designed to simulate real-world healthcare analytics tasks — the type of work performed by Medical Data Analysts at hospitals, NGOs, and health-tech organizations.
To better communicate the insights derived from this SQL analysis, I have included a conceptual dashboard.
Note on Methodology: As I am currently focusing on mastering SQL for data extraction and analysis, this dashboard was conceptualized and generated using AI assistance (Claude/Gemini). This visual representation is based on the actual query results obtained from the PostgreSQL database, used here to demonstrate how clinical data can be translated into business intelligence.
Below is the Entity Relationship Diagram showing the connections between Patients, Doctors, Appointments, and Treatments.
| Table | Rows | Description |
|---|---|---|
| patients | 50 | Demographics and insurance data |
| doctors | 10 | Specialization and branch info |
| appointments | 200 | Visit records and status |
| treatments | 200 | Procedures and costs |
| billing | 200 | Payment records and methods |
- Gender and age group distribution
- Loyal patient identification (3+ visits)
- Insurance provider market share
- Doctor workload ranking
- No-show rate analysis by doctor
- Branch performance comparison
- Most performed treatments
- Revenue contribution by treatment type
- Average cost analysis
- Monthly revenue dynamics
- Cumulative revenue (Running Total)
- Payment status and collection risk
- Payment method preferences
- 76% of patients are returning visitors (3+ visits)
- Central Hospital handles 42% of appointments but has the highest no-show rate (31%)
- MRI has the highest average cost per treatment ($3,224)
- Only 31.5% of billing is collected —
$377,825 in uncollected revenue
⚠️ - April was the strongest revenue month ($64,271)
👉 For a more detailed breakdown of the analysis and business recommendations, check out the Detailed Findings.
- PostgreSQL — Database and query execution
- pgAdmin — Database management
- Docker & Docker Compose — Reproducible environment setup*
- Git & GitHub — Version control and portfolio
- AI Assistance (Gemini/Claude) — Used for conceptualizing data visualizations and structuring the final report.
healthcare-analytics-sql/
├── data/
│ ├── appointments.csv
│ ├── billing.csv
│ ├── doctors.csv
│ ├── patients.csv
│ └── treatments.csv
├── init/
│ ├── 01_create_extensions.sql
│ ├── 02_create_schema.sql
│ ├── 03_create_tables.sql
│ └── 04_load_data.sql
├── schema/
│ ├── create_tables.sql
│ └── erd_schema.png
├── analysis/
│ ├── 00_overview.sql
│ ├── 01_patient_analysis.sql
│ ├── 02_doctor_analysis.sql
│ ├── 03_treatment_analysis.sql
│ └── 04_financial_analysis.sql
├── insights/
│ ├── dashboard_analysis.png
│ └── final_report.md
├── docs/
│ └── data_dictionary.md
├── docker-compose.yaml
├── .env
├── .gitignore
└── README.md
- Make sure Docker Desktop is installed.
- Clone the repository:
git clone https://github.com/arpidata/healthcare-analytics-sql.git
cd healthcare-analytics-sql- Create a
.envfile based on the example:
POSTGRES_DB=healthcare_project
POSTGRES_USER=analyst
POSTGRES_PASSWORD=your_password
PGADMIN_EMAIL=your_email@example.com
PGADMIN_PASSWORD=your_pgadmin_password- Start the containers:
docker-compose up -d- Open pgAdmin at
http://localhost:8080and run the analysis queries from theanalysis/folder.
- Install PostgreSQL and pgAdmin.
- Create a new database:
healthcare_project. - Run scripts from the
init/folder in order (01 → 04). - Execute queries from the
analysis/folder.
- Dataset contains 50 patients and 200 appointments — results are directional and not statistically generalizable.
- Data is synthetic, meaning it may not reflect the full complexity of real-world clinical workflows.
- Analysis covers a single fiscal year (2023) — longitudinal trends cannot be assessed.
- No patient identifiers or sensitive data are included, in line with data privacy best practices.
Arpenik Radiologist transitioning into Health Data Analytics. Focused on SQL, healthcare data, and clinical insights.

