A REST API platform for training and serving medical image classification models using MedMNIST datasets. The platform lets you browse and download any MedMNIST dataset, train a CNN on it, and classify new images — all through a simple REST API.
| Component | Technology |
|---|---|
| API server | FastAPI + Uvicorn |
| Background jobs | arq (asyncio job queue) |
| Job broker | Redis |
| Database | PostgreSQL (SQLModel / asyncpg) |
| Object storage | S3-compatible (AWS S3, MinIO, etc.) |
| Package manager | uv |
- Docker and Docker Compose (v2)
- uv (only needed for local development outside Docker)

Clone the repository:

    git clone https://github.com/ianbakst/quantivly.git
    cd quantivly

Create a `.env` file in the project root. The application reads it automatically via pydantic-settings.

    # S3-compatible object storage
    AWS__BUCKET_NAME=quantivly
    AWS__ACCESS_KEY_ID=minioadmin
    AWS__SECRET_ACCESS_KEY=minioadmin
    AWS__ENDPOINT_URL=http://minio:9000
    AWS__REGION_NAME=us-east-1

    # PostgreSQL
    DB__USERNAME=quantivly
    DB__PASSWORD=quantivly
    DB__DB_NAME=quantivly
    DB__HOST=postgres
    DB__PORT=5432

    # Redis (arq job broker)
    REDIS_URL=redis://redis:6379

| Variable | Description |
|---|---|
| `AWS__BUCKET_NAME` | S3 bucket for datasets and model weights |
| `AWS__ACCESS_KEY_ID` | S3 access key ID |
| `AWS__SECRET_ACCESS_KEY` | S3 secret access key |
| `AWS__ENDPOINT_URL` | S3 endpoint URL (use your MinIO or AWS regional endpoint) |
| `AWS__REGION_NAME` | S3 region (default: us-east-1) |
| `DB__USERNAME` | PostgreSQL username |
| `DB__PASSWORD` | PostgreSQL password |
| `DB__DB_NAME` | PostgreSQL database name |
| `DB__HOST` | PostgreSQL hostname |
| `DB__PORT` | PostgreSQL port (default: 5432) |
| `REDIS_URL` | Redis DSN used by both the API and the arq worker |
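The double underscore in names such as `AWS__BUCKET_NAME` suggests a nested-settings delimiter (pydantic-settings supports one through `env_nested_delimiter`), though whether the project configures it exactly this way is an assumption. The sketch below uses only the standard library to illustrate how such names could be grouped into sections and how a database URL might be assembled from them. `nest_env` and `postgres_dsn` are hypothetical helpers, and the `postgresql+asyncpg://` scheme is assumed from the asyncpg driver listed in the stack table.

```python
from urllib.parse import quote


def nest_env(env: dict[str, str], delimiter: str = "__") -> dict:
    """Group flat SECTION__FIELD variables into nested, lower-cased sections."""
    nested: dict = {}
    for key, value in env.items():
        section, sep, field = key.partition(delimiter)
        if sep:  # e.g. DB__HOST -> nested["db"]["host"]
            nested.setdefault(section.lower(), {})[field.lower()] = value
        else:  # e.g. REDIS_URL -> nested["redis_url"]
            nested[key.lower()] = value
    return nested


def postgres_dsn(db: dict[str, str]) -> str:
    """Build a SQLAlchemy-style async URL (assumed format, not taken from the project)."""
    user = quote(db["username"], safe="")
    password = quote(db["password"], safe="")
    port = db.get("port", "5432")
    return f"postgresql+asyncpg://{user}:{password}@{db['host']}:{port}/{db['db_name']}"


# Values copied from the example .env file above.
example_env = {
    "DB__USERNAME": "quantivly",
    "DB__PASSWORD": "quantivly",
    "DB__DB_NAME": "quantivly",
    "DB__HOST": "postgres",
    "DB__PORT": "5432",
    "REDIS_URL": "redis://redis:6379",
}
settings = nest_env(example_env)
print(postgres_dsn(settings["db"]))
```

Percent-encoding the username and password keeps the URL valid if a real password contains characters such as `@` or `/`.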

    docker compose up --build

This starts the following services:

| Service | Description | Default port |
|---|---|---|
| `api` | FastAPI application (Uvicorn) | 8000 |
| `worker` | arq background worker | - |
| `postgres` | PostgreSQL database | 5432 |
| `redis` | Redis job broker | 6379 |

The interactive API docs are available at http://localhost:8000/docs once the stack is running.

    curl http://localhost:8000/dataset/remote

Returns a JSON array of MedMNIST dataset names, e.g. `["pathmnist", "chestmnist", "dermamnist", ...]`.

    curl -X POST http://localhost:8000/dataset/remote/pathmnist/download

    {"workflow_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}

The dataset is downloaded from the MedMNIST CDN and uploaded to S3 in the background. Use the returned `workflow_id` to track progress.

    curl http://localhost:8000/workflow/a1b2c3d4-e5f6-7890-abcd-ef1234567890

    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "status": "completed",
      "task": "download_dataset",
      "dataset_name": "pathmnist",
      "arq_job_id": "arq:job:abc123",
      "s3_key": null,
      "error": null,
      "created_at": "2026-03-24T12:00:00Z",
      "updated_at": "2026-03-24T12:01:30Z"
    }

Possible status values: `pending`, `running`, `completed`, `failed`, `cancelled`.
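Because downloads and training run in the background, a client typically polls the workflow endpoint until the status stops changing. The following is a minimal sketch of such a loop, assuming that `completed`, `failed`, and `cancelled` are the terminal states. `fetch_workflow` and `wait_for_workflow` are hypothetical helper names, and the interval and timeout values are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal states, taken from the status list above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_workflow(base_url: str, workflow_id: str) -> dict:
    """GET /workflow/{id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/workflow/{workflow_id}") as resp:
        return json.load(resp)


def wait_for_workflow(
    workflow_id: str,
    fetch: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the workflow reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        workflow = fetch(workflow_id)
        if workflow["status"] in TERMINAL_STATUSES:
            return workflow
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow {workflow_id} is still {workflow['status']}")
        time.sleep(interval)
```

Against a running stack this could be called as `wait_for_workflow(workflow_id, lambda wid: fetch_workflow("http://localhost:8000", wid))`. Passing the fetch function in as an argument keeps the loop easy to exercise without a server.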

    curl http://localhost:8000/model/

    [
      {
        "name": "CNN",
        "parameters": [
          {"name": "in_channels", "type": "int", "default": null, "required": true},
          {"name": "num_classes", "type": "int", "default": null, "required": true},
          {"name": "dropout", "type": "float", "default": "0.3", "required": false}
        ]
      }
    ]

First, note the dataset UUID from `GET /dataset/` (after the download completes):

    curl http://localhost:8000/dataset/

Then start a training job:

    curl -X POST http://localhost:8000/model/CNN/train \
      -H "Content-Type: application/json" \
      -d '{
        "dataset_id": "f7e6d5c4-b3a2-1098-fedc-ba9876543210",
        "learning_rate": 0.001,
        "batch_size": 64,
        "n_epochs": 10
      }'

    {"workflow_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901"}

    curl http://localhost:8000/workflow/b2c3d4e5-f6a7-8901-bcde-f12345678901

The response has the same shape as the workflow status response shown earlier. Once `status` is `completed`, the trained model record will appear in `GET /trained-model/`.
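For scripted use, the same training request can be built from Python. This is a sketch using only the standard library; `build_train_request` is a hypothetical helper, and the hyperparameter values simply mirror the curl example rather than any documented defaults.

```python
import json
import urllib.request


def build_train_request(
    base_url: str,
    architecture: str,
    *,
    dataset_id: str,
    learning_rate: float,
    batch_size: int,
    n_epochs: int,
) -> urllib.request.Request:
    """Prepare POST /model/{architecture}/train with a JSON body."""
    payload = {
        "dataset_id": dataset_id,
        "learning_rate": learning_rate,
        "batch_size": batch_size,
        "n_epochs": n_epochs,
    }
    return urllib.request.Request(
        url=f"{base_url}/model/{architecture}/train",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request(
    "http://localhost:8000",
    "CNN",
    dataset_id="f7e6d5c4-b3a2-1098-fedc-ba9876543210",
    learning_rate=0.001,
    batch_size=64,
    n_epochs=10,
)
print(request.get_method(), request.full_url)

# To send it against a running stack:
#     with urllib.request.urlopen(request) as resp:
#         workflow_id = json.load(resp)["workflow_id"]
```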

    curl http://localhost:8000/trained-model/

    [
      {
        "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
        "architecture": "CNN",
        "dataset_id": "f7e6d5c4-b3a2-1098-fedc-ba9876543210",
        "s3_key": "models/CNN/pathmnist.pth",
        "accuracy": 0.9123,
        "f1": 0.9045,
        "auc": 0.9876,
        "confusion_matrix": [[...], ...],
        "training_params": {"learning_rate": 0.001, "batch_size": 64, "n_epochs": 10},
        "created_at": "2026-03-24T12:05:00Z"
      }
    ]

    curl -X POST http://localhost:8000/trained-model/c3d4e5f6-a7b8-9012-cdef-123456789012/inference \
      -F "image=@/path/to/your/image.png"

    {
      "predicted_class": 3,
      "probabilities": {
        "0": 0.01,
        "1": 0.02,
        "2": 0.04,
        "3": 0.91,
        "4": 0.02
      }
    }

The image is normalised to [-1, 1] and converted to the channel mode (grayscale or RGB) that matches the dataset the model was trained on.
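To make the normalisation and the response fields concrete, here is a small sketch in plain Python. It assumes 8-bit input pixels and a linear mapping onto [-1, 1], and it assumes `predicted_class` is the key with the highest probability (which matches the example response). Both function names are hypothetical, and the project's actual preprocessing may differ.

```python
def normalise_pixels(pixels: list[int]) -> list[float]:
    """Map 8-bit values 0..255 linearly onto [-1.0, 1.0] (assumed scheme)."""
    return [p / 127.5 - 1.0 for p in pixels]


def top_class(probabilities: dict[str, float]) -> int:
    """Return the class index with the highest probability."""
    return int(max(probabilities, key=probabilities.get))


print(normalise_pixels([0, 255]))  # [-1.0, 1.0]

# Probabilities copied from the example response above.
example = {"0": 0.01, "1": 0.02, "2": 0.04, "3": 0.91, "4": 0.02}
print(top_class(example))  # 3
```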
Any pending or running workflow can be cancelled:

    curl -X DELETE http://localhost:8000/workflow/b2c3d4e5-f6a7-8901-bcde-f12345678901/cancel

The job is aborted in Redis and the workflow status is set to `cancelled`. Attempting to cancel a job that is already `completed`, `failed`, or `cancelled` returns HTTP 422.
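The cancellation rule amounts to a small state check. The sketch below models it in isolation. The names are hypothetical, the exception stands in for the HTTP 422 response, and the Redis side effect is deliberately left out.

```python
# Only these states may transition to "cancelled" (per the description above).
CANCELLABLE = {"pending", "running"}


class NotCancellable(Exception):
    """Stands in for the HTTP 422 returned for already-finished workflows."""


def cancel(workflow: dict) -> dict:
    """Return a copy of the workflow marked as cancelled, or raise if it is finished."""
    status = workflow["status"]
    if status not in CANCELLABLE:
        raise NotCancellable(f"cannot cancel a workflow that is {status}")
    # The real service also aborts the arq job in Redis; that step is omitted here.
    return {**workflow, "status": "cancelled"}
```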
My research here was limited, since the goal of this exercise was not to achieve optimal model performance, but CNN-based architectures appear to be the appropriate choice for these datasets. I took inspiration from models built for the MNIST handwritten-digit dataset and, after some further research, chose a slightly more complex architecture. I kept it simple enough to train on a CPU quickly and prove that the pipeline works. The architecture has 3 convolutional layers followed by a final AdaptiveAvgPool, which reduces the feature maps to a fixed size so the model handles different image sizes flexibly. Given more time, I would research the architectures recommended for this problem and start by implementing what the literature suggests is the best model.
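A network of the kind described might look like the PyTorch sketch below. The constructor arguments follow the parameter listing returned by `GET /model/` (`in_channels`, `num_classes`, `dropout` with a default of 0.3), but the channel widths, kernel sizes, and pooling layout are assumptions, not the project's actual layers.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Three conv layers, then adaptive pooling so any input size works."""

    def __init__(self, in_channels: int, num_classes: int, dropout: float = 0.3) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapses H x W to 1 x 1 regardless of input size
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallCNN(in_channels=3, num_classes=9).eval()
with torch.no_grad():
    small = model(torch.randn(2, 3, 28, 28))
    large = model(torch.randn(2, 3, 64, 64))
print(small.shape, large.shape)
```

Because the adaptive pooling layer fixes the size of the tensor entering the linear layer, the same weights accept both 28x28 and 64x64 inputs.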
As with all projects, the perfect cannot be the enemy of the good. This project was time-boxed, so I could not get to everything I wanted.
At a high level, with more time I would've implemented the following:
- Training already runs on its own worker, but I would've tuned that worker specifically for model training. If this were actually deployed to the cloud, I would've provisioned accelerated hardware (GPUs) for the training image.
- On that note, inference is served through the main app, which isn't ideal. With more time I would've built a more robust serving system that would promote a model to its own service. I would then make that served model available to handle batch classification as well as single-image classification.
- The bulk of the time of this project was spent setting up the environment to reliably load data and train on that data. Not too much time was spent investigating model architectures and their performance.
- I would add the ability to manage and add evaluation metrics.
- I would also have loved to make the model architecture composable via REST endpoints, though this would require significant effort.
- Transforms are currently baked into the training pipeline and are minimal (just what's needed to get the data into a usable format). With more time I would like to extract transforms out as their own configurable objects that can be added to either the download step, a separate preprocessing step (I'd need to make this), or a configurable piece of the training invocation.
- I would implement logging and tracing to more than just stdout. I would write log files somewhere that is accessible, either a logging/tracing platform or files linked to the model/training run. This way, there's a full trace of what happened in the development and performance of a model.
- Lastly, I would research and implement an optimal model for these classification problems. This would require reading the literature on the field and on these datasets, followed by training, hyperparameter tuning, and performance comparison before serving a production-ready model.
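As an illustration of the configurable transforms mentioned in the list above, one possible shape is a registry of named transform factories that a request body or config file could reference. Everything in this sketch is hypothetical, including the `scale` and `shift` transforms and the config format. It operates on plain lists of floats to stay self-contained.

```python
from typing import Callable

Transform = Callable[[list[float]], list[float]]

# Maps a transform name to a factory that builds it from keyword parameters.
REGISTRY: dict[str, Callable[..., Transform]] = {}


def register(name: str):
    """Decorator that adds a transform factory to the registry under `name`."""
    def decorator(factory):
        REGISTRY[name] = factory
        return factory
    return decorator


@register("scale")
def scale(factor: float) -> Transform:
    return lambda values: [v * factor for v in values]


@register("shift")
def shift(offset: float) -> Transform:
    return lambda values: [v + offset for v in values]


def build_pipeline(config: list[dict]) -> Transform:
    """Turn a list of {"name": ..., "params": {...}} entries into one callable."""
    steps = [REGISTRY[step["name"]](**step.get("params", {})) for step in config]

    def apply(values: list[float]) -> list[float]:
        for step in steps:
            values = step(values)
        return values

    return apply


pipeline = build_pipeline([
    {"name": "scale", "params": {"factor": 2.0}},
    {"name": "shift", "params": {"offset": -1.0}},
])
print(pipeline([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```

A pipeline described as data in this way could be attached to the download step, a preprocessing step, or the training invocation without changing the transform code itself.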