Auto-DataScientists

AutoDataScientists: Self-Organizing Agent Teams for Clinical, Healthcare, and Biomedical Data Science

AutoDataScientists

Research use only. This system is for methods research and exploratory analysis. It is not a medical device, is not FDA/CE cleared, and must not be used for diagnosis, treatment decisions, or any clinical purpose without independent validation and the appropriate regulatory clearance. See Disclaimer.

AutoDataScientists is a decentralized team of AI agents that runs end-to-end data science on clinical, healthcare, and biomedical problems: ingesting messy real-world data (EHR, multi-omics, imaging, trials), proposing and critiquing analysis plans, building and validating models, and producing interpretable, reproducible write-ups.

Unlike a single autonomous agent that follows one analysis trajectory, AutoDataScientists agents self-organize into teams around competing hypotheses and modeling strategies and critique each other's analysis plans before spending compute. They share results, dead ends, and intermediate artifacts, so the system avoids redundant work and sustains parallel exploration as evidence accumulates over hours or days. Domain guardrails (privacy/de-identification, statistical rigor, subgroup fairness, and leakage detection) are first-class steps in the workflow, not afterthoughts.

This repository packages the system as Claude Code subagents coordinating through a local message-board / workspace server. The orchestrator is a pure coordinator: it launches agents and harvests their results; it never analyzes data itself.

Why biomedical data science is different

Generic AutoML tends to fail in biomedicine for predictable, domain-specific reasons. AutoDataScientists is built to respect them:

Privacy & governance: PHI/PII handling, de-identification, IRB/ethics approval, and data-use agreements gate what leaves the machine. No identifiable data is sent to external APIs without a BAA / equivalent.
Statistical rigor over leaderboard chasing: proper study design, confounder handling, multiple-testing correction, calibration, and time-to-event (survival) framing rather than naive accuracy.
Leakage is everywhere: patient-level and site-level grouping in cross-validation, temporal splits for prospective questions, and explicit checks for target leakage from clinical workflows.
Batch & site effects: harmonization across assays, platforms, and hospitals before modeling.
Fairness & equity: performance reported across demographic and clinical subgroups, not just in aggregate.
Interpretability & biological plausibility: feature importance, pathway/enrichment context, and clinically meaningful explanations are required outputs, not optional.
Reproducibility: every run logs data versions, seeds, parameters, and decisions.
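
To make the patient-level grouping point concrete, here is a minimal sketch of a grouped k-fold split in plain Python. It is an illustration only, not this repository's implementation: the function name grouped_kfold and the toy patient_ids list are hypothetical, and in practice a library splitter such as scikit-learn's GroupKFold would be the more usual choice.

```python
from collections import defaultdict

def grouped_kfold(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    # Collect the row indices that belong to each group (e.g. each patient).
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold, so fold sizes stay roughly even.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[g])
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train, test

# Hypothetical toy cohort: one row per encounter, several encounters per patient.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
splits = list(grouped_kfold(patient_ids, n_splits=3))
for train, test in splits:
    train_patients = {patient_ids[i] for i in train}
    test_patients = {patient_ids[i] for i in test}
    assert not (train_patients & test_patients)  # no patient on both sides
```

The same function covers site-level grouping if hospital identifiers are passed as groups instead of patient identifiers.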

What it does

The agents cover the full data-science lifecycle for a given task:

Ingest & profile: load data; map to standards (FHIR / OMOP / HGNC / SNOMED-CT / LOINC); profile schema, missingness, and distributions.
De-identify & govern: verify de-identification, flag PHI, record consent/IRB constraints.
Design: frame the question (predictive vs. causal vs. descriptive), define endpoints, power/sample-size sanity checks, and the validation protocol.
Prepare: QC, batch-effect correction/harmonization, feature selection, dimensionality reduction, embeddings.
Model: propose candidate approaches, run sweeps, train with grouping-aware cross-validation.
Validate: leakage audit, external/temporal validation, calibration, subgroup/fairness analysis.
Interpret & report: feature attributions, biological/clinical context, limitations, and a reproducible report.
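
As an illustration of the subgroup analysis in the Validate step, the sketch below reports AUROC overall and per subgroup. It is an assumption about what such a check could look like, not the project's code: the names auroc and subgroup_auroc and the toy labels, scores, and sex column are all hypothetical.

```python
from collections import defaultdict

def auroc(y_true, y_score):
    """AUROC via the pairwise (Mann-Whitney) identity; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auroc(y_true, y_score, subgroup):
    """Return {'overall': auroc, <subgroup value>: auroc, ...}."""
    rows = defaultdict(lambda: ([], []))
    for y, s, g in zip(y_true, y_score, subgroup):
        rows[g][0].append(y)
        rows[g][1].append(s)
    report = {"overall": auroc(y_true, y_score)}
    for g, (ys, ss) in sorted(rows.items()):
        report[g] = auroc(ys, ss)
    return report

# Hypothetical toy data: binary outcome, model score, one demographic column.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.1]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
report = subgroup_auroc(y_true, y_score, sex)
print(report)  # {'overall': 0.9375, 'F': 1.0, 'M': 0.75}
```

In this toy example the aggregate figure hides a weaker subgroup, which is the situation a per-subgroup report is meant to surface. The pairwise computation is quadratic in sample size and is only meant for small illustrations.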

Use cases

Clinical

Risk prediction (ICU mortality, 30-day readmission, sepsis early warning, length-of-stay)
EHR phenotyping and cohort definition
Clinical deterioration / time-to-event modeling

Personalized / precision medicine

Patient subtyping and stratification
Treatment-response prediction
Polygenic risk scoring and pharmacogenomics

Biomedical

Biomarker discovery and prioritization
Drug-response and target-identification analyses
Assay/screen data modeling

Computational biology

Single-cell cell-type annotation and differential expression
Variant effect prediction and interpretation
Multi-omics integration (genomics + transcriptomics + proteomics)

Supported data modalities

EHR / claims (FHIR, OMOP CDM) · genomics (WGS/WES, variants) · transcriptomics (bulk & single-cell) · proteomics / metabolomics · medical imaging (radiology, digital pathology) · clinical-trial data · wearable / sensor time series.

Each modality has its own loader and QC profile under task-<name>/. Start with the bundled examples and add your own.

How it works

A lightweight coordination layer hosts shared workspaces and a message board; agents post proposals, critiques, and results there. Roles include:

Agent	Responsibility
Data Steward	Ingestion, standards mapping, de-identification & PHI checks, missingness, batch detection
Biostatistician	Study design, confounders, power, multiple testing, survival/causal framing
Feature / Representation	Feature selection, harmonization, embeddings, dimensionality reduction
Modeler(s)	Candidate models and sweeps; grouping-aware cross-validation (parallel teams)
Validator / Critic	Leakage audits, external/temporal validation, calibration, subgroup fairness
Translator	Feature attribution, pathway/clinical context, limitations, final report
Orchestrator	Pure coordinator: launches agents, harvests results, never analyzes data

Agents self-organize around promising directions and must pass peer critique before consuming compute, so the system explores several strategies in parallel without duplicating effort.
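
The critique-before-compute rule can be sketched with a small in-memory stand-in for the message board. Everything here is hypothetical and exists only to illustrate the idea: the Board and Proposal classes, the two-approval threshold, and the whitespace-insensitive duplicate check are assumptions, not the coordination server's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    author: str
    plan: str
    verdicts: dict = field(default_factory=dict)  # reviewer -> "approve" | "object"

class Board:
    """In-memory stand-in for the shared message board (hypothetical API)."""

    def __init__(self, min_approvals=2):
        self.min_approvals = min_approvals
        self.proposals = {}

    def post(self, author, plan):
        # Crude fingerprint: case- and whitespace-insensitive plan text.
        key = " ".join(plan.lower().split())
        if key in self.proposals:
            return None  # already proposed; skip to avoid redundant work
        self.proposals[key] = Proposal(author, plan)
        return key

    def critique(self, key, reviewer, verdict):
        proposal = self.proposals[key]
        if reviewer == proposal.author:
            raise ValueError("authors cannot review their own proposal")
        if verdict not in ("approve", "object"):
            raise ValueError("verdict must be 'approve' or 'object'")
        proposal.verdicts[reviewer] = verdict

    def may_run(self, key):
        # Compute is released only with enough approvals and no open objection.
        verdicts = list(self.proposals[key].verdicts.values())
        return "object" not in verdicts and verdicts.count("approve") >= self.min_approvals

board = Board(min_approvals=2)
key = board.post("modeler-1", "Gradient boosting with patient-grouped CV")
board.critique(key, "biostatistician", "approve")
ready_after_one = board.may_run(key)   # False: only one approval so far
board.critique(key, "validator", "approve")
ready_after_two = board.may_run(key)   # True: two approvals, no objections
duplicate = board.post("modeler-2", "gradient  boosting with patient-grouped cv")
```

Here duplicate is None because the second plan matches one already on the board, which is one simple way to keep parallel teams from repeating each other's work.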

Setup

Prerequisites: Python 3.10+, Node.js 22+ (for npx), and the Claude Code CLI (claude).

# 1. Start the local coordination server (agents coordinate through this)
#    Replace with your chosen message-board/workspace server.
npx <coordination-server> start

# 2. Python dependencies
pip install -r requirements.txt

Data security: run on infrastructure approved for your data classification. Configure external model access so that no identifiable data leaves the environment without a BAA / DUA in place.

Quickstart

From the repo root, in a separate shell:

claude -p "Read runbook.md and execute. Task: task-readmission-risk. Run name: readmit_v1."
claude -p "Read runbook.md and execute. Task: task-singlecell-annotation. Run name: scrna_v1."
claude -p "Read runbook.md and execute. Task: task-biomarker-discovery. Run name: biomarker_v1."

Each launch materializes a new sibling directory ../<run-name>/ with its own copy of the system, agents, workspace, and logs, so the template stays clean across runs. Hardware/data requirements vary per task; see each task-<name>/README.md.

Adding a new task

Drop a task-<name>/ directory at the repo root with:

TASK.md: the spec. YAML frontmatter sets task_type (e.g. ehr-risk, omics-classification, survival, imaging, singlecell), name, endpoint, and validation (e.g. patient-grouped-cv, temporal-split). The body describes the data, cohort, constraints, and success criteria.
LAUNCH.md: fills the workflow hooks the runbook references (launch_command, deident_policy, cv_strategy, fairness_subgroups, leakage_checks, promotion_criteria, exit_condition, …). Easiest path: copy the closest bundled task-*/LAUNCH.md and edit.

Optionally add a download_data.sh / loader to fetch the dataset. Then launch the task as in Quickstart, naming task-<name> as the task.
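
Before launching, it can help to sanity-check a new TASK.md. The sketch below parses a flat frontmatter block and confirms the four keys named above are present. It is a hedged illustration: read_frontmatter is a hypothetical helper, it handles only simple key: value lines rather than full YAML, and the example values for name and endpoint are assumptions.

```python
REQUIRED_KEYS = ("task_type", "name", "endpoint", "validation")

def read_frontmatter(text):
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("TASK.md must start with a '---' frontmatter fence")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing fence: frontmatter ends here
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip()
    else:
        raise ValueError("frontmatter is missing its closing '---' fence")
    missing = [k for k in REQUIRED_KEYS if not meta.get(k)]
    if missing:
        raise ValueError(f"missing frontmatter keys: {missing}")
    return meta

# Hypothetical TASK.md content; only the key names come from this README.
example = """---
task_type: ehr-risk
name: task-readmission-risk
endpoint: 30-day readmission
validation: patient-grouped-cv
---
Describe the data, cohort, constraints, and success criteria here.
"""
meta = read_frontmatter(example)
```

Nested or multi-line YAML values would need a real YAML parser; this check is only meant to catch a missing key or fence early.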

Results

Benchmark	Metric	AutoDataScientists	Strongest baseline	Δ
e.g. MIMIC readmission	AUROC	TBD	TBD	TBD
e.g. single-cell annotation	macro-F1	TBD	TBD	TBD
e.g. ProteinGym subset	Spearman	TBD	TBD	TBD

Report subgroup/fairness breakdowns and calibration alongside headline metrics.
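
One common way to summarize calibration next to a headline metric is expected calibration error (ECE). The sketch below uses equal-width probability bins. It is one possible choice rather than the metric this project prescribes, and the function name and toy inputs are hypothetical.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |observed rate - mean predicted prob|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((y, p))
    total = sum(len(b) for b in bins)
    if total == 0:
        raise ValueError("no predictions given")
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in b) / len(b)
        predicted = sum(p for _, p in b) / len(b)
        ece += (len(b) / total) * abs(observed - predicted)
    return ece

# Hypothetical toy predictions: correct ranking, but under-confident by 0.25.
ece = expected_calibration_error([0, 0, 1, 1], [0.25, 0.25, 0.75, 0.75], n_bins=2)
print(ece)  # 0.25
```

The toy model ranks every case correctly, so its AUROC would be perfect, yet its probabilities are off by 0.25 in each bin. That gap is why calibration is worth reporting separately.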

Data handling & compliance

Use only data you are authorized to use, under an approved IRB/ethics protocol and any applicable DUA.
De-identify per HIPAA Safe Harbor / Expert Determination or your local equivalent (e.g. GDPR) before analysis.
Keep an audit log of data access and agent decisions.
Do not transmit identifiable data to third-party services without a Business Associate Agreement (or equivalent).
This repository ships no patient data; example tasks reference public/synthetic datasets only.
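
For the audit-log item, a minimal sketch is an append-only JSON-lines file with one record per data access or agent decision. The function name append_audit_event, the field names, and the file paths are all hypothetical; a production audit trail would also need access controls and tamper protection that this sketch does not attempt.

```python
import hashlib
import json
import time

def append_audit_event(log_path, actor, action, detail, data_path=None):
    """Append one JSON line recording who did what; hash the data file if given."""
    event = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        "actor": actor,
        "action": action,
        "detail": detail,
    }
    if data_path is not None:
        # Record a SHA-256 of the file so the exact data version is traceable.
        digest = hashlib.sha256()
        with open(data_path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        event["data_sha256"] = digest.hexdigest()
    # Append mode: earlier records are never rewritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

A call such as append_audit_event("audit.jsonl", "data-steward", "read", "profiled cohort table", data_path="cohort.csv") would add one line to the log. Only hashes and short descriptions should go into it, never identifiable data.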

Citation

License

Disclaimer

This software is provided for research and educational purposes only. It is not a medical device and has not been reviewed or cleared by any regulatory authority. Outputs may be incorrect, biased, or incomplete and must be independently validated. Nothing produced by this system constitutes medical advice or a substitute for the judgment of qualified clinicians. The authors accept no liability for use of this software in clinical or operational settings.
