transfer FHIR pipeline to branch by jhnwu3 · Pull Request #1155 · sunlabuiuc/PyHealth

jhnwu3 · 2026-05-31T00:35:21Z

This pull request introduces comprehensive support for FHIR (Fast Healthcare Interoperability Resources) datasets in PyHealth, including a generic, YAML-configurable FHIR ingest engine, a pre-configured MIMIC-IV-on-FHIR dataset, and a full clinical prediction pipeline using new tasks and models. The documentation is significantly expanded to cover these new features, and an end-to-end example is provided for users. Key changes are grouped below:

FHIR Dataset Support and Documentation:

Added FHIRDataset, a generic, YAML-configurable dataset for ingesting HL7 FHIR NDJSON exports, along with detailed documentation and usage instructions. This engine supports flexible configuration of resource flattening and event schema via YAML, with caching and validation. (docs/api/datasets/pyhealth.datasets.FHIRDataset.rst, pyhealth/datasets/fhir/__init__.py)
Introduced MIMIC4FHIR, a subclass of FHIRDataset pre-configured for the PhysioNet MIMIC-IV-on-FHIR export, including documentation and resource coverage details. (docs/api/datasets/pyhealth.datasets.MIMIC4FHIR.rst, lines R1-R78)
Registered FHIRDataset and MIMIC4FHIR in the main datasets API and documentation. (docs/api/datasets.rst, pyhealth/datasets/__init__.py)
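
For readers unfamiliar with config-driven flattening of FHIR NDJSON, the idea can be sketched roughly as follows. This is an illustration only and is not taken from the PR: `FLATTEN_CONFIG`, `get_path`, and `flatten_ndjson` are hypothetical names, the dotted-path syntax is an assumption, and a plain dict stands in for whatever the real YAML schema declares.

```python
import io
import json

# Hypothetical mapping, standing in for what a YAML config might declare:
# resource type -> {output column: dotted path into the FHIR resource}.
FLATTEN_CONFIG = {
    "Condition": {
        "patient_id": "subject.reference",
        "timestamp": "onsetDateTime",
        "code": "code.coding.0.code",
    },
}


def get_path(resource, dotted_path):
    """Walk a dotted path through nested dicts/lists; return None if absent."""
    node = resource
    for key in dotted_path.split("."):
        if isinstance(node, list):
            # Numeric path segments index into lists.
            if not key.isdigit() or int(key) >= len(node):
                return None
            node = node[int(key)]
        elif isinstance(node, dict):
            node = node.get(key)
        else:
            return None
        if node is None:
            return None
    return node


def flatten_ndjson(lines, config):
    """Yield one flat event dict per NDJSON line whose resourceType is configured."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        resource = json.loads(line)
        columns = config.get(resource.get("resourceType"))
        if columns is None:
            continue  # resource type not covered by the config
        row = {"event_type": resource["resourceType"]}
        for column, path in columns.items():
            row[column] = get_path(resource, path)
        yield row


# Two made-up NDJSON lines: one configured resource, one that is skipped.
sample = io.StringIO(
    json.dumps({
        "resourceType": "Condition",
        "subject": {"reference": "Patient/p1"},
        "onsetDateTime": "2020-01-01T00:00:00Z",
        "code": {"coding": [{"system": "example", "code": "I10"}]},
    })
    + "\n"
    + json.dumps({"resourceType": "Patient", "id": "p1"})
    + "\n"
)
events = list(flatten_ndjson(sample, FLATTEN_CONFIG))
```

Keeping the mapping in data rather than code is what would let a different FHIR export be supported by editing configuration alone; how FHIRDataset actually expresses this is described in its own documentation.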

New Task and Model for FHIR-based Clinical Prediction:

Added MPFClinicalPredictionTask, supporting multitask prompted fine-tuning (MPF) style binary clinical prediction on FHIR token timelines, with documentation. (docs/api/tasks/pyhealth.tasks.mpf_clinical_prediction.rst, docs/api/tasks.rst)
Introduced EHRMambaCEHR, a model combining CEHR-style embeddings and Mamba blocks for FHIR token streams, with API documentation and registration. (docs/api/models/pyhealth.models.EHRMambaCEHR.rst, docs/api/models.rst)

Example and Usability Improvements:

Added a runnable example (examples/mimic4fhir_mpf_ehrmamba.py) demonstrating the full pipeline: dataset loading, task setup, model instantiation, training, and evaluation on the MIMIC-IV FHIR demo dataset.

Internal Improvements:

Improved temporary directory cleanup in BaseDataset to tolerate stream-writer finalizers and avoid errors during cache cleanup. (pyhealth/datasets/base_dataset.py, lines L423-R432)
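
As a rough illustration of what tolerant cleanup can look like in general, the sketch below removes a temporary directory without raising if files appear or disappear during the walk. It is not the code in `base_dataset.py`; `cleanup_tmp_dir` is a hypothetical helper, and using `shutil.rmtree(..., ignore_errors=True)` is an assumption about one reasonable approach.

```python
import shutil
import tempfile
from pathlib import Path


def cleanup_tmp_dir(path):
    """Best-effort removal of a temporary directory.

    ignore_errors=True keeps rmtree from raising when, for example, a
    late finalizer creates or deletes a file while the tree is being walked.
    Returns True if the directory is gone afterwards.
    """
    shutil.rmtree(path, ignore_errors=True)
    return not Path(path).exists()


# Create a scratch directory with a leftover partial file, then clean it up.
tmp = Path(tempfile.mkdtemp(prefix="cache_sketch_"))
(tmp / "part-0.tmp").write_text("partial")
removed = cleanup_tmp_dir(tmp)

# Calling it again on the already-removed path is harmless.
removed_again = cleanup_tmp_dir(tmp)
```

The trade-off of a best-effort approach is that genuine failures are silenced too, so returning whether the directory still exists gives callers a way to log leftovers.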

These changes make PyHealth a first-class tool for working with FHIR data, enabling both out-of-the-box use with MIMIC-IV and easy adaptation to other FHIR exports.

jhnwu3 added 2 commits May 30, 2026 19:29

transfer FHIR pipeline to branch (2e9f2f8)

fix (7ce0206)

jhnwu3 requested a review from Logiquo May 31, 2026 03:11

fix unit test using fast json readers (f620aeb)

jhnwu3 wants to merge 3 commits into master from add/fhir_ehr_mamba
