[preprint] [weights] [notebook]
Phoenix is a (latent) flow matching generative model that predicts spatially resolved single-cell gene expression directly from routine H&E-stained histology images. It generalizes across cohorts, donors, organs, and tissues — enabling in silico analysis of tissue organization and treatment response at population scale.
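As a rough illustration of what sampling from a flow matching model involves, the sketch below integrates a velocity field from noise at t = 0 to data at t = 1. It is a generic toy example, not Phoenix's implementation: `euler_sample` and `toy_velocity` are hypothetical names, the hand-written velocity field stands in for the trained network (which in Phoenix is also conditioned on image features), and fixed-step Euler integration stands in for whatever solver the package actually uses.

```python
import numpy as np


def euler_sample(velocity, x0, t_0=0.0, t_1=1.0, n_steps=50):
    """Integrate dx/dt = velocity(x, t) from t_0 to t_1 with fixed Euler steps."""
    x = np.array(x0, dtype=np.float64)
    dt = (t_1 - t_0) / n_steps
    for i in range(n_steps):
        t = t_0 + i * dt
        x = x + dt * velocity(x, t)
    return x


# Toy "expression profile" that the flow should transport the noise to (assumption).
target = np.array([2.0, 0.0, 5.0, 1.0])


def toy_velocity(x, t):
    # Velocity of the straight-line path x_t = (1 - t) * x_0 + t * target.
    # A trained model would predict this quantity instead of computing it.
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal(target.shape)  # starting point at t = 0
sample = euler_sample(toy_velocity, noise)  # end point at t = 1
print("Sample:", sample)
```

With this particular velocity field the last Euler step lands on the target, so the sample matches `target` up to floating-point error regardless of the starting noise.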
You can install Phoenix with the following command:
pip install git+https://github.com/peng-lab/phoenix
To load the 224x224 patches saved in an H5 file, use:
import numpy as np
from torch.utils.data import DataLoader
from torchvision.transforms import v2
from torchvision.transforms import InterpolationMode
from github.datasets.h5py_dataset import H5PYDataset
gene_path = './xenium_human_multi.npy'
gene_list = list(np.load(gene_path))
stats_path = "./stats_table.npz"
statistics = np.load(stats_path)
bicubic = InterpolationMode.BICUBIC
image_transform = v2.Compose(
[
v2.Resize((224, 224), bicubic),
v2.CenterCrop((224, 224)),
v2.ToTensor(),
v2.Normalize(
(0.707223, 0.578729, 0.703617),
(0.211883, 0.230117, 0.177517),
),
]
)
image_path = "./demo_patch.h5"
dataset = H5PYDataset(
image_path=image_path,
transform=image_transform,
)
dataloader = DataLoader(
dataset,
batch_size=128,
shuffle=False,
num_workers=4,
pin_memory=True,
)
print('Length dataset & dataloader:', (len(dataset), len(dataloader)))
To load the model weights hosted on HuggingFace, use:
(We recommend using the model trained on the Nest)
#https://huggingface.co/peng-lab/phoenix/resolve/main/weights/flow/tenx/multi/cell/20x/discrete/flow_model.pth
https://huggingface.co/peng-lab/phoenix/resolve/main/weights/flow/nest/multi/cell/20x/discrete/flow_model.pth
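One way to fetch the recommended checkpoint is sketched below using only the Python standard library. The helper names `local_weights_path` and `download_weights` are hypothetical and not part of Phoenix, and saving the file under its original name in the current directory is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL of the recommended checkpoint, copied from the list above.
WEIGHTS_URL = (
    "https://huggingface.co/peng-lab/phoenix/resolve/main/"
    "weights/flow/nest/multi/cell/20x/discrete/flow_model.pth"
)


def local_weights_path(url: str, root: str = ".") -> Path:
    """Map a weights URL to a local file path (hypothetical helper)."""
    return Path(root) / Path(urlparse(url).path).name


def download_weights(url: str = WEIGHTS_URL, root: str = ".") -> Path:
    """Download the checkpoint if it is not already present and return its path."""
    target = local_weights_path(url, root)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))
    return target


# Uncomment to fetch the checkpoint (requires network access):
# state_path = download_weights()
```

The download is skipped when the file already exists, so the call can stay at the top of a notebook without fetching the weights again on every run.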
To load the vision encoder and flow transformer use
(We recommend using the optimized implementation)
import timm
import torch
from phoenix.models.flow_llama3 import FlowTransformerModel, FlowTransformerConfig
#from phoenix.models.flow_simple import FlowTransformerModel, FlowTransformerConfig
vision_model = timm.create_model(
"vit_giant_patch14_reg4_dinov2",
pretrained=False,
img_size=224,
num_classes=0,
global_pool="token",
init_values=1e-5,
dynamic_img_size=False,
)
flow_model = FlowTransformerModel(
FlowTransformerConfig(
d_genes=1,
d_image=1536,
d_model=512,
d_cross=512,
n_heads=8,
n_layers=8,
qkv_bias=False,
ffn_bias=False,
ffn_mult=4,
attn_drop=0.0,
proj_drop=0.0,
n_classes=0,
cls_drop=0.1,
checkpoint=False,
),
vision_model=vision_model
)
state_path = "./flow_model.pth"  # assumed local path to the downloaded weights
state_dict = torch.load(state_path, map_location='cuda:0')
flow_model.load_state_dict(state_dict, strict=True)
flow_model = flow_model.eval().cuda()
To make a forward pass and check that it works, use:
x = torch.rand(1, 377, 1).cuda()
t = torch.rand(x.shape[0]).cuda()
c = torch.rand(1, 256, 1536).cuda()
output = flow_model(x, t, c)
print("Output:", output.size())
To predict gene expression from histology images, use:
from github.helpers.inference import FlowPipeline
pipeline = FlowPipeline(
model=flow_model,
stats=statistics,
t_0=0.0,
t_1=1.0,
atol=1e-1,
rtol=1e-1,
)
gex_pred, coords_list = pipeline(gene_list, dataloader)
If you found our work useful, please consider citing us:
@article{tran/gindra2026.04.25.720812,
author = {Tran, Manuel and Gindra, Rushin H. and Putze, Philipp and Senbai, Kang and Palla, Giovanni and Kos, Tina and Falcomat{\`a}, Chiara and Wang, Chen and Guo, Ruifeng (Ray) and Boxberg, Melanie and Berclaz, Luc M. and Lindner, Lars H. and Bergmayr, Linda and Kn{\"o}sel, Thomas and Jurmeister, Philipp and Klauschen, Frederick and Homicsko, Krisztian and Gottardo, Raphael and Eckstein, Markus and Matek, Christian and Mock, Andreas and Theis, Fabian J. and Saur, Dieter and Peng, Tingying},
title = {Pan-cancer virtual spatial transcriptomics from routine histology with Phoenix},
year = {2026},
journal = {bioRxiv},
doi = {https://doi.org/10.64898/2026.04.25.720812},
}
