Physics-Grounded Text-to-3D Chest CT Synthesis

ECE 228 (Machine Learning for Physical Applications), UC San Diego.

Text → 2D X-ray projections → 3D CT, grounded in the physics of CT imaging. A fine-tuned Stable Diffusion 3.5 model generates Beer–Lambert attenuation projections of the chest from a radiology report; a generalizable neural field (DIF-Gaussian) solves the corresponding sparse-view tomographic inverse problem to recover a 3D attenuation volume. A differentiable, geometry-consistent forward projector ties the two stages together. Everything runs on Intel Gaudi (HPU).

report  ──[SD3.5 + LoRA]──▶  K Beer–Lambert projections  ──[DIF-Gaussian]──▶  3D η volume
                 (Eq. 3: p = 1 − exp(−μ_eff ∫ η dl))        (sparse-view inverse problem)
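
The forward model in the diagram can be illustrated with a minimal NumPy sketch. It assumes a parallel-beam geometry with rays along one volume axis, a uniform voxel size, and a hypothetical `mu_eff` value. The function name `beer_lambert_projection` is illustrative only; the repository's actual projector (`camera_sweep_projection`) handles the real camera geometry and may differ.

```python
import numpy as np

def beer_lambert_projection(eta, mu_eff=0.02, voxel_size=1.0, axis=0):
    """Parallel-beam sketch of p = 1 - exp(-mu_eff * integral(eta dl)).

    eta        : 3D array of non-negative attenuation values
    mu_eff     : effective attenuation scale (hypothetical value)
    voxel_size : step length dl along the ray (assumed uniform)
    axis       : volume axis the rays travel along
    """
    # Riemann-sum approximation of the line integral along each ray.
    line_integral = eta.sum(axis=axis) * voxel_size
    return 1.0 - np.exp(-mu_eff * line_integral)

if __name__ == "__main__":
    eta = np.zeros((32, 32, 32))
    eta[8:24, 8:24, 8:24] = 1.0  # uniform cube phantom
    p = beer_lambert_projection(eta)
    print(p.shape, float(p.min()), float(p.max()))
```

Rays that miss the phantom give p = 0, and p approaches 1 as the accumulated attenuation grows, so projections stay in [0, 1).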

See report/final.tex for the full write-up and CLAUDE.md for an architectural deep-dive.

Repository structure

Path	Contents
`diffusion/SD3_singleview_nifti/`	Stage 1: SD3.5 LoRA fine-tuning to generate CT projections (precompute, train, sample). Beer–Lambert projection in `datamodule_nifti.py::camera_sweep_projection`.
`diffusion/SD3_singleview_nifti/joint/`	Differentiable torch forward projectors (`projector.py`) + smoke tests (`smoke/`); coupling utilities (iterative recon, TTO).
`diffusion/SD3_singleview_nifti/recon/`	Lift2D3D learned 2D→3D lifting model (`lift2d3d.py`) + DDP training (`train_lift2d3d.py`).
`scripts/recon/`	Paradigm-comparison harness: `fbp.py`, `iterative.py` (training-free), `compare.py` (4-method figure + metrics), `recon_common.py`.
`works/DIF-Gaussian/`	Learned implicit field: vendored DIF-Gaussian, ported to HPU (pytorch3d kNN → pure-torch, device-agnostic, checkpoint fix).
`scripts/data/`	Data generators: `generate_ct_cubes.py` (CT-RATE → 256³ η cubes, shardable), `verify_ct_cubes.py`, `generate_difg_dataset.py` (Beer–Lambert recon dataset).
`scripts/visualizer/`	`render_difg_recon.py`, `sd3_sharpness.py` — recon + single-vs-multi-view figures.
`report/`	Final report (`final.tex`) + figures.

Environment (Intel Gaudi / HPU)

Base image: vault.habana.ai/.../pytorch-installer-2.9.0-py311. Install the known-good stack (newer versions break on the Habana torch build):

pip install transformers==5.8.0 tokenizers==0.22.2 huggingface_hub==1.14.0 \
            diffusers==0.38.0 accelerate==1.13.0 peft==0.19.1 safetensors \
            SimpleITK nibabel scikit-image scipy easydict wandb

Create a .env (git-ignored) with your gated-model credentials:

export HF_TOKEN=...        # CT-RATE + SD3.5 are gated on HuggingFace
export WANDB_API_KEY=...   # optional, for logging

SD3.5 weights are cached under hf_cache/; run offline with HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1.

Stage 1 — generate projections from text (SD3.5 + LoRA)

set -a; . ./.env; set +a
export HF_HOME=$PWD/hf_cache HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1

# (1) Precompute the projection dataset, then sanity-check it
python -m diffusion.SD3_singleview_nifti.precompute --num-train 1024 --num-valid 128
python -m diffusion.SD3_singleview_nifti.inspect

# (2) Train the LoRA generator (8x HPU)
python -m diffusion.SD3_singleview_nifti.train \
    --config diffusion/SD3_singleview_nifti/config.yaml --override train.devices=8

Stage 2 — reconstruct 3D from projections (DIF-Gaussian, HPU)

# (1) Build a Beer–Lambert DIF-Gaussian dataset (8 train / 4 held-out)
python scripts/data/generate_difg_dataset.py \
    --train-cubes datasets/sd3_singleview_nifti_smoke/volumes/train_*.nii.gz \
    --test-cubes  datasets/sd3_singleview_nifti_smoke/volumes/valid_*.nii.gz \
    --dataset-name difg_multivol_bl12 --num-projections 180

# (2) Train DIF-Gaussian on HPU (8 train volumes, 10 views over 180°)
cd works/DIF-Gaussian/code
export PYTHONPATH=$PWD:$PWD/../../..
python -u train.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
    --cfg_path ../configs/overfit_train_1232.yaml --num_views 10 --epoch 300

# (3) Evaluate on held-out patients (PSNR/SSIM) + save reconstructed NIfTIs
python -u evaluate.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
    --cfg_path ../configs/overfit_train_1232.yaml --epoch 300 --split test --save_results
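
Step (1) renders 180 projections per volume, while step (2) trains on 10 views. One way to pick an evenly spaced sparse subset is sketched below. It assumes the projections are stored in angular order over a 180° arc, and `select_sparse_views` is a hypothetical helper, not the DIF-Gaussian loader's actual sampling logic.

```python
import numpy as np

def select_sparse_views(num_projections=180, num_views=10, arc_deg=180.0):
    """Return (indices, angles_deg) of evenly spaced views.

    Assumes projection i was acquired at angle i * arc_deg / num_projections.
    """
    if not 0 < num_views <= num_projections:
        raise ValueError("num_views must be in [1, num_projections]")
    # Integer arithmetic keeps the spacing exact; the last index stays
    # strictly below num_projections, so the arc endpoint is never selected.
    idx = (np.arange(num_views) * num_projections) // num_views
    angles = idx * (arc_deg / num_projections)
    return idx, angles

if __name__ == "__main__":
    idx, angles = select_sparse_views()
    print(idx.tolist())
    print(angles.tolist())
```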

Render a ground-truth vs. reconstruction figure (used as a figure in the report):

python scripts/visualizer/render_difg_recon.py \
    --gt   datasets/difg_multivol_bl12/images/valid_81_a_1.nii.gz \
    --pred works/DIF-Gaussian/code/logs/multivol_bl12/results/ep_300/predictions_0.5x/valid_81_a_1.nii.gz \
    --out  recon.png

Reconstruction-paradigm comparison on full CT-RATE (report Table 2 / Fig.)

# (0) Stream CT-RATE -> 256^3 eta cubes (shardable across pods; HF_HOME on Ceph)
python -m scripts.data.generate_ct_cubes --output-dir datasets/ct_cubes_full \
    --num-train 8192 --num-valid 256 --num-shards N --shard-index K
python -m scripts.data.verify_ct_cubes --cube-dir datasets/ct_cubes_full/volumes \
    --proj-dir datasets/<precompute>/projections     # bit-exact reprojection guard

# (1) Classical baselines (training-free, HPU/CPU). --selftest gates orientation.
python -m scripts.recon.fbp       --selftest         # blob phantom, dense views > 35 dB
python -m scripts.recon.iterative --selftest         # PT_HPU_LAZY_MODE=1; blob > 30 dB

# (2) Learned 2D->3D lifting (Lift2D3D): build cube cache + train (1 HPU card)
PT_HPU_LAZY_MODE=1 HABANA_VISIBLE_MODULES=0 python -m \
    diffusion.SD3_singleview_nifti.recon.train_lift2d3d \
    --num-train 1024 --num-val 32 --epochs 80 --name lift_full1024

# (3) Learned implicit field (DIF-Gaussian): full-data dataset + 8-HPU DDP retrain
python scripts/data/generate_difg_dataset.py --dataset-name difg_full1024 \
    --train-cubes datasets/ct_cubes_full/volumes/train_*.nii.gz \
    --test-cubes  datasets/ct_cubes_full/volumes/valid_*.nii.gz --num-projections 180
cd works/DIF-Gaussian/code && PT_HPU_LAZY_MODE=1 torchrun --standalone --nproc_per_node=8 \
    train.py --dist --name difg_full1024_ddp --dst_name difg_full1024 \
    --cfg_path logs/multivol_bl12/config.yaml --num_views 10 --epoch 50 --optimizer adam
python evaluate.py --name difg_full1024_ddp --dst_name difg_full1024 --epoch 50 \
    --cfg_path logs/multivol_bl12/config.yaml --split test --out_res_scale 0.5 --save_results

# (4) Unified 4-method comparison -> figure + metrics.csv (report Table 2)
PT_HPU_LAZY_MODE=1 python -m scripts.recon.compare \
    --cubes datasets/ct_cubes_full/volumes/valid_*.nii.gz \
    --methods fbp iterative lift2d3d difg --views 10 --res 64 --source gt \
    --lift-ckpt outputs/recon/lift_full1024/last.pth \
    --difg-recon-dir works/DIF-Gaussian/code/logs/difg_full1024_ddp/results/ep_50/predictions_0.5x
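
For readers unfamiliar with training-free iterative reconstruction, here is a toy sketch of the general idea: invert the Beer–Lambert map to recover line integrals, then solve the resulting linear system by projected Landweber (gradient-descent) iterations. The random system matrix `A`, the `mu_eff` value, and the function names are illustrative assumptions. This is not the code in `scripts/recon/iterative.py`, which uses the repository's own projector and may add regularization.

```python
import numpy as np

def linearize(p, mu_eff):
    """Invert p = 1 - exp(-mu_eff * b) to recover line integrals b."""
    p = np.clip(p, 0.0, 1.0 - 1e-7)  # keep the logarithm finite
    return -np.log1p(-p) / mu_eff

def projected_landweber(A, b, num_iters=10000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    # A step of 1 / sigma_max^2 keeps the iteration stable.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = x + step * (A.T @ (b - A @ x))  # gradient step on the data term
        x = np.maximum(x, 0.0)              # attenuation cannot be negative
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.uniform(0.0, 1.0, size=(60, 20))    # stand-in for ray path lengths
    eta_true = rng.uniform(0.0, 1.0, size=20)
    mu_eff = 0.05
    p = 1.0 - np.exp(-mu_eff * (A @ eta_true))  # simulated projections
    eta_hat = projected_landweber(A, linearize(p, mu_eff))
    print("max abs error:", float(np.abs(eta_hat - eta_true).max()))
```

In a sparse-view setting the system is underdetermined, which is why practical solvers add a prior such as total variation; the overdetermined toy system here sidesteps that issue.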

Headline (held-out, K=10, unified 64³ η-window): DIF-Gaussian 24.7 dB > iterative-TV 22.7 dB > Lift2D3D 19.3 dB > FBP 17.5 dB. The earlier "per-scene iterative ≈12 dB" figure was caused by a forward-operator alignment bug, which is fixed here.
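
The dB figures above are presumably PSNR, since the evaluation step reports PSNR/SSIM. As a reminder of how such a number is computed, here is a minimal sketch assuming volumes normalized to a known `data_range` (1.0 here, an assumption). The repository's evaluation code may use a different range, mask, or window.

```python
import numpy as np

def psnr(gt, pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    mse = np.mean((gt - pred) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return float(10.0 * np.log10(data_range ** 2 / mse))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0.0, 1.0, size=(16, 16, 16))
    noisy = gt + rng.normal(0.0, 0.05, size=gt.shape)
    print(f"PSNR: {psnr(gt, noisy):.2f} dB")
```

Under this definition, a 2.0 dB gap corresponds to a mean squared error lower by a factor of about 1.6, provided both numbers use the same data range.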

DIF-Gaussian capacity/generalization ablations (report Table 3)

Result	Command
Single-volume overfit (26.8 dB)	`train.py --dst_name difg_<case> --num_views 10` (gs_res 12)
Capacity: gs_res 24 / 20 views	`configs/overfit_v3.yaml` / `--num_views 20`
+ Fourier PE + Adam (SSIM 0.70)	`configs/overfit_v4_pe.yaml --optimizer adam`
Held-out, 8-volume train (24.5 dB)	Stage-2 (1)–(3) above
Held-out, full CT-RATE (27.7 dB)	paradigm-comparison (3) above

Validating the differentiable forward projector

python -m diffusion.SD3_singleview_nifti.joint.smoke.test_projector       # matches camera_sweep_projection
python -m diffusion.SD3_singleview_nifti.joint.smoke.test_difg_forward     # matches the DIF-G dataset projections
python works/DIF-Gaussian/code/tests/test_model_hpu.py                     # full DIF-G fwd/bwd on HPU
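
The smoke tests above compare the torch projector against reference projections. A complementary kind of check, sketched here in NumPy under a simplified parallel-beam assumption (rays along axis 0, hypothetical `MU_EFF`), verifies that the analytic gradient of a Beer–Lambert projection loss agrees with finite differences. It is an illustration only and is not one of the repository's tests.

```python
import numpy as np

MU_EFF = 0.05  # hypothetical attenuation scale

def project(eta):
    """p = 1 - exp(-MU_EFF * sum of eta along axis 0)."""
    return 1.0 - np.exp(-MU_EFF * eta.sum(axis=0))

def loss(eta, target):
    """Half squared error between the projection and a target image."""
    return 0.5 * np.sum((project(eta) - target) ** 2)

def loss_grad(eta, target):
    """Analytic gradient of loss with respect to every voxel of eta."""
    p = project(eta)
    # Every voxel on a ray shares dp/d(eta) = MU_EFF * (1 - p) for that ray.
    per_ray = (p - target) * MU_EFF * (1.0 - p)
    return np.broadcast_to(per_ray, eta.shape).copy()

def finite_difference_grad(eta, target, eps=1e-6):
    """Central-difference gradient, voxel by voxel (slow; for checks only)."""
    grad = np.zeros_like(eta)
    for idx in np.ndindex(eta.shape):
        bump = np.zeros_like(eta)
        bump[idx] = eps
        grad[idx] = (loss(eta + bump, target) - loss(eta - bump, target)) / (2 * eps)
    return grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eta = rng.uniform(0.0, 1.0, size=(3, 4, 5))
    target = rng.uniform(0.0, 1.0, size=(4, 5))
    diff = np.abs(loss_grad(eta, target) - finite_difference_grad(eta, target))
    print("max gradient mismatch:", float(diff.max()))
```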

Notes

Data, weights, and HF cache live on persistent Ceph (/Data/... inside the pod).
The HPU-portability gotchas (pin transformers==5.8.0, num_workers=0 for DDP) are documented in CLAUDE.md.

