lcmoore/ECE-228-Project

Physics-Grounded Text-to-3D Chest CT Synthesis

ECE 228 (Machine Learning for Physical Applications), UC San Diego.

Text → 2D X-ray projections → 3D CT, grounded in the physics of CT imaging. A fine-tuned Stable Diffusion 3.5 model generates Beer–Lambert attenuation projections of the chest from a radiology report; a generalizable neural field (DIF-Gaussian) solves the corresponding sparse-view tomographic inverse problem to recover a 3D attenuation volume. A differentiable, geometry-consistent forward projector ties the two stages together. Everything runs on Intel Gaudi (HPU).

report  ──[SD3.5 + LoRA]──▶  K Beer–Lambert projections  ──[DIF-Gaussian]──▶  3D η volume
                 (Eq. 3: p = 1 − exp(−μ_eff ∫ η dl))        (sparse-view inverse problem)
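
As an illustration of Eq. 3, here is a minimal sketch of a Beer–Lambert projection. It assumes a parallel beam travelling along one volume axis, which is a simplification: the repository's camera_sweep_projection may use a different geometry. The function name, the μ_eff value, and the voxel size are hypothetical and chosen only for the example.

```python
import math


def beer_lambert_projection(eta, mu_eff, voxel_size):
    """Project a volume eta[z][y][x] along the y axis (parallel beam, assumed).

    Each detector pixel gets p = 1 - exp(-mu_eff * integral of eta dl),
    where the line integral is approximated by a sum over voxels times
    the voxel size.
    """
    projection = []
    for plane in eta:  # one detector row per z-slice
        depth = len(plane)
        width = len(plane[0])
        row = []
        for x in range(width):
            line_integral = sum(plane[y][x] for y in range(depth)) * voxel_size
            row.append(1.0 - math.exp(-mu_eff * line_integral))
        projection.append(row)
    return projection


# Tiny example: one slice, 3 voxels deep, 2 pixels wide.
# The left column is fully attenuating material, the right column is empty.
eta = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]
p = beer_lambert_projection(eta, mu_eff=0.5, voxel_size=1.0)
```

With these inputs the left pixel evaluates to 1 − exp(−1.5) and the right pixel to 0, so empty rays map to 0 and the value saturates towards 1 as attenuation grows.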

See report/final.tex for the full write-up and CLAUDE.md for an architectural deep-dive.

Repository structure

| Path | Contents |
| --- | --- |
| diffusion/SD3_singleview_nifti/ | Stage 1: SD3.5 LoRA fine-tuning to generate CT projections (precompute, train, sample). Beer–Lambert projection in datamodule_nifti.py::camera_sweep_projection. |
| diffusion/SD3_singleview_nifti/joint/ | Differentiable torch forward projectors (projector.py) + smoke tests (smoke/); coupling utilities (iterative recon, TTO). |
| diffusion/SD3_singleview_nifti/recon/ | Lift2D3D learned 2D→3D lifting model (lift2d3d.py) + DDP training (train_lift2d3d.py). |
| scripts/recon/ | Paradigm-comparison harness: fbp.py, iterative.py (training-free), compare.py (4-method figure + metrics), recon_common.py. |
| works/DIF-Gaussian/ | Learned implicit field: vendored DIF-Gaussian, ported to HPU (pytorch3d kNN → pure-torch, device-agnostic, checkpoint fix). |
| scripts/data/ | Data generators: generate_ct_cubes.py (CT-RATE → 256³ η cubes, shardable), verify_ct_cubes.py, generate_difg_dataset.py (Beer–Lambert recon dataset). |
| scripts/visualizer/ | render_difg_recon.py, sd3_sharpness.py: recon + single-vs-multi-view figures. |
| report/ | Final report (final.tex) + figures. |

Environment (Intel Gaudi / HPU)

Base image: vault.habana.ai/.../pytorch-installer-2.9.0-py311. Install the known-good stack (newer versions break on the Habana torch build):

pip install transformers==5.8.0 tokenizers==0.22.2 huggingface_hub==1.14.0 \
            diffusers==0.38.0 accelerate==1.13.0 peft==0.19.1 safetensors \
            SimpleITK nibabel scikit-image scipy easydict wandb

Create a .env (git-ignored) with your gated-model credentials:

export HF_TOKEN=...        # CT-RATE + SD3.5 are gated on HuggingFace
export WANDB_API_KEY=...   # optional, for logging

SD3.5 weights are cached under hf_cache/; run offline with HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1.

Stage 1 — generate projections from text (SD3.5 + LoRA)

set -a; . ./.env; set +a
export HF_HOME=$PWD/hf_cache HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1

# (1) Precompute the projection dataset, then sanity-check it
python -m diffusion.SD3_singleview_nifti.precompute --num-train 1024 --num-valid 128
python -m diffusion.SD3_singleview_nifti.inspect

# (2) Train the LoRA generator (8x HPU)
python -m diffusion.SD3_singleview_nifti.train \
    --config diffusion/SD3_singleview_nifti/config.yaml --override train.devices=8

Stage 2 — reconstruct 3D from projections (DIF-Gaussian, HPU)

# (1) Build a Beer–Lambert DIF-Gaussian dataset (8 train / 4 held-out)
python scripts/data/generate_difg_dataset.py \
    --train-cubes datasets/sd3_singleview_nifti_smoke/volumes/train_*.nii.gz \
    --test-cubes  datasets/sd3_singleview_nifti_smoke/volumes/valid_*.nii.gz \
    --dataset-name difg_multivol_bl12 --num-projections 180

# (2) Train DIF-Gaussian on HPU (8 train volumes, 10 views over 180°)
cd works/DIF-Gaussian/code
export PYTHONPATH=$PWD:$PWD/../../..
python -u train.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
    --cfg_path ../configs/overfit_train_1232.yaml --num_views 10 --epoch 300

# (3) Evaluate on held-out patients (PSNR/SSIM) + save reconstructed NIfTIs
python -u evaluate.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
    --cfg_path ../configs/overfit_train_1232.yaml --epoch 300 --split test --save_results
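
The dataset above is generated with 180 projections, while training uses 10 views. One plausible way to pick a sparse subset is to take evenly spaced indices, sketched below. This is an assumption for illustration only: the helper name is hypothetical, the projections are assumed to be evenly spaced over 180° with the endpoint excluded, and the actual sampling in train.py may differ.

```python
def select_view_indices(num_projections, num_views):
    """Return num_views evenly spaced indices into num_projections views."""
    if not 0 < num_views <= num_projections:
        raise ValueError("num_views must be in 1..num_projections")
    return [(i * num_projections) // num_views for i in range(num_views)]


def index_to_angle_deg(index, num_projections, arc_deg=180.0):
    """Angle of a projection, assuming even spacing over arc_deg degrees."""
    return index * arc_deg / num_projections


indices = select_view_indices(num_projections=180, num_views=10)
angles = [index_to_angle_deg(i, 180) for i in indices]
```

Under these assumptions the 10 selected views are 18° apart, starting at 0° and ending at 162°.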

Render a GT-vs-reconstruction figure (Fig. in the report):

python scripts/visualizer/render_difg_recon.py \
    --gt   datasets/difg_multivol_bl12/images/valid_81_a_1.nii.gz \
    --pred works/DIF-Gaussian/code/logs/multivol_bl12/results/ep_300/predictions_0.5x/valid_81_a_1.nii.gz \
    --out  recon.png

Reconstruction-paradigm comparison on full CT-RATE (report Table 2 / Fig.)

# (0) Stream CT-RATE -> 256^3 eta cubes (shardable across pods; HF_HOME on Ceph)
python -m scripts.data.generate_ct_cubes --output-dir datasets/ct_cubes_full \
    --num-train 8192 --num-valid 256 --num-shards N --shard-index K
python -m scripts.data.verify_ct_cubes --cube-dir datasets/ct_cubes_full/volumes \
    --proj-dir datasets/<precompute>/projections     # bit-exact reprojection guard

# (1) Classical baselines (training-free, HPU/CPU). --selftest gates orientation.
python -m scripts.recon.fbp       --selftest         # blob phantom, dense views > 35 dB
python -m scripts.recon.iterative --selftest         # PT_HPU_LAZY_MODE=1; blob > 30 dB

# (2) Learned 2D->3D lifting (Lift2D3D): build cube cache + train (1 HPU card)
PT_HPU_LAZY_MODE=1 HABANA_VISIBLE_MODULES=0 python -m \
    diffusion.SD3_singleview_nifti.recon.train_lift2d3d \
    --num-train 1024 --num-val 32 --epochs 80 --name lift_full1024

# (3) Learned implicit field (DIF-Gaussian): full-data dataset + 8-HPU DDP retrain
python scripts/data/generate_difg_dataset.py --dataset-name difg_full1024 \
    --train-cubes datasets/ct_cubes_full/volumes/train_*.nii.gz \
    --test-cubes  datasets/ct_cubes_full/volumes/valid_*.nii.gz --num-projections 180
cd works/DIF-Gaussian/code && PT_HPU_LAZY_MODE=1 torchrun --standalone --nproc_per_node=8 \
    train.py --dist --name difg_full1024_ddp --dst_name difg_full1024 \
    --cfg_path logs/multivol_bl12/config.yaml --num_views 10 --epoch 50 --optimizer adam
python evaluate.py --name difg_full1024_ddp --dst_name difg_full1024 --epoch 50 \
    --cfg_path logs/multivol_bl12/config.yaml --split test --out_res_scale 0.5 --save_results

# (4) Unified 4-method comparison -> figure + metrics.csv (report Table 2)
PT_HPU_LAZY_MODE=1 python -m scripts.recon.compare \
    --cubes datasets/ct_cubes_full/volumes/valid_*.nii.gz \
    --methods fbp iterative lift2d3d difg --views 10 --res 64 --source gt \
    --lift-ckpt outputs/recon/lift_full1024/last.pth \
    --difg-recon-dir works/DIF-Gaussian/code/logs/difg_full1024_ddp/results/ep_50/predictions_0.5x
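
Step (0) above splits cube generation across pods with --num-shards N and --shard-index K. A common way to implement such a split is a strided partition, sketched here. The function name and the round-robin scheme are assumptions made for illustration; generate_ct_cubes.py may partition its work differently.

```python
def shard_items(items, num_shards, shard_index):
    """Return the items assigned to one shard under a round-robin split.

    Every item lands in exactly one shard, and shard sizes differ by at
    most one.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in 0..num_shards-1")
    return items[shard_index::num_shards]


# Hypothetical example: 10 volume ids split across 3 pods.
volume_ids = list(range(10))
shards = [shard_items(volume_ids, 3, k) for k in range(3)]
```

Because the split depends only on the list order and the two integers, each pod can compute its own shard without coordinating with the others, provided all pods enumerate the volumes in the same order.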

Headline (held-out, K=10, unified 64³ η-window): DIF-Gaussian 24.7 dB > iterative-TV 22.7 dB > Lift2D3D 19.3 dB > FBP 17.5 dB. The earlier "per-scene iterative ≈12 dB" figure was an artifact of a forward-operator alignment bug, which is fixed here.
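
For reference, the dB values above are PSNR figures. The sketch below shows the standard PSNR definition on flattened voxel lists. The function name and the data_range default are assumptions; the exact window and normalization used by scripts/recon/compare.py are not reproduced here.

```python
import math


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    PSNR = 10 * log10(data_range**2 / MSE). data_range is the span of valid
    intensities (assumed to be 1.0 for a normalized volume).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical volumes
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical example: a constant error of 0.1 on a unit-range volume.
score = psnr([0.1, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0])
```

A constant error of 0.1 on a unit range gives an MSE of 0.01 and therefore about 20 dB. Since PSNR is logarithmic, a 2 dB gap such as the one between DIF-Gaussian and iterative-TV corresponds to roughly a 37% lower MSE.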

DIF-Gaussian capacity/generalization ablations (report Table 3)

| Result | Command |
| --- | --- |
| Single-volume overfit (26.8 dB) | train.py --dst_name difg_<case> --num_views 10 (gs_res 12) |
| Capacity: gs_res 24 / 20 views | configs/overfit_v3.yaml / --num_views 20 |
| + Fourier PE + Adam (SSIM 0.70) | configs/overfit_v4_pe.yaml --optimizer adam |
| Held-out, 8-volume train (24.5 dB) | Stage-2 (1)–(3) above |
| Held-out, full CT-RATE (27.7 dB) | paradigm-comparison (3) above |

Validating the differentiable forward projector

python -m diffusion.SD3_singleview_nifti.joint.smoke.test_projector       # matches camera_sweep_projection
python -m diffusion.SD3_singleview_nifti.joint.smoke.test_difg_forward     # matches the DIF-G dataset projections
python works/DIF-Gaussian/code/tests/test_model_hpu.py                     # full DIF-G fwd/bwd on HPU

Notes

  • Data, weights, and HF cache live on persistent Ceph (/Data/... inside the pod).
  • The HPU-portability gotchas (pin transformers==5.8.0, num_workers=0 for DDP) are documented in CLAUDE.md.
