ECE 228 (Machine Learning for Physical Applications), UC San Diego.
Text → 2D X-ray projections → 3D CT, grounded in the physics of CT imaging. A fine-tuned Stable Diffusion 3.5 model generates Beer–Lambert attenuation projections of the chest from a radiology report; a generalizable neural field (DIF-Gaussian) solves the corresponding sparse-view tomographic inverse problem to recover a 3D attenuation volume. A differentiable, geometry-consistent forward projector ties the two stages together. Everything runs on Intel Gaudi (HPU).
report ──[SD3.5 + LoRA]──▶ K Beer–Lambert projections ──[DIF-Gaussian]──▶ 3D η volume
(Eq. 3: p = 1 − exp(−μ_eff ∫ η dl)) (sparse-view inverse problem)
See report/final.tex for the full write-up and CLAUDE.md for an architectural
deep-dive.
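
To make Eq. 3 concrete, here is a minimal NumPy sketch of a Beer–Lambert projection and its log-linearized inverse. It assumes a parallel-beam geometry along one volume axis and a Riemann-sum line integral; the repository's `camera_sweep_projection` is not reproduced here and may use a different geometry. The function names, `voxel_size`, and the toy cube are illustrative assumptions.

```python
import numpy as np

def beer_lambert_projection(eta, mu_eff, voxel_size=1.0, axis=0):
    """Parallel-beam sketch of p = 1 - exp(-mu_eff * integral(eta dl))."""
    # Assumption: approximate the line integral by a sum along one axis.
    path_integral = eta.sum(axis=axis) * voxel_size
    return 1.0 - np.exp(-mu_eff * path_integral)

def line_integral_from_projection(p, mu_eff, eps=1e-6):
    """Invert Eq. 3 for the line integral: -log(1 - p) / mu_eff."""
    p = np.clip(p, 0.0, 1.0 - eps)  # keep the logarithm finite
    return -np.log1p(-p) / mu_eff

eta = np.zeros((32, 32, 32))
eta[8:24, 8:24, 8:24] = 0.5  # toy attenuating cube (hypothetical data)
p = beer_lambert_projection(eta, mu_eff=0.1)
recovered = line_integral_from_projection(p, mu_eff=0.1)
```

Because the projection saturates toward 1 for long, dense paths, the inverse is ill-conditioned there; the `eps` clip is one simple guard, not necessarily what the project uses.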
| Path | Contents |
|---|---|
| `diffusion/SD3_singleview_nifti/` | Stage 1: SD3.5 LoRA fine-tuning to generate CT projections (precompute, train, sample). Beer–Lambert projection in `datamodule_nifti.py::camera_sweep_projection`. |
| `diffusion/SD3_singleview_nifti/joint/` | Differentiable torch forward projectors (`projector.py`) + smoke tests (`smoke/`); coupling utilities (iterative recon, TTO). |
| `diffusion/SD3_singleview_nifti/recon/` | Lift2D3D learned 2D→3D lifting model (`lift2d3d.py`) + DDP training (`train_lift2d3d.py`). |
| `scripts/recon/` | Paradigm-comparison harness: `fbp.py`, `iterative.py` (training-free), `compare.py` (4-method figure + metrics), `recon_common.py`. |
| `works/DIF-Gaussian/` | Learned implicit field: vendored DIF-Gaussian, ported to HPU (pytorch3d kNN → pure-torch, device-agnostic, checkpoint fix). |
| `scripts/data/` | Data generators: `generate_ct_cubes.py` (CT-RATE → 256³ η cubes, shardable), `verify_ct_cubes.py`, `generate_difg_dataset.py` (Beer–Lambert recon dataset). |
| `scripts/visualizer/` | `render_difg_recon.py`, `sd3_sharpness.py`: recon + single-vs-multi-view figures. |
| `report/` | Final report (`final.tex`) + figures. |
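
As a rough illustration of what a training-free iterative reconstruction does, the sketch below runs Landweber iterations on a toy problem. It is not the code in `scripts/recon/iterative.py`. It assumes measurements that are already line integrals (Eq. 3 after log-linearization), only three axis-aligned parallel-beam views, and no TV regularization; all function names are hypothetical.

```python
import numpy as np

def project(vol):
    # Toy forward operator A: line integrals along each of the three axes.
    return [vol.sum(axis=a) for a in range(3)]

def backproject(projs, shape):
    # Adjoint A^T: smear each 2D view back along its projection axis.
    out = np.zeros(shape)
    for a, view in enumerate(projs):
        out += np.expand_dims(view, axis=a)
    return out

def landweber(measured, shape, num_iters=50):
    # For this operator ||A||^2 = sum(shape), so this step size is stable.
    step = 1.0 / sum(shape)
    vol = np.zeros(shape)
    for _ in range(num_iters):
        residual = [m - q for m, q in zip(measured, project(vol))]
        vol += step * backproject(residual, shape)
    return vol

rng = np.random.default_rng(0)
truth = rng.random((16, 16, 16))  # hypothetical ground-truth volume
measured = project(truth)
recon = landweber(measured, truth.shape)
max_residual = max(np.abs(m - q).max() for m, q in zip(measured, project(recon)))
```

With so few views the problem is heavily underdetermined: the iterate matches the measurements but converges to the minimum-norm volume rather than `truth`, which is why sparse-view methods add priors such as TV or a learned field.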
Base image: vault.habana.ai/.../pytorch-installer-2.9.0-py311. Install the
known-good stack (newer versions break on the Habana torch build):
pip install transformers==5.8.0 tokenizers==0.22.2 huggingface_hub==1.14.0 \
diffusers==0.38.0 accelerate==1.13.0 peft==0.19.1 safetensors \
SimpleITK nibabel scikit-image scipy easydict wandb

Create a `.env` (git-ignored) with your gated-model credentials:

export HF_TOKEN=... # CT-RATE + SD3.5 are gated on HuggingFace
export WANDB_API_KEY=... # optional, for logging

SD3.5 weights are cached under `hf_cache/`; run offline with
`HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1`.
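
When launching from Python (a notebook, say) rather than a shell, the same settings can be assembled in-process. The sketch below is an assumption-laden convenience, not part of the repository: `parse_env_text` and `offline_env` are hypothetical helpers that handle only simple `export KEY=value # comment` lines.

```python
import os

def parse_env_text(text):
    """Parse simple `export KEY=value # comment` lines into a dict."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and full-line comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value assignment
        # Drop a trailing inline comment and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        env[key.strip()] = value
    return env

def offline_env(cache_dir):
    """Variables that point Hugging Face libraries at a local cache, offline."""
    return {"HF_HOME": cache_dir, "HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}

sample = "export HF_TOKEN=hf_dummy # gated\nexport WANDB_API_KEY=abc123\n"
env = parse_env_text(sample)
env.update(offline_env(os.path.join(os.getcwd(), "hf_cache")))
# os.environ.update(env) would apply them; do this before importing transformers.
```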
set -a; . ./.env; set +a
export HF_HOME=$PWD/hf_cache HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1
# (1) Precompute the projection dataset, then sanity-check it
python -m diffusion.SD3_singleview_nifti.precompute --num-train 1024 --num-valid 128
python -m diffusion.SD3_singleview_nifti.inspect
# (2) Train the LoRA generator (8x HPU)
python -m diffusion.SD3_singleview_nifti.train \
--config diffusion/SD3_singleview_nifti/config.yaml --override train.devices=8

# (1) Build a Beer–Lambert DIF-Gaussian dataset (8 train / 4 held-out)
python scripts/data/generate_difg_dataset.py \
--train-cubes datasets/sd3_singleview_nifti_smoke/volumes/train_*.nii.gz \
--test-cubes datasets/sd3_singleview_nifti_smoke/volumes/valid_*.nii.gz \
--dataset-name difg_multivol_bl12 --num-projections 180
# (2) Train DIF-Gaussian on HPU (8 train volumes, 10 views over 180°)
cd works/DIF-Gaussian/code
export PYTHONPATH=$PWD:$PWD/../../..
python -u train.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
--cfg_path ../configs/overfit_train_1232.yaml --num_views 10 --epoch 300
# (3) Evaluate on held-out patients (PSNR/SSIM) + save reconstructed NIfTIs
python -u evaluate.py --name multivol_bl12 --dst_name difg_multivol_bl12 \
--cfg_path ../configs/overfit_train_1232.yaml --epoch 300 --split test --save_results

Render a GT-vs-reconstruction figure (Fig. in the report):
python scripts/visualizer/render_difg_recon.py \
--gt datasets/difg_multivol_bl12/images/valid_81_a_1.nii.gz \
--pred works/DIF-Gaussian/code/logs/multivol_bl12/results/ep_300/predictions_0.5x/valid_81_a_1.nii.gz \
--out recon.png

# (0) Stream CT-RATE -> 256^3 eta cubes (shardable across pods; HF_HOME on Ceph)
python -m scripts.data.generate_ct_cubes --output-dir datasets/ct_cubes_full \
--num-train 8192 --num-valid 256 --num-shards N --shard-index K
python -m scripts.data.verify_ct_cubes --cube-dir datasets/ct_cubes_full/volumes \
--proj-dir datasets/<precompute>/projections # bit-exact reprojection guard
# (1) Classical baselines (training-free, HPU/CPU). --selftest gates orientation.
python -m scripts.recon.fbp --selftest # blob phantom, dense views > 35 dB
python -m scripts.recon.iterative --selftest # PT_HPU_LAZY_MODE=1; blob > 30 dB
# (2) Learned 2D->3D lifting (Lift2D3D): build cube cache + train (1 HPU card)
PT_HPU_LAZY_MODE=1 HABANA_VISIBLE_MODULES=0 python -m \
diffusion.SD3_singleview_nifti.recon.train_lift2d3d \
--num-train 1024 --num-val 32 --epochs 80 --name lift_full1024
# (3) Learned implicit field (DIF-Gaussian): full-data dataset + 8-HPU DDP retrain
python scripts/data/generate_difg_dataset.py --dataset-name difg_full1024 \
--train-cubes datasets/ct_cubes_full/volumes/train_*.nii.gz \
--test-cubes datasets/ct_cubes_full/volumes/valid_*.nii.gz --num-projections 180
cd works/DIF-Gaussian/code && PT_HPU_LAZY_MODE=1 torchrun --standalone --nproc_per_node=8 \
train.py --dist --name difg_full1024_ddp --dst_name difg_full1024 \
--cfg_path logs/multivol_bl12/config.yaml --num_views 10 --epoch 50 --optimizer adam
python evaluate.py --name difg_full1024_ddp --dst_name difg_full1024 --epoch 50 \
--cfg_path logs/multivol_bl12/config.yaml --split test --out_res_scale 0.5 --save_results
# (4) Unified 4-method comparison -> figure + metrics.csv (report Table 2)
PT_HPU_LAZY_MODE=1 python -m scripts.recon.compare \
--cubes datasets/ct_cubes_full/volumes/valid_*.nii.gz \
--methods fbp iterative lift2d3d difg --views 10 --res 64 --source gt \
--lift-ckpt outputs/recon/lift_full1024/last.pth \
--difg-recon-dir works/DIF-Gaussian/code/logs/difg_full1024_ddp/results/ep_50/predictions_0.5x

Headline (held-out, K=10, unified 64³ η-window): DIF-Gaussian 24.7 dB > iterative-TV 22.7 dB > Lift2D3D 19.3 dB > FBP 17.5 dB. The earlier "per-scene iterative ≈12 dB" figure was caused by a forward-operator alignment bug, which is fixed here.
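
For readers comparing the dB figures, here is a minimal PSNR sketch. It assumes volumes already normalized to a shared window with `data_range=1.0`; the exact windowing and data range used by `compare.py` and `evaluate.py` are not shown in this README, so treat those as assumptions.

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(data_range ** 2 / mse)

target = np.zeros((8, 8, 8))
pred = np.full((8, 8, 8), 0.1)  # uniform error of 0.1 gives MSE = 0.01
value = psnr(pred, target)      # 20 dB at data_range = 1.0
```

Since PSNR is logarithmic in MSE, a 2.0 dB gap at the same data range (for example 24.7 vs 22.7) corresponds to an MSE ratio of about 10^0.2 ≈ 1.58.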
| Result | Command |
|---|---|
| Single-volume overfit (26.8 dB) | train.py --dst_name difg_<case> --num_views 10 (gs_res 12) |
| Capacity: gs_res 24 / 20 views | configs/overfit_v3.yaml / --num_views 20 |
| + Fourier PE + Adam (SSIM 0.70) | configs/overfit_v4_pe.yaml --optimizer adam |
| Held-out, 8-volume train (24.5 dB) | Stage-2 (1)–(3) above |
| Held-out, full CT-RATE (27.7 dB) | paradigm-comparison (3) above |
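
The rows above vary `--num_views` (10 vs 20) against datasets generated with `--num-projections 180`. One plausible way to pick a sparse subset is evenly spaced indices over the sweep, sketched below. This spacing rule and the one-projection-per-degree angle grid are assumptions for illustration; the sampling actually used by DIF-Gaussian's data loader may differ. `select_views` is a hypothetical helper.

```python
import numpy as np

def select_views(num_projections, num_views):
    """Evenly spaced projection indices (assumed sampling rule)."""
    idx = np.linspace(0, num_projections, num_views, endpoint=False)
    return np.floor(idx).astype(int)

# Assumption: 180 projections spread uniformly over a 180-degree sweep.
angles_deg = np.linspace(0.0, 180.0, 180, endpoint=False)

idx10 = select_views(180, 10)  # every 18th projection
idx20 = select_views(180, 20)  # every 9th projection
sparse_angles = angles_deg[idx10]
```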
python -m diffusion.SD3_singleview_nifti.joint.smoke.test_projector # matches camera_sweep_projection
python -m diffusion.SD3_singleview_nifti.joint.smoke.test_difg_forward # matches the DIF-G dataset projections
python works/DIF-Gaussian/code/tests/test_model_hpu.py # full DIF-G fwd/bwd on HPU

- Data, weights, and HF cache live on persistent Ceph (`/Data/...` inside the pod).
- The HPU-portability gotchas (pin `transformers==5.8.0`, `num_workers=0` for DDP) are documented in `CLAUDE.md`.