Etai Sella*1, Hao Phung*2, Nitay Amiel3, Or Litany3, Or Patashnik1, Hadar Averbuch-Elor2
1 Tel Aviv University 2 Cornell University 3 Technion - Israel Institute of Technology
This is the official PyTorch implementation of Prox-E.
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
Clone the repo and initialize submodules if your checkout stores prox_e/submodules as git submodules:
git clone https://github.com/etaisella/Prox-E.git
cd Prox-E
git submodule update --init --recursive
Create the environment:
bash scripts/setup_environment.sh
conda activate prox-e
The setup script creates a Python 3.11 conda environment, installs PyTorch and the remaining Python dependencies, installs the two source-built rasterization packages, and downloads SuperDec checkpoints if they are missing.
Download the TRELLIS-image-large model, and add the following entries to TRELLIS-image-large/pipeline.json:
{
"sparse_structure_encoder": "ckpts/ss_enc_conv3d_16l8_fp16",
"slat_encoder": "ckpts/slat_enc_swin8_B_64l8_fp16"
}
Download the TRELLIS-text-large model, and add the following entries to TRELLIS-text-large/pipeline.json:
{
"sparse_structure_encoder": "path/to/TRELLIS-image-large/ckpts/ss_enc_conv3d_16l8_fp16",
"sparse_structure_decoder": "path/to/TRELLIS-image-large/ckpts/ss_dec_conv3d_16l8_fp16",
"slat_encoder": "path/to/TRELLIS-image-large/ckpts/slat_enc_swin8_B_64l8_fp16",
"slat_decoder_gs": "path/to/TRELLIS-image-large/ckpts/slat_dec_gs_swin8_B_64l8gs32_fp16",
"slat_decoder_rf": "path/to/TRELLIS-image-large/ckpts/slat_dec_rf_swin8_B_64l8r16_fp16",
"slat_decoder_mesh": "path/to/TRELLIS-image-large/ckpts/slat_dec_mesh_swin8_B_64l8m256c_fp16"
}
The code expects Blender for utility renders. It uses BLENDER_PATH if set, otherwise blender on PATH. On a fresh machine, install Blender and make sure blender is on PATH; otherwise set:
export BLENDER_PATH=/path/to/blender
By default, Prox-E uses Gemini as the VLM backbone in the proxy editing and prompt parsing stages. Set your Google API key before running the default pipeline:
export GOOGLE_API_KEY=<your-key>
Prox-E also supports GPT as the VLM backend. To use it, set your OpenAI API key and pass --vlm gpt:
export OPENAI_API_KEY=<your-key>
python inference.py ... --vlm gpt
Qwen is also supported for local prompt parsing and local VLM proxy editing. The VLM stage loads a local Qwen/Qwen3-VL-<size>-Instruct checkpoint, so no Qwen API key is required for --vlm qwen; choose the checkpoint size with --qwen_model_size if needed:
python inference.py ... --vlm qwen --qwen_model_size 4B
NOTE: In our testing, the local Qwen models significantly underperformed in proxy editing compared to the high-end GPT and Gemini models.
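Before launching a run, it can help to check the external requirements described above in one place. The following is a minimal sketch, not part of the Prox-E codebase: the helper names `resolve_blender` and `missing_requirements` are hypothetical, and the backend name "gemini" is an assumption (the README only shows the values gpt and qwen for --vlm). It mirrors the documented lookup order for Blender (BLENDER_PATH first, then blender on PATH) and the API key needed by each backend.

```python
import os
import shutil

# API key expected by each VLM backend, following the setup notes above.
# Assumption: "gemini" is used here only as a label for the default backend.
REQUIRED_KEY = {
    "gemini": "GOOGLE_API_KEY",
    "gpt": "OPENAI_API_KEY",
    "qwen": None,  # local checkpoint, no API key required
}


def resolve_blender(env=None):
    """Return BLENDER_PATH if set, else `blender` found on PATH, else None."""
    env = os.environ if env is None else env
    explicit = env.get("BLENDER_PATH")
    if explicit:
        return explicit
    return shutil.which("blender", path=env.get("PATH"))


def missing_requirements(vlm, env=None):
    """List human-readable problems for the chosen backend (empty if none)."""
    env = os.environ if env is None else env
    if vlm not in REQUIRED_KEY:
        raise ValueError(f"unknown VLM backend: {vlm!r}")
    problems = []
    if resolve_blender(env) is None:
        problems.append("Blender not found: set BLENDER_PATH or put blender on PATH")
    key = REQUIRED_KEY[vlm]
    if key and not env.get(key):
        problems.append(f"{key} is not set (needed for --vlm {vlm})")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements("qwen"):
        print(problem)
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real shell environment.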
The SuperDec checkpoints are expected under prox_e/submodules/superdec/checkpoints/normalized/. If they are missing, run:
cd prox_e/submodules/superdec
bash scripts/download_checkpoints.sh
cd ../../..
We include a demo edit example for each dataset used in our work:
ShapeNet:
python inference.py \
--input_mesh demo/shapenet/chair/model_normalized.obj \
--category chair \
--edit_instruction "make the chair 1.5 times wider"
Edit3D-Bench:
python inference.py \
--input_mesh demo/edit3dbench/elephant/model.glb \
--category elephant \
--edit_instruction "make the elephant wear a red hat" \
--orientation_index 15
Toys4K:
python inference.py \
--input_mesh demo/toys4k/sheep/model.glb \
--category sheep \
--edit_instruction "turn the sheep's head 30 degrees to the right"
Final results are saved in the outputs/ folder.
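To run the three demo edits in one go, the commands above can be scripted. This is a small sketch under the assumption that it is executed from the repository root with the prox-e environment active; the `DEMOS` list and the `build_command` and `run_all` helpers are hypothetical names, and only the flags shown in the demo commands are used.

```python
import shlex
import subprocess
import sys

# The three demo edits listed above, one dictionary per inference.py call.
DEMOS = [
    {
        "input_mesh": "demo/shapenet/chair/model_normalized.obj",
        "category": "chair",
        "edit_instruction": "make the chair 1.5 times wider",
    },
    {
        "input_mesh": "demo/edit3dbench/elephant/model.glb",
        "category": "elephant",
        "edit_instruction": "make the elephant wear a red hat",
        "orientation_index": 15,
    },
    {
        "input_mesh": "demo/toys4k/sheep/model.glb",
        "category": "sheep",
        "edit_instruction": "turn the sheep's head 30 degrees to the right",
    },
]


def build_command(demo, python=sys.executable):
    """Turn one demo dictionary into an argument list for inference.py."""
    cmd = [python, "inference.py"]
    for flag, value in demo.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for demo in DEMOS:
        cmd = build_command(demo)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the paths before committing GPU time.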
For a custom mesh, set --input_mesh to the mesh file, --category to the object class, and --edit_instruction to the requested edit:
python inference.py \
--input_mesh /path/to/model.glb \
--category lamp \
--edit_instruction "make the lamp shade wider"
If the mesh orientation is wrong, render all supported input orientations:
python scripts/orientation_sweep.py --input_mesh /path/to/model.glb
Open the generated orientation_sweep_overview.png, pick the best index, then rerun inference with it:
python inference.py \
--input_mesh /path/to/model.glb \
--category lamp \
--edit_instruction "make the lamp shade wider" \
--orientation_index 12
If you change the orientation for a mesh you already processed, use a fresh --output_folder so cached abstractions are not reused.
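One way to follow the advice about a fresh --output_folder is to derive the folder name from the orientation index, so each orientation gets its own cache. The sketch below only assembles the command; the helper name `orientation_rerun_command` and the `<mesh stem>_orient<index>` naming scheme are illustrative assumptions, not conventions of the repository.

```python
from pathlib import Path


def orientation_rerun_command(input_mesh, category, edit_instruction,
                              orientation_index, output_root="outputs"):
    """Build an inference.py call that writes to a per-orientation folder."""
    # Hypothetical naming scheme: one folder per (mesh, orientation) pair,
    # so abstractions cached for another orientation are never picked up.
    stem = Path(input_mesh).stem
    output_folder = Path(output_root) / f"{stem}_orient{orientation_index}"
    return [
        "python", "inference.py",
        "--input_mesh", str(input_mesh),
        "--category", category,
        "--edit_instruction", edit_instruction,
        "--orientation_index", str(orientation_index),
        "--output_folder", output_folder.as_posix(),
    ]


if __name__ == "__main__":
    print(orientation_rerun_command(
        "/path/to/model.glb", "lamp", "make the lamp shade wider", 12))
```

Because the folder name changes with the index, rerunning with a different orientation never points at results cached for a previous one.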
If you find our work useful in your research, please consider citing:
@misc{sella2026proxefinegrained3dshape,
title={Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions},
author={Etai Sella and Hao Phung and Nitay Amiel and Or Litany and Or Patashnik and Hadar Averbuch-Elor},
year={2026},
eprint={2604.23774},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2604.23774},
}
This code builds upon the VoxHammer, SuperDec, and TRELLIS repositories. We thank their creators for their great work.
