Code for reproducing all experiments in Graph Heat Field Signatures: Multiscale Feature-Structure Alignment for Graph Learning.
```text
code/
├── ghfs/              # Core GHFS encoding
│   ├── encoding.py    # GHFSEncoder class (main entry point)
│   ├── kernels.py     # Graph diffusion & feature kernel computations
│   └── utils.py       # Preprocessing, scale selection
├── datasets/
│   └── loaders.py     # All 10 dataset loaders with correct split protocols
├── models/
│   ├── mlp.py         # MLP baseline
│   └── gnn.py         # GCN, GraphSAGE, GAT, APPNP, GPS
├── pe/
│   ├── lape.py        # Laplacian Positional Encoding
│   └── rwse.py        # Random Walk Structural Encoding
├── experiments/
│   ├── trainer.py     # Training loop with early stopping
│   ├── main.py        # Main experiment runner (all methods × all datasets)
│   └── ablations.py   # Ablation studies (channels, scales, hop radius, kernels)
├── analysis/
│   └── alignment.py   # Alignment spectrum computation & figures
└── scripts/
    ├── run_main_experiments.sh
    ├── run_ablations.sh
    ├── run_analysis.sh
    └── summarize_results.py
```
```bash
# Create environment
conda create -n ghfs python=3.10
conda activate ghfs

# Install PyTorch (adjust CUDA version as needed)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Install PyG and extensions
# (the wheel URL below must match your installed torch and CUDA versions)
pip install torch_geometric
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.0.0+cu118.html

# Install remaining dependencies
pip install -r requirements.txt
```

All commands should be run from the `code/` directory.
```bash
# Full run (on the order of hours on a GPU, days on a CPU)
bash scripts/run_main_experiments.sh

# Single dataset, all methods
bash scripts/run_main_experiments.sh cora

# Single method on a single dataset
bash scripts/run_main_experiments.sh texas mlp+ghfs
```

Results are written to `./results/results.csv` and `./results/results_full.json`.
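For ad-hoc inspection of `results.csv` outside the provided scripts, a minimal sketch of averaging accuracy over runs/splits per (dataset, method) pair is shown below. The column names `dataset`, `method` and `test_acc` are assumptions; the actual schema of `results.csv` is not documented here.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(csv_text: str) -> dict[tuple[str, str], tuple[float, float]]:
    """Return {(dataset, method): (mean, std)} of test accuracy over runs/splits.

    ASSUMPTION: columns are named `dataset`, `method` and `test_acc`;
    the real results.csv schema may differ.
    """
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["dataset"], row["method"])].append(float(row["test_acc"]))
    summary = {}
    for key, accs in groups.items():
        # Sample std needs at least two values; single-split datasets report 0.0.
        std = statistics.stdev(accs) if len(accs) > 1 else 0.0
        summary[key] = (statistics.mean(accs), std)
    return summary


if __name__ == "__main__":
    # Toy input only (not real results); with the real file you would use:
    #   with open("results/results.csv") as f: print(summarize(f.read()))
    demo = "dataset,method,test_acc\nds_a,m1,0.80\nds_a,m1,0.84\nds_b,m2,0.81\n"
    for (ds, m), (mean, std) in sorted(summarize(demo).items()):
        print(f"{ds:8s} {m:6s} {100 * mean:.2f} ± {100 * std:.2f}")
```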
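To summarize the results:

```bash
# Console table
python scripts/summarize_results.py

# Also write LaTeX
python scripts/summarize_results.py --latex
```

To run the ablation studies (channels, scales, hop radius, kernels):

```bash
bash scripts/run_ablations.sh
```

Results are written to `./results/ablations/ablations.json`.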
Run the analysis after the main experiments so that the accuracy-vs-alignment scatter plot is populated:

```bash
bash scripts/run_analysis.sh
```

Figures are saved to `./results/figures/`.
Individual methods can also be run directly through `experiments.main`:

```bash
# MLP baseline
python -m experiments.main --dataset cora --method mlp

# GCN
python -m experiments.main --dataset texas --method gcn

# MLP + GHFS (no message passing, GHFS as fixed encoding)
python -m experiments.main --dataset roman-empire --method mlp+ghfs

# GCN + GHFS
python -m experiments.main --dataset roman-empire --method gcn+ghfs

# GPS + GHFS
python -m experiments.main --dataset cora --method gps+ghfs

# MLP + LapPE (structural encoding baseline)
python -m experiments.main --dataset texas --method mlp+lape

# MLP + RWSE (structural encoding baseline)
python -m experiments.main --dataset texas --method mlp+rwse
```

Dataset split protocols:

| Dataset | Protocol | Splits |
|---|---|---|
| Cora, CiteSeer, PubMed | Standard Planetoid public split | 1 |
| Texas, Cornell, Wisconsin, Actor | Pei et al. (2019) 10-split | 10 |
| Roman-Empire, Amazon-Ratings, Tolokers | Platonov et al. (2023) fixed 10-split | 10 |
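The table implies two mask layouts: the Planetoid public split is a single 1-D boolean mask, while PyG stores the 10 WebKB, Actor and Roman-Empire/Amazon-Ratings/Tolokers splits as masks of shape `[num_nodes, 10]`. A minimal sketch of selecting one split uniformly is below; the helper names `select_split` and `num_splits` are hypothetical (not from this repository), and NumPy arrays stand in for PyG tensors.

```python
import numpy as np


def select_split(mask: np.ndarray, split: int) -> np.ndarray:
    """Return the 1-D boolean node mask for one split.

    1-D masks (one fixed split, e.g. Planetoid) are returned unchanged;
    2-D masks of shape [num_nodes, num_splits] yield column `split`.
    """
    if mask.ndim == 1:
        if split != 0:
            raise ValueError("dataset has a single split; use split=0")
        return mask
    if not 0 <= split < mask.shape[1]:
        raise IndexError(f"split {split} out of range for {mask.shape[1]} splits")
    return mask[:, split]


def num_splits(mask: np.ndarray) -> int:
    """Number of splits encoded in a mask (1 for 1-D masks)."""
    return 1 if mask.ndim == 1 else mask.shape[1]


# Toy stand-in for data.train_mask on a 5-node graph with 10 splits.
rng = np.random.default_rng(0)
multi = rng.random((5, 10)) < 0.5
print(select_split(multi, 3).shape)  # (5,)
```

Looping `for i in range(num_splits(data.train_mask))` then covers both layouts with the same training code.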
All datasets are downloaded automatically by PyG on first use.
The following heterophily-aware methods are referenced from their original implementations:
Numbers for these can be taken from the original papers or Platonov et al. (2023).
| Parameter | Default | Meaning |
|---|---|---|
| `T` | 4 | Number of feature scales |
| `S` | 4 | Number of graph diffusion scales |
| `k` | 2 | Diffusion truncation radius |
| `pca_dim` | 64 | PCA target dimension (applied when d > 500) |
| `cmin` | 0.05 | Feature scale lower multiplier |
| `cmax` | 0.25 | Feature scale upper multiplier |
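The exact preprocessing and scale-selection rules live in `ghfs/utils.py` and are not spelled out here. As an illustration only, the sketch below assumes that PCA to `pca_dim` is applied when the feature dimension d exceeds 500, and that the `T` feature scales are bandwidths spaced geometrically between `cmin` and `cmax` times a reference distance D, taken here (by assumption) as the median pairwise Euclidean feature distance.

```python
import numpy as np


def maybe_pca(X: np.ndarray, pca_dim: int = 64, threshold: int = 500) -> np.ndarray:
    """Project X onto its top `pca_dim` principal components if d > threshold."""
    _, d = X.shape
    if d <= threshold:
        return X
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:pca_dim].T


def feature_bandwidths(X: np.ndarray, T: int = 4, cmin: float = 0.05,
                       cmax: float = 0.25) -> np.ndarray:
    """T bandwidths geometrically spaced in [cmin * D, cmax * D].

    ASSUMPTION: D is the median pairwise Euclidean distance between rows of X;
    both D and the geometric spacing are illustrative choices.
    """
    sq = np.sum(X**2, axis=1)
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped for round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    iu = np.triu_indices(X.shape[0], k=1)
    D = np.median(np.sqrt(d2[iu]))
    return D * np.geomspace(cmin, cmax, num=T)
```

The dense pairwise-distance step is O(n²) memory; on large graphs a random subsample of nodes would be a natural way to estimate D.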
Override the defaults via the CLI, e.g. `--ghfs_T 8 --ghfs_S 8 --ghfs_k 3`.
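One standard way to realize a graph diffusion truncated at radius `k` is a k-th order Taylor expansion of the heat kernel exp(-tL): a degree-k polynomial in L only couples nodes within k hops. The sketch below assumes the symmetric normalized Laplacian and a hypothetical grid of S = 4 diffusion times; whether `ghfs/kernels.py` uses this operator, normalization or time grid is not stated here, so treat it as illustrative.

```python
import math

import numpy as np


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2}; inverse sqrt degree is 0 for isolated nodes."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def truncated_heat(A: np.ndarray, X: np.ndarray, t: float, k: int = 2) -> np.ndarray:
    """Approximate exp(-t L) @ X by sum_{j=0}^{k} (-t)^j / j! * L^j @ X."""
    L = normalized_laplacian(A)
    term = X.astype(float)
    out = term.copy()
    for j in range(1, k + 1):
        term = L @ term  # term now holds L^j @ X
        out += ((-t) ** j / math.factorial(j)) * term
    return out


def multiscale_diffusion(A, X, times=(0.5, 1.0, 2.0, 4.0), k=2):
    """Stack truncated diffusions of X at S = len(times) hypothetical times."""
    return np.stack([truncated_heat(A, X, t, k) for t in times], axis=0)
```

Keeping `k` small bounds the cost to k matrix products per scale; a practical version would use `scipy.sparse` or PyG sparse tensors instead of dense NumPy arrays.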
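With the default `T` = `S` = 4, the encoding has 64 dimensions (4 channels × 4 feature scales × 4 diffusion scales).
With the density channel enabled, it has 80 dimensions (one additional channel: 5 × 4 × 4).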
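The encoding width is the number of channels times `T` times `S`. The tiny helper below (hypothetical, not part of the repository) reproduces the 64- and 80-dimension figures, assuming the density channel adds exactly one channel, and shows what the CLI override `--ghfs_T 8 --ghfs_S 8` would imply under the same assumption.

```python
def ghfs_dim(T: int = 4, S: int = 4, base_channels: int = 4,
             density: bool = False) -> int:
    """Encoding width = channels x T x S.

    ASSUMPTION: enabling the density channel adds one channel.
    """
    channels = base_channels + (1 if density else 0)
    return channels * T * S


print(ghfs_dim())              # 64
print(ghfs_dim(density=True))  # 80
print(ghfs_dim(T=8, S=8))      # 256
```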
Preprocessing timings are logged automatically during `fit_transform`.
GHFS is computed once and cached to `./cache/ghfs/`.
After preprocessing, training time is identical to that of the base model.
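A compute-once disk cache of this kind can be sketched as below. The function name `cached_encoding`, the cache-key scheme (a hash of the dataset name and GHFS hyperparameters) and the `.npy` file format are all assumptions for illustration; the repository's actual cache layout under `./cache/ghfs/` may differ.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable

import numpy as np


def cached_encoding(dataset: str, params: dict, compute: Callable[[], np.ndarray],
                    cache_dir: str = "./cache/ghfs") -> np.ndarray:
    """Load the encoding from disk if present; otherwise compute, time and save it."""
    # Key on dataset + hyperparameters so that changing T/S/k invalidates the cache.
    blob = json.dumps({"dataset": dataset, **params}, sort_keys=True).encode()
    path = Path(cache_dir) / f"{dataset}_{hashlib.sha1(blob).hexdigest()[:12]}.npy"
    if path.exists():
        return np.load(path)
    start = time.perf_counter()
    Z = compute()
    # Illustrative log line; the repository's actual log format is not specified.
    print(f"[ghfs] preprocessing {dataset}: {time.perf_counter() - start:.2f}s")
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, Z)
    return Z
```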
All experiments use `--seed 42` by default. The random seed is set for both
torch and numpy at the start of each experiment run. Dataset splits for
Planetoid and `HeterophilousGraphDataset` are fixed by PyG; the WebKB and Actor splits
are the precomputed masks stored in PyG's dataset objects.
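The seeding described above can be sketched as follows. The function name `set_seed`, seeding Python's `random` module, and skipping torch when it is not installed are illustrative additions rather than a description of the repository's code.

```python
import importlib.util
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy and (if installed) PyTorch RNGs at the start of a run."""
    random.seed(seed)
    np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch

        # torch.manual_seed seeds the RNG on all devices, including CUDA.
        torch.manual_seed(seed)
```

Seeding alone does not make every GPU kernel deterministic, so small run-to-run differences on CUDA are still possible.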