GHFS Experiments

Code for reproducing all experiments in Graph Heat Field Signatures: Multiscale Feature-Structure Alignment for Graph Learning.

Structure

code/
├── ghfs/               # Core GHFS encoding
│   ├── encoding.py     # GHFSEncoder class (main entry point)
│   ├── kernels.py      # Graph diffusion & feature kernel computations
│   └── utils.py        # Preprocessing, scale selection
├── datasets/
│   └── loaders.py      # All 10 dataset loaders with correct split protocols
├── models/
│   ├── mlp.py          # MLP baseline
│   └── gnn.py          # GCN, GraphSAGE, GAT, APPNP, GPS
├── pe/
│   ├── lape.py         # Laplacian Positional Encoding
│   └── rwse.py         # Random Walk Structural Encoding
├── experiments/
│   ├── trainer.py      # Training loop with early stopping
│   ├── main.py         # Main experiment runner (all methods × all datasets)
│   └── ablations.py    # Ablation studies (channels, scales, hop radius, kernels)
├── analysis/
│   └── alignment.py    # Alignment spectrum computation & figures
└── scripts/
    ├── run_main_experiments.sh
    ├── run_ablations.sh
    ├── run_analysis.sh
    └── summarize_results.py

Setup

# Create environment
conda create -n ghfs python=3.10
conda activate ghfs

# Install PyTorch (adjust the version and CUDA build as needed)
pip install torch==2.0.0 --index-url https://download.pytorch.org/whl/cu118

# Install PyG and extensions (the wheel index below must match the
# installed torch version and CUDA build)
pip install torch_geometric
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv \
    -f https://data.pyg.org/whl/torch-2.0.0+cu118.html

# Install remaining dependencies
pip install -r requirements.txt

Running Experiments

All commands should be run from the code/ directory.

Table 2: Main results (all methods × all datasets)

# Full run (~hours on GPU, ~days on CPU)
bash scripts/run_main_experiments.sh

# Single dataset, all methods
bash scripts/run_main_experiments.sh cora

# Single method on single dataset
bash scripts/run_main_experiments.sh texas mlp+ghfs

Results are written to ./results/results.csv and ./results/results_full.json.

Print / export results table

# Console table
python scripts/summarize_results.py

# Also write LaTeX
python scripts/summarize_results.py --latex

Tables 3-6: Ablation studies

bash scripts/run_ablations.sh

Results are written to ./results/ablations/ablations.json.

Alignment figures (Section 4.4)

Run after main experiments so the accuracy-vs-alignment scatter is populated:

bash scripts/run_analysis.sh

Figures are saved to ./results/figures/.

Individual method examples

# MLP baseline
python -m experiments.main --dataset cora --method mlp

# GCN
python -m experiments.main --dataset texas --method gcn

# MLP + GHFS (no message passing, GHFS as fixed encoding)
python -m experiments.main --dataset roman-empire --method mlp+ghfs

# GCN + GHFS
python -m experiments.main --dataset roman-empire --method gcn+ghfs

# GPS + GHFS
python -m experiments.main --dataset cora --method gps+ghfs

# MLP + LapPE (structural encoding baseline)
python -m experiments.main --dataset texas --method mlp+lape

# MLP + RWSE (structural encoding baseline)
python -m experiments.main --dataset texas --method mlp+rwse

Datasets

Dataset	Protocol	Splits
Cora, CiteSeer, PubMed	Standard Planetoid public split	1
Texas, Cornell, Wisconsin, Actor	Pei et al. (2019) 10-split	10
Roman-Empire, Amazon-Ratings, Tolokers	Platonov et al. (2023) fixed 10-split	10

All datasets are downloaded automatically by PyG on first use.

Baselines NOT implemented here

The following heterophily-aware methods are not reimplemented in this repository; use their original implementations:

H2GCN — GitHub
GPRGNN — GitHub
ACM-GCN — GitHub
LINKX — GitHub

Numbers for these can be taken from the original papers or Platonov et al. (2023).

GHFS hyperparameters

Parameter	Default	Meaning
`T`	4	Number of feature scales
`S`	4	Number of graph diffusion scales
`k`	2	Diffusion truncation radius
`pca_dim`	64	PCA target dimension (applied when `d > 500`)
`cmin`	0.05	Feature scale lower multiplier
`cmax`	0.25	Feature scale upper multiplier

Override via CLI: --ghfs_T 8 --ghfs_S 8 --ghfs_k 3
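
As a rough illustration of how these overrides map onto the defaults in the table above, the sketch below parses the three flags with argparse. It is not the repository's actual parser: `build_parser` is a hypothetical name, and only the flag names and default values are taken from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the GHFS overrides (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="GHFS hyperparameter overrides (sketch)")
    # Defaults mirror the hyperparameter table in this README.
    parser.add_argument("--ghfs_T", type=int, default=4, help="number of feature scales")
    parser.add_argument("--ghfs_S", type=int, default=4, help="number of graph diffusion scales")
    parser.add_argument("--ghfs_k", type=int, default=2, help="diffusion truncation radius")
    return parser


# Parse the override string shown above instead of reading sys.argv.
args = build_parser().parse_args(["--ghfs_T", "8", "--ghfs_S", "8", "--ghfs_k", "3"])
print(args.ghfs_T, args.ghfs_S, args.ghfs_k)
```

Any flag that is omitted falls back to its default, so passing only `--ghfs_k 3` would keep T and S at 4 in this sketch.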

Encoding dimension

With the defaults T = S = 4, the encoding has 64 dimensions (4 channels × 4 feature scales × 4 graph diffusion scales). With the density channel enabled, it has 80 dimensions.
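
The arithmetic can be checked with a short helper. This is a sketch rather than part of the codebase: `ghfs_encoding_dim` and its arguments are hypothetical names, and it assumes the density channel adds exactly one extra channel over the same T × S grid, which is consistent with the 64 and 80 figures stated here.

```python
def ghfs_encoding_dim(T: int = 4, S: int = 4, density_channel: bool = False,
                      base_channels: int = 4) -> int:
    """Return channels * T * S (hypothetical helper, not the repo's API).

    Assumption: enabling the density channel adds one channel on top of
    the four base channels.
    """
    channels = base_channels + (1 if density_channel else 0)
    return channels * T * S


print(ghfs_encoding_dim())                      # 64
print(ghfs_encoding_dim(density_channel=True))  # 80
```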

Scalability (Table 7)

Preprocessing timings are logged automatically during fit_transform. GHFS is computed once and cached to ./cache/ghfs/. Training time after preprocessing is identical to the base model.
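
For readers unfamiliar with the compute-once pattern, the sketch below shows one way such a cache could work. It is an assumption-laden illustration, not the repository's implementation: the function names, the SHA-256 key over dataset name and hyperparameters, and the pickle file format are all hypothetical choices.

```python
import hashlib
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict


def cache_path(cache_dir: str, dataset: str, params: Dict[str, Any]) -> Path:
    """Build a file path that changes whenever the dataset or params change."""
    # Sorting keys makes the key independent of dict insertion order.
    key = json.dumps({"dataset": dataset, "params": params}, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{dataset}_{digest}.pkl"


def load_or_compute(cache_dir: str, dataset: str, params: Dict[str, Any],
                    compute: Callable[[], Any]) -> Any:
    """Return the cached encoding if present; otherwise compute and store it."""
    path = cache_path(cache_dir, dataset, params)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()  # the expensive preprocessing step runs only on a miss
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Under this scheme, a call such as `load_or_compute("./cache/ghfs", "cora", {"T": 4, "S": 4, "k": 2}, compute_fn)` would run `compute_fn` on the first call only, and changing any hyperparameter would produce a new cache entry rather than reuse a stale one.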

Reproducibility

All experiments use --seed 42 by default. The random seed is set for both torch and numpy at the start of each experiment run. Dataset splits for Planetoid and HeterophilousGraphDataset are fixed by PyG; WebKB / Actor splits are the pre-computed masks stored in PyG's dataset objects.
