While inspecting the model's parameter distribution, I noticed that ~43% of the trainable parameters (about 1.15M of 2.648M) sit in the linear layers, i.e. linear_layers plus final_layer, rather than in the GNN:
| Component | Parameters |
|---|---|
| gnn | 1.5M |
| linear_layers | 365K |
| final_layer | 783K |
| Total | 2.648M |
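
For reference, the ~43% figure can be reproduced from the table with a few lines of arithmetic. This is only a sketch: the counts are the rounded values reported above, expressed in thousands, and the variable names are my own.

```python
# Parameter counts copied from the table above, in thousands of parameters.
# These are the rounded figures as reported, not exact counts.
param_counts_k = {
    "gnn": 1500,
    "linear_layers": 365,
    "final_layer": 783,
}

total_k = sum(param_counts_k.values())

# Share of each component in the total trainable parameter count.
shares = {name: count / total_k for name, count in param_counts_k.items()}

# Combined share of everything that is not the GNN backbone.
head_share = shares["linear_layers"] + shares["final_layer"]

for name, share in shares.items():
    print(f"{name:>14}: {share:6.1%}")
print(f"{'linear total':>14}: {head_share:6.1%}")
```

On these numbers, final_layer alone accounts for more than twice the parameters of linear_layers.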
I was wondering about a few architectural questions:
- Is this heavy reliance on linear layers intentional, or could it be simplified? It may overshadow the representational power of the GNN backbone.
- Could we experiment with a single linear head, as commonly done in standard models like those in PyG?
- Alternatively, would it make sense to shift more capacity back into the GNN module, where the inductive bias is typically stronger?
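
To give a rough sense of scale for the second question, here is a small sketch comparing the parameter count of a single linear head with that of a deeper head. It relies only on the fact that a dense layer with a bias has `in_features * out_features + out_features` trainable parameters. The sizes (`hidden_dim`, `num_classes`, the 512-wide hidden layers) are hypothetical values chosen for illustration and are not taken from this model. Normalization layers and activations are ignored.

```python
from typing import Sequence


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Trainable parameters of one dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def mlp_params(dims: Sequence[int]) -> int:
    """Trainable parameters of stacked dense layers with widths dims[0] -> ... -> dims[-1]."""
    return sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))


# Hypothetical sizes, for illustration only (not this model's real dimensions).
hidden_dim = 256
num_classes = 10

# One linear layer mapping GNN embeddings straight to class logits.
single_head = mlp_params([hidden_dim, num_classes])

# A deeper head with two hidden layers before the output layer.
deep_head = mlp_params([hidden_dim, 512, 512, num_classes])

print(f"single linear head: {single_head:,} parameters")
print(f"three-layer head:   {deep_head:,} parameters")
```

With the real layer widths plugged in, the same helper would show how much of the 1.15M could be freed up for the GNN module.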
Metrics:
- train_macro-f1: 0.960
- train_micro-f1: 0.990
- val_macro-f1: 0.699
- val_micro-f1: 0.911
| Metric | Train | Validation | Gap |
|---|---|---|---|
| Macro F1 | 0.960 | 0.699 | 0.261 |
| Micro F1 | 0.990 | 0.911 | 0.079 |
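
The gap values are simply train minus validation. A minimal sketch of that computation, using the reported numbers (the dictionary layout is my own):

```python
# Reported scores, copied from the metrics above.
metrics = {
    "macro_f1": {"train": 0.960, "val": 0.699},
    "micro_f1": {"train": 0.990, "val": 0.911},
}

# Generalization gap per metric: train score minus validation score.
gaps = {name: round(m["train"] - m["val"], 3) for name, m in metrics.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.3f}")
```

The macro F1 gap is more than three times the micro F1 gap.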