Adding top-k attention edges script #35
return path

# Fallback - scan directory
for fname in os.listdir(attn_map_dir):
Is there a risk here that the path doesn't exist and this throws an error? It might be good to add a check beforehand.
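A minimal sketch of such a check, assuming the fallback only needs the file names in `attn_map_dir`. The helper name `list_attention_files` and the `.pickle` suffix are hypothetical and not taken from the script:

```python
import os


def list_attention_files(attn_map_dir, suffix=".pickle"):
    """Return sorted file names in attn_map_dir that end with suffix.

    Raises FileNotFoundError with a readable message instead of letting
    os.listdir fail on a missing directory. The suffix is an assumption.
    """
    if not os.path.isdir(attn_map_dir):
        raise FileNotFoundError(
            f"Attention map directory not found: {attn_map_dir}"
        )
    return sorted(f for f in os.listdir(attn_map_dir) if f.endswith(suffix))
```

Failing early keeps the error close to the bad argument rather than surfacing it midway through the scan.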
continue

# Save per layer head TSV
per_file = os.path.join(
You might want to sanitize the file name, e.g. by removing whitespace, quotes, etc.
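Something along these lines could work, as a sketch. `sanitize_filename` is a hypothetical helper, and the set of allowed characters is an assumption that may need adjusting:

```python
import re


def sanitize_filename(name, replacement="_"):
    """Replace runs of characters other than letters, digits, '.', '-', '_'.

    Leading and trailing replacement characters and dots are stripped,
    and an empty result falls back to "unnamed".
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, name)
    cleaned = cleaned.strip(replacement + ".")
    return cleaned or "unnamed"
```

For example, `sanitize_filename('my "protein" 1.tsv')` gives `my_protein_1.tsv`.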
Extracting the top-k entries per (layer, head) is the dominant cost here and could take a very long time for long sequences and many heads/layers. In the future, we may want to look into Cython/Numba or a batched top-k across heads to reduce the Python overhead. It might be overkill for our current purposes, though.
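For reference, a batched top-k across all layers and heads can be sketched with plain NumPy, assuming the attention tensor fits in memory with shape `[L, H, N, N]`. The function name `batched_top_k` is hypothetical, and this is not benchmarked against the current loop:

```python
import numpy as np


def batched_top_k(attn, k):
    """Top-k entries of every (layer, head) map in one vectorized pass.

    attn: array of shape [L, H, N, M].
    Returns (rows, cols, weights), each of shape [L, H, k], ordered by
    descending weight within each (layer, head).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_layers, n_heads, n_rows, n_cols = attn.shape
    k = min(k, n_rows * n_cols)
    flat = attn.reshape(n_layers, n_heads, n_rows * n_cols)
    # argpartition moves the k largest values into the last k slots (unordered).
    idx = np.argpartition(flat, -k, axis=-1)[..., -k:]
    weights = np.take_along_axis(flat, idx, axis=-1)
    # Sort only those k candidates, descending.
    order = np.argsort(-weights, axis=-1)
    idx = np.take_along_axis(idx, order, axis=-1)
    weights = np.take_along_axis(weights, order, axis=-1)
    rows, cols = np.unravel_index(idx, (n_rows, n_cols))
    return rows, cols, weights
```

The partition step is linear in the number of entries per map, so the full sort only touches k values per head.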
)
)

if combined_rows:
The combined TSV writer is really helpful, but it seems like it duplicates a lot of logic from the per-head writer. Maybe having a small helper function for table writing would improve readability and reduce the chance of the two formats drifting apart if updates are made later.
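A shared helper might look roughly like this. It is only a sketch: `write_tsv` is a hypothetical name, and the column names in the usage note are assumptions rather than the script's actual header:

```python
import csv


def write_tsv(path, header, rows):
    """Write a header and an iterable of rows to a tab-separated file.

    Returns the number of data rows written, so callers can log it.
    """
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Both writers could then call, for example, `write_tsv(per_file, ("layer", "head", "source", "target", "weight"), rows)` so the two formats share one code path.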
# - "msa_row_attention": [L, H, N, N]
# - "msa_col_attention": [L, H, N, N]
# - "pair_attention": [L, H, N, N]
attn_types = args.attn_types
The auto-detection of available attention types is helpful, but it might mask cases where expected keys are missing. It could be safer to alert the user when the detected keys don't match the typical "msa_row_attention", "msa_col_attention", and "pair_attention" keys.
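One possible shape for that alert, as a sketch. The function `resolve_attn_types` and the constant `EXPECTED_ATTN_TYPES` are hypothetical, and treating every key in the pickle as an attention type is an assumption:

```python
import warnings

# The three keys named in the script's comments.
EXPECTED_ATTN_TYPES = ("msa_row_attention", "msa_col_attention", "pair_attention")


def resolve_attn_types(available_keys, requested=None):
    """Pick the attention types to process.

    Explicitly requested types must exist. With auto-detection, all
    available keys are used, but a warning lists any expected key that
    was not found so the omission is visible to the user.
    """
    available = list(available_keys)
    if requested:
        missing = [t for t in requested if t not in available]
        if missing:
            raise KeyError(f"Requested attention types not found: {missing}")
        return list(requested)
    missing = [t for t in EXPECTED_ATTN_TYPES if t not in available]
    if missing:
        warnings.warn(f"Expected attention types not found: {missing}")
    return available
```

A warning keeps auto-detection permissive while still surfacing the mismatch; raising instead would be the stricter alternative.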
Add standalone script for top-k attention edge extraction
This adds a new command-line utility, generate_attention_edges.py, that converts saved attention map pickles (produced by generate_viz_data.py) into compact TSV edge lists suitable for downstream visualization.
Key features:
• Supports MSA row, MSA column, and pair attention
• Exports one TSV per layer/head and an aggregated TSV per attention type
• Configurable top-k filtering to control output size
• Optional removal of self-edges (drop diagonal)
This provides a more direct bridge between OpenFold attention outputs
and the existing visualization demos in this repository.
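The top-k filtering and diagonal removal listed under key features can be illustrated for a single `[N, N]` attention map as follows. This is a simplified sketch, not the script's actual code; the name `top_k_edges` and the (source, target, weight) tuple layout are assumptions:

```python
import numpy as np


def top_k_edges(attn, k, drop_diagonal=False):
    """Return up to k (source, target, weight) tuples, heaviest first.

    attn: square 2D array-like of attention weights.
    drop_diagonal: if True, self-edges (i == j) are excluded.
    """
    scores = np.array(attn, dtype=float)  # copy, so the input is untouched
    if drop_diagonal:
        np.fill_diagonal(scores, -np.inf)
    flat = scores.ravel()
    # Never return more edges than there are finite entries.
    k = min(k, int(np.isfinite(flat).sum()))
    if k <= 0:
        return []
    idx = np.argpartition(flat, -k)[-k:]
    idx = idx[np.argsort(-flat[idx])]
    rows, cols = np.unravel_index(idx, scores.shape)
    return [(int(r), int(c), float(flat[i])) for r, c, i in zip(rows, cols, idx)]
```

Masking the diagonal with negative infinity before selection means self-edges never occupy any of the k slots.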
Example usage:
    python generate_attention_edges.py \
        attention_maps \
        --output_dir attention_edges \
        --top_k 500 \
        --drop_diagonal
Outputs one TSV per layer/head and a combined TSV per attention type
(msa_row_attention, msa_col_attention, pair_attention).