Adding top-k attention edges script by naman-goyall · Pull Request #35 · AI2Science/vizfold-foundation

naman-goyall · 2025-12-08T21:28:00Z

Add standalone script for top-k attention edge extraction

This adds a new command line utility generate_attention_edges.py that
converts saved attention map pickles (produced by generate_viz_data.py)
into compact TSV edge lists suitable for downstream visualization.

Key features:
• Supports MSA row, MSA column, and pair attention
• Exports one TSV per layer/head and an aggregated TSV per attention type
• Configurable top-k filtering to control output size
• Optional removal of self-edges (drop diagonal)

This provides a more direct bridge between OpenFold attention outputs
and the existing visualization demos in this repository.

Example usage:
python generate_attention_edges.py attention_maps \
    --output_dir attention_edges \
    --top_k 500 \
    --drop_diagonal

Outputs one TSV per layer/head and a combined TSV per attention type
(msa_row_attention, msa_col_attention, pair_attention).
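
For readers new to the approach, the core top-k step for a single layer/head map could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the function name `top_k_edges` and the `(source, target, weight)` tuple layout are hypothetical.

```python
import numpy as np


def top_k_edges(attn, top_k, drop_diagonal=False):
    """Return (source, target, weight) tuples for the largest entries of a 2D map.

    Hypothetical helper: the name and return layout are assumptions.
    """
    scores = np.array(attn, dtype=float)  # copy so the caller's array is untouched
    if drop_diagonal:
        # Self-edges can never be selected once they are set to -inf.
        np.fill_diagonal(scores, -np.inf)

    flat = scores.ravel()
    # Never ask for more edges than there are finite scores.
    k = min(top_k, int(np.isfinite(flat).sum()))
    if k <= 0:
        return []

    # argpartition finds the k largest entries without sorting the whole map.
    idx = np.argpartition(flat, -k)[-k:]
    # Sort only the k survivors, strongest edge first.
    idx = idx[np.argsort(flat[idx])[::-1]]
    rows, cols = np.unravel_index(idx, scores.shape)
    return [(int(r), int(c), float(flat[i])) for r, c, i in zip(rows, cols, idx)]
```

Using `np.argpartition` before the final sort keeps the work proportional to the map size plus the cost of ordering only the k selected entries.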

SreeDan · 2025-12-08T21:56:53Z

+            return path
+
+    # Fallback - scan directory
+    for fname in os.listdir(attn_map_dir):


Is there a risk here that the path doesn't exist and `os.listdir` throws an error? It might be good to add a check beforehand.
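
A minimal sketch of such a guard, assuming the lookup lives in its own function. The function name `find_attention_pickle`, the `expected_name` parameter, and the `.pkl` suffix check are assumptions for illustration and are not taken from the PR:

```python
import os


def find_attention_pickle(attn_map_dir, expected_name):
    """Locate an attention pickle, failing early with a clear message.

    Hypothetical helper: names and the ".pkl" suffix are assumptions.
    """
    # Check the directory up front so os.listdir cannot raise a bare error.
    if not os.path.isdir(attn_map_dir):
        raise FileNotFoundError(
            f"Attention map directory not found: {attn_map_dir}"
        )

    path = os.path.join(attn_map_dir, expected_name)
    if os.path.isfile(path):
        return path

    # Fallback - scan directory (sorted so the choice is deterministic)
    for fname in sorted(os.listdir(attn_map_dir)):
        if fname.endswith(".pkl"):
            return os.path.join(attn_map_dir, fname)

    raise FileNotFoundError(f"No attention pickle found in: {attn_map_dir}")
```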

SreeDan · 2025-12-08T21:58:49Z

+                continue
+
+            # Save per layer head TSV
+            per_file = os.path.join(


You might want to sanitize the file name, e.g. by removing whitespace, quotes, etc.
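
One possible way to do this, as a sketch: replace every run of characters outside a conservative allow-list before building the path. The helper name `sanitize_filename_part` and the `"unnamed"` fallback are assumptions for illustration.

```python
import re


def sanitize_filename_part(text, replacement="_"):
    """Make an arbitrary label safe to embed in a file name.

    Hypothetical helper: the allow-list below is one conservative choice.
    """
    # Collapse every run of characters outside [A-Za-z0-9._-] into one replacement.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, str(text))
    # Trim leading/trailing separators and dots (avoids hidden files and "..").
    cleaned = cleaned.strip(replacement + ".")
    return cleaned or "unnamed"
```

For example, `sanitize_filename_part('msa row "attention"')` yields `msa_row_attention`.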

SreeDan · 2025-12-08T22:00:49Z

Extracting top_k per (layer, head) is the dominant cost here and could take a very long time for very large sequences and many heads/layers. In the future, we may want to look into Cython/Numba or a batched top-k across heads to reduce the Python overhead. It might be overkill for our current purposes, though.
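
As a rough sketch of the batched variant, assuming the attention tensor is a NumPy array of shape [L, H, N, M], the per-head Python loop can be replaced by a single `argpartition` over the flattened last two axes. The function name `batched_top_k` and the return layout are assumptions, not part of the PR.

```python
import numpy as np


def batched_top_k(attn, top_k):
    """Top-k entries for every (layer, head) at once.

    attn: array of shape [L, H, N, M].
    Returns (rows, cols, weights), each of shape [L, H, k], strongest first.
    Hypothetical helper: name and return layout are assumptions.
    """
    L, H, N, M = attn.shape
    k = min(top_k, N * M)
    if k <= 0:
        raise ValueError("top_k must be a positive integer")

    # Flatten each map so one call handles all layers and heads.
    flat = attn.reshape(L, H, N * M)
    # Unordered indices of the k largest entries per (layer, head).
    idx = np.argpartition(flat, -k, axis=-1)[..., -k:]
    weights = np.take_along_axis(flat, idx, axis=-1)

    # Order only the k selected entries, descending by weight.
    order = np.argsort(-weights, axis=-1)
    idx = np.take_along_axis(idx, order, axis=-1)
    weights = np.take_along_axis(weights, order, axis=-1)

    # Convert flat positions back to (row, col) coordinates.
    rows, cols = np.unravel_index(idx, (N, M))
    return rows, cols, weights
```

This keeps all of the selection inside NumPy, so the Python-level overhead no longer grows with the number of layers and heads.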

PranavNarala1 · 2025-12-08T23:06:43Z

+                    )
+                )
+
+    if combined_rows:


The combined TSV writer is really helpful, but it seems like it duplicates a lot of logic from the per-head writer. Maybe having a small helper function for table writing would improve readability and reduce the chance of the two formats drifting apart if updates are made later.
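
A small shared writer along these lines might be enough. This is a sketch: the name `write_edge_tsv` and the column names in `EDGE_COLUMNS` are assumptions, since the actual TSV header is not shown in this thread.

```python
import csv

# Assumed column layout; the real header in the PR may differ.
EDGE_COLUMNS = ("layer", "head", "source", "target", "weight")


def write_edge_tsv(path, rows, columns=EDGE_COLUMNS):
    """Write a header plus edge rows as tab-separated values.

    Hypothetical helper shared by the per-head and combined outputs,
    so both files always use the same header and delimiter.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
    return path
```

Both code paths would then call the same function with different `path` and `rows` arguments, which keeps the two formats from drifting apart.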

PranavNarala1 · 2025-12-08T23:07:24Z

+    #   - "msa_row_attention": [L, H, N, N]
+    #   - "msa_col_attention": [L, H, N, N]
+    #   - "pair_attention": [L, H, N, N]
+    attn_types = args.attn_types


The auto-detection of available attention types is helpful, but it might mask cases where expected keys are missing. It could be safer to alert the user when the detected keys don't match the typical "msa_row_attention", "msa_col_attention", "pair_attention" keys.
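
A sketch of what that alert could look like, assuming auto-detection works from the keys of the loaded pickle. The function name `resolve_attn_types` and its parameters are hypothetical; only the three key names come from the diff above.

```python
import warnings

EXPECTED_ATTN_TYPES = ("msa_row_attention", "msa_col_attention", "pair_attention")


def resolve_attn_types(available_keys, requested=None):
    """Pick the attention types to export and flag anything unexpected.

    Hypothetical helper: name and parameters are assumptions.
    """
    available = list(available_keys)

    if requested:
        # Explicitly requested types must exist; fail loudly otherwise.
        missing = [name for name in requested if name not in available]
        if missing:
            raise KeyError(f"Requested attention types not found: {missing}")
        return list(requested)

    # Auto-detection: use what is there, but warn about missing typical keys.
    missing = [name for name in EXPECTED_ATTN_TYPES if name not in available]
    if missing:
        warnings.warn(
            f"Expected attention types missing: {missing}; using {available}"
        )
    return available
```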

adding top-k attention edges script

9a80e53

SreeDan reviewed Dec 8, 2025

PranavNarala1 reviewed Dec 8, 2025

naman-goyall wants to merge 1 commit into AI2Science:main from naman-goyall:main

3 participants
