feat: add RMSD metric #227
📝 Walkthrough
Adds a new AllAtomRMSD metric that computes all-atom RMSD between predicted and reference atom stacks (with optional Kabsch superposition), groups per-residue RMSD by token, supports mask-based residue selection for reported scores, and includes tests plus shared fixtures for altloc/backbone selections.
Changes
AllAtomRMSD Metric Implementation
Test Fixtures & Refactoring
AllAtomRMSD Tests
Sequence Diagram(s)
sequenceDiagram
participant Test
participant AllAtomRMSD
participant Annotator
participant Filter
participant Kabsch
participant RMSDCalc
Test->>AllAtomRMSD: compute(predicted, reference, selection?)
AllAtomRMSD->>Annotator: annotate global token IDs
Annotator-->>AllAtomRMSD: annotated stacks
AllAtomRMSD->>Filter: filter to common atoms (matching order)
Filter-->>AllAtomRMSD: filtered stacks
alt superimpose = True
AllAtomRMSD->>Kabsch: compute alignment (pred → ref)
Kabsch-->>AllAtomRMSD: superimposed predicted coords
end
AllAtomRMSD->>RMSDCalc: compute global RMSD per model
RMSDCalc-->>AllAtomRMSD: per-model RMSD series
AllAtomRMSD->>Annotator: group atoms by token -> residue keys
Annotator-->>AllAtomRMSD: per-residue atom groups
alt selection provided
AllAtomRMSD->>AllAtomRMSD: mask -> selected token IDs
end
AllAtomRMSD->>RMSDCalc: compute per-residue RMSD lists
RMSDCalc-->>AllAtomRMSD: residue_rmsd_scores
AllAtomRMSD-->>Test: {best_of_1_rmsd, best_of_N_rmsd, residue_rmsd_scores, optional logs}
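The flow in the diagram can be approximated with plain NumPy. This is a minimal sketch, not the code in `src/sampleworks/metrics/rmsd.py`: the function name `rmsd_summary`, the array-based inputs, and the integer residue keys are assumptions for illustration, and the optional superposition step is left out.

```python
import numpy as np


def rmsd_summary(pred, ref, token_id, selected_tokens=None):
    """Hypothetical helper mirroring the diagram's outputs.

    pred: (n_models, n_atoms, 3) predicted coordinates
    ref: (n_atoms, 3) reference coordinates, same atom order as pred
    token_id: (n_atoms,) integer token (residue) ID for each atom
    selected_tokens: optional set of token IDs to report per-residue scores for
    """
    # Squared displacement of every atom in every model.
    sq_dist = ((pred - ref[None, :, :]) ** 2).sum(axis=-1)  # (n_models, n_atoms)

    # Global RMSD per model always uses all common atoms.
    global_rmsd = np.sqrt(sq_dist.mean(axis=1))  # (n_models,)

    # Per-residue RMSD: group atoms by token ID, one value per model.
    residue_scores = {}
    for tok in np.unique(token_id):
        if selected_tokens is not None and int(tok) not in selected_tokens:
            continue
        atoms = token_id == tok
        residue_scores[int(tok)] = np.sqrt(sq_dist[:, atoms].mean(axis=1)).tolist()

    # Note: with a single model the two "best_of" keys below coincide.
    return {
        "best_of_1_rmsd": float(global_rmsd[0]),
        f"best_of_{len(global_rmsd)}_rmsd": float(global_rmsd.min()),
        "residue_rmsd_scores": residue_scores,
    }


# Two models, four atoms, two tokens: model 0 is shifted by 1 along x, model 1 is exact.
ref = np.zeros((4, 3))
pred = np.stack([ref + np.array([1.0, 0.0, 0.0]), ref])
token_id = np.array([0, 0, 1, 1])
summary = rmsd_summary(pred, ref, token_id, selected_tokens={1})
```

In this toy input the selection only narrows `residue_rmsd_scores`; the global values are still computed over every atom, which matches the behavior described in the walkthrough.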
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
Adds an all-atom RMSD metric to sampleworks.metrics with residue-level breakdown, plus tests, to enable RMSD-based evaluation consistent with existing LDDT metric outputs.
Changes:
- Introduce `AllAtomRMSD` metric producing global and per-residue RMSD outputs.
- Add end-to-end and identity tests for the RMSD metric.
- Refactor LDDT metric tests to use shared fixtures in a new `tests/metrics/conftest.py`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/sampleworks/metrics/rmsd.py` | Adds the new RMSD metric implementation and output schema. |
| `tests/metrics/test_rmsd_metrics.py` | Adds RMSD-specific test coverage (identity, end-to-end, selection behavior). |
| `tests/metrics/conftest.py` | Adds shared altloc/backbone fixtures used across metric tests. |
| `tests/metrics/test_lddt_metrics.py` | Removes duplicated fixtures in favor of the shared conftest fixtures. |
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/metrics/test_rmsd_metrics.py (1)
9-66: Consider adding coverage for `superimpose=True` and `log_rmsd_for_every_batch=True`.
The current tests exercise only the default constructor. Both `superimpose` (Kabsch path, including its subtle interaction with `selection`; see the rmsd.py review) and `log_rmsd_for_every_batch` (which mutates the result dict) are untested. A quick test of the Kabsch path would be to rigidly rotate the `altlocA_backbone` coords, compute RMSD with and without `superimpose=True`, and assert the superimposed RMSD is ~0 while the un-superimposed RMSD is large.
Want me to draft the additional tests, including one that composes `superimpose=True` with a non-None `selection` to lock in behavior?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/metrics/test_rmsd_metrics.py` around lines 9 - 66, Add tests that exercise AllAtomRMSD.compute with the optional flags: (1) create a rigidly rotated copy of altlocA_backbone and assert that compute(..., superimpose=False) yields a large best_of_1_rmsd while compute(..., superimpose=True) yields ~0, and also verify combining superimpose=True with a non-None selection string keeps the global best_of_1_rmsd equal to the unselected call; (2) call compute with log_rmsd_for_every_batch=True and assert the returned dict is mutated to include the per-batch/log entries (e.g., additional keys or batch-level RMSD structure) compared to the default call; reference AllAtomRMSD, compute, superimpose, log_rmsd_for_every_batch, and selection to locate where to add these new tests.
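The rotation test suggested above can be sketched without the project fixtures. The example below is an illustration only: `kabsch_superimpose` and `rmsd` are hypothetical NumPy stand-ins rather than the `AllAtomRMSD` API or biotite's `superimpose`, and random points replace the `altlocA_backbone` fixture.

```python
import numpy as np


def rmsd(a, b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=-1).mean()))


def kabsch_superimpose(fixed, mobile):
    """Rigidly fit `mobile` onto `fixed` with the Kabsch algorithm (stand-in helper)."""
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    p = mobile - mobile_center  # centered mobile coordinates
    q = fixed - fixed_center    # centered fixed coordinates

    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    return p @ rotation.T + fixed_center


def test_superimpose_removes_rigid_motion():
    """A rigidly moved copy should have large raw RMSD and ~0 RMSD after fitting."""
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(20, 3))

    # Rotate 90 degrees about z, then translate.
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    moved = reference @ rot_z.T + np.array([5.0, -3.0, 2.0])

    assert rmsd(moved, reference) > 1.0
    assert rmsd(kabsch_superimpose(reference, moved), reference) < 1e-6
```

In the real test suite the same two assertions would presumably be made on `best_of_1_rmsd` from the metric with and without `superimpose=True`; that wiring is not shown here because the constructor and `compute` signatures are not visible in this conversation.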
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/sampleworks/metrics/rmsd.py`:
- Around line 145-149: The result dict currently sets both "best_of_1_rmsd" and
f"best_of_{len(global_rmsd)}_rmsd", causing a key collision when
len(global_rmsd) == 1; update the code that builds the result (the result:
dict[...] assignment using global_rmsd and residue_rmsd_scores) to always
include "best_of_1_rmsd" but only add the f"best_of_{len(global_rmsd)}_rmsd"
entry when len(global_rmsd) > 1 (i.e., guard the min-across-N key with an if
len(global_rmsd) > 1 to avoid overwriting or duplicating the same key).
- Around line 69-75: The docstring incorrectly uses the term "superpose" but the
actual parameter is named "superimpose"; update the docstring text around the
selection/residue_rmsd_scores description to replace "superpose=True" with
"superimpose=True" (and any other occurrences of "superpose") so it matches the
function signature (e.g., where residue_rmsd_scores and the superimpose
parameter are documented) and keep the clarifying sentence that atoms used for
Kabsch superposition/superimposition always use every atom common to the
predicted and reference stacks.
- Around line 108-125: The code may lose atomworks-added methods on
pred_aa_stack when you reassign it with superimpose(pred_aa_stack), causing
mask() to be missing when computing selected_token_ids; move the selection logic
to run before calling superimpose: compute mask = pred_aa_stack.mask and derive
selected_token_ids from the original pred_aa_stack (e.g., selected_arr =
pred_aa_stack[0, mask] and unique token_id extraction) while keeping the
superimpose(pred_aa_stack) call afterward so you only reassign pred_aa_stack
after selected_token_ids is resolved.
---
Nitpick comments:
In `@tests/metrics/test_rmsd_metrics.py`:
- Around line 9-66: Add tests that exercise AllAtomRMSD.compute with the
optional flags: (1) create a rigidly rotated copy of altlocA_backbone and assert
that compute(..., superimpose=False) yields a large best_of_1_rmsd while
compute(..., superimpose=True) yields ~0, and also verify combining
superimpose=True with a non-None selection string keeps the global
best_of_1_rmsd equal to the unselected call; (2) call compute with
log_rmsd_for_every_batch=True and assert the returned dict is mutated to include
the per-batch/log entries (e.g., additional keys or batch-level RMSD structure)
compared to the default call; reference AllAtomRMSD, compute, superimpose,
log_rmsd_for_every_batch, and selection to locate where to add these new tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: be81920d-2553-4651-b5af-48727b2bae6d
📒 Files selected for processing (4)
- src/sampleworks/metrics/rmsd.py
- tests/metrics/conftest.py
- tests/metrics/test_lddt_metrics.py
- tests/metrics/test_rmsd_metrics.py
💤 Files with no reviewable changes (1)
- tests/metrics/test_lddt_metrics.py
2d3c16d to 462743f
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/sampleworks/metrics/rmsd.py (2)
145-149: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Key collision when N=1: `best_of_1_rmsd` and `best_of_{N}_rmsd` resolve to the same key.
When the stack contains a single model, lines 146 and 147 both write to `"best_of_1_rmsd"`. While the values are identical, downstream consumers expecting a distinct `best_of_{N}_rmsd` entry when N > 1 may break. Guard the min-across-N key with `if len(global_rmsd) > 1`.
🔧 Proposed fix
 result: dict[str, Any] = {
     "best_of_1_rmsd": float(global_rmsd[0]),
-    f"best_of_{len(global_rmsd)}_rmsd": float(global_rmsd.min()),
     "residue_rmsd_scores": residue_rmsd_scores,
 }
+if len(global_rmsd) > 1:
+    result[f"best_of_{len(global_rmsd)}_rmsd"] = float(global_rmsd.min())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/metrics/rmsd.py` around lines 145 - 149, The result dict currently always sets both "best_of_1_rmsd" and f"best_of_{len(global_rmsd)}_rmsd", causing a key collision when len(global_rmsd)==1; update the code that builds result (the block referencing global_rmsd and residue_rmsd_scores) to always include "best_of_1_rmsd" but only add the f"best_of_{len(global_rmsd)}_rmsd" entry when len(global_rmsd) > 1 (i.e., guard the min-across-N key with if len(global_rmsd) > 1) so downstream consumers see the aggregated key only for N>1.
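To make the N=1 case concrete, here is a small standalone sketch of the guarded construction. `build_result` is a hypothetical helper that takes plain Python lists; it is not the function in `rmsd.py`.

```python
from typing import Any


def build_result(global_rmsd: list[float], residue_rmsd_scores: dict) -> dict[str, Any]:
    """Build the result dict, adding the best-of-N key only when N > 1."""
    result: dict[str, Any] = {
        "best_of_1_rmsd": float(global_rmsd[0]),
        "residue_rmsd_scores": residue_rmsd_scores,
    }
    n_models = len(global_rmsd)
    if n_models > 1:
        result[f"best_of_{n_models}_rmsd"] = float(min(global_rmsd))
    return result


# Unguarded literal: with one model both entries share a key, so only one survives.
n = 1
unguarded = {"best_of_1_rmsd": 0.7, f"best_of_{n}_rmsd": 0.7}

single = build_result([0.7], {})
multi = build_result([0.7, 0.4, 0.9], {})
```

With one model, `single` holds only `best_of_1_rmsd` and `residue_rmsd_scores`; with three, `multi` additionally holds `best_of_3_rmsd`.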
108-125: ⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff
Potential runtime failure when combining `superimpose=True` with `selection`.
The code reassigns `pred_aa_stack` on line 109 via `superimpose()`, then accesses `.mask()` on line 116. If biotite's copy operation does not preserve atomworks-added methods, this will fail. Resolve `selected_token_ids` before calling `superimpose()` to eliminate the dependency on biotite's internal copy behavior.
🛠️ Suggested restructuring
Move selection logic before superimpose:
 gt_ref = gt_aa_stack[0]
+tok_idx = cast(np.ndarray, gt_ref.token_id).astype(np.int64)
+
+# Resolve the subset of tokens to report BEFORE superimposing
+selected_token_ids: set[int] | None = None
+if selection is not None:
+    mask_fn = pred_aa_stack.mask
+    if mask_fn is None:
+        raise RuntimeError(
+            "pred_aa_stack does not support mask(). Load atom arrays with "
+            "`atomworks.io.utils.io_utils.load_any()` to access this method."
+        )
+    mask = mask_fn(selection)
+    selected_arr = cast(AtomArray, pred_aa_stack[0, mask])
+    if selected_arr.token_id is not None:
+        selected_token_ids = {int(t) for t in np.unique(selected_arr.token_id)}
+
+# 3. Optional Kabsch superposition
 if self.superimpose:
     pred_aa_stack, _ = superimpose(gt_ref, pred_aa_stack)
-
-tok_idx = cast(np.ndarray, gt_ref.token_id).astype(np.int64)
-
-# Resolve the subset of tokens to report, if a residue selection was given.
-selected_token_ids: set[int] | None = None
-if selection is not None:
-    mask_fn = pred_aa_stack.mask
-    if mask_fn is None:
-        raise RuntimeError(
-            "pred_aa_stack does not support mask(). Load atom arrays with "
-            "`atomworks.io.utils.io_utils.load_any()` to access this method."
-        )
-    mask = mask_fn(selection)
-    selected_arr = cast(AtomArray, pred_aa_stack[0, mask])
-    if selected_arr.token_id is not None:
-        selected_token_ids = {int(t) for t in np.unique(selected_arr.token_id)}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/sampleworks/metrics/rmsd.py` around lines 108 - 125, The selection-based token resolution should occur before mutating pred_aa_stack via superimpose to avoid losing atomworks methods; move the block that computes selected_token_ids (the selection check, mask_fn = pred_aa_stack.mask, mask = mask_fn(selection), selected_arr = cast(AtomArray, pred_aa_stack[0, mask]) and extraction of token_id) to before the superimpose call, leaving only the superimpose(pred_aa_stack, ...) call afterward and preserving use of pred_aa_stack in superimpose; ensure you still reference the same names (selected_token_ids, selection, pred_aa_stack, mask_fn) so downstream code uses the precomputed selected_token_ids.
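The ordering concern can be illustrated with a toy model. Everything below is hypothetical: `ToyStack`, `attach_mask`, and `toy_superimpose` only mimic an object that gains a per-instance `mask()` helper and a transform that returns a fresh object without it. Whether biotite and atomworks actually behave this way is the open question the comment raises, and the selection here is a set of token IDs rather than a selection string.

```python
from dataclasses import dataclass


@dataclass
class ToyStack:
    """Hypothetical stand-in for an atom stack; not the biotite/atomworks type."""
    coords: list[tuple[float, float, float]]
    token_id: list[int]


def attach_mask(stack: ToyStack) -> ToyStack:
    """Attach a per-instance mask() helper, mimicking a method added at load time."""
    stack.mask = lambda wanted: [t in wanted for t in stack.token_id]
    return stack


def toy_superimpose(stack: ToyStack) -> ToyStack:
    """Return a new, transformed object; instance-level helpers are not carried over."""
    shifted = [(x + 1.0, y, z) for x, y, z in stack.coords]
    return ToyStack(shifted, list(stack.token_id))


def resolve_then_superimpose(stack: ToyStack, wanted: set[int]):
    """Resolve selected token IDs first, then transform the coordinates."""
    mask = stack.mask(wanted)  # safe: the original object still has mask()
    selected = {t for t, keep in zip(stack.token_id, mask) if keep}
    return selected, toy_superimpose(stack)


stack = attach_mask(ToyStack([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [0, 1]))
selected, moved = resolve_then_superimpose(stack, {1})
```

Here `moved` has no `mask` attribute, so calling it after the transform would raise `AttributeError`, while resolving the selection first sidesteps the problem regardless of how the transform copies its input.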
🧹 Nitpick comments (1)
tests/metrics/test_rmsd_metrics.py (1)
9-9: Address or remove the TODO comment.
The TODO suggests making tests more rigorous. While the current tests cover core functionality well, consider adding tests for edge cases (e.g., `superimpose=True` vs `False`, AtomArray vs AtomArrayStack inputs, error conditions).
Would you like me to suggest additional test cases or open an issue to track this enhancement?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/metrics/test_rmsd_metrics.py` at line 9, Remove or address the TODO in tests/metrics/test_rmsd_metrics.py: either delete the comment or add concrete unit tests that cover the suggested edge cases — specifically add tests exercising rmsd functions with superimpose=True and superimpose=False, with both AtomArray and AtomArrayStack inputs, and tests for expected error conditions (invalid shapes/types). If you prefer not to implement now, replace the TODO with a short comment pointing to a newly opened issue tracking these enhancements and include the issue number.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/sampleworks/metrics/rmsd.py`:
- Around line 88-93: Add a short rationale to each `# ty:
ignore[invalid-argument-type]` next to the calls to
add_global_token_id_annotation for predicted_atom_array_stack and
ground_truth_atom_array_stack explaining why the runtime type is valid despite
the type checker complaint (e.g., the arrays are numpy/torch tensors or a Union
that narrows at runtime, or the helper performs casting/validation), for
example: indicate that the function accepts array-like structures and we know at
this call site they conform to the expected shape/dtype so the static checker is
a false positive; update the two comments that follow the calls to
predicted_atom_array_stack and ground_truth_atom_array_stack accordingly.
In `@tests/metrics/test_rmsd_metrics.py`:
- Line 23: The test function test_all_atom_rmsd_end_to_end is missing a
NumPy-style docstring; add a brief NumPy-style docstring inside the function
that states what the test verifies (e.g., that all-atom RMSD between
altlocA_backbone and altlocB_backbone is computed correctly end-to-end), include
expected inputs (fixtures altlocA_backbone, altlocB_backbone) and the high-level
assertion intent so readers understand the purpose of
test_all_atom_rmsd_end_to_end.
---
Duplicate comments:
In `@src/sampleworks/metrics/rmsd.py`:
- Around line 145-149: The result dict currently always sets both
"best_of_1_rmsd" and f"best_of_{len(global_rmsd)}_rmsd", causing a key collision
when len(global_rmsd)==1; update the code that builds result (the block
referencing global_rmsd and residue_rmsd_scores) to always include
"best_of_1_rmsd" but only add the f"best_of_{len(global_rmsd)}_rmsd" entry when
len(global_rmsd) > 1 (i.e., guard the min-across-N key with if len(global_rmsd)
> 1) so downstream consumers see the aggregated key only for N>1.
- Around line 108-125: The selection-based token resolution should occur before
mutating pred_aa_stack via superimpose to avoid losing atomworks methods; move
the block that computes selected_token_ids (the selection check, mask_fn =
pred_aa_stack.mask, mask = mask_fn(selection), selected_arr = cast(AtomArray,
pred_aa_stack[0, mask]) and extraction of token_id) to before the superimpose
call, leaving only the superimpose(pred_aa_stack, ...) call afterward and
preserving use of pred_aa_stack in superimpose; ensure you still reference the
same names (selected_token_ids, selection, pred_aa_stack, mask_fn) so downstream
code uses the precomputed selected_token_ids.
---
Nitpick comments:
In `@tests/metrics/test_rmsd_metrics.py`:
- Line 9: Remove or address the TODO in tests/metrics/test_rmsd_metrics.py:
either delete the comment or add concrete unit tests that cover the suggested
edge cases — specifically add tests exercising rmsd functions with
superimpose=True and superimpose=False, with both AtomArray and AtomArrayStack
inputs, and tests for expected error conditions (invalid shapes/types). If you
prefer not to implement now, replace the TODO with a short comment pointing to a
newly opened issue tracking these enhancements and include the issue number.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1acddfb1-aff1-467f-96f5-a0653cc0d581
📒 Files selected for processing (4)
- src/sampleworks/metrics/rmsd.py
- tests/metrics/conftest.py
- tests/metrics/test_lddt_metrics.py
- tests/metrics/test_rmsd_metrics.py
💤 Files with no reviewable changes (1)
- tests/metrics/test_lddt_metrics.py
✅ Files skipped from review due to trivial changes (1)
- tests/metrics/conftest.py
b33579d to e3f107c
e3f107c to cc98dc2
Adds an RMSD metric, which we can hook up to downstream clustering analysis if needed.
Summary by CodeRabbit
New Features
Tests