Skip to content

cli: add option to not get the all-altloc selection string from find_altloc_selections.py#221

Merged
k-chrispens merged 3 commits into
mainfrom
kmc/add-altloc-selection-option
Jun 16, 2026
Merged

cli: add option to not get the all-altloc selection string from find_altloc_selections.py#221
k-chrispens merged 3 commits into
mainfrom
kmc/add-altloc-selection-option

Conversation

@k-chrispens

@k-chrispens k-chrispens commented Apr 13, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Added --no-all-altlocs CLI option to exclude per-chain aggregate selections from evaluation script output, retaining only span-based selections instead.
  • Bug Fixes

    • Fixed formatting in help text output.
  • Tests

    • Added integration tests covering new selection behavior options.

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@k-chrispens, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 50 minutes and 56 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a295937-af73-4e90-97d6-57ef12b90813

📥 Commits

Reviewing files that changed from the base of the PR and between cc91e38 and 7051383.

📒 Files selected for processing (3)
  • docker-entrypoint.sh
  • src/sampleworks/utils/cif_utils.py
  • tests/eval/test_find_altloc_selections_script.py
📝 Walkthrough

Walkthrough

Adds an include_all_altlocs: bool parameter (default True) to find_altloc_selections in cif_utils.py, which gates the per-chain aggregate selection output. The eval script's _process_row and main are updated to thread this flag through, and a new --no-all-altlocs CLI option is added. Integration tests covering all three selection modes are introduced. A trailing whitespace character in docker-entrypoint.sh help text is also removed.

Changes

include_all_altlocs flag and integration tests

Layer / File(s) Summary
find_altloc_selections signature and conditional logic
src/sampleworks/utils/cif_utils.py
Adds include_all_altlocs parameter (default True) to find_altloc_selections, updates docstring and param docs, and wraps per-chain aggregate selection accumulation in a conditional so it is skipped when the flag is False.
Script _process_row, apply wiring, and CLI flag
scripts/eval/find_altloc_selections.py
Adds include_all_altlocs to _process_row's signature and forwards it to find_altloc_selections; threads the value through the dataframe .apply() call in main; adds --no-all-altlocs CLI option using action="store_false".
Integration tests for all three selection modes
tests/eval/test_find_altloc_selections_script.py
New test module dynamically loads the script, generates temporary 1vme-based input CSVs, and tests output schema/derived fields under include_all_altlocs=True, include_all_altlocs=False, and large min_span with include_all_altlocs=False.

docker-entrypoint.sh whitespace fix

Layer / File(s) Summary
Trailing whitespace removal in help text
docker-entrypoint.sh
Removes a trailing whitespace character after "For protenix model" in the usage/help text.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • diff-use/sampleworks#106: Introduced the original find_altloc_selections function in cif_utils.py and the scripts/eval/find_altloc_selections.py script that this PR extends with the include_all_altlocs flag.

Suggested reviewers

  • marcuscollins

🐇 A boolean flag, so small, so neat,
Controls which altloc strings to emit or delete.
--no-all-altlocs says "keep it lean,"
Only span-based chains on the selection scene.
The rabbit hops and tests pass green! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the main objective of the changeset: adding a CLI option to control whether the all-altloc selection string is retrieved, which aligns with the new --no-all-altlocs flag introduced across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kmc/add-altloc-selection-option

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a CLI-controlled option to suppress the “all altloc residues per chain” selection emitted by find_altloc_selections(), enabling workflows that only want span-based selections.

Changes:

  • Extend find_altloc_selections() with include_all_altlocs to optionally omit the final per-chain “all altlocs” selection.
  • Add --no-all-altlocs to scripts/eval/find_altloc_selections.py to expose the behavior via CLI.
  • Minor formatting/maintenance updates (docs whitespace, ty rule ordering, lockfile hash, trailing whitespace).

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/sampleworks/utils/cif_utils.py Adds include_all_altlocs flag and gates emission of the final per-chain selection.
scripts/eval/find_altloc_selections.py Wires CLI flag through to find_altloc_selections() and updates row processing.
scripts/eval/EVALUATION.md Whitespace/formatting cleanup only.
pyproject.toml Reorders tool.ty.rules entries (no functional behavior change expected).
pixi.lock Updates local package hash due to changes.
docker-entrypoint.sh Removes trailing whitespace in help text.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 42 to +46
Spans of altlocs shorter than this are not yielded as selection strings, but ARE
included in the final selections which includes all residues with altlocs in each chain.
include_all_altlocs : bool
If True (default), yield a final per-chain selection string containing all residues
with altlocs regardless of span length.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should update this too.

Comment on lines 21 to +87
@@ -38,6 +41,9 @@ def find_altloc_selections(
Minimum number of consecutive residues to consider an altloc selection.
Spans of altlocs shorter than this are not yielded as selection strings, but ARE
included in the final selections which includes all residues with altlocs in each chain.
include_all_altlocs : bool
If True (default), yield a final per-chain selection string containing all residues
with altlocs regardless of span length.

Yields
------
@@ -72,12 +78,13 @@ def find_altloc_selections(
# FIXME use new style selection https://github.com/diff-use/sampleworks/issues/56
yield f"chain {chain} and resi {start}-{end}" # old style, more compact, selection

if chain not in all_altloc_selections:
all_altloc_selections[chain] = []
if start == end:
all_altloc_selections[chain].append(f"(res_id == {start})")
else:
all_altloc_selections[chain].append(f"(res_id >= {start} and res_id <= {end})")
if include_all_altlocs:
if chain not in all_altloc_selections:
all_altloc_selections[chain] = []
if start == end:
all_altloc_selections[chain].append(f"(res_id == {start})")
else:
all_altloc_selections[chain].append(f"(res_id >= {start} and res_id <= {end})")
Comment thread scripts/eval/find_altloc_selections.py Outdated
find_altloc_selections(cif_file, altloc_label, min_span, include_all_altlocs)
)
if not selections:
logger.warning(f"No altlocs found for {cif_file}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems worth doing.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/sampleworks/utils/cif_utils.py (1)

40-46: ⚠️ Potential issue | 🟡 Minor

Docstring now overstates short-span inclusion behavior.

The min_span description still reads as unconditional inclusion in final selections, but this is now conditional on include_all_altlocs=True. Please align this text to prevent API confusion.

📝 Proposed docstring fix
     min_span : int
         Minimum number of consecutive residues to consider an altloc selection.
-        Spans of altlocs shorter than this are not yielded as selection strings, but ARE
-        included in the final selections which includes all residues with altlocs in each chain.
+        Spans shorter than this are not yielded as individual span selections.
+        When ``include_all_altlocs`` is True, they are still included in the final
+        per-chain aggregate selections.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampleworks/utils/cif_utils.py` around lines 40 - 46, Update the
docstring for the parameters min_span and include_all_altlocs in
src/sampleworks/utils/cif_utils.py to clarify behavior: state that spans of
altlocs shorter than min_span are not yielded as selection strings, and that
those short spans will only be included in the final per-chain selection string
if include_all_altlocs is True; mention both parameter names (min_span,
include_all_altlocs) so the maintainer can locate the docstring to adjust the
wording accordingly.
🧹 Nitpick comments (1)
scripts/eval/find_altloc_selections.py (1)

9-11: Add a NumPy-style docstring to _process_row().

This function is modified in this PR but still lacks a NumPy-style docstring, and it has an observable side effect (warning log when selections are empty).

📚 Proposed docstring addition
 def _process_row(
     row: pd.Series, altloc_label: str, min_span: int, include_all_altlocs: bool
 ) -> pd.Series:
+    """Convert one input row into the output selection schema.
+
+    Parameters
+    ----------
+    row : pd.Series
+        Input row with structure and map metadata.
+    altloc_label : str
+        CIF altloc field name.
+    min_span : int
+        Minimum span length for yielded altloc segments.
+    include_all_altlocs : bool
+        Whether to include per-chain aggregate altloc selections.
+
+    Returns
+    -------
+    pd.Series
+        Output row used by downstream evaluation scripts.
+
+    Notes
+    -----
+    Logs a warning when no altloc selection is found.
+    """

As per coding guidelines, "Always include NumPy-style docstrings for every function and class."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval/find_altloc_selections.py` around lines 9 - 11, Add a
NumPy-style docstring to the function _process_row describing its purpose,
parameters (row: pd.Series, altloc_label: str, min_span: int,
include_all_altlocs: bool), return type (pd.Series) and behavior; explicitly
document the observable side effect that it may emit a warning log when
selections are empty and any exceptions or edge cases (e.g., empty inputs or
filtered results). Keep the docstring in NumPy style with short summary,
Parameters, Returns, and Notes/Warnings sections and reference the function's
behavior on empty selections so callers know about the logging side effect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/sampleworks/utils/cif_utils.py`:
- Around line 40-46: Update the docstring for the parameters min_span and
include_all_altlocs in src/sampleworks/utils/cif_utils.py to clarify behavior:
state that spans of altlocs shorter than min_span are not yielded as selection
strings, and that those short spans will only be included in the final per-chain
selection string if include_all_altlocs is True; mention both parameter names
(min_span, include_all_altlocs) so the maintainer can locate the docstring to
adjust the wording accordingly.

---

Nitpick comments:
In `@scripts/eval/find_altloc_selections.py`:
- Around line 9-11: Add a NumPy-style docstring to the function _process_row
describing its purpose, parameters (row: pd.Series, altloc_label: str, min_span:
int, include_all_altlocs: bool), return type (pd.Series) and behavior;
explicitly document the observable side effect that it may emit a warning log
when selections are empty and any exceptions or edge cases (e.g., empty inputs
or filtered results). Keep the docstring in NumPy style with short summary,
Parameters, Returns, and Notes/Warnings sections and reference the function's
behavior on empty selections so callers know about the logging side effect.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f537b4e-5d11-48a4-8fdc-9852fde03f03

📥 Commits

Reviewing files that changed from the base of the PR and between f57044e and 17e624f.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • docker-entrypoint.sh
  • pyproject.toml
  • scripts/eval/EVALUATION.md
  • scripts/eval/find_altloc_selections.py
  • src/sampleworks/utils/cif_utils.py

@k-chrispens k-chrispens requested a review from DorisMai April 14, 2026 04:35
@k-chrispens k-chrispens marked this pull request as draft April 14, 2026 04:37
@k-chrispens

Copy link
Copy Markdown
Collaborator Author

I will add some tests, converting to draft

@marcuscollins marcuscollins left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth addressing the Copilot suggestions, but otherwise looks good to me.

Comment thread scripts/eval/find_altloc_selections.py Outdated
find_altloc_selections(cif_file, altloc_label, min_span, include_all_altlocs)
)
if not selections:
logger.warning(f"No altlocs found for {cif_file}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems worth doing.

Comment on lines 42 to +46
Spans of altlocs shorter than this are not yielded as selection strings, but ARE
included in the final selections which includes all residues with altlocs in each chain.
include_all_altlocs : bool
If True (default), yield a final per-chain selection string containing all residues
with altlocs regardless of span length.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should update this too.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/eval/test_find_altloc_selections_script.py (1)

20-27: ⚡ Quick win

Add NumPy-style docstrings to helper functions and fixture.

As per coding guidelines, every function and class should have a NumPy-style docstring. The following are missing docstrings:

  • _load_script (lines 20-27): Should document that it dynamically imports the script module by file path for testing without requiring installation.
  • find_altloc_script (lines 30-32): Should document that it provides the loaded script module.
  • _make_args (lines 58-72): Should document that it constructs an argparse.Namespace for test invocation.

This is inconsistent with _altloc_input_csv (lines 35-55), which does include a docstring.

Also applies to: 30-32, 58-72

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/eval/test_find_altloc_selections_script.py` around lines 20 - 27, Add
NumPy-style docstrings to three functions that currently lack documentation. In
the _load_script function, document that it dynamically imports the script
module from a file path to enable testing without requiring installation. In the
find_altloc_script function, document that it provides the loaded script module
as a fixture. In the _make_args function, document that it constructs an
argparse.Namespace object for test invocation. Follow the same docstring format
and style used in the existing _altloc_input_csv function to maintain
consistency across the test file.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/eval/test_find_altloc_selections_script.py`:
- Around line 20-27: Add NumPy-style docstrings to three functions that
currently lack documentation. In the _load_script function, document that it
dynamically imports the script module from a file path to enable testing without
requiring installation. In the find_altloc_script function, document that it
provides the loaded script module as a fixture. In the _make_args function,
document that it constructs an argparse.Namespace object for test invocation.
Follow the same docstring format and style used in the existing
_altloc_input_csv function to maintain consistency across the test file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b203d5f7-78a6-40ae-8374-049fe821e8ab

📥 Commits

Reviewing files that changed from the base of the PR and between 17e624f and cc91e38.

📒 Files selected for processing (4)
  • docker-entrypoint.sh
  • scripts/eval/find_altloc_selections.py
  • src/sampleworks/utils/cif_utils.py
  • tests/eval/test_find_altloc_selections_script.py
✅ Files skipped from review due to trivial changes (1)
  • docker-entrypoint.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/eval/find_altloc_selections.py
  • src/sampleworks/utils/cif_utils.py

@k-chrispens k-chrispens force-pushed the kmc/add-altloc-selection-option branch from cc91e38 to 7051383 Compare June 14, 2026 23:40
@k-chrispens k-chrispens merged commit 3e6aba6 into main Jun 16, 2026
10 of 14 checks passed
@k-chrispens k-chrispens deleted the kmc/add-altloc-selection-option branch June 16, 2026 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants